All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/6] NFSD: Improvements for the NFSv4.2 READ_PLUS operation
@ 2022-05-16 20:35 Anna.Schumaker
  2022-05-16 20:35 ` [PATCH v1 1/6] SUNRPC: Introduce xdr_stream_move_segment() Anna.Schumaker
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Anna.Schumaker @ 2022-05-16 20:35 UTC (permalink / raw)
  To: steved, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

The main motivation for this patchset is fixing generic/091 and
generic/263 with READ_PLUS. These tests appear to be failing due to
files getting modified in the middle of reply encoding. Attempts to lock
the file for the entire encode result in a deadlock, since llseek() and
read() both need the file lock.

The solution is to read everything from disk at once, and then check if
each buffer page is all zeroes or not. As a bonus, this lets us support
READ_PLUS hole segments on filesystems that don't track sparse files.
Additionally, this also solves the performance issues I hit when testing
using btrfs on a virtual machine.

I created a wiki page with the results of my performance testing here:
    https://wiki.linux-nfs.org/wiki/index.php/Read_Plus_May_2022

These should probably have some soak time in linux-next, so let's target
them for the Linux 5.20 (6.0?) merge window rather than rushing to get
them into 5.19. As far as ordering goes, these patches should probably
go in before the related client changes as the client will also be
changed to make use of the xdr_stream_move_segment() function.

Thoughts?
Anna


Anna Schumaker (6):
  SUNRPC: Introduce xdr_stream_move_segment()
  SUNRPC: Introduce xdr_encode_double()
  SUNRPC: Introduce xdr_buf_trim_head()
  SUNRPC: Introduce xdr_buf_nth_page_address()
  SUNRPC: Export xdr_buf_pagecount()
  NFSD: Repeal and replace the READ_PLUS implementation

 fs/nfsd/nfs4xdr.c          | 202 +++++++++++++++++++------------------
 include/linux/sunrpc/xdr.h |   6 ++
 net/sunrpc/xdr.c           | 102 +++++++++++++++++++
 3 files changed, 210 insertions(+), 100 deletions(-)

-- 
2.36.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v1 1/6] SUNRPC: Introduce xdr_stream_move_segment()
  2022-05-16 20:35 [PATCH v1 0/6] NFSD: Improvements for the NFSv4.2 READ_PLUS operation Anna.Schumaker
@ 2022-05-16 20:35 ` Anna.Schumaker
  2022-05-16 20:35 ` [PATCH v1 2/6] SUNRPC: Introduce xdr_encode_double() Anna.Schumaker
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Anna.Schumaker @ 2022-05-16 20:35 UTC (permalink / raw)
  To: steved, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

I do this by creating an xdr subsegment for the range we will be
operating over. This lets me shift data to the correct place without
potentially overwriting anything outside the region.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 include/linux/sunrpc/xdr.h |  2 ++
 net/sunrpc/xdr.c           | 59 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 4417f667c757..04d47a096f8a 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -262,6 +262,8 @@ extern unsigned int xdr_align_data(struct xdr_stream *, unsigned int offset, uns
 extern unsigned int xdr_expand_hole(struct xdr_stream *, unsigned int offset, unsigned int length);
 extern bool xdr_stream_subsegment(struct xdr_stream *xdr, struct xdr_buf *subbuf,
 				  unsigned int len);
+extern unsigned int xdr_stream_move_segment(struct xdr_stream *xdr, unsigned int offset,
+					    unsigned int target, unsigned int length);
 
 /**
  * xdr_set_scratch_buffer - Attach a scratch buffer for decoding data.
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index df194cc07035..26559c08f68f 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -775,6 +775,34 @@ static void xdr_buf_pages_shift_left(const struct xdr_buf *buf,
 	xdr_buf_tail_copy_left(buf, 0, len - buf->page_len, shift);
 }
 
+static void xdr_buf_head_shift_left(const struct xdr_buf *buf,
+				    unsigned int base, unsigned int len,
+				    unsigned int shift)
+{
+	const struct kvec *head = buf->head;
+	unsigned int bytes;
+
+	if (!shift || !len)
+		return;
+
+	if (shift > base) {
+		bytes = (shift - base);
+		if (bytes >= len)
+			return;
+		base += bytes;
+		len -= bytes;
+	}
+
+	if (base < head->iov_len) {
+		bytes = min_t(unsigned int, len, head->iov_len - base);
+		memmove(head->iov_base + (base - shift),
+			head->iov_base + base, bytes);
+		base += bytes;
+		len -= bytes;
+	}
+	xdr_buf_pages_shift_left(buf, base - head->iov_len, len, shift);
+}
+
 /**
  * xdr_shrink_bufhead
  * @buf: xdr_buf
@@ -1671,6 +1699,37 @@ bool xdr_stream_subsegment(struct xdr_stream *xdr, struct xdr_buf *subbuf,
 }
 EXPORT_SYMBOL_GPL(xdr_stream_subsegment);
 
+/**
+ * xdr_stream_move_segment - Move part of a stream to another position
+ * @xdr: the source xdr_stream
+ * @offset: the source offset of the segment
+ * @target: the target offset of the segment
+ * @length: the number of bytes to move
+ *
+ * Moves @length bytes from @offset to @target in the xdr_stream, overwriting
+ * anything in its space. Returns the number of bytes in the segment.
+ */
+unsigned int xdr_stream_move_segment(struct xdr_stream *xdr, unsigned int offset,
+				     unsigned int target, unsigned int length)
+{
+	struct xdr_buf buf;
+	unsigned int shift;
+
+	if (offset < target) {
+		shift = target - offset;
+		if (xdr_buf_subsegment(xdr->buf, &buf, offset, shift + length) < 0)
+			return 0;
+		xdr_buf_head_shift_right(&buf, 0, length, shift);
+	} else if (offset > target) {
+		shift = offset - target;
+		if (xdr_buf_subsegment(xdr->buf, &buf, target, shift + length) < 0)
+			return 0;
+		xdr_buf_head_shift_left(&buf, shift, length, shift);
+	}
+	return length;
+}
+EXPORT_SYMBOL_GPL(xdr_stream_move_segment);
+
 /**
  * xdr_buf_trim - lop at most "len" bytes off the end of "buf"
  * @buf: buf to be trimmed
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v1 2/6] SUNRPC: Introduce xdr_encode_double()
  2022-05-16 20:35 [PATCH v1 0/6] NFSD: Improvements for the NFSv4.2 READ_PLUS operation Anna.Schumaker
  2022-05-16 20:35 ` [PATCH v1 1/6] SUNRPC: Introduce xdr_stream_move_segment() Anna.Schumaker
@ 2022-05-16 20:35 ` Anna.Schumaker
  2022-05-16 20:35 ` [PATCH v1 3/6] SUNRPC: Introduce xdr_buf_trim_head() Anna.Schumaker
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Anna.Schumaker @ 2022-05-16 20:35 UTC (permalink / raw)
  To: steved, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

This is similar to xdr_encode_word(), but instead lets us encode a
64-bit wide value into the xdr_buf at the given offset.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 include/linux/sunrpc/xdr.h | 1 +
 net/sunrpc/xdr.c           | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 04d47a096f8a..e502fe576813 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -196,6 +196,7 @@ extern int read_bytes_from_xdr_buf(const struct xdr_buf *, unsigned int, void *,
 extern int write_bytes_to_xdr_buf(const struct xdr_buf *, unsigned int, void *, unsigned int);
 
 extern int xdr_encode_word(const struct xdr_buf *, unsigned int, u32);
+extern int xdr_encode_double(const struct xdr_buf *, unsigned int, u64);
 extern int xdr_decode_word(const struct xdr_buf *, unsigned int, u32 *);
 
 struct xdr_array2_desc;
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 26559c08f68f..87526bce9e15 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -1856,6 +1856,14 @@ int xdr_encode_word(const struct xdr_buf *buf, unsigned int base, u32 obj)
 }
 EXPORT_SYMBOL_GPL(xdr_encode_word);
 
+int xdr_encode_double(const struct xdr_buf *buf, unsigned int base, u64 obj)
+{
+	__be64 raw = cpu_to_be64(obj);
+
+	return write_bytes_to_xdr_buf(buf, base, &raw, sizeof(obj));
+}
+EXPORT_SYMBOL_GPL(xdr_encode_double);
+
 /* Returns 0 on success, or else a negative error code. */
 static int xdr_xcode_array2(const struct xdr_buf *buf, unsigned int base,
 			    struct xdr_array2_desc *desc, int encode)
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v1 3/6] SUNRPC: Introduce xdr_buf_trim_head()
  2022-05-16 20:35 [PATCH v1 0/6] NFSD: Improvements for the NFSv4.2 READ_PLUS operation Anna.Schumaker
  2022-05-16 20:35 ` [PATCH v1 1/6] SUNRPC: Introduce xdr_stream_move_segment() Anna.Schumaker
  2022-05-16 20:35 ` [PATCH v1 2/6] SUNRPC: Introduce xdr_encode_double() Anna.Schumaker
@ 2022-05-16 20:35 ` Anna.Schumaker
  2022-05-16 20:35 ` [PATCH v1 4/6] SUNRPC: Introduce xdr_buf_nth_page_address() Anna.Schumaker
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Anna.Schumaker @ 2022-05-16 20:35 UTC (permalink / raw)
  To: steved, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

The READ_PLUS operation uses a 32-bit length field for encoding a DATA
segment, but 64-bit length field for encoding a HOLE segment. When
setting up our reply buffer, we need to reserve enough space to encode
a HOLE before reading the file data and use this function if the first
segment turns out to be DATA.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 include/linux/sunrpc/xdr.h |  1 +
 net/sunrpc/xdr.c           | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index e502fe576813..344c820484ba 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -191,6 +191,7 @@ xdr_adjust_iovec(struct kvec *iov, __be32 *p)
 extern void xdr_shift_buf(struct xdr_buf *, size_t);
 extern void xdr_buf_from_iov(const struct kvec *, struct xdr_buf *);
 extern int xdr_buf_subsegment(const struct xdr_buf *, struct xdr_buf *, unsigned int, unsigned int);
+extern void xdr_buf_trim_head(struct xdr_buf *, unsigned int);
 extern void xdr_buf_trim(struct xdr_buf *, unsigned int);
 extern int read_bytes_from_xdr_buf(const struct xdr_buf *, unsigned int, void *, unsigned int);
 extern int write_bytes_to_xdr_buf(const struct xdr_buf *, unsigned int, void *, unsigned int);
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 87526bce9e15..3ecc444b27be 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -1730,6 +1730,23 @@ unsigned int xdr_stream_move_segment(struct xdr_stream *xdr, unsigned int offset
 }
 EXPORT_SYMBOL_GPL(xdr_stream_move_segment);
 
+/**
+ * xdr_buf_trim_head - lop at most "len" bytes off the end of "buf"->head
+ * @buf: buf to be trimmed
+ * @len: number of bytes to reduce "buf"->head by
+ *
+ * Trim an xdr_buf->head by the given number of bytes by fixing up the lengths.
+ * Note that it's possible that we'll trim less than that amount if the
+ *  xdr_buf->head is too small.
+ */
+void xdr_buf_trim_head(struct xdr_buf *buf, unsigned int len)
+{
+	size_t trim = min_t(size_t, buf->head[0].iov_len, len);
+	buf->head[0].iov_len -= trim;
+	buf->len -= trim;
+}
+EXPORT_SYMBOL_GPL(xdr_buf_trim_head);
+
 /**
  * xdr_buf_trim - lop at most "len" bytes off the end of "buf"
  * @buf: buf to be trimmed
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v1 4/6] SUNRPC: Introduce xdr_buf_nth_page_address()
  2022-05-16 20:35 [PATCH v1 0/6] NFSD: Improvements for the NFSv4.2 READ_PLUS operation Anna.Schumaker
                   ` (2 preceding siblings ...)
  2022-05-16 20:35 ` [PATCH v1 3/6] SUNRPC: Introduce xdr_buf_trim_head() Anna.Schumaker
@ 2022-05-16 20:35 ` Anna.Schumaker
  2022-05-16 20:35 ` [PATCH v1 5/6] SUNRPC: Export xdr_buf_pagecount() Anna.Schumaker
  2022-05-16 20:35 ` [PATCH v1 6/6] NFSD: Repeal and replace the READ_PLUS implementation Anna.Schumaker
  5 siblings, 0 replies; 11+ messages in thread
From: Anna.Schumaker @ 2022-05-16 20:35 UTC (permalink / raw)
  To: steved, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

For getting a pointer to the memory address represented by the nth page,
along with the length of the data on that page.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 include/linux/sunrpc/xdr.h |  2 ++
 net/sunrpc/xdr.c           | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 344c820484ba..b375b284afbe 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -137,6 +137,8 @@ void	xdr_inline_pages(struct xdr_buf *, unsigned int,
 			 struct page **, unsigned int, unsigned int);
 void	xdr_terminate_string(const struct xdr_buf *, const u32);
 size_t	xdr_buf_pagecount(const struct xdr_buf *buf);
+char	*xdr_buf_nth_page_address(const struct xdr_buf *buf, unsigned int n,
+				  unsigned int *len);
 int	xdr_alloc_bvec(struct xdr_buf *buf, gfp_t gfp);
 void	xdr_free_bvec(struct xdr_buf *buf);
 
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 3ecc444b27be..0b378c80f5f5 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -140,6 +140,23 @@ size_t xdr_buf_pagecount(const struct xdr_buf *buf)
 	return (buf->page_base + buf->page_len + PAGE_SIZE - 1) >> PAGE_SHIFT;
 }
 
+char *xdr_buf_nth_page_address(const struct xdr_buf *buf, unsigned int n,
+			       unsigned int *len)
+{
+	unsigned int pgbase = buf->page_base + (n * PAGE_SIZE);
+	struct page **pages = buf->pages;
+	struct page **page;
+
+	if (n >= xdr_buf_pagecount(buf))
+		return NULL;
+
+	page = pages + (pgbase >> PAGE_SHIFT);
+	pgbase &= ~PAGE_MASK;
+	*len = min_t(size_t, PAGE_SIZE, buf->page_len - (n * PAGE_SIZE));
+	return page_address(*page) + pgbase;
+}
+EXPORT_SYMBOL_GPL(xdr_buf_nth_page_address);
+
 int
 xdr_alloc_bvec(struct xdr_buf *buf, gfp_t gfp)
 {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v1 5/6] SUNRPC: Export xdr_buf_pagecount()
  2022-05-16 20:35 [PATCH v1 0/6] NFSD: Improvements for the NFSv4.2 READ_PLUS operation Anna.Schumaker
                   ` (3 preceding siblings ...)
  2022-05-16 20:35 ` [PATCH v1 4/6] SUNRPC: Introduce xdr_buf_nth_page_address() Anna.Schumaker
@ 2022-05-16 20:35 ` Anna.Schumaker
  2022-05-16 20:35 ` [PATCH v1 6/6] NFSD: Repeal and replace the READ_PLUS implementation Anna.Schumaker
  5 siblings, 0 replies; 11+ messages in thread
From: Anna.Schumaker @ 2022-05-16 20:35 UTC (permalink / raw)
  To: steved, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

The NFS server will need this for iterating over pages in a READ_PLUS
reply

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 net/sunrpc/xdr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 0b378c80f5f5..d71c90552fa2 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -139,6 +139,7 @@ size_t xdr_buf_pagecount(const struct xdr_buf *buf)
 		return 0;
 	return (buf->page_base + buf->page_len + PAGE_SIZE - 1) >> PAGE_SHIFT;
 }
+EXPORT_SYMBOL_GPL(xdr_buf_pagecount);
 
 char *xdr_buf_nth_page_address(const struct xdr_buf *buf, unsigned int n,
 			       unsigned int *len)
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v1 6/6] NFSD: Repeal and replace the READ_PLUS implementation
  2022-05-16 20:35 [PATCH v1 0/6] NFSD: Improvements for the NFSv4.2 READ_PLUS operation Anna.Schumaker
                   ` (4 preceding siblings ...)
  2022-05-16 20:35 ` [PATCH v1 5/6] SUNRPC: Export xdr_buf_pagecount() Anna.Schumaker
@ 2022-05-16 20:35 ` Anna.Schumaker
  2022-05-16 21:37   ` Chuck Lever III
  5 siblings, 1 reply; 11+ messages in thread
From: Anna.Schumaker @ 2022-05-16 20:35 UTC (permalink / raw)
  To: steved, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

Rather than relying on the underlying filesystem to tell us where hole
and data segments are through vfs_llseek(), let's instead do the hole
compression ourselves. This has a few advantages over the old
implementation:

1) A single call to the underlying filesystem through nfsd_readv() means
   the file can't change from underneath us in the middle of encoding.
2) A single call to the underlying filestem also means that the
   underlying filesystem only needs to synchronize cached and on-disk
   data one time instead of potentially many speeding up the reply.
3) Hole support for filesystems that don't support SEEK_HOLE and SEEK_DATA

I also included an optimization where we can cut down on the amount of
memory being shifed around by doing the compression as (hole, data)
pairs.

This patch not only fixes xfstests generic/091 and generic/263
for me but the "-g quick" group tests also finish about a minute faster.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfsd/nfs4xdr.c | 202 +++++++++++++++++++++++-----------------------
 1 file changed, 102 insertions(+), 100 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index da92e7d2ab6a..973b4a1e7990 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4731,81 +4731,121 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
 	return nfserr;
 }
 
+struct read_plus_segment {
+	enum data_content4 type;
+	unsigned long offset;
+	unsigned long length;
+	unsigned int page_pos;
+};
+
 static __be32
-nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
-			    struct nfsd4_read *read,
-			    unsigned long *maxcount, u32 *eof,
-			    loff_t *pos)
+nfsd4_read_plus_readv(struct nfsd4_compoundres *resp, struct nfsd4_read *read,
+		      unsigned long *maxcount, u32 *eof)
 {
 	struct xdr_stream *xdr = resp->xdr;
-	struct file *file = read->rd_nf->nf_file;
-	int starting_len = xdr->buf->len;
-	loff_t hole_pos;
-	__be32 nfserr;
-	__be32 *p, tmp;
-	__be64 tmp64;
-
-	hole_pos = pos ? *pos : vfs_llseek(file, read->rd_offset, SEEK_HOLE);
-	if (hole_pos > read->rd_offset)
-		*maxcount = min_t(unsigned long, *maxcount, hole_pos - read->rd_offset);
-	*maxcount = min_t(unsigned long, *maxcount, (xdr->buf->buflen - xdr->buf->len));
-
-	/* Content type, offset, byte count */
-	p = xdr_reserve_space(xdr, 4 + 8 + 4);
-	if (!p)
-		return nfserr_resource;
+	unsigned int starting_len = xdr->buf->len;
+	__be32 nfserr, zero = xdr_zero;
+	int pad;
 
+	/* xdr_reserve_space_vec() switches us to the xdr->pages */
 	read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, *maxcount);
 	if (read->rd_vlen < 0)
 		return nfserr_resource;
 
-	nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
-			    resp->rqstp->rq_vec, read->rd_vlen, maxcount, eof);
+	nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, read->rd_nf->nf_file,
+			    read->rd_offset, resp->rqstp->rq_vec, read->rd_vlen,
+			    maxcount, eof);
 	if (nfserr)
 		return nfserr;
-	xdr_truncate_encode(xdr, starting_len + 16 + xdr_align_size(*maxcount));
+	xdr_truncate_encode(xdr, starting_len + xdr_align_size(*maxcount));
 
-	tmp = htonl(NFS4_CONTENT_DATA);
-	write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
-	tmp64 = cpu_to_be64(read->rd_offset);
-	write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
-	tmp = htonl(*maxcount);
-	write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
-
-	tmp = xdr_zero;
-	write_bytes_to_xdr_buf(xdr->buf, starting_len + 16 + *maxcount, &tmp,
-			       xdr_pad_size(*maxcount));
+	pad = (*maxcount&3) ? 4 - (*maxcount&3) : 0;
+	write_bytes_to_xdr_buf(xdr->buf, starting_len + *maxcount, &zero, pad);
 	return nfs_ok;
 }
 
+static void
+nfsd4_encode_read_plus_segment(struct xdr_stream *xdr,
+			       struct read_plus_segment *segment,
+			       unsigned int *bufpos, unsigned int *segments)
+{
+	struct xdr_buf *buf = xdr->buf;
+
+	xdr_encode_word(buf, *bufpos, segment->type);
+	xdr_encode_double(buf, *bufpos + 4, segment->offset);
+
+	if (segment->type == NFS4_CONTENT_HOLE) {
+		xdr_encode_double(buf, *bufpos + 12, segment->length);
+		*bufpos += 4 + 8 + 8;
+	} else {
+		size_t align = xdr_align_size(segment->length);
+		xdr_encode_word(buf, *bufpos + 12, segment->length);
+		if (*segments == 0)
+			xdr_buf_trim_head(buf, 4);
+
+		xdr_stream_move_segment(xdr,
+				buf->head[0].iov_len + segment->page_pos,
+				*bufpos + 16, align);
+		*bufpos += 4 + 8 + 4 + align;
+	}
+
+	*segments += 1;
+}
+
 static __be32
-nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
-			    struct nfsd4_read *read,
-			    unsigned long *maxcount, u32 *eof)
+nfsd4_encode_read_plus_segments(struct nfsd4_compoundres *resp,
+				struct nfsd4_read *read,
+				unsigned int *segments, u32 *eof)
 {
-	struct file *file = read->rd_nf->nf_file;
-	loff_t data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
-	loff_t f_size = i_size_read(file_inode(file));
-	unsigned long count;
-	__be32 *p;
+	enum data_content4 pagetype;
+	struct read_plus_segment segment;
+	struct xdr_stream *xdr = resp->xdr;
+	unsigned long offset = read->rd_offset;
+	unsigned int bufpos = xdr->buf->len;
+	unsigned long maxcount;
+	unsigned int pagelen, i = 0;
+	char *vpage, *p;
+	__be32 nfserr;
 
-	if (data_pos == -ENXIO)
-		data_pos = f_size;
-	else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE))
-		return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size);
-	count = data_pos - read->rd_offset;
-
-	/* Content type, offset, byte count */
-	p = xdr_reserve_space(resp->xdr, 4 + 8 + 8);
-	if (!p)
+	/* enough space for a HOLE segment before we switch to the pages */
+	if (!xdr_reserve_space(xdr, 4 + 8 + 8))
 		return nfserr_resource;
+	xdr_commit_encode(xdr);
 
-	*p++ = htonl(NFS4_CONTENT_HOLE);
-	p = xdr_encode_hyper(p, read->rd_offset);
-	p = xdr_encode_hyper(p, count);
+	maxcount = min_t(unsigned long, read->rd_length,
+			 (xdr->buf->buflen - xdr->buf->len));
 
-	*eof = (read->rd_offset + count) >= f_size;
-	*maxcount = min_t(unsigned long, count, *maxcount);
+	nfserr = nfsd4_read_plus_readv(resp, read, &maxcount, eof);
+	if (nfserr)
+		return nfserr;
+
+	while (maxcount > 0) {
+		vpage = xdr_buf_nth_page_address(xdr->buf, i, &pagelen);
+		pagelen = min_t(unsigned int, pagelen, maxcount);
+		if (!vpage || pagelen == 0)
+			break;
+		p = memchr_inv(vpage, 0, pagelen);
+		pagetype = (p == NULL) ? NFS4_CONTENT_HOLE : NFS4_CONTENT_DATA;
+
+		if (pagetype != segment.type || i == 0) {
+			if (likely(i > 0)) {
+				nfsd4_encode_read_plus_segment(xdr, &segment,
+							      &bufpos, segments);
+				offset += segment.length;
+			}
+			segment.type = pagetype;
+			segment.offset = offset;
+			segment.length = pagelen;
+			segment.page_pos = i * PAGE_SIZE;
+		} else
+			segment.length += pagelen;
+
+		maxcount -= pagelen;
+		i++;
+	}
+
+	nfsd4_encode_read_plus_segment(xdr, &segment, &bufpos, segments);
+	xdr_truncate_encode(xdr, bufpos);
 	return nfs_ok;
 }
 
@@ -4813,69 +4853,31 @@ static __be32
 nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 		       struct nfsd4_read *read)
 {
-	unsigned long maxcount, count;
 	struct xdr_stream *xdr = resp->xdr;
-	struct file *file;
 	int starting_len = xdr->buf->len;
-	int last_segment = xdr->buf->len;
-	int segments = 0;
-	__be32 *p, tmp;
-	bool is_data;
-	loff_t pos;
+	unsigned int segments = 0;
 	u32 eof;
 
 	if (nfserr)
 		return nfserr;
-	file = read->rd_nf->nf_file;
 
 	/* eof flag, segment count */
-	p = xdr_reserve_space(xdr, 4 + 4);
-	if (!p)
+	if (!xdr_reserve_space(xdr, 4 + 4))
 		return nfserr_resource;
 	xdr_commit_encode(xdr);
 
-	maxcount = min_t(unsigned long, read->rd_length,
-			 (xdr->buf->buflen - xdr->buf->len));
-	count    = maxcount;
-
-	eof = read->rd_offset >= i_size_read(file_inode(file));
+	eof = read->rd_offset >= i_size_read(file_inode(read->rd_nf->nf_file));
 	if (eof)
 		goto out;
 
-	pos = vfs_llseek(file, read->rd_offset, SEEK_HOLE);
-	is_data = pos > read->rd_offset;
-
-	while (count > 0 && !eof) {
-		maxcount = count;
-		if (is_data)
-			nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof,
-						segments == 0 ? &pos : NULL);
-		else
-			nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof);
-		if (nfserr)
-			goto out;
-		count -= maxcount;
-		read->rd_offset += maxcount;
-		is_data = !is_data;
-		last_segment = xdr->buf->len;
-		segments++;
-	}
-
+	nfserr = nfsd4_encode_read_plus_segments(resp, read, &segments, &eof);
 out:
-	if (nfserr && segments == 0)
+	if (nfserr)
 		xdr_truncate_encode(xdr, starting_len);
 	else {
-		if (nfserr) {
-			xdr_truncate_encode(xdr, last_segment);
-			nfserr = nfs_ok;
-			eof = 0;
-		}
-		tmp = htonl(eof);
-		write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
-		tmp = htonl(segments);
-		write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
+		xdr_encode_word(xdr->buf, starting_len,     eof);
+		xdr_encode_word(xdr->buf, starting_len + 4, segments);
 	}
-
 	return nfserr;
 }
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 6/6] NFSD: Repeal and replace the READ_PLUS implementation
  2022-05-16 20:35 ` [PATCH v1 6/6] NFSD: Repeal and replace the READ_PLUS implementation Anna.Schumaker
@ 2022-05-16 21:37   ` Chuck Lever III
  2022-05-17 13:39     ` Anna Schumaker
  0 siblings, 1 reply; 11+ messages in thread
From: Chuck Lever III @ 2022-05-16 21:37 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Steve Dickson, Linux NFS Mailing List



> On May 16, 2022, at 4:35 PM, Anna.Schumaker@Netapp.com wrote:
> 
> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> 
> Rather than relying on the underlying filesystem to tell us where hole
> and data segments are through vfs_llseek(), let's instead do the hole
> compression ourselves. This has a few advantages over the old
> implementation:
> 
> 1) A single call to the underlying filesystem through nfsd_readv() means
>   the file can't change from underneath us in the middle of encoding.
> 2) A single call to the underlying filestem also means that the
>   underlying filesystem only needs to synchronize cached and on-disk
>   data one time instead of potentially many speeding up the reply.
> 3) Hole support for filesystems that don't support SEEK_HOLE and SEEK_DATA
> 
> I also included an optimization where we can cut down on the amount of
> memory being shifed around by doing the compression as (hole, data)
> pairs.
> 
> This patch not only fixes xfstests generic/091 and generic/263
> for me but the "-g quick" group tests also finish about a minute faster.
> 
> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Hi Anna, some general comments on the NFSD pieces:

- Doesn't READ have the same issue that the file bytes can't change
  until the Reply is fully sent? I would think the payload data
  can't change until there is no possibility of a transport-layer
  retransmit. Also, special restrictions like this should be
  documented via code comments, IMHO.

- David Howells might be interested in this approach, as he had some
  concerns about compressing files in place that would appear to
  apply also to READ_PLUS. Please copy David on the next round of
  these patches.

- Can you say why the READ_PLUS decoder and encoder operates on
  struct xdr_buf instead of struct xdr_stream? I'd prefer xdr_stream
  if you can. You could get rid of write_bytes_to_xdr_buf,
  xdr_encode_word and xdr_encode_double and use the stream-based
  helpers.

- Instead of using naked integers, please use multiples of XDR_UNIT.


> ---
> fs/nfsd/nfs4xdr.c | 202 +++++++++++++++++++++++-----------------------
> 1 file changed, 102 insertions(+), 100 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index da92e7d2ab6a..973b4a1e7990 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -4731,81 +4731,121 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
> 	return nfserr;
> }
> 
> +struct read_plus_segment {
> +	enum data_content4 type;
> +	unsigned long offset;
> +	unsigned long length;
> +	unsigned int page_pos;
> +};
> +
> static __be32
> -nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
> -			    struct nfsd4_read *read,
> -			    unsigned long *maxcount, u32 *eof,
> -			    loff_t *pos)
> +nfsd4_read_plus_readv(struct nfsd4_compoundres *resp, struct nfsd4_read *read,
> +		      unsigned long *maxcount, u32 *eof)
> {
> 	struct xdr_stream *xdr = resp->xdr;
> -	struct file *file = read->rd_nf->nf_file;
> -	int starting_len = xdr->buf->len;
> -	loff_t hole_pos;
> -	__be32 nfserr;
> -	__be32 *p, tmp;
> -	__be64 tmp64;
> -
> -	hole_pos = pos ? *pos : vfs_llseek(file, read->rd_offset, SEEK_HOLE);
> -	if (hole_pos > read->rd_offset)
> -		*maxcount = min_t(unsigned long, *maxcount, hole_pos - read->rd_offset);
> -	*maxcount = min_t(unsigned long, *maxcount, (xdr->buf->buflen - xdr->buf->len));
> -
> -	/* Content type, offset, byte count */
> -	p = xdr_reserve_space(xdr, 4 + 8 + 4);
> -	if (!p)
> -		return nfserr_resource;
> +	unsigned int starting_len = xdr->buf->len;
> +	__be32 nfserr, zero = xdr_zero;
> +	int pad;
> 
> +	/* xdr_reserve_space_vec() switches us to the xdr->pages */
> 	read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, *maxcount);
> 	if (read->rd_vlen < 0)
> 		return nfserr_resource;
> 
> -	nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
> -			    resp->rqstp->rq_vec, read->rd_vlen, maxcount, eof);
> +	nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, read->rd_nf->nf_file,
> +			    read->rd_offset, resp->rqstp->rq_vec, read->rd_vlen,
> +			    maxcount, eof);
> 	if (nfserr)
> 		return nfserr;
> -	xdr_truncate_encode(xdr, starting_len + 16 + xdr_align_size(*maxcount));
> +	xdr_truncate_encode(xdr, starting_len + xdr_align_size(*maxcount));
> 
> -	tmp = htonl(NFS4_CONTENT_DATA);
> -	write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
> -	tmp64 = cpu_to_be64(read->rd_offset);
> -	write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
> -	tmp = htonl(*maxcount);
> -	write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
> -
> -	tmp = xdr_zero;
> -	write_bytes_to_xdr_buf(xdr->buf, starting_len + 16 + *maxcount, &tmp,
> -			       xdr_pad_size(*maxcount));
> +	pad = (*maxcount&3) ? 4 - (*maxcount&3) : 0;
> +	write_bytes_to_xdr_buf(xdr->buf, starting_len + *maxcount, &zero, pad);
> 	return nfs_ok;
> }
> 
> +static void
> +nfsd4_encode_read_plus_segment(struct xdr_stream *xdr,
> +			       struct read_plus_segment *segment,
> +			       unsigned int *bufpos, unsigned int *segments)
> +{
> +	struct xdr_buf *buf = xdr->buf;
> +
> +	xdr_encode_word(buf, *bufpos, segment->type);
> +	xdr_encode_double(buf, *bufpos + 4, segment->offset);
> +
> +	if (segment->type == NFS4_CONTENT_HOLE) {
> +		xdr_encode_double(buf, *bufpos + 12, segment->length);
> +		*bufpos += 4 + 8 + 8;
> +	} else {
> +		size_t align = xdr_align_size(segment->length);
> +		xdr_encode_word(buf, *bufpos + 12, segment->length);
> +		if (*segments == 0)
> +			xdr_buf_trim_head(buf, 4);
> +
> +		xdr_stream_move_segment(xdr,
> +				buf->head[0].iov_len + segment->page_pos,
> +				*bufpos + 16, align);

The term "segment" is overloaded in general, and in particular
here. xdr_stream_move_subsegment() might be a less confusing
name for this helper.

However I don't immediately see a benefit from working with the
xdr_buf here. Can't you do these operations entirely with the
passed-in xdr_stream?


> +		*bufpos += 4 + 8 + 4 + align;
> +	}
> +
> +	*segments += 1;
> +}
> +
> static __be32
> -nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
> -			    struct nfsd4_read *read,
> -			    unsigned long *maxcount, u32 *eof)
> +nfsd4_encode_read_plus_segments(struct nfsd4_compoundres *resp,
> +				struct nfsd4_read *read,
> +				unsigned int *segments, u32 *eof)
> {
> -	struct file *file = read->rd_nf->nf_file;
> -	loff_t data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
> -	loff_t f_size = i_size_read(file_inode(file));
> -	unsigned long count;
> -	__be32 *p;
> +	enum data_content4 pagetype;
> +	struct read_plus_segment segment;
> +	struct xdr_stream *xdr = resp->xdr;
> +	unsigned long offset = read->rd_offset;
> +	unsigned int bufpos = xdr->buf->len;
> +	unsigned long maxcount;
> +	unsigned int pagelen, i = 0;
> +	char *vpage, *p;
> +	__be32 nfserr;
> 
> -	if (data_pos == -ENXIO)
> -		data_pos = f_size;
> -	else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE))
> -		return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size);
> -	count = data_pos - read->rd_offset;
> -
> -	/* Content type, offset, byte count */
> -	p = xdr_reserve_space(resp->xdr, 4 + 8 + 8);
> -	if (!p)
> +	/* enough space for a HOLE segment before we switch to the pages */
> +	if (!xdr_reserve_space(xdr, 4 + 8 + 8))
> 		return nfserr_resource;
> +	xdr_commit_encode(xdr);
> 
> -	*p++ = htonl(NFS4_CONTENT_HOLE);
> -	p = xdr_encode_hyper(p, read->rd_offset);
> -	p = xdr_encode_hyper(p, count);
> +	maxcount = min_t(unsigned long, read->rd_length,
> +			 (xdr->buf->buflen - xdr->buf->len));
> 
> -	*eof = (read->rd_offset + count) >= f_size;
> -	*maxcount = min_t(unsigned long, count, *maxcount);
> +	nfserr = nfsd4_read_plus_readv(resp, read, &maxcount, eof);
> +	if (nfserr)
> +		return nfserr;
> +
> +	while (maxcount > 0) {
> +		vpage = xdr_buf_nth_page_address(xdr->buf, i, &pagelen);
> +		pagelen = min_t(unsigned int, pagelen, maxcount);
> +		if (!vpage || pagelen == 0)
> +			break;
> +		p = memchr_inv(vpage, 0, pagelen);
> +		pagetype = (p == NULL) ? NFS4_CONTENT_HOLE : NFS4_CONTENT_DATA;
> +
> +		if (pagetype != segment.type || i == 0) {
> +			if (likely(i > 0)) {
> +				nfsd4_encode_read_plus_segment(xdr, &segment,
> +							      &bufpos, segments);
> +				offset += segment.length;
> +			}
> +			segment.type = pagetype;
> +			segment.offset = offset;
> +			segment.length = pagelen;
> +			segment.page_pos = i * PAGE_SIZE;
> +		} else
> +			segment.length += pagelen;
> +
> +		maxcount -= pagelen;
> +		i++;
> +	}
> +
> +	nfsd4_encode_read_plus_segment(xdr, &segment, &bufpos, segments);
> +	xdr_truncate_encode(xdr, bufpos);
> 	return nfs_ok;
> }
> 
> @@ -4813,69 +4853,31 @@ static __be32
> nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> 		       struct nfsd4_read *read)
> {
> -	unsigned long maxcount, count;
> 	struct xdr_stream *xdr = resp->xdr;
> -	struct file *file;
> 	int starting_len = xdr->buf->len;
> -	int last_segment = xdr->buf->len;
> -	int segments = 0;
> -	__be32 *p, tmp;
> -	bool is_data;
> -	loff_t pos;
> +	unsigned int segments = 0;
> 	u32 eof;
> 
> 	if (nfserr)
> 		return nfserr;
> -	file = read->rd_nf->nf_file;
> 
> 	/* eof flag, segment count */
> -	p = xdr_reserve_space(xdr, 4 + 4);
> -	if (!p)
> +	if (!xdr_reserve_space(xdr, 4 + 4))
> 		return nfserr_resource;
> 	xdr_commit_encode(xdr);
> 
> -	maxcount = min_t(unsigned long, read->rd_length,
> -			 (xdr->buf->buflen - xdr->buf->len));
> -	count    = maxcount;
> -
> -	eof = read->rd_offset >= i_size_read(file_inode(file));
> +	eof = read->rd_offset >= i_size_read(file_inode(read->rd_nf->nf_file));
> 	if (eof)
> 		goto out;
> 
> -	pos = vfs_llseek(file, read->rd_offset, SEEK_HOLE);
> -	is_data = pos > read->rd_offset;
> -
> -	while (count > 0 && !eof) {
> -		maxcount = count;
> -		if (is_data)
> -			nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof,
> -						segments == 0 ? &pos : NULL);
> -		else
> -			nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof);
> -		if (nfserr)
> -			goto out;
> -		count -= maxcount;
> -		read->rd_offset += maxcount;
> -		is_data = !is_data;
> -		last_segment = xdr->buf->len;
> -		segments++;
> -	}
> -
> +	nfserr = nfsd4_encode_read_plus_segments(resp, read, &segments, &eof);
> out:
> -	if (nfserr && segments == 0)
> +	if (nfserr)
> 		xdr_truncate_encode(xdr, starting_len);
> 	else {
> -		if (nfserr) {
> -			xdr_truncate_encode(xdr, last_segment);
> -			nfserr = nfs_ok;
> -			eof = 0;
> -		}
> -		tmp = htonl(eof);
> -		write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
> -		tmp = htonl(segments);
> -		write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
> +		xdr_encode_word(xdr->buf, starting_len,     eof);
> +		xdr_encode_word(xdr->buf, starting_len + 4, segments);
> 	}
> -
> 	return nfserr;
> }
> 
> -- 
> 2.36.1
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 6/6] NFSD: Repeal and replace the READ_PLUS implementation
  2022-05-16 21:37   ` Chuck Lever III
@ 2022-05-17 13:39     ` Anna Schumaker
  2022-05-17 13:53       ` Chuck Lever III
  0 siblings, 1 reply; 11+ messages in thread
From: Anna Schumaker @ 2022-05-17 13:39 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Anna Schumaker, Steve Dickson, Linux NFS Mailing List

Hi Chuck,

On Mon, May 16, 2022 at 11:16 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On May 16, 2022, at 4:35 PM, Anna.Schumaker@Netapp.com wrote:
> >
> > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> >
> > Rather than relying on the underlying filesystem to tell us where hole
> > and data segments are through vfs_llseek(), let's instead do the hole
> > compression ourselves. This has a few advantages over the old
> > implementation:
> >
> > 1) A single call to the underlying filesystem through nfsd_readv() means
> >   the file can't change from underneath us in the middle of encoding.
> > 2) A single call to the underlying filestem also means that the
> >   underlying filesystem only needs to synchronize cached and on-disk
> >   data one time instead of potentially many speeding up the reply.
> > 3) Hole support for filesystems that don't support SEEK_HOLE and SEEK_DATA
> >
> > I also included an optimization where we can cut down on the amount of
> > memory being shifed around by doing the compression as (hole, data)
> > pairs.
> >
> > This patch not only fixes xfstests generic/091 and generic/263
> > for me but the "-g quick" group tests also finish about a minute faster.
> >
> > Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
>
> Hi Anna, some general comments on the NFSD pieces:

Thanks for looking over this!

>
> - Doesn't READ have the same issue that the file bytes can't change
>   until the Reply is fully sent? I would think the payload data
>   can't change until there is no possibility of a transport-layer
>   retransmit. Also, special restrictions like this should be
>   documented via code comments, IMHO.

The problem with the current READ_PLUS code is that the first segment
in the reply could already be stale data by the time the last segment
has been encoded. So we're returning a view of the file from multiple
points in time. READ only does the single call into the vfs, so it
doesn't have the same race that's causing issues.

>
> - David Howells might be interested in this approach, as he had some
>   concerns about compressing files in place that would appear to
>   apply also to READ_PLUS. Please copy David on the next round of
>   these patches.

Will do!

>
> - Can you say why the READ_PLUS decoder and encoder operates on
>   struct xdr_buf instead of struct xdr_stream? I'd prefer xdr_stream
>   if you can. You could get rid of write_bytes_to_xdr_buf,
>   xdr_encode_word and xdr_encode_double and use the stream-based
>   helpers.

I think my reason was that write_bytes_to_xdr_buf() and
xdr_encode_word() already took the xdr_buf, so I was matching the
style. I can change everything over to xdr_stream easily enough, since
the xdr_bufs in question are always attached to the xdr_stream. Would
you prefer it if I rework the existing functions as part of this patch
series, or as a separate prerequisite series?

>
> - Instead of using naked integers, please use multiples of XDR_UNIT.

I hadn't encountered XDR_UNIT before, but I can do that.

>
>
> > ---
> > fs/nfsd/nfs4xdr.c | 202 +++++++++++++++++++++++-----------------------
> > 1 file changed, 102 insertions(+), 100 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > index da92e7d2ab6a..973b4a1e7990 100644
> > --- a/fs/nfsd/nfs4xdr.c
> > +++ b/fs/nfsd/nfs4xdr.c
> > @@ -4731,81 +4731,121 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
> >       return nfserr;
> > }
> >
> > +struct read_plus_segment {
> > +     enum data_content4 type;
> > +     unsigned long offset;
> > +     unsigned long length;
> > +     unsigned int page_pos;
> > +};
> > +
> > static __be32
> > -nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
> > -                         struct nfsd4_read *read,
> > -                         unsigned long *maxcount, u32 *eof,
> > -                         loff_t *pos)
> > +nfsd4_read_plus_readv(struct nfsd4_compoundres *resp, struct nfsd4_read *read,
> > +                   unsigned long *maxcount, u32 *eof)
> > {
> >       struct xdr_stream *xdr = resp->xdr;
> > -     struct file *file = read->rd_nf->nf_file;
> > -     int starting_len = xdr->buf->len;
> > -     loff_t hole_pos;
> > -     __be32 nfserr;
> > -     __be32 *p, tmp;
> > -     __be64 tmp64;
> > -
> > -     hole_pos = pos ? *pos : vfs_llseek(file, read->rd_offset, SEEK_HOLE);
> > -     if (hole_pos > read->rd_offset)
> > -             *maxcount = min_t(unsigned long, *maxcount, hole_pos - read->rd_offset);
> > -     *maxcount = min_t(unsigned long, *maxcount, (xdr->buf->buflen - xdr->buf->len));
> > -
> > -     /* Content type, offset, byte count */
> > -     p = xdr_reserve_space(xdr, 4 + 8 + 4);
> > -     if (!p)
> > -             return nfserr_resource;
> > +     unsigned int starting_len = xdr->buf->len;
> > +     __be32 nfserr, zero = xdr_zero;
> > +     int pad;
> >
> > +     /* xdr_reserve_space_vec() switches us to the xdr->pages */
> >       read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, *maxcount);
> >       if (read->rd_vlen < 0)
> >               return nfserr_resource;
> >
> > -     nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
> > -                         resp->rqstp->rq_vec, read->rd_vlen, maxcount, eof);
> > +     nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, read->rd_nf->nf_file,
> > +                         read->rd_offset, resp->rqstp->rq_vec, read->rd_vlen,
> > +                         maxcount, eof);
> >       if (nfserr)
> >               return nfserr;
> > -     xdr_truncate_encode(xdr, starting_len + 16 + xdr_align_size(*maxcount));
> > +     xdr_truncate_encode(xdr, starting_len + xdr_align_size(*maxcount));
> >
> > -     tmp = htonl(NFS4_CONTENT_DATA);
> > -     write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
> > -     tmp64 = cpu_to_be64(read->rd_offset);
> > -     write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
> > -     tmp = htonl(*maxcount);
> > -     write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
> > -
> > -     tmp = xdr_zero;
> > -     write_bytes_to_xdr_buf(xdr->buf, starting_len + 16 + *maxcount, &tmp,
> > -                            xdr_pad_size(*maxcount));
> > +     pad = (*maxcount&3) ? 4 - (*maxcount&3) : 0;
> > +     write_bytes_to_xdr_buf(xdr->buf, starting_len + *maxcount, &zero, pad);
> >       return nfs_ok;
> > }
> >
> > +static void
> > +nfsd4_encode_read_plus_segment(struct xdr_stream *xdr,
> > +                            struct read_plus_segment *segment,
> > +                            unsigned int *bufpos, unsigned int *segments)
> > +{
> > +     struct xdr_buf *buf = xdr->buf;
> > +
> > +     xdr_encode_word(buf, *bufpos, segment->type);
> > +     xdr_encode_double(buf, *bufpos + 4, segment->offset);
> > +
> > +     if (segment->type == NFS4_CONTENT_HOLE) {
> > +             xdr_encode_double(buf, *bufpos + 12, segment->length);
> > +             *bufpos += 4 + 8 + 8;
> > +     } else {
> > +             size_t align = xdr_align_size(segment->length);
> > +             xdr_encode_word(buf, *bufpos + 12, segment->length);
> > +             if (*segments == 0)
> > +                     xdr_buf_trim_head(buf, 4);
> > +
> > +             xdr_stream_move_segment(xdr,
> > +                             buf->head[0].iov_len + segment->page_pos,
> > +                             *bufpos + 16, align);
>
> The term "segment" is overloaded in general, and in particular
> here. xdr_stream_move_subsegment() might be a less confusing
> name for this helper.

Sure, no problem.

>
> However I don't immediately see a benefit from working with the
> xdr_buf here. Can't you do these operations entirely with the
> passed-in xdr_stream?

I assume so, since the xdr_buf is part of the xdr_stream

>
>
> > +             *bufpos += 4 + 8 + 4 + align;
> > +     }
> > +
> > +     *segments += 1;
> > +}
> > +
> > static __be32
> > -nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
> > -                         struct nfsd4_read *read,
> > -                         unsigned long *maxcount, u32 *eof)
> > +nfsd4_encode_read_plus_segments(struct nfsd4_compoundres *resp,
> > +                             struct nfsd4_read *read,
> > +                             unsigned int *segments, u32 *eof)
> > {
> > -     struct file *file = read->rd_nf->nf_file;
> > -     loff_t data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
> > -     loff_t f_size = i_size_read(file_inode(file));
> > -     unsigned long count;
> > -     __be32 *p;
> > +     enum data_content4 pagetype;
> > +     struct read_plus_segment segment;
> > +     struct xdr_stream *xdr = resp->xdr;
> > +     unsigned long offset = read->rd_offset;
> > +     unsigned int bufpos = xdr->buf->len;
> > +     unsigned long maxcount;
> > +     unsigned int pagelen, i = 0;
> > +     char *vpage, *p;
> > +     __be32 nfserr;
> >
> > -     if (data_pos == -ENXIO)
> > -             data_pos = f_size;
> > -     else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE))
> > -             return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size);
> > -     count = data_pos - read->rd_offset;
> > -
> > -     /* Content type, offset, byte count */
> > -     p = xdr_reserve_space(resp->xdr, 4 + 8 + 8);
> > -     if (!p)
> > +     /* enough space for a HOLE segment before we switch to the pages */
> > +     if (!xdr_reserve_space(xdr, 4 + 8 + 8))
> >               return nfserr_resource;
> > +     xdr_commit_encode(xdr);
> >
> > -     *p++ = htonl(NFS4_CONTENT_HOLE);
> > -     p = xdr_encode_hyper(p, read->rd_offset);
> > -     p = xdr_encode_hyper(p, count);
> > +     maxcount = min_t(unsigned long, read->rd_length,
> > +                      (xdr->buf->buflen - xdr->buf->len));
> >
> > -     *eof = (read->rd_offset + count) >= f_size;
> > -     *maxcount = min_t(unsigned long, count, *maxcount);
> > +     nfserr = nfsd4_read_plus_readv(resp, read, &maxcount, eof);
> > +     if (nfserr)
> > +             return nfserr;
> > +
> > +     while (maxcount > 0) {
> > +             vpage = xdr_buf_nth_page_address(xdr->buf, i, &pagelen);
> > +             pagelen = min_t(unsigned int, pagelen, maxcount);
> > +             if (!vpage || pagelen == 0)
> > +                     break;
> > +             p = memchr_inv(vpage, 0, pagelen);
> > +             pagetype = (p == NULL) ? NFS4_CONTENT_HOLE : NFS4_CONTENT_DATA;
> > +
> > +             if (pagetype != segment.type || i == 0) {
> > +                     if (likely(i > 0)) {
> > +                             nfsd4_encode_read_plus_segment(xdr, &segment,
> > +                                                           &bufpos, segments);
> > +                             offset += segment.length;
> > +                     }
> > +                     segment.type = pagetype;
> > +                     segment.offset = offset;
> > +                     segment.length = pagelen;
> > +                     segment.page_pos = i * PAGE_SIZE;
> > +             } else
> > +                     segment.length += pagelen;
> > +
> > +             maxcount -= pagelen;
> > +             i++;
> > +     }
> > +
> > +     nfsd4_encode_read_plus_segment(xdr, &segment, &bufpos, segments);
> > +     xdr_truncate_encode(xdr, bufpos);
> >       return nfs_ok;
> > }
> >
> > @@ -4813,69 +4853,31 @@ static __be32
> > nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> >                      struct nfsd4_read *read)
> > {
> > -     unsigned long maxcount, count;
> >       struct xdr_stream *xdr = resp->xdr;
> > -     struct file *file;
> >       int starting_len = xdr->buf->len;
> > -     int last_segment = xdr->buf->len;
> > -     int segments = 0;
> > -     __be32 *p, tmp;
> > -     bool is_data;
> > -     loff_t pos;
> > +     unsigned int segments = 0;
> >       u32 eof;
> >
> >       if (nfserr)
> >               return nfserr;
> > -     file = read->rd_nf->nf_file;
> >
> >       /* eof flag, segment count */
> > -     p = xdr_reserve_space(xdr, 4 + 4);
> > -     if (!p)
> > +     if (!xdr_reserve_space(xdr, 4 + 4))
> >               return nfserr_resource;
> >       xdr_commit_encode(xdr);
> >
> > -     maxcount = min_t(unsigned long, read->rd_length,
> > -                      (xdr->buf->buflen - xdr->buf->len));
> > -     count    = maxcount;
> > -
> > -     eof = read->rd_offset >= i_size_read(file_inode(file));
> > +     eof = read->rd_offset >= i_size_read(file_inode(read->rd_nf->nf_file));
> >       if (eof)
> >               goto out;
> >
> > -     pos = vfs_llseek(file, read->rd_offset, SEEK_HOLE);
> > -     is_data = pos > read->rd_offset;
> > -
> > -     while (count > 0 && !eof) {
> > -             maxcount = count;
> > -             if (is_data)
> > -                     nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof,
> > -                                             segments == 0 ? &pos : NULL);
> > -             else
> > -                     nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof);
> > -             if (nfserr)
> > -                     goto out;
> > -             count -= maxcount;
> > -             read->rd_offset += maxcount;
> > -             is_data = !is_data;
> > -             last_segment = xdr->buf->len;
> > -             segments++;
> > -     }
> > -
> > +     nfserr = nfsd4_encode_read_plus_segments(resp, read, &segments, &eof);
> > out:
> > -     if (nfserr && segments == 0)
> > +     if (nfserr)
> >               xdr_truncate_encode(xdr, starting_len);
> >       else {
> > -             if (nfserr) {
> > -                     xdr_truncate_encode(xdr, last_segment);
> > -                     nfserr = nfs_ok;
> > -                     eof = 0;
> > -             }
> > -             tmp = htonl(eof);
> > -             write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
> > -             tmp = htonl(segments);
> > -             write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
> > +             xdr_encode_word(xdr->buf, starting_len,     eof);
> > +             xdr_encode_word(xdr->buf, starting_len + 4, segments);
> >       }
> > -
> >       return nfserr;
> > }
> >
> > --
> > 2.36.1
> >
>
> --
> Chuck Lever
>
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 6/6] NFSD: Repeal and replace the READ_PLUS implementation
  2022-05-17 13:39     ` Anna Schumaker
@ 2022-05-17 13:53       ` Chuck Lever III
  2022-05-17 14:55         ` Anna Schumaker
  0 siblings, 1 reply; 11+ messages in thread
From: Chuck Lever III @ 2022-05-17 13:53 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Anna Schumaker, Steve Dickson, Linux NFS Mailing List



> On May 17, 2022, at 9:39 AM, Anna Schumaker <schumaker.anna@gmail.com> wrote:
> 
> Hi Chuck,
> 
> On Mon, May 16, 2022 at 11:16 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>> 
>> 
>> 
>>> On May 16, 2022, at 4:35 PM, Anna.Schumaker@Netapp.com wrote:
>>> 
>>> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
>>> 
>>> Rather than relying on the underlying filesystem to tell us where hole
>>> and data segments are through vfs_llseek(), let's instead do the hole
>>> compression ourselves. This has a few advantages over the old
>>> implementation:
>>> 
>>> 1) A single call to the underlying filesystem through nfsd_readv() means
>>>  the file can't change from underneath us in the middle of encoding.
>>> 2) A single call to the underlying filestem also means that the
>>>  underlying filesystem only needs to synchronize cached and on-disk
>>>  data one time instead of potentially many speeding up the reply.
>>> 3) Hole support for filesystems that don't support SEEK_HOLE and SEEK_DATA
>>> 
>>> I also included an optimization where we can cut down on the amount of
>>> memory being shifed around by doing the compression as (hole, data)
>>> pairs.
>>> 
>>> This patch not only fixes xfstests generic/091 and generic/263
>>> for me but the "-g quick" group tests also finish about a minute faster.
>>> 
>>> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
>> 
>> Hi Anna, some general comments on the NFSD pieces:
> 
> Thanks for looking over this!
> 
>> 
>> - Doesn't READ have the same issue that the file bytes can't change
>>  until the Reply is fully sent? I would think the payload data
>>  can't change until there is no possibility of a transport-layer
>>  retransmit. Also, special restrictions like this should be
>>  documented via code comments, IMHO.
> 
> The problem with the current READ_PLUS code is that the first segment
> in the reply could already be stale data by the time the last segment
> has been encoded. So we're returning a view of the file from multiple
> points in time. READ only does the single call into the vfs, so it
> doesn't have the same race that's causing issues.

IIUI readv might pull the payload into a buffer, but spliced reads
don't. I realize you can't use splice here, but it seems like splice
would have to guarantee the payload data was immutable while the
Reply is transmitted... maybe that same mechanism could be used here.

Just spit-balling!


>> - David Howells might be interested in this approach, as he had some
>>  concerns about compressing files in place that would appear to
>>  apply also to READ_PLUS. Please copy David on the next round of
>>  these patches.
> 
> Will do!
> 
>> 
>> - Can you say why the READ_PLUS decoder and encoder operates on
>>  struct xdr_buf instead of struct xdr_stream? I'd prefer xdr_stream
>>  if you can. You could get rid of write_bytes_to_xdr_buf,
>>  xdr_encode_word and xdr_encode_double and use the stream-based
>>  helpers.
> 
> I think my reason was that write_bytes_to_xdr_buf() and
> xdr_encode_word() already took the xdr_buf, so I was matching the
> style. I can change everything over to xdr_stream easily enough, since
> the xdr_bufs in question are always attached to the xdr_stream. Would
> you prefer it if I rework the existing functions as part of this patch
> series, or as a separate prerequisite series?
> 
>> 
>> - Instead of using naked integers, please use multiples of XDR_UNIT.
> 
> I hadn't encountered XDR_UNIT before, but I can do that.
> 
>> 
>> 
>>> ---
>>> fs/nfsd/nfs4xdr.c | 202 +++++++++++++++++++++++-----------------------
>>> 1 file changed, 102 insertions(+), 100 deletions(-)
>>> 
>>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>>> index da92e7d2ab6a..973b4a1e7990 100644
>>> --- a/fs/nfsd/nfs4xdr.c
>>> +++ b/fs/nfsd/nfs4xdr.c
>>> @@ -4731,81 +4731,121 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
>>>      return nfserr;
>>> }
>>> 
>>> +struct read_plus_segment {
>>> +     enum data_content4 type;
>>> +     unsigned long offset;
>>> +     unsigned long length;
>>> +     unsigned int page_pos;
>>> +};
>>> +
>>> static __be32
>>> -nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
>>> -                         struct nfsd4_read *read,
>>> -                         unsigned long *maxcount, u32 *eof,
>>> -                         loff_t *pos)
>>> +nfsd4_read_plus_readv(struct nfsd4_compoundres *resp, struct nfsd4_read *read,
>>> +                   unsigned long *maxcount, u32 *eof)
>>> {
>>>      struct xdr_stream *xdr = resp->xdr;
>>> -     struct file *file = read->rd_nf->nf_file;
>>> -     int starting_len = xdr->buf->len;
>>> -     loff_t hole_pos;
>>> -     __be32 nfserr;
>>> -     __be32 *p, tmp;
>>> -     __be64 tmp64;
>>> -
>>> -     hole_pos = pos ? *pos : vfs_llseek(file, read->rd_offset, SEEK_HOLE);
>>> -     if (hole_pos > read->rd_offset)
>>> -             *maxcount = min_t(unsigned long, *maxcount, hole_pos - read->rd_offset);
>>> -     *maxcount = min_t(unsigned long, *maxcount, (xdr->buf->buflen - xdr->buf->len));
>>> -
>>> -     /* Content type, offset, byte count */
>>> -     p = xdr_reserve_space(xdr, 4 + 8 + 4);
>>> -     if (!p)
>>> -             return nfserr_resource;
>>> +     unsigned int starting_len = xdr->buf->len;
>>> +     __be32 nfserr, zero = xdr_zero;
>>> +     int pad;
>>> 
>>> +     /* xdr_reserve_space_vec() switches us to the xdr->pages */
>>>      read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, *maxcount);
>>>      if (read->rd_vlen < 0)
>>>              return nfserr_resource;
>>> 
>>> -     nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
>>> -                         resp->rqstp->rq_vec, read->rd_vlen, maxcount, eof);
>>> +     nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, read->rd_nf->nf_file,
>>> +                         read->rd_offset, resp->rqstp->rq_vec, read->rd_vlen,
>>> +                         maxcount, eof);
>>>      if (nfserr)
>>>              return nfserr;
>>> -     xdr_truncate_encode(xdr, starting_len + 16 + xdr_align_size(*maxcount));
>>> +     xdr_truncate_encode(xdr, starting_len + xdr_align_size(*maxcount));
>>> 
>>> -     tmp = htonl(NFS4_CONTENT_DATA);
>>> -     write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
>>> -     tmp64 = cpu_to_be64(read->rd_offset);
>>> -     write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
>>> -     tmp = htonl(*maxcount);
>>> -     write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
>>> -
>>> -     tmp = xdr_zero;
>>> -     write_bytes_to_xdr_buf(xdr->buf, starting_len + 16 + *maxcount, &tmp,
>>> -                            xdr_pad_size(*maxcount));
>>> +     pad = (*maxcount&3) ? 4 - (*maxcount&3) : 0;
>>> +     write_bytes_to_xdr_buf(xdr->buf, starting_len + *maxcount, &zero, pad);
>>>      return nfs_ok;
>>> }
>>> 
>>> +static void
>>> +nfsd4_encode_read_plus_segment(struct xdr_stream *xdr,
>>> +                            struct read_plus_segment *segment,
>>> +                            unsigned int *bufpos, unsigned int *segments)
>>> +{
>>> +     struct xdr_buf *buf = xdr->buf;
>>> +
>>> +     xdr_encode_word(buf, *bufpos, segment->type);
>>> +     xdr_encode_double(buf, *bufpos + 4, segment->offset);
>>> +
>>> +     if (segment->type == NFS4_CONTENT_HOLE) {
>>> +             xdr_encode_double(buf, *bufpos + 12, segment->length);
>>> +             *bufpos += 4 + 8 + 8;
>>> +     } else {
>>> +             size_t align = xdr_align_size(segment->length);
>>> +             xdr_encode_word(buf, *bufpos + 12, segment->length);
>>> +             if (*segments == 0)
>>> +                     xdr_buf_trim_head(buf, 4);
>>> +
>>> +             xdr_stream_move_segment(xdr,
>>> +                             buf->head[0].iov_len + segment->page_pos,
>>> +                             *bufpos + 16, align);
>> 
>> The term "segment" is overloaded in general, and in particular
>> here. xdr_stream_move_subsegment() might be a less confusing
>> name for this helper.
> 
> Sure, no problem.
> 
>> 
>> However I don't immediately see a benefit from working with the
>> xdr_buf here. Can't you do these operations entirely with the
>> passed-in xdr_stream?
> 
> I assume so, since the xdr_buf is part of the xdr_stream
> 
>> 
>> 
>>> +             *bufpos += 4 + 8 + 4 + align;
>>> +     }
>>> +
>>> +     *segments += 1;
>>> +}
>>> +
>>> static __be32
>>> -nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
>>> -                         struct nfsd4_read *read,
>>> -                         unsigned long *maxcount, u32 *eof)
>>> +nfsd4_encode_read_plus_segments(struct nfsd4_compoundres *resp,
>>> +                             struct nfsd4_read *read,
>>> +                             unsigned int *segments, u32 *eof)
>>> {
>>> -     struct file *file = read->rd_nf->nf_file;
>>> -     loff_t data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
>>> -     loff_t f_size = i_size_read(file_inode(file));
>>> -     unsigned long count;
>>> -     __be32 *p;
>>> +     enum data_content4 pagetype;
>>> +     struct read_plus_segment segment;
>>> +     struct xdr_stream *xdr = resp->xdr;
>>> +     unsigned long offset = read->rd_offset;
>>> +     unsigned int bufpos = xdr->buf->len;
>>> +     unsigned long maxcount;
>>> +     unsigned int pagelen, i = 0;
>>> +     char *vpage, *p;
>>> +     __be32 nfserr;
>>> 
>>> -     if (data_pos == -ENXIO)
>>> -             data_pos = f_size;
>>> -     else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE))
>>> -             return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size);
>>> -     count = data_pos - read->rd_offset;
>>> -
>>> -     /* Content type, offset, byte count */
>>> -     p = xdr_reserve_space(resp->xdr, 4 + 8 + 8);
>>> -     if (!p)
>>> +     /* enough space for a HOLE segment before we switch to the pages */
>>> +     if (!xdr_reserve_space(xdr, 4 + 8 + 8))
>>>              return nfserr_resource;
>>> +     xdr_commit_encode(xdr);
>>> 
>>> -     *p++ = htonl(NFS4_CONTENT_HOLE);
>>> -     p = xdr_encode_hyper(p, read->rd_offset);
>>> -     p = xdr_encode_hyper(p, count);
>>> +     maxcount = min_t(unsigned long, read->rd_length,
>>> +                      (xdr->buf->buflen - xdr->buf->len));
>>> 
>>> -     *eof = (read->rd_offset + count) >= f_size;
>>> -     *maxcount = min_t(unsigned long, count, *maxcount);
>>> +     nfserr = nfsd4_read_plus_readv(resp, read, &maxcount, eof);
>>> +     if (nfserr)
>>> +             return nfserr;
>>> +
>>> +     while (maxcount > 0) {
>>> +             vpage = xdr_buf_nth_page_address(xdr->buf, i, &pagelen);
>>> +             pagelen = min_t(unsigned int, pagelen, maxcount);
>>> +             if (!vpage || pagelen == 0)
>>> +                     break;
>>> +             p = memchr_inv(vpage, 0, pagelen);
>>> +             pagetype = (p == NULL) ? NFS4_CONTENT_HOLE : NFS4_CONTENT_DATA;
>>> +
>>> +             if (pagetype != segment.type || i == 0) {
>>> +                     if (likely(i > 0)) {
>>> +                             nfsd4_encode_read_plus_segment(xdr, &segment,
>>> +                                                           &bufpos, segments);
>>> +                             offset += segment.length;
>>> +                     }
>>> +                     segment.type = pagetype;
>>> +                     segment.offset = offset;
>>> +                     segment.length = pagelen;
>>> +                     segment.page_pos = i * PAGE_SIZE;
>>> +             } else
>>> +                     segment.length += pagelen;
>>> +
>>> +             maxcount -= pagelen;
>>> +             i++;
>>> +     }
>>> +
>>> +     nfsd4_encode_read_plus_segment(xdr, &segment, &bufpos, segments);
>>> +     xdr_truncate_encode(xdr, bufpos);
>>>      return nfs_ok;
>>> }
>>> 
>>> @@ -4813,69 +4853,31 @@ static __be32
>>> nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
>>>                     struct nfsd4_read *read)
>>> {
>>> -     unsigned long maxcount, count;
>>>      struct xdr_stream *xdr = resp->xdr;
>>> -     struct file *file;
>>>      int starting_len = xdr->buf->len;
>>> -     int last_segment = xdr->buf->len;
>>> -     int segments = 0;
>>> -     __be32 *p, tmp;
>>> -     bool is_data;
>>> -     loff_t pos;
>>> +     unsigned int segments = 0;
>>>      u32 eof;
>>> 
>>>      if (nfserr)
>>>              return nfserr;
>>> -     file = read->rd_nf->nf_file;
>>> 
>>>      /* eof flag, segment count */
>>> -     p = xdr_reserve_space(xdr, 4 + 4);
>>> -     if (!p)
>>> +     if (!xdr_reserve_space(xdr, 4 + 4))
>>>              return nfserr_resource;
>>>      xdr_commit_encode(xdr);
>>> 
>>> -     maxcount = min_t(unsigned long, read->rd_length,
>>> -                      (xdr->buf->buflen - xdr->buf->len));
>>> -     count    = maxcount;
>>> -
>>> -     eof = read->rd_offset >= i_size_read(file_inode(file));
>>> +     eof = read->rd_offset >= i_size_read(file_inode(read->rd_nf->nf_file));
>>>      if (eof)
>>>              goto out;
>>> 
>>> -     pos = vfs_llseek(file, read->rd_offset, SEEK_HOLE);
>>> -     is_data = pos > read->rd_offset;
>>> -
>>> -     while (count > 0 && !eof) {
>>> -             maxcount = count;
>>> -             if (is_data)
>>> -                     nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof,
>>> -                                             segments == 0 ? &pos : NULL);
>>> -             else
>>> -                     nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof);
>>> -             if (nfserr)
>>> -                     goto out;
>>> -             count -= maxcount;
>>> -             read->rd_offset += maxcount;
>>> -             is_data = !is_data;
>>> -             last_segment = xdr->buf->len;
>>> -             segments++;
>>> -     }
>>> -
>>> +     nfserr = nfsd4_encode_read_plus_segments(resp, read, &segments, &eof);
>>> out:
>>> -     if (nfserr && segments == 0)
>>> +     if (nfserr)
>>>              xdr_truncate_encode(xdr, starting_len);
>>>      else {
>>> -             if (nfserr) {
>>> -                     xdr_truncate_encode(xdr, last_segment);
>>> -                     nfserr = nfs_ok;
>>> -                     eof = 0;
>>> -             }
>>> -             tmp = htonl(eof);
>>> -             write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
>>> -             tmp = htonl(segments);
>>> -             write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
>>> +             xdr_encode_word(xdr->buf, starting_len,     eof);
>>> +             xdr_encode_word(xdr->buf, starting_len + 4, segments);
>>>      }
>>> -
>>>      return nfserr;
>>> }
>>> 
>>> --
>>> 2.36.1
>>> 
>> 
>> --
>> Chuck Lever

--
Chuck Lever




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 6/6] NFSD: Repeal and replace the READ_PLUS implementation
  2022-05-17 13:53       ` Chuck Lever III
@ 2022-05-17 14:55         ` Anna Schumaker
  0 siblings, 0 replies; 11+ messages in thread
From: Anna Schumaker @ 2022-05-17 14:55 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Anna Schumaker, Steve Dickson, Linux NFS Mailing List

On Tue, May 17, 2022 at 9:53 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On May 17, 2022, at 9:39 AM, Anna Schumaker <schumaker.anna@gmail.com> wrote:
> >
> > Hi Chuck,
> >
> > On Mon, May 16, 2022 at 11:16 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>
> >>
> >>
> >>> On May 16, 2022, at 4:35 PM, Anna.Schumaker@Netapp.com wrote:
> >>>
> >>> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> >>>
> >>> Rather than relying on the underlying filesystem to tell us where hole
> >>> and data segments are through vfs_llseek(), let's instead do the hole
> >>> compression ourselves. This has a few advantages over the old
> >>> implementation:
> >>>
> >>> 1) A single call to the underlying filesystem through nfsd_readv() means
> >>>  the file can't change from underneath us in the middle of encoding.
> >>> 2) A single call to the underlying filestem also means that the
> >>>  underlying filesystem only needs to synchronize cached and on-disk
> >>>  data one time instead of potentially many speeding up the reply.
> >>> 3) Hole support for filesystems that don't support SEEK_HOLE and SEEK_DATA
> >>>
> >>> I also included an optimization where we can cut down on the amount of
> >>> memory being shifed around by doing the compression as (hole, data)
> >>> pairs.
> >>>
> >>> This patch not only fixes xfstests generic/091 and generic/263
> >>> for me but the "-g quick" group tests also finish about a minute faster.
> >>>
> >>> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> >>
> >> Hi Anna, some general comments on the NFSD pieces:
> >
> > Thanks for looking over this!
> >
> >>
> >> - Doesn't READ have the same issue that the file bytes can't change
> >>  until the Reply is fully sent? I would think the payload data
> >>  can't change until there is no possibility of a transport-layer
> >>  retransmit. Also, special restrictions like this should be
> >>  documented via code comments, IMHO.
> >
> > The problem with the current READ_PLUS code is that the first segment
> > in the reply could already be stale data by the time the last segment
> > has been encoded. So we're returning a view of the file from multiple
> > points in time. READ only does the single call into the vfs, so it
> > doesn't have the same race that's causing issues.
>
> IIUI readv might pull the payload into a buffer, but spliced reads
> don't. I realize you can't use splice here, but it seems like splice
> would have to guarantee the payload data was immutable while the
> Reply is transmitted... maybe that same mechanism could be used here.

I'm not seeing any special handling in the splice read path for
locking pages. There is a get_page() in svc_rqst_replace_page() to
make sure the pages don't disappear from under us, but I'm not seeing
anything to lock the content. But I'm not sure how much of an issue
that is, since the returned bytes are basically one big data segment.
I don't see any problems with READ_PLUS either when it's hacked to
only return a single data segment.

It's when we start mixing hole and data that problems start showing up
when using the current style of mixed calls to vfs_llseek() and
vfs_iter_read(), since each of these functions try to lock the file
(so we can't lock it ourselves for the duration of the encode). Some
filesystems (*cough* btrfs *cough) also seem to do a lot of
synchronization between page cache and on disk data when replying to
vfs_llseek() which is especially painful on virtual machines (which is
how reading file data that is already in the server's page cache ends
up taking almost 2000 seconds. See here:
http://wiki.linux-nfs.org/wiki/index.php/Read_Plus_May_2022#BTRFS_3).
So scanning the returned buffer turns out to be better for
performance, and fixes the broken tests along the way.

Anna

>
> Just spit-balling!
>
>
> >> - David Howells might be interested in this approach, as he had some
> >>  concerns about compressing files in place that would appear to
> >>  apply also to READ_PLUS. Please copy David on the next round of
> >>  these patches.
> >
> > Will do!
> >
> >>
> >> - Can you say why the READ_PLUS decoder and encoder operates on
> >>  struct xdr_buf instead of struct xdr_stream? I'd prefer xdr_stream
> >>  if you can. You could get rid of write_bytes_to_xdr_buf,
> >>  xdr_encode_word and xdr_encode_double and use the stream-based
> >>  helpers.
> >
> > I think my reason was that write_bytes_to_xdr_buf() and
> > xdr_encode_word() already took the xdr_buf, so I was matching the
> > style. I can change everything over to xdr_stream easily enough, since
> > the xdr_bufs in question are always attached to the xdr_stream. Would
> > you prefer it if I rework the existing functions as part of this patch
> > series, or as a separate prerequisite series?
> >
> >>
> >> - Instead of using naked integers, please use multiples of XDR_UNIT.
> >
> > I hadn't encountered XDR_UNIT before, but I can do that.
> >
> >>
> >>
> >>> ---
> >>> fs/nfsd/nfs4xdr.c | 202 +++++++++++++++++++++++-----------------------
> >>> 1 file changed, 102 insertions(+), 100 deletions(-)
> >>>
> >>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> >>> index da92e7d2ab6a..973b4a1e7990 100644
> >>> --- a/fs/nfsd/nfs4xdr.c
> >>> +++ b/fs/nfsd/nfs4xdr.c
> >>> @@ -4731,81 +4731,121 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
> >>>      return nfserr;
> >>> }
> >>>
> >>> +struct read_plus_segment {
> >>> +     enum data_content4 type;
> >>> +     unsigned long offset;
> >>> +     unsigned long length;
> >>> +     unsigned int page_pos;
> >>> +};
> >>> +
> >>> static __be32
> >>> -nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
> >>> -                         struct nfsd4_read *read,
> >>> -                         unsigned long *maxcount, u32 *eof,
> >>> -                         loff_t *pos)
> >>> +nfsd4_read_plus_readv(struct nfsd4_compoundres *resp, struct nfsd4_read *read,
> >>> +                   unsigned long *maxcount, u32 *eof)
> >>> {
> >>>      struct xdr_stream *xdr = resp->xdr;
> >>> -     struct file *file = read->rd_nf->nf_file;
> >>> -     int starting_len = xdr->buf->len;
> >>> -     loff_t hole_pos;
> >>> -     __be32 nfserr;
> >>> -     __be32 *p, tmp;
> >>> -     __be64 tmp64;
> >>> -
> >>> -     hole_pos = pos ? *pos : vfs_llseek(file, read->rd_offset, SEEK_HOLE);
> >>> -     if (hole_pos > read->rd_offset)
> >>> -             *maxcount = min_t(unsigned long, *maxcount, hole_pos - read->rd_offset);
> >>> -     *maxcount = min_t(unsigned long, *maxcount, (xdr->buf->buflen - xdr->buf->len));
> >>> -
> >>> -     /* Content type, offset, byte count */
> >>> -     p = xdr_reserve_space(xdr, 4 + 8 + 4);
> >>> -     if (!p)
> >>> -             return nfserr_resource;
> >>> +     unsigned int starting_len = xdr->buf->len;
> >>> +     __be32 nfserr, zero = xdr_zero;
> >>> +     int pad;
> >>>
> >>> +     /* xdr_reserve_space_vec() switches us to the xdr->pages */
> >>>      read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, *maxcount);
> >>>      if (read->rd_vlen < 0)
> >>>              return nfserr_resource;
> >>>
> >>> -     nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
> >>> -                         resp->rqstp->rq_vec, read->rd_vlen, maxcount, eof);
> >>> +     nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, read->rd_nf->nf_file,
> >>> +                         read->rd_offset, resp->rqstp->rq_vec, read->rd_vlen,
> >>> +                         maxcount, eof);
> >>>      if (nfserr)
> >>>              return nfserr;
> >>> -     xdr_truncate_encode(xdr, starting_len + 16 + xdr_align_size(*maxcount));
> >>> +     xdr_truncate_encode(xdr, starting_len + xdr_align_size(*maxcount));
> >>>
> >>> -     tmp = htonl(NFS4_CONTENT_DATA);
> >>> -     write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
> >>> -     tmp64 = cpu_to_be64(read->rd_offset);
> >>> -     write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
> >>> -     tmp = htonl(*maxcount);
> >>> -     write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
> >>> -
> >>> -     tmp = xdr_zero;
> >>> -     write_bytes_to_xdr_buf(xdr->buf, starting_len + 16 + *maxcount, &tmp,
> >>> -                            xdr_pad_size(*maxcount));
> >>> +     pad = (*maxcount&3) ? 4 - (*maxcount&3) : 0;
> >>> +     write_bytes_to_xdr_buf(xdr->buf, starting_len + *maxcount, &zero, pad);
> >>>      return nfs_ok;
> >>> }
> >>>
> >>> +static void
> >>> +nfsd4_encode_read_plus_segment(struct xdr_stream *xdr,
> >>> +                            struct read_plus_segment *segment,
> >>> +                            unsigned int *bufpos, unsigned int *segments)
> >>> +{
> >>> +     struct xdr_buf *buf = xdr->buf;
> >>> +
> >>> +     xdr_encode_word(buf, *bufpos, segment->type);
> >>> +     xdr_encode_double(buf, *bufpos + 4, segment->offset);
> >>> +
> >>> +     if (segment->type == NFS4_CONTENT_HOLE) {
> >>> +             xdr_encode_double(buf, *bufpos + 12, segment->length);
> >>> +             *bufpos += 4 + 8 + 8;
> >>> +     } else {
> >>> +             size_t align = xdr_align_size(segment->length);
> >>> +             xdr_encode_word(buf, *bufpos + 12, segment->length);
> >>> +             if (*segments == 0)
> >>> +                     xdr_buf_trim_head(buf, 4);
> >>> +
> >>> +             xdr_stream_move_segment(xdr,
> >>> +                             buf->head[0].iov_len + segment->page_pos,
> >>> +                             *bufpos + 16, align);
> >>
> >> The term "segment" is overloaded in general, and in particular
> >> here. xdr_stream_move_subsegment() might be a less confusing
> >> name for this helper.
> >
> > Sure, no problem.
> >
> >>
> >> However I don't immediately see a benefit from working with the
> >> xdr_buf here. Can't you do these operations entirely with the
> >> passed-in xdr_stream?
> >
> > I assume so, since the xdr_buf is part of the xdr_stream
> >
> >>
> >>
> >>> +             *bufpos += 4 + 8 + 4 + align;
> >>> +     }
> >>> +
> >>> +     *segments += 1;
> >>> +}
> >>> +
> >>> static __be32
> >>> -nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
> >>> -                         struct nfsd4_read *read,
> >>> -                         unsigned long *maxcount, u32 *eof)
> >>> +nfsd4_encode_read_plus_segments(struct nfsd4_compoundres *resp,
> >>> +                             struct nfsd4_read *read,
> >>> +                             unsigned int *segments, u32 *eof)
> >>> {
> >>> -     struct file *file = read->rd_nf->nf_file;
> >>> -     loff_t data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
> >>> -     loff_t f_size = i_size_read(file_inode(file));
> >>> -     unsigned long count;
> >>> -     __be32 *p;
> >>> +     enum data_content4 pagetype;
> >>> +     struct read_plus_segment segment;
> >>> +     struct xdr_stream *xdr = resp->xdr;
> >>> +     unsigned long offset = read->rd_offset;
> >>> +     unsigned int bufpos = xdr->buf->len;
> >>> +     unsigned long maxcount;
> >>> +     unsigned int pagelen, i = 0;
> >>> +     char *vpage, *p;
> >>> +     __be32 nfserr;
> >>>
> >>> -     if (data_pos == -ENXIO)
> >>> -             data_pos = f_size;
> >>> -     else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE))
> >>> -             return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size);
> >>> -     count = data_pos - read->rd_offset;
> >>> -
> >>> -     /* Content type, offset, byte count */
> >>> -     p = xdr_reserve_space(resp->xdr, 4 + 8 + 8);
> >>> -     if (!p)
> >>> +     /* enough space for a HOLE segment before we switch to the pages */
> >>> +     if (!xdr_reserve_space(xdr, 4 + 8 + 8))
> >>>              return nfserr_resource;
> >>> +     xdr_commit_encode(xdr);
> >>>
> >>> -     *p++ = htonl(NFS4_CONTENT_HOLE);
> >>> -     p = xdr_encode_hyper(p, read->rd_offset);
> >>> -     p = xdr_encode_hyper(p, count);
> >>> +     maxcount = min_t(unsigned long, read->rd_length,
> >>> +                      (xdr->buf->buflen - xdr->buf->len));
> >>>
> >>> -     *eof = (read->rd_offset + count) >= f_size;
> >>> -     *maxcount = min_t(unsigned long, count, *maxcount);
> >>> +     nfserr = nfsd4_read_plus_readv(resp, read, &maxcount, eof);
> >>> +     if (nfserr)
> >>> +             return nfserr;
> >>> +
> >>> +     while (maxcount > 0) {
> >>> +             vpage = xdr_buf_nth_page_address(xdr->buf, i, &pagelen);
> >>> +             pagelen = min_t(unsigned int, pagelen, maxcount);
> >>> +             if (!vpage || pagelen == 0)
> >>> +                     break;
> >>> +             p = memchr_inv(vpage, 0, pagelen);
> >>> +             pagetype = (p == NULL) ? NFS4_CONTENT_HOLE : NFS4_CONTENT_DATA;
> >>> +
> >>> +             if (pagetype != segment.type || i == 0) {
> >>> +                     if (likely(i > 0)) {
> >>> +                             nfsd4_encode_read_plus_segment(xdr, &segment,
> >>> +                                                           &bufpos, segments);
> >>> +                             offset += segment.length;
> >>> +                     }
> >>> +                     segment.type = pagetype;
> >>> +                     segment.offset = offset;
> >>> +                     segment.length = pagelen;
> >>> +                     segment.page_pos = i * PAGE_SIZE;
> >>> +             } else
> >>> +                     segment.length += pagelen;
> >>> +
> >>> +             maxcount -= pagelen;
> >>> +             i++;
> >>> +     }
> >>> +
> >>> +     nfsd4_encode_read_plus_segment(xdr, &segment, &bufpos, segments);
> >>> +     xdr_truncate_encode(xdr, bufpos);
> >>>      return nfs_ok;
> >>> }
> >>>
> >>> @@ -4813,69 +4853,31 @@ static __be32
> >>> nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> >>>                     struct nfsd4_read *read)
> >>> {
> >>> -     unsigned long maxcount, count;
> >>>      struct xdr_stream *xdr = resp->xdr;
> >>> -     struct file *file;
> >>>      int starting_len = xdr->buf->len;
> >>> -     int last_segment = xdr->buf->len;
> >>> -     int segments = 0;
> >>> -     __be32 *p, tmp;
> >>> -     bool is_data;
> >>> -     loff_t pos;
> >>> +     unsigned int segments = 0;
> >>>      u32 eof;
> >>>
> >>>      if (nfserr)
> >>>              return nfserr;
> >>> -     file = read->rd_nf->nf_file;
> >>>
> >>>      /* eof flag, segment count */
> >>> -     p = xdr_reserve_space(xdr, 4 + 4);
> >>> -     if (!p)
> >>> +     if (!xdr_reserve_space(xdr, 4 + 4))
> >>>              return nfserr_resource;
> >>>      xdr_commit_encode(xdr);
> >>>
> >>> -     maxcount = min_t(unsigned long, read->rd_length,
> >>> -                      (xdr->buf->buflen - xdr->buf->len));
> >>> -     count    = maxcount;
> >>> -
> >>> -     eof = read->rd_offset >= i_size_read(file_inode(file));
> >>> +     eof = read->rd_offset >= i_size_read(file_inode(read->rd_nf->nf_file));
> >>>      if (eof)
> >>>              goto out;
> >>>
> >>> -     pos = vfs_llseek(file, read->rd_offset, SEEK_HOLE);
> >>> -     is_data = pos > read->rd_offset;
> >>> -
> >>> -     while (count > 0 && !eof) {
> >>> -             maxcount = count;
> >>> -             if (is_data)
> >>> -                     nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof,
> >>> -                                             segments == 0 ? &pos : NULL);
> >>> -             else
> >>> -                     nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof);
> >>> -             if (nfserr)
> >>> -                     goto out;
> >>> -             count -= maxcount;
> >>> -             read->rd_offset += maxcount;
> >>> -             is_data = !is_data;
> >>> -             last_segment = xdr->buf->len;
> >>> -             segments++;
> >>> -     }
> >>> -
> >>> +     nfserr = nfsd4_encode_read_plus_segments(resp, read, &segments, &eof);
> >>> out:
> >>> -     if (nfserr && segments == 0)
> >>> +     if (nfserr)
> >>>              xdr_truncate_encode(xdr, starting_len);
> >>>      else {
> >>> -             if (nfserr) {
> >>> -                     xdr_truncate_encode(xdr, last_segment);
> >>> -                     nfserr = nfs_ok;
> >>> -                     eof = 0;
> >>> -             }
> >>> -             tmp = htonl(eof);
> >>> -             write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
> >>> -             tmp = htonl(segments);
> >>> -             write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
> >>> +             xdr_encode_word(xdr->buf, starting_len,     eof);
> >>> +             xdr_encode_word(xdr->buf, starting_len + 4, segments);
> >>>      }
> >>> -
> >>>      return nfserr;
> >>> }
> >>>
> >>> --
> >>> 2.36.1
> >>>
> >>
> >> --
> >> Chuck Lever
>
> --
> Chuck Lever
>
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-05-17 14:56 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-16 20:35 [PATCH v1 0/6] NFSD: Improvements for the NFSv4.2 READ_PLUS operation Anna.Schumaker
2022-05-16 20:35 ` [PATCH v1 1/6] SUNRPC: Introduce xdr_stream_move_segment() Anna.Schumaker
2022-05-16 20:35 ` [PATCH v1 2/6] SUNRPC: Introduce xdr_encode_double() Anna.Schumaker
2022-05-16 20:35 ` [PATCH v1 3/6] SUNRPC: Introduce xdr_buf_trim_head() Anna.Schumaker
2022-05-16 20:35 ` [PATCH v1 4/6] SUNRPC: Introduce xdr_buf_nth_page_address() Anna.Schumaker
2022-05-16 20:35 ` [PATCH v1 5/6] SUNRPC: Export xdr_buf_pagecount() Anna.Schumaker
2022-05-16 20:35 ` [PATCH v1 6/6] NFSD: Repeal and replace the READ_PLUS implementation Anna.Schumaker
2022-05-16 21:37   ` Chuck Lever III
2022-05-17 13:39     ` Anna Schumaker
2022-05-17 13:53       ` Chuck Lever III
2022-05-17 14:55         ` Anna Schumaker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.