Linux-NFS Archive on lore.kernel.org
* [PATCH v2 0/6] NFS: Add support for the v4.2 READ_PLUS operation
@ 2020-02-14 21:12 schumaker.anna
  2020-02-14 21:12 ` [PATCH v2 1/6] SUNRPC: Split out a function for setting current page schumaker.anna
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: schumaker.anna @ 2020-02-14 21:12 UTC (permalink / raw)
  To: Trond.Myklebust, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

These patches add client support for the READ_PLUS operation, which
lets the server break a read reply into several "data" and "hole"
segments. A "noreadplus" mount option for disabling the new operation,
similar to the existing "nordirplus" option, was part of v1 but has
been dropped from this version (see the changelog below).

Here are the results of some performance tests I ran on Netapp lab
machines. I tested by reading various 2G files from a few different
underlying filesystems and across several NFS versions. I used the
`vmtouch` utility to make sure files were only cached when we wanted
them to be. In addition to 100% data and 100% hole cases, I also tested
with files that alternate between data and hole segments. These files
have either 4K, 8K, 16K, or 32K segment sizes and start with either data
or hole segments. So the file mixed-4d has a 4K segment size beginning
with a data segment, while mixed-32h has 32K segments beginning with a
hole. The units are in seconds; for each NFS version, the first number
is the uncached read time and the second is the read time once the
file is already cached on the server.
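The mixed test files described above can be produced with a short sparse-file writer. The following is a sketch of the idea only (the actual tooling used for these tests is not shown in this posting), and it assumes the underlying filesystem supports sparse files; `make_mixed_file()` is a hypothetical helper name:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a "mixed-4d"-style file: alternating data and hole segments of
 * segsz bytes, starting with data.  Holes are created by seeking past
 * them rather than writing zeroes, so the filesystem can leave them
 * unallocated (if it supports sparse files).
 */
static int make_mixed_file(const char *path, size_t segsz, int nsegs)
{
	char *buf = malloc(segsz);
	int fd, i;

	if (!buf)
		return -1;
	memset(buf, 0xab, segsz);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		free(buf);
		return -1;
	}

	for (i = 0; i < nsegs; i++) {
		if (i % 2 == 0) {		/* even segments: data */
			if (write(fd, buf, segsz) != (ssize_t)segsz)
				goto fail;
		} else {			/* odd segments: hole */
			if (lseek(fd, (off_t)segsz, SEEK_CUR) < 0)
				goto fail;
		}
	}
	/* A trailing hole needs an explicit truncate to extend i_size. */
	if (ftruncate(fd, (off_t)segsz * nsegs) < 0)
		goto fail;

	free(buf);
	return close(fd);
fail:
	free(buf);
	close(fd);
	return -1;
}
```

Swapping which branch writes data would produce the mixed-*h variants that begin with a hole.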

ext4      |        v3       |       v4.0      |       v4.1      |       v4.2      |
----------|-----------------|-----------------|-----------------|-----------------|
data      | 22.909 : 18.253 | 22.934 : 18.252 | 22.902 : 18.253 | 23.485 : 18.253 |
hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 |  0.708 :  0.709 |
mixed-4d  | 28.261 : 18.253 | 29.616 : 18.252 | 28.341 : 18.252 | 24.508 :  9.150 |
mixed-8d  | 27.956 : 18.253 | 28.404 : 18.252 | 28.320 : 18.252 | 23.967 :  9.140 |
mixed-16d | 28.172 : 18.253 | 27.946 : 18.252 | 27.627 : 18.252 | 23.043 :  9.134 |
mixed-32d | 25.350 : 18.253 | 24.406 : 18.252 | 24.384 : 18.253 | 20.698 :  9.132 |
mixed-4h  | 28.913 : 18.253 | 28.564 : 18.252 | 27.996 : 18.252 | 21.837 :  9.150 |
mixed-8h  | 28.625 : 18.253 | 27.833 : 18.252 | 27.798 : 18.253 | 21.710 :  9.140 |
mixed-16h | 27.975 : 18.253 | 27.662 : 18.252 | 27.795 : 18.253 | 20.585 :  9.134 |
mixed-32h | 25.958 : 18.253 | 25.491 : 18.252 | 24.856 : 18.252 | 21.018 :  9.132 |

xfs       |        v3       |       v4.0      |       v4.1      |       v4.2      |
----------|-----------------|-----------------|-----------------|-----------------|
data      | 22.041 : 18.253 | 22.618 : 18.252 | 23.067 : 18.253 | 23.496 : 18.253 |
hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 |  0.723 :  0.708 |
mixed-4d  | 29.417 : 18.253 | 28.503 : 18.252 | 28.671 : 18.253 | 24.957 :  9.150 |
mixed-8d  | 29.080 : 18.253 | 29.401 : 18.252 | 29.251 : 18.252 | 24.625 :  9.140 |
mixed-16d | 27.638 : 18.253 | 28.606 : 18.252 | 27.871 : 18.253 | 25.511 :  9.135 |
mixed-32d | 24.967 : 18.253 | 25.239 : 18.252 | 25.434 : 18.252 | 21.728 :  9.132 |
mixed-4h  | 34.816 : 18.253 | 36.243 : 18.252 | 35.837 : 18.252 | 32.332 :  9.150 |
mixed-8h  | 43.469 : 18.253 | 44.009 : 18.252 | 43.810 : 18.253 | 37.962 :  9.140 |
mixed-16h | 29.280 : 18.253 | 28.563 : 18.252 | 28.241 : 18.252 | 22.116 :  9.134 |
mixed-32h | 29.428 : 18.253 | 29.378 : 18.252 | 28.808 : 18.253 | 27.378 :  9.134 |

btrfs     |        v3       |       v4.0      |       v4.1      |       v4.2      |
----------|-----------------|-----------------|-----------------|-----------------|
data      | 25.547 : 18.253 | 25.053 : 18.252 | 24.209 : 18.253 | 32.121 : 18.253 |
hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.252 |  0.702 :  0.724 |
mixed-4d  | 19.016 : 18.253 | 18.822 : 18.252 | 18.955 : 18.253 | 18.697 :  9.150 |
mixed-8d  | 19.186 : 18.253 | 19.444 : 18.252 | 18.841 : 18.253 | 18.452 :  9.140 |
mixed-16d | 18.480 : 18.253 | 19.010 : 18.252 | 19.167 : 18.252 | 16.000 :  9.134 |
mixed-32d | 18.635 : 18.253 | 18.565 : 18.252 | 18.550 : 18.252 | 15.930 :  9.132 |
mixed-4h  | 19.079 : 18.253 | 18.990 : 18.252 | 19.157 : 18.253 | 27.834 :  9.150 |
mixed-8h  | 18.613 : 18.253 | 19.234 : 18.252 | 18.616 : 18.253 | 20.177 :  9.140 |
mixed-16h | 18.590 : 18.253 | 19.221 : 18.252 | 19.654 : 18.253 | 17.273 :  9.135 |
mixed-32h | 18.768 : 18.253 | 19.122 : 18.252 | 18.535 : 18.252 | 15.791 :  9.132 |

ext3      |        v3       |       v4.0      |       v4.1      |       v4.2      |
----------|-----------------|-----------------|-----------------|-----------------|
data      | 34.292 : 18.253 | 33.810 : 18.252 | 33.450 : 18.253 | 33.390 : 18.254 |
hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 |  0.718 :  0.728 |
mixed-4d  | 46.818 : 18.253 | 47.140 : 18.252 | 48.385 : 18.253 | 42.887 :  9.150 |
mixed-8d  | 58.554 : 18.253 | 59.277 : 18.252 | 59.673 : 18.253 | 56.760 :  9.140 |
mixed-16d | 44.631 : 18.253 | 44.291 : 18.252 | 44.729 : 18.253 | 40.237 :  9.135 |
mixed-32d | 39.110 : 18.253 | 38.735 : 18.252 | 38.902 : 18.252 | 35.270 :  9.132 |
mixed-4h  | 56.396 : 18.253 | 56.387 : 18.252 | 56.573 : 18.253 | 67.661 :  9.150 |
mixed-8h  | 58.483 : 18.253 | 58.484 : 18.252 | 59.099 : 18.253 | 77.958 :  9.140 |
mixed-16h | 42.511 : 18.253 | 42.338 : 18.252 | 42.356 : 18.252 | 51.805 :  9.135 |
mixed-32h | 38.419 : 18.253 | 38.504 : 18.252 | 38.643 : 18.252 | 40.411 :  9.132 |


Changes since v1:
- Rebase to 5.6-rc1
- Drop the mount option patch for now
- Fix fallback to READ when the server doesn't support READ_PLUS

Any questions?
Anna


Anna Schumaker (6):
  SUNRPC: Split out a function for setting current page
  SUNRPC: Add the ability to expand holes in data pages
  SUNRPC: Add the ability to shift data to a specific offset
  NFS: Add READ_PLUS data segment support
  NFS: Add READ_PLUS hole segment decoding
  NFS: Decode multiple READ_PLUS segments

 fs/nfs/nfs42xdr.c          | 169 +++++++++++++++++++++++++
 fs/nfs/nfs4proc.c          |  43 ++++++-
 fs/nfs/nfs4xdr.c           |   1 +
 include/linux/nfs4.h       |   2 +-
 include/linux/nfs_fs_sb.h  |   1 +
 include/linux/nfs_xdr.h    |   2 +-
 include/linux/sunrpc/xdr.h |   2 +
 net/sunrpc/xdr.c           | 244 ++++++++++++++++++++++++++++++++++++-
 8 files changed, 457 insertions(+), 7 deletions(-)

-- 
2.25.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/6] SUNRPC: Split out a function for setting current page
  2020-02-14 21:12 [PATCH v2 0/6] NFS: Add support for the v4.2 READ_PLUS operation schumaker.anna
@ 2020-02-14 21:12 ` schumaker.anna
  2020-02-14 21:12 ` [PATCH v2 2/6] SUNRPC: Add the ability to expand holes in data pages schumaker.anna
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: schumaker.anna @ 2020-02-14 21:12 UTC (permalink / raw)
  To: Trond.Myklebust, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

I'm going to need this bit of code in a few places for READ_PLUS
decoding, so let's make it a helper function.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 net/sunrpc/xdr.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index e5497dc2475b..8bed0ec21563 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -825,6 +825,12 @@ static int xdr_set_page_base(struct xdr_stream *xdr,
 	return 0;
 }
 
+static void xdr_set_page(struct xdr_stream *xdr, unsigned int base)
+{
+	if (xdr_set_page_base(xdr, base, PAGE_SIZE) < 0)
+		xdr_set_iov(xdr, xdr->buf->tail, xdr->nwords << 2);
+}
+
 static void xdr_set_next_page(struct xdr_stream *xdr)
 {
 	unsigned int newbase;
@@ -832,8 +838,7 @@ static void xdr_set_next_page(struct xdr_stream *xdr)
 	newbase = (1 + xdr->page_ptr - xdr->buf->pages) << PAGE_SHIFT;
 	newbase -= xdr->buf->page_base;
 
-	if (xdr_set_page_base(xdr, newbase, PAGE_SIZE) < 0)
-		xdr_set_iov(xdr, xdr->buf->tail, xdr->nwords << 2);
+	xdr_set_page(xdr, newbase);
 }
 
 static bool xdr_set_next_buffer(struct xdr_stream *xdr)
-- 
2.25.0



* [PATCH v2 2/6] SUNRPC: Add the ability to expand holes in data pages
  2020-02-14 21:12 [PATCH v2 0/6] NFS: Add support for the v4.2 READ_PLUS operation schumaker.anna
  2020-02-14 21:12 ` [PATCH v2 1/6] SUNRPC: Split out a function for setting current page schumaker.anna
@ 2020-02-14 21:12 ` schumaker.anna
  2020-02-14 21:12 ` [PATCH v2 3/6] SUNRPC: Add the ability to shift data to a specific offset schumaker.anna
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: schumaker.anna @ 2020-02-14 21:12 UTC (permalink / raw)
  To: Trond.Myklebust, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

This patch adds the ability to "read a hole" into a set of XDR data
pages by taking the following steps:

1) Shift all data after the current xdr->p to the right, possibly into
   the tail,
2) Zero the specified range, and
3) Update xdr->p to point beyond the hole.
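The three steps can be illustrated on a plain byte buffer, ignoring the page-vector and tail bookkeeping the real xdr code has to do. `expand_hole()` below is a toy model, not the kernel function:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model of "reading a hole" into a flat buffer:
 *  1) shift everything at and after pos right by hole_len (bytes
 *     shifted off the end are simply dropped here; the real code
 *     spills them into the xdr tail),
 *  2) zero the hole, and
 *  3) the caller then resumes decoding at pos + hole_len.
 */
static void expand_hole(unsigned char *buf, size_t buflen,
			size_t pos, size_t hole_len)
{
	/* regions overlap, so memmove() rather than memcpy() */
	memmove(buf + pos + hole_len, buf + pos, buflen - pos - hole_len);
	memset(buf + pos, 0, hole_len);
}
```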

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 include/linux/sunrpc/xdr.h |   1 +
 net/sunrpc/xdr.c           | 100 +++++++++++++++++++++++++++++++++++++
 2 files changed, 101 insertions(+)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index b41f34977995..81a79099ed3b 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -263,6 +263,7 @@ extern __be32 *xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes);
 extern unsigned int xdr_read_pages(struct xdr_stream *xdr, unsigned int len);
 extern void xdr_enter_page(struct xdr_stream *xdr, unsigned int len);
 extern int xdr_process_buf(struct xdr_buf *buf, unsigned int offset, unsigned int len, int (*actor)(struct scatterlist *, void *), void *data);
+extern size_t xdr_expand_hole(struct xdr_stream *, size_t, uint64_t);
 
 /**
  * xdr_stream_remaining - Return the number of bytes remaining in the stream
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 8bed0ec21563..bc9b9b0945f5 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -266,6 +266,40 @@ _shift_data_right_pages(struct page **pages, size_t pgto_base,
 	} while ((len -= copy) != 0);
 }
 
+static void
+_shift_data_right_tail(struct xdr_buf *buf, size_t pgfrom_base, size_t len)
+{
+	struct kvec *tail = buf->tail;
+
+	/* Make room for new data. */
+	if (tail->iov_len > 0)
+		memmove((char *)tail->iov_base + len, tail->iov_base, tail->iov_len);
+
+	_copy_from_pages((char *)tail->iov_base,
+			 buf->pages,
+			 buf->page_base + pgfrom_base,
+			 len);
+
+	tail->iov_len += len;
+}
+
+static void
+_shift_data_right(struct xdr_buf *buf, size_t to, size_t from, size_t len)
+{
+	size_t shift = len;
+
+	if ((to + len) > buf->page_len) {
+		shift = (to + len) - buf->page_len;
+		_shift_data_right_tail(buf, (from + len) - shift, shift);
+		shift = len - shift;
+	}
+
+	_shift_data_right_pages(buf->pages,
+				buf->page_base + to,
+				buf->page_base + from,
+				shift);
+}
+
 /**
  * _copy_to_pages
  * @pages: array of pages
@@ -350,6 +384,33 @@ _copy_from_pages(char *p, struct page **pages, size_t pgbase, size_t len)
 }
 EXPORT_SYMBOL_GPL(_copy_from_pages);
 
+/**
+ * _zero_data_pages
+ * @pages: array of pages
+ * @pgbase: beginning page vector address
+ * @len: length
+ */
+static void
+_zero_data_pages(struct page **pages, size_t pgbase, size_t len)
+{
+	struct page **page;
+	size_t zero;
+
+	page = pages + (pgbase >> PAGE_SHIFT);
+	pgbase &= ~PAGE_MASK;
+
+	do {
+		zero = len;
+		if (pgbase + zero > PAGE_SIZE)
+			zero = PAGE_SIZE - pgbase;
+
+		zero_user_segment(*page, pgbase, pgbase + zero);
+		page++;
+		pgbase = 0;
+
+	} while ((len -= zero) != 0);
+}
+
 /**
  * xdr_shrink_bufhead
  * @buf: xdr_buf
@@ -505,6 +566,24 @@ unsigned int xdr_stream_pos(const struct xdr_stream *xdr)
 }
 EXPORT_SYMBOL_GPL(xdr_stream_pos);
 
+/**
+ * xdr_page_pos - Return the current offset from the start of the xdr->buf->pages
+ * @xdr: pointer to struct xdr_stream
+ */
+static size_t xdr_page_pos(const struct xdr_stream *xdr)
+{
+	unsigned int offset;
+	unsigned int base = xdr->buf->page_len;
+	void *kaddr = xdr->buf->tail->iov_base;
+
+	if (xdr->page_ptr) {
+		base   = (xdr->page_ptr - xdr->buf->pages) * PAGE_SIZE;
+		kaddr  = page_address(*xdr->page_ptr);
+	}
+	offset = xdr->p - (__be32 *)kaddr;
+	return base + (offset * sizeof(__be32));
+}
+
 /**
  * xdr_init_encode - Initialize a struct xdr_stream for sending data.
  * @xdr: pointer to xdr_stream struct
@@ -1062,6 +1141,27 @@ unsigned int xdr_read_pages(struct xdr_stream *xdr, unsigned int len)
 }
 EXPORT_SYMBOL_GPL(xdr_read_pages);
 
+size_t xdr_expand_hole(struct xdr_stream *xdr, size_t offset, uint64_t length)
+{
+	struct xdr_buf *buf = xdr->buf;
+	size_t from = 0;
+
+	if ((offset + length) < offset ||
+	    (offset + length) > buf->page_len)
+		length = buf->page_len - offset;
+
+	if (offset == 0)
+		xdr_align_pages(xdr, xdr->nwords << 2);
+	else
+		from = xdr_page_pos(xdr);
+
+	_shift_data_right(buf, offset + length, from, xdr->nwords << 2);
+	_zero_data_pages(buf->pages, buf->page_base + offset, length);
+	xdr_set_page(xdr, offset + length);
+	return length;
+}
+EXPORT_SYMBOL_GPL(xdr_expand_hole);
+
 /**
  * xdr_enter_page - decode data from the XDR page
  * @xdr: pointer to xdr_stream struct
-- 
2.25.0



* [PATCH v2 3/6] SUNRPC: Add the ability to shift data to a specific offset
  2020-02-14 21:12 [PATCH v2 0/6] NFS: Add support for the v4.2 READ_PLUS operation schumaker.anna
  2020-02-14 21:12 ` [PATCH v2 1/6] SUNRPC: Split out a function for setting current page schumaker.anna
  2020-02-14 21:12 ` [PATCH v2 2/6] SUNRPC: Add the ability to expand holes in data pages schumaker.anna
@ 2020-02-14 21:12 ` schumaker.anna
  2020-02-14 21:12 ` [PATCH v2 4/6] NFS: Add READ_PLUS data segment support schumaker.anna
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: schumaker.anna @ 2020-02-14 21:12 UTC (permalink / raw)
  To: Trond.Myklebust, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

Expanding holes tends to put the data content a few bytes to the right
of where we want it.  This patch implements a left-shift operation to
line everything up properly.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 include/linux/sunrpc/xdr.h |   1 +
 net/sunrpc/xdr.c           | 135 +++++++++++++++++++++++++++++++++++++
 2 files changed, 136 insertions(+)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 81a79099ed3b..90abf732c8ee 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -264,6 +264,7 @@ extern unsigned int xdr_read_pages(struct xdr_stream *xdr, unsigned int len);
 extern void xdr_enter_page(struct xdr_stream *xdr, unsigned int len);
 extern int xdr_process_buf(struct xdr_buf *buf, unsigned int offset, unsigned int len, int (*actor)(struct scatterlist *, void *), void *data);
 extern size_t xdr_expand_hole(struct xdr_stream *, size_t, uint64_t);
+extern uint64_t xdr_align_data(struct xdr_stream *, uint64_t, uint64_t);
 
 /**
  * xdr_stream_remaining - Return the number of bytes remaining in the stream
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index bc9b9b0945f5..a857c2308ee1 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -19,6 +19,9 @@
 #include <linux/bvec.h>
 #include <trace/events/sunrpc.h>
 
+static void _copy_to_pages(struct page **, size_t, const char *, size_t);
+
+
 /*
  * XDR functions for basic NFS types
  */
@@ -300,6 +303,117 @@ _shift_data_right(struct xdr_buf *buf, size_t to, size_t from, size_t len)
 				shift);
 }
 
+
+/**
+ * _shift_data_left_pages
+ * @pages: vector of pages containing both the source and dest memory area.
+ * @pgto_base: page vector address of destination
+ * @pgfrom_base: page vector address of source
+ * @len: number of bytes to copy
+ *
+ * Note: the addresses pgto_base and pgfrom_base are both calculated in
+ *       the same way:
+ *            if a memory area starts at byte 'base' in page 'pages[i]',
+ *            then its address is given as (i << PAGE_SHIFT) + base
+ * Also note: pgto_base must be < pgfrom_base, but the memory areas
+ * 	they point to may overlap.
+ */
+static void
+_shift_data_left_pages(struct page **pages, size_t pgto_base,
+			size_t pgfrom_base, size_t len)
+{
+	struct page **pgfrom, **pgto;
+	char *vfrom, *vto;
+	size_t copy;
+
+	BUG_ON(pgfrom_base <= pgto_base);
+
+	pgto = pages + (pgto_base >> PAGE_SHIFT);
+	pgfrom = pages + (pgfrom_base >> PAGE_SHIFT);
+
+	pgto_base = pgto_base % PAGE_SIZE;
+	pgfrom_base = pgfrom_base % PAGE_SIZE;
+
+	do {
+		if (pgto_base >= PAGE_SIZE) {
+			pgto_base = 0;
+			pgto++;
+		}
+		if (pgfrom_base >= PAGE_SIZE) {
+			pgfrom_base = 0;
+			pgfrom++;
+		}
+
+		copy = len;
+		if (copy > (PAGE_SIZE - pgto_base))
+			copy = PAGE_SIZE - pgto_base;
+		if (copy > (PAGE_SIZE - pgfrom_base))
+			copy = PAGE_SIZE - pgfrom_base;
+
+		vto = kmap_atomic(*pgto);
+		if (*pgto != *pgfrom) {
+			vfrom = kmap_atomic(*pgfrom);
+			memcpy(vto + pgto_base, vfrom + pgfrom_base, copy);
+			kunmap_atomic(vfrom);
+		} else
+			memmove(vto + pgto_base, vto + pgfrom_base, copy);
+		flush_dcache_page(*pgto);
+		kunmap_atomic(vto);
+
+		pgto_base += copy;
+		pgfrom_base += copy;
+
+	} while ((len -= copy) != 0);
+}
+
+static void
+_shift_data_left_tail(struct xdr_buf *buf, size_t pgto_base,
+		      size_t tail_from, size_t len)
+{
+	struct kvec *tail = buf->tail;
+	size_t shift = len;
+
+	if (len == 0)
+		return;
+	if (pgto_base + len > buf->page_len)
+		shift = buf->page_len - pgto_base;
+
+	_copy_to_pages(buf->pages,
+			buf->page_base + pgto_base,
+			(char *)(tail->iov_base + tail_from),
+			shift);
+
+	memmove((char *)tail->iov_base, tail->iov_base + tail_from + shift, tail->iov_len - (tail_from + shift));
+	tail->iov_len -= (tail_from + shift);
+}
+
+static void
+_shift_data_left(struct xdr_buf *buf, size_t to, size_t from, size_t len)
+{
+	size_t shift = len;
+
+	if (from < buf->page_len) {
+		shift = min(len, buf->page_len - from);
+		_shift_data_left_pages(buf->pages,
+					buf->page_base + to,
+					buf->page_base + from,
+					shift);
+		to += shift;
+		from += shift;
+		shift = len - shift;
+	}
+
+	if (shift == 0)
+		return;
+	if (from >= buf->page_len)
+		from -= buf->page_len;
+
+	_shift_data_left_tail(buf, to, from, shift);
+}
+
 /**
  * _copy_to_pages
  * @pages: array of pages
@@ -1162,6 +1276,27 @@ size_t xdr_expand_hole(struct xdr_stream *xdr, size_t offset, uint64_t length)
 }
 EXPORT_SYMBOL_GPL(xdr_expand_hole);
 
+uint64_t xdr_align_data(struct xdr_stream *xdr, uint64_t offset, uint64_t length)
+{
+	struct xdr_buf *buf = xdr->buf;
+	size_t from = offset;
+
+	if (offset + length > buf->page_len)
+		length = buf->page_len - offset;
+
+	if (offset == 0)
+		xdr_align_pages(xdr, xdr->nwords << 2);
+	else {
+		from = xdr_page_pos(xdr);
+		_shift_data_left(buf, offset, from, length);
+	}
+
+	xdr->nwords -= XDR_QUADLEN(length);
+	xdr_set_page(xdr, from + length);
+	return length;
+}
+EXPORT_SYMBOL_GPL(xdr_align_data);
+
 /**
  * xdr_enter_page - decode data from the XDR page
  * @xdr: pointer to xdr_stream struct
-- 
2.25.0



* [PATCH v2 4/6] NFS: Add READ_PLUS data segment support
  2020-02-14 21:12 [PATCH v2 0/6] NFS: Add support for the v4.2 READ_PLUS operation schumaker.anna
                   ` (2 preceding siblings ...)
  2020-02-14 21:12 ` [PATCH v2 3/6] SUNRPC: Add the ability to shift data to a specific offset schumaker.anna
@ 2020-02-14 21:12 ` schumaker.anna
  2020-02-14 22:28   ` Chuck Lever
  2020-02-14 21:12 ` [PATCH v2 5/6] NFS: Add READ_PLUS hole segment decoding schumaker.anna
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: schumaker.anna @ 2020-02-14 21:12 UTC (permalink / raw)
  To: Trond.Myklebust, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

This patch adds client support for decoding a single NFS4_CONTENT_DATA
segment returned by the server. This is the simplest implementation
possible, since it does not account for any hole segments in the reply.
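For reference, the reply body being decoded here is laid out on the wire (per RFC 7862) as a boolean rpr_eof, a segment count, and then for each segment a content type followed by its data_info fields. A toy userspace decoder of the single-DATA-segment case, with hypothetical helper names (this is not the kernel code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XDR scalars are big-endian on the wire. */
static uint32_t toy_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint64_t toy_be64(const unsigned char *p)
{
	return ((uint64_t)toy_be32(p) << 32) | toy_be32(p + 4);
}

/* Decoded fields of a one-segment READ_PLUS reply body. */
struct toy_read_plus {
	uint32_t eof;		/* rpr_eof */
	uint32_t nsegs;		/* rpr_contents count */
	uint32_t type;		/* 0 == NFS4_CONTENT_DATA */
	uint64_t offset;	/* data_info4.di_offset */
	uint32_t count;		/* opaque length; payload bytes follow */
};

static int toy_decode_read_plus_data(const unsigned char *p, size_t len,
				     struct toy_read_plus *res)
{
	if (len < 24)		/* 4 + 4 + 4 + 8 + 4 fixed bytes */
		return -1;
	res->eof    = toy_be32(p);
	res->nsegs  = toy_be32(p + 4);
	res->type   = toy_be32(p + 8);
	res->offset = toy_be64(p + 12);
	res->count  = toy_be32(p + 20);
	return 0;
}
```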

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfs/nfs42xdr.c         | 138 ++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4proc.c         |  43 +++++++++++-
 fs/nfs/nfs4xdr.c          |   1 +
 include/linux/nfs4.h      |   2 +-
 include/linux/nfs_fs_sb.h |   1 +
 include/linux/nfs_xdr.h   |   2 +-
 6 files changed, 182 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index c03f3246d6c5..bf118ecabe2c 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -45,6 +45,15 @@
 #define encode_deallocate_maxsz		(op_encode_hdr_maxsz + \
 					 encode_fallocate_maxsz)
 #define decode_deallocate_maxsz		(op_decode_hdr_maxsz)
+#define encode_read_plus_maxsz		(op_encode_hdr_maxsz + \
+					 encode_stateid_maxsz + 3)
+#define NFS42_READ_PLUS_SEGMENT_SIZE	(1 /* data_content4 */ + \
+					 2 /* data_info4.di_offset */ + \
+					 2 /* data_info4.di_length */)
+#define decode_read_plus_maxsz		(op_decode_hdr_maxsz + \
+					 1 /* rpr_eof */ + \
+					 1 /* rpr_contents count */ + \
+					 NFS42_READ_PLUS_SEGMENT_SIZE)
 #define encode_seek_maxsz		(op_encode_hdr_maxsz + \
 					 encode_stateid_maxsz + \
 					 2 /* offset */ + \
@@ -128,6 +137,14 @@
 					 decode_putfh_maxsz + \
 					 decode_deallocate_maxsz + \
 					 decode_getattr_maxsz)
+#define NFS4_enc_read_plus_sz		(compound_encode_hdr_maxsz + \
+					 encode_sequence_maxsz + \
+					 encode_putfh_maxsz + \
+					 encode_read_plus_maxsz)
+#define NFS4_dec_read_plus_sz		(compound_decode_hdr_maxsz + \
+					 decode_sequence_maxsz + \
+					 decode_putfh_maxsz + \
+					 decode_read_plus_maxsz)
 #define NFS4_enc_seek_sz		(compound_encode_hdr_maxsz + \
 					 encode_sequence_maxsz + \
 					 encode_putfh_maxsz + \
@@ -252,6 +269,16 @@ static void encode_deallocate(struct xdr_stream *xdr,
 	encode_fallocate(xdr, args);
 }
 
+static void encode_read_plus(struct xdr_stream *xdr,
+			     const struct nfs_pgio_args *args,
+			     struct compound_hdr *hdr)
+{
+	encode_op_hdr(xdr, OP_READ_PLUS, decode_read_plus_maxsz, hdr);
+	encode_nfs4_stateid(xdr, &args->stateid);
+	encode_uint64(xdr, args->offset);
+	encode_uint32(xdr, args->count);
+}
+
 static void encode_seek(struct xdr_stream *xdr,
 			const struct nfs42_seek_args *args,
 			struct compound_hdr *hdr)
@@ -446,6 +473,29 @@ static void nfs4_xdr_enc_deallocate(struct rpc_rqst *req,
 	encode_nops(&hdr);
 }
 
+/*
+ * Encode READ_PLUS request
+ */
+static void nfs4_xdr_enc_read_plus(struct rpc_rqst *req,
+				   struct xdr_stream *xdr,
+				   const void *data)
+{
+	const struct nfs_pgio_args *args = data;
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
+	};
+
+	encode_compound_hdr(xdr, req, &hdr);
+	encode_sequence(xdr, &args->seq_args, &hdr);
+	encode_putfh(xdr, args->fh, &hdr);
+	encode_read_plus(xdr, args, &hdr);
+
+	rpc_prepare_reply_pages(req, args->pages, args->pgbase,
+				args->count, hdr.replen);
+	req->rq_rcv_buf.flags |= XDRBUF_READ;
+	encode_nops(&hdr);
+}
+
 /*
  * Encode SEEK request
  */
@@ -694,6 +744,67 @@ static int decode_deallocate(struct xdr_stream *xdr, struct nfs42_falloc_res *re
 	return decode_op_hdr(xdr, OP_DEALLOCATE);
 }
 
+static uint32_t decode_read_plus_data(struct xdr_stream *xdr, struct nfs_pgio_res *res,
+				      uint32_t *eof)
+{
+	__be32 *p;
+	uint32_t count, recvd;
+	uint64_t offset;
+
+	p = xdr_inline_decode(xdr, 8 + 4);
+	if (unlikely(!p))
+		return -EIO;
+
+	p = xdr_decode_hyper(p, &offset);
+	count = be32_to_cpup(p);
+	if (count == 0)
+		return 0;
+
+	recvd = xdr_read_pages(xdr, count);
+	if (count > recvd) {
+		dprintk("NFS: server cheating in read reply: "
+				"count %u > recvd %u\n", count, recvd);
+		count = recvd;
+		*eof = 0;
+	}
+
+	return count;
+}
+
+static int decode_read_plus(struct xdr_stream *xdr, struct nfs_pgio_res *res)
+{
+	__be32 *p;
+	uint32_t count, eof, segments, type;
+	int status;
+
+	status = decode_op_hdr(xdr, OP_READ_PLUS);
+	if (status)
+		return status;
+
+	p = xdr_inline_decode(xdr, 4 + 4);
+	if (unlikely(!p))
+		return -EIO;
+
+	eof = be32_to_cpup(p++);
+	segments = be32_to_cpup(p++);
+	if (segments == 0)
+		return 0;
+
+	p = xdr_inline_decode(xdr, 4);
+	if (unlikely(!p))
+		return -EIO;
+
+	type = be32_to_cpup(p++);
+	if (type == NFS4_CONTENT_DATA)
+		count = decode_read_plus_data(xdr, res, &eof);
+	else
+		return -EINVAL;
+
+	res->eof = eof;
+	res->count = count;
+	return 0;
+}
+
 static int decode_seek(struct xdr_stream *xdr, struct nfs42_seek_res *res)
 {
 	int status;
@@ -870,6 +981,33 @@ static int nfs4_xdr_dec_deallocate(struct rpc_rqst *rqstp,
 	return status;
 }
 
+/*
+ * Decode READ_PLUS request
+ */
+static int nfs4_xdr_dec_read_plus(struct rpc_rqst *rqstp,
+				  struct xdr_stream *xdr,
+				  void *data)
+{
+	struct nfs_pgio_res *res = data;
+	struct compound_hdr hdr;
+	int status;
+
+	status = decode_compound_hdr(xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(xdr, &res->seq_res, rqstp);
+	if (status)
+		goto out;
+	status = decode_putfh(xdr);
+	if (status)
+		goto out;
+	status = decode_read_plus(xdr, res);
+	if (!status)
+		status = res->count;
+out:
+	return status;
+}
+
 /*
  * Decode SEEK request
  */
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 95d07a3dc5d1..ed3ec8c36273 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -69,6 +69,10 @@
 
 #include "nfs4trace.h"
 
+#ifdef CONFIG_NFS_V4_2
+#include "nfs42.h"
+#endif /* CONFIG_NFS_V4_2 */
+
 #define NFSDBG_FACILITY		NFSDBG_PROC
 
 #define NFS4_BITMASK_SZ		3
@@ -5199,28 +5203,60 @@ static bool nfs4_read_stateid_changed(struct rpc_task *task,
 	return true;
 }
 
+static bool nfs4_read_plus_not_supported(struct rpc_task *task,
+					 struct nfs_pgio_header *hdr)
+{
+	struct nfs_server *server = NFS_SERVER(hdr->inode);
+	struct rpc_message *msg = &task->tk_msg;
+
+	if (msg->rpc_proc == &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS] &&
+	    server->caps & NFS_CAP_READ_PLUS && task->tk_status == -ENOTSUPP) {
+		server->caps &= ~NFS_CAP_READ_PLUS;
+		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
+		rpc_restart_call_prepare(task);
+		return true;
+	}
+	return false;
+}
+
 static int nfs4_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
 {
-
 	dprintk("--> %s\n", __func__);
 
 	if (!nfs4_sequence_done(task, &hdr->res.seq_res))
 		return -EAGAIN;
 	if (nfs4_read_stateid_changed(task, &hdr->args))
 		return -EAGAIN;
+	if (nfs4_read_plus_not_supported(task, hdr))
+		return -EAGAIN;
 	if (task->tk_status > 0)
 		nfs_invalidate_atime(hdr->inode);
 	return hdr->pgio_done_cb ? hdr->pgio_done_cb(task, hdr) :
 				    nfs4_read_done_cb(task, hdr);
 }
 
+#ifdef CONFIG_NFS_V4_2
+static void nfs42_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
+{
+	if (server->caps & NFS_CAP_READ_PLUS)
+		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS];
+	else
+		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
+}
+#else
+static void nfs42_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
+{
+	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
+}
+#endif /* CONFIG_NFS_V4_2 */
+
 static void nfs4_proc_read_setup(struct nfs_pgio_header *hdr,
 				 struct rpc_message *msg)
 {
 	hdr->timestamp   = jiffies;
 	if (!hdr->pgio_done_cb)
 		hdr->pgio_done_cb = nfs4_read_done_cb;
-	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
+	nfs42_read_plus_support(NFS_SERVER(hdr->inode), msg);
 	nfs4_init_sequence(&hdr->args.seq_args, &hdr->res.seq_res, 0, 0);
 }
 
@@ -9970,7 +10006,8 @@ static const struct nfs4_minor_version_ops nfs_v4_2_minor_ops = {
 		| NFS_CAP_SEEK
 		| NFS_CAP_LAYOUTSTATS
 		| NFS_CAP_CLONE
-		| NFS_CAP_LAYOUTERROR,
+		| NFS_CAP_LAYOUTERROR
+		| NFS_CAP_READ_PLUS,
 	.init_client = nfs41_init_client,
 	.shutdown_client = nfs41_shutdown_client,
 	.match_stateid = nfs41_match_stateid,
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 47817ef0aadb..68b2917d0537 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -7584,6 +7584,7 @@ const struct rpc_procinfo nfs4_procedures[] = {
 	PROC42(COPY_NOTIFY,	enc_copy_notify,	dec_copy_notify),
 	PROC(LOOKUPP,		enc_lookupp,		dec_lookupp),
 	PROC42(LAYOUTERROR,	enc_layouterror,	dec_layouterror),
+	PROC42(READ_PLUS,	enc_read_plus,		dec_read_plus),
 };
 
 static unsigned int nfs_version4_counts[ARRAY_SIZE(nfs4_procedures)];
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 82d8fb422092..c1eeef52545c 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -540,8 +540,8 @@ enum {
 
 	NFSPROC4_CLNT_LOOKUPP,
 	NFSPROC4_CLNT_LAYOUTERROR,
-
 	NFSPROC4_CLNT_COPY_NOTIFY,
+	NFSPROC4_CLNT_READ_PLUS,
 };
 
 /* nfs41 types */
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 465fa98258a3..11248c5a7b24 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -281,5 +281,6 @@ struct nfs_server {
 #define NFS_CAP_OFFLOAD_CANCEL	(1U << 25)
 #define NFS_CAP_LAYOUTERROR	(1U << 26)
 #define NFS_CAP_COPY_NOTIFY	(1U << 27)
+#define NFS_CAP_READ_PLUS	(1U << 28)
 
 #endif
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 94c77ed55ce1..8efbf3d8b263 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -655,7 +655,7 @@ struct nfs_pgio_args {
 struct nfs_pgio_res {
 	struct nfs4_sequence_res	seq_res;
 	struct nfs_fattr *	fattr;
-	__u32			count;
+	__u64			count;
 	__u32			op_status;
 	union {
 		struct {
-- 
2.25.0



* [PATCH v2 5/6] NFS: Add READ_PLUS hole segment decoding
  2020-02-14 21:12 [PATCH v2 0/6] NFS: Add support for the v4.2 READ_PLUS operation schumaker.anna
                   ` (3 preceding siblings ...)
  2020-02-14 21:12 ` [PATCH v2 4/6] NFS: Add READ_PLUS data segment support schumaker.anna
@ 2020-02-14 21:12 ` schumaker.anna
  2020-02-14 21:12 ` [PATCH v2 6/6] NFS: Decode multiple READ_PLUS segments schumaker.anna
  2020-02-20 17:40 ` [PATCH v2 0/6] NFS: Add support for the v4.2 READ_PLUS operation Chuck Lever
  6 siblings, 0 replies; 17+ messages in thread
From: schumaker.anna @ 2020-02-14 21:12 UTC (permalink / raw)
  To: Trond.Myklebust, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

We keep things simple for now by only decoding a single hole or data
segment returned by the server, even if it returned more than one.
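A hole segment carries only di_offset and di_length on the wire; no payload bytes follow. The clamping behavior can be sketched in a toy decoder that mirrors how a short xdr_expand_hole() result clears eof (toy helpers under those assumptions, not the kernel code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XDR hypers are big-endian on the wire. */
static uint64_t toy_be64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Decode one HOLE segment body: 8 bytes of offset, 8 bytes of length.
 * If the hole extends past what the read buffer can hold, clamp it and
 * clear *eof so the caller knows the read came up short.
 */
static uint64_t toy_decode_hole(const unsigned char *p,
				uint64_t buf_remaining, uint32_t *eof)
{
	uint64_t offset = toy_be64(p);
	uint64_t length = toy_be64(p + 8);

	(void)offset;	/* a flat toy buffer has no use for it */
	if (length > buf_remaining) {
		length = buf_remaining;
		*eof = 0;
	}
	return length;
}
```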

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfs/nfs42xdr.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index bf118ecabe2c..3407a3cf2e13 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -47,7 +47,7 @@
 #define decode_deallocate_maxsz		(op_decode_hdr_maxsz)
 #define encode_read_plus_maxsz		(op_encode_hdr_maxsz + \
 					 encode_stateid_maxsz + 3)
-#define NFS42_READ_PLUS_SEGMENT_SIZE	(1 /* data_content4 */ + \
+#define NFS42_READ_PLUS_SEGMENT_SIZE	(2 /* data_content4 */ + \
 					 2 /* data_info4.di_offset */ + \
 					 2 /* data_info4.di_length */)
 #define decode_read_plus_maxsz		(op_decode_hdr_maxsz + \
@@ -771,6 +771,31 @@ static uint32_t decode_read_plus_data(struct xdr_stream *xdr, struct nfs_pgio_re
 	return count;
 }
 
+static uint32_t decode_read_plus_hole(struct xdr_stream *xdr, struct nfs_pgio_res *res,
+				      uint32_t *eof)
+{
+	__be32 *p;
+	uint64_t offset, length;
+	size_t recvd;
+
+	p = xdr_inline_decode(xdr, 8 + 8);
+	if (unlikely(!p))
+		return -EIO;
+
+	p = xdr_decode_hyper(p, &offset);
+	p = xdr_decode_hyper(p, &length);
+	if (length == 0)
+		return 0;
+
+	recvd = xdr_expand_hole(xdr, 0, length);
+	if (recvd < length) {
+		*eof = 0;
+		length = recvd;
+	}
+
+	return length;
+}
+
 static int decode_read_plus(struct xdr_stream *xdr, struct nfs_pgio_res *res)
 {
 	__be32 *p;
@@ -798,7 +823,10 @@ static int decode_read_plus(struct xdr_stream *xdr, struct nfs_pgio_res *res)
 	if (type == NFS4_CONTENT_DATA)
 		count = decode_read_plus_data(xdr, res, &eof);
 	else
-		return -EINVAL;
+		count = decode_read_plus_hole(xdr, res, &eof);
+
+	if (segments > 1)
+		eof = 0;
 
 	res->eof = eof;
 	res->count = count;
-- 
2.25.0



* [PATCH v2 6/6] NFS: Decode multiple READ_PLUS segments
  2020-02-14 21:12 [PATCH v2 0/6] NFS: Add support for the v4.2 READ_PLUS operation schumaker.anna
                   ` (4 preceding siblings ...)
  2020-02-14 21:12 ` [PATCH v2 5/6] NFS: Add READ_PLUS hole segment decoding schumaker.anna
@ 2020-02-14 21:12 ` schumaker.anna
  2020-02-20 17:40 ` [PATCH v2 0/6] NFS: Add support for the v4.2 READ_PLUS operation Chuck Lever
  6 siblings, 0 replies; 17+ messages in thread
From: schumaker.anna @ 2020-02-14 21:12 UTC (permalink / raw)
  To: Trond.Myklebust, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

We now have everything we need to read holes and shift data to where
it's supposed to be. This patch switches over to xdr_align_data() to put
data segments in the proper place.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfs/nfs42xdr.c | 43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index 3407a3cf2e13..b5c638bcab66 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -53,7 +53,7 @@
 #define decode_read_plus_maxsz		(op_decode_hdr_maxsz + \
 					 1 /* rpr_eof */ + \
 					 1 /* rpr_contents count */ + \
-					 NFS42_READ_PLUS_SEGMENT_SIZE)
+					 (2 * NFS42_READ_PLUS_SEGMENT_SIZE))
 #define encode_seek_maxsz		(op_encode_hdr_maxsz + \
 					 encode_stateid_maxsz + \
 					 2 /* offset */ + \
@@ -745,7 +745,7 @@ static int decode_deallocate(struct xdr_stream *xdr, struct nfs42_falloc_res *re
 }
 
 static uint32_t decode_read_plus_data(struct xdr_stream *xdr, struct nfs_pgio_res *res,
-				      uint32_t *eof)
+				      uint32_t *eof, uint64_t total)
 {
 	__be32 *p;
 	uint32_t count, recvd;
@@ -760,7 +760,7 @@ static uint32_t decode_read_plus_data(struct xdr_stream *xdr, struct nfs_pgio_re
 	if (count == 0)
 		return 0;
 
-	recvd = xdr_read_pages(xdr, count);
+	recvd = xdr_align_data(xdr, total, count);
 	if (count > recvd) {
 		dprintk("NFS: server cheating in read reply: "
 				"count %u > recvd %u\n", count, recvd);
@@ -772,7 +772,7 @@ static uint32_t decode_read_plus_data(struct xdr_stream *xdr, struct nfs_pgio_re
 }
 
 static uint32_t decode_read_plus_hole(struct xdr_stream *xdr, struct nfs_pgio_res *res,
-				      uint32_t *eof)
+				     uint32_t *eof, uint64_t total)
 {
 	__be32 *p;
 	uint64_t offset, length;
@@ -787,7 +787,7 @@ static uint32_t decode_read_plus_hole(struct xdr_stream *xdr, struct nfs_pgio_re
 	if (length == 0)
 		return 0;
 
-	recvd = xdr_expand_hole(xdr, 0, length);
+	recvd = xdr_expand_hole(xdr, total, length);
 	if (recvd < length) {
 		*eof = 0;
 		length = recvd;
@@ -799,8 +799,9 @@ static uint32_t decode_read_plus_hole(struct xdr_stream *xdr, struct nfs_pgio_re
 static int decode_read_plus(struct xdr_stream *xdr, struct nfs_pgio_res *res)
 {
 	__be32 *p;
-	uint32_t count, eof, segments, type;
-	int status;
+	uint32_t eof, segments, type, total;
+	int32_t count;
+	int status, i;
 
 	status = decode_op_hdr(xdr, OP_READ_PLUS);
 	if (status)
@@ -810,26 +811,28 @@ static int decode_read_plus(struct xdr_stream *xdr, struct nfs_pgio_res *res)
 	if (unlikely(!p))
 		return -EIO;
 
+	total = 0;
 	eof = be32_to_cpup(p++);
 	segments = be32_to_cpup(p++);
-	if (segments == 0)
-		return 0;
 
-	p = xdr_inline_decode(xdr, 4);
-	if (unlikely(!p))
-		return -EIO;
+	for (i = 0; i < segments; i++) {
+		p = xdr_inline_decode(xdr, 4);
+		if (unlikely(!p))
+			return -EIO;
 
-	type = be32_to_cpup(p++);
-	if (type == NFS4_CONTENT_DATA)
-		count = decode_read_plus_data(xdr, res, &eof);
-	else
-		count = decode_read_plus_hole(xdr, res, &eof);
+		type = be32_to_cpup(p);
+		if (type == NFS4_CONTENT_DATA)
+			count = decode_read_plus_data(xdr, res, &eof, total);
+		else
+			count = decode_read_plus_hole(xdr, res, &eof, total);
 
-	if (segments > 1)
-		eof = 0;
+		if (count < 0)
+			return count;
+		total += count;
+	}
 
 	res->eof = eof;
-	res->count = count;
+	res->count = total;
 	return 0;
 }
 
-- 
2.25.0



* Re: [PATCH v2 4/6] NFS: Add READ_PLUS data segment support
  2020-02-14 21:12 ` [PATCH v2 4/6] NFS: Add READ_PLUS data segment support schumaker.anna
@ 2020-02-14 22:28   ` Chuck Lever
  2020-02-20 14:42     ` Anna Schumaker
  0 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2020-02-14 22:28 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Trond.Myklebust, Linux NFS Mailing List, Anna Schumaker



> On Feb 14, 2020, at 4:12 PM, schumaker.anna@gmail.com wrote:
> 
> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> 
> This patch adds client support for decoding a single NFS4_CONTENT_DATA
> segment returned by the server. This is the simplest implementation
> possible, since it does not account for any hole segments in the reply.
> 
> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> ---
> fs/nfs/nfs42xdr.c         | 138 ++++++++++++++++++++++++++++++++++++++
> fs/nfs/nfs4proc.c         |  43 +++++++++++-
> fs/nfs/nfs4xdr.c          |   1 +
> include/linux/nfs4.h      |   2 +-
> include/linux/nfs_fs_sb.h |   1 +
> include/linux/nfs_xdr.h   |   2 +-
> 6 files changed, 182 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
> index c03f3246d6c5..bf118ecabe2c 100644
> --- a/fs/nfs/nfs42xdr.c
> +++ b/fs/nfs/nfs42xdr.c
> @@ -45,6 +45,15 @@
> #define encode_deallocate_maxsz		(op_encode_hdr_maxsz + \
> 					 encode_fallocate_maxsz)
> #define decode_deallocate_maxsz		(op_decode_hdr_maxsz)
> +#define encode_read_plus_maxsz		(op_encode_hdr_maxsz + \
> +					 encode_stateid_maxsz + 3)
> +#define NFS42_READ_PLUS_SEGMENT_SIZE	(1 /* data_content4 */ + \
> +					 2 /* data_info4.di_offset */ + \
> +					 2 /* data_info4.di_length */)
> +#define decode_read_plus_maxsz		(op_decode_hdr_maxsz + \
> +					 1 /* rpr_eof */ + \
> +					 1 /* rpr_contents count */ + \
> +					 NFS42_READ_PLUS_SEGMENT_SIZE)
> #define encode_seek_maxsz		(op_encode_hdr_maxsz + \
> 					 encode_stateid_maxsz + \
> 					 2 /* offset */ + \
> @@ -128,6 +137,14 @@
> 					 decode_putfh_maxsz + \
> 					 decode_deallocate_maxsz + \
> 					 decode_getattr_maxsz)
> +#define NFS4_enc_read_plus_sz		(compound_encode_hdr_maxsz + \
> +					 encode_sequence_maxsz + \
> +					 encode_putfh_maxsz + \
> +					 encode_read_plus_maxsz)
> +#define NFS4_dec_read_plus_sz		(compound_decode_hdr_maxsz + \
> +					 decode_sequence_maxsz + \
> +					 decode_putfh_maxsz + \
> +					 decode_read_plus_maxsz)
> #define NFS4_enc_seek_sz		(compound_encode_hdr_maxsz + \
> 					 encode_sequence_maxsz + \
> 					 encode_putfh_maxsz + \
> @@ -252,6 +269,16 @@ static void encode_deallocate(struct xdr_stream *xdr,
> 	encode_fallocate(xdr, args);
> }
> 
> +static void encode_read_plus(struct xdr_stream *xdr,
> +			     const struct nfs_pgio_args *args,
> +			     struct compound_hdr *hdr)
> +{
> +	encode_op_hdr(xdr, OP_READ_PLUS, decode_read_plus_maxsz, hdr);
> +	encode_nfs4_stateid(xdr, &args->stateid);
> +	encode_uint64(xdr, args->offset);
> +	encode_uint32(xdr, args->count);
> +}
> +
> static void encode_seek(struct xdr_stream *xdr,
> 			const struct nfs42_seek_args *args,
> 			struct compound_hdr *hdr)
> @@ -446,6 +473,29 @@ static void nfs4_xdr_enc_deallocate(struct rpc_rqst *req,
> 	encode_nops(&hdr);
> }
> 
> +/*
> + * Encode READ_PLUS request
> + */
> +static void nfs4_xdr_enc_read_plus(struct rpc_rqst *req,
> +				   struct xdr_stream *xdr,
> +				   const void *data)
> +{
> +	const struct nfs_pgio_args *args = data;
> +	struct compound_hdr hdr = {
> +		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
> +	};
> +
> +	encode_compound_hdr(xdr, req, &hdr);
> +	encode_sequence(xdr, &args->seq_args, &hdr);
> +	encode_putfh(xdr, args->fh, &hdr);
> +	encode_read_plus(xdr, args, &hdr);
> +
> +	rpc_prepare_reply_pages(req, args->pages, args->pgbase,
> +				args->count, hdr.replen);
> +	req->rq_rcv_buf.flags |= XDRBUF_READ;

IMO this line is incorrect.

RFC 8267 Section 6.1 does not list any part of the result of READ_PLUS
as DDP-eligible. There's no way for a client to know how to set up
Write chunks, unless it knows exactly where the file's holes are in
advance. Even then... racy.

Just curious, have you tried READ_PLUS with proto=rdma ?


> +	encode_nops(&hdr);
> +}
> +
> /*
>  * Encode SEEK request
>  */
> @@ -694,6 +744,67 @@ static int decode_deallocate(struct xdr_stream *xdr, struct nfs42_falloc_res *re
> 	return decode_op_hdr(xdr, OP_DEALLOCATE);
> }
> 
> +static uint32_t decode_read_plus_data(struct xdr_stream *xdr, struct nfs_pgio_res *res,
> +				      uint32_t *eof)
> +{
> +	__be32 *p;
> +	uint32_t count, recvd;
> +	uint64_t offset;
> +
> +	p = xdr_inline_decode(xdr, 8 + 4);
> +	if (unlikely(!p))
> +		return -EIO;
> +
> +	p = xdr_decode_hyper(p, &offset);
> +	count = be32_to_cpup(p);
> +	if (count == 0)
> +		return 0;
> +
> +	recvd = xdr_read_pages(xdr, count);
> +	if (count > recvd) {
> +		dprintk("NFS: server cheating in read reply: "
> +				"count %u > recvd %u\n", count, recvd);
> +		count = recvd;
> +		*eof = 0;
> +	}
> +
> +	return count;
> +}
> +
> +static int decode_read_plus(struct xdr_stream *xdr, struct nfs_pgio_res *res)
> +{
> +	__be32 *p;
> +	uint32_t count, eof, segments, type;
> +	int status;
> +
> +	status = decode_op_hdr(xdr, OP_READ_PLUS);
> +	if (status)
> +		return status;
> +
> +	p = xdr_inline_decode(xdr, 4 + 4);
> +	if (unlikely(!p))
> +		return -EIO;
> +
> +	eof = be32_to_cpup(p++);
> +	segments = be32_to_cpup(p++);
> +	if (segments == 0)
> +		return 0;
> +
> +	p = xdr_inline_decode(xdr, 4);
> +	if (unlikely(!p))
> +		return -EIO;
> +
> +	type = be32_to_cpup(p++);
> +	if (type == NFS4_CONTENT_DATA)
> +		count = decode_read_plus_data(xdr, res, &eof);
> +	else
> +		return -EINVAL;
> +
> +	res->eof = eof;
> +	res->count = count;
> +	return 0;
> +}
> +
> static int decode_seek(struct xdr_stream *xdr, struct nfs42_seek_res *res)
> {
> 	int status;
> @@ -870,6 +981,33 @@ static int nfs4_xdr_dec_deallocate(struct rpc_rqst *rqstp,
> 	return status;
> }
> 
> +/*
> + * Decode READ_PLUS request
> + */
> +static int nfs4_xdr_dec_read_plus(struct rpc_rqst *rqstp,
> +				  struct xdr_stream *xdr,
> +				  void *data)
> +{
> +	struct nfs_pgio_res *res = data;
> +	struct compound_hdr hdr;
> +	int status;
> +
> +	status = decode_compound_hdr(xdr, &hdr);
> +	if (status)
> +		goto out;
> +	status = decode_sequence(xdr, &res->seq_res, rqstp);
> +	if (status)
> +		goto out;
> +	status = decode_putfh(xdr);
> +	if (status)
> +		goto out;
> +	status = decode_read_plus(xdr, res);
> +	if (!status)
> +		status = res->count;
> +out:
> +	return status;
> +}
> +
> /*
>  * Decode SEEK request
>  */
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 95d07a3dc5d1..ed3ec8c36273 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -69,6 +69,10 @@
> 
> #include "nfs4trace.h"
> 
> +#ifdef CONFIG_NFS_V4_2
> +#include "nfs42.h"
> +#endif /* CONFIG_NFS_V4_2 */
> +
> #define NFSDBG_FACILITY		NFSDBG_PROC
> 
> #define NFS4_BITMASK_SZ		3
> @@ -5199,28 +5203,60 @@ static bool nfs4_read_stateid_changed(struct rpc_task *task,
> 	return true;
> }
> 
> +static bool nfs4_read_plus_not_supported(struct rpc_task *task,
> +					 struct nfs_pgio_header *hdr)
> +{
> +	struct nfs_server *server = NFS_SERVER(hdr->inode);
> +	struct rpc_message *msg = &task->tk_msg;
> +
> +	if (msg->rpc_proc == &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS] &&
> +	    server->caps & NFS_CAP_READ_PLUS && task->tk_status == -ENOTSUPP) {
> +		server->caps &= ~NFS_CAP_READ_PLUS;
> +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> +		rpc_restart_call_prepare(task);
> +		return true;
> +	}
> +	return false;
> +}
> +
> static int nfs4_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
> {
> -
> 	dprintk("--> %s\n", __func__);
> 
> 	if (!nfs4_sequence_done(task, &hdr->res.seq_res))
> 		return -EAGAIN;
> 	if (nfs4_read_stateid_changed(task, &hdr->args))
> 		return -EAGAIN;
> +	if (nfs4_read_plus_not_supported(task, hdr))
> +		return -EAGAIN;
> 	if (task->tk_status > 0)
> 		nfs_invalidate_atime(hdr->inode);
> 	return hdr->pgio_done_cb ? hdr->pgio_done_cb(task, hdr) :
> 				    nfs4_read_done_cb(task, hdr);
> }
> 
> +#ifdef CONFIG_NFS_V4_2
> +static void nfs42_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
> +{
> +	if (server->caps & NFS_CAP_READ_PLUS)
> +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS];
> +	else
> +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> +}
> +#else
> +static void nfs42_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
> +{
> +	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> +}
> +#endif /* CONFIG_NFS_V4_2 */
> +
> static void nfs4_proc_read_setup(struct nfs_pgio_header *hdr,
> 				 struct rpc_message *msg)
> {
> 	hdr->timestamp   = jiffies;
> 	if (!hdr->pgio_done_cb)
> 		hdr->pgio_done_cb = nfs4_read_done_cb;
> -	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> +	nfs42_read_plus_support(NFS_SERVER(hdr->inode), msg);
> 	nfs4_init_sequence(&hdr->args.seq_args, &hdr->res.seq_res, 0, 0);
> }
> 
> @@ -9970,7 +10006,8 @@ static const struct nfs4_minor_version_ops nfs_v4_2_minor_ops = {
> 		| NFS_CAP_SEEK
> 		| NFS_CAP_LAYOUTSTATS
> 		| NFS_CAP_CLONE
> -		| NFS_CAP_LAYOUTERROR,
> +		| NFS_CAP_LAYOUTERROR
> +		| NFS_CAP_READ_PLUS,
> 	.init_client = nfs41_init_client,
> 	.shutdown_client = nfs41_shutdown_client,
> 	.match_stateid = nfs41_match_stateid,
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index 47817ef0aadb..68b2917d0537 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -7584,6 +7584,7 @@ const struct rpc_procinfo nfs4_procedures[] = {
> 	PROC42(COPY_NOTIFY,	enc_copy_notify,	dec_copy_notify),
> 	PROC(LOOKUPP,		enc_lookupp,		dec_lookupp),
> 	PROC42(LAYOUTERROR,	enc_layouterror,	dec_layouterror),
> +	PROC42(READ_PLUS,	enc_read_plus,		dec_read_plus),
> };
> 
> static unsigned int nfs_version4_counts[ARRAY_SIZE(nfs4_procedures)];
> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> index 82d8fb422092..c1eeef52545c 100644
> --- a/include/linux/nfs4.h
> +++ b/include/linux/nfs4.h
> @@ -540,8 +540,8 @@ enum {
> 
> 	NFSPROC4_CLNT_LOOKUPP,
> 	NFSPROC4_CLNT_LAYOUTERROR,
> -
> 	NFSPROC4_CLNT_COPY_NOTIFY,
> +	NFSPROC4_CLNT_READ_PLUS,
> };
> 
> /* nfs41 types */
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 465fa98258a3..11248c5a7b24 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -281,5 +281,6 @@ struct nfs_server {
> #define NFS_CAP_OFFLOAD_CANCEL	(1U << 25)
> #define NFS_CAP_LAYOUTERROR	(1U << 26)
> #define NFS_CAP_COPY_NOTIFY	(1U << 27)
> +#define NFS_CAP_READ_PLUS	(1U << 28)
> 
> #endif
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 94c77ed55ce1..8efbf3d8b263 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -655,7 +655,7 @@ struct nfs_pgio_args {
> struct nfs_pgio_res {
> 	struct nfs4_sequence_res	seq_res;
> 	struct nfs_fattr *	fattr;
> -	__u32			count;
> +	__u64			count;
> 	__u32			op_status;
> 	union {
> 		struct {
> -- 
> 2.25.0
> 

--
Chuck Lever





* Re: [PATCH v2 4/6] NFS: Add READ_PLUS data segment support
  2020-02-14 22:28   ` Chuck Lever
@ 2020-02-20 14:42     ` Anna Schumaker
  2020-02-20 14:55       ` Chuck Lever
  0 siblings, 1 reply; 17+ messages in thread
From: Anna Schumaker @ 2020-02-20 14:42 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond.Myklebust, Linux NFS Mailing List

On Fri, 2020-02-14 at 17:28 -0500, Chuck Lever wrote:
> > On Feb 14, 2020, at 4:12 PM, schumaker.anna@gmail.com wrote:
> > 
> > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > 
> > This patch adds client support for decoding a single NFS4_CONTENT_DATA
> > segment returned by the server. This is the simplest implementation
> > possible, since it does not account for any hole segments in the reply.
> > 
> > Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > ---
> > fs/nfs/nfs42xdr.c         | 138 ++++++++++++++++++++++++++++++++++++++
> > fs/nfs/nfs4proc.c         |  43 +++++++++++-
> > fs/nfs/nfs4xdr.c          |   1 +
> > include/linux/nfs4.h      |   2 +-
> > include/linux/nfs_fs_sb.h |   1 +
> > include/linux/nfs_xdr.h   |   2 +-
> > 6 files changed, 182 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
> > index c03f3246d6c5..bf118ecabe2c 100644
> > --- a/fs/nfs/nfs42xdr.c
> > +++ b/fs/nfs/nfs42xdr.c
> > @@ -45,6 +45,15 @@
> > #define encode_deallocate_maxsz		(op_encode_hdr_maxsz + \
> > 					 encode_fallocate_maxsz)
> > #define decode_deallocate_maxsz		(op_decode_hdr_maxsz)
> > +#define encode_read_plus_maxsz		(op_encode_hdr_maxsz + \
> > +					 encode_stateid_maxsz + 3)
> > +#define NFS42_READ_PLUS_SEGMENT_SIZE	(1 /* data_content4 */ + \
> > +					 2 /* data_info4.di_offset */ + \
> > +					 2 /* data_info4.di_length */)
> > +#define decode_read_plus_maxsz		(op_decode_hdr_maxsz + \
> > +					 1 /* rpr_eof */ + \
> > +					 1 /* rpr_contents count */ + \
> > +					 NFS42_READ_PLUS_SEGMENT_SIZE)
> > #define encode_seek_maxsz		(op_encode_hdr_maxsz + \
> > 					 encode_stateid_maxsz + \
> > 					 2 /* offset */ + \
> > @@ -128,6 +137,14 @@
> > 					 decode_putfh_maxsz + \
> > 					 decode_deallocate_maxsz + \
> > 					 decode_getattr_maxsz)
> > +#define NFS4_enc_read_plus_sz		(compound_encode_hdr_maxsz + \
> > +					 encode_sequence_maxsz + \
> > +					 encode_putfh_maxsz + \
> > +					 encode_read_plus_maxsz)
> > +#define NFS4_dec_read_plus_sz		(compound_decode_hdr_maxsz + \
> > +					 decode_sequence_maxsz + \
> > +					 decode_putfh_maxsz + \
> > +					 decode_read_plus_maxsz)
> > #define NFS4_enc_seek_sz		(compound_encode_hdr_maxsz + \
> > 					 encode_sequence_maxsz + \
> > 					 encode_putfh_maxsz + \
> > @@ -252,6 +269,16 @@ static void encode_deallocate(struct xdr_stream *xdr,
> > 	encode_fallocate(xdr, args);
> > }
> > 
> > +static void encode_read_plus(struct xdr_stream *xdr,
> > +			     const struct nfs_pgio_args *args,
> > +			     struct compound_hdr *hdr)
> > +{
> > +	encode_op_hdr(xdr, OP_READ_PLUS, decode_read_plus_maxsz, hdr);
> > +	encode_nfs4_stateid(xdr, &args->stateid);
> > +	encode_uint64(xdr, args->offset);
> > +	encode_uint32(xdr, args->count);
> > +}
> > +
> > static void encode_seek(struct xdr_stream *xdr,
> > 			const struct nfs42_seek_args *args,
> > 			struct compound_hdr *hdr)
> > @@ -446,6 +473,29 @@ static void nfs4_xdr_enc_deallocate(struct rpc_rqst
> > *req,
> > 	encode_nops(&hdr);
> > }
> > 
> > +/*
> > + * Encode READ_PLUS request
> > + */
> > +static void nfs4_xdr_enc_read_plus(struct rpc_rqst *req,
> > +				   struct xdr_stream *xdr,
> > +				   const void *data)
> > +{
> > +	const struct nfs_pgio_args *args = data;
> > +	struct compound_hdr hdr = {
> > +		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
> > +	};
> > +
> > +	encode_compound_hdr(xdr, req, &hdr);
> > +	encode_sequence(xdr, &args->seq_args, &hdr);
> > +	encode_putfh(xdr, args->fh, &hdr);
> > +	encode_read_plus(xdr, args, &hdr);
> > +
> > +	rpc_prepare_reply_pages(req, args->pages, args->pgbase,
> > +				args->count, hdr.replen);
> > +	req->rq_rcv_buf.flags |= XDRBUF_READ;
> 
> IMO this line is incorrect.

You're right, this line causes problems for RDMA with READ_PLUS. I added it to
match how the other xdr read encoders are set up.
> 
> RFC 8267 Section 6.1 does not list any part of the result of READ_PLUS
> as DDP-eligible. There's no way for a client to know how to set up
> Write chunks, unless it knows exactly where the file's holes are in
> advance. Even then... racy.
> 
> Just curious, have you tried READ_PLUS with proto=rdma ?

I haven't done in-depth performance testing, but I have been able to run it.

Anna

> 
> 
> > +	encode_nops(&hdr);
> > +}
> > +
> > /*
> >  * Encode SEEK request
> >  */
> > @@ -694,6 +744,67 @@ static int decode_deallocate(struct xdr_stream *xdr,
> > struct nfs42_falloc_res *re
> > 	return decode_op_hdr(xdr, OP_DEALLOCATE);
> > }
> > 
> > +static uint32_t decode_read_plus_data(struct xdr_stream *xdr, struct
> > nfs_pgio_res *res,
> > +				      uint32_t *eof)
> > +{
> > +	__be32 *p;
> > +	uint32_t count, recvd;
> > +	uint64_t offset;
> > +
> > +	p = xdr_inline_decode(xdr, 8 + 4);
> > +	if (unlikely(!p))
> > +		return -EIO;
> > +
> > +	p = xdr_decode_hyper(p, &offset);
> > +	count = be32_to_cpup(p);
> > +	if (count == 0)
> > +		return 0;
> > +
> > +	recvd = xdr_read_pages(xdr, count);
> > +	if (count > recvd) {
> > +		dprintk("NFS: server cheating in read reply: "
> > +				"count %u > recvd %u\n", count, recvd);
> > +		count = recvd;
> > +		*eof = 0;
> > +	}
> > +
> > +	return count;
> > +}
> > +
> > +static int decode_read_plus(struct xdr_stream *xdr, struct nfs_pgio_res
> > *res)
> > +{
> > +	__be32 *p;
> > +	uint32_t count, eof, segments, type;
> > +	int status;
> > +
> > +	status = decode_op_hdr(xdr, OP_READ_PLUS);
> > +	if (status)
> > +		return status;
> > +
> > +	p = xdr_inline_decode(xdr, 4 + 4);
> > +	if (unlikely(!p))
> > +		return -EIO;
> > +
> > +	eof = be32_to_cpup(p++);
> > +	segments = be32_to_cpup(p++);
> > +	if (segments == 0)
> > +		return 0;
> > +
> > +	p = xdr_inline_decode(xdr, 4);
> > +	if (unlikely(!p))
> > +		return -EIO;
> > +
> > +	type = be32_to_cpup(p++);
> > +	if (type == NFS4_CONTENT_DATA)
> > +		count = decode_read_plus_data(xdr, res, &eof);
> > +	else
> > +		return -EINVAL;
> > +
> > +	res->eof = eof;
> > +	res->count = count;
> > +	return 0;
> > +}
> > +
> > static int decode_seek(struct xdr_stream *xdr, struct nfs42_seek_res *res)
> > {
> > 	int status;
> > @@ -870,6 +981,33 @@ static int nfs4_xdr_dec_deallocate(struct rpc_rqst
> > *rqstp,
> > 	return status;
> > }
> > 
> > +/*
> > + * Decode READ_PLUS request
> > + */
> > +static int nfs4_xdr_dec_read_plus(struct rpc_rqst *rqstp,
> > +				  struct xdr_stream *xdr,
> > +				  void *data)
> > +{
> > +	struct nfs_pgio_res *res = data;
> > +	struct compound_hdr hdr;
> > +	int status;
> > +
> > +	status = decode_compound_hdr(xdr, &hdr);
> > +	if (status)
> > +		goto out;
> > +	status = decode_sequence(xdr, &res->seq_res, rqstp);
> > +	if (status)
> > +		goto out;
> > +	status = decode_putfh(xdr);
> > +	if (status)
> > +		goto out;
> > +	status = decode_read_plus(xdr, res);
> > +	if (!status)
> > +		status = res->count;
> > +out:
> > +	return status;
> > +}
> > +
> > /*
> >  * Decode SEEK request
> >  */
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index 95d07a3dc5d1..ed3ec8c36273 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -69,6 +69,10 @@
> > 
> > #include "nfs4trace.h"
> > 
> > +#ifdef CONFIG_NFS_V4_2
> > +#include "nfs42.h"
> > +#endif /* CONFIG_NFS_V4_2 */
> > +
> > #define NFSDBG_FACILITY		NFSDBG_PROC
> > 
> > #define NFS4_BITMASK_SZ		3
> > @@ -5199,28 +5203,60 @@ static bool nfs4_read_stateid_changed(struct
> > rpc_task *task,
> > 	return true;
> > }
> > 
> > +static bool nfs4_read_plus_not_supported(struct rpc_task *task,
> > +					 struct nfs_pgio_header *hdr)
> > +{
> > +	struct nfs_server *server = NFS_SERVER(hdr->inode);
> > +	struct rpc_message *msg = &task->tk_msg;
> > +
> > +	if (msg->rpc_proc == &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS] &&
> > +	    server->caps & NFS_CAP_READ_PLUS && task->tk_status == -ENOTSUPP) {
> > +		server->caps &= ~NFS_CAP_READ_PLUS;
> > +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> > +		rpc_restart_call_prepare(task);
> > +		return true;
> > +	}
> > +	return false;
> > +}
> > +
> > static int nfs4_read_done(struct rpc_task *task, struct nfs_pgio_header
> > *hdr)
> > {
> > -
> > 	dprintk("--> %s\n", __func__);
> > 
> > 	if (!nfs4_sequence_done(task, &hdr->res.seq_res))
> > 		return -EAGAIN;
> > 	if (nfs4_read_stateid_changed(task, &hdr->args))
> > 		return -EAGAIN;
> > +	if (nfs4_read_plus_not_supported(task, hdr))
> > +		return -EAGAIN;
> > 	if (task->tk_status > 0)
> > 		nfs_invalidate_atime(hdr->inode);
> > 	return hdr->pgio_done_cb ? hdr->pgio_done_cb(task, hdr) :
> > 				    nfs4_read_done_cb(task, hdr);
> > }
> > 
> > +#ifdef CONFIG_NFS_V4_2
> > +static void nfs42_read_plus_support(struct nfs_server *server, struct
> > rpc_message *msg)
> > +{
> > +	if (server->caps & NFS_CAP_READ_PLUS)
> > +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS];
> > +	else
> > +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> > +}
> > +#else
> > +static void nfs42_read_plus_support(struct nfs_server *server, struct
> > rpc_message *msg)
> > +{
> > +	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> > +}
> > +#endif /* CONFIG_NFS_V4_2 */
> > +
> > static void nfs4_proc_read_setup(struct nfs_pgio_header *hdr,
> > 				 struct rpc_message *msg)
> > {
> > 	hdr->timestamp   = jiffies;
> > 	if (!hdr->pgio_done_cb)
> > 		hdr->pgio_done_cb = nfs4_read_done_cb;
> > -	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> > +	nfs42_read_plus_support(NFS_SERVER(hdr->inode), msg);
> > 	nfs4_init_sequence(&hdr->args.seq_args, &hdr->res.seq_res, 0, 0);
> > }
> > 
> > @@ -9970,7 +10006,8 @@ static const struct nfs4_minor_version_ops
> > nfs_v4_2_minor_ops = {
> > 		| NFS_CAP_SEEK
> > 		| NFS_CAP_LAYOUTSTATS
> > 		| NFS_CAP_CLONE
> > -		| NFS_CAP_LAYOUTERROR,
> > +		| NFS_CAP_LAYOUTERROR
> > +		| NFS_CAP_READ_PLUS,
> > 	.init_client = nfs41_init_client,
> > 	.shutdown_client = nfs41_shutdown_client,
> > 	.match_stateid = nfs41_match_stateid,
> > diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> > index 47817ef0aadb..68b2917d0537 100644
> > --- a/fs/nfs/nfs4xdr.c
> > +++ b/fs/nfs/nfs4xdr.c
> > @@ -7584,6 +7584,7 @@ const struct rpc_procinfo nfs4_procedures[] = {
> > 	PROC42(COPY_NOTIFY,	enc_copy_notify,	dec_copy_notify),
> > 	PROC(LOOKUPP,		enc_lookupp,		dec_lookupp),
> > 	PROC42(LAYOUTERROR,	enc_layouterror,	dec_layouterror),
> > +	PROC42(READ_PLUS,	enc_read_plus,		dec_read_plus),
> > };
> > 
> > static unsigned int nfs_version4_counts[ARRAY_SIZE(nfs4_procedures)];
> > diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> > index 82d8fb422092..c1eeef52545c 100644
> > --- a/include/linux/nfs4.h
> > +++ b/include/linux/nfs4.h
> > @@ -540,8 +540,8 @@ enum {
> > 
> > 	NFSPROC4_CLNT_LOOKUPP,
> > 	NFSPROC4_CLNT_LAYOUTERROR,
> > -
> > 	NFSPROC4_CLNT_COPY_NOTIFY,
> > +	NFSPROC4_CLNT_READ_PLUS,
> > };
> > 
> > /* nfs41 types */
> > diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> > index 465fa98258a3..11248c5a7b24 100644
> > --- a/include/linux/nfs_fs_sb.h
> > +++ b/include/linux/nfs_fs_sb.h
> > @@ -281,5 +281,6 @@ struct nfs_server {
> > #define NFS_CAP_OFFLOAD_CANCEL	(1U << 25)
> > #define NFS_CAP_LAYOUTERROR	(1U << 26)
> > #define NFS_CAP_COPY_NOTIFY	(1U << 27)
> > +#define NFS_CAP_READ_PLUS	(1U << 28)
> > 
> > #endif
> > diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> > index 94c77ed55ce1..8efbf3d8b263 100644
> > --- a/include/linux/nfs_xdr.h
> > +++ b/include/linux/nfs_xdr.h
> > @@ -655,7 +655,7 @@ struct nfs_pgio_args {
> > struct nfs_pgio_res {
> > 	struct nfs4_sequence_res	seq_res;
> > 	struct nfs_fattr *	fattr;
> > -	__u32			count;
> > +	__u64			count;
> > 	__u32			op_status;
> > 	union {
> > 		struct {
> > -- 
> > 2.25.0
> > 
> 
> --
> Chuck Lever
> 
> 
> 



* Re: [PATCH v2 4/6] NFS: Add READ_PLUS data segment support
  2020-02-20 14:42     ` Anna Schumaker
@ 2020-02-20 14:55       ` Chuck Lever
  2020-02-20 18:28         ` Anna Schumaker
  0 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2020-02-20 14:55 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Trond.Myklebust, Linux NFS Mailing List



> On Feb 20, 2020, at 9:42 AM, Anna Schumaker <schumaker.anna@gmail.com> wrote:
> 
> On Fri, 2020-02-14 at 17:28 -0500, Chuck Lever wrote:
>>> On Feb 14, 2020, at 4:12 PM, schumaker.anna@gmail.com wrote:
>>> 
>>> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
>>> 
>>> This patch adds client support for decoding a single NFS4_CONTENT_DATA
>>> segment returned by the server. This is the simplest implementation
>>> possible, since it does not account for any hole segments in the reply.
>>> 
>>> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
>>> ---
>>> fs/nfs/nfs42xdr.c         | 138 ++++++++++++++++++++++++++++++++++++++
>>> fs/nfs/nfs4proc.c         |  43 +++++++++++-
>>> fs/nfs/nfs4xdr.c          |   1 +
>>> include/linux/nfs4.h      |   2 +-
>>> include/linux/nfs_fs_sb.h |   1 +
>>> include/linux/nfs_xdr.h   |   2 +-
>>> 6 files changed, 182 insertions(+), 5 deletions(-)
>>> 
>>> diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
>>> index c03f3246d6c5..bf118ecabe2c 100644
>>> --- a/fs/nfs/nfs42xdr.c
>>> +++ b/fs/nfs/nfs42xdr.c
>>> @@ -45,6 +45,15 @@
>>> #define encode_deallocate_maxsz		(op_encode_hdr_maxsz + \
>>> 					encode_fallocate_maxsz)
>>> #define decode_deallocate_maxsz		(op_decode_hdr_maxsz)
>>> +#define encode_read_plus_maxsz		(op_encode_hdr_maxsz + \
>>> +					encode_stateid_maxsz + 3)
>>> +#define NFS42_READ_PLUS_SEGMENT_SIZE	(1 /* data_content4 */ + \
>>> +					 2 /* data_info4.di_offset */ + \
>>> +					 2 /* data_info4.di_length */)
>>> +#define decode_read_plus_maxsz		(op_decode_hdr_maxsz + \
>>> +					 1 /* rpr_eof */ + \
>>> +					 1 /* rpr_contents count */ + \
>>> +					NFS42_READ_PLUS_SEGMENT_SIZE)
>>> #define encode_seek_maxsz		(op_encode_hdr_maxsz + \
>>> 					encode_stateid_maxsz + \
>>> 					 2 /* offset */ + \
>>> @@ -128,6 +137,14 @@
>>> 					decode_putfh_maxsz + \
>>> 					decode_deallocate_maxsz + \
>>> 					decode_getattr_maxsz)
>>> +#define NFS4_enc_read_plus_sz		(compound_encode_hdr_maxsz + \
>>> +					encode_sequence_maxsz + \
>>> +					encode_putfh_maxsz + \
>>> +					encode_read_plus_maxsz)
>>> +#define NFS4_dec_read_plus_sz		(compound_decode_hdr_maxsz + \
>>> +					decode_sequence_maxsz + \
>>> +					decode_putfh_maxsz + \
>>> +					decode_read_plus_maxsz)
>>> #define NFS4_enc_seek_sz		(compound_encode_hdr_maxsz + \
>>> 					encode_sequence_maxsz + \
>>> 					encode_putfh_maxsz + \
>>> @@ -252,6 +269,16 @@ static void encode_deallocate(struct xdr_stream *xdr,
>>> 	encode_fallocate(xdr, args);
>>> }
>>> 
>>> +static void encode_read_plus(struct xdr_stream *xdr,
>>> +			     const struct nfs_pgio_args *args,
>>> +			     struct compound_hdr *hdr)
>>> +{
>>> +	encode_op_hdr(xdr, OP_READ_PLUS, decode_read_plus_maxsz, hdr);
>>> +	encode_nfs4_stateid(xdr, &args->stateid);
>>> +	encode_uint64(xdr, args->offset);
>>> +	encode_uint32(xdr, args->count);
>>> +}
>>> +
>>> static void encode_seek(struct xdr_stream *xdr,
>>> 			const struct nfs42_seek_args *args,
>>> 			struct compound_hdr *hdr)
>>> @@ -446,6 +473,29 @@ static void nfs4_xdr_enc_deallocate(struct rpc_rqst
>>> *req,
>>> 	encode_nops(&hdr);
>>> }
>>> 
>>> +/*
>>> + * Encode READ_PLUS request
>>> + */
>>> +static void nfs4_xdr_enc_read_plus(struct rpc_rqst *req,
>>> +				   struct xdr_stream *xdr,
>>> +				   const void *data)
>>> +{
>>> +	const struct nfs_pgio_args *args = data;
>>> +	struct compound_hdr hdr = {
>>> +		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
>>> +	};
>>> +
>>> +	encode_compound_hdr(xdr, req, &hdr);
>>> +	encode_sequence(xdr, &args->seq_args, &hdr);
>>> +	encode_putfh(xdr, args->fh, &hdr);
>>> +	encode_read_plus(xdr, args, &hdr);
>>> +
>>> +	rpc_prepare_reply_pages(req, args->pages, args->pgbase,
>>> +				args->count, hdr.replen);
>>> +	req->rq_rcv_buf.flags |= XDRBUF_READ;
>> 
>> IMO this line is incorrect.
> 
> You're right, this line causes problems for RDMA with READ_PLUS. I added it to
> match how the other xdr read encoders were set up

Ja, I think just removing that line should be sufficient.
Better would be replacing it with a comment explaining
why this encoder does not set XDRBUF_READ. :-)
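A sketch of what that replacement could look like (illustrative only; the comment text is mine, not from the series, and paraphrases the RFC 8267 point made below):

```diff
 	rpc_prepare_reply_pages(req, args->pages, args->pgbase,
 				args->count, hdr.replen);
-	req->rq_rcv_buf.flags |= XDRBUF_READ;
+	/*
+	 * Do not set XDRBUF_READ: RFC 8267 does not list any part of
+	 * the READ_PLUS result as DDP-eligible, so the reply must be
+	 * received inline rather than via direct data placement.
+	 */
 	encode_nops(&hdr);
```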


>> RFC 8267 Section 6.1 does not list any part of the result of READ_PLUS
>> as DDP-eligible. There's no way for a client to know how to set up
>> Write chunks, unless it knows exactly where the file's holes are in
>> advance. Even then... racy.
>> 
>> Just curious, have you tried READ_PLUS with proto=rdma ?
> 
> I haven't done in-depth performance testing, but I have been able to run it.

We should figure out whether that will have a regressive
impact on NFS/RDMA workloads. I expect that it will, but
the client can always set up the Reply chunk so that the
READ payload fits precisely in an RDMA segment that lines
up with page cache pages. That mitigates some impact.

If your patch set already changes NFSv4.2 mounts to always
use READ_PLUS in place of READ, it might be prudent for the
"proto=rdma" mount option to also set "noreadplus", at least
for the time being.

The down-side here is that would make NFSv4.2 on RDMA
unable to recognize holes in files the same way as it
does on TCP, and that's a pretty significant variation
in behavior. Does "noreadplus" even deal with that?


> Anna
> 
>> 
>> 
>>> +	encode_nops(&hdr);
>>> +}
>>> +
>>> /*
>>> * Encode SEEK request
>>> */
>>> @@ -694,6 +744,67 @@ static int decode_deallocate(struct xdr_stream *xdr, struct nfs42_falloc_res *re
>>> 	return decode_op_hdr(xdr, OP_DEALLOCATE);
>>> }
>>> 
>>> +static uint32_t decode_read_plus_data(struct xdr_stream *xdr, struct nfs_pgio_res *res,
>>> +				      uint32_t *eof)
>>> +{
>>> +	__be32 *p;
>>> +	uint32_t count, recvd;
>>> +	uint64_t offset;
>>> +
>>> +	p = xdr_inline_decode(xdr, 8 + 4);
>>> +	if (unlikely(!p))
>>> +		return -EIO;
>>> +
>>> +	p = xdr_decode_hyper(p, &offset);
>>> +	count = be32_to_cpup(p);
>>> +	if (count == 0)
>>> +		return 0;
>>> +
>>> +	recvd = xdr_read_pages(xdr, count);
>>> +	if (count > recvd) {
>>> +		dprintk("NFS: server cheating in read reply: "
>>> +				"count %u > recvd %u\n", count, recvd);
>>> +		count = recvd;
>>> +		*eof = 0;
>>> +	}
>>> +
>>> +	return count;
>>> +}
>>> +
>>> +static int decode_read_plus(struct xdr_stream *xdr, struct nfs_pgio_res *res)
>>> +{
>>> +	__be32 *p;
>>> +	uint32_t count, eof, segments, type;
>>> +	int status;
>>> +
>>> +	status = decode_op_hdr(xdr, OP_READ_PLUS);
>>> +	if (status)
>>> +		return status;
>>> +
>>> +	p = xdr_inline_decode(xdr, 4 + 4);
>>> +	if (unlikely(!p))
>>> +		return -EIO;
>>> +
>>> +	eof = be32_to_cpup(p++);
>>> +	segments = be32_to_cpup(p++);
>>> +	if (segments == 0)
>>> +		return 0;
>>> +
>>> +	p = xdr_inline_decode(xdr, 4);
>>> +	if (unlikely(!p))
>>> +		return -EIO;
>>> +
>>> +	type = be32_to_cpup(p++);
>>> +	if (type == NFS4_CONTENT_DATA)
>>> +		count = decode_read_plus_data(xdr, res, &eof);
>>> +	else
>>> +		return -EINVAL;
>>> +
>>> +	res->eof = eof;
>>> +	res->count = count;
>>> +	return 0;
>>> +}
>>> +
>>> static int decode_seek(struct xdr_stream *xdr, struct nfs42_seek_res *res)
>>> {
>>> 	int status;
>>> @@ -870,6 +981,33 @@ static int nfs4_xdr_dec_deallocate(struct rpc_rqst *rqstp,
>>> 	return status;
>>> }
>>> 
>>> +/*
>>> + * Decode READ_PLUS request
>>> + */
>>> +static int nfs4_xdr_dec_read_plus(struct rpc_rqst *rqstp,
>>> +				  struct xdr_stream *xdr,
>>> +				  void *data)
>>> +{
>>> +	struct nfs_pgio_res *res = data;
>>> +	struct compound_hdr hdr;
>>> +	int status;
>>> +
>>> +	status = decode_compound_hdr(xdr, &hdr);
>>> +	if (status)
>>> +		goto out;
>>> +	status = decode_sequence(xdr, &res->seq_res, rqstp);
>>> +	if (status)
>>> +		goto out;
>>> +	status = decode_putfh(xdr);
>>> +	if (status)
>>> +		goto out;
>>> +	status = decode_read_plus(xdr, res);
>>> +	if (!status)
>>> +		status = res->count;
>>> +out:
>>> +	return status;
>>> +}
>>> +
>>> /*
>>> * Decode SEEK request
>>> */
>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>> index 95d07a3dc5d1..ed3ec8c36273 100644
>>> --- a/fs/nfs/nfs4proc.c
>>> +++ b/fs/nfs/nfs4proc.c
>>> @@ -69,6 +69,10 @@
>>> 
>>> #include "nfs4trace.h"
>>> 
>>> +#ifdef CONFIG_NFS_V4_2
>>> +#include "nfs42.h"
>>> +#endif /* CONFIG_NFS_V4_2 */
>>> +
>>> #define NFSDBG_FACILITY		NFSDBG_PROC
>>> 
>>> #define NFS4_BITMASK_SZ		3
>>> @@ -5199,28 +5203,60 @@ static bool nfs4_read_stateid_changed(struct rpc_task *task,
>>> 	return true;
>>> }
>>> 
>>> +static bool nfs4_read_plus_not_supported(struct rpc_task *task,
>>> +					struct nfs_pgio_header *hdr)
>>> +{
>>> +	struct nfs_server *server = NFS_SERVER(hdr->inode);
>>> +	struct rpc_message *msg = &task->tk_msg;
>>> +
>>> +	if (msg->rpc_proc == &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS] &&
>>> +	    server->caps & NFS_CAP_READ_PLUS && task->tk_status == -ENOTSUPP) {
>>> +		server->caps &= ~NFS_CAP_READ_PLUS;
>>> +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
>>> +		rpc_restart_call_prepare(task);
>>> +		return true;
>>> +	}
>>> +	return false;
>>> +}
>>> +
>>> static int nfs4_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
>>> {
>>> -
>>> 	dprintk("--> %s\n", __func__);
>>> 
>>> 	if (!nfs4_sequence_done(task, &hdr->res.seq_res))
>>> 		return -EAGAIN;
>>> 	if (nfs4_read_stateid_changed(task, &hdr->args))
>>> 		return -EAGAIN;
>>> +	if (nfs4_read_plus_not_supported(task, hdr))
>>> +		return -EAGAIN;
>>> 	if (task->tk_status > 0)
>>> 		nfs_invalidate_atime(hdr->inode);
>>> 	return hdr->pgio_done_cb ? hdr->pgio_done_cb(task, hdr) :
>>> 				    nfs4_read_done_cb(task, hdr);
>>> }
>>> 
>>> +#ifdef CONFIG_NFS_V4_2
>>> +static void nfs42_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
>>> +{
>>> +	if (server->caps & NFS_CAP_READ_PLUS)
>>> +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS];
>>> +	else
>>> +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
>>> +}
>>> +#else
>>> +static void nfs42_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
>>> +{
>>> +	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
>>> +}
>>> +#endif /* CONFIG_NFS_V4_2 */
>>> +
>>> static void nfs4_proc_read_setup(struct nfs_pgio_header *hdr,
>>> 				 struct rpc_message *msg)
>>> {
>>> 	hdr->timestamp   = jiffies;
>>> 	if (!hdr->pgio_done_cb)
>>> 		hdr->pgio_done_cb = nfs4_read_done_cb;
>>> -	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
>>> +	nfs42_read_plus_support(NFS_SERVER(hdr->inode), msg);
>>> 	nfs4_init_sequence(&hdr->args.seq_args, &hdr->res.seq_res, 0, 0);
>>> }
>>> 
>>> @@ -9970,7 +10006,8 @@ static const struct nfs4_minor_version_ops nfs_v4_2_minor_ops = {
>>> 		| NFS_CAP_SEEK
>>> 		| NFS_CAP_LAYOUTSTATS
>>> 		| NFS_CAP_CLONE
>>> -		| NFS_CAP_LAYOUTERROR,
>>> +		| NFS_CAP_LAYOUTERROR
>>> +		| NFS_CAP_READ_PLUS,
>>> 	.init_client = nfs41_init_client,
>>> 	.shutdown_client = nfs41_shutdown_client,
>>> 	.match_stateid = nfs41_match_stateid,
>>> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
>>> index 47817ef0aadb..68b2917d0537 100644
>>> --- a/fs/nfs/nfs4xdr.c
>>> +++ b/fs/nfs/nfs4xdr.c
>>> @@ -7584,6 +7584,7 @@ const struct rpc_procinfo nfs4_procedures[] = {
>>> 	PROC42(COPY_NOTIFY,	enc_copy_notify,	dec_copy_notify),
>>> 	PROC(LOOKUPP,		enc_lookupp,		dec_lookupp),
>>> 	PROC42(LAYOUTERROR,	enc_layouterror,	dec_layouterror),
>>> +	PROC42(READ_PLUS,	enc_read_plus,		dec_read_plus),
>>> };
>>> 
>>> static unsigned int nfs_version4_counts[ARRAY_SIZE(nfs4_procedures)];
>>> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
>>> index 82d8fb422092..c1eeef52545c 100644
>>> --- a/include/linux/nfs4.h
>>> +++ b/include/linux/nfs4.h
>>> @@ -540,8 +540,8 @@ enum {
>>> 
>>> 	NFSPROC4_CLNT_LOOKUPP,
>>> 	NFSPROC4_CLNT_LAYOUTERROR,
>>> -
>>> 	NFSPROC4_CLNT_COPY_NOTIFY,
>>> +	NFSPROC4_CLNT_READ_PLUS,
>>> };
>>> 
>>> /* nfs41 types */
>>> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
>>> index 465fa98258a3..11248c5a7b24 100644
>>> --- a/include/linux/nfs_fs_sb.h
>>> +++ b/include/linux/nfs_fs_sb.h
>>> @@ -281,5 +281,6 @@ struct nfs_server {
>>> #define NFS_CAP_OFFLOAD_CANCEL	(1U << 25)
>>> #define NFS_CAP_LAYOUTERROR	(1U << 26)
>>> #define NFS_CAP_COPY_NOTIFY	(1U << 27)
>>> +#define NFS_CAP_READ_PLUS	(1U << 28)
>>> 
>>> #endif
>>> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
>>> index 94c77ed55ce1..8efbf3d8b263 100644
>>> --- a/include/linux/nfs_xdr.h
>>> +++ b/include/linux/nfs_xdr.h
>>> @@ -655,7 +655,7 @@ struct nfs_pgio_args {
>>> struct nfs_pgio_res {
>>> 	struct nfs4_sequence_res	seq_res;
>>> 	struct nfs_fattr *	fattr;
>>> -	__u32			count;
>>> +	__u64			count;
>>> 	__u32			op_status;
>>> 	union {
>>> 		struct {
>>> -- 
>>> 2.25.0
>>> 
>> 
>> --
>> Chuck Lever

--
Chuck Lever





* Re: [PATCH v2 0/6] NFS: Add support for the v4.2 READ_PLUS operation
  2020-02-14 21:12 [PATCH v2 0/6] NFS: Add support for the v4.2 READ_PLUS operation schumaker.anna
                   ` (5 preceding siblings ...)
  2020-02-14 21:12 ` [PATCH v2 6/6] NFS: Decode multiple READ_PLUS segments schumaker.anna
@ 2020-02-20 17:40 ` Chuck Lever
  2020-02-20 18:25   ` Anna Schumaker
  6 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2020-02-20 17:40 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Trond.Myklebust, Linux NFS Mailing List, Anna Schumaker

Hi Anna-

> On Feb 14, 2020, at 4:12 PM, schumaker.anna@gmail.com wrote:
> 
> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> 
> These patches add client support for the READ_PLUS operation, which
> breaks read requests into several "data" and "hole" segments when
> replying to the client. I also add a "noreadplus" mount option to allow
> users to disable the new operation if it becomes a problem, similar to
> the "nordirplus" mount option that we already have.

Hrm, I went looking for the patch that adds "noreadplus", but I
don't see it in this series?

Wondering if, to start off, the default should be "noreadplus"
until our feet are under us. Just a thought.


> Here are the results of some performance tests I ran on Netapp lab
> machines. I tested by reading various 2G files from a few different
> underlying filesystems and across several NFS versions. I used the
> `vmtouch` utility to make sure files were only cached when we wanted
> them to be. In addition to 100% data and 100% hole cases, I also tested
> with files that alternate between data and hole segments. These files
> have either 4K, 8K, 16K, or 32K segment sizes and start with either data
> or hole segments. So the file mixed-4d has a 4K segment size beginning
> with a data segment, but mixed-32h has 32K segments beginning with a
> hole. The units are in seconds; for each NFS version the first number is
> the uncached read time and the second is the read time when the file is
> cached on the server.
> 
> ext4      |        v3       |       v4.0      |       v4.1      |       v4.2      |
> ----------|-----------------|-----------------|-----------------|-----------------|
> data      | 22.909 : 18.253 | 22.934 : 18.252 | 22.902 : 18.253 | 23.485 : 18.253 |
> hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 |  0.708 :  0.709 |
> mixed-4d  | 28.261 : 18.253 | 29.616 : 18.252 | 28.341 : 18.252 | 24.508 :  9.150 |
> mixed-8d  | 27.956 : 18.253 | 28.404 : 18.252 | 28.320 : 18.252 | 23.967 :  9.140 |
> mixed-16d | 28.172 : 18.253 | 27.946 : 18.252 | 27.627 : 18.252 | 23.043 :  9.134 |
> mixed-32d | 25.350 : 18.253 | 24.406 : 18.252 | 24.384 : 18.253 | 20.698 :  9.132 |
> mixed-4h  | 28.913 : 18.253 | 28.564 : 18.252 | 27.996 : 18.252 | 21.837 :  9.150 |
> mixed-8h  | 28.625 : 18.253 | 27.833 : 18.252 | 27.798 : 18.253 | 21.710 :  9.140 |
> mixed-16h | 27.975 : 18.253 | 27.662 : 18.252 | 27.795 : 18.253 | 20.585 :  9.134 |
> mixed-32h | 25.958 : 18.253 | 25.491 : 18.252 | 24.856 : 18.252 | 21.018 :  9.132 |
> 
> xfs       |        v3       |       v4.0      |       v4.1      |       v4.2      |
> ----------|-----------------|-----------------|-----------------|-----------------|
> data      | 22.041 : 18.253 | 22.618 : 18.252 | 23.067 : 18.253 | 23.496 : 18.253 |
> hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 |  0.723 :  0.708 |
> mixed-4d  | 29.417 : 18.253 | 28.503 : 18.252 | 28.671 : 18.253 | 24.957 :  9.150 |
> mixed-8d  | 29.080 : 18.253 | 29.401 : 18.252 | 29.251 : 18.252 | 24.625 :  9.140 |
> mixed-16d | 27.638 : 18.253 | 28.606 : 18.252 | 27.871 : 18.253 | 25.511 :  9.135 |
> mixed-32d | 24.967 : 18.253 | 25.239 : 18.252 | 25.434 : 18.252 | 21.728 :  9.132 |
> mixed-4h  | 34.816 : 18.253 | 36.243 : 18.252 | 35.837 : 18.252 | 32.332 :  9.150 |
> mixed-8h  | 43.469 : 18.253 | 44.009 : 18.252 | 43.810 : 18.253 | 37.962 :  9.140 |
> mixed-16h | 29.280 : 18.253 | 28.563 : 18.252 | 28.241 : 18.252 | 22.116 :  9.134 |
> mixed-32h | 29.428 : 18.253 | 29.378 : 18.252 | 28.808 : 18.253 | 27.378 :  9.134 |
> 
> btrfs     |        v3       |       v4.0      |       v4.1      |       v4.2      |
> ----------|-----------------|-----------------|-----------------|-----------------|
> data      | 25.547 : 18.253 | 25.053 : 18.252 | 24.209 : 18.253 | 32.121 : 18.253 |
> hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.252 |  0.702 :  0.724 |
> mixed-4d  | 19.016 : 18.253 | 18.822 : 18.252 | 18.955 : 18.253 | 18.697 :  9.150 |
> mixed-8d  | 19.186 : 18.253 | 19.444 : 18.252 | 18.841 : 18.253 | 18.452 :  9.140 |
> mixed-16d | 18.480 : 18.253 | 19.010 : 18.252 | 19.167 : 18.252 | 16.000 :  9.134 |
> mixed-32d | 18.635 : 18.253 | 18.565 : 18.252 | 18.550 : 18.252 | 15.930 :  9.132 |
> mixed-4h  | 19.079 : 18.253 | 18.990 : 18.252 | 19.157 : 18.253 | 27.834 :  9.150 |
> mixed-8h  | 18.613 : 18.253 | 19.234 : 18.252 | 18.616 : 18.253 | 20.177 :  9.140 |
> mixed-16h | 18.590 : 18.253 | 19.221 : 18.252 | 19.654 : 18.253 | 17.273 :  9.135 |
> mixed-32h | 18.768 : 18.253 | 19.122 : 18.252 | 18.535 : 18.252 | 15.791 :  9.132 |
> 
> ext3      |        v3       |       v4.0      |       v4.1      |       v4.2      |
> ----------|-----------------|-----------------|-----------------|-----------------|
> data      | 34.292 : 18.253 | 33.810 : 18.252 | 33.450 : 18.253 | 33.390 : 18.254 |
> hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 |  0.718 :  0.728 |
> mixed-4d  | 46.818 : 18.253 | 47.140 : 18.252 | 48.385 : 18.253 | 42.887 :  9.150 |
> mixed-8d  | 58.554 : 18.253 | 59.277 : 18.252 | 59.673 : 18.253 | 56.760 :  9.140 |
> mixed-16d | 44.631 : 18.253 | 44.291 : 18.252 | 44.729 : 18.253 | 40.237 :  9.135 |
> mixed-32d | 39.110 : 18.253 | 38.735 : 18.252 | 38.902 : 18.252 | 35.270 :  9.132 |
> mixed-4h  | 56.396 : 18.253 | 56.387 : 18.252 | 56.573 : 18.253 | 67.661 :  9.150 |
> mixed-8h  | 58.483 : 18.253 | 58.484 : 18.252 | 59.099 : 18.253 | 77.958 :  9.140 |
> mixed-16h | 42.511 : 18.253 | 42.338 : 18.252 | 42.356 : 18.252 | 51.805 :  9.135 |
> mixed-32h | 38.419 : 18.253 | 38.504 : 18.252 | 38.643 : 18.252 | 40.411 :  9.132 |
> 
> 
> Changes since v1:
> - Rebase to 5.6-rc1
> - Drop the mount option patch for now
> - Fix fallback to READ when the server doesn't support READ_PLUS
> 
> Any questions?
> Anna
> 
> 
> Anna Schumaker (6):
>  SUNRPC: Split out a function for setting current page
>  SUNRPC: Add the ability to expand holes in data pages
>  SUNRPC: Add the ability to shift data to a specific offset
>  NFS: Add READ_PLUS data segment support
>  NFS: Add READ_PLUS hole segment decoding
>  NFS: Decode multiple READ_PLUS segments
> 
> fs/nfs/nfs42xdr.c          | 169 +++++++++++++++++++++++++
> fs/nfs/nfs4proc.c          |  43 ++++++-
> fs/nfs/nfs4xdr.c           |   1 +
> include/linux/nfs4.h       |   2 +-
> include/linux/nfs_fs_sb.h  |   1 +
> include/linux/nfs_xdr.h    |   2 +-
> include/linux/sunrpc/xdr.h |   2 +
> net/sunrpc/xdr.c           | 244 ++++++++++++++++++++++++++++++++++++-
> 8 files changed, 457 insertions(+), 7 deletions(-)
> 
> -- 
> 2.25.0
> 

--
Chuck Lever
chucklever@gmail.com





* Re: [PATCH v2 0/6] NFS: Add support for the v4.2 READ_PLUS operation
  2020-02-20 17:40 ` [PATCH v2 0/6] NFS: Add support for the v4.2 READ_PLUS operation Chuck Lever
@ 2020-02-20 18:25   ` Anna Schumaker
  2020-02-20 18:29     ` Chuck Lever
  0 siblings, 1 reply; 17+ messages in thread
From: Anna Schumaker @ 2020-02-20 18:25 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond.Myklebust, Linux NFS Mailing List

On Thu, 2020-02-20 at 12:40 -0500, Chuck Lever wrote:
> Hi Anna-
> 
> > On Feb 14, 2020, at 4:12 PM, schumaker.anna@gmail.com wrote:
> > 
> > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > 
> > These patches add client support for the READ_PLUS operation, which
> > breaks read requests into several "data" and "hole" segments when
> > replying to the client. I also add a "noreadplus" mount option to allow
> > users to disable the new operation if it becomes a problem, similar to
> > the "nordirplus" mount option that we already have.
> 
> Hrm, I went looking for the patch that adds "noreadplus", but I
> don't see it in this series?

You suggested dropping that patch in the v1 posting and waiting to see if
anybody asks for it.

> 
> Wondering if, to start off, the default should be "noreadplus"
> until our feet are under us. Just a thought.

I could re-add the patch with this as the default if that's the way everybody
wants to go.

Anna

> 
> 
> > Here are the results of some performance tests I ran on Netapp lab
> > machines. I tested by reading various 2G files from a few different
> > underlying filesystems and across several NFS versions. I used the
> > `vmtouch` utility to make sure files were only cached when we wanted
> > them to be. In addition to 100% data and 100% hole cases, I also tested
> > with files that alternate between data and hole segments. These files
> > have either 4K, 8K, 16K, or 32K segment sizes and start with either data
> > or hole segments. So the file mixed-4d has a 4K segment size beginning
> > with a data segment, but mixed-32h has 32K segments beginning with a
> > hole. The units are in seconds; for each NFS version the first number is
> > the uncached read time and the second is the read time when the file is
> > cached on the server.
> > 
> > ext4      |        v3       |       v4.0      |       v4.1      |       v4.2      |
> > ----------|-----------------|-----------------|-----------------|-----------------|
> > data      | 22.909 : 18.253 | 22.934 : 18.252 | 22.902 : 18.253 | 23.485 : 18.253 |
> > hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 |  0.708 :  0.709 |
> > mixed-4d  | 28.261 : 18.253 | 29.616 : 18.252 | 28.341 : 18.252 | 24.508 :  9.150 |
> > mixed-8d  | 27.956 : 18.253 | 28.404 : 18.252 | 28.320 : 18.252 | 23.967 :  9.140 |
> > mixed-16d | 28.172 : 18.253 | 27.946 : 18.252 | 27.627 : 18.252 | 23.043 :  9.134 |
> > mixed-32d | 25.350 : 18.253 | 24.406 : 18.252 | 24.384 : 18.253 | 20.698 :  9.132 |
> > mixed-4h  | 28.913 : 18.253 | 28.564 : 18.252 | 27.996 : 18.252 | 21.837 :  9.150 |
> > mixed-8h  | 28.625 : 18.253 | 27.833 : 18.252 | 27.798 : 18.253 | 21.710 :  9.140 |
> > mixed-16h | 27.975 : 18.253 | 27.662 : 18.252 | 27.795 : 18.253 | 20.585 :  9.134 |
> > mixed-32h | 25.958 : 18.253 | 25.491 : 18.252 | 24.856 : 18.252 | 21.018 :  9.132 |
> > 
> > xfs       |        v3       |       v4.0      |       v4.1      |       v4.2      |
> > ----------|-----------------|-----------------|-----------------|-----------------|
> > data      | 22.041 : 18.253 | 22.618 : 18.252 | 23.067 : 18.253 | 23.496 : 18.253 |
> > hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 |  0.723 :  0.708 |
> > mixed-4d  | 29.417 : 18.253 | 28.503 : 18.252 | 28.671 : 18.253 | 24.957 :  9.150 |
> > mixed-8d  | 29.080 : 18.253 | 29.401 : 18.252 | 29.251 : 18.252 | 24.625 :  9.140 |
> > mixed-16d | 27.638 : 18.253 | 28.606 : 18.252 | 27.871 : 18.253 | 25.511 :  9.135 |
> > mixed-32d | 24.967 : 18.253 | 25.239 : 18.252 | 25.434 : 18.252 | 21.728 :  9.132 |
> > mixed-4h  | 34.816 : 18.253 | 36.243 : 18.252 | 35.837 : 18.252 | 32.332 :  9.150 |
> > mixed-8h  | 43.469 : 18.253 | 44.009 : 18.252 | 43.810 : 18.253 | 37.962 :  9.140 |
> > mixed-16h | 29.280 : 18.253 | 28.563 : 18.252 | 28.241 : 18.252 | 22.116 :  9.134 |
> > mixed-32h | 29.428 : 18.253 | 29.378 : 18.252 | 28.808 : 18.253 | 27.378 :  9.134 |
> > 
> > btrfs     |        v3       |       v4.0      |       v4.1      |       v4.2      |
> > ----------|-----------------|-----------------|-----------------|-----------------|
> > data      | 25.547 : 18.253 | 25.053 : 18.252 | 24.209 : 18.253 | 32.121 : 18.253 |
> > hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.252 |  0.702 :  0.724 |
> > mixed-4d  | 19.016 : 18.253 | 18.822 : 18.252 | 18.955 : 18.253 | 18.697 :  9.150 |
> > mixed-8d  | 19.186 : 18.253 | 19.444 : 18.252 | 18.841 : 18.253 | 18.452 :  9.140 |
> > mixed-16d | 18.480 : 18.253 | 19.010 : 18.252 | 19.167 : 18.252 | 16.000 :  9.134 |
> > mixed-32d | 18.635 : 18.253 | 18.565 : 18.252 | 18.550 : 18.252 | 15.930 :  9.132 |
> > mixed-4h  | 19.079 : 18.253 | 18.990 : 18.252 | 19.157 : 18.253 | 27.834 :  9.150 |
> > mixed-8h  | 18.613 : 18.253 | 19.234 : 18.252 | 18.616 : 18.253 | 20.177 :  9.140 |
> > mixed-16h | 18.590 : 18.253 | 19.221 : 18.252 | 19.654 : 18.253 | 17.273 :  9.135 |
> > mixed-32h | 18.768 : 18.253 | 19.122 : 18.252 | 18.535 : 18.252 | 15.791 :  9.132 |
> > 
> > ext3      |        v3       |       v4.0      |       v4.1      |       v4.2      |
> > ----------|-----------------|-----------------|-----------------|-----------------|
> > data      | 34.292 : 18.253 | 33.810 : 18.252 | 33.450 : 18.253 | 33.390 : 18.254 |
> > hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 |  0.718 :  0.728 |
> > mixed-4d  | 46.818 : 18.253 | 47.140 : 18.252 | 48.385 : 18.253 | 42.887 :  9.150 |
> > mixed-8d  | 58.554 : 18.253 | 59.277 : 18.252 | 59.673 : 18.253 | 56.760 :  9.140 |
> > mixed-16d | 44.631 : 18.253 | 44.291 : 18.252 | 44.729 : 18.253 | 40.237 :  9.135 |
> > mixed-32d | 39.110 : 18.253 | 38.735 : 18.252 | 38.902 : 18.252 | 35.270 :  9.132 |
> > mixed-4h  | 56.396 : 18.253 | 56.387 : 18.252 | 56.573 : 18.253 | 67.661 :  9.150 |
> > mixed-8h  | 58.483 : 18.253 | 58.484 : 18.252 | 59.099 : 18.253 | 77.958 :  9.140 |
> > mixed-16h | 42.511 : 18.253 | 42.338 : 18.252 | 42.356 : 18.252 | 51.805 :  9.135 |
> > mixed-32h | 38.419 : 18.253 | 38.504 : 18.252 | 38.643 : 18.252 | 40.411 :  9.132 |
> > 
> > 
> > Changes since v1:
> > - Rebase to 5.6-rc1
> > - Drop the mount option patch for now
> > - Fix fallback to READ when the server doesn't support READ_PLUS
> > 
> > Any questions?
> > Anna
> > 
> > 
> > Anna Schumaker (6):
> >  SUNRPC: Split out a function for setting current page
> >  SUNRPC: Add the ability to expand holes in data pages
> >  SUNRPC: Add the ability to shift data to a specific offset
> >  NFS: Add READ_PLUS data segment support
> >  NFS: Add READ_PLUS hole segment decoding
> >  NFS: Decode multiple READ_PLUS segments
> > 
> > fs/nfs/nfs42xdr.c          | 169 +++++++++++++++++++++++++
> > fs/nfs/nfs4proc.c          |  43 ++++++-
> > fs/nfs/nfs4xdr.c           |   1 +
> > include/linux/nfs4.h       |   2 +-
> > include/linux/nfs_fs_sb.h  |   1 +
> > include/linux/nfs_xdr.h    |   2 +-
> > include/linux/sunrpc/xdr.h |   2 +
> > net/sunrpc/xdr.c           | 244 ++++++++++++++++++++++++++++++++++++-
> > 8 files changed, 457 insertions(+), 7 deletions(-)
> > 
> > -- 
> > 2.25.0
> > 
> 
> --
> Chuck Lever
> chucklever@gmail.com
> 
> 
> 



* Re: [PATCH v2 4/6] NFS: Add READ_PLUS data segment support
  2020-02-20 14:55       ` Chuck Lever
@ 2020-02-20 18:28         ` Anna Schumaker
  2020-02-20 18:30           ` Chuck Lever
  0 siblings, 1 reply; 17+ messages in thread
From: Anna Schumaker @ 2020-02-20 18:28 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond.Myklebust, Linux NFS Mailing List

On Thu, 2020-02-20 at 09:55 -0500, Chuck Lever wrote:
> > On Feb 20, 2020, at 9:42 AM, Anna Schumaker <schumaker.anna@gmail.com> wrote:
> > 
> > On Fri, 2020-02-14 at 17:28 -0500, Chuck Lever wrote:
> > > > On Feb 14, 2020, at 4:12 PM, schumaker.anna@gmail.com wrote:
> > > > 
> > > > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > > > 
> > > > This patch adds client support for decoding a single NFS4_CONTENT_DATA
> > > > segment returned by the server. This is the simplest implementation
> > > > possible, since it does not account for any hole segments in the reply.
> > > > 
> > > > Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > > > ---
> > > > fs/nfs/nfs42xdr.c         | 138 ++++++++++++++++++++++++++++++++++++++
> > > > fs/nfs/nfs4proc.c         |  43 +++++++++++-
> > > > fs/nfs/nfs4xdr.c          |   1 +
> > > > include/linux/nfs4.h      |   2 +-
> > > > include/linux/nfs_fs_sb.h |   1 +
> > > > include/linux/nfs_xdr.h   |   2 +-
> > > > 6 files changed, 182 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
> > > > index c03f3246d6c5..bf118ecabe2c 100644
> > > > --- a/fs/nfs/nfs42xdr.c
> > > > +++ b/fs/nfs/nfs42xdr.c
> > > > @@ -45,6 +45,15 @@
> > > > #define encode_deallocate_maxsz		(op_encode_hdr_maxsz + \
> > > > 					encode_fallocate_maxsz)
> > > > #define decode_deallocate_maxsz		(op_decode_hdr_maxsz)
> > > > +#define encode_read_plus_maxsz		(op_encode_hdr_maxsz + \
> > > > +					encode_stateid_maxsz + 3)
> > > > +#define NFS42_READ_PLUS_SEGMENT_SIZE	(1 /* data_content4 */ + \
> > > > +					 2 /* data_info4.di_offset */ + \
> > > > +					 2 /* data_info4.di_length */)
> > > > +#define decode_read_plus_maxsz		(op_decode_hdr_maxsz + \
> > > > +					 1 /* rpr_eof */ + \
> > > > +					 1 /* rpr_contents count */ + \
> > > > +					NFS42_READ_PLUS_SEGMENT_SIZE)
> > > > #define encode_seek_maxsz		(op_encode_hdr_maxsz + \
> > > > 					encode_stateid_maxsz + \
> > > > 					 2 /* offset */ + \
> > > > @@ -128,6 +137,14 @@
> > > > 					decode_putfh_maxsz + \
> > > > 					decode_deallocate_maxsz + \
> > > > 					decode_getattr_maxsz)
> > > > +#define NFS4_enc_read_plus_sz		(compound_encode_hdr_maxsz + \
> > > > +					encode_sequence_maxsz + \
> > > > +					encode_putfh_maxsz + \
> > > > +					encode_read_plus_maxsz)
> > > > +#define NFS4_dec_read_plus_sz		(compound_decode_hdr_maxsz + \
> > > > +					decode_sequence_maxsz + \
> > > > +					decode_putfh_maxsz + \
> > > > +					decode_read_plus_maxsz)
> > > > #define NFS4_enc_seek_sz		(compound_encode_hdr_maxsz + \
> > > > 					encode_sequence_maxsz + \
> > > > 					encode_putfh_maxsz + \
> > > > @@ -252,6 +269,16 @@ static void encode_deallocate(struct xdr_stream *xdr,
> > > > 	encode_fallocate(xdr, args);
> > > > }
> > > > 
> > > > +static void encode_read_plus(struct xdr_stream *xdr,
> > > > +			     const struct nfs_pgio_args *args,
> > > > +			     struct compound_hdr *hdr)
> > > > +{
> > > > +	encode_op_hdr(xdr, OP_READ_PLUS, decode_read_plus_maxsz, hdr);
> > > > +	encode_nfs4_stateid(xdr, &args->stateid);
> > > > +	encode_uint64(xdr, args->offset);
> > > > +	encode_uint32(xdr, args->count);
> > > > +}
> > > > +
> > > > static void encode_seek(struct xdr_stream *xdr,
> > > > 			const struct nfs42_seek_args *args,
> > > > 			struct compound_hdr *hdr)
> > > > @@ -446,6 +473,29 @@ static void nfs4_xdr_enc_deallocate(struct rpc_rqst *req,
> > > > 	encode_nops(&hdr);
> > > > }
> > > > 
> > > > +/*
> > > > + * Encode READ_PLUS request
> > > > + */
> > > > +static void nfs4_xdr_enc_read_plus(struct rpc_rqst *req,
> > > > +				   struct xdr_stream *xdr,
> > > > +				   const void *data)
> > > > +{
> > > > +	const struct nfs_pgio_args *args = data;
> > > > +	struct compound_hdr hdr = {
> > > > +		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
> > > > +	};
> > > > +
> > > > +	encode_compound_hdr(xdr, req, &hdr);
> > > > +	encode_sequence(xdr, &args->seq_args, &hdr);
> > > > +	encode_putfh(xdr, args->fh, &hdr);
> > > > +	encode_read_plus(xdr, args, &hdr);
> > > > +
> > > > +	rpc_prepare_reply_pages(req, args->pages, args->pgbase,
> > > > +				args->count, hdr.replen);
> > > > +	req->rq_rcv_buf.flags |= XDRBUF_READ;
> > > 
> > > IMO this line is incorrect.
> > 
> > You're right, this line causes problems for RDMA with READ_PLUS. I added it to
> > match how the other xdr read encoders were set up
> 
> Ja, I think just removing that line should be sufficient.
> Better would be replacing it with a comment explaining
> why this encoder does not set XDRBUF_READ. :-)
> 
> 
> > > RFC 8267 Section 6.1 does not list any part of the result of READ_PLUS
> > > as DDP-eligible. There's no way for a client to know how to set up
> > > Write chunks, unless it knows exactly where the file's holes are in
> > > advance. Even then... racy.
> > > 
> > > Just curious, have you tried READ_PLUS with proto=rdma ?
> > 
> > I haven't done in-depth performance testing, but I have been able to run it.
> 
> We should figure out whether that will have a regressive
> impact on NFS/RDMA workloads. I expect that it will, but
> the client can always set up the Reply chunk so that the
> READ payload fits precisely in an RDMA segment that lines
> up with page cache pages. That mitigates some impact.
> 
> If your patch set already changes NFSv4.2 mounts to always
> use READ_PLUS in place of READ, it might be prudent for the
> "proto=rdma" mount option to also set "noreadplus", at least
> for the time being.

I can make this change.

> 
> The down-side here is that would make NFSv4.2 on RDMA
> unable to recognize holes in files the same way as it
> does on TCP, and that's a pretty significant variation
> in behavior. Does "noreadplus" even deal with that?

Setting "noreadplus" just causes the client to use the READ operation instead,
so there should be no difference between v4.1 and v4.2 if the option is set.
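If the mount option returns in a later revision, usage would presumably parallel the existing "nordirplus" option (hypothetical, since the v2 series dropped the option patch):

```
# Hypothetical: opt an NFSv4.2 mount out of READ_PLUS, falling
# back to the plain READ operation (mirrors "nordirplus").
mount -t nfs -o vers=4.2,noreadplus server:/export /mnt
```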

Anna

> 
> 
> > Anna
> > 
> > > 
> > > > +	encode_nops(&hdr);
> > > > +}
> > > > +
> > > > /*
> > > > * Encode SEEK request
> > > > */
> > > > @@ -694,6 +744,67 @@ static int decode_deallocate(struct xdr_stream *xdr, struct nfs42_falloc_res *re
> > > > 	return decode_op_hdr(xdr, OP_DEALLOCATE);
> > > > }
> > > > 
> > > > +static uint32_t decode_read_plus_data(struct xdr_stream *xdr, struct nfs_pgio_res *res,
> > > > +				      uint32_t *eof)
> > > > +{
> > > > +	__be32 *p;
> > > > +	uint32_t count, recvd;
> > > > +	uint64_t offset;
> > > > +
> > > > +	p = xdr_inline_decode(xdr, 8 + 4);
> > > > +	if (unlikely(!p))
> > > > +		return -EIO;
> > > > +
> > > > +	p = xdr_decode_hyper(p, &offset);
> > > > +	count = be32_to_cpup(p);
> > > > +	if (count == 0)
> > > > +		return 0;
> > > > +
> > > > +	recvd = xdr_read_pages(xdr, count);
> > > > +	if (count > recvd) {
> > > > +		dprintk("NFS: server cheating in read reply: "
> > > > +				"count %u > recvd %u\n", count, recvd);
> > > > +		count = recvd;
> > > > +		*eof = 0;
> > > > +	}
> > > > +
> > > > +	return count;
> > > > +}
> > > > +
> > > > +static int decode_read_plus(struct xdr_stream *xdr, struct nfs_pgio_res *res)
> > > > +{
> > > > +	__be32 *p;
> > > > +	uint32_t count, eof, segments, type;
> > > > +	int status;
> > > > +
> > > > +	status = decode_op_hdr(xdr, OP_READ_PLUS);
> > > > +	if (status)
> > > > +		return status;
> > > > +
> > > > +	p = xdr_inline_decode(xdr, 4 + 4);
> > > > +	if (unlikely(!p))
> > > > +		return -EIO;
> > > > +
> > > > +	eof = be32_to_cpup(p++);
> > > > +	segments = be32_to_cpup(p++);
> > > > +	if (segments == 0)
> > > > +		return 0;
> > > > +
> > > > +	p = xdr_inline_decode(xdr, 4);
> > > > +	if (unlikely(!p))
> > > > +		return -EIO;
> > > > +
> > > > +	type = be32_to_cpup(p++);
> > > > +	if (type == NFS4_CONTENT_DATA)
> > > > +		count = decode_read_plus_data(xdr, res, &eof);
> > > > +	else
> > > > +		return -EINVAL;
> > > > +
> > > > +	res->eof = eof;
> > > > +	res->count = count;
> > > > +	return 0;
> > > > +}
> > > > +
> > > > static int decode_seek(struct xdr_stream *xdr, struct nfs42_seek_res
> > > > *res)
> > > > {
> > > > 	int status;
> > > > @@ -870,6 +981,33 @@ static int nfs4_xdr_dec_deallocate(struct rpc_rqst
> > > > *rqstp,
> > > > 	return status;
> > > > }
> > > > 
> > > > +/*
> > > > + * Decode READ_PLUS request
> > > > + */
> > > > +static int nfs4_xdr_dec_read_plus(struct rpc_rqst *rqstp,
> > > > +				  struct xdr_stream *xdr,
> > > > +				  void *data)
> > > > +{
> > > > +	struct nfs_pgio_res *res = data;
> > > > +	struct compound_hdr hdr;
> > > > +	int status;
> > > > +
> > > > +	status = decode_compound_hdr(xdr, &hdr);
> > > > +	if (status)
> > > > +		goto out;
> > > > +	status = decode_sequence(xdr, &res->seq_res, rqstp);
> > > > +	if (status)
> > > > +		goto out;
> > > > +	status = decode_putfh(xdr);
> > > > +	if (status)
> > > > +		goto out;
> > > > +	status = decode_read_plus(xdr, res);
> > > > +	if (!status)
> > > > +		status = res->count;
> > > > +out:
> > > > +	return status;
> > > > +}
> > > > +
> > > > /*
> > > > * Decode SEEK request
> > > > */
> > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > > index 95d07a3dc5d1..ed3ec8c36273 100644
> > > > --- a/fs/nfs/nfs4proc.c
> > > > +++ b/fs/nfs/nfs4proc.c
> > > > @@ -69,6 +69,10 @@
> > > > 
> > > > #include "nfs4trace.h"
> > > > 
> > > > +#ifdef CONFIG_NFS_V4_2
> > > > +#include "nfs42.h"
> > > > +#endif /* CONFIG_NFS_V4_2 */
> > > > +
> > > > #define NFSDBG_FACILITY		NFSDBG_PROC
> > > > 
> > > > #define NFS4_BITMASK_SZ		3
> > > > @@ -5199,28 +5203,60 @@ static bool nfs4_read_stateid_changed(struct
> > > > rpc_task *task,
> > > > 	return true;
> > > > }
> > > > 
> > > > +static bool nfs4_read_plus_not_supported(struct rpc_task *task,
> > > > +					struct nfs_pgio_header *hdr)
> > > > +{
> > > > +	struct nfs_server *server = NFS_SERVER(hdr->inode);
> > > > +	struct rpc_message *msg = &task->tk_msg;
> > > > +
> > > > +	if (msg->rpc_proc == &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS] &&
> > > > +	    server->caps & NFS_CAP_READ_PLUS && task->tk_status == -ENOTSUPP) {
> > > > +		server->caps &= ~NFS_CAP_READ_PLUS;
> > > > +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> > > > +		rpc_restart_call_prepare(task);
> > > > +		return true;
> > > > +	}
> > > > +	return false;
> > > > +}
> > > > +
> > > > static int nfs4_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
> > > > {
> > > > -
> > > > 	dprintk("--> %s\n", __func__);
> > > > 
> > > > 	if (!nfs4_sequence_done(task, &hdr->res.seq_res))
> > > > 		return -EAGAIN;
> > > > 	if (nfs4_read_stateid_changed(task, &hdr->args))
> > > > 		return -EAGAIN;
> > > > +	if (nfs4_read_plus_not_supported(task, hdr))
> > > > +		return -EAGAIN;
> > > > 	if (task->tk_status > 0)
> > > > 		nfs_invalidate_atime(hdr->inode);
> > > > 	return hdr->pgio_done_cb ? hdr->pgio_done_cb(task, hdr) :
> > > > 				    nfs4_read_done_cb(task, hdr);
> > > > }
> > > > 
> > > > +#ifdef CONFIG_NFS_V4_2
> > > > +static void nfs42_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
> > > > +{
> > > > +	if (server->caps & NFS_CAP_READ_PLUS)
> > > > +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS];
> > > > +	else
> > > > +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> > > > +}
> > > > +#else
> > > > +static void nfs42_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
> > > > +{
> > > > +	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> > > > +}
> > > > +#endif /* CONFIG_NFS_V4_2 */
> > > > +
> > > > static void nfs4_proc_read_setup(struct nfs_pgio_header *hdr,
> > > > 				 struct rpc_message *msg)
> > > > {
> > > > 	hdr->timestamp   = jiffies;
> > > > 	if (!hdr->pgio_done_cb)
> > > > 		hdr->pgio_done_cb = nfs4_read_done_cb;
> > > > -	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> > > > +	nfs42_read_plus_support(NFS_SERVER(hdr->inode), msg);
> > > > 	nfs4_init_sequence(&hdr->args.seq_args, &hdr->res.seq_res, 0, 0);
> > > > }
> > > > 
> > > > @@ -9970,7 +10006,8 @@ static const struct nfs4_minor_version_ops
> > > > nfs_v4_2_minor_ops = {
> > > > 		| NFS_CAP_SEEK
> > > > 		| NFS_CAP_LAYOUTSTATS
> > > > 		| NFS_CAP_CLONE
> > > > -		| NFS_CAP_LAYOUTERROR,
> > > > +		| NFS_CAP_LAYOUTERROR
> > > > +		| NFS_CAP_READ_PLUS,
> > > > 	.init_client = nfs41_init_client,
> > > > 	.shutdown_client = nfs41_shutdown_client,
> > > > 	.match_stateid = nfs41_match_stateid,
> > > > diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> > > > index 47817ef0aadb..68b2917d0537 100644
> > > > --- a/fs/nfs/nfs4xdr.c
> > > > +++ b/fs/nfs/nfs4xdr.c
> > > > @@ -7584,6 +7584,7 @@ const struct rpc_procinfo nfs4_procedures[] = {
> > > > 	PROC42(COPY_NOTIFY,	enc_copy_notify,	dec_copy_notify),
> > > > 	PROC(LOOKUPP,		enc_lookupp,		dec_lookupp),
> > > > 	PROC42(LAYOUTERROR,	enc_layouterror,	dec_layouterror),
> > > > +	PROC42(READ_PLUS,	enc_read_plus,		dec_read_plus),
> > > > };
> > > > 
> > > > static unsigned int nfs_version4_counts[ARRAY_SIZE(nfs4_procedures)];
> > > > diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> > > > index 82d8fb422092..c1eeef52545c 100644
> > > > --- a/include/linux/nfs4.h
> > > > +++ b/include/linux/nfs4.h
> > > > @@ -540,8 +540,8 @@ enum {
> > > > 
> > > > 	NFSPROC4_CLNT_LOOKUPP,
> > > > 	NFSPROC4_CLNT_LAYOUTERROR,
> > > > -
> > > > 	NFSPROC4_CLNT_COPY_NOTIFY,
> > > > +	NFSPROC4_CLNT_READ_PLUS,
> > > > };
> > > > 
> > > > /* nfs41 types */
> > > > diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> > > > index 465fa98258a3..11248c5a7b24 100644
> > > > --- a/include/linux/nfs_fs_sb.h
> > > > +++ b/include/linux/nfs_fs_sb.h
> > > > @@ -281,5 +281,6 @@ struct nfs_server {
> > > > #define NFS_CAP_OFFLOAD_CANCEL	(1U << 25)
> > > > #define NFS_CAP_LAYOUTERROR	(1U << 26)
> > > > #define NFS_CAP_COPY_NOTIFY	(1U << 27)
> > > > +#define NFS_CAP_READ_PLUS	(1U << 28)
> > > > 
> > > > #endif
> > > > diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> > > > index 94c77ed55ce1..8efbf3d8b263 100644
> > > > --- a/include/linux/nfs_xdr.h
> > > > +++ b/include/linux/nfs_xdr.h
> > > > @@ -655,7 +655,7 @@ struct nfs_pgio_args {
> > > > struct nfs_pgio_res {
> > > > 	struct nfs4_sequence_res	seq_res;
> > > > 	struct nfs_fattr *	fattr;
> > > > -	__u32			count;
> > > > +	__u64			count;
> > > > 	__u32			op_status;
> > > > 	union {
> > > > 		struct {
> > > > -- 
> > > > 2.25.0
> > > > 
> > > 
> > > --
> > > Chuck Lever
> 
> --
> Chuck Lever
> 
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/6] NFS: Add support for the v4.2 READ_PLUS operation
  2020-02-20 18:25   ` Anna Schumaker
@ 2020-02-20 18:29     ` Chuck Lever
  0 siblings, 0 replies; 17+ messages in thread
From: Chuck Lever @ 2020-02-20 18:29 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Trond.Myklebust, Linux NFS Mailing List



> On Feb 20, 2020, at 1:25 PM, Anna Schumaker <schumaker.anna@gmail.com> wrote:
> 
> On Thu, 2020-02-20 at 12:40 -0500, Chuck Lever wrote:
>> Hi Anna-
>> 
>>> On Feb 14, 2020, at 4:12 PM, schumaker.anna@gmail.com wrote:
>>> 
>>> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
>>> 
>>> These patches add client support for the READ_PLUS operation, which
>>> breaks read requests into several "data" and "hole" segments when
>>> replying to the client. I also add a "noreadplus" mount option to allow
>>> users to disable the new operation if it becomes a problem, similar to
>>> the "nordirplus" mount option that we already have.
>> 
>> Hrm, I went looking for the patch that adds "noreadplus", but I
>> don't see it in this series?
> 
> You suggested dropping that patch in the v1 posting and waiting to see if
> anybody asks for it.

Yes, recalling that now.

I requested that because I don't like to add administrative interfaces
if there isn't a clear need. I think we might have one now!


>> Wondering if, to start off, the default should be "noreadplus"
>> until our feet are under us. Just a thought.
> 
> I could re-add the patch with this as the default if that's the way everybody
> wants to go.

Yep, I'm interested in other opinions too.


> Anna
> 
>> 
>> 
>>> Here are the results of some performance tests I ran on Netapp lab
>>> machines. I tested by reading various 2G files from a few different
>>> underlying filesystems and across several NFS versions. I used the
>>> `vmtouch` utility to make sure files were only cached when we wanted
>>> them to be. In addition to 100% data and 100% hole cases, I also tested
>>> with files that alternate between data and hole segments. These files
>>> have either 4K, 8K, 16K, or 32K segment sizes and start with either data
>>> or hole segments. So the file mixed-4d has a 4K segment size beginning
>>> with a data segment, but mixed-32h has 32K segments beginning with a
>>> hole. The units are in seconds, with the first number for each NFS
>>> version being the uncached read time and the second number is for when
>>> the file is cached on the server.
>>> 
>>> ext4      |        v3       |       v4.0      |       v4.1      |       v4.2      |
>>> ----------|-----------------|-----------------|-----------------|-----------------|
>>> data      | 22.909 : 18.253 | 22.934 : 18.252 | 22.902 : 18.253 | 23.485 : 18.253 |
>>> hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 |  0.708 :  0.709 |
>>> mixed-4d  | 28.261 : 18.253 | 29.616 : 18.252 | 28.341 : 18.252 | 24.508 :  9.150 |
>>> mixed-8d  | 27.956 : 18.253 | 28.404 : 18.252 | 28.320 : 18.252 | 23.967 :  9.140 |
>>> mixed-16d | 28.172 : 18.253 | 27.946 : 18.252 | 27.627 : 18.252 | 23.043 :  9.134 |
>>> mixed-32d | 25.350 : 18.253 | 24.406 : 18.252 | 24.384 : 18.253 | 20.698 :  9.132 |
>>> mixed-4h  | 28.913 : 18.253 | 28.564 : 18.252 | 27.996 : 18.252 | 21.837 :  9.150 |
>>> mixed-8h  | 28.625 : 18.253 | 27.833 : 18.252 | 27.798 : 18.253 | 21.710 :  9.140 |
>>> mixed-16h | 27.975 : 18.253 | 27.662 : 18.252 | 27.795 : 18.253 | 20.585 :  9.134 |
>>> mixed-32h | 25.958 : 18.253 | 25.491 : 18.252 | 24.856 : 18.252 | 21.018 :  9.132 |
>>> 
>>> xfs       |        v3       |       v4.0      |       v4.1      |       v4.2      |
>>> ----------|-----------------|-----------------|-----------------|-----------------|
>>> data      | 22.041 : 18.253 | 22.618 : 18.252 | 23.067 : 18.253 | 23.496 : 18.253 |
>>> hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 |  0.723 :  0.708 |
>>> mixed-4d  | 29.417 : 18.253 | 28.503 : 18.252 | 28.671 : 18.253 | 24.957 :  9.150 |
>>> mixed-8d  | 29.080 : 18.253 | 29.401 : 18.252 | 29.251 : 18.252 | 24.625 :  9.140 |
>>> mixed-16d | 27.638 : 18.253 | 28.606 : 18.252 | 27.871 : 18.253 | 25.511 :  9.135 |
>>> mixed-32d | 24.967 : 18.253 | 25.239 : 18.252 | 25.434 : 18.252 | 21.728 :  9.132 |
>>> mixed-4h  | 34.816 : 18.253 | 36.243 : 18.252 | 35.837 : 18.252 | 32.332 :  9.150 |
>>> mixed-8h  | 43.469 : 18.253 | 44.009 : 18.252 | 43.810 : 18.253 | 37.962 :  9.140 |
>>> mixed-16h | 29.280 : 18.253 | 28.563 : 18.252 | 28.241 : 18.252 | 22.116 :  9.134 |
>>> mixed-32h | 29.428 : 18.253 | 29.378 : 18.252 | 28.808 : 18.253 | 27.378 :  9.134 |
>>> 
>>> btrfs     |        v3       |       v4.0      |       v4.1      |       v4.2      |
>>> ----------|-----------------|-----------------|-----------------|-----------------|
>>> data      | 25.547 : 18.253 | 25.053 : 18.252 | 24.209 : 18.253 | 32.121 : 18.253 |
>>> hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.252 |  0.702 :  0.724 |
>>> mixed-4d  | 19.016 : 18.253 | 18.822 : 18.252 | 18.955 : 18.253 | 18.697 :  9.150 |
>>> mixed-8d  | 19.186 : 18.253 | 19.444 : 18.252 | 18.841 : 18.253 | 18.452 :  9.140 |
>>> mixed-16d | 18.480 : 18.253 | 19.010 : 18.252 | 19.167 : 18.252 | 16.000 :  9.134 |
>>> mixed-32d | 18.635 : 18.253 | 18.565 : 18.252 | 18.550 : 18.252 | 15.930 :  9.132 |
>>> mixed-4h  | 19.079 : 18.253 | 18.990 : 18.252 | 19.157 : 18.253 | 27.834 :  9.150 |
>>> mixed-8h  | 18.613 : 18.253 | 19.234 : 18.252 | 18.616 : 18.253 | 20.177 :  9.140 |
>>> mixed-16h | 18.590 : 18.253 | 19.221 : 18.252 | 19.654 : 18.253 | 17.273 :  9.135 |
>>> mixed-32h | 18.768 : 18.253 | 19.122 : 18.252 | 18.535 : 18.252 | 15.791 :  9.132 |
>>> 
>>> ext3      |        v3       |       v4.0      |       v4.1      |       v4.2      |
>>> ----------|-----------------|-----------------|-----------------|-----------------|
>>> data      | 34.292 : 18.253 | 33.810 : 18.252 | 33.450 : 18.253 | 33.390 : 18.254 |
>>> hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 |  0.718 :  0.728 |
>>> mixed-4d  | 46.818 : 18.253 | 47.140 : 18.252 | 48.385 : 18.253 | 42.887 :  9.150 |
>>> mixed-8d  | 58.554 : 18.253 | 59.277 : 18.252 | 59.673 : 18.253 | 56.760 :  9.140 |
>>> mixed-16d | 44.631 : 18.253 | 44.291 : 18.252 | 44.729 : 18.253 | 40.237 :  9.135 |
>>> mixed-32d | 39.110 : 18.253 | 38.735 : 18.252 | 38.902 : 18.252 | 35.270 :  9.132 |
>>> mixed-4h  | 56.396 : 18.253 | 56.387 : 18.252 | 56.573 : 18.253 | 67.661 :  9.150 |
>>> mixed-8h  | 58.483 : 18.253 | 58.484 : 18.252 | 59.099 : 18.253 | 77.958 :  9.140 |
>>> mixed-16h | 42.511 : 18.253 | 42.338 : 18.252 | 42.356 : 18.252 | 51.805 :  9.135 |
>>> mixed-32h | 38.419 : 18.253 | 38.504 : 18.252 | 38.643 : 18.252 | 40.411 :  9.132 |
>>> 
>>> 
>>> Changes since v1:
>>> - Rebase to 5.6-rc1
>>> - Drop the mount option patch for now
>>> - Fix fallback to READ when the server doesn't support READ_PLUS
>>> 
>>> Any questions?
>>> Anna
>>> 
>>> 
>>> Anna Schumaker (6):
>>> SUNRPC: Split out a function for setting current page
>>> SUNRPC: Add the ability to expand holes in data pages
>>> SUNRPC: Add the ability to shift data to a specific offset
>>> NFS: Add READ_PLUS data segment support
>>> NFS: Add READ_PLUS hole segment decoding
>>> NFS: Decode multiple READ_PLUS segments
>>> 
>>> fs/nfs/nfs42xdr.c          | 169 +++++++++++++++++++++++++
>>> fs/nfs/nfs4proc.c          |  43 ++++++-
>>> fs/nfs/nfs4xdr.c           |   1 +
>>> include/linux/nfs4.h       |   2 +-
>>> include/linux/nfs_fs_sb.h  |   1 +
>>> include/linux/nfs_xdr.h    |   2 +-
>>> include/linux/sunrpc/xdr.h |   2 +
>>> net/sunrpc/xdr.c           | 244 ++++++++++++++++++++++++++++++++++++-
>>> 8 files changed, 457 insertions(+), 7 deletions(-)
>>> 
>>> -- 
>>> 2.25.0
>>> 
>> 
>> --
>> Chuck Lever
>> chucklever@gmail.com

--
Chuck Lever
chucklever@gmail.com





* Re: [PATCH v2 4/6] NFS: Add READ_PLUS data segment support
  2020-02-20 18:28         ` Anna Schumaker
@ 2020-02-20 18:30           ` Chuck Lever
  2020-02-20 18:35             ` Anna Schumaker
  0 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2020-02-20 18:30 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Trond.Myklebust, Linux NFS Mailing List



> On Feb 20, 2020, at 1:28 PM, Anna Schumaker <schumaker.anna@gmail.com> wrote:
> 
> On Thu, 2020-02-20 at 09:55 -0500, Chuck Lever wrote:
>>> On Feb 20, 2020, at 9:42 AM, Anna Schumaker <schumaker.anna@gmail.com>
>>> wrote:
>>> 
>>> On Fri, 2020-02-14 at 17:28 -0500, Chuck Lever wrote:
>>>>> On Feb 14, 2020, at 4:12 PM, schumaker.anna@gmail.com wrote:
>>>>> 
>>>>> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
>>>>> 
>>>>> This patch adds client support for decoding a single NFS4_CONTENT_DATA
>>>>> segment returned by the server. This is the simplest implementation
>>>>> possible, since it does not account for any hole segments in the reply.
>>>>> 
>>>>> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
>>>>> ---
>>>>> fs/nfs/nfs42xdr.c         | 138 ++++++++++++++++++++++++++++++++++++++
>>>>> fs/nfs/nfs4proc.c         |  43 +++++++++++-
>>>>> fs/nfs/nfs4xdr.c          |   1 +
>>>>> include/linux/nfs4.h      |   2 +-
>>>>> include/linux/nfs_fs_sb.h |   1 +
>>>>> include/linux/nfs_xdr.h   |   2 +-
>>>>> 6 files changed, 182 insertions(+), 5 deletions(-)
>>>>> 
>>>>> diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
>>>>> index c03f3246d6c5..bf118ecabe2c 100644
>>>>> --- a/fs/nfs/nfs42xdr.c
>>>>> +++ b/fs/nfs/nfs42xdr.c
>>>>> @@ -45,6 +45,15 @@
>>>>> #define encode_deallocate_maxsz		(op_encode_hdr_maxsz + \
>>>>> 					encode_fallocate_maxsz)
>>>>> #define decode_deallocate_maxsz		(op_decode_hdr_maxsz)
>>>>> +#define encode_read_plus_maxsz		(op_encode_hdr_maxsz + \
>>>>> +					encode_stateid_maxsz + 3)
>>>>> +#define NFS42_READ_PLUS_SEGMENT_SIZE	(1 /* data_content4 */ + \
>>>>> +					 2 /* data_info4.di_offset */ + \
>>>>> +					 2 /* data_info4.di_length */)
>>>>> +#define decode_read_plus_maxsz		(op_decode_hdr_maxsz + \
>>>>> +					 1 /* rpr_eof */ + \
>>>>> +					 1 /* rpr_contents count */ + \
>>>>> +					NFS42_READ_PLUS_SEGMENT_SIZE)
>>>>> #define encode_seek_maxsz		(op_encode_hdr_maxsz + \
>>>>> 					encode_stateid_maxsz + \
>>>>> 					 2 /* offset */ + \
>>>>> @@ -128,6 +137,14 @@
>>>>> 					decode_putfh_maxsz + \
>>>>> 					decode_deallocate_maxsz + \
>>>>> 					decode_getattr_maxsz)
>>>>> +#define NFS4_enc_read_plus_sz		(compound_encode_hdr_maxsz + \
>>>>> +					encode_sequence_maxsz + \
>>>>> +					encode_putfh_maxsz + \
>>>>> +					encode_read_plus_maxsz)
>>>>> +#define NFS4_dec_read_plus_sz		(compound_decode_hdr_maxsz + \
>>>>> +					decode_sequence_maxsz + \
>>>>> +					decode_putfh_maxsz + \
>>>>> +					decode_read_plus_maxsz)
>>>>> #define NFS4_enc_seek_sz		(compound_encode_hdr_maxsz + \
>>>>> 					encode_sequence_maxsz + \
>>>>> 					encode_putfh_maxsz + \
>>>>> @@ -252,6 +269,16 @@ static void encode_deallocate(struct xdr_stream
>>>>> *xdr,
>>>>> 	encode_fallocate(xdr, args);
>>>>> }
>>>>> 
>>>>> +static void encode_read_plus(struct xdr_stream *xdr,
>>>>> +			     const struct nfs_pgio_args *args,
>>>>> +			     struct compound_hdr *hdr)
>>>>> +{
>>>>> +	encode_op_hdr(xdr, OP_READ_PLUS, decode_read_plus_maxsz, hdr);
>>>>> +	encode_nfs4_stateid(xdr, &args->stateid);
>>>>> +	encode_uint64(xdr, args->offset);
>>>>> +	encode_uint32(xdr, args->count);
>>>>> +}
>>>>> +
>>>>> static void encode_seek(struct xdr_stream *xdr,
>>>>> 			const struct nfs42_seek_args *args,
>>>>> 			struct compound_hdr *hdr)
>>>>> @@ -446,6 +473,29 @@ static void nfs4_xdr_enc_deallocate(struct rpc_rqst
>>>>> *req,
>>>>> 	encode_nops(&hdr);
>>>>> }
>>>>> 
>>>>> +/*
>>>>> + * Encode READ_PLUS request
>>>>> + */
>>>>> +static void nfs4_xdr_enc_read_plus(struct rpc_rqst *req,
>>>>> +				   struct xdr_stream *xdr,
>>>>> +				   const void *data)
>>>>> +{
>>>>> +	const struct nfs_pgio_args *args = data;
>>>>> +	struct compound_hdr hdr = {
>>>>> +		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
>>>>> +	};
>>>>> +
>>>>> +	encode_compound_hdr(xdr, req, &hdr);
>>>>> +	encode_sequence(xdr, &args->seq_args, &hdr);
>>>>> +	encode_putfh(xdr, args->fh, &hdr);
>>>>> +	encode_read_plus(xdr, args, &hdr);
>>>>> +
>>>>> +	rpc_prepare_reply_pages(req, args->pages, args->pgbase,
>>>>> +				args->count, hdr.replen);
>>>>> +	req->rq_rcv_buf.flags |= XDRBUF_READ;
>>>> 
>>>> IMO this line is incorrect.
>>> 
>>> You're right, this line causes problems for RDMA with READ_PLUS. I added it
>>> to match how the other xdr read encoders were set up.
>> 
>> Ja, I think just removing that line should be sufficient.
>> Better would be replacing it with a comment explaining
>> why this encoder does not set XDRBUF_READ. :-)
>> 
>> 
>>>> RFC 8267 Section 6.1 does not list any part of the result of READ_PLUS
>>>> as DDP-eligible. There's no way for a client to know how to set up
>>>> Write chunks, unless it knows exactly where the file's holes are in
>>>> advance. Even then... racy.
>>>> 
>>>> Just curious, have you tried READ_PLUS with proto=rdma ?
>>> 
>>> I haven't done in-depth performance testing, but I have been able to run it.
>> 
>> We should figure out whether that will have a regressive
>> impact on NFS/RDMA workloads. I expect that it will, but
>> the client can always set up the Reply chunk so that the
>> READ payload fits precisely in an RDMA segment that lines
>> up with page cache pages. That mitigates some impact.
>> 
>> If your patch set already changes NFSv4.2 mounts to always
>> use READ_PLUS in place of READ, it might be prudent for the
>> "proto=rdma" mount option to also set "noreadplus", at least
>> for the time being.
> 
> I can make this change.
> 
>> 
>> The down-side here is that would make NFSv4.2 on RDMA
>> unable to recognize holes in files the same way as it
>> does on TCP, and that's a pretty significant variation
>> in behavior. Does "noreadplus" even deal with that?
> 
> Setting "noreadplus" just causes the client to use the READ operation instead,
> so there should be no difference between v4.1 and v4.2 if the option is set.

My concern is the difference between NFSv4.2 with noreadplus
and NFSv4.2 with readplus. The former is not able to detect
holes in files on the server, but the latter is.

Is that worth mentioning in the man page, or in release notes
when NFSv4.2 becomes the default?


> Anna
> 
>> 
>> 
>>> Anna
>>> 
>>>> 
>>>>> +	encode_nops(&hdr);
>>>>> +}
>>>>> +
>>>>> /*
>>>>> * Encode SEEK request
>>>>> */
>>>>> @@ -694,6 +744,67 @@ static int decode_deallocate(struct xdr_stream
>>>>> *xdr,
>>>>> struct nfs42_falloc_res *re
>>>>> 	return decode_op_hdr(xdr, OP_DEALLOCATE);
>>>>> }
>>>>> 
>>>>> +static uint32_t decode_read_plus_data(struct xdr_stream *xdr, struct nfs_pgio_res *res,
>>>>> +				      uint32_t *eof)
>>>>> +{
>>>>> +	__be32 *p;
>>>>> +	uint32_t count, recvd;
>>>>> +	uint64_t offset;
>>>>> +
>>>>> +	p = xdr_inline_decode(xdr, 8 + 4);
>>>>> +	if (unlikely(!p))
>>>>> +		return -EIO;
>>>>> +
>>>>> +	p = xdr_decode_hyper(p, &offset);
>>>>> +	count = be32_to_cpup(p);
>>>>> +	if (count == 0)
>>>>> +		return 0;
>>>>> +
>>>>> +	recvd = xdr_read_pages(xdr, count);
>>>>> +	if (count > recvd) {
>>>>> +		dprintk("NFS: server cheating in read reply: "
>>>>> +				"count %u > recvd %u\n", count, recvd);
>>>>> +		count = recvd;
>>>>> +		*eof = 0;
>>>>> +	}
>>>>> +
>>>>> +	return count;
>>>>> +}
>>>>> +
>>>>> +static int decode_read_plus(struct xdr_stream *xdr, struct nfs_pgio_res *res)
>>>>> +{
>>>>> +	__be32 *p;
>>>>> +	uint32_t count, eof, segments, type;
>>>>> +	int status;
>>>>> +
>>>>> +	status = decode_op_hdr(xdr, OP_READ_PLUS);
>>>>> +	if (status)
>>>>> +		return status;
>>>>> +
>>>>> +	p = xdr_inline_decode(xdr, 4 + 4);
>>>>> +	if (unlikely(!p))
>>>>> +		return -EIO;
>>>>> +
>>>>> +	eof = be32_to_cpup(p++);
>>>>> +	segments = be32_to_cpup(p++);
>>>>> +	if (segments == 0)
>>>>> +		return 0;
>>>>> +
>>>>> +	p = xdr_inline_decode(xdr, 4);
>>>>> +	if (unlikely(!p))
>>>>> +		return -EIO;
>>>>> +
>>>>> +	type = be32_to_cpup(p++);
>>>>> +	if (type == NFS4_CONTENT_DATA)
>>>>> +		count = decode_read_plus_data(xdr, res, &eof);
>>>>> +	else
>>>>> +		return -EINVAL;
>>>>> +
>>>>> +	res->eof = eof;
>>>>> +	res->count = count;
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>> static int decode_seek(struct xdr_stream *xdr, struct nfs42_seek_res
>>>>> *res)
>>>>> {
>>>>> 	int status;
>>>>> @@ -870,6 +981,33 @@ static int nfs4_xdr_dec_deallocate(struct rpc_rqst
>>>>> *rqstp,
>>>>> 	return status;
>>>>> }
>>>>> 
>>>>> +/*
>>>>> + * Decode READ_PLUS request
>>>>> + */
>>>>> +static int nfs4_xdr_dec_read_plus(struct rpc_rqst *rqstp,
>>>>> +				  struct xdr_stream *xdr,
>>>>> +				  void *data)
>>>>> +{
>>>>> +	struct nfs_pgio_res *res = data;
>>>>> +	struct compound_hdr hdr;
>>>>> +	int status;
>>>>> +
>>>>> +	status = decode_compound_hdr(xdr, &hdr);
>>>>> +	if (status)
>>>>> +		goto out;
>>>>> +	status = decode_sequence(xdr, &res->seq_res, rqstp);
>>>>> +	if (status)
>>>>> +		goto out;
>>>>> +	status = decode_putfh(xdr);
>>>>> +	if (status)
>>>>> +		goto out;
>>>>> +	status = decode_read_plus(xdr, res);
>>>>> +	if (!status)
>>>>> +		status = res->count;
>>>>> +out:
>>>>> +	return status;
>>>>> +}
>>>>> +
>>>>> /*
>>>>> * Decode SEEK request
>>>>> */
>>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>>> index 95d07a3dc5d1..ed3ec8c36273 100644
>>>>> --- a/fs/nfs/nfs4proc.c
>>>>> +++ b/fs/nfs/nfs4proc.c
>>>>> @@ -69,6 +69,10 @@
>>>>> 
>>>>> #include "nfs4trace.h"
>>>>> 
>>>>> +#ifdef CONFIG_NFS_V4_2
>>>>> +#include "nfs42.h"
>>>>> +#endif /* CONFIG_NFS_V4_2 */
>>>>> +
>>>>> #define NFSDBG_FACILITY		NFSDBG_PROC
>>>>> 
>>>>> #define NFS4_BITMASK_SZ		3
>>>>> @@ -5199,28 +5203,60 @@ static bool nfs4_read_stateid_changed(struct
>>>>> rpc_task *task,
>>>>> 	return true;
>>>>> }
>>>>> 
>>>>> +static bool nfs4_read_plus_not_supported(struct rpc_task *task,
>>>>> +					struct nfs_pgio_header *hdr)
>>>>> +{
>>>>> +	struct nfs_server *server = NFS_SERVER(hdr->inode);
>>>>> +	struct rpc_message *msg = &task->tk_msg;
>>>>> +
> > > > > > +	if (msg->rpc_proc == &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS] &&
> > > > > > +	    server->caps & NFS_CAP_READ_PLUS && task->tk_status == -ENOTSUPP) {
>>>>> +		server->caps &= ~NFS_CAP_READ_PLUS;
>>>>> +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
>>>>> +		rpc_restart_call_prepare(task);
>>>>> +		return true;
>>>>> +	}
>>>>> +	return false;
>>>>> +}
>>>>> +
> > > > > > static int nfs4_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
>>>>> {
>>>>> -
>>>>> 	dprintk("--> %s\n", __func__);
>>>>> 
>>>>> 	if (!nfs4_sequence_done(task, &hdr->res.seq_res))
>>>>> 		return -EAGAIN;
>>>>> 	if (nfs4_read_stateid_changed(task, &hdr->args))
>>>>> 		return -EAGAIN;
>>>>> +	if (nfs4_read_plus_not_supported(task, hdr))
>>>>> +		return -EAGAIN;
>>>>> 	if (task->tk_status > 0)
>>>>> 		nfs_invalidate_atime(hdr->inode);
>>>>> 	return hdr->pgio_done_cb ? hdr->pgio_done_cb(task, hdr) :
>>>>> 				    nfs4_read_done_cb(task, hdr);
>>>>> }
>>>>> 
>>>>> +#ifdef CONFIG_NFS_V4_2
> > > > > > +static void nfs42_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
>>>>> +{
>>>>> +	if (server->caps & NFS_CAP_READ_PLUS)
> > > > > > +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS];
>>>>> +	else
>>>>> +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
>>>>> +}
>>>>> +#else
> > > > > > +static void nfs42_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
>>>>> +{
>>>>> +	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
>>>>> +}
>>>>> +#endif /* CONFIG_NFS_V4_2 */
>>>>> +
>>>>> static void nfs4_proc_read_setup(struct nfs_pgio_header *hdr,
>>>>> 				 struct rpc_message *msg)
>>>>> {
>>>>> 	hdr->timestamp   = jiffies;
>>>>> 	if (!hdr->pgio_done_cb)
>>>>> 		hdr->pgio_done_cb = nfs4_read_done_cb;
>>>>> -	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
>>>>> +	nfs42_read_plus_support(NFS_SERVER(hdr->inode), msg);
> > > > > > 	nfs4_init_sequence(&hdr->args.seq_args, &hdr->res.seq_res, 0, 0);
>>>>> }
>>>>> 
>>>>> @@ -9970,7 +10006,8 @@ static const struct nfs4_minor_version_ops
>>>>> nfs_v4_2_minor_ops = {
>>>>> 		| NFS_CAP_SEEK
>>>>> 		| NFS_CAP_LAYOUTSTATS
>>>>> 		| NFS_CAP_CLONE
>>>>> -		| NFS_CAP_LAYOUTERROR,
>>>>> +		| NFS_CAP_LAYOUTERROR
>>>>> +		| NFS_CAP_READ_PLUS,
>>>>> 	.init_client = nfs41_init_client,
>>>>> 	.shutdown_client = nfs41_shutdown_client,
>>>>> 	.match_stateid = nfs41_match_stateid,
>>>>> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
>>>>> index 47817ef0aadb..68b2917d0537 100644
>>>>> --- a/fs/nfs/nfs4xdr.c
>>>>> +++ b/fs/nfs/nfs4xdr.c
>>>>> @@ -7584,6 +7584,7 @@ const struct rpc_procinfo nfs4_procedures[] = {
>>>>> 	PROC42(COPY_NOTIFY,	enc_copy_notify,	dec_copy_notify),
>>>>> 	PROC(LOOKUPP,		enc_lookupp,		dec_lookupp),
>>>>> 	PROC42(LAYOUTERROR,	enc_layouterror,	dec_layouterror),
>>>>> +	PROC42(READ_PLUS,	enc_read_plus,		dec_read_plus),
>>>>> };
>>>>> 
>>>>> static unsigned int nfs_version4_counts[ARRAY_SIZE(nfs4_procedures)];
>>>>> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
>>>>> index 82d8fb422092..c1eeef52545c 100644
>>>>> --- a/include/linux/nfs4.h
>>>>> +++ b/include/linux/nfs4.h
>>>>> @@ -540,8 +540,8 @@ enum {
>>>>> 
>>>>> 	NFSPROC4_CLNT_LOOKUPP,
>>>>> 	NFSPROC4_CLNT_LAYOUTERROR,
>>>>> -
>>>>> 	NFSPROC4_CLNT_COPY_NOTIFY,
>>>>> +	NFSPROC4_CLNT_READ_PLUS,
>>>>> };
>>>>> 
>>>>> /* nfs41 types */
>>>>> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
>>>>> index 465fa98258a3..11248c5a7b24 100644
>>>>> --- a/include/linux/nfs_fs_sb.h
>>>>> +++ b/include/linux/nfs_fs_sb.h
>>>>> @@ -281,5 +281,6 @@ struct nfs_server {
>>>>> #define NFS_CAP_OFFLOAD_CANCEL	(1U << 25)
>>>>> #define NFS_CAP_LAYOUTERROR	(1U << 26)
>>>>> #define NFS_CAP_COPY_NOTIFY	(1U << 27)
>>>>> +#define NFS_CAP_READ_PLUS	(1U << 28)
>>>>> 
>>>>> #endif
>>>>> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
>>>>> index 94c77ed55ce1..8efbf3d8b263 100644
>>>>> --- a/include/linux/nfs_xdr.h
>>>>> +++ b/include/linux/nfs_xdr.h
>>>>> @@ -655,7 +655,7 @@ struct nfs_pgio_args {
>>>>> struct nfs_pgio_res {
>>>>> 	struct nfs4_sequence_res	seq_res;
>>>>> 	struct nfs_fattr *	fattr;
>>>>> -	__u32			count;
>>>>> +	__u64			count;
>>>>> 	__u32			op_status;
>>>>> 	union {
>>>>> 		struct {
>>>>> -- 
>>>>> 2.25.0
>>>>> 
>>>> 
>>>> --
>>>> Chuck Lever
>> 
>> --
>> Chuck Lever

--
Chuck Lever





* Re: [PATCH v2 4/6] NFS: Add READ_PLUS data segment support
  2020-02-20 18:30           ` Chuck Lever
@ 2020-02-20 18:35             ` Anna Schumaker
  2020-02-20 18:54               ` Chuck Lever
  0 siblings, 1 reply; 17+ messages in thread
From: Anna Schumaker @ 2020-02-20 18:35 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond.Myklebust, Linux NFS Mailing List

On Thu, 2020-02-20 at 13:30 -0500, Chuck Lever wrote:
> > On Feb 20, 2020, at 1:28 PM, Anna Schumaker <schumaker.anna@gmail.com>
> > wrote:
> > 
> > On Thu, 2020-02-20 at 09:55 -0500, Chuck Lever wrote:
> > > > On Feb 20, 2020, at 9:42 AM, Anna Schumaker <schumaker.anna@gmail.com>
> > > > wrote:
> > > > 
> > > > On Fri, 2020-02-14 at 17:28 -0500, Chuck Lever wrote:
> > > > > > On Feb 14, 2020, at 4:12 PM, schumaker.anna@gmail.com wrote:
> > > > > > 
> > > > > > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > > > > > 
> > > > > > This patch adds client support for decoding a single
> > > > > > NFS4_CONTENT_DATA
> > > > > > segment returned by the server. This is the simplest implementation
> > > > > > possible, since it does not account for any hole segments in the
> > > > > > reply.
> > > > > > 
> > > > > > Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > > > > > ---
> > > > > > fs/nfs/nfs42xdr.c         | 138 ++++++++++++++++++++++++++++++++++++++
> > > > > > fs/nfs/nfs4proc.c         |  43 +++++++++++-
> > > > > > fs/nfs/nfs4xdr.c          |   1 +
> > > > > > include/linux/nfs4.h      |   2 +-
> > > > > > include/linux/nfs_fs_sb.h |   1 +
> > > > > > include/linux/nfs_xdr.h   |   2 +-
> > > > > > 6 files changed, 182 insertions(+), 5 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
> > > > > > index c03f3246d6c5..bf118ecabe2c 100644
> > > > > > --- a/fs/nfs/nfs42xdr.c
> > > > > > +++ b/fs/nfs/nfs42xdr.c
> > > > > > @@ -45,6 +45,15 @@
> > > > > > #define encode_deallocate_maxsz		(op_encode_hdr_maxsz + \
> > > > > > 					encode_fallocate_maxsz)
> > > > > > #define decode_deallocate_maxsz		(op_decode_hdr_maxsz)
> > > > > > +#define encode_read_plus_maxsz		(op_encode_hdr_maxsz + \
> > > > > > +					encode_stateid_maxsz + 3)
> > > > > > +#define NFS42_READ_PLUS_SEGMENT_SIZE	(1 /* data_content4 */ + \
> > > > > > +					 2 /* data_info4.di_offset */ + \
> > > > > > +					 2 /* data_info4.di_length */)
> > > > > > +#define decode_read_plus_maxsz		(op_decode_hdr_maxsz + \
> > > > > > +					 1 /* rpr_eof */ + \
> > > > > > +					 1 /* rpr_contents count */ + \
> > > > > > +					NFS42_READ_PLUS_SEGMENT_SIZE)
> > > > > > #define encode_seek_maxsz		(op_encode_hdr_maxsz + \
> > > > > > 					encode_stateid_maxsz + \
> > > > > > 					 2 /* offset */ + \
> > > > > > @@ -128,6 +137,14 @@
> > > > > > 					decode_putfh_maxsz + \
> > > > > > 					decode_deallocate_maxsz + \
> > > > > > 					decode_getattr_maxsz)
> > > > > > +#define NFS4_enc_read_plus_sz		(compound_encode_hdr_maxsz + \
> > > > > > +					encode_sequence_maxsz + \
> > > > > > +					encode_putfh_maxsz + \
> > > > > > +					encode_read_plus_maxsz)
> > > > > > +#define NFS4_dec_read_plus_sz		(compound_decode_hdr_maxsz + \
> > > > > > +					decode_sequence_maxsz + \
> > > > > > +					decode_putfh_maxsz + \
> > > > > > +					decode_read_plus_maxsz)
> > > > > > #define NFS4_enc_seek_sz		(compound_encode_hdr_maxsz + \
> > > > > > 					encode_sequence_maxsz + \
> > > > > > 					encode_putfh_maxsz + \
> > > > > > @@ -252,6 +269,16 @@ static void encode_deallocate(struct xdr_stream *xdr,
> > > > > > 	encode_fallocate(xdr, args);
> > > > > > }
> > > > > > 
> > > > > > +static void encode_read_plus(struct xdr_stream *xdr,
> > > > > > +			     const struct nfs_pgio_args *args,
> > > > > > +			     struct compound_hdr *hdr)
> > > > > > +{
> > > > > > +	encode_op_hdr(xdr, OP_READ_PLUS, decode_read_plus_maxsz, hdr);
> > > > > > +	encode_nfs4_stateid(xdr, &args->stateid);
> > > > > > +	encode_uint64(xdr, args->offset);
> > > > > > +	encode_uint32(xdr, args->count);
> > > > > > +}
> > > > > > +
> > > > > > static void encode_seek(struct xdr_stream *xdr,
> > > > > > 			const struct nfs42_seek_args *args,
> > > > > > 			struct compound_hdr *hdr)
> > > > > > @@ -446,6 +473,29 @@ static void nfs4_xdr_enc_deallocate(struct rpc_rqst *req,
> > > > > > 	encode_nops(&hdr);
> > > > > > }
> > > > > > 
> > > > > > +/*
> > > > > > + * Encode READ_PLUS request
> > > > > > + */
> > > > > > +static void nfs4_xdr_enc_read_plus(struct rpc_rqst *req,
> > > > > > +				   struct xdr_stream *xdr,
> > > > > > +				   const void *data)
> > > > > > +{
> > > > > > +	const struct nfs_pgio_args *args = data;
> > > > > > +	struct compound_hdr hdr = {
> > > > > > +		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
> > > > > > +	};
> > > > > > +
> > > > > > +	encode_compound_hdr(xdr, req, &hdr);
> > > > > > +	encode_sequence(xdr, &args->seq_args, &hdr);
> > > > > > +	encode_putfh(xdr, args->fh, &hdr);
> > > > > > +	encode_read_plus(xdr, args, &hdr);
> > > > > > +
> > > > > > +	rpc_prepare_reply_pages(req, args->pages, args->pgbase,
> > > > > > +				args->count, hdr.replen);
> > > > > > +	req->rq_rcv_buf.flags |= XDRBUF_READ;
> > > > > 
> > > > > IMO this line is incorrect.
> > > > 
> > > > You're right, this line causes problems for RDMA with READ_PLUS. I added it
> > > > to match how the other xdr read encoders were set up.
> > > 
> > > Ja, I think just removing that line should be sufficient.
> > > Better would be replacing it with a comment explaining
> > > why this encoder does not set XDRBUF_READ. :-)
> > > 
> > > 
> > > > > RFC 8267 Section 6.1 does not list any part of the result of READ_PLUS
> > > > > as DDP-eligible. There's no way for a client to know how to set up
> > > > > Write chunks, unless it knows exactly where the file's holes are in
> > > > > advance. Even then... racy.
> > > > > 
> > > > > Just curious, have you tried READ_PLUS with proto=rdma ?
> > > > 
> > > > I haven't done in-depth performance testing, but I have been able to run
> > > > it.
> > > 
> > > We should figure out whether that will have a regressive
> > > impact on NFS/RDMA workloads. I expect that it will, but
> > > the client can always set up the Reply chunk so that the
> > > READ payload fits precisely in an RDMA segment that lines
> > > up with page cache pages. That mitigates some impact.
> > > 
> > > If your patch set already changes NFSv4.2 mounts to always
> > > use READ_PLUS in place of READ, it might be prudent for the
> > > "proto=rdma" mount option to also set "noreadplus", at least
> > > for the time being.
> > 
> > I can make this change.
> > 
> > > The down-side here is that would make NFSv4.2 on RDMA
> > > unable to recognize holes in files the same way as it
> > > does on TCP, and that's a pretty significant variation
> > > in behavior. Does "noreadplus" even deal with that?
> > 
> > Setting "noreadplus" just causes the client to use the READ operation
> > instead,
> > so there should be no difference between v4.1 and v4.2 if the option is set.
> 
> My concern is the difference between NFSv4.2 with noreadplus
> and NFSv4.2 with readplus. The former is not able to detect
> holes in files on the server, but the latter is.

The client could still use lseek to detect holes. The client throws away the
hole information after xdr decoding, and zeroes out the corresponding pages for
the page cache.
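[Editorial illustration, not part of the patch: the effect of a decoded hole segment can be modeled in plain C. The helper name below is hypothetical; the in-kernel code operates on the xdr_buf's page vector, not a flat buffer.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace model of hole expansion: a HOLE segment carries only
 * (offset, length), no payload, so the client zero-fills the bytes
 * it covers in the read window.  Illustrative sketch only.
 */
static void expand_hole(char *window, uint64_t win_off, size_t win_len,
			uint64_t hole_off, uint64_t hole_len)
{
	uint64_t start = hole_off > win_off ? hole_off : win_off;
	uint64_t end = hole_off + hole_len;

	if (end > win_off + win_len)
		end = win_off + win_len;	/* clamp to this window */
	if (start >= end)
		return;				/* no overlap */

	memset(window + (start - win_off), 0, end - start);
}
```

A reply alternating data and hole segments would do this once per hole; pages falling entirely inside a hole end up zero-filled in the page cache, which is why the hole information itself can be discarded after decoding.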

> 
> Is that worth mentioning in the man page, or in release notes
> when NFSv4.2 becomes the default?
> 
> 
> > Anna
> > 
> > > 
> > > > Anna
> > > > 
> > > > > > +	encode_nops(&hdr);
> > > > > > +}
> > > > > > +
> > > > > > /*
> > > > > > * Encode SEEK request
> > > > > > */
> > > > > > @@ -694,6 +744,67 @@ static int decode_deallocate(struct xdr_stream *xdr, struct nfs42_falloc_res *re
> > > > > > 	return decode_op_hdr(xdr, OP_DEALLOCATE);
> > > > > > }
> > > > > > 
> > > > > > +static uint32_t decode_read_plus_data(struct xdr_stream *xdr, struct nfs_pgio_res *res,
> > > > > > +				      uint32_t *eof)
> > > > > > +{
> > > > > > +	__be32 *p;
> > > > > > +	uint32_t count, recvd;
> > > > > > +	uint64_t offset;
> > > > > > +
> > > > > > +	p = xdr_inline_decode(xdr, 8 + 4);
> > > > > > +	if (unlikely(!p))
> > > > > > +		return -EIO;
> > > > > > +
> > > > > > +	p = xdr_decode_hyper(p, &offset);
> > > > > > +	count = be32_to_cpup(p);
> > > > > > +	if (count == 0)
> > > > > > +		return 0;
> > > > > > +
> > > > > > +	recvd = xdr_read_pages(xdr, count);
> > > > > > +	if (count > recvd) {
> > > > > > +		dprintk("NFS: server cheating in read reply: "
> > > > > > +				"count %u > recvd %u\n", count, recvd);
> > > > > > +		count = recvd;
> > > > > > +		*eof = 0;
> > > > > > +	}
> > > > > > +
> > > > > > +	return count;
> > > > > > +}
> > > > > > +
> > > > > > +static int decode_read_plus(struct xdr_stream *xdr, struct nfs_pgio_res *res)
> > > > > > +{
> > > > > > +	__be32 *p;
> > > > > > +	uint32_t count, eof, segments, type;
> > > > > > +	int status;
> > > > > > +
> > > > > > +	status = decode_op_hdr(xdr, OP_READ_PLUS);
> > > > > > +	if (status)
> > > > > > +		return status;
> > > > > > +
> > > > > > +	p = xdr_inline_decode(xdr, 4 + 4);
> > > > > > +	if (unlikely(!p))
> > > > > > +		return -EIO;
> > > > > > +
> > > > > > +	eof = be32_to_cpup(p++);
> > > > > > +	segments = be32_to_cpup(p++);
> > > > > > +	if (segments == 0)
> > > > > > +		return 0;
> > > > > > +
> > > > > > +	p = xdr_inline_decode(xdr, 4);
> > > > > > +	if (unlikely(!p))
> > > > > > +		return -EIO;
> > > > > > +
> > > > > > +	type = be32_to_cpup(p++);
> > > > > > +	if (type == NFS4_CONTENT_DATA)
> > > > > > +		count = decode_read_plus_data(xdr, res, &eof);
> > > > > > +	else
> > > > > > +		return -EINVAL;
> > > > > > +
> > > > > > +	res->eof = eof;
> > > > > > +	res->count = count;
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +
> > > > > > static int decode_seek(struct xdr_stream *xdr, struct nfs42_seek_res
> > > > > > *res)
> > > > > > {
> > > > > > 	int status;
> > > > > > @@ -870,6 +981,33 @@ static int nfs4_xdr_dec_deallocate(struct rpc_rqst *rqstp,
> > > > > > 	return status;
> > > > > > }
> > > > > > 
> > > > > > +/*
> > > > > > + * Decode READ_PLUS request
> > > > > > + */
> > > > > > +static int nfs4_xdr_dec_read_plus(struct rpc_rqst *rqstp,
> > > > > > +				  struct xdr_stream *xdr,
> > > > > > +				  void *data)
> > > > > > +{
> > > > > > +	struct nfs_pgio_res *res = data;
> > > > > > +	struct compound_hdr hdr;
> > > > > > +	int status;
> > > > > > +
> > > > > > +	status = decode_compound_hdr(xdr, &hdr);
> > > > > > +	if (status)
> > > > > > +		goto out;
> > > > > > +	status = decode_sequence(xdr, &res->seq_res, rqstp);
> > > > > > +	if (status)
> > > > > > +		goto out;
> > > > > > +	status = decode_putfh(xdr);
> > > > > > +	if (status)
> > > > > > +		goto out;
> > > > > > +	status = decode_read_plus(xdr, res);
> > > > > > +	if (!status)
> > > > > > +		status = res->count;
> > > > > > +out:
> > > > > > +	return status;
> > > > > > +}
> > > > > > +
> > > > > > /*
> > > > > > * Decode SEEK request
> > > > > > */
> > > > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > > > > index 95d07a3dc5d1..ed3ec8c36273 100644
> > > > > > --- a/fs/nfs/nfs4proc.c
> > > > > > +++ b/fs/nfs/nfs4proc.c
> > > > > > @@ -69,6 +69,10 @@
> > > > > > 
> > > > > > #include "nfs4trace.h"
> > > > > > 
> > > > > > +#ifdef CONFIG_NFS_V4_2
> > > > > > +#include "nfs42.h"
> > > > > > +#endif /* CONFIG_NFS_V4_2 */
> > > > > > +
> > > > > > #define NFSDBG_FACILITY		NFSDBG_PROC
> > > > > > 
> > > > > > #define NFS4_BITMASK_SZ		3
> > > > > > @@ -5199,28 +5203,60 @@ static bool nfs4_read_stateid_changed(struct rpc_task *task,
> > > > > > 	return true;
> > > > > > }
> > > > > > 
> > > > > > +static bool nfs4_read_plus_not_supported(struct rpc_task *task,
> > > > > > +					struct nfs_pgio_header *hdr)
> > > > > > +{
> > > > > > +	struct nfs_server *server = NFS_SERVER(hdr->inode);
> > > > > > +	struct rpc_message *msg = &task->tk_msg;
> > > > > > +
> > > > > > +	if (msg->rpc_proc == &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS] &&
> > > > > > +	    server->caps & NFS_CAP_READ_PLUS && task->tk_status == -ENOTSUPP) {
> > > > > > +		server->caps &= ~NFS_CAP_READ_PLUS;
> > > > > > +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> > > > > > +		rpc_restart_call_prepare(task);
> > > > > > +		return true;
> > > > > > +	}
> > > > > > +	return false;
> > > > > > +}
> > > > > > +
> > > > > > static int nfs4_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
> > > > > > {
> > > > > > -
> > > > > > 	dprintk("--> %s\n", __func__);
> > > > > > 
> > > > > > 	if (!nfs4_sequence_done(task, &hdr->res.seq_res))
> > > > > > 		return -EAGAIN;
> > > > > > 	if (nfs4_read_stateid_changed(task, &hdr->args))
> > > > > > 		return -EAGAIN;
> > > > > > +	if (nfs4_read_plus_not_supported(task, hdr))
> > > > > > +		return -EAGAIN;
> > > > > > 	if (task->tk_status > 0)
> > > > > > 		nfs_invalidate_atime(hdr->inode);
> > > > > > 	return hdr->pgio_done_cb ? hdr->pgio_done_cb(task, hdr) :
> > > > > > 				    nfs4_read_done_cb(task, hdr);
> > > > > > }
> > > > > > 
> > > > > > +#ifdef CONFIG_NFS_V4_2
> > > > > > +static void nfs42_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
> > > > > > +{
> > > > > > +	if (server->caps & NFS_CAP_READ_PLUS)
> > > > > > +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS];
> > > > > > +	else
> > > > > > +		msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> > > > > > +}
> > > > > > +#else
> > > > > > +static void nfs42_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
> > > > > > +{
> > > > > > +	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> > > > > > +}
> > > > > > +#endif /* CONFIG_NFS_V4_2 */
> > > > > > +
> > > > > > static void nfs4_proc_read_setup(struct nfs_pgio_header *hdr,
> > > > > > 				 struct rpc_message *msg)
> > > > > > {
> > > > > > 	hdr->timestamp   = jiffies;
> > > > > > 	if (!hdr->pgio_done_cb)
> > > > > > 		hdr->pgio_done_cb = nfs4_read_done_cb;
> > > > > > -	msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> > > > > > +	nfs42_read_plus_support(NFS_SERVER(hdr->inode), msg);
> > > > > > +	nfs4_init_sequence(&hdr->args.seq_args, &hdr->res.seq_res, 0, 0);
> > > > > > }
> > > > > > 
> > > > > > @@ -9970,7 +10006,8 @@ static const struct nfs4_minor_version_ops nfs_v4_2_minor_ops = {
> > > > > > 		| NFS_CAP_SEEK
> > > > > > 		| NFS_CAP_LAYOUTSTATS
> > > > > > 		| NFS_CAP_CLONE
> > > > > > -		| NFS_CAP_LAYOUTERROR,
> > > > > > +		| NFS_CAP_LAYOUTERROR
> > > > > > +		| NFS_CAP_READ_PLUS,
> > > > > > 	.init_client = nfs41_init_client,
> > > > > > 	.shutdown_client = nfs41_shutdown_client,
> > > > > > 	.match_stateid = nfs41_match_stateid,
> > > > > > diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> > > > > > index 47817ef0aadb..68b2917d0537 100644
> > > > > > --- a/fs/nfs/nfs4xdr.c
> > > > > > +++ b/fs/nfs/nfs4xdr.c
> > > > > > @@ -7584,6 +7584,7 @@ const struct rpc_procinfo nfs4_procedures[] = {
> > > > > > 	PROC42(COPY_NOTIFY,	enc_copy_notify,	dec_copy_notify),
> > > > > > 	PROC(LOOKUPP,		enc_lookupp,		dec_lookupp),
> > > > > > 	PROC42(LAYOUTERROR,	enc_layouterror,	dec_layouterror),
> > > > > > +	PROC42(READ_PLUS,	enc_read_plus,		dec_read_plus),
> > > > > > };
> > > > > > 
> > > > > > static unsigned int nfs_version4_counts[ARRAY_SIZE(nfs4_procedures)];
> > > > > > diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> > > > > > index 82d8fb422092..c1eeef52545c 100644
> > > > > > --- a/include/linux/nfs4.h
> > > > > > +++ b/include/linux/nfs4.h
> > > > > > @@ -540,8 +540,8 @@ enum {
> > > > > > 
> > > > > > 	NFSPROC4_CLNT_LOOKUPP,
> > > > > > 	NFSPROC4_CLNT_LAYOUTERROR,
> > > > > > -
> > > > > > 	NFSPROC4_CLNT_COPY_NOTIFY,
> > > > > > +	NFSPROC4_CLNT_READ_PLUS,
> > > > > > };
> > > > > > 
> > > > > > /* nfs41 types */
> > > > > > diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> > > > > > index 465fa98258a3..11248c5a7b24 100644
> > > > > > --- a/include/linux/nfs_fs_sb.h
> > > > > > +++ b/include/linux/nfs_fs_sb.h
> > > > > > @@ -281,5 +281,6 @@ struct nfs_server {
> > > > > > #define NFS_CAP_OFFLOAD_CANCEL	(1U << 25)
> > > > > > #define NFS_CAP_LAYOUTERROR	(1U << 26)
> > > > > > #define NFS_CAP_COPY_NOTIFY	(1U << 27)
> > > > > > +#define NFS_CAP_READ_PLUS	(1U << 28)
> > > > > > 
> > > > > > #endif
> > > > > > diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> > > > > > index 94c77ed55ce1..8efbf3d8b263 100644
> > > > > > --- a/include/linux/nfs_xdr.h
> > > > > > +++ b/include/linux/nfs_xdr.h
> > > > > > @@ -655,7 +655,7 @@ struct nfs_pgio_args {
> > > > > > struct nfs_pgio_res {
> > > > > > 	struct nfs4_sequence_res	seq_res;
> > > > > > 	struct nfs_fattr *	fattr;
> > > > > > -	__u32			count;
> > > > > > +	__u64			count;
> > > > > > 	__u32			op_status;
> > > > > > 	union {
> > > > > > 		struct {
> > > > > > -- 
> > > > > > 2.25.0
> > > > > > 
> > > > > 
> > > > > --
> > > > > Chuck Lever
> > > 
> > > --
> > > Chuck Lever
> 
> --
> Chuck Lever

* Re: [PATCH v2 4/6] NFS: Add READ_PLUS data segment support
  2020-02-20 18:35             ` Anna Schumaker
@ 2020-02-20 18:54               ` Chuck Lever
  0 siblings, 0 replies; 17+ messages in thread
From: Chuck Lever @ 2020-02-20 18:54 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Trond.Myklebust, Linux NFS Mailing List


> On Feb 20, 2020, at 1:35 PM, Anna Schumaker <schumaker.anna@gmail.com> wrote:
> 
> On Thu, 2020-02-20 at 13:30 -0500, Chuck Lever wrote:
>>> On Feb 20, 2020, at 1:28 PM, Anna Schumaker <schumaker.anna@gmail.com>
>>> wrote:
>>> 
>>> On Thu, 2020-02-20 at 09:55 -0500, Chuck Lever wrote:
>>> 
>>>> The down-side here is that would make NFSv4.2 on RDMA
>>>> unable to recognize holes in files the same way as it
>>>> does on TCP, and that's a pretty significant variation
>>>> in behavior. Does "noreadplus" even deal with that?
>>> 
>>> Setting "noreadplus" just causes the client to use the READ operation
>>> instead,
>>> so there should be no difference between v4.1 and v4.2 if the option is set.
>> 
>> My concern is the difference between NFSv4.2 with noreadplus
>> and NFSv4.2 with readplus. The former is not able to detect
>> holes in files on the server, but the latter is.
> 
> The client could still use lseek to detect holes. The client throws away the
> hole information after xdr decoding, and zeroes out the corresponding pages for
> the page cache.

Then maybe that's an optimization for another day. The READ_PLUS
reply can have hole information. The client could save itself some
SEEK_HOLE operations if it cached that hole information somehow.
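[Editorial illustration: the SEEK_HOLE probing being discussed looks like this from userspace. A filesystem that does not track holes simply reports end-of-file as the first "hole", so callers cannot assume a real hole was found.]

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Sketch of hole detection via lseek(2): returns the offset of the
 * first hole at or after 'from', or -1 on error (ENXIO past EOF, or
 * EINVAL on kernels/filesystems without SEEK_HOLE support).
 */
static off_t first_hole(int fd, off_t from)
{
	return lseek(fd, from, SEEK_HOLE);
}
```

The only portable guarantee is that a successful result lies between `from` and the file size; distinguishing "no holes" from "holes not tracked" would take a matching SEEK_DATA probe.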

But if "readplus" and "noreadplus" result in exactly the same hole
retention behavior on our client, I guess there's no harm in using
"noreadplus" instead, except for possible performance regression.

An alternative to making "noreadplus" the default would be to
temporarily add "noreadplus" semantics to the "proto=rdma" mount
option, as a separate patch; maybe after some performance results
show that it is necessary. That would keep the mount interface a
little less complicated.
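[Editorial illustration: the "proto=rdma implies noreadplus" idea amounts to masking one capability bit at mount time. A toy model follows; NFS_CAP_READ_PLUS matches the value in the patch, but the transport enum and function name are illustrative, not the actual client code.]

```c
#include <assert.h>

/* Capability bit as defined in the patch (include/linux/nfs_fs_sb.h). */
#define NFS_CAP_READ_PLUS	(1U << 28)

/* Hypothetical transport tag for this sketch. */
enum transport { XPRT_TCP, XPRT_RDMA };

/*
 * Toy model of the proposed policy: an RDMA mount behaves as if
 * "noreadplus" were given, so the READ_PLUS capability is cleared
 * before the first READ is sent and the client falls back to READ.
 */
static unsigned int adjust_server_caps(unsigned int caps, enum transport xprt)
{
	if (xprt == XPRT_RDMA)
		caps &= ~NFS_CAP_READ_PLUS;
	return caps;
}
```

Keeping the policy in one place like this is what makes it easy to drop later, once performance data shows whether READ_PLUS over RDMA actually regresses.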


--
Chuck Lever



Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-14 21:12 [PATCH v2 0/6] NFS: Add support for the v4.2 READ_PLUS operation schumaker.anna
2020-02-14 21:12 ` [PATCH v2 1/6] SUNRPC: Split out a function for setting current page schumaker.anna
2020-02-14 21:12 ` [PATCH v2 2/6] SUNRPC: Add the ability to expand holes in data pages schumaker.anna
2020-02-14 21:12 ` [PATCH v2 3/6] SUNRPC: Add the ability to shift data to a specific offset schumaker.anna
2020-02-14 21:12 ` [PATCH v2 4/6] NFS: Add READ_PLUS data segment support schumaker.anna
2020-02-14 22:28   ` Chuck Lever
2020-02-20 14:42     ` Anna Schumaker
2020-02-20 14:55       ` Chuck Lever
2020-02-20 18:28         ` Anna Schumaker
2020-02-20 18:30           ` Chuck Lever
2020-02-20 18:35             ` Anna Schumaker
2020-02-20 18:54               ` Chuck Lever
2020-02-14 21:12 ` [PATCH v2 5/6] NFS: Add READ_PLUS hole segment decoding schumaker.anna
2020-02-14 21:12 ` [PATCH v2 6/6] NFS: Decode multiple READ_PLUS segments schumaker.anna
2020-02-20 17:40 ` [PATCH v2 0/6] NFS: Add support for the v4.2 READ_PLUS operation Chuck Lever
2020-02-20 18:25   ` Anna Schumaker
2020-02-20 18:29     ` Chuck Lever
