linux-nfs.vger.kernel.org archive mirror
* [PATCH v4 0/2] NFSD: Simplify READ_PLUS
@ 2022-09-13 18:01 Anna Schumaker
  2022-09-13 18:01 ` [PATCH v4 1/2] NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data Anna Schumaker
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Anna Schumaker @ 2022-09-13 18:01 UTC (permalink / raw)
  To: linux-nfs, chuck.lever; +Cc: anna

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

When we left off with READ_PLUS, Chuck had suggested reverting the
server to reply with a single NFS4_CONTENT_DATA segment essentially
mimicking how the READ operation behaves. Then, a future sparse read
function can be added and the server modified to support it without
needing to rip out the old READ_PLUS code at the same time.
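
To make the change concrete, the reply this produces on the wire is just
READ's payload wrapped in a single segment. A rough sketch of the layout
(field widths per RFC 7862; an illustration, not the generated XDR):

    eof             4 bytes   set when the read reaches end of file
    segment count   4 bytes   1 (or 0 when the read starts at or past EOF)
    content type    4 bytes   NFS4_CONTENT_DATA
    offset          8 bytes   read->rd_offset
    byte count      4 bytes   length of the data that follows
    data            n bytes   read payload, XDR padded to a 4-byte boundary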

This patch takes that first step. I was even able to re-use the
nfsd4_encode_readv() and nfsd4_encode_splice_read() functions to
remove some duplicate code.

Below is some performance data comparing the READ and READ_PLUS
operations with v4.2. I tested reading 2G files with various hole
lengths including 100% data, 100% hole, and a handful of mixed hole and
data files. For the mixed files, a notation like "1d" means
every-other-page is data, and the first page is data. "4h" would mean
alternating 4 pages data and 4 pages hole, beginning with hole.
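
As a rough illustration of how the mixed files are laid out, something
like the C sketch below produces one (the file name, fill byte, sizes,
and 4K page size are placeholders, not my exact generator):

    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Sketch: build a sparse file alternating 'chunk' pages of data with
     * 'chunk' pages of hole ("2d" => chunk = 2, starting with data). */
    int main(void)
    {
        const size_t page = 4096;
        const size_t chunk = 2;
        const size_t total_pages = 2048UL * 1024 * 1024 / page;
        char *buf = malloc(page);
        int fd = open("2048M-mixed-2d", O_CREAT | O_TRUNC | O_WRONLY, 0644);
        size_t i;

        if (!buf || fd < 0)
            return 1;
        memset(buf, 0xab, page);
        for (i = 0; i < total_pages; i++) {
            if ((i / chunk) % 2 == 0) {         /* data run */
                if (write(fd, buf, page) != (ssize_t)page)
                    return 1;
            } else {                            /* hole run: seek past it */
                lseek(fd, (off_t)page, SEEK_CUR);
            }
        }
        /* Extend to the full size in case the file ends in a hole. */
        ftruncate(fd, (off_t)(total_pages * page));
        close(fd);
        return 0;
    }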

I also used the 'vmtouch' utility to make sure the file is either
evicted from the server's pagecache ("Uncached on server") or present in
the server's page cache ("Cached on server").
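
(If vmtouch isn't available, the eviction half can be approximated in
plain C with posix_fadvise(); this is a sketch of the idea, not the tool
I actually ran:)

    #include <fcntl.h>
    #include <unistd.h>

    /* Ask the kernel to drop cached pages for a file so the next read
     * goes to disk. Dirty pages are not dropped, hence the fsync(). */
    static void evict_from_pagecache(const char *path)
    {
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            return;
        fsync(fd);
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        close(fd);
    }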

   2048M-data
   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.555 s, 712 MB/s, 0.74 s kern, 24% cpu
   :    :........................... Cached on server .....  1.346 s, 1.6 GB/s, 0.69 s kern, 52% cpu
   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.596 s, 690 MB/s, 0.72 s kern, 23% cpu
        :........................... Cached on server .....  1.394 s, 1.6 GB/s, 0.67 s kern, 48% cpu
   2048M-hole
   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  4.934 s, 762 MB/s, 1.86 s kern, 29% cpu
   :    :........................... Cached on server .....  1.328 s, 1.6 GB/s, 0.72 s kern, 54% cpu
   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  4.823 s, 739 MB/s, 1.88 s kern, 28% cpu
        :........................... Cached on server .....  1.399 s, 1.5 GB/s, 0.70 s kern, 50% cpu
   2048M-mixed-1d
   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  4.480 s, 598 MB/s, 0.76 s kern, 21% cpu
   :    :........................... Cached on server .....  1.445 s, 1.5 GB/s, 0.71 s kern, 50% cpu
   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  4.774 s, 559 MB/s, 0.75 s kern, 19% cpu
        :........................... Cached on server .....  1.514 s, 1.4 GB/s, 0.67 s kern, 44% cpu
   2048M-mixed-1h
   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.568 s, 633 MB/s, 0.78 s kern, 23% cpu
   :    :........................... Cached on server .....  1.357 s, 1.6 GB/s, 0.71 s kern, 53% cpu
   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.580 s, 641 MB/s, 0.74 s kern, 22% cpu
        :........................... Cached on server .....  1.396 s, 1.5 GB/s, 0.67 s kern, 48% cpu
   2048M-mixed-2d
   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.159 s, 708 MB/s, 0.78 s kern, 26% cpu
   :    :........................... Cached on server .....  1.410 s, 1.5 GB/s, 0.70 s kern, 50% cpu
   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.093 s, 712 MB/s, 0.74 s kern, 25% cpu
        :........................... Cached on server .....  1.474 s, 1.4 GB/s, 0.67 s kern, 46% cpu
   2048M-mixed-2h
   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.043 s, 722 MB/s, 0.78 s kern, 26% cpu
   :    :........................... Cached on server .....  1.374 s, 1.6 GB/s, 0.72 s kern, 53% cpu
   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.913 s, 756 MB/s, 0.74 s kern, 26% cpu
        :........................... Cached on server .....  1.349 s, 1.6 GB/s, 0.67 s kern, 50% cpu
   2048M-mixed-4d
   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.275 s, 680 MB/s, 0.75 s kern, 24% cpu
   :    :........................... Cached on server .....  1.391 s, 1.5 GB/s, 0.71 s kern, 52% cpu
   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.470 s, 626 MB/s, 0.72 s kern, 21% cpu
        :........................... Cached on server .....  1.456 s, 1.5 GB/s, 0.67 s kern, 46% cpu
   2048M-mixed-4h
   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.035 s, 743 MB/s, 0.74 s kern, 26% cpu
   :    :........................... Cached on server .....  1.345 s, 1.6 GB/s, 0.71 s kern, 53% cpu
   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.848 s, 779 MB/s, 0.73 s kern, 26% cpu
        :........................... Cached on server .....  1.421 s, 1.5 GB/s, 0.68 s kern, 48% cpu
   2048M-mixed-8d
   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.262 s, 687 MB/s, 0.74 s kern, 24% cpu
   :    :........................... Cached on server .....  1.366 s, 1.6 GB/s, 0.69 s kern, 51% cpu
   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.195 s, 709 MB/s, 0.72 s kern, 24% cpu
        :........................... Cached on server .....  1.414 s, 1.5 GB/s, 0.67 s kern, 48% cpu
   2048M-mixed-8h
   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  2.899 s, 789 MB/s, 0.73 s kern, 27% cpu
   :    :........................... Cached on server .....  1.338 s, 1.6 GB/s, 0.69 s kern, 52% cpu
   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.910 s, 772 MB/s, 0.72 s kern, 26% cpu
        :........................... Cached on server .....  1.438 s, 1.5 GB/s, 0.67 s kern, 47% cpu
   2048M-mixed-16d
   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.416 s, 661 MB/s, 0.73 s kern, 23% cpu
   :    :........................... Cached on server .....  1.345 s, 1.6 GB/s, 0.70 s kern, 53% cpu
   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.177 s, 713 MB/s, 0.70 s kern, 23% cpu
        :........................... Cached on server .....  1.447 s, 1.5 GB/s, 0.68 s kern, 47% cpu
   2048M-mixed-16h
   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  2.919 s, 780 MB/s, 0.73 s kern, 26% cpu
   :    :........................... Cached on server .....  1.363 s, 1.6 GB/s, 0.70 s kern, 51% cpu
   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.934 s, 773 MB/s, 0.70 s kern, 25% cpu
        :........................... Cached on server .....  1.435 s, 1.5 GB/s, 0.67 s kern, 47% cpu

- v4:
  - Change READ and READ_PLUS to return nfserr_serverfault if the splice
    check fails.

Thanks,
Anna


Anna Schumaker (2):
  NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
  NFSD: Simplify READ_PLUS

 fs/nfsd/nfs4xdr.c | 141 +++++++++++-----------------------------------
 1 file changed, 33 insertions(+), 108 deletions(-)

-- 
2.37.3


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4 1/2] NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
  2022-09-13 18:01 [PATCH v4 0/2] NFSD: Simplify READ_PLUS Anna Schumaker
@ 2022-09-13 18:01 ` Anna Schumaker
  2022-09-13 20:12   ` Chuck Lever III
                     ` (2 more replies)
  2022-09-13 18:01 ` [PATCH v4 2/2] NFSD: Simplify READ_PLUS Anna Schumaker
  2022-09-13 18:45 ` [PATCH v4 0/2] " Chuck Lever III
  2 siblings, 3 replies; 15+ messages in thread
From: Anna Schumaker @ 2022-09-13 18:01 UTC (permalink / raw)
  To: linux-nfs, chuck.lever; +Cc: anna

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

This was discussed with Chuck as part of this patch set. We decided
that nfserr_resource is not the best error to return here, and he
suggested changing to nfserr_serverfault instead.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfsd/nfs4xdr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 1e9690a061ec..01dd73ed5720 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3994,7 +3994,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
 	}
 	if (resp->xdr->buf->page_len && splice_ok) {
 		WARN_ON_ONCE(1);
-		return nfserr_resource;
+		return nfserr_serverfault;
 	}
 	xdr_commit_encode(xdr);
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 2/2] NFSD: Simplify READ_PLUS
  2022-09-13 18:01 [PATCH v4 0/2] NFSD: Simplify READ_PLUS Anna Schumaker
  2022-09-13 18:01 ` [PATCH v4 1/2] NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data Anna Schumaker
@ 2022-09-13 18:01 ` Anna Schumaker
  2022-10-06 16:35   ` Jeff Layton
  2022-09-13 18:45 ` [PATCH v4 0/2] " Chuck Lever III
  2 siblings, 1 reply; 15+ messages in thread
From: Anna Schumaker @ 2022-09-13 18:01 UTC (permalink / raw)
  To: linux-nfs, chuck.lever; +Cc: anna

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

Chuck had suggested reverting READ_PLUS so it returns a single DATA
segment covering the requested read range. This prepares the server for
a future "sparse read" function so support can easily be added without
needing to rip out the old READ_PLUS code at the same time.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfsd/nfs4xdr.c | 139 +++++++++++-----------------------------------
 1 file changed, 32 insertions(+), 107 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 01dd73ed5720..280c7c8ac807 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4731,79 +4731,37 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
 
 static __be32
 nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
-			    struct nfsd4_read *read,
-			    unsigned long *maxcount, u32 *eof,
-			    loff_t *pos)
+			    struct nfsd4_read *read)
 {
-	struct xdr_stream *xdr = resp->xdr;
+	bool splice_ok = test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags);
 	struct file *file = read->rd_nf->nf_file;
-	int starting_len = xdr->buf->len;
-	loff_t hole_pos;
-	__be32 nfserr;
-	__be32 *p, tmp;
-	__be64 tmp64;
-
-	hole_pos = pos ? *pos : vfs_llseek(file, read->rd_offset, SEEK_HOLE);
-	if (hole_pos > read->rd_offset)
-		*maxcount = min_t(unsigned long, *maxcount, hole_pos - read->rd_offset);
-	*maxcount = min_t(unsigned long, *maxcount, (xdr->buf->buflen - xdr->buf->len));
+	struct xdr_stream *xdr = resp->xdr;
+	unsigned long maxcount;
+	__be32 nfserr, *p;
 
 	/* Content type, offset, byte count */
 	p = xdr_reserve_space(xdr, 4 + 8 + 4);
 	if (!p)
-		return nfserr_resource;
+		return nfserr_io;
+	if (resp->xdr->buf->page_len && splice_ok) {
+		WARN_ON_ONCE(splice_ok);
+		return nfserr_serverfault;
+	}
 
-	read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, *maxcount);
-	if (read->rd_vlen < 0)
-		return nfserr_resource;
+	maxcount = min_t(unsigned long, read->rd_length,
+			 (xdr->buf->buflen - xdr->buf->len));
 
-	nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
-			    resp->rqstp->rq_vec, read->rd_vlen, maxcount, eof);
+	if (file->f_op->splice_read && splice_ok)
+		nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount);
+	else
+		nfserr = nfsd4_encode_readv(resp, read, file, maxcount);
 	if (nfserr)
 		return nfserr;
-	xdr_truncate_encode(xdr, starting_len + 16 + xdr_align_size(*maxcount));
 
-	tmp = htonl(NFS4_CONTENT_DATA);
-	write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
-	tmp64 = cpu_to_be64(read->rd_offset);
-	write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
-	tmp = htonl(*maxcount);
-	write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
-
-	tmp = xdr_zero;
-	write_bytes_to_xdr_buf(xdr->buf, starting_len + 16 + *maxcount, &tmp,
-			       xdr_pad_size(*maxcount));
-	return nfs_ok;
-}
-
-static __be32
-nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
-			    struct nfsd4_read *read,
-			    unsigned long *maxcount, u32 *eof)
-{
-	struct file *file = read->rd_nf->nf_file;
-	loff_t data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
-	loff_t f_size = i_size_read(file_inode(file));
-	unsigned long count;
-	__be32 *p;
-
-	if (data_pos == -ENXIO)
-		data_pos = f_size;
-	else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE))
-		return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size);
-	count = data_pos - read->rd_offset;
-
-	/* Content type, offset, byte count */
-	p = xdr_reserve_space(resp->xdr, 4 + 8 + 8);
-	if (!p)
-		return nfserr_resource;
-
-	*p++ = htonl(NFS4_CONTENT_HOLE);
+	*p++ = cpu_to_be32(NFS4_CONTENT_DATA);
 	p = xdr_encode_hyper(p, read->rd_offset);
-	p = xdr_encode_hyper(p, count);
+	*p = cpu_to_be32(read->rd_length);
 
-	*eof = (read->rd_offset + count) >= f_size;
-	*maxcount = min_t(unsigned long, count, *maxcount);
 	return nfs_ok;
 }
 
@@ -4811,69 +4769,36 @@ static __be32
 nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 		       struct nfsd4_read *read)
 {
-	unsigned long maxcount, count;
+	struct file *file = read->rd_nf->nf_file;
 	struct xdr_stream *xdr = resp->xdr;
-	struct file *file;
 	int starting_len = xdr->buf->len;
-	int last_segment = xdr->buf->len;
-	int segments = 0;
-	__be32 *p, tmp;
-	bool is_data;
-	loff_t pos;
-	u32 eof;
+	u32 segments = 0;
+	__be32 *p;
 
 	if (nfserr)
 		return nfserr;
-	file = read->rd_nf->nf_file;
 
 	/* eof flag, segment count */
 	p = xdr_reserve_space(xdr, 4 + 4);
 	if (!p)
-		return nfserr_resource;
+		return nfserr_io;
 	xdr_commit_encode(xdr);
 
-	maxcount = min_t(unsigned long, read->rd_length,
-			 (xdr->buf->buflen - xdr->buf->len));
-	count    = maxcount;
-
-	eof = read->rd_offset >= i_size_read(file_inode(file));
-	if (eof)
+	read->rd_eof = read->rd_offset >= i_size_read(file_inode(file));
+	if (read->rd_eof)
 		goto out;
 
-	pos = vfs_llseek(file, read->rd_offset, SEEK_HOLE);
-	is_data = pos > read->rd_offset;
-
-	while (count > 0 && !eof) {
-		maxcount = count;
-		if (is_data)
-			nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof,
-						segments == 0 ? &pos : NULL);
-		else
-			nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof);
-		if (nfserr)
-			goto out;
-		count -= maxcount;
-		read->rd_offset += maxcount;
-		is_data = !is_data;
-		last_segment = xdr->buf->len;
-		segments++;
-	}
-
-out:
-	if (nfserr && segments == 0)
+	nfserr = nfsd4_encode_read_plus_data(resp, read);
+	if (nfserr) {
 		xdr_truncate_encode(xdr, starting_len);
-	else {
-		if (nfserr) {
-			xdr_truncate_encode(xdr, last_segment);
-			nfserr = nfs_ok;
-			eof = 0;
-		}
-		tmp = htonl(eof);
-		write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
-		tmp = htonl(segments);
-		write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
+		return nfserr;
 	}
 
+	segments++;
+
+out:
+	p = xdr_encode_bool(p, read->rd_eof);
+	*p = cpu_to_be32(segments);
 	return nfserr;
 }
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 0/2] NFSD: Simplify READ_PLUS
  2022-09-13 18:01 [PATCH v4 0/2] NFSD: Simplify READ_PLUS Anna Schumaker
  2022-09-13 18:01 ` [PATCH v4 1/2] NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data Anna Schumaker
  2022-09-13 18:01 ` [PATCH v4 2/2] NFSD: Simplify READ_PLUS Anna Schumaker
@ 2022-09-13 18:45 ` Chuck Lever III
  2022-09-13 20:45   ` Anna Schumaker
  2 siblings, 1 reply; 15+ messages in thread
From: Chuck Lever III @ 2022-09-13 18:45 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Linux NFS Mailing List



> On Sep 13, 2022, at 11:01 AM, Anna Schumaker <anna@kernel.org> wrote:
> 
> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> 
> When we left off with READ_PLUS, Chuck had suggested reverting the
> server to reply with a single NFS4_CONTENT_DATA segment essentially
> mimicking how the READ operation behaves. Then, a future sparse read
> function can be added and the server modified to support it without
> needing to rip out the old READ_PLUS code at the same time.
> 
> This patch takes that first step. I was even able to re-use the
> nfsd4_encode_readv() and nfsd4_encode_splice_read() functions to
> remove some duplicate code.
> 
> Below is some performance data comparing the READ and READ_PLUS
> operations with v4.2. I tested reading 2G files with various hole
> lengths including 100% data, 100% hole, and a handful of mixed hole and
> data files. For the mixed files, a notation like "1d" means
> every-other-page is data, and the first page is data. "4h" would mean
> alternating 4 pages data and 4 pages hole, beginning with hole.
> 
> I also used the 'vmtouch' utility to make sure the file is either
> evicted from the server's pagecache ("Uncached on server") or present in
> the server's page cache ("Cached on server").
> 
>   2048M-data
>   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.555 s, 712 MB/s, 0.74 s kern, 24% cpu
>   :    :........................... Cached on server .....  1.346 s, 1.6 GB/s, 0.69 s kern, 52% cpu
>   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.596 s, 690 MB/s, 0.72 s kern, 23% cpu
>        :........................... Cached on server .....  1.394 s, 1.6 GB/s, 0.67 s kern, 48% cpu
>   2048M-hole
>   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  4.934 s, 762 MB/s, 1.86 s kern, 29% cpu
>   :    :........................... Cached on server .....  1.328 s, 1.6 GB/s, 0.72 s kern, 54% cpu
>   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  4.823 s, 739 MB/s, 1.88 s kern, 28% cpu
>        :........................... Cached on server .....  1.399 s, 1.5 GB/s, 0.70 s kern, 50% cpu
>   2048M-mixed-1d
>   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  4.480 s, 598 MB/s, 0.76 s kern, 21% cpu
>   :    :........................... Cached on server .....  1.445 s, 1.5 GB/s, 0.71 s kern, 50% cpu
>   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  4.774 s, 559 MB/s, 0.75 s kern, 19% cpu
>        :........................... Cached on server .....  1.514 s, 1.4 GB/s, 0.67 s kern, 44% cpu
>   2048M-mixed-1h
>   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.568 s, 633 MB/s, 0.78 s kern, 23% cpu
>   :    :........................... Cached on server .....  1.357 s, 1.6 GB/s, 0.71 s kern, 53% cpu
>   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.580 s, 641 MB/s, 0.74 s kern, 22% cpu
>        :........................... Cached on server .....  1.396 s, 1.5 GB/s, 0.67 s kern, 48% cpu
>   2048M-mixed-2d
>   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.159 s, 708 MB/s, 0.78 s kern, 26% cpu
>   :    :........................... Cached on server .....  1.410 s, 1.5 GB/s, 0.70 s kern, 50% cpu
>   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.093 s, 712 MB/s, 0.74 s kern, 25% cpu
>        :........................... Cached on server .....  1.474 s, 1.4 GB/s, 0.67 s kern, 46% cpu
>   2048M-mixed-2h
>   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.043 s, 722 MB/s, 0.78 s kern, 26% cpu
>   :    :........................... Cached on server .....  1.374 s, 1.6 GB/s, 0.72 s kern, 53% cpu
>   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.913 s, 756 MB/s, 0.74 s kern, 26% cpu
>        :........................... Cached on server .....  1.349 s, 1.6 GB/s, 0.67 s kern, 50% cpu
>   2048M-mixed-4d
>   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.275 s, 680 MB/s, 0.75 s kern, 24% cpu
>   :    :........................... Cached on server .....  1.391 s, 1.5 GB/s, 0.71 s kern, 52% cpu
>   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.470 s, 626 MB/s, 0.72 s kern, 21% cpu
>        :........................... Cached on server .....  1.456 s, 1.5 GB/s, 0.67 s kern, 46% cpu
>   2048M-mixed-4h
>   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.035 s, 743 MB/s, 0.74 s kern, 26% cpu
>   :    :........................... Cached on server .....  1.345 s, 1.6 GB/s, 0.71 s kern, 53% cpu
>   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.848 s, 779 MB/s, 0.73 s kern, 26% cpu
>        :........................... Cached on server .....  1.421 s, 1.5 GB/s, 0.68 s kern, 48% cpu
>   2048M-mixed-8d
>   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.262 s, 687 MB/s, 0.74 s kern, 24% cpu
>   :    :........................... Cached on server .....  1.366 s, 1.6 GB/s, 0.69 s kern, 51% cpu
>   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.195 s, 709 MB/s, 0.72 s kern, 24% cpu
>        :........................... Cached on server .....  1.414 s, 1.5 GB/s, 0.67 s kern, 48% cpu
>   2048M-mixed-8h
>   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  2.899 s, 789 MB/s, 0.73 s kern, 27% cpu
>   :    :........................... Cached on server .....  1.338 s, 1.6 GB/s, 0.69 s kern, 52% cpu
>   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.910 s, 772 MB/s, 0.72 s kern, 26% cpu
>        :........................... Cached on server .....  1.438 s, 1.5 GB/s, 0.67 s kern, 47% cpu
>   2048M-mixed-16d
>   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.416 s, 661 MB/s, 0.73 s kern, 23% cpu
>   :    :........................... Cached on server .....  1.345 s, 1.6 GB/s, 0.70 s kern, 53% cpu
>   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.177 s, 713 MB/s, 0.70 s kern, 23% cpu
>        :........................... Cached on server .....  1.447 s, 1.5 GB/s, 0.68 s kern, 47% cpu
>   2048M-mixed-16h
>   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  2.919 s, 780 MB/s, 0.73 s kern, 26% cpu
>   :    :........................... Cached on server .....  1.363 s, 1.6 GB/s, 0.70 s kern, 51% cpu
>   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.934 s, 773 MB/s, 0.70 s kern, 25% cpu
>        :........................... Cached on server .....  1.435 s, 1.5 GB/s, 0.67 s kern, 47% cpu

For this particular change, I'm interested only in cases where the
whole file is cached on the server. We're focusing on the efficiency
and performance of the protocol and transport here, not the underlying
filesystem (which is... xfs?).

Also, 2GB files can be read with just 2048 1MB READ requests. That
means we don't have a large sample size of READ operations for any
single test, assuming the client is using 1MB rsize.

Also, are these averages, or single runs? I think running each test
5-10 times (at least) and including some variance data in the results
would help build more confidence that the small differences in the
timing are not noise.
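
(For concreteness, the kind of summary I mean is just a per-test mean
plus sample standard deviation across the runs; a trivial sketch:)

    #include <math.h>
    #include <stdio.h>

    /* Summarize repeated timing runs: mean and sample standard deviation. */
    static void summarize(const double *secs, int n)
    {
        double sum = 0.0, var = 0.0, mean;
        int i;

        for (i = 0; i < n; i++)
            sum += secs[i];
        mean = sum / n;
        for (i = 0; i < n; i++)
            var += (secs[i] - mean) * (secs[i] - mean);
        var /= (n - 1);
        printf("mean %.3f s, stddev %.3f s\n", mean, sqrt(var));
    }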

All that said, however, I see with some consistency that READ_PLUS
takes longer to pull data over the wire, but uses slightly less CPU.
Assuming the CPU utilizations are client-side, that matches my
expectations of lower CPU utilization results if the throughput is
lower.

Looking at the 100% data results, READ_PLUS takes 3.5% longer than
READ. That to me is a small but significant drop -- I think it will
be noticeable for large workloads. Can you explain the difference?

For subsequent test runs, can you find a server with more memory,
test with larger files, and test with a variety of rsize settings?
You can reduce your test matrix by leaving out the tests with holey
files for the moment.


> - v4:
>  - Change READ and READ_PLUS to return nfserr_serverfault if the splice
>    check fails.

At this point, the code looks fine, but I'd like to understand why
the performance is not the same.


> Thanks,
> Anna
> 
> 
> Anna Schumaker (2):
>  NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
>  NFSD: Simplify READ_PLUS
> 
> fs/nfsd/nfs4xdr.c | 141 +++++++++++-----------------------------------
> 1 file changed, 33 insertions(+), 108 deletions(-)
> 
> -- 
> 2.37.3
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 1/2] NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
  2022-09-13 18:01 ` [PATCH v4 1/2] NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data Anna Schumaker
@ 2022-09-13 20:12   ` Chuck Lever III
  2022-09-13 20:42     ` Anna Schumaker
  2022-09-15 19:59   ` Jeff Layton
  2022-10-06 16:35   ` Jeff Layton
  2 siblings, 1 reply; 15+ messages in thread
From: Chuck Lever III @ 2022-09-13 20:12 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Linux NFS Mailing List



> On Sep 13, 2022, at 11:01 AM, Anna Schumaker <anna@kernel.org> wrote:
> 
> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> 
> This was discussed with Chuck as part of this patch set. Returning
> nfserr_resource was decided to not be the best error message here, and
> he suggested changing to nfserr_serverfault instead.
> 
> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> ---
> fs/nfsd/nfs4xdr.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

I've applied this one for nfsd for-next. Thanks!

As I mentioned, 2/2 looks OK, and I'll apply it to my private
tree for testing while we work out why it's a little slower.


> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 1e9690a061ec..01dd73ed5720 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -3994,7 +3994,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
> 	}
> 	if (resp->xdr->buf->page_len && splice_ok) {
> 		WARN_ON_ONCE(1);
> -		return nfserr_resource;
> +		return nfserr_serverfault;

Odd, I couldn't find a definition for nfserr_serverfault when
I asked for this patch last week, but this one-liner seems to
compile correctly. Oh well!


> 	}
> 	xdr_commit_encode(xdr);
> 
> -- 
> 2.37.3
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 1/2] NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
  2022-09-13 20:12   ` Chuck Lever III
@ 2022-09-13 20:42     ` Anna Schumaker
  0 siblings, 0 replies; 15+ messages in thread
From: Anna Schumaker @ 2022-09-13 20:42 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List

On Tue, Sep 13, 2022 at 4:12 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Sep 13, 2022, at 11:01 AM, Anna Schumaker <anna@kernel.org> wrote:
> >
> > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> >
> > This was discussed with Chuck as part of this patch set. Returning
> > nfserr_resource was decided to not be the best error message here, and
> > he suggested changing to nfserr_serverfault instead.
> >
> > Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > ---
> > fs/nfsd/nfs4xdr.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
>
> I've applied this one for nfsd for-next. Thanks!
>
> As I mentioned, 2/2 looks OK, and I'll apply it to my private
> tree for testing while we work out why it's a little slower.
>
>
> > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > index 1e9690a061ec..01dd73ed5720 100644
> > --- a/fs/nfsd/nfs4xdr.c
> > +++ b/fs/nfsd/nfs4xdr.c
> > @@ -3994,7 +3994,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
> >       }
> >       if (resp->xdr->buf->page_len && splice_ok) {
> >               WARN_ON_ONCE(1);
> > -             return nfserr_resource;
> > +             return nfserr_serverfault;
>
> Odd, I couldn't find a definition for nfserr_serverfault when
> I asked for this patch last week, but this one-liner seems to
> compile correctly. Oh well!

I found it in fs/nfsd/nfsd.h with `git grep`:

    #define nfserr_serverfault      cpu_to_be32(NFSERR_SERVERFAULT)

Anna

>
>
> >       }
> >       xdr_commit_encode(xdr);
> >
> > --
> > 2.37.3
> >
>
> --
> Chuck Lever
>
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 0/2] NFSD: Simplify READ_PLUS
  2022-09-13 18:45 ` [PATCH v4 0/2] " Chuck Lever III
@ 2022-09-13 20:45   ` Anna Schumaker
  2022-10-05 15:10     ` Anna Schumaker
  0 siblings, 1 reply; 15+ messages in thread
From: Anna Schumaker @ 2022-09-13 20:45 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List

On Tue, Sep 13, 2022 at 2:45 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Sep 13, 2022, at 11:01 AM, Anna Schumaker <anna@kernel.org> wrote:
> >
> > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> >
> > When we left off with READ_PLUS, Chuck had suggested reverting the
> > server to reply with a single NFS4_CONTENT_DATA segment essentially
> > mimicking how the READ operation behaves. Then, a future sparse read
> > function can be added and the server modified to support it without
> > needing to rip out the old READ_PLUS code at the same time.
> >
> > This patch takes that first step. I was even able to re-use the
> > nfsd4_encode_readv() and nfsd4_encode_splice_read() functions to
> > remove some duplicate code.
> >
> > Below is some performance data comparing the READ and READ_PLUS
> > operations with v4.2. I tested reading 2G files with various hole
> > lengths including 100% data, 100% hole, and a handful of mixed hole and
> > data files. For the mixed files, a notation like "1d" means
> > every-other-page is data, and the first page is data. "4h" would mean
> > alternating 4 pages data and 4 pages hole, beginning with hole.
> >
> > I also used the 'vmtouch' utility to make sure the file is either
> > evicted from the server's pagecache ("Uncached on server") or present in
> > the server's page cache ("Cached on server").
> >
> >   2048M-data
> >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.555 s, 712 MB/s, 0.74 s kern, 24% cpu
> >   :    :........................... Cached on server .....  1.346 s, 1.6 GB/s, 0.69 s kern, 52% cpu
> >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.596 s, 690 MB/s, 0.72 s kern, 23% cpu
> >        :........................... Cached on server .....  1.394 s, 1.6 GB/s, 0.67 s kern, 48% cpu
> >   2048M-hole
> >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  4.934 s, 762 MB/s, 1.86 s kern, 29% cpu
> >   :    :........................... Cached on server .....  1.328 s, 1.6 GB/s, 0.72 s kern, 54% cpu
> >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  4.823 s, 739 MB/s, 1.88 s kern, 28% cpu
> >        :........................... Cached on server .....  1.399 s, 1.5 GB/s, 0.70 s kern, 50% cpu
> >   2048M-mixed-1d
> >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  4.480 s, 598 MB/s, 0.76 s kern, 21% cpu
> >   :    :........................... Cached on server .....  1.445 s, 1.5 GB/s, 0.71 s kern, 50% cpu
> >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  4.774 s, 559 MB/s, 0.75 s kern, 19% cpu
> >        :........................... Cached on server .....  1.514 s, 1.4 GB/s, 0.67 s kern, 44% cpu
> >   2048M-mixed-1h
> >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.568 s, 633 MB/s, 0.78 s kern, 23% cpu
> >   :    :........................... Cached on server .....  1.357 s, 1.6 GB/s, 0.71 s kern, 53% cpu
> >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.580 s, 641 MB/s, 0.74 s kern, 22% cpu
> >        :........................... Cached on server .....  1.396 s, 1.5 GB/s, 0.67 s kern, 48% cpu
> >   2048M-mixed-2d
> >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.159 s, 708 MB/s, 0.78 s kern, 26% cpu
> >   :    :........................... Cached on server .....  1.410 s, 1.5 GB/s, 0.70 s kern, 50% cpu
> >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.093 s, 712 MB/s, 0.74 s kern, 25% cpu
> >        :........................... Cached on server .....  1.474 s, 1.4 GB/s, 0.67 s kern, 46% cpu
> >   2048M-mixed-2h
> >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.043 s, 722 MB/s, 0.78 s kern, 26% cpu
> >   :    :........................... Cached on server .....  1.374 s, 1.6 GB/s, 0.72 s kern, 53% cpu
> >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.913 s, 756 MB/s, 0.74 s kern, 26% cpu
> >        :........................... Cached on server .....  1.349 s, 1.6 GB/s, 0.67 s kern, 50% cpu
> >   2048M-mixed-4d
> >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.275 s, 680 MB/s, 0.75 s kern, 24% cpu
> >   :    :........................... Cached on server .....  1.391 s, 1.5 GB/s, 0.71 s kern, 52% cpu
> >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.470 s, 626 MB/s, 0.72 s kern, 21% cpu
> >        :........................... Cached on server .....  1.456 s, 1.5 GB/s, 0.67 s kern, 46% cpu
> >   2048M-mixed-4h
> >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.035 s, 743 MB/s, 0.74 s kern, 26% cpu
> >   :    :........................... Cached on server .....  1.345 s, 1.6 GB/s, 0.71 s kern, 53% cpu
> >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.848 s, 779 MB/s, 0.73 s kern, 26% cpu
> >        :........................... Cached on server .....  1.421 s, 1.5 GB/s, 0.68 s kern, 48% cpu
> >   2048M-mixed-8d
> >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.262 s, 687 MB/s, 0.74 s kern, 24% cpu
> >   :    :........................... Cached on server .....  1.366 s, 1.6 GB/s, 0.69 s kern, 51% cpu
> >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.195 s, 709 MB/s, 0.72 s kern, 24% cpu
> >        :........................... Cached on server .....  1.414 s, 1.5 GB/s, 0.67 s kern, 48% cpu
> >   2048M-mixed-8h
> >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  2.899 s, 789 MB/s, 0.73 s kern, 27% cpu
> >   :    :........................... Cached on server .....  1.338 s, 1.6 GB/s, 0.69 s kern, 52% cpu
> >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.910 s, 772 MB/s, 0.72 s kern, 26% cpu
> >        :........................... Cached on server .....  1.438 s, 1.5 GB/s, 0.67 s kern, 47% cpu
> >   2048M-mixed-16d
> >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.416 s, 661 MB/s, 0.73 s kern, 23% cpu
> >   :    :........................... Cached on server .....  1.345 s, 1.6 GB/s, 0.70 s kern, 53% cpu
> >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.177 s, 713 MB/s, 0.70 s kern, 23% cpu
> >        :........................... Cached on server .....  1.447 s, 1.5 GB/s, 0.68 s kern, 47% cpu
> >   2048M-mixed-16h
> >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  2.919 s, 780 MB/s, 0.73 s kern, 26% cpu
> >   :    :........................... Cached on server .....  1.363 s, 1.6 GB/s, 0.70 s kern, 51% cpu
> >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.934 s, 773 MB/s, 0.70 s kern, 25% cpu
> >        :........................... Cached on server .....  1.435 s, 1.5 GB/s, 0.67 s kern, 47% cpu
>
> For this particular change, I'm interested only in cases where the
> whole file is cached on the server. We're focusing on the efficiency
> and performance of the protocol and transport here, not the underlying
> filesystem (which is... xfs?).

Sounds good, I can narrow down to just that test.

>
> > Also, 2GB files can be read with just 2048 1MB READ requests. That
> means we don't have a large sample size of READ operations for any
> single test, assuming the client is using 1MB rsize.
>
> Also, are these averages, or single runs? I think running each test
> 5-10 times (at least) and including some variance data in the results
> would help build more confidence that the small differences in the
> timing are not noise.

This is an average across 10 runs.

>
> All that said, however, I see with some consistency that READ_PLUS
> takes longer to pull data over the wire, but uses slightly less CPU.
> Assuming the CPU utilizations are client-side, that matches my
> expectations of lower CPU utilization results if the throughput is
> lower.
>
> Looking at the 100% data results, READ_PLUS takes 3.5% longer than
> READ. That to me is a small but significant drop -- I think it will
> be noticeable for large workloads. Can you explain the difference?

I'll try larger files for my next round of testing. I was assuming the
difference is just noise, since there are cases like the mixed-2h test
where READ_PLUS was slightly faster. But more testing will help figure
that out.

>
> For subsequent test runs, can you find a server with more memory,
> test with larger files, and test with a variety of rsize settings?
> You can reduce your test matrix by leaving out the tests with holey
> files for the moment.

Sure thing!

Anna

>
>
> > - v4:
> >  - Change READ and READ_PLUS to return nfserr_serverfault if the splice
> >    check fails.
>
> At this point, the code looks fine, but I'd like to understand why
> the performance is not the same.
>
>
> > Thanks,
> > Anna
> >
> >
> > Anna Schumaker (2):
> >  NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
> >  NFSD: Simplify READ_PLUS
> >
> > fs/nfsd/nfs4xdr.c | 141 +++++++++++-----------------------------------
> > 1 file changed, 33 insertions(+), 108 deletions(-)
> >
> > --
> > 2.37.3
> >
>
> --
> Chuck Lever
>
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 1/2] NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
  2022-09-13 18:01 ` [PATCH v4 1/2] NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data Anna Schumaker
  2022-09-13 20:12   ` Chuck Lever III
@ 2022-09-15 19:59   ` Jeff Layton
  2022-09-16  2:17     ` Chuck Lever III
  2022-10-06 16:35   ` Jeff Layton
  2 siblings, 1 reply; 15+ messages in thread
From: Jeff Layton @ 2022-09-15 19:59 UTC (permalink / raw)
  To: Anna Schumaker, linux-nfs, chuck.lever

On Tue, 2022-09-13 at 14:01 -0400, Anna Schumaker wrote:
> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> 
> This was discussed with Chuck as part of this patch set. Returning
> nfserr_resource was decided to not be the best error message here, and
> he suggested changing to nfserr_serverfault instead.
> 
> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> ---
>  fs/nfsd/nfs4xdr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 1e9690a061ec..01dd73ed5720 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -3994,7 +3994,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
>  	}
>  	if (resp->xdr->buf->page_len && splice_ok) {
>  		WARN_ON_ONCE(1);
> -		return nfserr_resource;
> +		return nfserr_serverfault;
>  	}
>  	xdr_commit_encode(xdr);
>  

IIRC the problem is that nfserr_resource is not valid in v4.1+. Do we
also need to change the nfserr_resource return in the if block above
this one?
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 1/2] NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
  2022-09-15 19:59   ` Jeff Layton
@ 2022-09-16  2:17     ` Chuck Lever III
  0 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever III @ 2022-09-16  2:17 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Anna Schumaker, Linux NFS Mailing List



> On Sep 15, 2022, at 12:59 PM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> On Tue, 2022-09-13 at 14:01 -0400, Anna Schumaker wrote:
>> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
>> 
>> This was discussed with Chuck as part of this patch set. Returning
>> nfserr_resource was decided to not be the best error message here, and
>> he suggested changing to nfserr_serverfault instead.
>> 
>> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
>> ---
>> fs/nfsd/nfs4xdr.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> index 1e9690a061ec..01dd73ed5720 100644
>> --- a/fs/nfsd/nfs4xdr.c
>> +++ b/fs/nfsd/nfs4xdr.c
>> @@ -3994,7 +3994,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
>> 	}
>> 	if (resp->xdr->buf->page_len && splice_ok) {
>> 		WARN_ON_ONCE(1);
>> -		return nfserr_resource;
>> +		return nfserr_serverfault;
>> 	}
>> 	xdr_commit_encode(xdr);
>> 
> 
> IIRC the problem is that nfserr_resource is not valid in v4.1+. Do we
> also need to change the nfserr_resource return in the if block above
> this one?

I discovered that nfsd4_encode_operation() converts nfserr_resource
to the status code that is proper for the minor version that is in
use. We're good to go there.
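
For the curious, the conversion looks roughly like the following
fragment of nfsd4_encode_operation() (paraphrased from memory, so treat
it as a sketch rather than the exact upstream code):

        /* A generic nfserr_resource is rewritten to a status that is
         * legal for the minor version before the op status is encoded. */
        if (op->status == nfserr_resource && nfsd4_has_session(&resp->cstate)) {
                struct nfsd4_slot *slot = resp->cstate.slot;

                if (slot->sl_flags & NFSD4_SLOT_CACHETHIS)
                        op->status = nfserr_rep_too_big_to_cache;
                else
                        op->status = nfserr_rep_too_big;
        }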

--
Chuck Lever




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 0/2] NFSD: Simplify READ_PLUS
  2022-09-13 20:45   ` Anna Schumaker
@ 2022-10-05 15:10     ` Anna Schumaker
  2022-10-05 15:53       ` Chuck Lever III
  0 siblings, 1 reply; 15+ messages in thread
From: Anna Schumaker @ 2022-10-05 15:10 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List

Hi Chuck,

On Tue, Sep 13, 2022 at 4:45 PM Anna Schumaker <anna@kernel.org> wrote:
>
> On Tue, Sep 13, 2022 at 2:45 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >
> >
> >
> > > On Sep 13, 2022, at 11:01 AM, Anna Schumaker <anna@kernel.org> wrote:
> > >
> > > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > >
> > > When we left off with READ_PLUS, Chuck had suggested reverting the
> > > server to reply with a single NFS4_CONTENT_DATA segment essentially
> > > mimicking how the READ operation behaves. Then, a future sparse read
> > > function can be added and the server modified to support it without
> > > needing to rip out the old READ_PLUS code at the same time.
> > >
> > > This patch takes that first step. I was even able to re-use the
> > > nfsd4_encode_readv() and nfsd4_encode_splice_read() functions to
> > > remove some duplicate code.
> > >
> > > Below is some performance data comparing the READ and READ_PLUS
> > > operations with v4.2. I tested reading 2G files with various hole
> > > lengths including 100% data, 100% hole, and a handful of mixed hole and
> > > data files. For the mixed files, a notation like "1d" means
> > > every-other-page is data, and the first page is data. "4h" would mean
> > > alternating 4 pages data and 4 pages hole, beginning with hole.
> > >
> > > I also used the 'vmtouch' utility to make sure the file is either
> > > evicted from the server's pagecache ("Uncached on server") or present in
> > > the server's page cache ("Cached on server").
> > >
> > >   2048M-data
> > >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.555 s, 712 MB/s, 0.74 s kern, 24% cpu
> > >   :    :........................... Cached on server .....  1.346 s, 1.6 GB/s, 0.69 s kern, 52% cpu
> > >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.596 s, 690 MB/s, 0.72 s kern, 23% cpu
> > >        :........................... Cached on server .....  1.394 s, 1.6 GB/s, 0.67 s kern, 48% cpu
> > >   2048M-hole
> > >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  4.934 s, 762 MB/s, 1.86 s kern, 29% cpu
> > >   :    :........................... Cached on server .....  1.328 s, 1.6 GB/s, 0.72 s kern, 54% cpu
> > >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  4.823 s, 739 MB/s, 1.88 s kern, 28% cpu
> > >        :........................... Cached on server .....  1.399 s, 1.5 GB/s, 0.70 s kern, 50% cpu
> > >   2048M-mixed-1d
> > >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  4.480 s, 598 MB/s, 0.76 s kern, 21% cpu
> > >   :    :........................... Cached on server .....  1.445 s, 1.5 GB/s, 0.71 s kern, 50% cpu
> > >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  4.774 s, 559 MB/s, 0.75 s kern, 19% cpu
> > >        :........................... Cached on server .....  1.514 s, 1.4 GB/s, 0.67 s kern, 44% cpu
> > >   2048M-mixed-1h
> > >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.568 s, 633 MB/s, 0.78 s kern, 23% cpu
> > >   :    :........................... Cached on server .....  1.357 s, 1.6 GB/s, 0.71 s kern, 53% cpu
> > >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.580 s, 641 MB/s, 0.74 s kern, 22% cpu
> > >        :........................... Cached on server .....  1.396 s, 1.5 GB/s, 0.67 s kern, 48% cpu
> > >   2048M-mixed-2d
> > >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.159 s, 708 MB/s, 0.78 s kern, 26% cpu
> > >   :    :........................... Cached on server .....  1.410 s, 1.5 GB/s, 0.70 s kern, 50% cpu
> > >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.093 s, 712 MB/s, 0.74 s kern, 25% cpu
> > >        :........................... Cached on server .....  1.474 s, 1.4 GB/s, 0.67 s kern, 46% cpu
> > >   2048M-mixed-2h
> > >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.043 s, 722 MB/s, 0.78 s kern, 26% cpu
> > >   :    :........................... Cached on server .....  1.374 s, 1.6 GB/s, 0.72 s kern, 53% cpu
> > >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.913 s, 756 MB/s, 0.74 s kern, 26% cpu
> > >        :........................... Cached on server .....  1.349 s, 1.6 GB/s, 0.67 s kern, 50% cpu
> > >   2048M-mixed-4d
> > >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.275 s, 680 MB/s, 0.75 s kern, 24% cpu
> > >   :    :........................... Cached on server .....  1.391 s, 1.5 GB/s, 0.71 s kern, 52% cpu
> > >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.470 s, 626 MB/s, 0.72 s kern, 21% cpu
> > >        :........................... Cached on server .....  1.456 s, 1.5 GB/s, 0.67 s kern, 46% cpu
> > >   2048M-mixed-4h
> > >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.035 s, 743 MB/s, 0.74 s kern, 26% cpu
> > >   :    :........................... Cached on server .....  1.345 s, 1.6 GB/s, 0.71 s kern, 53% cpu
> > >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.848 s, 779 MB/s, 0.73 s kern, 26% cpu
> > >        :........................... Cached on server .....  1.421 s, 1.5 GB/s, 0.68 s kern, 48% cpu
> > >   2048M-mixed-8d
> > >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.262 s, 687 MB/s, 0.74 s kern, 24% cpu
> > >   :    :........................... Cached on server .....  1.366 s, 1.6 GB/s, 0.69 s kern, 51% cpu
> > >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.195 s, 709 MB/s, 0.72 s kern, 24% cpu
> > >        :........................... Cached on server .....  1.414 s, 1.5 GB/s, 0.67 s kern, 48% cpu
> > >   2048M-mixed-8h
> > >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  2.899 s, 789 MB/s, 0.73 s kern, 27% cpu
> > >   :    :........................... Cached on server .....  1.338 s, 1.6 GB/s, 0.69 s kern, 52% cpu
> > >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.910 s, 772 MB/s, 0.72 s kern, 26% cpu
> > >        :........................... Cached on server .....  1.438 s, 1.5 GB/s, 0.67 s kern, 47% cpu
> > >   2048M-mixed-16d
> > >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.416 s, 661 MB/s, 0.73 s kern, 23% cpu
> > >   :    :........................... Cached on server .....  1.345 s, 1.6 GB/s, 0.70 s kern, 53% cpu
> > >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.177 s, 713 MB/s, 0.70 s kern, 23% cpu
> > >        :........................... Cached on server .....  1.447 s, 1.5 GB/s, 0.68 s kern, 47% cpu
> > >   2048M-mixed-16h
> > >   :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  2.919 s, 780 MB/s, 0.73 s kern, 26% cpu
> > >   :    :........................... Cached on server .....  1.363 s, 1.6 GB/s, 0.70 s kern, 51% cpu
> > >   :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.934 s, 773 MB/s, 0.70 s kern, 25% cpu
> > >        :........................... Cached on server .....  1.435 s, 1.5 GB/s, 0.67 s kern, 47% cpu
> >
> > For this particular change, I'm interested only in cases where the
> > whole file is cached on the server. We're focusing on the efficiency
> > and performance of the protocol and transport here, not the underlying
> > filesystem (which is... xfs?).
>
> Sounds good, I can narrow down to just that test.
>
> >
> > > Also, 2GB files can be read with just 2048 1MB READ requests. That
> > means we don't have a large sample size of READ operations for any
> > single test, assuming the client is using 1MB rsize.
> >
> > Also, are these averages, or single runs? I think running each test
> > 5-10 times (at least) and including some variance data in the results
> > would help build more confidence that the small differences in the
> > timing are not noise.
>
> This is an average across 10 runs.
>
> >
> > All that said, however, I see with some consistency that READ_PLUS
> > takes longer to pull data over the wire, but uses slightly less CPU.
> > Assuming the CPU utilizations are client-side, that matches my
> > expectations of lower CPU utilization results if the throughput is
> > lower.
> >
> > Looking at the 100% data results, READ_PLUS takes 3.5% longer than
> > READ. That to me is a small but significant drop -- I think it will
> > be noticeable for large workloads. Can you explain the difference?
>
> I'll try larger files for my next round of testing. I was assuming the
> difference is just noise, since there are cases like the mixed-2h test
> where READ_PLUS was slightly faster. But more testing will help figure
> that out.
>
> >
> > For subsequent test runs, can you find a server with more memory,
> > test with larger files, and test with a variety of rsize settings?
> > You can reduce your test matrix by leaving out the tests with holey
> > files for the moment.

Hi Chuck,

The following numbers are for 10G files that I created on Netapp lab
machines. I narrowed my testing down to files already in the server's
cache, and read with direct I/O to keep the client's pagecache out of
the way as much as possible. I did 25 iterations this time around, and
included the minimum time, maximum time, and standard deviation in the
report.
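
(For reference, the read side of each run boils down to an O_DIRECT loop
with an aligned buffer, roughly like the sketch below; the mount path
and 1M buffer size are placeholders rather than my exact harness.)

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Sketch of a direct-I/O read loop: O_DIRECT bypasses the client's
     * pagecache; the buffer must be aligned for it to work. */
    int main(void)
    {
        const size_t bufsz = 1024 * 1024;   /* placeholder; match rsize */
        void *buf;
        ssize_t n;
        int fd;

        if (posix_memalign(&buf, 4096, bufsz))
            return 1;
        fd = open("/mnt/nfs/10240M-data", O_RDONLY | O_DIRECT);
        if (fd < 0)
            return 1;
        while ((n = read(fd, buf, bufsz)) > 0)
            ;                               /* discard; we only time the reads */
        close(fd);
        free(buf);
        return 0;
    }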

The following numbers are for XFS:

   10240M-data
   :... v6.0-rc6 (w/o Read Plus) ... 95.804 s, 112 MB/s, 0.42 s kern,  0% cpu
   :    :........................... Min: 108.000, Max: 114.000, StDev:  1.555
   :... v6.0-rc6 (w/ Read Plus) .... 96.683 s, 111 MB/s, 0.42 s kern,  0% cpu
        :........................... Min: 107.000, Max: 113.000, StDev:  1.481
   10240M-hole
   :... v6.0-rc6 (w/o Read Plus) ... 94.807 s, 113 MB/s, 0.42 s kern,  0% cpu
   :    :........................... Min: 113.000, Max: 114.000, StDev:  0.200
   :... v6.0-rc6 (w/ Read Plus) .... 95.388 s, 113 MB/s, 0.42 s kern,  0% cpu
        :........................... Min: 112.000, Max: 113.000, StDev:  0.476
   10240M-mixed-1d
   :... v6.0-rc6 (w/o Read Plus) ... 126.681 s,  89 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 59.000, Max: 114.000, StDev: 17.866
   :... v6.0-rc6 (w/ Read Plus) .... 110.494 s,  98 MB/s, 0.42 s kern,  0% cpu
        :........................... Min: 78.200, Max: 113.000, StDev:  7.353
   10240M-mixed-1h
   :... v6.0-rc6 (w/o Read Plus) ... 117.997 s,  94 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 59.200, Max: 114.000, StDev: 14.759
   :... v6.0-rc6 (w/ Read Plus) .... 115.344 s,  96 MB/s, 0.43 s kern,  0% cpu
        :........................... Min: 57.800, Max: 113.000, StDev: 14.192
   10240M-mixed-2d
   :... v6.0-rc6 (w/o Read Plus) ... 103.242 s, 104 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 96.000, Max: 114.000, StDev:  5.758
   :... v6.0-rc6 (w/ Read Plus) .... 101.552 s, 106 MB/s, 0.42 s kern,  0% cpu
        :........................... Min: 94.100, Max: 113.000, StDev:  4.744
   10240M-mixed-2h
   :... v6.0-rc6 (w/o Read Plus) ... 105.777 s, 102 MB/s, 0.42 s kern,  0% cpu
   :    :........................... Min: 90.000, Max: 112.000, StDev:  7.702
   :... v6.0-rc6 (w/ Read Plus) .... 101.782 s, 106 MB/s, 0.43 s kern,  0% cpu
        :........................... Min: 93.900, Max: 111.000, StDev:  4.059
   10240M-mixed-4d
   :... v6.0-rc6 (w/o Read Plus) ... 106.094 s, 102 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 87.900, Max: 110.000, StDev:  7.503
   :... v6.0-rc6 (w/ Read Plus) .... 103.356 s, 104 MB/s, 0.43 s kern,  0% cpu
        :........................... Min: 89.900, Max: 110.000, StDev:  5.255
   10240M-mixed-4h
   :... v6.0-rc6 (w/o Read Plus) ... 101.561 s, 106 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 98.400, Max: 111.000, StDev:  3.838
   :... v6.0-rc6 (w/ Read Plus) .... 106.004 s, 102 MB/s, 0.43 s kern,  0% cpu
        :........................... Min: 87.800, Max: 110.000, StDev:  6.811
   10240M-mixed-8d
   :... v6.0-rc6 (w/o Read Plus) ... 104.373 s, 103 MB/s, 0.42 s kern,  0% cpu
   :    :........................... Min: 92.200, Max: 111.000, StDev:  5.469
   :... v6.0-rc6 (w/ Read Plus) .... 102.490 s, 105 MB/s, 0.42 s kern,  0% cpu
        :........................... Min: 96.100, Max: 111.000, StDev:  4.708
   10240M-mixed-8h
   :... v6.0-rc6 (w/o Read Plus) ... 101.897 s, 105 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 96.300, Max: 111.000, StDev:  4.926
   :... v6.0-rc6 (w/ Read Plus) .... 100.914 s, 107 MB/s, 0.42 s kern,  0% cpu
        :........................... Min: 95.700, Max: 111.000, StDev:  3.886
   10240M-mixed-16d
   :... v6.0-rc6 (w/o Read Plus) ... 99.315 s, 108 MB/s, 0.42 s kern,  0% cpu
   :    :........................... Min: 102.000, Max: 112.000, StDev:  2.685
   :... v6.0-rc6 (w/ Read Plus) .... 100.228 s, 107 MB/s, 0.43 s kern,  0% cpu
        :........................... Min: 100.000, Max: 112.000, StDev:  3.597
   10240M-mixed-16h
   :... v6.0-rc6 (w/o Read Plus) ... 100.184 s, 107 MB/s, 0.42 s kern,  0% cpu
   :    :........................... Min: 99.500, Max: 112.000, StDev:  3.086
   :... v6.0-rc6 (w/ Read Plus) .... 102.623 s, 105 MB/s, 0.42 s kern,  0% cpu
        :........................... Min: 94.500, Max: 112.000, StDev:  5.330


And here is EXT4:
   10240M-data
   :... v6.0-rc6 (w/o Read Plus) ... 95.419 s, 113 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 111.000, Max: 113.000, StDev:  0.557
   :... v6.0-rc6 (w/ Read Plus) .... 95.764 s, 112 MB/s, 0.42 s kern,  0% cpu
        :........................... Min: 111.000, Max: 112.000, StDev:  0.200
   10240M-hole
   :... v6.0-rc6 (w/o Read Plus) ... 95.170 s, 113 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 113.000, Max: 113.000, StDev:  0.000
   :... v6.0-rc6 (w/ Read Plus) .... 95.446 s, 113 MB/s, 0.42 s kern,  0% cpu
        :........................... Min: 112.000, Max: 113.000, StDev:  0.507
   10240M-mixed-1d
   :... v6.0-rc6 (w/o Read Plus) ... 101.329 s, 106 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 102.000, Max: 113.000, StDev:  2.217
   :... v6.0-rc6 (w/ Read Plus) .... 101.708 s, 106 MB/s, 0.42 s kern,  0% cpu
        :........................... Min: 86.200, Max: 109.000, StDev:  4.494
   10240M-mixed-1h
   :... v6.0-rc6 (w/o Read Plus) ... 103.570 s, 104 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 101.000, Max: 113.000, StDev:  2.337
   :... v6.0-rc6 (w/ Read Plus) .... 104.246 s, 103 MB/s, 0.43 s kern,  0% cpu
        :........................... Min: 98.400, Max: 108.000, StDev:  2.622
   10240M-mixed-2d
   :... v6.0-rc6 (w/o Read Plus) ... 101.004 s, 106 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 103.000, Max: 108.000, StDev:  1.332
   :... v6.0-rc6 (w/ Read Plus) .... 102.730 s, 105 MB/s, 0.42 s kern,  0% cpu
        :........................... Min: 97.200, Max: 109.000, StDev:  2.630
   10240M-mixed-2h
   :... v6.0-rc6 (w/o Read Plus) ... 101.532 s, 106 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 100.000, Max: 110.000, StDev:  2.132
   :... v6.0-rc6 (w/ Read Plus) .... 102.048 s, 105 MB/s, 0.43 s kern,  0% cpu
        :........................... Min: 96.800, Max: 108.000, StDev:  2.501
   10240M-mixed-4d
   :... v6.0-rc6 (w/o Read Plus) ... 101.672 s, 106 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 101.000, Max: 110.000, StDev:  2.241
   :... v6.0-rc6 (w/ Read Plus) .... 103.032 s, 104 MB/s, 0.43 s kern,  0% cpu
        :........................... Min: 93.900, Max: 109.000, StDev:  3.028
   10240M-mixed-4h
   :... v6.0-rc6 (w/o Read Plus) ... 101.248 s, 106 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 103.000, Max: 110.000, StDev:  1.818
   :... v6.0-rc6 (w/ Read Plus) .... 103.324 s, 104 MB/s, 0.43 s kern,  0% cpu
        :........................... Min: 91.900, Max: 109.000, StDev:  3.395
   10240M-mixed-8d
   :... v6.0-rc6 (w/o Read Plus) ... 100.295 s, 107 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 104.000, Max: 110.000, StDev:  1.864
   :... v6.0-rc6 (w/ Read Plus) .... 101.728 s, 106 MB/s, 0.42 s kern,  0% cpu
        :........................... Min: 95.100, Max: 108.000, StDev:  2.565
   10240M-mixed-8h
   :... v6.0-rc6 (w/o Read Plus) ... 98.927 s, 109 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 106.000, Max: 111.000, StDev:  1.414
   :... v6.0-rc6 (w/ Read Plus) .... 101.399 s, 106 MB/s, 0.42 s kern,  0% cpu
        :........................... Min: 94.900, Max: 109.000, StDev:  2.725
   10240M-mixed-16d
   :... v6.0-rc6 (w/o Read Plus) ... 98.060 s, 109 MB/s, 0.42 s kern,  0% cpu
   :    :........................... Min: 107.000, Max: 112.000, StDev:  1.447
   :... v6.0-rc6 (w/ Read Plus) .... 98.994 s, 109 MB/s, 0.43 s kern,  0% cpu
        :........................... Min: 99.400, Max: 111.000, StDev:  2.168
   10240M-mixed-16h
   :... v6.0-rc6 (w/o Read Plus) ... 97.599 s, 110 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 106.000, Max: 112.000, StDev:  1.541
   :... v6.0-rc6 (w/ Read Plus) .... 98.895 s, 109 MB/s, 0.42 s kern,  0% cpu
        :........................... Min: 93.600, Max: 111.000, StDev:  3.532

Is there anything else you want me to test?

Anna

>
> Sure thing!
>
> Anna
>
> >
> >
> > > - v4:
> > >  - Change READ and READ_PLUS to return nfserr_serverfault if the
> > >    splice check fails.
> >
> > At this point, the code looks fine, but I'd like to understand why
> > the performance is not the same.
> >
> >
> > > Thanks,
> > > Anna
> > >
> > >
> > > Anna Schumaker (2):
> > >  NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
> > >  NFSD: Simplify READ_PLUS
> > >
> > > fs/nfsd/nfs4xdr.c | 141 +++++++++++-----------------------------------
> > > 1 file changed, 33 insertions(+), 108 deletions(-)
> > >
> > > --
> > > 2.37.3
> > >
> >
> > --
> > Chuck Lever
> >
> >
> >


* Re: [PATCH v4 0/2] NFSD: Simplify READ_PLUS
  2022-10-05 15:10     ` Anna Schumaker
@ 2022-10-05 15:53       ` Chuck Lever III
  2022-10-31 17:55         ` Anna Schumaker
  0 siblings, 1 reply; 15+ messages in thread
From: Chuck Lever III @ 2022-10-05 15:53 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Linux NFS Mailing List



> On Oct 5, 2022, at 11:10 AM, Anna Schumaker <anna@kernel.org> wrote:
> 
> Hi Chuck,
> 
> On Tue, Sep 13, 2022 at 4:45 PM Anna Schumaker <anna@kernel.org> wrote:
>> 
>> On Tue, Sep 13, 2022 at 2:45 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>> 
>>> 
>>> 
>>>> On Sep 13, 2022, at 11:01 AM, Anna Schumaker <anna@kernel.org> wrote:
>>>> 
>>>> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
>>>> 
>>>> When we left off with READ_PLUS, Chuck had suggested reverting the
>>>> server to reply with a single NFS4_CONTENT_DATA segment essentially
>>>> mimicking how the READ operation behaves. Then, a future sparse read
>>>> function can be added and the server modified to support it without
>>>> needing to rip out the old READ_PLUS code at the same time.
>>>> 
>>>> This patch takes that first step. I was even able to re-use the
>>>> nfsd4_encode_readv() and nfsd4_encode_splice_read() functions to
>>>> remove some duplicate code.
>>>> 
>>>> Below is some performance data comparing the READ and READ_PLUS
>>>> operations with v4.2. I tested reading 2G files with various hole
>>>> lengths including 100% data, 100% hole, and a handful of mixed hole and
>>>> data files. For the mixed files, a notation like "1d" means
>>>> every-other-page is data, and the first page is data. "4h" would mean
>>>> alternating 4 pages data and 4 pages hole, beginning with hole.
>>>> 
>>>> I also used the 'vmtouch' utility to make sure the file is either
>>>> evicted from the server's pagecache ("Uncached on server") or present in
>>>> the server's page cache ("Cached on server").
>>>> 
>>>>  2048M-data
>>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.555 s, 712 MB/s, 0.74 s kern, 24% cpu
>>>>  :    :........................... Cached on server .....  1.346 s, 1.6 GB/s, 0.69 s kern, 52% cpu
>>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.596 s, 690 MB/s, 0.72 s kern, 23% cpu
>>>>       :........................... Cached on server .....  1.394 s, 1.6 GB/s, 0.67 s kern, 48% cpu
>>>>  2048M-hole
>>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  4.934 s, 762 MB/s, 1.86 s kern, 29% cpu
>>>>  :    :........................... Cached on server .....  1.328 s, 1.6 GB/s, 0.72 s kern, 54% cpu
>>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  4.823 s, 739 MB/s, 1.88 s kern, 28% cpu
>>>>       :........................... Cached on server .....  1.399 s, 1.5 GB/s, 0.70 s kern, 50% cpu
>>>>  2048M-mixed-1d
>>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  4.480 s, 598 MB/s, 0.76 s kern, 21% cpu
>>>>  :    :........................... Cached on server .....  1.445 s, 1.5 GB/s, 0.71 s kern, 50% cpu
>>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  4.774 s, 559 MB/s, 0.75 s kern, 19% cpu
>>>>       :........................... Cached on server .....  1.514 s, 1.4 GB/s, 0.67 s kern, 44% cpu
>>>>  2048M-mixed-1h
>>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.568 s, 633 MB/s, 0.78 s kern, 23% cpu
>>>>  :    :........................... Cached on server .....  1.357 s, 1.6 GB/s, 0.71 s kern, 53% cpu
>>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.580 s, 641 MB/s, 0.74 s kern, 22% cpu
>>>>       :........................... Cached on server .....  1.396 s, 1.5 GB/s, 0.67 s kern, 48% cpu
>>>>  2048M-mixed-2d
>>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.159 s, 708 MB/s, 0.78 s kern, 26% cpu
>>>>  :    :........................... Cached on server .....  1.410 s, 1.5 GB/s, 0.70 s kern, 50% cpu
>>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.093 s, 712 MB/s, 0.74 s kern, 25% cpu
>>>>       :........................... Cached on server .....  1.474 s, 1.4 GB/s, 0.67 s kern, 46% cpu
>>>>  2048M-mixed-2h
>>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.043 s, 722 MB/s, 0.78 s kern, 26% cpu
>>>>  :    :........................... Cached on server .....  1.374 s, 1.6 GB/s, 0.72 s kern, 53% cpu
>>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.913 s, 756 MB/s, 0.74 s kern, 26% cpu
>>>>       :........................... Cached on server .....  1.349 s, 1.6 GB/s, 0.67 s kern, 50% cpu
>>>>  2048M-mixed-4d
>>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.275 s, 680 MB/s, 0.75 s kern, 24% cpu
>>>>  :    :........................... Cached on server .....  1.391 s, 1.5 GB/s, 0.71 s kern, 52% cpu
>>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.470 s, 626 MB/s, 0.72 s kern, 21% cpu
>>>>       :........................... Cached on server .....  1.456 s, 1.5 GB/s, 0.67 s kern, 46% cpu
>>>>  2048M-mixed-4h
>>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.035 s, 743 MB/s, 0.74 s kern, 26% cpu
>>>>  :    :........................... Cached on server .....  1.345 s, 1.6 GB/s, 0.71 s kern, 53% cpu
>>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.848 s, 779 MB/s, 0.73 s kern, 26% cpu
>>>>       :........................... Cached on server .....  1.421 s, 1.5 GB/s, 0.68 s kern, 48% cpu
>>>>  2048M-mixed-8d
>>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.262 s, 687 MB/s, 0.74 s kern, 24% cpu
>>>>  :    :........................... Cached on server .....  1.366 s, 1.6 GB/s, 0.69 s kern, 51% cpu
>>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.195 s, 709 MB/s, 0.72 s kern, 24% cpu
>>>>       :........................... Cached on server .....  1.414 s, 1.5 GB/s, 0.67 s kern, 48% cpu
>>>>  2048M-mixed-8h
>>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  2.899 s, 789 MB/s, 0.73 s kern, 27% cpu
>>>>  :    :........................... Cached on server .....  1.338 s, 1.6 GB/s, 0.69 s kern, 52% cpu
>>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.910 s, 772 MB/s, 0.72 s kern, 26% cpu
>>>>       :........................... Cached on server .....  1.438 s, 1.5 GB/s, 0.67 s kern, 47% cpu
>>>>  2048M-mixed-16d
>>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.416 s, 661 MB/s, 0.73 s kern, 23% cpu
>>>>  :    :........................... Cached on server .....  1.345 s, 1.6 GB/s, 0.70 s kern, 53% cpu
>>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.177 s, 713 MB/s, 0.70 s kern, 23% cpu
>>>>       :........................... Cached on server .....  1.447 s, 1.5 GB/s, 0.68 s kern, 47% cpu
>>>>  2048M-mixed-16h
>>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  2.919 s, 780 MB/s, 0.73 s kern, 26% cpu
>>>>  :    :........................... Cached on server .....  1.363 s, 1.6 GB/s, 0.70 s kern, 51% cpu
>>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.934 s, 773 MB/s, 0.70 s kern, 25% cpu
>>>>       :........................... Cached on server .....  1.435 s, 1.5 GB/s, 0.67 s kern, 47% cpu
>>> 
>>> For this particular change, I'm interested only in cases where the
>>> whole file is cached on the server. We're focusing on the efficiency
>>> and performance of the protocol and transport here, not the underlying
>>> filesystem (which is... xfs?).
>> 
>> Sounds good, I can narrow down to just that test.
>> 
>>> 
>>> Also, 2GB files can be read with just 20 1MB READ requests. That
>>> means we don't have a large sample size of READ operations for any
>>> single test, assuming the client is using 1MB rsize.
>>> 
>>> Also, are these averages, or single runs? I think running each test
>>> 5-10 times (at least) and including some variance data in the results
>>> would help build more confidence that the small differences in the
>>> timing are not noise.
>> 
>> This is an average across 10 runs.
>> 
>>> 
>>> All that said, however, I see with some consistency that READ_PLUS
>>> takes longer to pull data over the wire, but uses slightly less CPU.
>>> Assuming the CPU utilizations are client-side, that matches my
>>> expectations of lower CPU utilization results if the throughput is
>>> lower.
>>> 
>>> Looking at the 100% data results, READ_PLUS takes 3.5% longer than
>>> READ. That to me is a small but significant drop -- I think it will
>>> be noticeable for large workloads. Can you explain the difference?
>> 
>> I'll try larger files for my next round of testing. I was assuming the
>> difference is just noise, since there are cases like the mixed-2h test
>> where READ_PLUS was slightly faster. But more testing will help figure
>> that out.
>> 
>>> 
>>> For subsequent test runs, can you find a server with more memory,
>>> test with larger files, and test with a variety of rsize settings?
>>> You can reduce your test matrix by leaving out the tests with holey
>>> files for the moment.
> 
> Hi Chuck,
> 
> The following numbers are for 10G files that I created on Netapp lab
> machines. I narrowed down my testing to files already in the server's
> cache and read with directio to get the pagecache out of the way as
> much as possible.  I did 25 iterations this time around, and included
> minimum time, maximum time, and standard deviation in the report.
> 
> The following numbers are for XFS:
> 
>   10240M-data
>   :... v6.0-rc6 (w/o Read Plus) ... 95.804 s, 112 MB/s, 0.42 s kern,  0% cpu
>   :    :........................... Min: 108.000, Max: 114.000, StDev:  1.555
>   :... v6.0-rc6 (w/ Read Plus) .... 96.683 s, 111 MB/s, 0.42 s kern,  0% cpu
>        :........................... Min: 107.000, Max: 113.000, StDev:  1.481


> And here is EXT4:
>   10240M-data
>   :... v6.0-rc6 (w/o Read Plus) ... 95.419 s, 113 MB/s, 0.43 s kern,  0% cpu
>   :    :........................... Min: 111.000, Max: 113.000, StDev:  0.557
>   :... v6.0-rc6 (w/ Read Plus) .... 95.764 s, 112 MB/s, 0.42 s kern,  0% cpu
>        :........................... Min: 111.000, Max: 112.000, StDev:  0.200

For this case, only the single data segment results are
interesting, since that's all the server implementation now
supports.

The ext4 results show that the difference between the
average throughput results (112 v. 113) is larger than the
standard deviation: thus, the slower result is not noise
(differences in significant figures notwithstanding).
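
(A quick sanity check, assuming the Min/Max/StDev rows are per-run
throughput in MB/s: with 25 runs the standard error of each mean is
roughly StDev/sqrt(25), i.e. about 0.11 MB/s for READ and 0.04 MB/s
for READ_PLUS, so a gap of about 1 MB/s between the means is many
standard errors wide.)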

I've tested on 100GbE (TCP) against a tmpfs export using iozone,
and consistently see 10% lower data and IOPS throughput with
NFSv4.2 READ_PLUS compared with NFSv4.1 with READ.

Maybe tmpfs is not a real-world test case? If you don't see a
significant difference on a filesystem that stores its data on
durable media, then maybe my 10% result doesn't matter. But it
does expose an inefficiency somewhere.


> Is there anything else you want me to test?

I was asking not just for more precise test results, but also
for an explanation/analysis of the differences.

At this point I expect the problem is on the client -- perhaps
it is not aligning the receive buffer to expect a single data
segment. Do you think that case should be optimized on the
client? For typical small files, I would expect that single
data segment reads would dominate.
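
To make that concrete, here is a rough sketch (plain userspace C, not
code from either tree) of how many result-header bytes precede the
read payload in each reply, counting the XDR words in the result
structures; the extra 16 bytes in front of the READ_PLUS payload are
what could leave a single data segment misaligned in the client's
receive pages:

   /* Rough sketch, not kernel code: result-header bytes that precede
    * the read payload on the wire.  Only XDR word sizes are used; no
    * symbols from fs/nfsd or net/sunrpc appear here. */
   #include <stdio.h>

   int main(void)
   {
           unsigned int read_hdr = 4 + 4;           /* eof + opaque length */
           unsigned int read_plus_hdr = 4 + 4       /* eof + segment count */
                                      + 4 + 8 + 4;  /* type + offset + opaque length */

           printf("READ payload offset:      %u bytes\n", read_hdr);
           printf("READ_PLUS payload offset: %u bytes\n", read_plus_hdr);
           printf("difference:               %u bytes\n", read_plus_hdr - read_hdr);
           return 0;
   }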

Do you have a way of assessing whether the client has to
re-align READ_PLUS data segments when it receives just one
of them per Reply? There might be a SunRPC tracepoint that
fires when re-alignment is needed, for instance.

I'll add "Simplify READ_PLUS" to the NFSD v6.2 pile, but IMO
understanding (and hopefully addressing) the performance
difference here is key to the success of the Linux READ_PLUS
implementation.

--
Chuck Lever





* Re: [PATCH v4 1/2] NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
  2022-09-13 18:01 ` [PATCH v4 1/2] NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data Anna Schumaker
  2022-09-13 20:12   ` Chuck Lever III
  2022-09-15 19:59   ` Jeff Layton
@ 2022-10-06 16:35   ` Jeff Layton
  2 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2022-10-06 16:35 UTC (permalink / raw)
  To: Anna Schumaker, linux-nfs, chuck.lever

On Tue, 2022-09-13 at 14:01 -0400, Anna Schumaker wrote:
> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> 
> This was discussed with Chuck as part of this patch set. Returning
> nfserr_resource was decided to not be the best error message here, and
> he suggested changing to nfserr_serverfault instead.
> 
> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> ---
>  fs/nfsd/nfs4xdr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 1e9690a061ec..01dd73ed5720 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -3994,7 +3994,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
>  	}
>  	if (resp->xdr->buf->page_len && splice_ok) {
>  		WARN_ON_ONCE(1);
> -		return nfserr_resource;
> +		return nfserr_serverfault;
>  	}
>  	xdr_commit_encode(xdr);
>  


Reviewed-by: Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v4 2/2] NFSD: Simplify READ_PLUS
  2022-09-13 18:01 ` [PATCH v4 2/2] NFSD: Simplify READ_PLUS Anna Schumaker
@ 2022-10-06 16:35   ` Jeff Layton
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2022-10-06 16:35 UTC (permalink / raw)
  To: Anna Schumaker, linux-nfs, chuck.lever

On Tue, 2022-09-13 at 14:01 -0400, Anna Schumaker wrote:
> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> 
> Chuck had suggested reverting READ_PLUS so it returns a single DATA
> segment covering the requested read range. This prepares the server for
> a future "sparse read" function so support can easily be added without
> needing to rip out the old READ_PLUS code at the same time.
> 
> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> ---
>  fs/nfsd/nfs4xdr.c | 139 +++++++++++-----------------------------------
>  1 file changed, 32 insertions(+), 107 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 01dd73ed5720..280c7c8ac807 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -4731,79 +4731,37 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
>  
>  static __be32
>  nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
> -			    struct nfsd4_read *read,
> -			    unsigned long *maxcount, u32 *eof,
> -			    loff_t *pos)
> +			    struct nfsd4_read *read)
>  {
> -	struct xdr_stream *xdr = resp->xdr;
> +	bool splice_ok = test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags);
>  	struct file *file = read->rd_nf->nf_file;
> -	int starting_len = xdr->buf->len;
> -	loff_t hole_pos;
> -	__be32 nfserr;
> -	__be32 *p, tmp;
> -	__be64 tmp64;
> -
> -	hole_pos = pos ? *pos : vfs_llseek(file, read->rd_offset, SEEK_HOLE);
> -	if (hole_pos > read->rd_offset)
> -		*maxcount = min_t(unsigned long, *maxcount, hole_pos - read->rd_offset);
> -	*maxcount = min_t(unsigned long, *maxcount, (xdr->buf->buflen - xdr->buf->len));
> +	struct xdr_stream *xdr = resp->xdr;
> +	unsigned long maxcount;
> +	__be32 nfserr, *p;
>  
>  	/* Content type, offset, byte count */
>  	p = xdr_reserve_space(xdr, 4 + 8 + 4);
>  	if (!p)
> -		return nfserr_resource;
> +		return nfserr_io;
> +	if (resp->xdr->buf->page_len && splice_ok) {
> +		WARN_ON_ONCE(splice_ok);
> +		return nfserr_serverfault;
> +	}
>  
> -	read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, *maxcount);
> -	if (read->rd_vlen < 0)
> -		return nfserr_resource;
> +	maxcount = min_t(unsigned long, read->rd_length,
> +			 (xdr->buf->buflen - xdr->buf->len));
>  
> -	nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
> -			    resp->rqstp->rq_vec, read->rd_vlen, maxcount, eof);
> +	if (file->f_op->splice_read && splice_ok)
> +		nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount);
> +	else
> +		nfserr = nfsd4_encode_readv(resp, read, file, maxcount);
>  	if (nfserr)
>  		return nfserr;
> -	xdr_truncate_encode(xdr, starting_len + 16 + xdr_align_size(*maxcount));
>  
> -	tmp = htonl(NFS4_CONTENT_DATA);
> -	write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
> -	tmp64 = cpu_to_be64(read->rd_offset);
> -	write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
> -	tmp = htonl(*maxcount);
> -	write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
> -
> -	tmp = xdr_zero;
> -	write_bytes_to_xdr_buf(xdr->buf, starting_len + 16 + *maxcount, &tmp,
> -			       xdr_pad_size(*maxcount));
> -	return nfs_ok;
> -}
> -
> -static __be32
> -nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
> -			    struct nfsd4_read *read,
> -			    unsigned long *maxcount, u32 *eof)
> -{
> -	struct file *file = read->rd_nf->nf_file;
> -	loff_t data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
> -	loff_t f_size = i_size_read(file_inode(file));
> -	unsigned long count;
> -	__be32 *p;
> -
> -	if (data_pos == -ENXIO)
> -		data_pos = f_size;
> -	else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE))
> -		return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size);
> -	count = data_pos - read->rd_offset;
> -
> -	/* Content type, offset, byte count */
> -	p = xdr_reserve_space(resp->xdr, 4 + 8 + 8);
> -	if (!p)
> -		return nfserr_resource;
> -
> -	*p++ = htonl(NFS4_CONTENT_HOLE);
> +	*p++ = cpu_to_be32(NFS4_CONTENT_DATA);
>  	p = xdr_encode_hyper(p, read->rd_offset);
> -	p = xdr_encode_hyper(p, count);
> +	*p = cpu_to_be32(read->rd_length);
>  
> -	*eof = (read->rd_offset + count) >= f_size;
> -	*maxcount = min_t(unsigned long, count, *maxcount);
>  	return nfs_ok;
>  }
>  
> @@ -4811,69 +4769,36 @@ static __be32
>  nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
>  		       struct nfsd4_read *read)
>  {
> -	unsigned long maxcount, count;
> +	struct file *file = read->rd_nf->nf_file;
>  	struct xdr_stream *xdr = resp->xdr;
> -	struct file *file;
>  	int starting_len = xdr->buf->len;
> -	int last_segment = xdr->buf->len;
> -	int segments = 0;
> -	__be32 *p, tmp;
> -	bool is_data;
> -	loff_t pos;
> -	u32 eof;
> +	u32 segments = 0;
> +	__be32 *p;
>  
>  	if (nfserr)
>  		return nfserr;
> -	file = read->rd_nf->nf_file;
>  
>  	/* eof flag, segment count */
>  	p = xdr_reserve_space(xdr, 4 + 4);
>  	if (!p)
> -		return nfserr_resource;
> +		return nfserr_io;
>  	xdr_commit_encode(xdr);
>  
> -	maxcount = min_t(unsigned long, read->rd_length,
> -			 (xdr->buf->buflen - xdr->buf->len));
> -	count    = maxcount;
> -
> -	eof = read->rd_offset >= i_size_read(file_inode(file));
> -	if (eof)
> +	read->rd_eof = read->rd_offset >= i_size_read(file_inode(file));
> +	if (read->rd_eof)
>  		goto out;
>  
> -	pos = vfs_llseek(file, read->rd_offset, SEEK_HOLE);
> -	is_data = pos > read->rd_offset;
> -
> -	while (count > 0 && !eof) {
> -		maxcount = count;
> -		if (is_data)
> -			nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof,
> -						segments == 0 ? &pos : NULL);
> -		else
> -			nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof);
> -		if (nfserr)
> -			goto out;
> -		count -= maxcount;
> -		read->rd_offset += maxcount;
> -		is_data = !is_data;
> -		last_segment = xdr->buf->len;
> -		segments++;
> -	}
> -
> -out:
> -	if (nfserr && segments == 0)
> +	nfserr = nfsd4_encode_read_plus_data(resp, read);
> +	if (nfserr) {
>  		xdr_truncate_encode(xdr, starting_len);
> -	else {
> -		if (nfserr) {
> -			xdr_truncate_encode(xdr, last_segment);
> -			nfserr = nfs_ok;
> -			eof = 0;
> -		}
> -		tmp = htonl(eof);
> -		write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
> -		tmp = htonl(segments);
> -		write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
> +		return nfserr;
>  	}
>  
> +	segments++;
> +
> +out:
> +	p = xdr_encode_bool(p, read->rd_eof);
> +	*p = cpu_to_be32(segments);
>  	return nfserr;
>  }
>  

Reviewed-by: Jeff Layton <jlayton@kernel.org>



* Re: [PATCH v4 0/2] NFSD: Simplify READ_PLUS
  2022-10-05 15:53       ` Chuck Lever III
@ 2022-10-31 17:55         ` Anna Schumaker
  2022-10-31 18:00           ` Chuck Lever III
  0 siblings, 1 reply; 15+ messages in thread
From: Anna Schumaker @ 2022-10-31 17:55 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List

On Wed, Oct 5, 2022 at 11:53 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Oct 5, 2022, at 11:10 AM, Anna Schumaker <anna@kernel.org> wrote:
> >
> > Hi Chuck,
> >
> > On Tue, Sep 13, 2022 at 4:45 PM Anna Schumaker <anna@kernel.org> wrote:
> >>
> >> On Tue, Sep 13, 2022 at 2:45 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>>
> >>>
> >>>
> >>>> On Sep 13, 2022, at 11:01 AM, Anna Schumaker <anna@kernel.org> wrote:
> >>>>
> >>>> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> >>>>
> >>>> When we left off with READ_PLUS, Chuck had suggested reverting the
> >>>> server to reply with a single NFS4_CONTENT_DATA segment essentially
> >>>> mimicking how the READ operation behaves. Then, a future sparse read
> >>>> function can be added and the server modified to support it without
> >>>> needing to rip out the old READ_PLUS code at the same time.
> >>>>
> >>>> This patch takes that first step. I was even able to re-use the
> >>>> nfsd4_encode_readv() and nfsd4_encode_splice_read() functions to
> >>>> remove some duplicate code.
> >>>>
> >>>> Below is some performance data comparing the READ and READ_PLUS
> >>>> operations with v4.2. I tested reading 2G files with various hole
> >>>> lengths including 100% data, 100% hole, and a handful of mixed hole and
> >>>> data files. For the mixed files, a notation like "1d" means
> >>>> every-other-page is data, and the first page is data. "4h" would mean
> >>>> alternating 4 pages data and 4 pages hole, beginning with hole.
> >>>>
> >>>> I also used the 'vmtouch' utility to make sure the file is either
> >>>> evicted from the server's pagecache ("Uncached on server") or present in
> >>>> the server's page cache ("Cached on server").
> >>>>
> >>>>  2048M-data
> >>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.555 s, 712 MB/s, 0.74 s kern, 24% cpu
> >>>>  :    :........................... Cached on server .....  1.346 s, 1.6 GB/s, 0.69 s kern, 52% cpu
> >>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.596 s, 690 MB/s, 0.72 s kern, 23% cpu
> >>>>       :........................... Cached on server .....  1.394 s, 1.6 GB/s, 0.67 s kern, 48% cpu
> >>>>  2048M-hole
> >>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  4.934 s, 762 MB/s, 1.86 s kern, 29% cpu
> >>>>  :    :........................... Cached on server .....  1.328 s, 1.6 GB/s, 0.72 s kern, 54% cpu
> >>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  4.823 s, 739 MB/s, 1.88 s kern, 28% cpu
> >>>>       :........................... Cached on server .....  1.399 s, 1.5 GB/s, 0.70 s kern, 50% cpu
> >>>>  2048M-mixed-1d
> >>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  4.480 s, 598 MB/s, 0.76 s kern, 21% cpu
> >>>>  :    :........................... Cached on server .....  1.445 s, 1.5 GB/s, 0.71 s kern, 50% cpu
> >>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  4.774 s, 559 MB/s, 0.75 s kern, 19% cpu
> >>>>       :........................... Cached on server .....  1.514 s, 1.4 GB/s, 0.67 s kern, 44% cpu
> >>>>  2048M-mixed-1h
> >>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.568 s, 633 MB/s, 0.78 s kern, 23% cpu
> >>>>  :    :........................... Cached on server .....  1.357 s, 1.6 GB/s, 0.71 s kern, 53% cpu
> >>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.580 s, 641 MB/s, 0.74 s kern, 22% cpu
> >>>>       :........................... Cached on server .....  1.396 s, 1.5 GB/s, 0.67 s kern, 48% cpu
> >>>>  2048M-mixed-2d
> >>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.159 s, 708 MB/s, 0.78 s kern, 26% cpu
> >>>>  :    :........................... Cached on server .....  1.410 s, 1.5 GB/s, 0.70 s kern, 50% cpu
> >>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.093 s, 712 MB/s, 0.74 s kern, 25% cpu
> >>>>       :........................... Cached on server .....  1.474 s, 1.4 GB/s, 0.67 s kern, 46% cpu
> >>>>  2048M-mixed-2h
> >>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.043 s, 722 MB/s, 0.78 s kern, 26% cpu
> >>>>  :    :........................... Cached on server .....  1.374 s, 1.6 GB/s, 0.72 s kern, 53% cpu
> >>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.913 s, 756 MB/s, 0.74 s kern, 26% cpu
> >>>>       :........................... Cached on server .....  1.349 s, 1.6 GB/s, 0.67 s kern, 50% cpu
> >>>>  2048M-mixed-4d
> >>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.275 s, 680 MB/s, 0.75 s kern, 24% cpu
> >>>>  :    :........................... Cached on server .....  1.391 s, 1.5 GB/s, 0.71 s kern, 52% cpu
> >>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.470 s, 626 MB/s, 0.72 s kern, 21% cpu
> >>>>       :........................... Cached on server .....  1.456 s, 1.5 GB/s, 0.67 s kern, 46% cpu
> >>>>  2048M-mixed-4h
> >>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.035 s, 743 MB/s, 0.74 s kern, 26% cpu
> >>>>  :    :........................... Cached on server .....  1.345 s, 1.6 GB/s, 0.71 s kern, 53% cpu
> >>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.848 s, 779 MB/s, 0.73 s kern, 26% cpu
> >>>>       :........................... Cached on server .....  1.421 s, 1.5 GB/s, 0.68 s kern, 48% cpu
> >>>>  2048M-mixed-8d
> >>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.262 s, 687 MB/s, 0.74 s kern, 24% cpu
> >>>>  :    :........................... Cached on server .....  1.366 s, 1.6 GB/s, 0.69 s kern, 51% cpu
> >>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.195 s, 709 MB/s, 0.72 s kern, 24% cpu
> >>>>       :........................... Cached on server .....  1.414 s, 1.5 GB/s, 0.67 s kern, 48% cpu
> >>>>  2048M-mixed-8h
> >>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  2.899 s, 789 MB/s, 0.73 s kern, 27% cpu
> >>>>  :    :........................... Cached on server .....  1.338 s, 1.6 GB/s, 0.69 s kern, 52% cpu
> >>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.910 s, 772 MB/s, 0.72 s kern, 26% cpu
> >>>>       :........................... Cached on server .....  1.438 s, 1.5 GB/s, 0.67 s kern, 47% cpu
> >>>>  2048M-mixed-16d
> >>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.416 s, 661 MB/s, 0.73 s kern, 23% cpu
> >>>>  :    :........................... Cached on server .....  1.345 s, 1.6 GB/s, 0.70 s kern, 53% cpu
> >>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.177 s, 713 MB/s, 0.70 s kern, 23% cpu
> >>>>       :........................... Cached on server .....  1.447 s, 1.5 GB/s, 0.68 s kern, 47% cpu
> >>>>  2048M-mixed-16h
> >>>>  :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  2.919 s, 780 MB/s, 0.73 s kern, 26% cpu
> >>>>  :    :........................... Cached on server .....  1.363 s, 1.6 GB/s, 0.70 s kern, 51% cpu
> >>>>  :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.934 s, 773 MB/s, 0.70 s kern, 25% cpu
> >>>>       :........................... Cached on server .....  1.435 s, 1.5 GB/s, 0.67 s kern, 47% cpu
> >>>
> >>> For this particular change, I'm interested only in cases where the
> >>> whole file is cached on the server. We're focusing on the efficiency
> >>> and performance of the protocol and transport here, not the underlying
> >>> filesystem (which is... xfs?).
> >>
> >> Sounds good, I can narrow down to just that test.
> >>
> >>>
> >>> Also, 2GB files can be read with just 20 1MB READ requests. That
> >>> means we don't have a large sample size of READ operations for any
> >>> single test, assuming the client is using 1MB rsize.
> >>>
> >>> Also, are these averages, or single runs? I think running each test
> >>> 5-10 times (at least) and including some variance data in the results
> >>> would help build more confidence that the small differences in the
> >>> timing are not noise.
> >>
> >> This is an average across 10 runs.
> >>
> >>>
> >>> All that said, however, I see with some consistency that READ_PLUS
> >>> takes longer to pull data over the wire, but uses slightly less CPU.
> >>> Assuming the CPU utilizations are client-side, that matches my
> >>> expectations of lower CPU utilization results if the throughput is
> >>> lower.
> >>>
> >>> Looking at the 100% data results, READ_PLUS takes 3.5% longer than
> >>> READ. That to me is a small but significant drop -- I think it will
> >>> be noticeable for large workloads. Can you explain the difference?
> >>
> >> I'll try larger files for my next round of testing. I was assuming the
> >> difference is just noise, since there are cases like the mixed-2h test
> >> where READ_PLUS was slightly faster. But more testing will help figure
> >> that out.
> >>
> >>>
> >>> For subsequent test runs, can you find a server with more memory,
> >>> test with larger files, and test with a variety of rsize settings?
> >>> You can reduce your test matrix by leaving out the tests with holey
> >>> files for the moment.
> >
> > Hi Chuck,
> >
> > The following numbers are for 10G files that I created on Netapp lab
> > machines. I narrowed down my testing to files already in the server's
> > cache and read with directio to get the pagecache out of the way as
> > much as possible.  I did 25 iterations this time around, and included
> > minimum time, maximum time, and standard deviation in the report.
> >
> > The following numbers are for XFS:
> >
> >   10240M-data
> >   :... v6.0-rc6 (w/o Read Plus) ... 95.804 s, 112 MB/s, 0.42 s kern,  0% cpu
> >   :    :........................... Min: 108.000, Max: 114.000, StDev:  1.555
> >   :... v6.0-rc6 (w/ Read Plus) .... 96.683 s, 111 MB/s, 0.42 s kern,  0% cpu
> >        :........................... Min: 107.000, Max: 113.000, StDev:  1.481
>
>
> > And here is EXT4:
> >   10240M-data
> >   :... v6.0-rc6 (w/o Read Plus) ... 95.419 s, 113 MB/s, 0.43 s kern,  0% cpu
> >   :    :........................... Min: 111.000, Max: 113.000, StDev:  0.557
> >   :... v6.0-rc6 (w/ Read Plus) .... 95.764 s, 112 MB/s, 0.42 s kern,  0% cpu
> >        :........................... Min: 111.000, Max: 112.000, StDev:  0.200
>
> For this case, only the single data segment results are
> interesting, since that's all the server implementation now
> supports.
>
> The ext4 results show that the difference between the
> average throughput results (112 v. 113) is larger than the
> standard deviation: thus, the slower result is not noise
> (differences in significant figures notwithstanding).
>
> I've tested on 100GbE (TCP) against a tmpfs export using iozone,
> and consistently see 10% lower data and IOPS throughput with
> NFSv4.2 READ_PLUS compared with NFSv4.1 with READ.
>
> Maybe tmpfs is not a real world test case? If you don't see a
> significant difference on a filesystem that stores its data on
> durable media, then maybe my 10% result doesn't matter. But it
> does expose an inefficiency somewhere.
>
>
> > Is there anything else you want me to test?
>
> I was asking not just for more precise test results, but also
> for an explanation/analysis of the differences.
>
> At this point I expect the problem is on the client -- perhaps
> it is not aligning the receive buffer to expect a single data
> segment. Do you think that case should be optimized on the
> client? For typical small files, I would expect that single
> data segment reads would dominate.

Hi Chuck, I added a patch to my client to hack in decoding
single-segment READ_PLUS calls the same way we decode READ, and I'm
not seeing a huge difference in transfer speed before and after the
patch:

With EXT4:
   10240M-data
   :... v6.0-rc6 (w/o Read Plus) ... 94.648 s, 113 MB/s, 0.44 s kern,  0% cpu
   :    :........................... Min: 94.500, Max: 95.141, StDev:  0.107
   :... v6.0-rc6 (w/ Read Plus) .... 95.831 s, 112 MB/s, 0.44 s kern,  0% cpu
   :    :........................... Min: 95.731, Max: 96.261, StDev:  0.089
   :... w/ Client Patch ............ 95.799 s, 112 MB/s, 0.44 s kern,  0% cpu
        :........................... Min: 95.690, Max: 96.229, StDev:  0.089

And with XFS:
   10240M-data
   :... v6.0-rc6 (w/o Read Plus) ... 94.443 s, 114 MB/s, 0.43 s kern,  0% cpu
   :    :........................... Min: 94.217, Max: 94.653, StDev:  0.095
   :... v6.0-rc6 (w/ Read Plus) .... 95.725 s, 112 MB/s, 0.44 s kern,  0% cpu
   :    :........................... Min: 95.612, Max: 95.843, StDev:  0.067
   :... w/ Client Patch ............ 95.721 s, 112 MB/s, 0.44 s kern,  0% cpu
        :........................... Min: 95.602, Max: 95.805, StDev:  0.052


>
> Do you have a way of assessing whether the client has to
> re-align READ_PLUS data segments when it receives just one
> of them per Reply? There might be a SunRPC tracepoint that
> fires when re-alignment is needed, for instance.

For the READ case, the client always calls xdr_realign_pages() during
decoding to align the data, so it is always doing some shifting around
to get the read reply positioned right.
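
To illustrate what that shifting amounts to, here is a conceptual
sketch in plain C (not the actual xdr_realign_pages() code): when the
payload lands at a different offset than the receive pages were set
up for, the decoder has to copy it into place, and that per-reply
copy is the overhead in question.

   /* Conceptual sketch only -- not net/sunrpc's xdr_realign_pages().
    * If the payload did not land where the decoder expected it, move
    * it there; this memmove is the cost being discussed. */
   #include <stddef.h>
   #include <string.h>

   void realign_payload(unsigned char *buf, size_t expected_off,
                        size_t actual_off, size_t payload_len)
   {
           if (actual_off == expected_off)
                   return;          /* already aligned, nothing to copy */
           memmove(buf + expected_off, buf + actual_off, payload_len);
   }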

Anna

>
> I'll add "Simplify READ_PLUS" to the NFSD v6.2 pile, but IMO
> understanding (and hopefully addressing) the performance
> difference here is key to the success of the Linux READ_PLUS
> implementation.
>
> --
> Chuck Lever
>
>
>


* Re: [PATCH v4 0/2] NFSD: Simplify READ_PLUS
  2022-10-31 17:55         ` Anna Schumaker
@ 2022-10-31 18:00           ` Chuck Lever III
  0 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever III @ 2022-10-31 18:00 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Linux NFS Mailing List



> On Oct 31, 2022, at 1:55 PM, Anna Schumaker <anna@kernel.org> wrote:
> 
> On Wed, Oct 5, 2022 at 11:53 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>> 
>> 
>> 
>>> On Oct 5, 2022, at 11:10 AM, Anna Schumaker <anna@kernel.org> wrote:
>>> 
>>> Hi Chuck,
>>> 
>>> On Tue, Sep 13, 2022 at 4:45 PM Anna Schumaker <anna@kernel.org> wrote:
>>>> 
>>>> On Tue, Sep 13, 2022 at 2:45 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Sep 13, 2022, at 11:01 AM, Anna Schumaker <anna@kernel.org> wrote:
>>>>>> 
>>>>>> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
>>>>>> 
>>>>>> When we left off with READ_PLUS, Chuck had suggested reverting the
>>>>>> server to reply with a single NFS4_CONTENT_DATA segment essentially
>>>>>> mimicking how the READ operation behaves. Then, a future sparse read
>>>>>> function can be added and the server modified to support it without
>>>>>> needing to rip out the old READ_PLUS code at the same time.
>>>>>> 
>>>>>> This patch takes that first step. I was even able to re-use the
>>>>>> nfsd4_encode_readv() and nfsd4_encode_splice_read() functions to
>>>>>> remove some duplicate code.
>>>>>> 
>>>>>> Below is some performance data comparing the READ and READ_PLUS
>>>>>> operations with v4.2. I tested reading 2G files with various hole
>>>>>> lengths including 100% data, 100% hole, and a handful of mixed hole and
>>>>>> data files. For the mixed files, a notation like "1d" means
>>>>>> every-other-page is data, and the first page is data. "4h" would mean
>>>>>> alternating 4 pages data and 4 pages hole, beginning with hole.
>>>>>> 
>>>>>> I also used the 'vmtouch' utility to make sure the file is either
>>>>>> evicted from the server's pagecache ("Uncached on server") or present in
>>>>>> the server's page cache ("Cached on server").
>>>>>> 
>>>>>> 2048M-data
>>>>>> :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.555 s, 712 MB/s, 0.74 s kern, 24% cpu
>>>>>> :    :........................... Cached on server .....  1.346 s, 1.6 GB/s, 0.69 s kern, 52% cpu
>>>>>> :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.596 s, 690 MB/s, 0.72 s kern, 23% cpu
>>>>>>      :........................... Cached on server .....  1.394 s, 1.6 GB/s, 0.67 s kern, 48% cpu
>>>>>> 2048M-hole
>>>>>> :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  4.934 s, 762 MB/s, 1.86 s kern, 29% cpu
>>>>>> :    :........................... Cached on server .....  1.328 s, 1.6 GB/s, 0.72 s kern, 54% cpu
>>>>>> :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  4.823 s, 739 MB/s, 1.88 s kern, 28% cpu
>>>>>>      :........................... Cached on server .....  1.399 s, 1.5 GB/s, 0.70 s kern, 50% cpu
>>>>>> 2048M-mixed-1d
>>>>>> :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  4.480 s, 598 MB/s, 0.76 s kern, 21% cpu
>>>>>> :    :........................... Cached on server .....  1.445 s, 1.5 GB/s, 0.71 s kern, 50% cpu
>>>>>> :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  4.774 s, 559 MB/s, 0.75 s kern, 19% cpu
>>>>>>      :........................... Cached on server .....  1.514 s, 1.4 GB/s, 0.67 s kern, 44% cpu
>>>>>> 2048M-mixed-1h
>>>>>> :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.568 s, 633 MB/s, 0.78 s kern, 23% cpu
>>>>>> :    :........................... Cached on server .....  1.357 s, 1.6 GB/s, 0.71 s kern, 53% cpu
>>>>>> :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.580 s, 641 MB/s, 0.74 s kern, 22% cpu
>>>>>>      :........................... Cached on server .....  1.396 s, 1.5 GB/s, 0.67 s kern, 48% cpu
>>>>>> 2048M-mixed-2d
>>>>>> :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.159 s, 708 MB/s, 0.78 s kern, 26% cpu
>>>>>> :    :........................... Cached on server .....  1.410 s, 1.5 GB/s, 0.70 s kern, 50% cpu
>>>>>> :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.093 s, 712 MB/s, 0.74 s kern, 25% cpu
>>>>>>      :........................... Cached on server .....  1.474 s, 1.4 GB/s, 0.67 s kern, 46% cpu
>>>>>> 2048M-mixed-2h
>>>>>> :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.043 s, 722 MB/s, 0.78 s kern, 26% cpu
>>>>>> :    :........................... Cached on server .....  1.374 s, 1.6 GB/s, 0.72 s kern, 53% cpu
>>>>>> :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.913 s, 756 MB/s, 0.74 s kern, 26% cpu
>>>>>>      :........................... Cached on server .....  1.349 s, 1.6 GB/s, 0.67 s kern, 50% cpu
>>>>>> 2048M-mixed-4d
>>>>>> :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.275 s, 680 MB/s, 0.75 s kern, 24% cpu
>>>>>> :    :........................... Cached on server .....  1.391 s, 1.5 GB/s, 0.71 s kern, 52% cpu
>>>>>> :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.470 s, 626 MB/s, 0.72 s kern, 21% cpu
>>>>>>      :........................... Cached on server .....  1.456 s, 1.5 GB/s, 0.67 s kern, 46% cpu
>>>>>> 2048M-mixed-4h
>>>>>> :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.035 s, 743 MB/s, 0.74 s kern, 26% cpu
>>>>>> :    :........................... Cached on server .....  1.345 s, 1.6 GB/s, 0.71 s kern, 53% cpu
>>>>>> :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.848 s, 779 MB/s, 0.73 s kern, 26% cpu
>>>>>>      :........................... Cached on server .....  1.421 s, 1.5 GB/s, 0.68 s kern, 48% cpu
>>>>>> 2048M-mixed-8d
>>>>>> :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.262 s, 687 MB/s, 0.74 s kern, 24% cpu
>>>>>> :    :........................... Cached on server .....  1.366 s, 1.6 GB/s, 0.69 s kern, 51% cpu
>>>>>> :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.195 s, 709 MB/s, 0.72 s kern, 24% cpu
>>>>>>      :........................... Cached on server .....  1.414 s, 1.5 GB/s, 0.67 s kern, 48% cpu
>>>>>> 2048M-mixed-8h
>>>>>> :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  2.899 s, 789 MB/s, 0.73 s kern, 27% cpu
>>>>>> :    :........................... Cached on server .....  1.338 s, 1.6 GB/s, 0.69 s kern, 52% cpu
>>>>>> :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.910 s, 772 MB/s, 0.72 s kern, 26% cpu
>>>>>>      :........................... Cached on server .....  1.438 s, 1.5 GB/s, 0.67 s kern, 47% cpu
>>>>>> 2048M-mixed-16d
>>>>>> :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  3.416 s, 661 MB/s, 0.73 s kern, 23% cpu
>>>>>> :    :........................... Cached on server .....  1.345 s, 1.6 GB/s, 0.70 s kern, 53% cpu
>>>>>> :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  3.177 s, 713 MB/s, 0.70 s kern, 23% cpu
>>>>>>      :........................... Cached on server .....  1.447 s, 1.5 GB/s, 0.68 s kern, 47% cpu
>>>>>> 2048M-mixed-16h
>>>>>> :... v6.0-rc4 (w/o Read Plus) ... Uncached on server ...  2.919 s, 780 MB/s, 0.73 s kern, 26% cpu
>>>>>> :    :........................... Cached on server .....  1.363 s, 1.6 GB/s, 0.70 s kern, 51% cpu
>>>>>> :... v6.0-rc4 (w/ Read Plus) .... Uncached on server ...  2.934 s, 773 MB/s, 0.70 s kern, 25% cpu
>>>>>>      :........................... Cached on server .....  1.435 s, 1.5 GB/s, 0.67 s kern, 47% cpu
>>>>> 
>>>>> For this particular change, I'm interested only in cases where the
>>>>> whole file is cached on the server. We're focusing on the efficiency
>>>>> and performance of the protocol and transport here, not the underlying
>>>>> filesystem (which is... xfs?).
>>>> 
>>>> Sounds good, I can narrow down to just that test.
>>>> 
>>>>> 
>>>>> Also, 2GB files can be read with just 20 1MB READ requests. That
>>>>> means we don't have a large sample size of READ operations for any
>>>>> single test, assuming the client is using 1MB rsize.
>>>>> 
>>>>> Also, are these averages, or single runs? I think running each test
>>>>> 5-10 times (at least) and including some variance data in the results
>>>>> would help build more confidence that the small differences in the
>>>>> timing are not noise.
>>>> 
>>>> This is an average across 10 runs.
>>>> 
>>>>> 
>>>>> All that said, however, I see with some consistency that READ_PLUS
>>>>> takes longer to pull data over the wire, but uses slightly less CPU.
>>>>> Assuming the CPU utilizations are client-side, that matches my
>>>>> expectations of lower CPU utilization results if the throughput is
>>>>> lower.
>>>>> 
>>>>> Looking at the 100% data results, READ_PLUS takes 3.5% longer than
>>>>> READ. That to me is a small but significant drop -- I think it will
>>>>> be noticeable for large workloads. Can you explain the difference?
>>>> 
>>>> I'll try larger files for my next round of testing. I was assuming the
>>>> difference is just noise, since there are cases like the mixed-2h test
>>>> where READ_PLUS was slightly faster. But more testing will help figure
>>>> that out.
>>>> 
>>>>> 
>>>>> For subsequent test runs, can you find a server with more memory,
>>>>> test with larger files, and test with a variety of rsize settings?
>>>>> You can reduce your test matrix by leaving out the tests with holey
>>>>> files for the moment.
>>> 
>>> Hi Chuck,
>>> 
>>> The following numbers are for 10G files that I created on Netapp lab
>>> machines. I narrowed down my testing to files already in the server's
>>> cache and read with directio to get the pagecache out of the way as
>>> much as possible.  I did 25 iterations this time around, and included
>>> minimum time, maximum time, and standard deviation in the report.
>>> 
>>> The following numbers are for XFS:
>>> 
>>>  10240M-data
>>>  :... v6.0-rc6 (w/o Read Plus) ... 95.804 s, 112 MB/s, 0.42 s kern,  0% cpu
>>>  :    :........................... Min: 108.000, Max: 114.000, StDev:  1.555
>>>  :... v6.0-rc6 (w/ Read Plus) .... 96.683 s, 111 MB/s, 0.42 s kern,  0% cpu
>>>       :........................... Min: 107.000, Max: 113.000, StDev:  1.481
>> 
>> 
>>> And here is EXT4:
>>>  10240M-data
>>>  :... v6.0-rc6 (w/o Read Plus) ... 95.419 s, 113 MB/s, 0.43 s kern,  0% cpu
>>>  :    :........................... Min: 111.000, Max: 113.000, StDev:  0.557
>>>  :... v6.0-rc6 (w/ Read Plus) .... 95.764 s, 112 MB/s, 0.42 s kern,  0% cpu
>>>       :........................... Min: 111.000, Max: 112.000, StDev:  0.200
>> 
>> For this case, only the single data segment results are
>> interesting, since that's all the server implementation now
>> supports.
>> 
>> The ext4 results show that the difference between the
>> average throughput results (112 v. 113) is larger than the
>> standard deviation: thus, the slower result is not noise
>> (differences in significant figures notwithstanding).
>> 
>> I've tested on 100GbE (TCP) against a tmpfs export using iozone,
>> and consistently see 10% lower data and IOPS throughput with
>> NFSv4.2 READ_PLUS compared with NFSv4.1 with READ.
>> 
>> Maybe tmpfs is not a real world test case? If you don't see a
>> significant difference on a filesystem that stores its data on
>> durable media, then maybe my 10% result doesn't matter. But it
>> does expose an inefficiency somewhere.
>> 
>> 
>>> Is there anything else you want me to test?
>> 
>> I was asking not just for more precise test results, but also
>> for an explanation/analysis of the differences.
>> 
>> At this point I expect the problem is on the client -- perhaps
>> it is not aligning the receive buffer to expect a single data
>> segment. Do you think that case should be optimized on the
>> client? For typical small files, I would expect that single
>> data segment reads would dominate.
> 
> Hi Chuck, I added a patch to my client to hack in decoding
> single-segment READ_PLUS calls the same way we decode READ, and I'm
> not seeing a huge difference in transfer speed before and after the
> patch:
> 
> With EXT4:
>   10240M-data
>   :... v6.0-rc6 (w/o Read Plus) ... 94.648 s, 113 MB/s, 0.44 s kern,  0% cpu
>   :    :........................... Min: 94.500, Max: 95.141, StDev:  0.107
>   :... v6.0-rc6 (w/ Read Plus) .... 95.831 s, 112 MB/s, 0.44 s kern,  0% cpu
>   :    :........................... Min: 95.731, Max: 96.261, StDev:  0.089
>   :... w/ Client Patch ............ 95.799 s, 112 MB/s, 0.44 s kern,  0% cpu
>        :........................... Min: 95.690, Max: 96.229, StDev:  0.089
> 
> And with XFS:
>   10240M-data
>   :... v6.0-rc6 (w/o Read Plus) ... 94.443 s, 114 MB/s, 0.43 s kern,  0% cpu
>   :    :........................... Min: 94.217, Max: 94.653, StDev:  0.095
>   :... v6.0-rc6 (w/ Read Plus) .... 95.725 s, 112 MB/s, 0.44 s kern,  0% cpu
>   :    :........................... Min: 95.612, Max: 95.843, StDev:  0.067
>   :... w/ Client Patch ............ 95.721 s, 112 MB/s, 0.44 s kern,  0% cpu
>        :........................... Min: 95.602, Max: 95.805, StDev:  0.052
> 
> 
>> 
>> Do you have a way of assessing whether the client has to
>> re-align READ_PLUS data segments when it receives just one
>> of them per Reply? There might be a SunRPC tracepoint that
>> fires when re-alignment is needed, for instance.
> 
> For the READ case, the client always calls xdr_realign_pages() during
> decoding to align the data so it is always doing some shifting around
> to get the read reply positioned right.

I recall that we added a tracepoint in there to catch instances of
re-aligning reply payloads, because that is known to be inefficient
and thus undesirable in the I/O path.

If you see trace_rpc_xdr_alignment records in your trace log, that
means the send path is setting up the reply xdr_buf incorrectly.


> Anna
> 
>> 
>> I'll add "Simplify READ_PLUS" to the NFSD v6.2 pile, but IMO
>> understanding (and hopefully addressing) the performance
>> difference here is key to the success of the Linux READ_PLUS
>> implementation.
>> 
>> --
>> Chuck Lever

--
Chuck Lever





Thread overview: 15+ messages
2022-09-13 18:01 [PATCH v4 0/2] NFSD: Simplify READ_PLUS Anna Schumaker
2022-09-13 18:01 ` [PATCH v4 1/2] NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data Anna Schumaker
2022-09-13 20:12   ` Chuck Lever III
2022-09-13 20:42     ` Anna Schumaker
2022-09-15 19:59   ` Jeff Layton
2022-09-16  2:17     ` Chuck Lever III
2022-10-06 16:35   ` Jeff Layton
2022-09-13 18:01 ` [PATCH v4 2/2] NFSD: Simplify READ_PLUS Anna Schumaker
2022-10-06 16:35   ` Jeff Layton
2022-09-13 18:45 ` [PATCH v4 0/2] " Chuck Lever III
2022-09-13 20:45   ` Anna Schumaker
2022-10-05 15:10     ` Anna Schumaker
2022-10-05 15:53       ` Chuck Lever III
2022-10-31 17:55         ` Anna Schumaker
2022-10-31 18:00           ` Chuck Lever III
