From: bfields@fieldses.org (J. Bruce Fields)
To: schumaker.anna@gmail.com
Cc: bfields@redhat.com, linux-nfs@vger.kernel.org,
	Anna.Schumaker@Netapp.com, Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [PATCH v2 0/4] NFSD: Add support for the v4.2 READ_PLUS operation
Date: Tue, 3 Mar 2020 10:08:33 -0500
Message-ID: <20200303150833.GB17257@fieldses.org>
In-Reply-To: <20200214211206.407725-1-Anna.Schumaker@Netapp.com>

Sorry for the delay, looking at this a little more carefully now....

Previously I remember you found a problem with very slow
SEEK_HOLE/SEEK_DATA on some filesystems--has that been fixed?
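
(For reference, and just so we're talking about the same thing: here's a
quick, untested userspace sketch of the kind of SEEK_DATA/SEEK_HOLE walk
in question.  Timing it over a large sparse file on the exported
filesystem would show whether that lookup is still slow.  Build with
"gcc -D_GNU_SOURCE -o seekscan seekscan.c".)

/* seekscan.c: walk a file's data/hole segments with SEEK_DATA/SEEK_HOLE.
 * Untested sketch, not taken from the patch series. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	off_t pos = 0, end;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	end = lseek(fd, 0, SEEK_END);

	while (pos < end) {
		off_t data, hole;

		data = lseek(fd, pos, SEEK_DATA);
		if (data < 0)
			break;		/* no data past pos; rest is a hole */
		hole = lseek(fd, data, SEEK_HOLE);
		printf("data %lld..%lld\n", (long long)data, (long long)hole);
		pos = hole;
	}
	close(fd);
	return 0;
}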

On Fri, Feb 14, 2020 at 04:12:02PM -0500, schumaker.anna@gmail.com wrote:
> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> 
> These patches add server support for the READ_PLUS operation, which
> breaks read requests into several "data" and "hole" segments when
> replying to the client.
> 
> Here are the results of some performance tests I ran on Netapp lab
> machines.

Any details?  Ideally we'd have enough detail about the hardware and
software used that someone else could reproduce your results if
necessary.

At a minimum I think it would be helpful to know your network latency
and round trip time.  RPC statistics (e.g. number of round trips) might
also be interesting.

Is this a single run for each number?

> I tested by reading various 2G files from a few different
> undelying filesystems and across several NFS versions. I used the
> `vmtouch` utility to make sure files were only cached when we wanted
> them to be. In addition to 100% data and 100% hole cases, I also tested
> with files that alternate between data and hole segments. These files
> have either 4K, 8K, 16K, or 32K segment sizes and start with either data
> or hole segments. So the file mixed-4d has a 4K segment size beginning
> with a data segment, but mixed-32h has 32K segments beginning with a
> hole. The units are in seconds: the first number for each NFS version
> is the uncached read time, and the second is the read time when the
> file is cached on the server.
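
(If it helps anyone reproduce: a "mixed-4d"-style file could be generated
with something like the hypothetical sketch below--the actual test files
may well have been created differently.  Build with
"gcc -D_FILE_OFFSET_BITS=64 -o mkmixed mkmixed.c".)

/* mkmixed.c: hypothetical generator for a 2G file of alternating 4K data
 * and 4K hole segments, starting with data (the "mixed-4d" pattern
 * described above). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const off_t seg = 4096;
	const off_t total = 2LL * 1024 * 1024 * 1024;
	char buf[4096];
	off_t off;
	int fd;

	fd = open("mixed-4d", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	memset(buf, 0xab, sizeof(buf));

	/* write one data segment, then skip one to leave a hole */
	for (off = 0; off < total; off += 2 * seg) {
		if (pwrite(fd, buf, seg, off) != (ssize_t)seg) {
			perror("pwrite");
			return 1;
		}
	}
	/* extend the length so the file ends with a trailing hole */
	if (ftruncate(fd, total)) {
		perror("ftruncate");
		return 1;
	}
	close(fd);
	return 0;
}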

OK, READ_PLUS is in 4.2, so it's the last column that's the most
interesting one:

> 
> ext4      |        v3       |       v4.0      |       v4.1      |       v4.2      |
> ----------|-----------------|-----------------|-----------------|-----------------|
> data      | 22.909 : 18.253 | 22.934 : 18.252 | 22.902 : 18.253 | 23.485 : 18.253 |

So, the 4.2 case may be taking a couple percent longer in the case where
there are no holes.

> hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 |  0.708 :  0.709 |

And as expected READ_PLUS is a big advantage when the file is one big
hole.  And there's no difference between cached and uncached reads in
this case since the server's got no data to read off its disk.
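
(For reference, per RFC 7862 a hole segment in the READ_PLUS reply is
just an offset and a length--roughly the shape below, simplified and not
the actual nfsd encoder--so for an all-hole file the server sends a
handful of integers instead of 2G of zeroes:)

/* Simplified view of one READ_PLUS reply segment, loosely following
 * RFC 7862; illustration only. */
#include <stdint.h>

enum content_type {
	CONTENT_DATA = 0,
	CONTENT_HOLE = 1,
};

struct read_plus_segment {
	enum content_type type;
	uint64_t offset;		/* segment's starting offset in the file */
	union {
		struct {
			uint32_t count;	/* length of the data bytes that follow */
			/* opaque data follows on the wire */
		} data;
		uint64_t hole_length;	/* a hole carries no data at all */
	} u;
};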

> mixed-4d  | 28.261 : 18.253 | 29.616 : 18.252 | 28.341 : 18.252 | 24.508 :  9.150 |
> mixed-8d  | 27.956 : 18.253 | 28.404 : 18.252 | 28.320 : 18.252 | 23.967 :  9.140 |
> mixed-16d | 28.172 : 18.253 | 27.946 : 18.252 | 27.627 : 18.252 | 23.043 :  9.134 |
> mixed-32d | 25.350 : 18.253 | 24.406 : 18.252 | 24.384 : 18.253 | 20.698 :  9.132 |
> mixed-4h  | 28.913 : 18.253 | 28.564 : 18.252 | 27.996 : 18.252 | 21.837 :  9.150 |
> mixed-8h  | 28.625 : 18.253 | 27.833 : 18.252 | 27.798 : 18.253 | 21.710 :  9.140 |
> mixed-16h | 27.975 : 18.253 | 27.662 : 18.252 | 27.795 : 18.253 | 20.585 :  9.134 |
> mixed-32h | 25.958 : 18.253 | 25.491 : 18.252 | 24.856 : 18.252 | 21.018 :  9.132 |

So it looks like READ_PLUS helps in every case, and there's a slight
improvement with larger hole/data segments, so the seeking does have
some overhead.  (Either that or it's just the extra rpc round trips--I
seem to recall this READ_PLUS implementation only handles at most one
hole and one data segment.  But the fact that the times are so similar
in the uncached case suggests rpc latency isn't a factor--what's your
network?)

I wonder why the hole-first cases are faster than the data-first?

> 
> xfs       |        v3       |       v4.0      |       v4.1      |       v4.2      |
> ----------|-----------------|-----------------|-----------------|-----------------|
> data      | 22.041 : 18.253 | 22.618 : 18.252 | 23.067 : 18.253 | 23.496 : 18.253 |
> hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 |  0.723 :  0.708 |
> mixed-4d  | 29.417 : 18.253 | 28.503 : 18.252 | 28.671 : 18.253 | 24.957 :  9.150 |
> mixed-8d  | 29.080 : 18.253 | 29.401 : 18.252 | 29.251 : 18.252 | 24.625 :  9.140 |
> mixed-16d | 27.638 : 18.253 | 28.606 : 18.252 | 27.871 : 18.253 | 25.511 :  9.135 |
> mixed-32d | 24.967 : 18.253 | 25.239 : 18.252 | 25.434 : 18.252 | 21.728 :  9.132 |
> mixed-4h  | 34.816 : 18.253 | 36.243 : 18.252 | 35.837 : 18.252 | 32.332 :  9.150 |
> mixed-8h  | 43.469 : 18.253 | 44.009 : 18.252 | 43.810 : 18.253 | 37.962 :  9.140 |
> mixed-16h | 29.280 : 18.253 | 28.563 : 18.252 | 28.241 : 18.252 | 22.116 :  9.134 |
> mixed-32h | 29.428 : 18.253 | 29.378 : 18.252 | 28.808 : 18.253 | 27.378 :  9.134 |
>
> btrfs     |        v3       |       v4.0      |       v4.1      |       v4.2      |
> ----------|-----------------|-----------------|-----------------|-----------------|
> data      | 25.547 : 18.253 | 25.053 : 18.252 | 24.209 : 18.253 | 32.121 : 18.253 |
> hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.252 |  0.702 :  0.724 |
> mixed-4d  | 19.016 : 18.253 | 18.822 : 18.252 | 18.955 : 18.253 | 18.697 :  9.150 |
> mixed-8d  | 19.186 : 18.253 | 19.444 : 18.252 | 18.841 : 18.253 | 18.452 :  9.140 |
> mixed-16d | 18.480 : 18.253 | 19.010 : 18.252 | 19.167 : 18.252 | 16.000 :  9.134 |
> mixed-32d | 18.635 : 18.253 | 18.565 : 18.252 | 18.550 : 18.252 | 15.930 :  9.132 |
> mixed-4h  | 19.079 : 18.253 | 18.990 : 18.252 | 19.157 : 18.253 | 27.834 :  9.150 |
> mixed-8h  | 18.613 : 18.253 | 19.234 : 18.252 | 18.616 : 18.253 | 20.177 :  9.140 |
> mixed-16h | 18.590 : 18.253 | 19.221 : 18.252 | 19.654 : 18.253 | 17.273 :  9.135 |
> mixed-32h | 18.768 : 18.253 | 19.122 : 18.252 | 18.535 : 18.252 | 15.791 :  9.132 |
> 
> ext3      |        v3       |       v4.0      |       v4.1      |       v4.2      |
> ----------|-----------------|-----------------|-----------------|-----------------|
> data      | 34.292 : 18.253 | 33.810 : 18.252 | 33.450 : 18.253 | 33.390 : 18.254 |
> hole      | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 |  0.718 :  0.728 |
> mixed-4d  | 46.818 : 18.253 | 47.140 : 18.252 | 48.385 : 18.253 | 42.887 :  9.150 |
> mixed-8d  | 58.554 : 18.253 | 59.277 : 18.252 | 59.673 : 18.253 | 56.760 :  9.140 |
> mixed-16d | 44.631 : 18.253 | 44.291 : 18.252 | 44.729 : 18.253 | 40.237 :  9.135 |
> mixed-32d | 39.110 : 18.253 | 38.735 : 18.252 | 38.902 : 18.252 | 35.270 :  9.132 |
> mixed-4h  | 56.396 : 18.253 | 56.387 : 18.252 | 56.573 : 18.253 | 67.661 :  9.150 |
> mixed-8h  | 58.483 : 18.253 | 58.484 : 18.252 | 59.099 : 18.253 | 77.958 :  9.140 |
> mixed-16h | 42.511 : 18.253 | 42.338 : 18.252 | 42.356 : 18.252 | 51.805 :  9.135 |
> mixed-32h | 38.419 : 18.253 | 38.504 : 18.252 | 38.643 : 18.252 | 40.411 :  9.132 |
> 
> Any questions?

I'm surprised at the big differences between filesystems in the mixed
cases.  Time for the uncached mixed-4h NFSv4.1 read is (19s, 28s, 36s,
57s) respectively for (btrfs, ext4, xfs, ext3).

READ_PLUS means giving up zero-copy on the client, since the offset of
the read data in the reply is no longer predictable.  I wonder what sort
of test would show that cost.
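
(To spell out what I mean--illustration only, not the actual client
code: with a plain READ the data begins at a fixed offset in the reply,
so the transport can land it straight in page-aligned buffers, while a
READ_PLUS client has to walk the segment list first and then copy,
something like the hypothetical apply_segments() below.  Presumably a
fully-cached, all-data read over a fast network, watching client CPU,
is where that copy would show up.)

#include <stdint.h>
#include <string.h>

/* hypothetical decoded segment, matching the sketch above */
struct rp_segment {
	uint32_t type;			/* 0 = data, 1 = hole */
	uint64_t file_offset;
	uint64_t length;		/* hole length or data byte count */
	const unsigned char *bytes;	/* points into the reply buffer */
};

/* Copy decoded segments into the read destination; assumes every segment
 * falls inside the request [read_offset, read_offset + dst_len). */
static void apply_segments(unsigned char *dst, size_t dst_len,
			   uint64_t read_offset,
			   const struct rp_segment *segs, int nsegs)
{
	for (int i = 0; i < nsegs; i++) {
		uint64_t at = segs[i].file_offset - read_offset;
		uint64_t n = segs[i].length;

		if (at >= dst_len)
			continue;
		if (n > dst_len - at)
			n = dst_len - at;

		if (segs[i].type == 1)
			memset(dst + at, 0, n);		/* hole -> zeroes */
		else
			memcpy(dst + at, segs[i].bytes, n); /* the extra copy */
	}
}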

--b.
