All of lore.kernel.org
 help / color / mirror / Atom feed
From: Trond Myklebust <trondmy@hammerspace.com>
To: "bcodding@redhat.com" <bcodding@redhat.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index
Date: Fri, 11 Mar 2022 16:51:21 +0000	[thread overview]
Message-ID: <2dad7c315248b6c31b6e2da5857d2c7eb410ff79.camel@hammerspace.com> (raw)
In-Reply-To: <FDC0812F-DD13-4D13-8F0A-08C5C533B13C@redhat.com>

On Fri, 2022-03-11 at 11:14 -0500, Benjamin Coddington wrote:
> On 11 Mar 2022, at 9:02, Trond Myklebust wrote:
> 
> > On Fri, 2022-03-11 at 06:58 -0500, Benjamin Coddington wrote:
> > > On 10 Mar 2022, at 16:07, Trond Myklebust wrote:
> > > 
> > > > On Wed, 2022-03-09 at 15:01 -0500, Benjamin Coddington wrote:
> > > > > On 27 Feb 2022, at 18:12, trondmy@kernel.org wrote:
> > > > > 
> > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > > > 
> > > > > > Instead of using a linear index to address the pages, use
> > > > > > the
> > > > > > cookie of
> > > > > > the first entry, since that is what we use to match the
> > > > > > page
> > > > > > anyway.
> > > > > > 
> > > > > > This allows us to avoid re-reading the entire cache on a
> > > > > > seekdir()
> > > > > > type
> > > > > > of operation. The latter is very common when re-exporting
> > > > > > NFS,
> > > > > > and
> > > > > > is a
> > > > > > major performance drain.
> > > > > > 
> > > > > > The change does affect our duplicate cookie detection,
> > > > > > since we
> > > > > > can
> > > > > > no
> > > > > > longer rely on the page index as a linear offset for
> > > > > > detecting
> > > > > > whether
> > > > > > we looped backwards. However since we no longer do a linear
> > > > > > search
> > > > > > through all the pages on each call to nfs_readdir(), this
> > > > > > is
> > > > > > less
> > > > > > of a
> > > > > > concern than it was previously.
> > > > > > The other downside is that invalidate_mapping_pages() no
> > > > > > longer
> > > > > > can
> > > > > > use
> > > > > > the page index to avoid clearing pages that have been read.
> > > > > > A
> > > > > > subsequent
> > > > > > patch will restore the functionality this provides to the
> > > > > > 'ls -
> > > > > > l'
> > > > > > heuristic.
> > > > > 
> > > > > I didn't realize the approach was to also hash out the
> > > > > linearly-
> > > > > cached
> > > > > entries.  I thought we'd do something like flag the context
> > > > > for
> > > > > hashed page
> > > > > indexes after a seekdir event, and if there are collisions
> > > > > with
> > > > > the
> > > > > linear
> > > > > entries, they'll get fixed up when found.
> > > > 
> > > > Why? What's the point of using 2 models where 1 will do?
> > > 
> > > I don't think the hashed model is quite as simple and efficient
> > > overall, and
> > > may produce impacts to a system beyond NFS.
> > > 
> > > > > 
> > > > > Doesn't that mean that with this approach seekdir() only hits
> > > > > the
> > > > > same pages
> > > > > when the entry offset is page-aligned?  That's 1 in 127 odds.
> > > > 
> > > > The point is not to stomp all over the pages that contain
> > > > aligned
> > > > data
> > > > when the application does call seekdir().
> > > > 
> > > > IOW: we always optimise for the case where we do a linear read
> > > > of
> > > > the
> > > > directory, but we support random seekdir() + read too.
> > > 
> > > And that could be done just by bumping the seekdir users to some
> > > constant
> > > offset (index 262144 ?), or something else equally dead-nuts
> > > simple. 
> > > That
> > > keeps tightly clustered page indexes, so walking the cache is
> > > faster.  That
> > > reduces the "buckshot" effect the hashing has of eating up
> > > pagecache
> > > pages
> > > they'll never use again.  That doesn't cap our caching ability at
> > > 33
> > > million
> > > entries.
> > > 
> > 
> > What you say would make sense if readdir cookies truly were
> > offsets,
> > but in general they're not. Cookies are unstructured data, and
> > should
> > be treated as unstructured data.
> > 
> > Let's say I do cache more than 33 million entries and I have to
> > find a
> > cookie. I have to linearly read through at least 1GB of cached data
> > before I can give up and start a new readdir. Either that, or I
> > need to
> > have a heuristic that tells me when to stop searching, and then
> > another
> > heuristic that tells me where to store the data in a way that
> > doesn't
> > trash the page cache.
> > 
> > With the hashing, I seek to the page matching the hash, and I
> > either
> > immediately find what I need, or I immediately know to start a
> > readdir.
> > There is no need for any additional heuristic.
> 
> The scenario where we want to find a cookie while not doing a linear
> pass
> through the directory will be the seekdir() case.  In a linear walk,
> we have
> the cached page index to help.  So in the seekdir case, the chances
> of
> having someone already fill a page and also having the cookie be the
> 1 in
> 127 that are page-aligned (and so match an already cached page) are
> small, I
> think.  Unless your use-case will often hit the exact same offsets
> over and
> over.

For the use case where we are reexporting NFS, it can definitely
happen.
Firstly, the clients usually are reading the reexported directory
linearly, so they will tend to follow the same cookie request patterns.
Secondly, we're not going to replay the readdir from the duplicate
reply cache if the client resends the request. So even if you only have
one client, there can be a benefit.

> 
> So with the hashing and seekdir case, I think that the cache will be
> pretty
> heavily filled with the same duplicated data at various offsets and
> rarely
> useful.  That's why I wondered if you'd tested your use-case for it
> and found
> it to be advantageous.  I think what we've got is going to work fine,
> but I
> wonder if you've seen it to work well.
> 
> The major pain point most of our users complain about is not being
> able to
> perform a complete walk in linear time with respect to size with
> invalidations at play.  This series fixes that, and is a huge bonus. 
> Other
> smaller performance improvements are pale in comparison for us, and
> might
> just get us forever chasing one or two minor optimizations that have
> trade-offs.
> 
> There's a lot of variables at play.  For some client/server setups
> (like
> some low-latency RDMA), and very large directories and cache sizes,
> it might
> be more performant to just do the READDIR every time, walking local
> caches
> be damned.
> 

Sure, so a dedicated readdirplus() system call could help by providing
the same kind of guidance that statx() does today.

> > > Its weird to me that we're doing exactly what XArray says not to
> > > do,
> > > hash
> > > the index, when we don't have to.
> > > 
> > > > > It also means we're amplifying the pagecache's useage for
> > > > > slightly
> > > > > changing
> > > > > directories - rather than re-using the same pages we're
> > > > > scattering
> > > > > our usage
> > > > > across the index.  Eh, maybe not a big deal if we just expect
> > > > > the
> > > > > page
> > > > > cache's LRU to do the work.
> > > > > 
> > > > 
> > > > I don't understand your point about 'not reusing'. If the user
> > > > seeks to
> > > > the same cookie, we reuse the page. However I don't know how
> > > > you
> > > > would
> > > > go about setting up a schema that allows you to seek to an
> > > > arbitrary
> > > > cookie without doing a linear search.
> > > 
> > > So when I was taking about 'reusing' a page, that's about re-
> > > filling
> > > the
> > > same pages rather than constantly conjuring new ones, which
> > > requires
> > > less of
> > > the pagecache's resources in total.  Maybe the pagecache can
> > > handle
> > > that
> > > without it negatively impacting other users of the cache that
> > > /will/
> > > re-use
> > > their cached pages, but I worry it might be irresponsible of us
> > > to
> > > fill the
> > > pagecache with pages we know we're never going to find again.
> > > 
> > 
> > In the case where the processes are reading linearly through a
> > directory that is not changing (or at least where the beginning of
> > the
> > directory is not changing), we will reuse the cached data, because
> > just
> > like in the linearly indexed case, each process ends up reading the
> > exact same sequence of cookies, and looking up the exact same
> > sequence
> > of hashes.
> > 
> > The sequences start to diverge only if they hit a part of the
> > directory
> > that is being modified. At that point, we're going to be
> > invalidating
> > page cache entries anyway with the last reader being more likely to
> > be
> > following the new sequence of cookies.
> 
> I don't think we clean up behind ourselves anymore.  Now that we are
> going
> to validate each page before using it, we don't invalidate the whole
> cache
> at any point.  That means that a divergence duplicates the pagecache
> usage
> beyond the divergence.
> 

No. I reinstated the call to nfs_revalidate_mapping() in the linux-next
branch after Olga demonstrated that NFSv3 is still troubled with crappy
mtime/ctime resolutions on the server causing directory changes to not
be reflected in the readdir cache.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



  reply	other threads:[~2022-03-11 16:51 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-27 23:12 [PATCH v9 00/27] Readdir improvements trondmy
2022-02-27 23:12 ` [PATCH v9 01/27] NFS: Return valid errors from nfs2/3_decode_dirent() trondmy
2022-02-27 23:12   ` [PATCH v9 02/27] NFS: constify nfs_server_capable() and nfs_have_writebacks() trondmy
2022-02-27 23:12     ` [PATCH v9 03/27] NFS: Trace lookup revalidation failure trondmy
2022-02-27 23:12       ` [PATCH v9 04/27] NFS: Initialise the readdir verifier as best we can in nfs_opendir() trondmy
2022-02-27 23:12         ` [PATCH v9 05/27] NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context trondmy
2022-02-27 23:12           ` [PATCH v9 06/27] NFS: Calculate page offsets algorithmically trondmy
2022-02-27 23:12             ` [PATCH v9 07/27] NFS: Store the change attribute in the directory page cache trondmy
2022-02-27 23:12               ` [PATCH v9 08/27] NFS: Don't re-read the entire page cache to find the next cookie trondmy
2022-02-27 23:12                 ` [PATCH v9 09/27] NFS: Don't advance the page pointer unless the page is full trondmy
2022-02-27 23:12                   ` [PATCH v9 10/27] NFS: Adjust the amount of readahead performed by NFS readdir trondmy
2022-02-27 23:12                     ` [PATCH v9 11/27] NFS: If the cookie verifier changes, we must invalidate the page cache trondmy
2022-02-27 23:12                       ` [PATCH v9 12/27] NFS: Simplify nfs_readdir_xdr_to_array() trondmy
2022-02-27 23:12                         ` [PATCH v9 13/27] NFS: Reduce use of uncached readdir trondmy
2022-02-27 23:12                           ` [PATCH v9 14/27] NFS: Improve heuristic for readdirplus trondmy
2022-02-27 23:12                             ` [PATCH v9 15/27] NFS: Don't ask for readdirplus unless it can help nfs_getattr() trondmy
2022-02-27 23:12                               ` [PATCH v9 16/27] NFSv4: Ask for a full XDR buffer of readdir goodness trondmy
2022-02-27 23:12                                 ` [PATCH v9 17/27] NFS: Readdirplus can't help lookup for case insensitive filesystems trondmy
2022-02-27 23:12                                   ` [PATCH v9 18/27] NFS: Don't request readdirplus when revalidation was forced trondmy
2022-02-27 23:12                                     ` [PATCH v9 19/27] NFS: Add basic readdir tracing trondmy
2022-02-27 23:12                                       ` [PATCH v9 20/27] NFS: Trace effects of readdirplus on the dcache trondmy
2022-02-27 23:12                                         ` [PATCH v9 21/27] NFS: Trace effects of the readdirplus heuristic trondmy
2022-02-27 23:12                                           ` [PATCH v9 22/27] NFS: Clean up page array initialisation/free trondmy
2022-02-27 23:12                                             ` [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index trondmy
2022-02-27 23:12                                               ` [PATCH v9 24/27] NFS: Fix up forced readdirplus trondmy
2022-02-27 23:12                                                 ` [PATCH v9 25/27] NFS: Remove unnecessary cache invalidations for directories trondmy
2022-02-27 23:12                                                   ` [PATCH v9 26/27] NFS: Optimise away the previous cookie field trondmy
2022-02-27 23:12                                                     ` [PATCH v9 27/27] NFS: Cache all entries in the readdirplus reply trondmy
2022-03-09 20:01                                               ` [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index Benjamin Coddington
2022-03-09 21:03                                                 ` Benjamin Coddington
2022-03-10 21:07                                                 ` Trond Myklebust
2022-03-11 11:58                                                   ` Benjamin Coddington
2022-03-11 14:02                                                     ` Trond Myklebust
2022-03-11 16:14                                                       ` Benjamin Coddington
2022-03-11 16:51                                                         ` Trond Myklebust [this message]
2022-03-09 17:39                             ` [PATCH v9 14/27] NFS: Improve heuristic for readdirplus Benjamin Coddington
2022-03-10 14:31                               ` [PATCH] NFS: Trigger "ls -l" readdir heuristic sooner Benjamin Coddington
2022-03-16 22:25                                 ` Olga Kornievskaia
2022-03-10 20:15                               ` [PATCH v9 14/27] NFS: Improve heuristic for readdirplus Trond Myklebust
2022-03-11 11:28                                 ` Benjamin Coddington
2022-03-01 19:09               ` [PATCH v9 07/27] NFS: Store the change attribute in the directory page cache Anna Schumaker
2022-03-01 23:11                 ` Trond Myklebust
2022-03-09 13:42       ` [PATCH v9 03/27] NFS: Trace lookup revalidation failure Benjamin Coddington
2022-03-09 15:28         ` Chuck Lever III
2022-03-09 21:35           ` Benjamin Coddington
2022-03-09 21:32 ` [PATCH v9 00/27] Readdir improvements Benjamin Coddington

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2dad7c315248b6c31b6e2da5857d2c7eb410ff79.camel@hammerspace.com \
    --to=trondmy@hammerspace.com \
    --cc=bcodding@redhat.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.