From: Trond Myklebust <trondmy@hammerspace.com>
To: "bcodding@redhat.com" <bcodding@redhat.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v6 05/13] NFS: Improve algorithm for falling back to uncached readdir
Date: Wed, 23 Feb 2022 21:31:29 +0000
Message-ID: <42417e59a1b92b1a2bfc8e775d0ff5bd1b573ed5.camel@hammerspace.com>
In-Reply-To: <22122A0F-4B12-4B85-9EDA-4A07CADBDDD8@redhat.com>

On Wed, 2022-02-23 at 08:34 -0500, Benjamin Coddington wrote:
> On 23 Feb 2022, at 7:17, Trond Myklebust wrote:
> 
> > On Tue, 2022-02-22 at 15:21 -0500, Benjamin Coddington wrote:
> > > On 22 Feb 2022, at 15:11, Trond Myklebust wrote:
> > > 
> > > > On Tue, 2022-02-22 at 07:50 -0500, Benjamin Coddington wrote:
> > > > > On 21 Feb 2022, at 18:20, Trond Myklebust wrote:
> > > > > 
> > > > > > On Mon, 2022-02-21 at 16:10 -0500, Benjamin Coddington wrote:
> > > > > > > On 21 Feb 2022, at 15:55, Trond Myklebust wrote:
> > > > > > > > 
> > > > > > > > We will always need the ability to cut over to uncached
> > > > > > > > readdir.
> > > > > > > 
> > > > > > > Yes.
> > > > > > > 
> > > > > > > > If the cookie is no longer returned by the server because
> > > > > > > > one or more files were deleted, then we need to resolve the
> > > > > > > > situation somehow (IOW: the 'rm *' case). The new algorithm
> > > > > > > > _does_ improve performance in those situations, because it
> > > > > > > > no longer requires us to read the entire directory before
> > > > > > > > switching over: we try 5 times, then fail over.
> > > > > > > 
> > > > > > > Yes, using per-page validation doesn't remove the need for
> > > > > > > uncached readdir.  It does allow a reader to simply resume
> > > > > > > filling the cache where it left off.  There's no need to try 5
> > > > > > > times and fail over.  And there's no need to make a trade-off
> > > > > > > and make the situation worse in certain scenarios.
> > > > > > > 
> > > > > > > I thought I'd point that out and make an offer to re-submit
> > > > > > > it.  Any interest?
> > > > > > > 
> > > > > > 
> > > > > > As I recall, I had concerns about that approach. Can you explain
> > > > > > again how it will work?
> > > > > 
> > > > > Every page of readdir results has the directory's change attr stored
> > > > > on the page.  That, along with the page's index and the first cookie,
> > > > > is enough information to determine if the page's data can be used by
> > > > > another process.
> > > > > 
> > > > > Which means that when the pagecache is dropped, fillers don't have to
> > > > > restart filling the cache at page index 0; they can continue to fill
> > > > > at whatever index they were at previously.  If another process finds a
> > > > > page that doesn't match its page index, cookie, and the current
> > > > > directory's change attr, the page is dropped and refilled from that
> > > > > process' indexing.
> > > > > 
> > > > > > A few of the concerns I have revolve around telldir()/seekdir(). If
> > > > > > the platform is 32-bit, then we cannot use cookies as the telldir()
> > > > > > output, and so our only option is to use offsets into the page cache
> > > > > > (this is why this patch carves out an exception if
> > > > > > desc->dir_cookie == 0). How would that work with your scheme?
> > > > > 
> > > > > For 32-bit seekdir, pages are filled starting at index 0.  This is
> > > > > very unlikely to match other readers (unless they also do the _same_
> > > > > seekdir).
> > > > > 
> > > > > > Even in the 64-bit case, where we are able to use cookies for
> > > > > > telldir()/seekdir(), how do we determine an appropriate page index
> > > > > > after a seekdir()?
> > > > > 
> > > > > We don't.  Instead we start filling at index 0.  Again, the pagecache
> > > > > will only be useful to other processes that have done the same llseek.
> > > > > 
> > > > > This approach optimizes the pagecache for processes that are doing
> > > > > similar work, and has the added benefit of scaling well for large
> > > > > directory listings under memory pressure.  Also, a number of classes
> > > > > of directory modifications (such as renames, or insertions/deletions
> > > > > at locations a reader has moved beyond) are no longer a reason to
> > > > > re-fill the pagecache from scratch.
> > > > > 
> > > > 
> > > > OK, you've got me more or less sold on it.
> > > > 
> > > > I'd still like to figure out how to improve the performance for
> > > > seekdir (since I do have an interest in re-exporting NFS), but I've
> > > > been playing around with a couple of patches that implement your
> > > > concept and they do seem to work well for the common case of a linear
> > > > read through the directory.
> > > 
> > > Nice.  I have another version from the one I originally posted:
> > > https://lore.kernel.org/linux-nfs/cover.1611160120.git.bcodding@redhat.com/
> > > 
> > > .. but I don't remember exactly the changes and it needs rebasing.
> > > Should I rebase it against your testing branch and send the result?
> > 
> > My 2 patches did something slightly different to yours, storing the
> > change attribute in the array header instead of in page_private. That
> > makes for a slightly smaller change.
> 
> I worried that increasing the size of the array header wouldn't allow us
> to store as many entries per page.

The struct nfs_cache_array header is 24 bytes long with the change
attribute (as opposed to 16 bytes without it). This size is independent
of the architecture, assuming that 'unsigned int' is 32 bits and
'unsigned char' is 8 bits (as is always the case on Linux).

On a 64-bit system, a single struct nfs_cache_array_entry ends up being
32 bytes long. So in a standard 4k page with a 24-byte or 16-byte
header you will be able to cache exactly 127 cache array entries.

On a 32-bit system, the cache entry is 28 bytes long (the difference
being the pointer length), and you can pack 145 entries into a 4k page.

IOW: The change in header size makes no difference to the number of
entries you can cache, because in both cases the header is smaller than
a single entry, so either header costs exactly one entry's worth of
space.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=42417e59a1b92b1a2bfc8e775d0ff5bd1b573ed5.camel@hammerspace.com \
    --to=trondmy@hammerspace.com \
    --cc=bcodding@redhat.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.