linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Howells <dhowells@redhat.com>
To: torvalds@linux-foundation.org
Cc: dhowells@redhat.com, Alexander Viro <viro@zeniv.linux.org.uk>,
	Matthew Wilcox <willy@infradead.org>,
	Christoph Hellwig <hch@lst.de>, Jeff Layton <jlayton@redhat.com>,
	Dave Wysochanski <dwysocha@redhat.com>,
	Trond Myklebust <trondmy@hammerspace.com>,
	Anna Schumaker <anna.schumaker@netapp.com>,
	Steve French <sfrench@samba.org>,
	Eric Van Hensbergen <ericvh@gmail.com>,
	linux-cachefs@redhat.com, linux-afs@lists.infradead.org,
	linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org,
	ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Upcoming: fscache rewrite
Date: Thu, 30 Jul 2020 12:51:16 +0100	[thread overview]
Message-ID: <447452.1596109876@warthog.procyon.org.uk> (raw)

Hi Linus, Trond/Anna, Steve, Eric,

I have an fscache rewrite that I'm tempted to put in for the next merge
window:

	https://lore.kernel.org/linux-fsdevel/159465784033.1376674.18106463693989811037.stgit@warthog.procyon.org.uk/

It improves the code by:

 (*) Ripping out the stuff that uses page cache snooping and kernel_write()
     and using kiocb instead.  This gives multiple wins: uses async DIO rather
     than snooping for updated pages and then copying them, less VM overhead.

 (*) Object management is also simplified, getting rid of the state machine
     that was managing things and using a much simplified thread pool instead.

 (*) Object invalidation creates a tmpfile and diverts new activity to that so
     that it doesn't have to synchronise in-flight ADIO.

 (*) Using a bitmap stored in an xattr rather than using bmap to find out if
     a block is present in the cache.  Probing the backing filesystem's
     metadata to find out is not reliable in modern extent-based filesystems
     as them may insert or remove blocks of zeros.  Even SEEK_HOLE/SEEK_DATA
     are problematic since they don't distinguish transparently inserted
     bridging.

I've provided a read helper that handles ->readpage, ->readpages, and
preparatory writes in ->write_begin.  Willy is looking at using this as a way
to roll his new ->readahead op out into filesystems.  A good chunk of this
will move into MM code.

The code is simpler, and this is nice too:

 67 files changed, 5947 insertions(+), 8294 deletions(-)

not including documentation changes, which I need to convert to rst format
yet.  That removes a whole bunch more lines.

But there are reasons you might not want to take it yet:

 (1) It starts off by disabling fscache support in all the filesystems that
     use it: afs, nfs, cifs, ceph and 9p.  I've taken care of afs, Dave
     Wysochanski has patches for nfs:

	https://lore.kernel.org/linux-nfs/1596031949-26793-1-git-send-email-dwysocha@redhat.com/

     but they haven't been reviewed by Trond or Anna yet, and Jeff Layton has
     patches for ceph:

	https://marc.info/?l=ceph-devel&m=159541538914631&w=2

     and I've briefly discussed cifs with Steve, but nothing has started there
     yet.  9p I've not looked at yet.

     Now, if we're okay for going a kernel release with 4/5 filesystems with
     caching disabled and then pushing the changes for individual filesystems
     through their respective trees, it might be easier.

     Unfortunately, I wasn't able to get together with Trond and Anna at LSF
     to discuss this.

 (2) The patched afs fs passed xfstests -g quick (unlike the upstream code
     that oopses pretty quickly with caching enabled).  Dave and Jeff's nfs
     and ceph code is getting close, but not quite there yet.

 (3) Al has objections to the ITER_MAPPING iov_iter type that I added

	https://lore.kernel.org/linux-fsdevel/20200719014436.GG2786714@ZenIV.linux.org.uk/

     but note that iov_iter_for_each_range() is not actually used by anything.

     However, Willy likes it and would prefer to make it ITER_XARRAY instead
     as he might be able to use it in other places, though there's an issue
     where I'm calling find_get_pages_contig() which takes a mapping (though
     all it does is then get the xarray out of it).

     Instead I would have to use ITER_BVEC, which has quite a high overhead,
     though it would mean that the RCU read lock wouldn't be necessary.  This
     would require 1K of memory for every 256K block the cache wants to read;
     for any read >1M, I'd have to use vmalloc() instead.

     I'd also prefer not to use ITER_BVEC because the offset and length are
     superfluous here.  If ITER_MAPPING is not good, would it be possible to
     have an ITER_PAGEARRAY that just takes a page array instead?  Or, even,
     create a transient xarray?

 (4) The way object culling is managed needs overhauling too, but that's a
     whole 'nother patchset.  We could wait till that's done too, but its lack
     doesn't prevent what we have now being used.

Thoughts?

David


             reply	other threads:[~2020-07-30 11:51 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-30 11:51 David Howells [this message]
2020-07-30 12:16 ` Upcoming: fscache rewrite Matthew Wilcox
2020-07-30 12:36 ` David Howells
2020-07-30 13:08 ` Jeff Layton
2020-08-03 16:30 ` [GIT PULL] " David Howells
2020-08-10 15:16 ` [GIT PULL] fscache rewrite -- please drop for now David Howells
2020-08-10 15:34   ` Steve French
2020-08-10 15:48   ` David Howells
2020-08-10 16:09     ` Steve French
2020-08-10 16:35     ` David Wysochanski
2020-08-10 17:06       ` Jeff Layton
2020-08-17 19:07         ` Steven French
2020-08-10 16:40   ` Christoph Hellwig
2020-08-27 15:28   ` David Howells
2020-08-27 16:18     ` Dominique Martinet
2020-08-27 17:14     ` David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=447452.1596109876@warthog.procyon.org.uk \
    --to=dhowells@redhat.com \
    --cc=anna.schumaker@netapp.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=dwysocha@redhat.com \
    --cc=ericvh@gmail.com \
    --cc=hch@lst.de \
    --cc=jlayton@redhat.com \
    --cc=linux-afs@lists.infradead.org \
    --cc=linux-cachefs@redhat.com \
    --cc=linux-cifs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=sfrench@samba.org \
    --cc=torvalds@linux-foundation.org \
    --cc=trondmy@hammerspace.com \
    --cc=v9fs-developer@lists.sourceforge.net \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).