From: Jeff Layton <jlayton@redhat.com>
To: David Howells <dhowells@redhat.com>, torvalds@linux-foundation.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Matthew Wilcox <willy@infradead.org>,
Christoph Hellwig <hch@lst.de>,
Dave Wysochanski <dwysocha@redhat.com>,
Trond Myklebust <trondmy@hammerspace.com>,
Anna Schumaker <anna.schumaker@netapp.com>,
Steve French <sfrench@samba.org>,
Eric Van Hensbergen <ericvh@gmail.com>,
linux-cachefs@redhat.com, linux-afs@lists.infradead.org,
linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org,
ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Upcoming: fscache rewrite
Date: Thu, 30 Jul 2020 09:08:32 -0400 [thread overview]
Message-ID: <2db05b3eb59bfb59688e7cb435c1b5f2096b8f8a.camel@redhat.com> (raw)
In-Reply-To: <447452.1596109876@warthog.procyon.org.uk>
On Thu, 2020-07-30 at 12:51 +0100, David Howells wrote:
> Hi Linus, Trond/Anna, Steve, Eric,
>
> I have an fscache rewrite that I'm tempted to put in for the next merge
> window:
>
> https://lore.kernel.org/linux-fsdevel/159465784033.1376674.18106463693989811037.stgit@warthog.procyon.org.uk/
>
> It improves the code by:
>
> (*) Ripping out the stuff that uses page cache snooping and kernel_write()
> and using kiocb instead. This gives multiple wins: async DIO rather
> than snooping for updated pages and then copying them, and less VM overhead.
>
> (*) Object management is also simplified, getting rid of the state machine
> that was managing things and using a much simplified thread pool instead.
>
> (*) Object invalidation creates a tmpfile and diverts new activity to that so
> that it doesn't have to synchronise with in-flight async DIO.
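A toy userspace model of the tmpfile diversion may help picture this (the class
and method names are purely illustrative, not the kernel API): new activity is
pointed at a fresh anonymous file so the old one can keep absorbing in-flight
direct I/O until the last reference goes away.

```python
# Userspace sketch of invalidation-by-tmpfile (illustrative only; the
# kernel version uses O_TMPFILE on the cache filesystem, not tempfile).
import tempfile

class CacheObject:
    def __init__(self):
        self.backing = tempfile.TemporaryFile()

    def invalidate(self):
        old = self.backing
        # Divert all new reads/writes to a fresh anonymous file; no need
        # to wait for I/O still in flight against the old one.
        self.backing = tempfile.TemporaryFile()
        return old  # caller drops this once in-flight I/O completes

obj = CacheObject()
obj.backing.write(b"stale data")
old = obj.invalidate()
obj.backing.write(b"fresh data")
obj.backing.seek(0)
print(obj.backing.read())  # b'fresh data'
```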
>
> (*) Using a bitmap stored in an xattr rather than using bmap to find out if
> a block is present in the cache. Probing the backing filesystem's
> metadata to find out is not reliable in modern extent-based filesystems
> as they may insert or remove blocks of zeros. Even SEEK_HOLE/SEEK_DATA
> are problematic since they don't distinguish transparently inserted
> bridging.
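To make the bitmap idea concrete, here is a userspace Python model (the 256KB
granule size and the bitmap layout are assumptions for illustration; the real
series stores the bitmap in an xattr on the backing file and consults it
instead of probing the filesystem):

```python
# Model of a cache-block presence bitmap (illustrative sketch; granule
# size and layout are assumed, not taken from the actual patchset).

GRANULE = 256 * 1024  # assumed cache granule size

def mark_present(bitmap: bytearray, block: int) -> None:
    """Set the bit for `block`, recording that its data is in the cache."""
    bitmap[block // 8] |= 1 << (block % 8)

def is_present(bitmap: bytearray, block: int) -> bool:
    """Test the bit -- no bmap or SEEK_HOLE/SEEK_DATA probing needed,
    so transparently inserted/removed blocks of zeros can't mislead us."""
    return bool(bitmap[block // 8] & (1 << (block % 8)))

# A 16-byte bitmap covers 128 granules, i.e. 32MB of cached file.
bitmap = bytearray(16)
mark_present(bitmap, 0)
mark_present(bitmap, 5)
print(is_present(bitmap, 5), is_present(bitmap, 6))  # True False
```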
>
> I've provided a read helper that handles ->readpage, ->readpages, and
> preparatory writes in ->write_begin. Willy is looking at using this as a way
> to roll his new ->readahead op out into filesystems. A good chunk of this
> will move into MM code.
>
> The code is simpler, and this is nice too:
>
> 67 files changed, 5947 insertions(+), 8294 deletions(-)
>
> not including documentation changes, which I still need to convert to rst
> format. That removes a whole bunch more lines.
>
> But there are reasons you might not want to take it yet:
>
> (1) It starts off by disabling fscache support in all the filesystems that
> use it: afs, nfs, cifs, ceph and 9p. I've taken care of afs, Dave
> Wysochanski has patches for nfs:
>
> https://lore.kernel.org/linux-nfs/1596031949-26793-1-git-send-email-dwysocha@redhat.com/
>
> but they haven't been reviewed by Trond or Anna yet, and Jeff Layton has
> patches for ceph:
>
> https://marc.info/?l=ceph-devel&m=159541538914631&w=2
>
> and I've briefly discussed cifs with Steve, but nothing has started there
> yet. 9p I've not looked at yet.
>
> Now, if we're okay for going a kernel release with 4/5 filesystems with
> caching disabled and then pushing the changes for individual filesystems
> through their respective trees, it might be easier.
>
> Unfortunately, I wasn't able to get together with Trond and Anna at LSF
> to discuss this.
>
> (2) The patched afs fs passed xfstests -g quick (unlike the upstream code
> that oopses pretty quickly with caching enabled). Dave and Jeff's nfs
> and ceph code is getting close, but not quite there yet.
That was my experience on cephfs+fscache too -- it often crashed down in
the fscache code. I'd support the approach in (1) above -- put this in
soon and disable the caches in the filesystems. Then push the changes to
reenable it via fs-specific trees.
The ceph patch series is more or less ready. It passes the xfstests
"quick" group, aside from a few expected failures on cephfs.
The only real exception is generic/531, which seems to trigger OOM kills
in my testing. The test tries to create a ton of files and leak the file
descriptors. I tend to think that workload is pretty unusual, and given
that fscache was terribly unstable and crashed before, this is still a
marked improvement.
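For reference, a rough userspace sketch of the kind of workload generic/531
generates (the file count and paths here are illustrative; the real test opens
far more files and never closes them, which is what pins kernel memory):

```python
# Sketch of a generic/531-style workload: create many files and
# deliberately keep every descriptor open. Parameters are illustrative.
import os
import tempfile

def leak_fds(n: int) -> list[int]:
    d = tempfile.mkdtemp()
    fds = []
    for i in range(n):
        # Each open fd pins per-file kernel state; the test relies on
        # nothing being released until the process exits (or is OOM-killed).
        fds.append(os.open(os.path.join(d, f"f{i}"),
                           os.O_CREAT | os.O_RDWR, 0o600))
    return fds

fds = leak_fds(100)  # the real test uses vastly more
print(len(fds))  # 100
for fd in fds:   # tidy up here; the point of the real test is NOT doing this
    os.close(fd)
```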
> (3) Al has objections to the ITER_MAPPING iov_iter type that I added
>
> https://lore.kernel.org/linux-fsdevel/20200719014436.GG2786714@ZenIV.linux.org.uk/
>
> but note that iov_iter_for_each_range() is not actually used by anything.
>
> However, Willy likes it and would prefer to make it ITER_XARRAY instead
> as he might be able to use it in other places, though there's an issue
> where I'm calling find_get_pages_contig() which takes a mapping (though
> all it does is then get the xarray out of it).
>
> Instead I would have to use ITER_BVEC, which has quite a high overhead,
> though it would mean that the RCU read lock wouldn't be necessary. This
> would require 1K of memory for every 256K block the cache wants to read;
> for any read >1M, I'd have to use vmalloc() instead.
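The arithmetic behind those numbers, assuming 4KB pages and a 16-byte
struct bio_vec on a 64-bit kernel (both assumptions for illustration):

```python
# Back-of-envelope for the ITER_BVEC overhead quoted above.
PAGE_SIZE = 4096
BIO_VEC_SIZE = 16  # assumed: page pointer + len + offset on 64-bit

def bvec_bytes(read_size: int) -> int:
    """Memory needed for the bio_vec array describing `read_size` bytes."""
    pages = read_size // PAGE_SIZE
    return pages * BIO_VEC_SIZE

print(bvec_bytes(256 * 1024))   # 1024 -> 1KB per 256KB block
print(bvec_bytes(1024 * 1024))  # 4096 -> a whole page just for the array;
                                # beyond ~1MB that pushes towards vmalloc()
```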
>
> I'd also prefer not to use ITER_BVEC because the offset and length are
> superfluous here. If ITER_MAPPING is not good, would it be possible to
> have an ITER_PAGEARRAY that just takes a page array instead? Or, even,
> create a transient xarray?
>
> (4) The way object culling is managed needs overhauling too, but that's a
> whole 'nother patchset. We could wait till that's done too, but its lack
> doesn't prevent what we have now from being used.
>
> Thoughts?
>
> David
>
--
Jeff Layton <jlayton@redhat.com>
Thread overview: 16+ messages
2020-07-30 11:51 Upcoming: fscache rewrite David Howells
2020-07-30 12:16 ` Matthew Wilcox
2020-07-30 12:36 ` David Howells
2020-07-30 13:08 ` Jeff Layton [this message]
2020-08-03 16:30 ` [GIT PULL] " David Howells
2020-08-10 15:16 ` [GIT PULL] fscache rewrite -- please drop for now David Howells
2020-08-10 15:34 ` Steve French
2020-08-10 15:48 ` David Howells
2020-08-10 16:09 ` Steve French
2020-08-10 16:35 ` David Wysochanski
2020-08-10 17:06 ` Jeff Layton
2020-08-17 19:07 ` Steven French
2020-08-10 16:40 ` Christoph Hellwig
2020-08-27 15:28 ` David Howells
2020-08-27 16:18 ` Dominique Martinet
2020-08-27 17:14 ` David Howells