All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/32] fscache: Rewrite 2: Make the I/O interface use kiocb/iov_iter
@ 2020-07-13 16:30 ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:30 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel


Here's a set of patches that massively overhauls the object lifecycle
management and the I/O API of the local caching for network filesystems code
for a reduction of about 3000 LoC and a 1000 line Documentation reduction.
The ability to use async DIO to pass data to/from the cache gives a huge speed
bonus and uses less memory.

So far, only AFS has been ported to use this.  This series disabled fscache
support in all other filesystems that use it.  I'm working on NFS, but it's
tricky and I could use some help.

The following parts have been removed:

    - The object state machine
    - The I/O operation manager
    - All non-transient references from fscache to the netfs's data
    - All non-transient callbacks from fscache to the netfs
    - The backing page I/O monitoring
    - The tracking of netfs pages that fscache knows about
    - The tracking of netfs pages that need writing to the cache
    - The use of bmap to work out if a page is stored in the cache
    - The copy of data to/from backing pages to netfs pages.

Instead, the I/O to the cache is driven much more from the netfs.  There are a
number of aspects to the I/O API:

 (1) The lowest level I/O primitives just take an iov_iter and start async
     DIO on the cache objects.  The caller gets a callback upon completion.
     The PG_fscache bit is now just used to indicate that there's a write
     to the cache in progress.  The cache will keep track in xattrs as to
     what areas of the backing file are occupied.

     - struct fscache_io_request
     - fscache_read() and fscache_write().

 (2) The cache may require that the I/O fulfil certain alignment and length
     granularity constraints.  This, and whether the cache contains the
     desired data, can be queried.

     - struct fscache_request_shape
     - fscache_shape_request()

 (3) The cookie that's obtained when an inode is set up must be 'used' when a
     file is opened (with an indication as to whether it might be modified)
     and 'unused' when it is done with.  At the point of unuse, the auxdata
     and file size can be specified.

     - fscache_use_cookie(), fscache_unuse_cookie()

 (4) The cookie can be invalidated at any time, and new auxiliary data and a
     new size provided.  Any in-progress I/O will either cause new I/O to
     wait, or a replacement tmpfile will be created and the in-progress I/O
     will just be abandoned.  The on-disk auxdata (in xattrs, say) are updated
     lazily.

     - fscache_invalidate()

 (5) A helper API for reads is provided to combine the (1), (2) above and
     read_cache_pages() and also do read-(re)issue to the network in the case
     that the data isn't present or the cache fails.  This requires that an
     operation descriptor be allocated and given some operations.  This needs
     to be used for ->readpage(), ->readpages() and prefetching for
     ->write_begin().

     - struct fscache_io_request
     - fscache_read_helper()

I've also simplified the cookie management API to remove struct
fscache_cookie_def.  Instead, the pertinent details are supplied when a cookie
is created and the file size, key and auxdata are stored in the cookie.
Callbacks and backpointers are simply removed.

I've added some pieces outside of the API also:

 (1) An inode flag to mark a backing cachefile as being in use by the kernel.
     This prevents multiple caches mounted in the same directory from fighting
     over the same files.  It can also be extended to exclude other kernel
     users (such as swap) and could also be used to prevent userspace
     interfering with the file.

 (2) A new I/O iterator class, ITER_MAPPING, that iterates over the specified
     byte range of a mapping.  The caller is required to make sure that the
     pages don't evaporate under the callee (eg. pinning them by PG_locked,
     PG_writeback, PG_fscache or usage count).

     It may make sense to make this more generic and just take an xarray that
     is known to be laden with pages.  The one tricky bit there is that the
     iteration routines call find_get_page_contig() - though the only thing
     the mapping is used for is to find the xarray containing the pages.

     This is better than an ITER_BVEC as no allocation of bio_vec structs is
     required since the xarray holds pointers to all the pages involved.

 (3) Wait and unlock functions for PG_fscache.  These are in the core, so no
     need to call into fscache for it.

So, to be done hopefully before the next merge window:

 (1) Need to investigate whether I need to be lazy, but only to an extent,
     about closing up backing files.

 (2) Make NFS, CIFS, Ceph, 9P work with it.  Jeff Layton is working on Ceph
     and Dave Wysochanski is working on NFS.

And beyond that:

 (3) Put in support for versioned monolithic objects (eg. AFS directories).

 (4) Currently it cachefiles only caches large files up to 1GiB.  File data
     beyond that isn't cached.  The problem here is that I'm using an xattr to
     hold the content map, and xattrs may be limited in size and I've limited
     myself to using a 512 byte xattr.  I can easily make it cache a
     non-sparse file of any size with no map, but as soon as it becomes
     sparse, I need a different strategy.

 (5) Change the indexing strategy so that the culling mechanism is brought
     into the kernel, rather than doing that in userspace, and use an index
     table of files with a LRU list.

These patches can be found also on:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter

David
---
David Howells (1):
      cachefiles: Shape write requests


 fs/9p/Kconfig                          |    2 +-
 fs/Makefile                            |    2 +-
 fs/afs/Kconfig                         |    1 +
 fs/afs/cache.c                         |   54 --
 fs/afs/cell.c                          |    9 +-
 fs/afs/dir.c                           |  242 +++++--
 fs/afs/file.c                          |  575 +++++++--------
 fs/afs/fs_operation.c                  |    4 +-
 fs/afs/fsclient.c                      |  154 ++--
 fs/afs/inode.c                         |   57 +-
 fs/afs/internal.h                      |   58 +-
 fs/afs/rxrpc.c                         |  150 ++--
 fs/afs/volume.c                        |    9 +-
 fs/afs/write.c                         |  413 +++++++----
 fs/afs/yfsclient.c                     |  113 ++-
 fs/cachefiles/Makefile                 |    3 +-
 fs/cachefiles/bind.c                   |   11 +-
 fs/cachefiles/content-map.c            |  476 ++++++++++++
 fs/cachefiles/daemon.c                 |   10 +-
 fs/cachefiles/interface.c              |  564 ++++++++-------
 fs/cachefiles/internal.h               |  139 ++--
 fs/cachefiles/io.c                     |  279 +++++++
 fs/cachefiles/main.c                   |   12 +-
 fs/cachefiles/namei.c                  |  508 +++++--------
 fs/cachefiles/rdwr.c                   |  974 -------------------------
 fs/cachefiles/xattr.c                  |  263 +++----
 fs/ceph/Kconfig                        |    2 +-
 fs/cifs/Kconfig                        |    2 +-
 fs/fscache/Kconfig                     |    8 +
 fs/fscache/Makefile                    |   10 +-
 fs/fscache/cache.c                     |  136 ++--
 fs/fscache/cookie.c                    |  769 ++++++++------------
 fs/fscache/dispatcher.c                |  150 ++++
 fs/fscache/fsdef.c                     |   56 +-
 fs/fscache/histogram.c                 |    2 +-
 fs/fscache/internal.h                  |  260 +++----
 fs/fscache/io.c                        |  201 +++++
 fs/fscache/main.c                      |   35 +-
 fs/fscache/netfs.c                     |   10 +-
 fs/fscache/obj.c                       |  345 +++++++++
 fs/fscache/object-list.c               |  129 +---
 fs/fscache/object.c                    | 1133 -----------------------------
 fs/fscache/object_bits.c               |  120 +++
 fs/fscache/operation.c                 |  633 ----------------
 fs/fscache/page.c                      | 1248 --------------------------------
 fs/fscache/proc.c                      |   13 +-
 fs/fscache/read_helper.c               |  688 ++++++++++++++++++
 fs/fscache/stats.c                     |  265 +++----
 fs/internal.h                          |    5 -
 fs/nfs/Kconfig                         |    2 +-
 fs/nfs/fscache-index.c                 |    4 +-
 fs/read_write.c                        |    1 +
 include/linux/fs.h                     |    2 +
 include/linux/fscache-cache.h          |  508 +++----------
 include/linux/fscache-obsolete.h       |   13 +
 include/linux/fscache.h                |  814 ++++++++-------------
 include/linux/mm.h                     |    1 +
 include/linux/pagemap.h                |   14 +
 include/linux/uio.h                    |   11 +
 include/net/af_rxrpc.h                 |    2 +-
 include/trace/events/afs.h             |   51 +-
 include/trace/events/cachefiles.h      |  285 ++++++--
 include/trace/events/fscache.h         |  421 ++---------
 include/trace/events/fscache_support.h |   91 +++
 lib/iov_iter.c                         |  286 +++++++-
 mm/filemap.c                           |   27 +-
 net/rxrpc/recvmsg.c                    |    9 +-
 67 files changed, 5598 insertions(+), 8246 deletions(-)
 create mode 100644 fs/cachefiles/content-map.c
 create mode 100644 fs/cachefiles/io.c
 delete mode 100644 fs/cachefiles/rdwr.c
 create mode 100644 fs/fscache/dispatcher.c
 create mode 100644 fs/fscache/io.c
 create mode 100644 fs/fscache/obj.c
 delete mode 100644 fs/fscache/object.c
 create mode 100644 fs/fscache/object_bits.c
 delete mode 100644 fs/fscache/operation.c
 delete mode 100644 fs/fscache/page.c
 create mode 100644 fs/fscache/read_helper.c
 create mode 100644 include/linux/fscache-obsolete.h
 create mode 100644 include/trace/events/fscache_support.h



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 00/32] fscache: Rewrite 2: Make the I/O interface use kiocb/iov_iter
@ 2020-07-13 16:30 ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:30 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs


Here's a set of patches that massively overhauls the object lifecycle
management and the I/O API of the local caching for network filesystems code
for a reduction of about 3000 LoC and a 1000 line Documentation reduction.
The ability to use async DIO to pass data to/from the cache gives a huge speed
bonus and uses less memory.

So far, only AFS has been ported to use this.  This series disabled fscache
support in all other filesystems that use it.  I'm working on NFS, but it's
tricky and I could use some help.

The following parts have been removed:

    - The object state machine
    - The I/O operation manager
    - All non-transient references from fscache to the netfs's data
    - All non-transient callbacks from fscache to the netfs
    - The backing page I/O monitoring
    - The tracking of netfs pages that fscache knows about
    - The tracking of netfs pages that need writing to the cache
    - The use of bmap to work out if a page is stored in the cache
    - The copy of data to/from backing pages to netfs pages.

Instead, the I/O to the cache is driven much more from the netfs.  There are a
number of aspects to the I/O API:

 (1) The lowest level I/O primitives just take an iov_iter and start async
     DIO on the cache objects.  The caller gets a callback upon completion.
     The PG_fscache bit is now just used to indicate that there's a write
     to the cache in progress.  The cache will keep track in xattrs as to
     what areas of the backing file are occupied.

     - struct fscache_io_request
     - fscache_read() and fscache_write().

 (2) The cache may require that the I/O fulfil certain alignment and length
     granularity constraints.  This, and whether the cache contains the
     desired data, can be queried.

     - struct fscache_request_shape
     - fscache_shape_request()

 (3) The cookie that's obtained when an inode is set up must be 'used' when a
     file is opened (with an indication as to whether it might be modified)
     and 'unused' when it is done with.  At the point of unuse, the auxdata
     and file size can be specified.

     - fscache_use_cookie(), fscache_unuse_cookie()

 (4) The cookie can be invalidated at any time, and new auxiliary data and a
     new size provided.  Any in-progress I/O will either cause new I/O to
     wait, or a replacement tmpfile will be created and the in-progress I/O
     will just be abandoned.  The on-disk auxdata (in xattrs, say) are updated
     lazily.

     - fscache_invalidate()

 (5) A helper API for reads is provided to combine the (1), (2) above and
     read_cache_pages() and also do read-(re)issue to the network in the case
     that the data isn't present or the cache fails.  This requires that an
     operation descriptor be allocated and given some operations.  This needs
     to be used for ->readpage(), ->readpages() and prefetching for
     ->write_begin().

     - struct fscache_io_request
     - fscache_read_helper()

I've also simplified the cookie management API to remove struct
fscache_cookie_def.  Instead, the pertinent details are supplied when a cookie
is created and the file size, key and auxdata are stored in the cookie.
Callbacks and backpointers are simply removed.

I've added some pieces outside of the API also:

 (1) An inode flag to mark a backing cachefile as being in use by the kernel.
     This prevents multiple caches mounted in the same directory from fighting
     over the same files.  It can also be extended to exclude other kernel
     users (such as swap) and could also be used to prevent userspace
     interfering with the file.

 (2) A new I/O iterator class, ITER_MAPPING, that iterates over the specified
     byte range of a mapping.  The caller is required to make sure that the
     pages don't evaporate under the callee (eg. pinning them by PG_locked,
     PG_writeback, PG_fscache or usage count).

     It may make sense to make this more generic and just take an xarray that
     is known to be laden with pages.  The one tricky bit there is that the
     iteration routines call find_get_page_contig() - though the only thing
     the mapping is used for is to find the xarray containing the pages.

     This is better than an ITER_BVEC as no allocation of bio_vec structs is
     required since the xarray holds pointers to all the pages involved.

 (3) Wait and unlock functions for PG_fscache.  These are in the core, so no
     need to call into fscache for it.

So, to be done hopefully before the next merge window:

 (1) Need to investigate whether I need to be lazy, but only to an extent,
     about closing up backing files.

 (2) Make NFS, CIFS, Ceph, 9P work with it.  Jeff Layton is working on Ceph
     and Dave Wysochanski is working on NFS.

And beyond that:

 (3) Put in support for versioned monolithic objects (eg. AFS directories).

 (4) Currently it cachefiles only caches large files up to 1GiB.  File data
     beyond that isn't cached.  The problem here is that I'm using an xattr to
     hold the content map, and xattrs may be limited in size and I've limited
     myself to using a 512 byte xattr.  I can easily make it cache a
     non-sparse file of any size with no map, but as soon as it becomes
     sparse, I need a different strategy.

 (5) Change the indexing strategy so that the culling mechanism is brought
     into the kernel, rather than doing that in userspace, and use an index
     table of files with a LRU list.

These patches can be found also on:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter

David
---
David Howells (1):
      cachefiles: Shape write requests


 fs/9p/Kconfig                          |    2 +-
 fs/Makefile                            |    2 +-
 fs/afs/Kconfig                         |    1 +
 fs/afs/cache.c                         |   54 --
 fs/afs/cell.c                          |    9 +-
 fs/afs/dir.c                           |  242 +++++--
 fs/afs/file.c                          |  575 +++++++--------
 fs/afs/fs_operation.c                  |    4 +-
 fs/afs/fsclient.c                      |  154 ++--
 fs/afs/inode.c                         |   57 +-
 fs/afs/internal.h                      |   58 +-
 fs/afs/rxrpc.c                         |  150 ++--
 fs/afs/volume.c                        |    9 +-
 fs/afs/write.c                         |  413 +++++++----
 fs/afs/yfsclient.c                     |  113 ++-
 fs/cachefiles/Makefile                 |    3 +-
 fs/cachefiles/bind.c                   |   11 +-
 fs/cachefiles/content-map.c            |  476 ++++++++++++
 fs/cachefiles/daemon.c                 |   10 +-
 fs/cachefiles/interface.c              |  564 ++++++++-------
 fs/cachefiles/internal.h               |  139 ++--
 fs/cachefiles/io.c                     |  279 +++++++
 fs/cachefiles/main.c                   |   12 +-
 fs/cachefiles/namei.c                  |  508 +++++--------
 fs/cachefiles/rdwr.c                   |  974 -------------------------
 fs/cachefiles/xattr.c                  |  263 +++----
 fs/ceph/Kconfig                        |    2 +-
 fs/cifs/Kconfig                        |    2 +-
 fs/fscache/Kconfig                     |    8 +
 fs/fscache/Makefile                    |   10 +-
 fs/fscache/cache.c                     |  136 ++--
 fs/fscache/cookie.c                    |  769 ++++++++------------
 fs/fscache/dispatcher.c                |  150 ++++
 fs/fscache/fsdef.c                     |   56 +-
 fs/fscache/histogram.c                 |    2 +-
 fs/fscache/internal.h                  |  260 +++----
 fs/fscache/io.c                        |  201 +++++
 fs/fscache/main.c                      |   35 +-
 fs/fscache/netfs.c                     |   10 +-
 fs/fscache/obj.c                       |  345 +++++++++
 fs/fscache/object-list.c               |  129 +---
 fs/fscache/object.c                    | 1133 -----------------------------
 fs/fscache/object_bits.c               |  120 +++
 fs/fscache/operation.c                 |  633 ----------------
 fs/fscache/page.c                      | 1248 --------------------------------
 fs/fscache/proc.c                      |   13 +-
 fs/fscache/read_helper.c               |  688 ++++++++++++++++++
 fs/fscache/stats.c                     |  265 +++----
 fs/internal.h                          |    5 -
 fs/nfs/Kconfig                         |    2 +-
 fs/nfs/fscache-index.c                 |    4 +-
 fs/read_write.c                        |    1 +
 include/linux/fs.h                     |    2 +
 include/linux/fscache-cache.h          |  508 +++----------
 include/linux/fscache-obsolete.h       |   13 +
 include/linux/fscache.h                |  814 ++++++++-------------
 include/linux/mm.h                     |    1 +
 include/linux/pagemap.h                |   14 +
 include/linux/uio.h                    |   11 +
 include/net/af_rxrpc.h                 |    2 +-
 include/trace/events/afs.h             |   51 +-
 include/trace/events/cachefiles.h      |  285 ++++++--
 include/trace/events/fscache.h         |  421 ++---------
 include/trace/events/fscache_support.h |   91 +++
 lib/iov_iter.c                         |  286 +++++++-
 mm/filemap.c                           |   27 +-
 net/rxrpc/recvmsg.c                    |    9 +-
 67 files changed, 5598 insertions(+), 8246 deletions(-)
 create mode 100644 fs/cachefiles/content-map.c
 create mode 100644 fs/cachefiles/io.c
 delete mode 100644 fs/cachefiles/rdwr.c
 create mode 100644 fs/fscache/dispatcher.c
 create mode 100644 fs/fscache/io.c
 create mode 100644 fs/fscache/obj.c
 delete mode 100644 fs/fscache/object.c
 create mode 100644 fs/fscache/object_bits.c
 delete mode 100644 fs/fscache/operation.c
 delete mode 100644 fs/fscache/page.c
 create mode 100644 fs/fscache/read_helper.c
 create mode 100644 include/linux/fscache-obsolete.h
 create mode 100644 include/trace/events/fscache_support.h

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 01/32] iov_iter: Add ITER_MAPPING
  2020-07-13 16:30 ` David Howells
@ 2020-07-13 16:30   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:30 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Add an iterator, ITER_MAPPING, that walks through a set of pages attached
to an address_space, starting at a given page and offset and walking for
the specified amount of bytes.

The caller must guarantee that the pages are all present and they must be
locked using PG_locked, PG_writeback or PG_fscache to prevent them from
going away or being migrated whilst they're being accessed.

This is useful for copying data from socket buffers to inodes in network
filesystems and for transferring data between those inodes and the cache
using direct I/O.

Whilst it is true that ITER_BVEC could be used instead, that would require
a bio_vec array to be allocated to refer to all the pages - which should be
redundant if inode->i_pages also points to all these pages.

This could also be turned into an ITER_XARRAY, taking and xarray pointer
instead of a mapping pointer.  It would be mostly trivial, except for the
use of find_get_pages_contig() by iov_iter_get_pages*().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
---

 include/linux/uio.h |   11 ++
 lib/iov_iter.c      |  286 +++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 274 insertions(+), 23 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 9576fd8158d7..a0321a740f51 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -11,6 +11,7 @@
 #include <uapi/linux/uio.h>
 
 struct page;
+struct address_space;
 struct pipe_inode_info;
 
 struct kvec {
@@ -25,6 +26,7 @@ enum iter_type {
 	ITER_BVEC = 16,
 	ITER_PIPE = 32,
 	ITER_DISCARD = 64,
+	ITER_MAPPING = 128,
 };
 
 struct iov_iter {
@@ -40,6 +42,7 @@ struct iov_iter {
 		const struct iovec *iov;
 		const struct kvec *kvec;
 		const struct bio_vec *bvec;
+		struct address_space *mapping;
 		struct pipe_inode_info *pipe;
 	};
 	union {
@@ -48,6 +51,7 @@ struct iov_iter {
 			unsigned int head;
 			unsigned int start_head;
 		};
+		loff_t mapping_start;
 	};
 };
 
@@ -81,6 +85,11 @@ static inline bool iov_iter_is_discard(const struct iov_iter *i)
 	return iov_iter_type(i) == ITER_DISCARD;
 }
 
+static inline bool iov_iter_is_mapping(const struct iov_iter *i)
+{
+	return iov_iter_type(i) == ITER_MAPPING;
+}
+
 static inline unsigned char iov_iter_rw(const struct iov_iter *i)
 {
 	return i->type & (READ | WRITE);
@@ -222,6 +231,8 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_
 void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode_info *pipe,
 			size_t count);
 void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
+void iov_iter_mapping(struct iov_iter *i, unsigned int direction, struct address_space *mapping,
+		      loff_t start, size_t count);
 ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
 			size_t maxsize, unsigned maxpages, size_t *start);
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index bf538c2bec77..e4a073523b76 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -75,7 +75,40 @@
 	}						\
 }
 
-#define iterate_all_kinds(i, n, v, I, B, K) {			\
+#define iterate_mapping(i, n, __v, skip, STEP) {		\
+	struct page *page;					\
+	size_t wanted = n, seg, offset;				\
+	loff_t start = i->mapping_start + skip;			\
+	pgoff_t index = start >> PAGE_SHIFT;			\
+								\
+	XA_STATE(xas, &i->mapping->i_pages, index);		\
+								\
+	rcu_read_lock();						\
+	for (page = xas_load(&xas); page; page = xas_next(&xas)) {	\
+		if (xas_retry(&xas, page))				\
+			continue;					\
+		if (WARN_ON(xa_is_value(page)))				\
+			break;						\
+		if (WARN_ON(PageHuge(page)))				\
+			break;						\
+		if (!page)						\
+			break;						\
+		__v.bv_page = find_subpage(page, xas.xa_index);		\
+		offset = (i->mapping_start + skip) & ~PAGE_MASK;	\
+		seg = PAGE_SIZE - offset;			\
+		__v.bv_offset = offset;				\
+		__v.bv_len = min(n, seg);			\
+		(void)(STEP);					\
+		n -= __v.bv_len;				\
+		skip += __v.bv_len;				\
+		if (n == 0)					\
+			break;					\
+	}							\
+	rcu_read_unlock();					\
+	n = wanted - n;						\
+}
+
+#define iterate_all_kinds(i, n, v, I, B, K, M) {		\
 	if (likely(n)) {					\
 		size_t skip = i->iov_offset;			\
 		if (unlikely(i->type & ITER_BVEC)) {		\
@@ -87,6 +120,9 @@
 			struct kvec v;				\
 			iterate_kvec(i, n, v, kvec, skip, (K))	\
 		} else if (unlikely(i->type & ITER_DISCARD)) {	\
+		} else if (unlikely(i->type & ITER_MAPPING)) {	\
+			struct bio_vec v;			\
+			iterate_mapping(i, n, v, skip, (M));	\
 		} else {					\
 			const struct iovec *iov;		\
 			struct iovec v;				\
@@ -95,7 +131,7 @@
 	}							\
 }
 
-#define iterate_and_advance(i, n, v, I, B, K) {			\
+#define iterate_and_advance(i, n, v, I, B, K, M) {		\
 	if (unlikely(i->count < n))				\
 		n = i->count;					\
 	if (i->count) {						\
@@ -120,6 +156,9 @@
 			i->kvec = kvec;				\
 		} else if (unlikely(i->type & ITER_DISCARD)) {	\
 			skip += n;				\
+		} else if (unlikely(i->type & ITER_MAPPING)) {	\
+			struct bio_vec v;			\
+			iterate_mapping(i, n, v, skip, (M))	\
 		} else {					\
 			const struct iovec *iov;		\
 			struct iovec v;				\
@@ -629,7 +668,9 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 		copyout(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len),
 		memcpy_to_page(v.bv_page, v.bv_offset,
 			       (from += v.bv_len) - v.bv_len, v.bv_len),
-		memcpy(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len)
+		memcpy(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len),
+		memcpy_to_page(v.bv_page, v.bv_offset,
+			       (from += v.bv_len) - v.bv_len, v.bv_len)
 	)
 
 	return bytes;
@@ -747,6 +788,15 @@ size_t _copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i)
 			bytes = curr_addr - s_addr - rem;
 			return bytes;
 		}
+		}),
+		({
+		rem = memcpy_mcsafe_to_page(v.bv_page, v.bv_offset,
+                               (from += v.bv_len) - v.bv_len, v.bv_len);
+		if (rem) {
+			curr_addr = (unsigned long) from;
+			bytes = curr_addr - s_addr - rem;
+			return bytes;
+		}
 		})
 	)
 
@@ -768,7 +818,9 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 		copyin((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
 		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
-		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
+		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
+		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 
 	return bytes;
@@ -794,7 +846,9 @@ bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
 		0;}),
 		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
-		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
+		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
+		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 
 	iov_iter_advance(i, bytes);
@@ -814,7 +868,9 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
 					 v.iov_base, v.iov_len),
 		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
-		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
+		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
+		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 
 	return bytes;
@@ -849,7 +905,9 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
 		memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
 		memcpy_flushcache((to += v.iov_len) - v.iov_len, v.iov_base,
-			v.iov_len)
+			v.iov_len),
+		memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 
 	return bytes;
@@ -873,7 +931,9 @@ bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
 		0;}),
 		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
-		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
+		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
+		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 
 	iov_iter_advance(i, bytes);
@@ -910,7 +970,7 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
 {
 	if (unlikely(!page_copy_sane(page, offset, bytes)))
 		return 0;
-	if (i->type & (ITER_BVEC|ITER_KVEC)) {
+	if (i->type & (ITER_BVEC | ITER_KVEC | ITER_MAPPING)) {
 		void *kaddr = kmap_atomic(page);
 		size_t wanted = copy_to_iter(kaddr + offset, bytes, i);
 		kunmap_atomic(kaddr);
@@ -933,7 +993,7 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
 		WARN_ON(1);
 		return 0;
 	}
-	if (i->type & (ITER_BVEC|ITER_KVEC)) {
+	if (i->type & (ITER_BVEC | ITER_KVEC | ITER_MAPPING)) {
 		void *kaddr = kmap_atomic(page);
 		size_t wanted = _copy_from_iter(kaddr + offset, bytes, i);
 		kunmap_atomic(kaddr);
@@ -977,7 +1037,8 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
 	iterate_and_advance(i, bytes, v,
 		clear_user(v.iov_base, v.iov_len),
 		memzero_page(v.bv_page, v.bv_offset, v.bv_len),
-		memset(v.iov_base, 0, v.iov_len)
+		memset(v.iov_base, 0, v.iov_len),
+		memzero_page(v.bv_page, v.bv_offset, v.bv_len)
 	)
 
 	return bytes;
@@ -1001,7 +1062,9 @@ size_t iov_iter_copy_from_user_atomic(struct page *page,
 		copyin((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
 		memcpy_from_page((p += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
-		memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
+		memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
+		memcpy_from_page((p += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 	kunmap_atomic(kaddr);
 	return bytes;
@@ -1072,7 +1135,13 @@ void iov_iter_advance(struct iov_iter *i, size_t size)
 		i->count -= size;
 		return;
 	}
-	iterate_and_advance(i, size, v, 0, 0, 0)
+	if (unlikely(iov_iter_is_mapping(i))) {
+		/* We really don't want to fetch pages if we can avoid it */
+		i->iov_offset += size;
+		i->count -= size;
+		return;
+	}
+	iterate_and_advance(i, size, v, 0, 0, 0, 0)
 }
 EXPORT_SYMBOL(iov_iter_advance);
 
@@ -1116,7 +1185,12 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll)
 		return;
 	}
 	unroll -= i->iov_offset;
-	if (iov_iter_is_bvec(i)) {
+	if (iov_iter_is_mapping(i)) {
+		BUG(); /* We should never go beyond the start of the specified
+			* range since we might then be straying into pages that
+			* aren't pinned.
+			*/
+	} else if (iov_iter_is_bvec(i)) {
 		const struct bio_vec *bvec = i->bvec;
 		while (1) {
 			size_t n = (--bvec)->bv_len;
@@ -1153,9 +1227,9 @@ size_t iov_iter_single_seg_count(const struct iov_iter *i)
 		return i->count;	// it is a silly place, anyway
 	if (i->nr_segs == 1)
 		return i->count;
-	if (unlikely(iov_iter_is_discard(i)))
+	if (unlikely(iov_iter_is_discard(i) || iov_iter_is_mapping(i)))
 		return i->count;
-	else if (iov_iter_is_bvec(i))
+	if (iov_iter_is_bvec(i))
 		return min(i->count, i->bvec->bv_len - i->iov_offset);
 	else
 		return min(i->count, i->iov->iov_len - i->iov_offset);
@@ -1203,6 +1277,32 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction,
 }
 EXPORT_SYMBOL(iov_iter_pipe);
 
+/**
+ * iov_iter_mapping - Initialise an I/O iterator to use the pages in a mapping
+ * @i: The iterator to initialise.
+ * @direction: The direction of the transfer.
+ * @mapping: The mapping to access.
+ * @start: The start file position.
+ * @count: The size of the I/O buffer in bytes.
+ *
+ * Set up an I/O iterator to either draw data out of the pages attached to an
+ * inode or to inject data into those pages.  The pages *must* be prevented
+ * from evaporation, either by taking a ref on them or locking them by the
+ * caller.
+ */
+void iov_iter_mapping(struct iov_iter *i, unsigned int direction,
+		      struct address_space *mapping,
+		      loff_t start, size_t count)
+{
+	BUG_ON(direction & ~1);
+	i->type = ITER_MAPPING | (direction & (READ | WRITE));
+	i->mapping = mapping;
+	i->mapping_start = start;
+	i->count = count;
+	i->iov_offset = 0;
+}
+EXPORT_SYMBOL(iov_iter_mapping);
+
 /**
  * iov_iter_discard - Initialise an I/O iterator that discards data
  * @i: The iterator to initialise.
@@ -1236,7 +1336,8 @@ unsigned long iov_iter_alignment(const struct iov_iter *i)
 	iterate_all_kinds(i, size, v,
 		(res |= (unsigned long)v.iov_base | v.iov_len, 0),
 		res |= v.bv_offset | v.bv_len,
-		res |= (unsigned long)v.iov_base | v.iov_len
+		res |= (unsigned long)v.iov_base | v.iov_len,
+		res |= v.bv_offset | v.bv_len
 	)
 	return res;
 }
@@ -1258,7 +1359,9 @@ unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
 		(res |= (!res ? 0 : (unsigned long)v.bv_offset) |
 			(size != v.bv_len ? size : 0)),
 		(res |= (!res ? 0 : (unsigned long)v.iov_base) |
-			(size != v.iov_len ? size : 0))
+			(size != v.iov_len ? size : 0)),
+		(res |= (!res ? 0 : (unsigned long)v.bv_offset) |
+			(size != v.bv_len ? size : 0))
 		);
 	return res;
 }
@@ -1308,6 +1411,48 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
 	return __pipe_get_pages(i, min(maxsize, capacity), pages, iter_head, start);
 }
 
+static ssize_t iter_mapping_get_pages(struct iov_iter *i,
+				      struct page **pages, size_t maxsize,
+				      unsigned maxpages, size_t *_start_offset)
+{
+	unsigned nr, offset;
+	pgoff_t index, count;
+	size_t size = maxsize, actual;
+	loff_t pos;
+
+	if (!size || !maxpages)
+		return 0;
+
+	pos = i->mapping_start + i->iov_offset;
+	index = pos >> PAGE_SHIFT;
+	offset = pos & ~PAGE_MASK;
+	*_start_offset = offset;
+
+	count = 1;
+	if (size > PAGE_SIZE - offset) {
+		size -= PAGE_SIZE - offset;
+		count += size >> PAGE_SHIFT;
+		size &= ~PAGE_MASK;
+		if (size)
+			count++;
+	}
+
+	if (count > maxpages)
+		count = maxpages;
+
+	nr = find_get_pages_contig(i->mapping, index, count, pages);
+	if (nr == 0)
+		return 0;
+
+	actual = PAGE_SIZE * nr;
+	actual -= offset;
+	if (nr == count && size > 0) {
+		unsigned last_offset = (nr > 1) ? 0 : offset;
+		actual -= PAGE_SIZE - (last_offset + size);
+	}
+	return actual;
+}
+
 ssize_t iov_iter_get_pages(struct iov_iter *i,
 		   struct page **pages, size_t maxsize, unsigned maxpages,
 		   size_t *start)
@@ -1317,6 +1462,8 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 
 	if (unlikely(iov_iter_is_pipe(i)))
 		return pipe_get_pages(i, pages, maxsize, maxpages, start);
+	if (unlikely(iov_iter_is_mapping(i)))
+		return iter_mapping_get_pages(i, pages, maxsize, maxpages, start);
 	if (unlikely(iov_iter_is_discard(i)))
 		return -EFAULT;
 
@@ -1343,7 +1490,8 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 		return v.bv_len;
 	}),({
 		return -EFAULT;
-	})
+	}),
+	0
 	)
 	return 0;
 }
@@ -1387,6 +1535,51 @@ static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
 	return n;
 }
 
+static ssize_t iter_mapping_get_pages_alloc(struct iov_iter *i,
+					    struct page ***pages, size_t maxsize,
+					    size_t *_start_offset)
+{
+	struct page **p;
+	unsigned nr, offset;
+	pgoff_t index, count;
+	size_t size = maxsize, actual;
+	loff_t pos;
+
+	if (!size)
+		return 0;
+
+	pos = i->mapping_start + i->iov_offset;
+	index = pos >> PAGE_SHIFT;
+	offset = pos & ~PAGE_MASK;
+	*_start_offset = offset;
+
+	count = 1;
+	if (size > PAGE_SIZE - offset) {
+		size -= PAGE_SIZE - offset;
+		count += size >> PAGE_SHIFT;
+		size &= ~PAGE_MASK;
+		if (size)
+			count++;
+	}
+
+	p = get_pages_array(count);
+	if (!p)
+		return -ENOMEM;
+	*pages = p;
+
+	nr = find_get_pages_contig(i->mapping, index, count, p);
+	if (nr == 0)
+		return 0;
+
+	actual = PAGE_SIZE * nr;
+	actual -= offset;
+	if (nr == count && size > 0) {
+		unsigned last_offset = (nr > 1) ? 0 : offset;
+		actual -= PAGE_SIZE - (last_offset + size);
+	}
+	return actual;
+}
+
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
 		   size_t *start)
@@ -1398,6 +1591,8 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 
 	if (unlikely(iov_iter_is_pipe(i)))
 		return pipe_get_pages_alloc(i, pages, maxsize, start);
+	if (unlikely(iov_iter_is_mapping(i)))
+		return iter_mapping_get_pages_alloc(i, pages, maxsize, start);
 	if (unlikely(iov_iter_is_discard(i)))
 		return -EFAULT;
 
@@ -1430,7 +1625,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		return v.bv_len;
 	}),({
 		return -EFAULT;
-	})
+	}), 0
 	)
 	return 0;
 }
@@ -1469,6 +1664,14 @@ size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
 				      v.iov_base, v.iov_len,
 				      sum, off);
 		off += v.iov_len;
+	}), ({
+		char *p = kmap_atomic(v.bv_page);
+		next = csum_partial_copy_nocheck(p + v.bv_offset,
+						 (to += v.bv_len) - v.bv_len,
+						 v.bv_len, 0);
+		kunmap_atomic(p);
+		sum = csum_block_add(sum, next, off);
+		off += v.bv_len;
 	})
 	)
 	*csum = sum;
@@ -1511,6 +1714,14 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum,
 				      v.iov_base, v.iov_len,
 				      sum, off);
 		off += v.iov_len;
+	}), ({
+		char *p = kmap_atomic(v.bv_page);
+		next = csum_partial_copy_nocheck(p + v.bv_offset,
+						 (to += v.bv_len) - v.bv_len,
+						 v.bv_len, 0);
+		kunmap_atomic(p);
+		sum = csum_block_add(sum, next, off);
+		off += v.bv_len;
 	})
 	)
 	*csum = sum;
@@ -1557,6 +1768,14 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump,
 				     (from += v.iov_len) - v.iov_len,
 				     v.iov_len, sum, off);
 		off += v.iov_len;
+	}), ({
+		char *p = kmap_atomic(v.bv_page);
+		next = csum_partial_copy_nocheck((from += v.bv_len) - v.bv_len,
+						 p + v.bv_offset,
+						 v.bv_len, 0);
+		kunmap_atomic(p);
+		sum = csum_block_add(sum, next, off);
+		off += v.bv_len;
 	})
 	)
 	*csum = sum;
@@ -1606,6 +1825,21 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages)
 		npages = pipe_space_for_user(iter_head, pipe->tail, pipe);
 		if (npages >= maxpages)
 			return maxpages;
+	} else if (unlikely(iov_iter_is_mapping(i))) {
+		unsigned offset;
+
+		offset = (i->mapping_start + i->iov_offset) & ~PAGE_MASK;
+
+		npages = 1;
+		if (size > PAGE_SIZE - offset) {
+			size -= PAGE_SIZE - offset;
+			npages += size >> PAGE_SHIFT;
+			size &= ~PAGE_MASK;
+			if (size)
+				npages++;
+		}
+		if (npages >= maxpages)
+			return maxpages;
 	} else iterate_all_kinds(i, size, v, ({
 		unsigned long p = (unsigned long)v.iov_base;
 		npages += DIV_ROUND_UP(p + v.iov_len, PAGE_SIZE)
@@ -1622,7 +1856,8 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages)
 			- p / PAGE_SIZE;
 		if (npages >= maxpages)
 			return maxpages;
-	})
+	}),
+	0
 	)
 	return npages;
 }
@@ -1635,7 +1870,7 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
 		WARN_ON(1);
 		return NULL;
 	}
-	if (unlikely(iov_iter_is_discard(new)))
+	if (unlikely(iov_iter_is_discard(new) || iov_iter_is_mapping(new)))
 		return NULL;
 	if (iov_iter_is_bvec(new))
 		return new->bvec = kmemdup(new->bvec,
@@ -1747,7 +1982,12 @@ int iov_iter_for_each_range(struct iov_iter *i, size_t bytes,
 		kunmap(v.bv_page);
 		err;}), ({
 		w = v;
-		err = f(&w, context);})
+		err = f(&w, context);}), ({
+		w.iov_base = kmap(v.bv_page) + v.bv_offset;
+		w.iov_len = v.bv_len;
+		err = f(&w, context);
+		kunmap(v.bv_page);
+		err;})
 	)
 	return err;
 }



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 01/32] iov_iter: Add ITER_MAPPING
@ 2020-07-13 16:30   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:30 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs

Add an iterator, ITER_MAPPING, that walks through a set of pages attached
to an address_space, starting at a given page and offset and walking for
the specified amount of bytes.

The caller must guarantee that the pages are all present and they must be
locked using PG_locked, PG_writeback or PG_fscache to prevent them from
going away or being migrated whilst they're being accessed.

This is useful for copying data from socket buffers to inodes in network
filesystems and for transferring data between those inodes and the cache
using direct I/O.

Whilst it is true that ITER_BVEC could be used instead, that would require
a bio_vec array to be allocated to refer to all the pages - which should be
redundant if inode->i_pages also points to all these pages.

This could also be turned into an ITER_XARRAY, taking and xarray pointer
instead of a mapping pointer.  It would be mostly trivial, except for the
use of find_get_pages_contig() by iov_iter_get_pages*().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
---

 include/linux/uio.h |   11 ++
 lib/iov_iter.c      |  286 +++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 274 insertions(+), 23 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 9576fd8158d7..a0321a740f51 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -11,6 +11,7 @@
 #include <uapi/linux/uio.h>
 
 struct page;
+struct address_space;
 struct pipe_inode_info;
 
 struct kvec {
@@ -25,6 +26,7 @@ enum iter_type {
 	ITER_BVEC = 16,
 	ITER_PIPE = 32,
 	ITER_DISCARD = 64,
+	ITER_MAPPING = 128,
 };
 
 struct iov_iter {
@@ -40,6 +42,7 @@ struct iov_iter {
 		const struct iovec *iov;
 		const struct kvec *kvec;
 		const struct bio_vec *bvec;
+		struct address_space *mapping;
 		struct pipe_inode_info *pipe;
 	};
 	union {
@@ -48,6 +51,7 @@ struct iov_iter {
 			unsigned int head;
 			unsigned int start_head;
 		};
+		loff_t mapping_start;
 	};
 };
 
@@ -81,6 +85,11 @@ static inline bool iov_iter_is_discard(const struct iov_iter *i)
 	return iov_iter_type(i) == ITER_DISCARD;
 }
 
+static inline bool iov_iter_is_mapping(const struct iov_iter *i)
+{
+	return iov_iter_type(i) == ITER_MAPPING;
+}
+
 static inline unsigned char iov_iter_rw(const struct iov_iter *i)
 {
 	return i->type & (READ | WRITE);
@@ -222,6 +231,8 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_
 void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode_info *pipe,
 			size_t count);
 void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
+void iov_iter_mapping(struct iov_iter *i, unsigned int direction, struct address_space *mapping,
+		      loff_t start, size_t count);
 ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
 			size_t maxsize, unsigned maxpages, size_t *start);
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index bf538c2bec77..e4a073523b76 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -75,7 +75,40 @@
 	}						\
 }
 
-#define iterate_all_kinds(i, n, v, I, B, K) {			\
+#define iterate_mapping(i, n, __v, skip, STEP) {		\
+	struct page *page;					\
+	size_t wanted = n, seg, offset;				\
+	loff_t start = i->mapping_start + skip;			\
+	pgoff_t index = start >> PAGE_SHIFT;			\
+								\
+	XA_STATE(xas, &i->mapping->i_pages, index);		\
+								\
+	rcu_read_lock();						\
+	for (page = xas_load(&xas); page; page = xas_next(&xas)) {	\
+		if (xas_retry(&xas, page))				\
+			continue;					\
+		if (WARN_ON(xa_is_value(page)))				\
+			break;						\
+		if (WARN_ON(PageHuge(page)))				\
+			break;						\
+		if (!page)						\
+			break;						\
+		__v.bv_page = find_subpage(page, xas.xa_index);		\
+		offset = (i->mapping_start + skip) & ~PAGE_MASK;	\
+		seg = PAGE_SIZE - offset;			\
+		__v.bv_offset = offset;				\
+		__v.bv_len = min(n, seg);			\
+		(void)(STEP);					\
+		n -= __v.bv_len;				\
+		skip += __v.bv_len;				\
+		if (n == 0)					\
+			break;					\
+	}							\
+	rcu_read_unlock();					\
+	n = wanted - n;						\
+}
+
+#define iterate_all_kinds(i, n, v, I, B, K, M) {		\
 	if (likely(n)) {					\
 		size_t skip = i->iov_offset;			\
 		if (unlikely(i->type & ITER_BVEC)) {		\
@@ -87,6 +120,9 @@
 			struct kvec v;				\
 			iterate_kvec(i, n, v, kvec, skip, (K))	\
 		} else if (unlikely(i->type & ITER_DISCARD)) {	\
+		} else if (unlikely(i->type & ITER_MAPPING)) {	\
+			struct bio_vec v;			\
+			iterate_mapping(i, n, v, skip, (M));	\
 		} else {					\
 			const struct iovec *iov;		\
 			struct iovec v;				\
@@ -95,7 +131,7 @@
 	}							\
 }
 
-#define iterate_and_advance(i, n, v, I, B, K) {			\
+#define iterate_and_advance(i, n, v, I, B, K, M) {		\
 	if (unlikely(i->count < n))				\
 		n = i->count;					\
 	if (i->count) {						\
@@ -120,6 +156,9 @@
 			i->kvec = kvec;				\
 		} else if (unlikely(i->type & ITER_DISCARD)) {	\
 			skip += n;				\
+		} else if (unlikely(i->type & ITER_MAPPING)) {	\
+			struct bio_vec v;			\
+			iterate_mapping(i, n, v, skip, (M))	\
 		} else {					\
 			const struct iovec *iov;		\
 			struct iovec v;				\
@@ -629,7 +668,9 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 		copyout(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len),
 		memcpy_to_page(v.bv_page, v.bv_offset,
 			       (from += v.bv_len) - v.bv_len, v.bv_len),
-		memcpy(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len)
+		memcpy(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len),
+		memcpy_to_page(v.bv_page, v.bv_offset,
+			       (from += v.bv_len) - v.bv_len, v.bv_len)
 	)
 
 	return bytes;
@@ -747,6 +788,15 @@ size_t _copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i)
 			bytes = curr_addr - s_addr - rem;
 			return bytes;
 		}
+		}),
+		({
+		rem = memcpy_mcsafe_to_page(v.bv_page, v.bv_offset,
+                               (from += v.bv_len) - v.bv_len, v.bv_len);
+		if (rem) {
+			curr_addr = (unsigned long) from;
+			bytes = curr_addr - s_addr - rem;
+			return bytes;
+		}
 		})
 	)
 
@@ -768,7 +818,9 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 		copyin((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
 		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
-		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
+		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
+		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 
 	return bytes;
@@ -794,7 +846,9 @@ bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
 		0;}),
 		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
-		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
+		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
+		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 
 	iov_iter_advance(i, bytes);
@@ -814,7 +868,9 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
 					 v.iov_base, v.iov_len),
 		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
-		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
+		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
+		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 
 	return bytes;
@@ -849,7 +905,9 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
 		memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
 		memcpy_flushcache((to += v.iov_len) - v.iov_len, v.iov_base,
-			v.iov_len)
+			v.iov_len),
+		memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 
 	return bytes;
@@ -873,7 +931,9 @@ bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
 		0;}),
 		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
-		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
+		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
+		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 
 	iov_iter_advance(i, bytes);
@@ -910,7 +970,7 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
 {
 	if (unlikely(!page_copy_sane(page, offset, bytes)))
 		return 0;
-	if (i->type & (ITER_BVEC|ITER_KVEC)) {
+	if (i->type & (ITER_BVEC | ITER_KVEC | ITER_MAPPING)) {
 		void *kaddr = kmap_atomic(page);
 		size_t wanted = copy_to_iter(kaddr + offset, bytes, i);
 		kunmap_atomic(kaddr);
@@ -933,7 +993,7 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
 		WARN_ON(1);
 		return 0;
 	}
-	if (i->type & (ITER_BVEC|ITER_KVEC)) {
+	if (i->type & (ITER_BVEC | ITER_KVEC | ITER_MAPPING)) {
 		void *kaddr = kmap_atomic(page);
 		size_t wanted = _copy_from_iter(kaddr + offset, bytes, i);
 		kunmap_atomic(kaddr);
@@ -977,7 +1037,8 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
 	iterate_and_advance(i, bytes, v,
 		clear_user(v.iov_base, v.iov_len),
 		memzero_page(v.bv_page, v.bv_offset, v.bv_len),
-		memset(v.iov_base, 0, v.iov_len)
+		memset(v.iov_base, 0, v.iov_len),
+		memzero_page(v.bv_page, v.bv_offset, v.bv_len)
 	)
 
 	return bytes;
@@ -1001,7 +1062,9 @@ size_t iov_iter_copy_from_user_atomic(struct page *page,
 		copyin((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
 		memcpy_from_page((p += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
-		memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
+		memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
+		memcpy_from_page((p += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 	kunmap_atomic(kaddr);
 	return bytes;
@@ -1072,7 +1135,13 @@ void iov_iter_advance(struct iov_iter *i, size_t size)
 		i->count -= size;
 		return;
 	}
-	iterate_and_advance(i, size, v, 0, 0, 0)
+	if (unlikely(iov_iter_is_mapping(i))) {
+		/* We really don't want to fetch pages if we can avoid it */
+		i->iov_offset += size;
+		i->count -= size;
+		return;
+	}
+	iterate_and_advance(i, size, v, 0, 0, 0, 0)
 }
 EXPORT_SYMBOL(iov_iter_advance);
 
@@ -1116,7 +1185,12 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll)
 		return;
 	}
 	unroll -= i->iov_offset;
-	if (iov_iter_is_bvec(i)) {
+	if (iov_iter_is_mapping(i)) {
+		BUG(); /* We should never go beyond the start of the specified
+			* range since we might then be straying into pages that
+			* aren't pinned.
+			*/
+	} else if (iov_iter_is_bvec(i)) {
 		const struct bio_vec *bvec = i->bvec;
 		while (1) {
 			size_t n = (--bvec)->bv_len;
@@ -1153,9 +1227,9 @@ size_t iov_iter_single_seg_count(const struct iov_iter *i)
 		return i->count;	// it is a silly place, anyway
 	if (i->nr_segs == 1)
 		return i->count;
-	if (unlikely(iov_iter_is_discard(i)))
+	if (unlikely(iov_iter_is_discard(i) || iov_iter_is_mapping(i)))
 		return i->count;
-	else if (iov_iter_is_bvec(i))
+	if (iov_iter_is_bvec(i))
 		return min(i->count, i->bvec->bv_len - i->iov_offset);
 	else
 		return min(i->count, i->iov->iov_len - i->iov_offset);
@@ -1203,6 +1277,32 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction,
 }
 EXPORT_SYMBOL(iov_iter_pipe);
 
+/**
+ * iov_iter_mapping - Initialise an I/O iterator to use the pages in a mapping
+ * @i: The iterator to initialise.
+ * @direction: The direction of the transfer.
+ * @mapping: The mapping to access.
+ * @start: The start file position.
+ * @count: The size of the I/O buffer in bytes.
+ *
+ * Set up an I/O iterator to either draw data out of the pages attached to an
+ * inode or to inject data into those pages.  The pages *must* be prevented
+ * from evaporation, either by taking a ref on them or locking them by the
+ * caller.
+ */
+void iov_iter_mapping(struct iov_iter *i, unsigned int direction,
+		      struct address_space *mapping,
+		      loff_t start, size_t count)
+{
+	BUG_ON(direction & ~1);
+	i->type = ITER_MAPPING | (direction & (READ | WRITE));
+	i->mapping = mapping;
+	i->mapping_start = start;
+	i->count = count;
+	i->iov_offset = 0;
+}
+EXPORT_SYMBOL(iov_iter_mapping);
+
 /**
  * iov_iter_discard - Initialise an I/O iterator that discards data
  * @i: The iterator to initialise.
@@ -1236,7 +1336,8 @@ unsigned long iov_iter_alignment(const struct iov_iter *i)
 	iterate_all_kinds(i, size, v,
 		(res |= (unsigned long)v.iov_base | v.iov_len, 0),
 		res |= v.bv_offset | v.bv_len,
-		res |= (unsigned long)v.iov_base | v.iov_len
+		res |= (unsigned long)v.iov_base | v.iov_len,
+		res |= v.bv_offset | v.bv_len
 	)
 	return res;
 }
@@ -1258,7 +1359,9 @@ unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
 		(res |= (!res ? 0 : (unsigned long)v.bv_offset) |
 			(size != v.bv_len ? size : 0)),
 		(res |= (!res ? 0 : (unsigned long)v.iov_base) |
-			(size != v.iov_len ? size : 0))
+			(size != v.iov_len ? size : 0)),
+		(res |= (!res ? 0 : (unsigned long)v.bv_offset) |
+			(size != v.bv_len ? size : 0))
 		);
 	return res;
 }
@@ -1308,6 +1411,48 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
 	return __pipe_get_pages(i, min(maxsize, capacity), pages, iter_head, start);
 }
 
+static ssize_t iter_mapping_get_pages(struct iov_iter *i,
+				      struct page **pages, size_t maxsize,
+				      unsigned maxpages, size_t *_start_offset)
+{
+	unsigned nr, offset;
+	pgoff_t index, count;
+	size_t size = maxsize, actual;
+	loff_t pos;
+
+	if (!size || !maxpages)
+		return 0;
+
+	pos = i->mapping_start + i->iov_offset;
+	index = pos >> PAGE_SHIFT;
+	offset = pos & ~PAGE_MASK;
+	*_start_offset = offset;
+
+	count = 1;
+	if (size > PAGE_SIZE - offset) {
+		size -= PAGE_SIZE - offset;
+		count += size >> PAGE_SHIFT;
+		size &= ~PAGE_MASK;
+		if (size)
+			count++;
+	}
+
+	if (count > maxpages)
+		count = maxpages;
+
+	nr = find_get_pages_contig(i->mapping, index, count, pages);
+	if (nr == 0)
+		return 0;
+
+	actual = PAGE_SIZE * nr;
+	actual -= offset;
+	if (nr == count && size > 0) {
+		unsigned last_offset = (nr > 1) ? 0 : offset;
+		actual -= PAGE_SIZE - (last_offset + size);
+	}
+	return actual;
+}
+
 ssize_t iov_iter_get_pages(struct iov_iter *i,
 		   struct page **pages, size_t maxsize, unsigned maxpages,
 		   size_t *start)
@@ -1317,6 +1462,8 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 
 	if (unlikely(iov_iter_is_pipe(i)))
 		return pipe_get_pages(i, pages, maxsize, maxpages, start);
+	if (unlikely(iov_iter_is_mapping(i)))
+		return iter_mapping_get_pages(i, pages, maxsize, maxpages, start);
 	if (unlikely(iov_iter_is_discard(i)))
 		return -EFAULT;
 
@@ -1343,7 +1490,8 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 		return v.bv_len;
 	}),({
 		return -EFAULT;
-	})
+	}),
+	0
 	)
 	return 0;
 }
@@ -1387,6 +1535,51 @@ static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
 	return n;
 }
 
+static ssize_t iter_mapping_get_pages_alloc(struct iov_iter *i,
+					    struct page ***pages, size_t maxsize,
+					    size_t *_start_offset)
+{
+	struct page **p;
+	unsigned nr, offset;
+	pgoff_t index, count;
+	size_t size = maxsize, actual;
+	loff_t pos;
+
+	if (!size)
+		return 0;
+
+	pos = i->mapping_start + i->iov_offset;
+	index = pos >> PAGE_SHIFT;
+	offset = pos & ~PAGE_MASK;
+	*_start_offset = offset;
+
+	count = 1;
+	if (size > PAGE_SIZE - offset) {
+		size -= PAGE_SIZE - offset;
+		count += size >> PAGE_SHIFT;
+		size &= ~PAGE_MASK;
+		if (size)
+			count++;
+	}
+
+	p = get_pages_array(count);
+	if (!p)
+		return -ENOMEM;
+	*pages = p;
+
+	nr = find_get_pages_contig(i->mapping, index, count, p);
+	if (nr == 0)
+		return 0;
+
+	actual = PAGE_SIZE * nr;
+	actual -= offset;
+	if (nr == count && size > 0) {
+		unsigned last_offset = (nr > 1) ? 0 : offset;
+		actual -= PAGE_SIZE - (last_offset + size);
+	}
+	return actual;
+}
+
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
 		   size_t *start)
@@ -1398,6 +1591,8 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 
 	if (unlikely(iov_iter_is_pipe(i)))
 		return pipe_get_pages_alloc(i, pages, maxsize, start);
+	if (unlikely(iov_iter_is_mapping(i)))
+		return iter_mapping_get_pages_alloc(i, pages, maxsize, start);
 	if (unlikely(iov_iter_is_discard(i)))
 		return -EFAULT;
 
@@ -1430,7 +1625,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		return v.bv_len;
 	}),({
 		return -EFAULT;
-	})
+	}), 0
 	)
 	return 0;
 }
@@ -1469,6 +1664,14 @@ size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
 				      v.iov_base, v.iov_len,
 				      sum, off);
 		off += v.iov_len;
+	}), ({
+		char *p = kmap_atomic(v.bv_page);
+		next = csum_partial_copy_nocheck(p + v.bv_offset,
+						 (to += v.bv_len) - v.bv_len,
+						 v.bv_len, 0);
+		kunmap_atomic(p);
+		sum = csum_block_add(sum, next, off);
+		off += v.bv_len;
 	})
 	)
 	*csum = sum;
@@ -1511,6 +1714,14 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum,
 				      v.iov_base, v.iov_len,
 				      sum, off);
 		off += v.iov_len;
+	}), ({
+		char *p = kmap_atomic(v.bv_page);
+		next = csum_partial_copy_nocheck(p + v.bv_offset,
+						 (to += v.bv_len) - v.bv_len,
+						 v.bv_len, 0);
+		kunmap_atomic(p);
+		sum = csum_block_add(sum, next, off);
+		off += v.bv_len;
 	})
 	)
 	*csum = sum;
@@ -1557,6 +1768,14 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump,
 				     (from += v.iov_len) - v.iov_len,
 				     v.iov_len, sum, off);
 		off += v.iov_len;
+	}), ({
+		char *p = kmap_atomic(v.bv_page);
+		next = csum_partial_copy_nocheck((from += v.bv_len) - v.bv_len,
+						 p + v.bv_offset,
+						 v.bv_len, 0);
+		kunmap_atomic(p);
+		sum = csum_block_add(sum, next, off);
+		off += v.bv_len;
 	})
 	)
 	*csum = sum;
@@ -1606,6 +1825,21 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages)
 		npages = pipe_space_for_user(iter_head, pipe->tail, pipe);
 		if (npages >= maxpages)
 			return maxpages;
+	} else if (unlikely(iov_iter_is_mapping(i))) {
+		unsigned offset;
+
+		offset = (i->mapping_start + i->iov_offset) & ~PAGE_MASK;
+
+		npages = 1;
+		if (size > PAGE_SIZE - offset) {
+			size -= PAGE_SIZE - offset;
+			npages += size >> PAGE_SHIFT;
+			size &= ~PAGE_MASK;
+			if (size)
+				npages++;
+		}
+		if (npages >= maxpages)
+			return maxpages;
 	} else iterate_all_kinds(i, size, v, ({
 		unsigned long p = (unsigned long)v.iov_base;
 		npages += DIV_ROUND_UP(p + v.iov_len, PAGE_SIZE)
@@ -1622,7 +1856,8 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages)
 			- p / PAGE_SIZE;
 		if (npages >= maxpages)
 			return maxpages;
-	})
+	}),
+	0
 	)
 	return npages;
 }
@@ -1635,7 +1870,7 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
 		WARN_ON(1);
 		return NULL;
 	}
-	if (unlikely(iov_iter_is_discard(new)))
+	if (unlikely(iov_iter_is_discard(new) || iov_iter_is_mapping(new)))
 		return NULL;
 	if (iov_iter_is_bvec(new))
 		return new->bvec = kmemdup(new->bvec,
@@ -1747,7 +1982,12 @@ int iov_iter_for_each_range(struct iov_iter *i, size_t bytes,
 		kunmap(v.bv_page);
 		err;}), ({
 		w = v;
-		err = f(&w, context);})
+		err = f(&w, context);}), ({
+		w.iov_base = kmap(v.bv_page) + v.bv_offset;
+		w.iov_len = v.bv_len;
+		err = f(&w, context);
+		kunmap(v.bv_page);
+		err;})
 	)
 	return err;
 }

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 02/32] vm: Add wait/unlock functions for PG_fscache
  2020-07-13 16:30 ` David Howells
@ 2020-07-13 16:31   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:31 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Add functions to unlock and wait for unlock of PG_fscache analogously with
those for PG_lock.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/linux/pagemap.h |   14 ++++++++++++++
 mm/filemap.c            |   18 ++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index cf2468da68e9..0b917990dc1e 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -501,6 +501,7 @@ extern int __lock_page_killable(struct page *page);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
 extern void unlock_page(struct page *page);
+extern void unlock_page_fscache(struct page *page);
 
 /*
  * Return true if the page was successfully locked
@@ -575,6 +576,19 @@ static inline int wait_on_page_locked_killable(struct page *page)
 	return wait_on_page_bit_killable(compound_head(page), PG_locked);
 }
 
+/**
+ * wait_on_page_fscache - Wait for PG_fscache to be cleared on a page
+ * @page: The page
+ *
+ * Wait for the fscache mark to be removed from a page, usually signifying the
+ * completion of a write from that page to the cache.
+ */
+static inline void wait_on_page_fscache(struct page *page)
+{
+	if (PagePrivate2(page))
+		wait_on_page_bit(compound_head(page), PG_fscache);
+}
+
 extern void put_and_wait_on_page_locked(struct page *page);
 
 void wait_on_page_writeback(struct page *page);
diff --git a/mm/filemap.c b/mm/filemap.c
index f0ae9a6308cb..4894e9705d34 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1293,6 +1293,24 @@ void unlock_page(struct page *page)
 }
 EXPORT_SYMBOL(unlock_page);
 
+/**
+ * unlock_page_fscache - Unlock a page pinned with PG_fscache
+ * @page: The page
+ *
+ * Unlocks the page and wakes up sleepers in wait_on_page_fscache().  Also
+ * wakes those waiting for the lock and writeback bits because the wakeup
+ * mechanism is shared.  But that's OK - those sleepers will just go back to
+ * sleep.
+ */
+void unlock_page_fscache(struct page *page)
+{
+	page = compound_head(page);
+	VM_BUG_ON_PAGE(!PagePrivate2(page), page);
+	clear_bit_unlock(PG_fscache, &page->flags);
+	wake_up_page_bit(page, PG_fscache);
+}
+EXPORT_SYMBOL(unlock_page_fscache);
+
 /**
  * end_page_writeback - end writeback against a page
  * @page: the page



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 02/32] vm: Add wait/unlock functions for PG_fscache
@ 2020-07-13 16:31   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:31 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs

Add functions to unlock and wait for unlock of PG_fscache analogously with
those for PG_lock.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/linux/pagemap.h |   14 ++++++++++++++
 mm/filemap.c            |   18 ++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index cf2468da68e9..0b917990dc1e 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -501,6 +501,7 @@ extern int __lock_page_killable(struct page *page);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
 extern void unlock_page(struct page *page);
+extern void unlock_page_fscache(struct page *page);
 
 /*
  * Return true if the page was successfully locked
@@ -575,6 +576,19 @@ static inline int wait_on_page_locked_killable(struct page *page)
 	return wait_on_page_bit_killable(compound_head(page), PG_locked);
 }
 
+/**
+ * wait_on_page_fscache - Wait for PG_fscache to be cleared on a page
+ * @page: The page
+ *
+ * Wait for the fscache mark to be removed from a page, usually signifying the
+ * completion of a write from that page to the cache.
+ */
+static inline void wait_on_page_fscache(struct page *page)
+{
+	if (PagePrivate2(page))
+		wait_on_page_bit(compound_head(page), PG_fscache);
+}
+
 extern void put_and_wait_on_page_locked(struct page *page);
 
 void wait_on_page_writeback(struct page *page);
diff --git a/mm/filemap.c b/mm/filemap.c
index f0ae9a6308cb..4894e9705d34 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1293,6 +1293,24 @@ void unlock_page(struct page *page)
 }
 EXPORT_SYMBOL(unlock_page);
 
+/**
+ * unlock_page_fscache - Unlock a page pinned with PG_fscache
+ * @page: The page
+ *
+ * Unlocks the page and wakes up sleepers in wait_on_page_fscache().  Also
+ * wakes those waiting for the lock and writeback bits because the wakeup
+ * mechanism is shared.  But that's OK - those sleepers will just go back to
+ * sleep.
+ */
+void unlock_page_fscache(struct page *page)
+{
+	page = compound_head(page);
+	VM_BUG_ON_PAGE(!PagePrivate2(page), page);
+	clear_bit_unlock(PG_fscache, &page->flags);
+	wake_up_page_bit(page, PG_fscache);
+}
+EXPORT_SYMBOL(unlock_page_fscache);
+
 /**
  * end_page_writeback - end writeback against a page
  * @page: the page

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 03/32] vfs: Export rw_verify_area() for use by cachefiles
  2020-07-13 16:30 ` David Howells
                   ` (2 preceding siblings ...)
  (?)
@ 2020-07-13 16:31 ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:31 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Export rw_verify_area() for so that cachefiles can use it before issuing
call_read_iter() and call_write_iter() to effect async DIO operations
against the cache.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/internal.h      |    5 -----
 fs/read_write.c    |    1 +
 include/linux/fs.h |    1 +
 3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 9b863a7bd708..8213a29a972f 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -156,11 +156,6 @@ extern char *simple_dname(struct dentry *, char *, int);
 extern void dput_to_list(struct dentry *, struct list_head *);
 extern void shrink_dentry_list(struct list_head *);
 
-/*
- * read_write.c
- */
-extern int rw_verify_area(int, struct file *, const loff_t *, size_t);
-
 /*
  * pipe.c
  */
diff --git a/fs/read_write.c b/fs/read_write.c
index bbfa9b12b15e..eb18270a1e14 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -400,6 +400,7 @@ int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t
 	return security_file_permission(file,
 				read_write == READ ? MAY_READ : MAY_WRITE);
 }
+EXPORT_SYMBOL(rw_verify_area);
 
 static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3f881a892ea7..aa3e3af92220 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2905,6 +2905,7 @@ extern int notify_change(struct dentry *, struct iattr *, struct inode **);
 extern int inode_permission(struct inode *, int);
 extern int generic_permission(struct inode *, int);
 extern int __check_sticky(struct inode *dir, struct inode *inode);
+extern int rw_verify_area(int, struct file *, const loff_t *, size_t);
 
 static inline bool execute_ok(struct inode *inode)
 {



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 04/32] vfs: Provide S_CACHE_FILE inode flag
  2020-07-13 16:30 ` David Howells
@ 2020-07-13 16:31   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:31 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Provide an S_CACHE_FILE inode flag that cachefiles can set to ward off
other kernel services and drivers (including itself) from using its cache
files.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/linux/fs.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index aa3e3af92220..33d30742b26e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2003,6 +2003,7 @@ struct super_operations {
 #define S_ENCRYPTED	16384	/* Encrypted file (using fs/crypto/) */
 #define S_CASEFOLD	32768	/* Casefolded file */
 #define S_VERITY	65536	/* Verity file (using fs/verity/) */
+#define S_CACHE_FILE	0x20000	/* File is in use as cache file (eg. fs/cachefiles) */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 04/32] vfs: Provide S_CACHE_FILE inode flag
@ 2020-07-13 16:31   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:31 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs

Provide an S_CACHE_FILE inode flag that cachefiles can set to ward off
other kernel services and drivers (including itself) from using its cache
files.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/linux/fs.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index aa3e3af92220..33d30742b26e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2003,6 +2003,7 @@ struct super_operations {
 #define S_ENCRYPTED	16384	/* Encrypted file (using fs/crypto/) */
 #define S_CASEFOLD	32768	/* Casefolded file */
 #define S_VERITY	65536	/* Verity file (using fs/verity/) */
+#define S_CACHE_FILE	0x20000	/* File is in use as cache file (eg. fs/cachefiles) */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 05/32] mm: Provide lru_to_last_page() to get last of a page list
  2020-07-13 16:30 ` David Howells
                   ` (4 preceding siblings ...)
  (?)
@ 2020-07-13 16:31 ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:31 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Provide a macro, lru_to_last_page(), to find the last page in a page list
(the opposite of lru_to_page()).

Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/linux/mm.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index dc7b87310c10..9692b9b58b06 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -216,6 +216,7 @@ int overcommit_kbytes_handler(struct ctl_table *, int, void *, size_t *,
 #define PAGE_ALIGNED(addr)	IS_ALIGNED((unsigned long)(addr), PAGE_SIZE)
 
 #define lru_to_page(head) (list_entry((head)->prev, struct page, lru))
+#define lru_to_last_page(head) (list_entry((head)->next, struct page, lru))
 
 /*
  * Linux kernel virtual memory manager primitives.



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 06/32] cachefiles: Remove tree of active files and use S_CACHE_FILE inode flag
  2020-07-13 16:30 ` David Howells
@ 2020-07-13 16:31   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:31 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Remove the tree of active dentries from the cachefiles_cache struct and
instead set a flag, S_CACHE_FILE, on the backing inode to indicate that
this file is in use by the kernel so as to ward off other kernel users.

This simplifies the code a lot and also prevents two overlain caches from
fighting with each other.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/daemon.c            |    4 
 fs/cachefiles/interface.c         |   20 --
 fs/cachefiles/internal.h          |   10 -
 fs/cachefiles/namei.c             |  375 +++++++------------------------------
 include/trace/events/cachefiles.h |   29 ---
 5 files changed, 78 insertions(+), 360 deletions(-)

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 752c1e43416f..8a937d6d5e22 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -102,8 +102,6 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)
 	}
 
 	mutex_init(&cache->daemon_mutex);
-	cache->active_nodes = RB_ROOT;
-	rwlock_init(&cache->active_lock);
 	init_waitqueue_head(&cache->daemon_pollwq);
 
 	/* set default caching limits
@@ -138,8 +136,6 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)
 
 	cachefiles_daemon_unbind(cache);
 
-	ASSERT(!cache->active_nodes.rb_node);
-
 	/* clean up the control file interface */
 	cache->cachefilesd = NULL;
 	file->private_data = NULL;
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 99f42d216ef7..b868afb970ad 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -36,7 +36,6 @@ static struct fscache_object *cachefiles_alloc_object(
 
 	ASSERTCMP(object->backer, ==, NULL);
 
-	BUG_ON(test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags));
 	atomic_set(&object->usage, 1);
 
 	fscache_object_init(&object->fscache, cookie, &cache->cache);
@@ -74,7 +73,6 @@ static struct fscache_object *cachefiles_alloc_object(
 nomem_key:
 	kfree(buffer);
 nomem_buffer:
-	BUG_ON(test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags));
 	kmem_cache_free(cachefiles_object_jar, object);
 	fscache_object_destroyed(&cache->cache);
 nomem_object:
@@ -190,8 +188,6 @@ static void cachefiles_drop_object(struct fscache_object *_object)
 	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
 	const struct cred *saved_cred;
-	struct inode *inode;
-	blkcnt_t i_blocks = 0;
 
 	ASSERT(_object);
 
@@ -218,10 +214,6 @@ static void cachefiles_drop_object(struct fscache_object *_object)
 		    _object != cache->cache.fsdef
 		    ) {
 			_debug("- retire object OBJ%x", object->fscache.debug_id);
-			inode = d_backing_inode(object->dentry);
-			if (inode)
-				i_blocks = inode->i_blocks;
-
 			cachefiles_begin_secure(cache, &saved_cred);
 			cachefiles_delete_object(cache, object);
 			cachefiles_end_secure(cache, saved_cred);
@@ -231,14 +223,11 @@ static void cachefiles_drop_object(struct fscache_object *_object)
 		if (object->backer != object->dentry)
 			dput(object->backer);
 		object->backer = NULL;
-	}
 
-	/* note that the object is now inactive */
-	if (test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags))
-		cachefiles_mark_object_inactive(cache, object, i_blocks);
-
-	dput(object->dentry);
-	object->dentry = NULL;
+		cachefiles_unmark_inode_in_use(object, object->dentry);
+		dput(object->dentry);
+		object->dentry = NULL;
+	}
 
 	_leave("");
 }
@@ -274,7 +263,6 @@ static void cachefiles_put_object(struct fscache_object *_object,
 	if (u == 0) {
 		_debug("- kill object OBJ%x", object->fscache.debug_id);
 
-		ASSERT(!test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags));
 		ASSERTCMP(object->fscache.parent, ==, NULL);
 		ASSERTCMP(object->backer, ==, NULL);
 		ASSERTCMP(object->dentry, ==, NULL);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index a5d48f271ce1..f8f308ce7385 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -38,12 +38,9 @@ struct cachefiles_object {
 	struct dentry			*dentry;	/* the file/dir representing this object */
 	struct dentry			*backer;	/* backing file */
 	loff_t				i_size;		/* object size */
-	unsigned long			flags;
-#define CACHEFILES_OBJECT_ACTIVE	0		/* T if marked active */
 	atomic_t			usage;		/* object usage count */
 	uint8_t				type;		/* object type */
 	uint8_t				new;		/* T if object new */
-	struct rb_node			active_node;	/* link in active tree (dentry is key) */
 };
 
 extern struct kmem_cache *cachefiles_object_jar;
@@ -59,8 +56,6 @@ struct cachefiles_cache {
 	const struct cred		*cache_cred;	/* security override for accessing cache */
 	struct mutex			daemon_mutex;	/* command serialisation mutex */
 	wait_queue_head_t		daemon_pollwq;	/* poll waitqueue for daemon */
-	struct rb_root			active_nodes;	/* active nodes (can't be culled) */
-	rwlock_t			active_lock;	/* lock for active_nodes */
 	atomic_t			gravecounter;	/* graveyard uniquifier */
 	atomic_t			f_released;	/* number of objects released lately */
 	atomic_long_t			b_released;	/* number of blocks released lately */
@@ -126,9 +121,8 @@ extern char *cachefiles_cook_key(const u8 *raw, int keylen, uint8_t type);
 /*
  * namei.c
  */
-extern void cachefiles_mark_object_inactive(struct cachefiles_cache *cache,
-					    struct cachefiles_object *object,
-					    blkcnt_t i_blocks);
+extern void cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
+				    struct dentry *dentry);
 extern int cachefiles_delete_object(struct cachefiles_cache *cache,
 				    struct cachefiles_object *object);
 extern int cachefiles_walk_to_object(struct cachefiles_object *parent,
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 924042e8cced..818d1bca1904 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -21,251 +21,51 @@
 #define CACHEFILES_KEYBUF_SIZE 512
 
 /*
- * dump debugging info about an object
+ * Mark the backing file as being a cache file if it's not already in use so.
  */
-static noinline
-void __cachefiles_printk_object(struct cachefiles_object *object,
-				const char *prefix)
+static bool cachefiles_mark_inode_in_use(struct cachefiles_object *object,
+					 struct dentry *dentry)
 {
-	struct fscache_cookie *cookie;
-	const u8 *k;
-	unsigned loop;
-
-	pr_err("%sobject: OBJ%x\n", prefix, object->fscache.debug_id);
-	pr_err("%sobjstate=%s fl=%lx wbusy=%x ev=%lx[%lx]\n",
-	       prefix, object->fscache.state->name,
-	       object->fscache.flags, work_busy(&object->fscache.work),
-	       object->fscache.events, object->fscache.event_mask);
-	pr_err("%sops=%u\n",
-	       prefix, object->fscache.n_ops);
-	pr_err("%sparent=%p\n",
-	       prefix, object->fscache.parent);
-
-	spin_lock(&object->fscache.lock);
-	cookie = object->fscache.cookie;
-	if (cookie) {
-		pr_err("%scookie=%p [pr=%p fl=%lx]\n",
-		       prefix,
-		       object->fscache.cookie,
-		       object->fscache.cookie->parent,
-		       object->fscache.cookie->flags);
-		pr_err("%skey=[%u] '", prefix, cookie->key_len);
-		k = (cookie->key_len <= sizeof(cookie->inline_key)) ?
-			cookie->inline_key : cookie->key;
-		for (loop = 0; loop < cookie->key_len; loop++)
-			pr_cont("%02x", k[loop]);
-		pr_cont("'\n");
-	} else {
-		pr_err("%scookie=NULL\n", prefix);
-	}
-	spin_unlock(&object->fscache.lock);
-}
-
-/*
- * dump debugging info about a pair of objects
- */
-static noinline void cachefiles_printk_object(struct cachefiles_object *object,
-					      struct cachefiles_object *xobject)
-{
-	if (object)
-		__cachefiles_printk_object(object, "");
-	if (xobject)
-		__cachefiles_printk_object(xobject, "x");
-}
-
-/*
- * mark the owner of a dentry, if there is one, to indicate that that dentry
- * has been preemptively deleted
- * - the caller must hold the i_mutex on the dentry's parent as required to
- *   call vfs_unlink(), vfs_rmdir() or vfs_rename()
- */
-static void cachefiles_mark_object_buried(struct cachefiles_cache *cache,
-					  struct dentry *dentry,
-					  enum fscache_why_object_killed why)
-{
-	struct cachefiles_object *object;
-	struct rb_node *p;
-
-	_enter(",'%pd'", dentry);
-
-	write_lock(&cache->active_lock);
-
-	p = cache->active_nodes.rb_node;
-	while (p) {
-		object = rb_entry(p, struct cachefiles_object, active_node);
-		if (object->dentry > dentry)
-			p = p->rb_left;
-		else if (object->dentry < dentry)
-			p = p->rb_right;
-		else
-			goto found_dentry;
-	}
-
-	write_unlock(&cache->active_lock);
-	trace_cachefiles_mark_buried(NULL, dentry, why);
-	_leave(" [no owner]");
-	return;
+	struct inode *inode = d_backing_inode(dentry);
+	bool can_use = false;
 
-	/* found the dentry for  */
-found_dentry:
-	kdebug("preemptive burial: OBJ%x [%s] %p",
-	       object->fscache.debug_id,
-	       object->fscache.state->name,
-	       dentry);
+	_enter(",%p", object);
 
-	trace_cachefiles_mark_buried(object, dentry, why);
+	inode_lock(inode);
 
-	if (fscache_object_is_live(&object->fscache)) {
-		pr_err("\n");
-		pr_err("Error: Can't preemptively bury live object\n");
-		cachefiles_printk_object(object, NULL);
+	if (!(inode->i_flags & S_CACHE_FILE)) {
+		inode->i_flags |= S_CACHE_FILE;
+		trace_cachefiles_mark_active(object, dentry);
+		can_use = true;
 	} else {
-		if (why != FSCACHE_OBJECT_IS_STALE)
-			fscache_object_mark_killed(&object->fscache, why);
+		pr_notice("cachefiles: Inode already in use: %pd\n", dentry);
 	}
 
-	write_unlock(&cache->active_lock);
-	_leave(" [owner marked]");
+	inode_unlock(inode);
+	return can_use;
 }
 
 /*
- * record the fact that an object is now active
+ * Unmark a backing inode.
  */
-static int cachefiles_mark_object_active(struct cachefiles_cache *cache,
-					 struct cachefiles_object *object)
+void cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
+				    struct dentry *dentry)
 {
-	struct cachefiles_object *xobject;
-	struct rb_node **_p, *_parent = NULL;
-	struct dentry *dentry;
-
-	_enter(",%p", object);
-
-try_again:
-	write_lock(&cache->active_lock);
-
-	dentry = object->dentry;
-	trace_cachefiles_mark_active(object, dentry);
-
-	if (test_and_set_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags)) {
-		pr_err("Error: Object already active\n");
-		cachefiles_printk_object(object, NULL);
-		BUG();
-	}
-
-	_p = &cache->active_nodes.rb_node;
-	while (*_p) {
-		_parent = *_p;
-		xobject = rb_entry(_parent,
-				   struct cachefiles_object, active_node);
-
-		ASSERT(xobject != object);
-
-		if (xobject->dentry > dentry)
-			_p = &(*_p)->rb_left;
-		else if (xobject->dentry < dentry)
-			_p = &(*_p)->rb_right;
-		else
-			goto wait_for_old_object;
-	}
-
-	rb_link_node(&object->active_node, _parent, _p);
-	rb_insert_color(&object->active_node, &cache->active_nodes);
-
-	write_unlock(&cache->active_lock);
-	_leave(" = 0");
-	return 0;
-
-	/* an old object from a previous incarnation is hogging the slot - we
-	 * need to wait for it to be destroyed */
-wait_for_old_object:
-	trace_cachefiles_wait_active(object, dentry, xobject);
-	clear_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags);
-
-	if (fscache_object_is_live(&xobject->fscache)) {
-		pr_err("\n");
-		pr_err("Error: Unexpected object collision\n");
-		cachefiles_printk_object(object, xobject);
-	}
-	atomic_inc(&xobject->usage);
-	write_unlock(&cache->active_lock);
-
-	if (test_bit(CACHEFILES_OBJECT_ACTIVE, &xobject->flags)) {
-		wait_queue_head_t *wq;
-
-		signed long timeout = 60 * HZ;
-		wait_queue_entry_t wait;
-		bool requeue;
-
-		/* if the object we're waiting for is queued for processing,
-		 * then just put ourselves on the queue behind it */
-		if (work_pending(&xobject->fscache.work)) {
-			_debug("queue OBJ%x behind OBJ%x immediately",
-			       object->fscache.debug_id,
-			       xobject->fscache.debug_id);
-			goto requeue;
-		}
-
-		/* otherwise we sleep until either the object we're waiting for
-		 * is done, or the fscache_object is congested */
-		wq = bit_waitqueue(&xobject->flags, CACHEFILES_OBJECT_ACTIVE);
-		init_wait(&wait);
-		requeue = false;
-		do {
-			prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
-			if (!test_bit(CACHEFILES_OBJECT_ACTIVE, &xobject->flags))
-				break;
-
-			requeue = fscache_object_sleep_till_congested(&timeout);
-		} while (timeout > 0 && !requeue);
-		finish_wait(wq, &wait);
-
-		if (requeue &&
-		    test_bit(CACHEFILES_OBJECT_ACTIVE, &xobject->flags)) {
-			_debug("queue OBJ%x behind OBJ%x after wait",
-			       object->fscache.debug_id,
-			       xobject->fscache.debug_id);
-			goto requeue;
-		}
-
-		if (timeout <= 0) {
-			pr_err("\n");
-			pr_err("Error: Overlong wait for old active object to go away\n");
-			cachefiles_printk_object(object, xobject);
-			goto requeue;
-		}
-	}
-
-	ASSERT(!test_bit(CACHEFILES_OBJECT_ACTIVE, &xobject->flags));
-
-	cache->cache.ops->put_object(&xobject->fscache,
-		(enum fscache_obj_ref_trace)cachefiles_obj_put_wait_retry);
-	goto try_again;
+	struct inode *inode = d_backing_inode(dentry);
 
-requeue:
-	cache->cache.ops->put_object(&xobject->fscache,
-		(enum fscache_obj_ref_trace)cachefiles_obj_put_wait_timeo);
-	_leave(" = -ETIMEDOUT");
-	return -ETIMEDOUT;
+	inode_lock(inode);
+	inode->i_flags &= ~S_CACHE_FILE;
+	inode_unlock(inode);
+	trace_cachefiles_mark_inactive(object, dentry, inode);
 }
 
 /*
  * Mark an object as being inactive.
  */
-void cachefiles_mark_object_inactive(struct cachefiles_cache *cache,
-				     struct cachefiles_object *object,
-				     blkcnt_t i_blocks)
+static void cachefiles_mark_object_inactive(struct cachefiles_cache *cache,
+					    struct cachefiles_object *object)
 {
-	struct dentry *dentry = object->dentry;
-	struct inode *inode = d_backing_inode(dentry);
-
-	trace_cachefiles_mark_inactive(object, dentry, inode);
-
-	write_lock(&cache->active_lock);
-	rb_erase(&object->active_node, &cache->active_nodes);
-	clear_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags);
-	write_unlock(&cache->active_lock);
-
-	wake_up_bit(&object->flags, CACHEFILES_OBJECT_ACTIVE);
+	blkcnt_t i_blocks = d_backing_inode(object->dentry)->i_blocks;
 
 	/* This object can now be culled, so we need to let the daemon know
 	 * that there is something it can remove if it needs to.
@@ -286,7 +86,6 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
 				  struct cachefiles_object *object,
 				  struct dentry *dir,
 				  struct dentry *rep,
-				  bool preemptive,
 				  enum fscache_why_object_killed why)
 {
 	struct dentry *grave, *trap;
@@ -310,9 +109,6 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
 		} else {
 			trace_cachefiles_unlink(object, rep, why);
 			ret = vfs_unlink(d_inode(dir), rep, NULL);
-
-			if (preemptive)
-				cachefiles_mark_object_buried(cache, rep, why);
 		}
 
 		inode_unlock(d_inode(dir));
@@ -373,8 +169,7 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
 			return -ENOMEM;
 		}
 
-		cachefiles_io_error(cache, "Lookup error %ld",
-				    PTR_ERR(grave));
+		cachefiles_io_error(cache, "Lookup error %ld", PTR_ERR(grave));
 		return -EIO;
 	}
 
@@ -416,9 +211,6 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
 		if (ret != 0 && ret != -ENOMEM)
 			cachefiles_io_error(cache,
 					    "Rename failed with error %d", ret);
-
-		if (preemptive)
-			cachefiles_mark_object_buried(cache, rep, why);
 	}
 
 	unlock_rename(cache->graveyard, dir);
@@ -446,26 +238,18 @@ int cachefiles_delete_object(struct cachefiles_cache *cache,
 
 	inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
 
-	if (test_bit(FSCACHE_OBJECT_KILLED_BY_CACHE, &object->fscache.flags)) {
-		/* object allocation for the same key preemptively deleted this
-		 * object's file so that it could create its own file */
-		_debug("object preemptively buried");
+	/* We need to check that our parent is _still_ our parent - it may have
+	 * been renamed.
+	 */
+	if (dir == object->dentry->d_parent) {
+		ret = cachefiles_bury_object(cache, object, dir, object->dentry,
+					     FSCACHE_OBJECT_WAS_RETIRED);
+	} else {
+		/* It got moved, presumably by cachefilesd culling it, so it's
+		 * no longer in the key path and we can ignore it.
+		 */
 		inode_unlock(d_inode(dir));
 		ret = 0;
-	} else {
-		/* we need to check that our parent is _still_ our parent - it
-		 * may have been renamed */
-		if (dir == object->dentry->d_parent) {
-			ret = cachefiles_bury_object(cache, object, dir,
-						     object->dentry, false,
-						     FSCACHE_OBJECT_WAS_RETIRED);
-		} else {
-			/* it got moved, presumably by cachefilesd culling it,
-			 * so it's no longer in the key path and we can ignore
-			 * it */
-			inode_unlock(d_inode(dir));
-			ret = 0;
-		}
 	}
 
 	dput(dir);
@@ -487,6 +271,7 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	struct path path;
 	unsigned long start;
 	const char *name;
+	bool marked = false;
 	int ret, nlen;
 
 	_enter("OBJ%x{%p},OBJ%x,%s,",
@@ -529,6 +314,7 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	cachefiles_hist(cachefiles_lookup_histogram, start);
 	if (IS_ERR(next)) {
 		trace_cachefiles_lookup(object, next, NULL);
+		ret = PTR_ERR(next);
 		goto lookup_error;
 	}
 
@@ -628,6 +414,13 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	/* we've found the object we were looking for */
 	object->dentry = next;
 
+	/* note that we're now using this object */
+	if (!cachefiles_mark_inode_in_use(object, object->dentry)) {
+		ret = -EBUSY;
+		goto check_error_unlock;
+	}
+	marked = true;
+
 	/* if we've found that the terminal object exists, then we need to
 	 * check its attributes and delete it if it's out of date */
 	if (!object->new) {
@@ -640,13 +433,12 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 			object->dentry = NULL;
 
 			ret = cachefiles_bury_object(cache, object, dir, next,
-						     true,
 						     FSCACHE_OBJECT_IS_STALE);
 			dput(next);
 			next = NULL;
 
 			if (ret < 0)
-				goto delete_error;
+				goto error_out2;
 
 			_debug("redo lookup");
 			fscache_object_retrying_stale(&object->fscache);
@@ -654,16 +446,10 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 		}
 	}
 
-	/* note that we're now using this object */
-	ret = cachefiles_mark_object_active(cache, object);
-
 	inode_unlock(d_inode(dir));
 	dput(dir);
 	dir = NULL;
 
-	if (ret == -ETIMEDOUT)
-		goto mark_active_timed_out;
-
 	_debug("=== OBTAINED_OBJECT ===");
 
 	if (object->new) {
@@ -712,26 +498,19 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 		cachefiles_io_error(cache, "Create/mkdir failed");
 	goto error;
 
-mark_active_timed_out:
-	_debug("mark active timed out");
-	goto release_dentry;
-
+check_error_unlock:
+	inode_unlock(d_inode(dir));
+	dput(dir);
 check_error:
-	_debug("check error %d", ret);
-	cachefiles_mark_object_inactive(
-		cache, object, d_backing_inode(object->dentry)->i_blocks);
-release_dentry:
+	if (marked)
+		cachefiles_unmark_inode_in_use(object, object->dentry);
+	cachefiles_mark_object_inactive(cache, object);
 	dput(object->dentry);
 	object->dentry = NULL;
 	goto error_out;
 
-delete_error:
-	_debug("delete error %d", ret);
-	goto error_out2;
-
 lookup_error:
-	_debug("lookup error %ld", PTR_ERR(next));
-	ret = PTR_ERR(next);
+	_debug("lookup error %d", ret);
 	if (ret == -EIO)
 		cachefiles_io_error(cache, "Lookup failed");
 	next = NULL;
@@ -861,8 +640,6 @@ static struct dentry *cachefiles_check_active(struct cachefiles_cache *cache,
 					      struct dentry *dir,
 					      char *filename)
 {
-	struct cachefiles_object *object;
-	struct rb_node *_n;
 	struct dentry *victim;
 	unsigned long start;
 	int ret;
@@ -892,34 +669,9 @@ static struct dentry *cachefiles_check_active(struct cachefiles_cache *cache,
 		return ERR_PTR(-ENOENT);
 	}
 
-	/* check to see if we're using this object */
-	read_lock(&cache->active_lock);
-
-	_n = cache->active_nodes.rb_node;
-
-	while (_n) {
-		object = rb_entry(_n, struct cachefiles_object, active_node);
-
-		if (object->dentry > victim)
-			_n = _n->rb_left;
-		else if (object->dentry < victim)
-			_n = _n->rb_right;
-		else
-			goto object_in_use;
-	}
-
-	read_unlock(&cache->active_lock);
-
 	//_leave(" = %p", victim);
 	return victim;
 
-object_in_use:
-	read_unlock(&cache->active_lock);
-	inode_unlock(d_inode(dir));
-	dput(victim);
-	//_leave(" = -EBUSY [in use]");
-	return ERR_PTR(-EBUSY);
-
 lookup_error:
 	inode_unlock(d_inode(dir));
 	ret = PTR_ERR(victim);
@@ -948,6 +700,7 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
 		    char *filename)
 {
 	struct dentry *victim;
+	struct inode *inode;
 	int ret;
 
 	_enter(",%pd/,%s", dir, filename);
@@ -956,6 +709,19 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
 	if (IS_ERR(victim))
 		return PTR_ERR(victim);
 
+	/* check to see if someone is using this object */
+	inode = d_inode(victim);
+	inode_lock(inode);
+	if (inode->i_flags & S_CACHE_FILE) {
+		ret = -EBUSY;
+	} else {
+		inode->i_flags |= S_CACHE_FILE;
+		ret = 0;
+	}
+	inode_unlock(inode);
+	if (ret < 0)
+		goto error_unlock;
+
 	_debug("victim -> %p %s",
 	       victim, d_backing_inode(victim) ? "positive" : "negative");
 
@@ -971,7 +737,7 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
 	/*  actually remove the victim (drops the dir mutex) */
 	_debug("bury");
 
-	ret = cachefiles_bury_object(cache, NULL, dir, victim, false,
+	ret = cachefiles_bury_object(cache, NULL, dir, victim,
 				     FSCACHE_OBJECT_WAS_CULLED);
 	if (ret < 0)
 		goto error;
@@ -1008,6 +774,7 @@ int cachefiles_check_in_use(struct cachefiles_cache *cache, struct dentry *dir,
 			    char *filename)
 {
 	struct dentry *victim;
+	int ret = 0;
 
 	//_enter(",%pd/,%s",
 	//       dir, filename);
@@ -1017,7 +784,9 @@ int cachefiles_check_in_use(struct cachefiles_cache *cache, struct dentry *dir,
 		return PTR_ERR(victim);
 
 	inode_unlock(d_inode(dir));
+	if (d_inode(victim)->i_flags & S_CACHE_FILE)
+		ret = -EBUSY;
 	dput(victim);
 	//_leave(" = 0");
-	return 0;
+	return ret;
 }
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 9a448fe9355d..c877035c2946 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -237,35 +237,6 @@ TRACE_EVENT(cachefiles_mark_active,
 		      __entry->obj, __entry->de)
 	    );
 
-TRACE_EVENT(cachefiles_wait_active,
-	    TP_PROTO(struct cachefiles_object *obj,
-		     struct dentry *de,
-		     struct cachefiles_object *xobj),
-
-	    TP_ARGS(obj, de, xobj),
-
-	    /* Note that obj may be NULL */
-	    TP_STRUCT__entry(
-		    __field(unsigned int,		obj		)
-		    __field(unsigned int,		xobj		)
-		    __field(struct dentry *,		de		)
-		    __field(u16,			flags		)
-		    __field(u16,			fsc_flags	)
-			     ),
-
-	    TP_fast_assign(
-		    __entry->obj	= obj->fscache.debug_id;
-		    __entry->de		= de;
-		    __entry->xobj	= xobj->fscache.debug_id;
-		    __entry->flags	= xobj->flags;
-		    __entry->fsc_flags	= xobj->fscache.flags;
-			   ),
-
-	    TP_printk("o=%08x d=%p wo=%08x wf=%x wff=%x",
-		      __entry->obj, __entry->de, __entry->xobj,
-		      __entry->flags, __entry->fsc_flags)
-	    );
-
 TRACE_EVENT(cachefiles_mark_inactive,
 	    TP_PROTO(struct cachefiles_object *obj,
 		     struct dentry *de,



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 06/32] cachefiles: Remove tree of active files and use S_CACHE_FILE inode flag
@ 2020-07-13 16:31   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:31 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs

Remove the tree of active dentries from the cachefiles_cache struct and
instead set a flag, S_CACHE_FILE, on the backing inode to indicate that
this file is in use by the kernel so as to ward off other kernel users.

This simplifies the code a lot and also prevents two overlain caches from
fighting with each other.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/daemon.c            |    4 
 fs/cachefiles/interface.c         |   20 --
 fs/cachefiles/internal.h          |   10 -
 fs/cachefiles/namei.c             |  375 +++++++------------------------------
 include/trace/events/cachefiles.h |   29 ---
 5 files changed, 78 insertions(+), 360 deletions(-)

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 752c1e43416f..8a937d6d5e22 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -102,8 +102,6 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)
 	}
 
 	mutex_init(&cache->daemon_mutex);
-	cache->active_nodes = RB_ROOT;
-	rwlock_init(&cache->active_lock);
 	init_waitqueue_head(&cache->daemon_pollwq);
 
 	/* set default caching limits
@@ -138,8 +136,6 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)
 
 	cachefiles_daemon_unbind(cache);
 
-	ASSERT(!cache->active_nodes.rb_node);
-
 	/* clean up the control file interface */
 	cache->cachefilesd = NULL;
 	file->private_data = NULL;
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 99f42d216ef7..b868afb970ad 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -36,7 +36,6 @@ static struct fscache_object *cachefiles_alloc_object(
 
 	ASSERTCMP(object->backer, ==, NULL);
 
-	BUG_ON(test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags));
 	atomic_set(&object->usage, 1);
 
 	fscache_object_init(&object->fscache, cookie, &cache->cache);
@@ -74,7 +73,6 @@ static struct fscache_object *cachefiles_alloc_object(
 nomem_key:
 	kfree(buffer);
 nomem_buffer:
-	BUG_ON(test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags));
 	kmem_cache_free(cachefiles_object_jar, object);
 	fscache_object_destroyed(&cache->cache);
 nomem_object:
@@ -190,8 +188,6 @@ static void cachefiles_drop_object(struct fscache_object *_object)
 	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
 	const struct cred *saved_cred;
-	struct inode *inode;
-	blkcnt_t i_blocks = 0;
 
 	ASSERT(_object);
 
@@ -218,10 +214,6 @@ static void cachefiles_drop_object(struct fscache_object *_object)
 		    _object != cache->cache.fsdef
 		    ) {
 			_debug("- retire object OBJ%x", object->fscache.debug_id);
-			inode = d_backing_inode(object->dentry);
-			if (inode)
-				i_blocks = inode->i_blocks;
-
 			cachefiles_begin_secure(cache, &saved_cred);
 			cachefiles_delete_object(cache, object);
 			cachefiles_end_secure(cache, saved_cred);
@@ -231,14 +223,11 @@ static void cachefiles_drop_object(struct fscache_object *_object)
 		if (object->backer != object->dentry)
 			dput(object->backer);
 		object->backer = NULL;
-	}
 
-	/* note that the object is now inactive */
-	if (test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags))
-		cachefiles_mark_object_inactive(cache, object, i_blocks);
-
-	dput(object->dentry);
-	object->dentry = NULL;
+		cachefiles_unmark_inode_in_use(object, object->dentry);
+		dput(object->dentry);
+		object->dentry = NULL;
+	}
 
 	_leave("");
 }
@@ -274,7 +263,6 @@ static void cachefiles_put_object(struct fscache_object *_object,
 	if (u == 0) {
 		_debug("- kill object OBJ%x", object->fscache.debug_id);
 
-		ASSERT(!test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags));
 		ASSERTCMP(object->fscache.parent, ==, NULL);
 		ASSERTCMP(object->backer, ==, NULL);
 		ASSERTCMP(object->dentry, ==, NULL);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index a5d48f271ce1..f8f308ce7385 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -38,12 +38,9 @@ struct cachefiles_object {
 	struct dentry			*dentry;	/* the file/dir representing this object */
 	struct dentry			*backer;	/* backing file */
 	loff_t				i_size;		/* object size */
-	unsigned long			flags;
-#define CACHEFILES_OBJECT_ACTIVE	0		/* T if marked active */
 	atomic_t			usage;		/* object usage count */
 	uint8_t				type;		/* object type */
 	uint8_t				new;		/* T if object new */
-	struct rb_node			active_node;	/* link in active tree (dentry is key) */
 };
 
 extern struct kmem_cache *cachefiles_object_jar;
@@ -59,8 +56,6 @@ struct cachefiles_cache {
 	const struct cred		*cache_cred;	/* security override for accessing cache */
 	struct mutex			daemon_mutex;	/* command serialisation mutex */
 	wait_queue_head_t		daemon_pollwq;	/* poll waitqueue for daemon */
-	struct rb_root			active_nodes;	/* active nodes (can't be culled) */
-	rwlock_t			active_lock;	/* lock for active_nodes */
 	atomic_t			gravecounter;	/* graveyard uniquifier */
 	atomic_t			f_released;	/* number of objects released lately */
 	atomic_long_t			b_released;	/* number of blocks released lately */
@@ -126,9 +121,8 @@ extern char *cachefiles_cook_key(const u8 *raw, int keylen, uint8_t type);
 /*
  * namei.c
  */
-extern void cachefiles_mark_object_inactive(struct cachefiles_cache *cache,
-					    struct cachefiles_object *object,
-					    blkcnt_t i_blocks);
+extern void cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
+				    struct dentry *dentry);
 extern int cachefiles_delete_object(struct cachefiles_cache *cache,
 				    struct cachefiles_object *object);
 extern int cachefiles_walk_to_object(struct cachefiles_object *parent,
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 924042e8cced..818d1bca1904 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -21,251 +21,51 @@
 #define CACHEFILES_KEYBUF_SIZE 512
 
 /*
- * dump debugging info about an object
+ * Mark the backing file as being a cache file if it's not already in use so.
  */
-static noinline
-void __cachefiles_printk_object(struct cachefiles_object *object,
-				const char *prefix)
+static bool cachefiles_mark_inode_in_use(struct cachefiles_object *object,
+					 struct dentry *dentry)
 {
-	struct fscache_cookie *cookie;
-	const u8 *k;
-	unsigned loop;
-
-	pr_err("%sobject: OBJ%x\n", prefix, object->fscache.debug_id);
-	pr_err("%sobjstate=%s fl=%lx wbusy=%x ev=%lx[%lx]\n",
-	       prefix, object->fscache.state->name,
-	       object->fscache.flags, work_busy(&object->fscache.work),
-	       object->fscache.events, object->fscache.event_mask);
-	pr_err("%sops=%u\n",
-	       prefix, object->fscache.n_ops);
-	pr_err("%sparent=%p\n",
-	       prefix, object->fscache.parent);
-
-	spin_lock(&object->fscache.lock);
-	cookie = object->fscache.cookie;
-	if (cookie) {
-		pr_err("%scookie=%p [pr=%p fl=%lx]\n",
-		       prefix,
-		       object->fscache.cookie,
-		       object->fscache.cookie->parent,
-		       object->fscache.cookie->flags);
-		pr_err("%skey=[%u] '", prefix, cookie->key_len);
-		k = (cookie->key_len <= sizeof(cookie->inline_key)) ?
-			cookie->inline_key : cookie->key;
-		for (loop = 0; loop < cookie->key_len; loop++)
-			pr_cont("%02x", k[loop]);
-		pr_cont("'\n");
-	} else {
-		pr_err("%scookie=NULL\n", prefix);
-	}
-	spin_unlock(&object->fscache.lock);
-}
-
-/*
- * dump debugging info about a pair of objects
- */
-static noinline void cachefiles_printk_object(struct cachefiles_object *object,
-					      struct cachefiles_object *xobject)
-{
-	if (object)
-		__cachefiles_printk_object(object, "");
-	if (xobject)
-		__cachefiles_printk_object(xobject, "x");
-}
-
-/*
- * mark the owner of a dentry, if there is one, to indicate that that dentry
- * has been preemptively deleted
- * - the caller must hold the i_mutex on the dentry's parent as required to
- *   call vfs_unlink(), vfs_rmdir() or vfs_rename()
- */
-static void cachefiles_mark_object_buried(struct cachefiles_cache *cache,
-					  struct dentry *dentry,
-					  enum fscache_why_object_killed why)
-{
-	struct cachefiles_object *object;
-	struct rb_node *p;
-
-	_enter(",'%pd'", dentry);
-
-	write_lock(&cache->active_lock);
-
-	p = cache->active_nodes.rb_node;
-	while (p) {
-		object = rb_entry(p, struct cachefiles_object, active_node);
-		if (object->dentry > dentry)
-			p = p->rb_left;
-		else if (object->dentry < dentry)
-			p = p->rb_right;
-		else
-			goto found_dentry;
-	}
-
-	write_unlock(&cache->active_lock);
-	trace_cachefiles_mark_buried(NULL, dentry, why);
-	_leave(" [no owner]");
-	return;
+	struct inode *inode = d_backing_inode(dentry);
+	bool can_use = false;
 
-	/* found the dentry for  */
-found_dentry:
-	kdebug("preemptive burial: OBJ%x [%s] %p",
-	       object->fscache.debug_id,
-	       object->fscache.state->name,
-	       dentry);
+	_enter(",%p", object);
 
-	trace_cachefiles_mark_buried(object, dentry, why);
+	inode_lock(inode);
 
-	if (fscache_object_is_live(&object->fscache)) {
-		pr_err("\n");
-		pr_err("Error: Can't preemptively bury live object\n");
-		cachefiles_printk_object(object, NULL);
+	if (!(inode->i_flags & S_CACHE_FILE)) {
+		inode->i_flags |= S_CACHE_FILE;
+		trace_cachefiles_mark_active(object, dentry);
+		can_use = true;
 	} else {
-		if (why != FSCACHE_OBJECT_IS_STALE)
-			fscache_object_mark_killed(&object->fscache, why);
+		pr_notice("cachefiles: Inode already in use: %pd\n", dentry);
 	}
 
-	write_unlock(&cache->active_lock);
-	_leave(" [owner marked]");
+	inode_unlock(inode);
+	return can_use;
 }
 
 /*
- * record the fact that an object is now active
+ * Unmark a backing inode.
  */
-static int cachefiles_mark_object_active(struct cachefiles_cache *cache,
-					 struct cachefiles_object *object)
+void cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
+				    struct dentry *dentry)
 {
-	struct cachefiles_object *xobject;
-	struct rb_node **_p, *_parent = NULL;
-	struct dentry *dentry;
-
-	_enter(",%p", object);
-
-try_again:
-	write_lock(&cache->active_lock);
-
-	dentry = object->dentry;
-	trace_cachefiles_mark_active(object, dentry);
-
-	if (test_and_set_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags)) {
-		pr_err("Error: Object already active\n");
-		cachefiles_printk_object(object, NULL);
-		BUG();
-	}
-
-	_p = &cache->active_nodes.rb_node;
-	while (*_p) {
-		_parent = *_p;
-		xobject = rb_entry(_parent,
-				   struct cachefiles_object, active_node);
-
-		ASSERT(xobject != object);
-
-		if (xobject->dentry > dentry)
-			_p = &(*_p)->rb_left;
-		else if (xobject->dentry < dentry)
-			_p = &(*_p)->rb_right;
-		else
-			goto wait_for_old_object;
-	}
-
-	rb_link_node(&object->active_node, _parent, _p);
-	rb_insert_color(&object->active_node, &cache->active_nodes);
-
-	write_unlock(&cache->active_lock);
-	_leave(" = 0");
-	return 0;
-
-	/* an old object from a previous incarnation is hogging the slot - we
-	 * need to wait for it to be destroyed */
-wait_for_old_object:
-	trace_cachefiles_wait_active(object, dentry, xobject);
-	clear_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags);
-
-	if (fscache_object_is_live(&xobject->fscache)) {
-		pr_err("\n");
-		pr_err("Error: Unexpected object collision\n");
-		cachefiles_printk_object(object, xobject);
-	}
-	atomic_inc(&xobject->usage);
-	write_unlock(&cache->active_lock);
-
-	if (test_bit(CACHEFILES_OBJECT_ACTIVE, &xobject->flags)) {
-		wait_queue_head_t *wq;
-
-		signed long timeout = 60 * HZ;
-		wait_queue_entry_t wait;
-		bool requeue;
-
-		/* if the object we're waiting for is queued for processing,
-		 * then just put ourselves on the queue behind it */
-		if (work_pending(&xobject->fscache.work)) {
-			_debug("queue OBJ%x behind OBJ%x immediately",
-			       object->fscache.debug_id,
-			       xobject->fscache.debug_id);
-			goto requeue;
-		}
-
-		/* otherwise we sleep until either the object we're waiting for
-		 * is done, or the fscache_object is congested */
-		wq = bit_waitqueue(&xobject->flags, CACHEFILES_OBJECT_ACTIVE);
-		init_wait(&wait);
-		requeue = false;
-		do {
-			prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
-			if (!test_bit(CACHEFILES_OBJECT_ACTIVE, &xobject->flags))
-				break;
-
-			requeue = fscache_object_sleep_till_congested(&timeout);
-		} while (timeout > 0 && !requeue);
-		finish_wait(wq, &wait);
-
-		if (requeue &&
-		    test_bit(CACHEFILES_OBJECT_ACTIVE, &xobject->flags)) {
-			_debug("queue OBJ%x behind OBJ%x after wait",
-			       object->fscache.debug_id,
-			       xobject->fscache.debug_id);
-			goto requeue;
-		}
-
-		if (timeout <= 0) {
-			pr_err("\n");
-			pr_err("Error: Overlong wait for old active object to go away\n");
-			cachefiles_printk_object(object, xobject);
-			goto requeue;
-		}
-	}
-
-	ASSERT(!test_bit(CACHEFILES_OBJECT_ACTIVE, &xobject->flags));
-
-	cache->cache.ops->put_object(&xobject->fscache,
-		(enum fscache_obj_ref_trace)cachefiles_obj_put_wait_retry);
-	goto try_again;
+	struct inode *inode = d_backing_inode(dentry);
 
-requeue:
-	cache->cache.ops->put_object(&xobject->fscache,
-		(enum fscache_obj_ref_trace)cachefiles_obj_put_wait_timeo);
-	_leave(" = -ETIMEDOUT");
-	return -ETIMEDOUT;
+	inode_lock(inode);
+	inode->i_flags &= ~S_CACHE_FILE;
+	inode_unlock(inode);
+	trace_cachefiles_mark_inactive(object, dentry, inode);
 }
 
 /*
  * Mark an object as being inactive.
  */
-void cachefiles_mark_object_inactive(struct cachefiles_cache *cache,
-				     struct cachefiles_object *object,
-				     blkcnt_t i_blocks)
+static void cachefiles_mark_object_inactive(struct cachefiles_cache *cache,
+					    struct cachefiles_object *object)
 {
-	struct dentry *dentry = object->dentry;
-	struct inode *inode = d_backing_inode(dentry);
-
-	trace_cachefiles_mark_inactive(object, dentry, inode);
-
-	write_lock(&cache->active_lock);
-	rb_erase(&object->active_node, &cache->active_nodes);
-	clear_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags);
-	write_unlock(&cache->active_lock);
-
-	wake_up_bit(&object->flags, CACHEFILES_OBJECT_ACTIVE);
+	blkcnt_t i_blocks = d_backing_inode(object->dentry)->i_blocks;
 
 	/* This object can now be culled, so we need to let the daemon know
 	 * that there is something it can remove if it needs to.
@@ -286,7 +86,6 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
 				  struct cachefiles_object *object,
 				  struct dentry *dir,
 				  struct dentry *rep,
-				  bool preemptive,
 				  enum fscache_why_object_killed why)
 {
 	struct dentry *grave, *trap;
@@ -310,9 +109,6 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
 		} else {
 			trace_cachefiles_unlink(object, rep, why);
 			ret = vfs_unlink(d_inode(dir), rep, NULL);
-
-			if (preemptive)
-				cachefiles_mark_object_buried(cache, rep, why);
 		}
 
 		inode_unlock(d_inode(dir));
@@ -373,8 +169,7 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
 			return -ENOMEM;
 		}
 
-		cachefiles_io_error(cache, "Lookup error %ld",
-				    PTR_ERR(grave));
+		cachefiles_io_error(cache, "Lookup error %ld", PTR_ERR(grave));
 		return -EIO;
 	}
 
@@ -416,9 +211,6 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
 		if (ret != 0 && ret != -ENOMEM)
 			cachefiles_io_error(cache,
 					    "Rename failed with error %d", ret);
-
-		if (preemptive)
-			cachefiles_mark_object_buried(cache, rep, why);
 	}
 
 	unlock_rename(cache->graveyard, dir);
@@ -446,26 +238,18 @@ int cachefiles_delete_object(struct cachefiles_cache *cache,
 
 	inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
 
-	if (test_bit(FSCACHE_OBJECT_KILLED_BY_CACHE, &object->fscache.flags)) {
-		/* object allocation for the same key preemptively deleted this
-		 * object's file so that it could create its own file */
-		_debug("object preemptively buried");
+	/* We need to check that our parent is _still_ our parent - it may have
+	 * been renamed.
+	 */
+	if (dir == object->dentry->d_parent) {
+		ret = cachefiles_bury_object(cache, object, dir, object->dentry,
+					     FSCACHE_OBJECT_WAS_RETIRED);
+	} else {
+		/* It got moved, presumably by cachefilesd culling it, so it's
+		 * no longer in the key path and we can ignore it.
+		 */
 		inode_unlock(d_inode(dir));
 		ret = 0;
-	} else {
-		/* we need to check that our parent is _still_ our parent - it
-		 * may have been renamed */
-		if (dir == object->dentry->d_parent) {
-			ret = cachefiles_bury_object(cache, object, dir,
-						     object->dentry, false,
-						     FSCACHE_OBJECT_WAS_RETIRED);
-		} else {
-			/* it got moved, presumably by cachefilesd culling it,
-			 * so it's no longer in the key path and we can ignore
-			 * it */
-			inode_unlock(d_inode(dir));
-			ret = 0;
-		}
 	}
 
 	dput(dir);
@@ -487,6 +271,7 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	struct path path;
 	unsigned long start;
 	const char *name;
+	bool marked = false;
 	int ret, nlen;
 
 	_enter("OBJ%x{%p},OBJ%x,%s,",
@@ -529,6 +314,7 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	cachefiles_hist(cachefiles_lookup_histogram, start);
 	if (IS_ERR(next)) {
 		trace_cachefiles_lookup(object, next, NULL);
+		ret = PTR_ERR(next);
 		goto lookup_error;
 	}
 
@@ -628,6 +414,13 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	/* we've found the object we were looking for */
 	object->dentry = next;
 
+	/* note that we're now using this object */
+	if (!cachefiles_mark_inode_in_use(object, object->dentry)) {
+		ret = -EBUSY;
+		goto check_error_unlock;
+	}
+	marked = true;
+
 	/* if we've found that the terminal object exists, then we need to
 	 * check its attributes and delete it if it's out of date */
 	if (!object->new) {
@@ -640,13 +433,12 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 			object->dentry = NULL;
 
 			ret = cachefiles_bury_object(cache, object, dir, next,
-						     true,
 						     FSCACHE_OBJECT_IS_STALE);
 			dput(next);
 			next = NULL;
 
 			if (ret < 0)
-				goto delete_error;
+				goto error_out2;
 
 			_debug("redo lookup");
 			fscache_object_retrying_stale(&object->fscache);
@@ -654,16 +446,10 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 		}
 	}
 
-	/* note that we're now using this object */
-	ret = cachefiles_mark_object_active(cache, object);
-
 	inode_unlock(d_inode(dir));
 	dput(dir);
 	dir = NULL;
 
-	if (ret == -ETIMEDOUT)
-		goto mark_active_timed_out;
-
 	_debug("=== OBTAINED_OBJECT ===");
 
 	if (object->new) {
@@ -712,26 +498,19 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 		cachefiles_io_error(cache, "Create/mkdir failed");
 	goto error;
 
-mark_active_timed_out:
-	_debug("mark active timed out");
-	goto release_dentry;
-
+check_error_unlock:
+	inode_unlock(d_inode(dir));
+	dput(dir);
 check_error:
-	_debug("check error %d", ret);
-	cachefiles_mark_object_inactive(
-		cache, object, d_backing_inode(object->dentry)->i_blocks);
-release_dentry:
+	if (marked)
+		cachefiles_unmark_inode_in_use(object, object->dentry);
+	cachefiles_mark_object_inactive(cache, object);
 	dput(object->dentry);
 	object->dentry = NULL;
 	goto error_out;
 
-delete_error:
-	_debug("delete error %d", ret);
-	goto error_out2;
-
 lookup_error:
-	_debug("lookup error %ld", PTR_ERR(next));
-	ret = PTR_ERR(next);
+	_debug("lookup error %d", ret);
 	if (ret == -EIO)
 		cachefiles_io_error(cache, "Lookup failed");
 	next = NULL;
@@ -861,8 +640,6 @@ static struct dentry *cachefiles_check_active(struct cachefiles_cache *cache,
 					      struct dentry *dir,
 					      char *filename)
 {
-	struct cachefiles_object *object;
-	struct rb_node *_n;
 	struct dentry *victim;
 	unsigned long start;
 	int ret;
@@ -892,34 +669,9 @@ static struct dentry *cachefiles_check_active(struct cachefiles_cache *cache,
 		return ERR_PTR(-ENOENT);
 	}
 
-	/* check to see if we're using this object */
-	read_lock(&cache->active_lock);
-
-	_n = cache->active_nodes.rb_node;
-
-	while (_n) {
-		object = rb_entry(_n, struct cachefiles_object, active_node);
-
-		if (object->dentry > victim)
-			_n = _n->rb_left;
-		else if (object->dentry < victim)
-			_n = _n->rb_right;
-		else
-			goto object_in_use;
-	}
-
-	read_unlock(&cache->active_lock);
-
 	//_leave(" = %p", victim);
 	return victim;
 
-object_in_use:
-	read_unlock(&cache->active_lock);
-	inode_unlock(d_inode(dir));
-	dput(victim);
-	//_leave(" = -EBUSY [in use]");
-	return ERR_PTR(-EBUSY);
-
 lookup_error:
 	inode_unlock(d_inode(dir));
 	ret = PTR_ERR(victim);
@@ -948,6 +700,7 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
 		    char *filename)
 {
 	struct dentry *victim;
+	struct inode *inode;
 	int ret;
 
 	_enter(",%pd/,%s", dir, filename);
@@ -956,6 +709,19 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
 	if (IS_ERR(victim))
 		return PTR_ERR(victim);
 
+	/* check to see if someone is using this object */
+	inode = d_inode(victim);
+	inode_lock(inode);
+	if (inode->i_flags & S_CACHE_FILE) {
+		ret = -EBUSY;
+	} else {
+		inode->i_flags |= S_CACHE_FILE;
+		ret = 0;
+	}
+	inode_unlock(inode);
+	if (ret < 0)
+		goto error_unlock;
+
 	_debug("victim -> %p %s",
 	       victim, d_backing_inode(victim) ? "positive" : "negative");
 
@@ -971,7 +737,7 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
 	/*  actually remove the victim (drops the dir mutex) */
 	_debug("bury");
 
-	ret = cachefiles_bury_object(cache, NULL, dir, victim, false,
+	ret = cachefiles_bury_object(cache, NULL, dir, victim,
 				     FSCACHE_OBJECT_WAS_CULLED);
 	if (ret < 0)
 		goto error;
@@ -1008,6 +774,7 @@ int cachefiles_check_in_use(struct cachefiles_cache *cache, struct dentry *dir,
 			    char *filename)
 {
 	struct dentry *victim;
+	int ret = 0;
 
 	//_enter(",%pd/,%s",
 	//       dir, filename);
@@ -1017,7 +784,9 @@ int cachefiles_check_in_use(struct cachefiles_cache *cache, struct dentry *dir,
 		return PTR_ERR(victim);
 
 	inode_unlock(d_inode(dir));
+	if (d_inode(victim)->i_flags & S_CACHE_FILE)
+		ret = -EBUSY;
 	dput(victim);
 	//_leave(" = 0");
-	return 0;
+	return ret;
 }
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 9a448fe9355d..c877035c2946 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -237,35 +237,6 @@ TRACE_EVENT(cachefiles_mark_active,
 		      __entry->obj, __entry->de)
 	    );
 
-TRACE_EVENT(cachefiles_wait_active,
-	    TP_PROTO(struct cachefiles_object *obj,
-		     struct dentry *de,
-		     struct cachefiles_object *xobj),
-
-	    TP_ARGS(obj, de, xobj),
-
-	    /* Note that obj may be NULL */
-	    TP_STRUCT__entry(
-		    __field(unsigned int,		obj		)
-		    __field(unsigned int,		xobj		)
-		    __field(struct dentry *,		de		)
-		    __field(u16,			flags		)
-		    __field(u16,			fsc_flags	)
-			     ),
-
-	    TP_fast_assign(
-		    __entry->obj	= obj->fscache.debug_id;
-		    __entry->de		= de;
-		    __entry->xobj	= xobj->fscache.debug_id;
-		    __entry->flags	= xobj->flags;
-		    __entry->fsc_flags	= xobj->fscache.flags;
-			   ),
-
-	    TP_printk("o=%08x d=%p wo=%08x wf=%x wff=%x",
-		      __entry->obj, __entry->de, __entry->xobj,
-		      __entry->flags, __entry->fsc_flags)
-	    );
-
 TRACE_EVENT(cachefiles_mark_inactive,
 	    TP_PROTO(struct cachefiles_object *obj,
 		     struct dentry *de,

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 07/32] fscache: Provide a simple thread pool for running ops asynchronously
  2020-07-13 16:30 ` David Howells
                   ` (6 preceding siblings ...)
  (?)
@ 2020-07-13 16:31 ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:31 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Provide a simple thread pool that can be used to run cookie management
operations in the background and a dispatcher infrastructure to punt
operations to the pool if threads are available or to just run the
operation in the calling thread if not.

A future patch will replace all the object state machine stuff with whole
routines that do all the work in one go without trying to interleave bits
from various objects.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/Makefile            |    1 
 fs/fscache/dispatcher.c        |  144 ++++++++++++++++++++++++++++++++++++++++
 fs/fscache/internal.h          |    8 ++
 fs/fscache/main.c              |    7 ++
 include/trace/events/fscache.h |    6 +-
 5 files changed, 165 insertions(+), 1 deletion(-)
 create mode 100644 fs/fscache/dispatcher.c

diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
index ac3fcd909fff..7b10c6aad157 100644
--- a/fs/fscache/Makefile
+++ b/fs/fscache/Makefile
@@ -6,6 +6,7 @@
 fscache-y := \
 	cache.o \
 	cookie.o \
+	dispatcher.o \
 	fsdef.o \
 	main.o \
 	netfs.o \
diff --git a/fs/fscache/dispatcher.c b/fs/fscache/dispatcher.c
new file mode 100644
index 000000000000..fba71b99c951
--- /dev/null
+++ b/fs/fscache/dispatcher.c
@@ -0,0 +1,144 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Object dispatcher
+ *
+ * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#define FSCACHE_DEBUG_LEVEL OPERATION
+#include <linux/kthread.h>
+#include <linux/slab.h>
+#include <linux/wait.h>
+#include <linux/freezer.h>
+#include "internal.h"
+
+#define FSCACHE_DISPATCHER_POOL_SIZE 8
+
+static LIST_HEAD(fscache_pending_work);
+static DEFINE_SPINLOCK(fscache_work_lock);
+static DECLARE_WAIT_QUEUE_HEAD(fscache_dispatcher_pool);
+static struct completion fscache_dispatcher_pool_done[FSCACHE_DISPATCHER_POOL_SIZE];
+static bool fscache_dispatcher_stop;
+
+struct fscache_work {
+	struct list_head	link;
+	struct fscache_cookie	*cookie;
+	struct fscache_object	*object;
+	int			param;
+	void (*func)(struct fscache_cookie *, struct fscache_object *, int);
+};
+
+/*
+ * Attempt to queue some work to do.  If there's too much asynchronous work
+ * already queued, we'll do it here in this thread instead.
+ */
+void fscache_dispatch(struct fscache_cookie *cookie,
+		      struct fscache_object *object,
+		      int param,
+		      void (*func)(struct fscache_cookie *,
+				   struct fscache_object *, int))
+{
+	struct fscache_work *work;
+	bool queued = false;
+
+	work = kzalloc(sizeof(struct fscache_work), GFP_KERNEL);
+	if (work) {
+		work->cookie = cookie;
+		work->object = object;
+		work->param = param;
+		work->func = func;
+
+		spin_lock(&fscache_work_lock);
+		if (waitqueue_active(&fscache_dispatcher_pool) ||
+		    list_empty(&fscache_pending_work)) {
+			fscache_cookie_get(cookie, fscache_cookie_get_work);
+			list_add_tail(&work->link, &fscache_pending_work);
+			wake_up(&fscache_dispatcher_pool);
+			queued = true;
+		}
+		spin_unlock(&fscache_work_lock);
+	}
+
+	if (!queued) {
+		kfree(work);
+		func(cookie, object, param);
+	}
+}
+
+/*
+ * A dispatcher thread.
+ */
+static int fscache_dispatcher(void *data)
+{
+	struct completion *done = data;
+
+	for (;;) {
+		if (!list_empty(&fscache_pending_work)) {
+			struct fscache_work *work = NULL;
+
+			spin_lock(&fscache_work_lock);
+			if (!list_empty(&fscache_pending_work)) {
+				work = list_entry(fscache_pending_work.next,
+						  struct fscache_work, link);
+				list_del_init(&work->link);
+			}
+			spin_unlock(&fscache_work_lock);
+
+			if (work) {
+				work->func(work->cookie, work->object, work->param);
+				fscache_cookie_put(work->cookie, fscache_cookie_put_work);
+				kfree(work);
+			}
+			continue;
+		} else if (fscache_dispatcher_stop) {
+			break;
+		}
+
+		wait_event_freezable(fscache_dispatcher_pool,
+				     (fscache_dispatcher_stop ||
+				      !list_empty(&fscache_pending_work)));
+	}
+
+	complete_and_exit(done, 0);
+}
+
+/*
+ * Start up the dispatcher threads.
+ */
+int fscache_init_dispatchers(void)
+{
+	struct task_struct *t;
+	int i;
+
+	for (i = 0; i < FSCACHE_DISPATCHER_POOL_SIZE; i++) {
+		t = kthread_create(fscache_dispatcher,
+				   &fscache_dispatcher_pool_done[i],
+				   "kfsc/%d", i);
+		if (IS_ERR(t))
+			goto failed;
+		wake_up_process(t);
+	}
+
+	return 0;
+
+failed:
+	fscache_dispatcher_stop = true;
+	wake_up_all(&fscache_dispatcher_pool);
+	for (i--; i >= 0; i--)
+		wait_for_completion(&fscache_dispatcher_pool_done[i]);
+	return PTR_ERR(t);
+}
+
+/*
+ * Kill off the dispatcher threads.
+ */
+void fscache_kill_dispatchers(void)
+{
+	int i;
+
+	fscache_dispatcher_stop = true;
+	wake_up_all(&fscache_dispatcher_pool);
+
+	for (i = 0; i < FSCACHE_DISPATCHER_POOL_SIZE; i++)
+		wait_for_completion(&fscache_dispatcher_pool_done[i]);
+}
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index bc5539d2157b..2100e2222884 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -75,6 +75,14 @@ extern struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *);
 extern void fscache_cookie_put(struct fscache_cookie *,
 			       enum fscache_cookie_trace);
 
+/*
+ * dispatcher.c
+ */
+extern void fscache_dispatch(struct fscache_cookie *, struct fscache_object *, int,
+			     void (*func)(struct fscache_cookie *, struct fscache_object *, int));
+extern int fscache_init_dispatchers(void);
+extern void fscache_kill_dispatchers(void);
+
 /*
  * fsdef.c
  */
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index c1e6cc9091aa..c8f1beafa8e1 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -125,6 +125,10 @@ static int __init fscache_init(void)
 	for_each_possible_cpu(cpu)
 		init_waitqueue_head(&per_cpu(fscache_object_cong_wait, cpu));
 
+	ret = fscache_init_dispatchers();
+	if (ret < 0)
+		goto error_dispatchers;
+
 	ret = fscache_proc_init();
 	if (ret < 0)
 		goto error_proc;
@@ -159,6 +163,8 @@ static int __init fscache_init(void)
 	unregister_sysctl_table(fscache_sysctl_header);
 error_sysctl:
 #endif
+	fscache_kill_dispatchers();
+error_dispatchers:
 	fscache_proc_cleanup();
 error_proc:
 	destroy_workqueue(fscache_op_wq);
@@ -183,6 +189,7 @@ static void __exit fscache_exit(void)
 	unregister_sysctl_table(fscache_sysctl_header);
 #endif
 	fscache_proc_cleanup();
+	fscache_kill_dispatchers();
 	destroy_workqueue(fscache_op_wq);
 	destroy_workqueue(fscache_object_wq);
 	pr_notice("Unloaded\n");
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 08d7de72409d..fb3fdf2921ee 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -26,11 +26,13 @@ enum fscache_cookie_trace {
 	fscache_cookie_get_attach_object,
 	fscache_cookie_get_reacquire,
 	fscache_cookie_get_register_netfs,
+	fscache_cookie_get_work,
 	fscache_cookie_put_acquire_nobufs,
 	fscache_cookie_put_dup_netfs,
 	fscache_cookie_put_relinquish,
 	fscache_cookie_put_object,
 	fscache_cookie_put_parent,
+	fscache_cookie_put_work,
 };
 
 #endif
@@ -45,11 +47,13 @@ enum fscache_cookie_trace {
 	EM(fscache_cookie_get_attach_object,	"GET obj")		\
 	EM(fscache_cookie_get_reacquire,	"GET raq")		\
 	EM(fscache_cookie_get_register_netfs,	"GET net")		\
+	EM(fscache_cookie_get_work,		"GET wrk")		\
 	EM(fscache_cookie_put_acquire_nobufs,	"PUT nbf")		\
 	EM(fscache_cookie_put_dup_netfs,	"PUT dnt")		\
 	EM(fscache_cookie_put_relinquish,	"PUT rlq")		\
 	EM(fscache_cookie_put_object,		"PUT obj")		\
-	E_(fscache_cookie_put_parent,		"PUT prn")
+	EM(fscache_cookie_put_parent,		"PUT prn")		\
+	E_(fscache_cookie_put_work,		"PUT wrk")
 
 /*
  * Export enum symbols via userspace.



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 08/32] fscache: Replace the object management state machine
  2020-07-13 16:30 ` David Howells
@ 2020-07-13 16:32   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:32 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Replace the object management state machine with something a lot simpler.
The entire process of setting up or tearing down a cookie is done in one
go, and the dispatcher either punts it to a worker thread, or if all the
worker threads are all busy, does it in the current thread.

fscache_enable_cookie() and fscache_disable_cookie() are replaced by 'use'
and 'unuse' routines to which the mode of access (readonly or writable) is
declared - these then impose the policy of what to do with the backing
object.

The policy for handling local writes is declared to
fscache_acquire_cookie() using FSACHE_ADV_WRITE_*CACHE flags.  This only
allows for the possibility of suspending caching whilst a file is open for
writing; policies such as write-through and write-back have to be handled
at the netfs level.

At some point in the future, object records that aren't in use will get put
on an LRU and discarded under memory pressure or if they haven't been used
for a while.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/cell.c                     |    4 
 fs/afs/file.c                     |   15 +
 fs/afs/inode.c                    |   13 
 fs/afs/volume.c                   |    4 
 fs/cachefiles/bind.c              |    4 
 fs/cachefiles/interface.c         |  160 +++---
 fs/cachefiles/internal.h          |   12 
 fs/cachefiles/main.c              |   11 
 fs/cachefiles/namei.c             |   60 +-
 fs/cachefiles/xattr.c             |    1 
 fs/fscache/Makefile               |    3 
 fs/fscache/cache.c                |  107 ++--
 fs/fscache/cookie.c               |  485 +++++-------------
 fs/fscache/fsdef.c                |    1 
 fs/fscache/histogram.c            |    2 
 fs/fscache/internal.h             |   53 +-
 fs/fscache/main.c                 |   29 -
 fs/fscache/netfs.c                |    4 
 fs/fscache/obj.c                  |  352 +++++++++++++
 fs/fscache/object-list.c          |   58 --
 fs/fscache/object.c               |  982 -------------------------------------
 fs/fscache/object_bits.c          |  120 +++++
 fs/fscache/proc.c                 |    6 
 fs/fscache/stats.c                |    2 
 include/linux/fscache-cache.h     |  208 ++------
 include/linux/fscache.h           |  173 +++----
 include/trace/events/cachefiles.h |   17 -
 include/trace/events/fscache.h    |  133 +----
 28 files changed, 1019 insertions(+), 2000 deletions(-)
 create mode 100644 fs/fscache/obj.c
 delete mode 100644 fs/fscache/object.c
 create mode 100644 fs/fscache/object_bits.c

diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index 7fdc39b1c965..1cdabbdeeb2c 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -613,7 +613,7 @@ static int afs_activate_cell(struct afs_net *net, struct afs_cell *cell)
 					     NULL,
 					     cell->name, strlen(cell->name),
 					     NULL, 0,
-					     0, true);
+					     0);
 #endif
 	ret = afs_proc_cell_setup(cell);
 	if (ret < 0)
@@ -652,7 +652,7 @@ static void afs_deactivate_cell(struct afs_net *net, struct afs_cell *cell)
 	mutex_unlock(&net->proc_cells_lock);
 
 #ifdef CONFIG_AFS_FSCACHE
-	fscache_relinquish_cookie(cell->cache, NULL, false);
+	fscache_relinquish_cookie(cell->cache, false);
 	cell->cache = NULL;
 #endif
 
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 13f078111bad..75058de366a1 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -145,7 +145,9 @@ int afs_open(struct inode *inode, struct file *file)
 
 	if (file->f_flags & O_TRUNC)
 		set_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags);
-	
+
+	fscache_use_cookie(afs_vnode_cache(vnode), file->f_mode & FMODE_WRITE);
+
 	file->private_data = af;
 	_leave(" = 0");
 	return 0;
@@ -164,8 +166,10 @@ int afs_open(struct inode *inode, struct file *file)
  */
 int afs_release(struct inode *inode, struct file *file)
 {
+	struct afs_vnode_cache_aux aux;
 	struct afs_vnode *vnode = AFS_FS_I(inode);
 	struct afs_file *af = file->private_data;
+	loff_t i_size;
 	int ret = 0;
 
 	_enter("{%llx:%llu},", vnode->fid.vid, vnode->fid.vnode);
@@ -176,6 +180,15 @@ int afs_release(struct inode *inode, struct file *file)
 	file->private_data = NULL;
 	if (af->wb)
 		afs_put_wb_key(af->wb);
+
+	if ((file->f_mode & FMODE_WRITE)) {
+		i_size = i_size_read(&vnode->vfs_inode);
+		aux.data_version = vnode->status.data_version;
+		fscache_unuse_cookie(afs_vnode_cache(vnode), &aux, &i_size);
+	} else {
+		fscache_unuse_cookie(afs_vnode_cache(vnode), NULL, NULL);
+	}
+
 	key_put(af->key);
 	kfree(af);
 	afs_prune_wb_keys(vnode);
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 0a4623c38cf5..49d897437998 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -439,7 +439,7 @@ static void afs_get_inode_cache(struct afs_vnode *vnode)
 		NULL,
 		&key, sizeof(key),
 		&aux, sizeof(aux),
-		vnode->status.size, true);
+		vnode->status.size);
 #endif
 }
 
@@ -793,14 +793,9 @@ void afs_evict_inode(struct inode *inode)
 	}
 
 #ifdef CONFIG_AFS_FSCACHE
-	{
-		struct afs_vnode_cache_aux aux;
-
-		aux.data_version = vnode->status.data_version;
-		fscache_relinquish_cookie(vnode->cache, &aux,
-					  test_bit(AFS_VNODE_DELETED, &vnode->flags));
-		vnode->cache = NULL;
-	}
+	fscache_relinquish_cookie(vnode->cache,
+				  test_bit(AFS_VNODE_DELETED, &vnode->flags));
+	vnode->cache = NULL;
 #endif
 
 	afs_prune_wb_keys(vnode);
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index 5e3fe894a0f9..13a953df5d75 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -276,7 +276,7 @@ void afs_activate_volume(struct afs_volume *volume)
 					       "AFS.vol",
 					       0, NULL,
 					       &volume->vid, sizeof(volume->vid),
-					       NULL, 0, 0, true);
+					       NULL, 0, 0);
 #endif
 }
 
@@ -288,7 +288,7 @@ void afs_deactivate_volume(struct afs_volume *volume)
 	_enter("%s", volume->name);
 
 #ifdef CONFIG_AFS_FSCACHE
-	fscache_relinquish_cookie(volume->cache, NULL,
+	fscache_relinquish_cookie(volume->cache,
 				  test_bit(AFS_VOLUME_DELETED, &volume->flags));
 	volume->cache = NULL;
 #endif
diff --git a/fs/cachefiles/bind.c b/fs/cachefiles/bind.c
index dfb14dbddf51..4c59e1ef4500 100644
--- a/fs/cachefiles/bind.c
+++ b/fs/cachefiles/bind.c
@@ -99,12 +99,10 @@ static int cachefiles_daemon_add_cache(struct cachefiles_cache *cache)
 	/* allocate the root index object */
 	ret = -ENOMEM;
 
-	fsdef = kmem_cache_alloc(cachefiles_object_jar, GFP_KERNEL);
+	fsdef = kmem_cache_zalloc(cachefiles_object_jar, GFP_KERNEL);
 	if (!fsdef)
 		goto error_root_object;
 
-	ASSERTCMP(fsdef->backer, ==, NULL);
-
 	atomic_set(&fsdef->usage, 1);
 	fsdef->type = FSCACHE_COOKIE_TYPE_INDEX;
 
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index b868afb970ad..e4d1a82b9f33 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -8,39 +8,56 @@
 #include <linux/slab.h>
 #include <linux/mount.h>
 #include <linux/xattr.h>
+#include <linux/file.h>
 #include "internal.h"
 
 static int cachefiles_attr_changed(struct cachefiles_object *object);
+static void cachefiles_put_object(struct fscache_object *_object,
+				  enum fscache_obj_ref_trace why);
 
 /*
- * allocate an object record for a cookie lookup and prepare the lookup data
+ * Allocate an object record for a cookie lookup and prepare the lookup data.
+ * Eats the caller's ref on parent.
  */
-static struct fscache_object *cachefiles_alloc_object(
-	struct fscache_cache *_cache,
-	struct fscache_cookie *cookie)
+static
+struct fscache_object *cachefiles_alloc_object(struct fscache_cookie *cookie,
+					       struct fscache_cache *_cache,
+					       struct fscache_object *parent)
 {
 	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
-	unsigned keylen;
-	void *buffer, *p;
-	char *key;
 
 	cache = container_of(_cache, struct cachefiles_cache, cache);
 
 	_enter("{%s},%p,", cache->cache.identifier, cookie);
 
-	/* create a new object record and a temporary leaf image */
-	object = kmem_cache_alloc(cachefiles_object_jar, cachefiles_gfp);
-	if (!object)
-		goto nomem_object;
-
-	ASSERTCMP(object->backer, ==, NULL);
-
-	atomic_set(&object->usage, 1);
+	object = kmem_cache_zalloc(cachefiles_object_jar, cachefiles_gfp);
+	if (!object) {
+		cachefiles_put_object(parent, fscache_obj_put_alloc_fail);
+		return NULL;
+	}
 
 	fscache_object_init(&object->fscache, cookie, &cache->cache);
+	object->fscache.parent = parent;
+	object->fscache.stage = FSCACHE_OBJECT_STAGE_LOOKING_UP;
+	atomic_set(&object->usage, 1);
 
 	object->type = cookie->type;
+	trace_cachefiles_ref(object, cookie,
+			     (enum cachefiles_obj_ref_trace)fscache_obj_new, 1);
+	return &object->fscache;
+}
+
+/*
+ * Prepare data for use in lookup.  This involves cooking the binary key into
+ * something that can be used as a filename.
+ */
+static void *cachefiles_prepare_lookup_data(struct fscache_object *object)
+{
+	struct fscache_cookie *cookie = object->cookie;
+	unsigned keylen;
+	void *buffer, *p;
+	char *key;
 
 	/* get hold of the raw key
 	 * - stick the length on the front and leave space on the back for the
@@ -48,7 +65,7 @@ static struct fscache_object *cachefiles_alloc_object(
 	 */
 	buffer = kmalloc((2 + 512) + 3, cachefiles_gfp);
 	if (!buffer)
-		goto nomem_buffer;
+		goto nomem;
 
 	keylen = cookie->key_len;
 	p = fscache_get_key(cookie);
@@ -60,36 +77,29 @@ static struct fscache_object *cachefiles_alloc_object(
 	((char *)buffer)[keylen + 4] = 0;
 
 	/* turn the raw key into something that can work with as a filename */
-	key = cachefiles_cook_key(buffer, keylen + 2, object->type);
+	key = cachefiles_cook_key(buffer, keylen + 2, cookie->type);
 	kfree(buffer);
 	if (!key)
-		goto nomem_key;
+		goto nomem;
 
-	object->lookup_key = key;
+	_leave(" = %s", key);
+	return key;
 
-	_leave(" = %p [%s]", &object->fscache, key);
-	return &object->fscache;
-
-nomem_key:
-	kfree(buffer);
-nomem_buffer:
-	kmem_cache_free(cachefiles_object_jar, object);
-	fscache_object_destroyed(&cache->cache);
-nomem_object:
-	_leave(" = -ENOMEM");
+nomem:
 	return ERR_PTR(-ENOMEM);
 }
 
 /*
- * attempt to look up the nominated node in this cache
- * - return -ETIMEDOUT to be scheduled again
+ * Attempt to look up the nominated node in this cache
  */
-static int cachefiles_lookup_object(struct fscache_object *_object)
+static bool cachefiles_lookup_object(struct fscache_object *_object,
+				     void *lookup_data)
 {
 	struct cachefiles_object *parent, *object;
 	struct cachefiles_cache *cache;
 	const struct cred *saved_cred;
-	int ret;
+	char *lookup_key = lookup_data;
+	bool success;
 
 	_enter("{OBJ%x}", _object->debug_id);
 
@@ -98,47 +108,30 @@ static int cachefiles_lookup_object(struct fscache_object *_object)
 			      struct cachefiles_object, fscache);
 	object = container_of(_object, struct cachefiles_object, fscache);
 
-	ASSERTCMP(object->lookup_key, !=, NULL);
+	ASSERTCMP(lookup_key, !=, NULL);
 
 	/* look up the key, creating any missing bits */
 	cachefiles_begin_secure(cache, &saved_cred);
-	ret = cachefiles_walk_to_object(parent, object, object->lookup_key);
+	success = cachefiles_walk_to_object(parent, object, lookup_key);
 	cachefiles_end_secure(cache, saved_cred);
 
 	/* polish off by setting the attributes of non-index files */
-	if (ret == 0 &&
+	if (success &&
 	    object->fscache.cookie->type != FSCACHE_COOKIE_TYPE_INDEX)
 		cachefiles_attr_changed(object);
 
-	if (ret < 0 && ret != -ETIMEDOUT) {
-		if (ret != -ENOBUFS)
-			pr_warn("Lookup failed error %d\n", ret);
-		fscache_object_lookup_error(&object->fscache);
-	}
-
-	_leave(" [%d]", ret);
-	return ret;
+	_leave(" [%d]", success);
+	return success;
 }
 
-/*
- * indication of lookup completion
- */
-static void cachefiles_lookup_complete(struct fscache_object *_object)
+static void cachefiles_free_lookup_data(struct fscache_object *object, void *lookup_data)
 {
-	struct cachefiles_object *object;
-
-	object = container_of(_object, struct cachefiles_object, fscache);
-
-	_enter("{OBJ%x}", object->fscache.debug_id);
-
-	kfree(object->lookup_key);
-	object->lookup_key = NULL;
+	kfree(lookup_data);
 }
 
 /*
  * increment the usage count on an inode object (may fail if unmounting)
  */
-static
 struct fscache_object *cachefiles_grab_object(struct fscache_object *_object,
 					      enum fscache_obj_ref_trace why)
 {
@@ -166,6 +159,8 @@ static void cachefiles_update_object(struct fscache_object *_object)
 	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
 	const struct cred *saved_cred;
+	loff_t object_size;
+	int ret;
 
 	_enter("{OBJ%x}", _object->debug_id);
 
@@ -174,7 +169,25 @@ static void cachefiles_update_object(struct fscache_object *_object)
 			     cache);
 
 	cachefiles_begin_secure(cache, &saved_cred);
+
+	object_size = object->fscache.cookie->object_size;
+	if (i_size_read(d_inode(object->backer)) > object_size) {
+		struct path path = {
+			.mnt	= cache->mnt,
+			.dentry	= object->backer
+		};
+		_debug("trunc %llx -> %llx", i_size_read(d_inode(object->backer)), object_size);
+		ret = vfs_truncate(&path, object_size);
+		if (ret < 0) {
+			cachefiles_io_error_obj(object, "Trunc-to-size failed");
+			cachefiles_remove_object_xattr(cache, object->backer);
+			goto out;
+		}
+	}
+
 	cachefiles_set_object_xattr(object, XATTR_REPLACE);
+
+out:
 	cachefiles_end_secure(cache, saved_cred);
 	_leave("");
 }
@@ -183,7 +196,8 @@ static void cachefiles_update_object(struct fscache_object *_object)
  * discard the resources pinned by an object and effect retirement if
  * requested
  */
-static void cachefiles_drop_object(struct fscache_object *_object)
+static void cachefiles_drop_object(struct fscache_object *_object,
+				   bool invalidate)
 {
 	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
@@ -209,17 +223,18 @@ static void cachefiles_drop_object(struct fscache_object *_object)
 	 * before we set it up.
 	 */
 	if (object->dentry) {
-		/* delete retired objects */
-		if (test_bit(FSCACHE_OBJECT_RETIRED, &object->fscache.flags) &&
-		    _object != cache->cache.fsdef
-		    ) {
-			_debug("- retire object OBJ%x", object->fscache.debug_id);
+		if (invalidate && _object != cache->cache.fsdef) {
+			_debug("- inval object OBJ%x", object->fscache.debug_id);
 			cachefiles_begin_secure(cache, &saved_cred);
 			cachefiles_delete_object(cache, object);
 			cachefiles_end_secure(cache, saved_cred);
 		}
 
 		/* close the filesystem stuff attached to the object */
+		if (object->backing_file)
+			fput(object->backing_file);
+		object->backing_file = NULL;
+
 		if (object->backer != object->dentry)
 			dput(object->backer);
 		object->backer = NULL;
@@ -253,9 +268,6 @@ static void cachefiles_put_object(struct fscache_object *_object,
 	ASSERT((atomic_read(&object->usage) & 0xffff0000) != 0x6b6b0000);
 #endif
 
-	ASSERTIFCMP(object->fscache.parent,
-		    object->fscache.parent->n_children, >, 0);
-
 	u = atomic_dec_return(&object->usage);
 	trace_cachefiles_ref(object, _object->cookie,
 			     (enum cachefiles_obj_ref_trace)why, u);
@@ -263,14 +275,10 @@ static void cachefiles_put_object(struct fscache_object *_object,
 	if (u == 0) {
 		_debug("- kill object OBJ%x", object->fscache.debug_id);
 
-		ASSERTCMP(object->fscache.parent, ==, NULL);
 		ASSERTCMP(object->backer, ==, NULL);
 		ASSERTCMP(object->dentry, ==, NULL);
-		ASSERTCMP(object->fscache.n_ops, ==, 0);
 		ASSERTCMP(object->fscache.n_children, ==, 0);
 
-		kfree(object->lookup_key);
-
 		cache = object->fscache.cache;
 		fscache_object_destroy(&object->fscache);
 		kmem_cache_free(cachefiles_object_jar, object);
@@ -416,15 +424,25 @@ static void cachefiles_invalidate_object(struct fscache_object *_object)
 	_leave("");
 }
 
+static unsigned int cachefiles_get_object_usage(const struct fscache_object *_object)
+{
+	struct cachefiles_object *object;
+
+	object = container_of(_object, struct cachefiles_object, fscache);
+	return atomic_read(&object->usage);
+}
+
 const struct fscache_cache_ops cachefiles_cache_ops = {
 	.name			= "cachefiles",
 	.alloc_object		= cachefiles_alloc_object,
+	.prepare_lookup_data	= cachefiles_prepare_lookup_data,
 	.lookup_object		= cachefiles_lookup_object,
-	.lookup_complete	= cachefiles_lookup_complete,
+	.free_lookup_data	= cachefiles_free_lookup_data,
 	.grab_object		= cachefiles_grab_object,
 	.update_object		= cachefiles_update_object,
 	.invalidate_object	= cachefiles_invalidate_object,
 	.drop_object		= cachefiles_drop_object,
 	.put_object		= cachefiles_put_object,
+	.get_object_usage	= cachefiles_get_object_usage,
 	.sync_cache		= cachefiles_sync_cache,
 };
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index f8f308ce7385..b89f76a03546 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -34,13 +34,13 @@ extern unsigned cachefiles_debug;
  */
 struct cachefiles_object {
 	struct fscache_object		fscache;	/* fscache handle */
-	char				*lookup_key;	/* key to look up */
 	struct dentry			*dentry;	/* the file/dir representing this object */
 	struct dentry			*backer;	/* backing file */
+	struct file			*backing_file;	/* File open on backing storage */
 	loff_t				i_size;		/* object size */
 	atomic_t			usage;		/* object usage count */
 	uint8_t				type;		/* object type */
-	uint8_t				new;		/* T if object new */
+	bool				new;		/* T if object new */
 };
 
 extern struct kmem_cache *cachefiles_object_jar;
@@ -112,6 +112,8 @@ extern int cachefiles_has_space(struct cachefiles_cache *cache,
  * interface.c
  */
 extern const struct fscache_cache_ops cachefiles_cache_ops;
+extern struct fscache_object *cachefiles_grab_object(struct fscache_object *_object,
+						     enum fscache_obj_ref_trace why);
 
 /*
  * key.c
@@ -125,9 +127,9 @@ extern void cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
 				    struct dentry *dentry);
 extern int cachefiles_delete_object(struct cachefiles_cache *cache,
 				    struct cachefiles_object *object);
-extern int cachefiles_walk_to_object(struct cachefiles_object *parent,
-				     struct cachefiles_object *object,
-				     const char *key);
+extern bool cachefiles_walk_to_object(struct cachefiles_object *parent,
+				      struct cachefiles_object *object,
+				      const char *key);
 extern struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
 					       struct dentry *dir,
 					       const char *name);
diff --git a/fs/cachefiles/main.c b/fs/cachefiles/main.c
index 3f0101a74809..3373f3eddbf8 100644
--- a/fs/cachefiles/main.c
+++ b/fs/cachefiles/main.c
@@ -37,13 +37,6 @@ static struct miscdevice cachefiles_dev = {
 	.fops	= &cachefiles_daemon_fops,
 };
 
-static void cachefiles_object_init_once(void *_object)
-{
-	struct cachefiles_object *object = _object;
-
-	memset(object, 0, sizeof(*object));
-}
-
 /*
  * initialise the fs caching module
  */
@@ -60,9 +53,7 @@ static int __init cachefiles_init(void)
 	cachefiles_object_jar =
 		kmem_cache_create("cachefiles_object_jar",
 				  sizeof(struct cachefiles_object),
-				  0,
-				  SLAB_HWCACHE_ALIGN,
-				  cachefiles_object_init_once);
+				  0, SLAB_HWCACHE_ALIGN, NULL);
 	if (!cachefiles_object_jar) {
 		pr_notice("Failed to allocate an object jar\n");
 		goto error_object_jar;
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 818d1bca1904..d1c8828ebbbb 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -18,8 +18,6 @@
 #include <linux/slab.h>
 #include "internal.h"
 
-#define CACHEFILES_KEYBUF_SIZE 512
-
 /*
  * Mark the backing file as being a cache file if it's not already in use so.
  */
@@ -257,13 +255,24 @@ int cachefiles_delete_object(struct cachefiles_cache *cache,
 	return ret;
 }
 
+/*
+ * Handle negative lookup.  We create a temporary file for a datafile to use
+ * which we will link into place later.
+ */
+static void cachefiles_negative_lookup(struct cachefiles_object *parent,
+				       struct cachefiles_object *object)
+{
+	// TODO: Use vfs_tmpfile() for datafiles
+	//fscache_object_lookup_negative(&object->fscache);
+}
+
 /*
  * walk from the parent object to the child object through the backing
  * filesystem, creating directories as we go
  */
-int cachefiles_walk_to_object(struct cachefiles_object *parent,
-			      struct cachefiles_object *object,
-			      const char *key)
+bool cachefiles_walk_to_object(struct cachefiles_object *parent,
+			       struct cachefiles_object *object,
+			       const char *key)
 {
 	struct cachefiles_cache *cache;
 	struct dentry *dir, *next = NULL;
@@ -271,7 +280,7 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	struct path path;
 	unsigned long start;
 	const char *name;
-	bool marked = false;
+	bool marked = false, negated = false;
 	int ret, nlen;
 
 	_enter("OBJ%x{%p},OBJ%x,%s,",
@@ -288,7 +297,7 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	if (!(d_is_dir(parent->dentry))) {
 		// TODO: convert file to dir
 		_leave("looking up in none directory");
-		return -ENOBUFS;
+		return false;
 	}
 
 	dir = dget(parent->dentry);
@@ -328,8 +337,10 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	/* if this element of the path doesn't exist, then the lookup phase
 	 * failed, and we can release any readers in the certain knowledge that
 	 * there's nothing for them to actually read */
-	if (d_is_negative(next))
-		fscache_object_lookup_negative(&object->fscache);
+	if (d_is_negative(next) && !negated) {
+		cachefiles_negative_lookup(object, parent);
+		negated = true;
+	}
 
 	/* we need to create the object if it's negative */
 	if (key || object->type == FSCACHE_COOKIE_TYPE_INDEX) {
@@ -444,6 +455,9 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 			fscache_object_retrying_stale(&object->fscache);
 			goto lookup_again;
 		}
+
+		if (ret < 0)
+			goto check_error_unlock;
 	}
 
 	inode_unlock(d_inode(dir));
@@ -469,14 +483,10 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	/* open a file interface onto a data file */
 	if (object->type != FSCACHE_COOKIE_TYPE_INDEX) {
 		if (d_is_reg(object->dentry)) {
-			const struct address_space_operations *aops;
-
-			ret = -EPERM;
-			aops = d_backing_inode(object->dentry)->i_mapping->a_ops;
-			if (!aops->bmap)
-				goto check_error;
-			if (object->dentry->d_sb->s_blocksize > PAGE_SIZE)
+			if (object->dentry->d_sb->s_blocksize > PAGE_SIZE) {
+				pr_warn("cachefiles: Block size too large\n");
 				goto check_error;
+			}
 
 			object->backer = object->dentry;
 		} else {
@@ -484,11 +494,15 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 		}
 	}
 
-	object->new = 0;
-	fscache_obtained_object(&object->fscache);
+	if (object->new)
+		object->fscache.stage = FSCACHE_OBJECT_STAGE_LIVE_EMPTY;
+	else
+		object->fscache.stage = FSCACHE_OBJECT_STAGE_LIVE;
+	wake_up_var(&object->fscache.stage);
 
-	_leave(" = 0 [%lu]", d_backing_inode(object->dentry)->i_ino);
-	return 0;
+	object->new = false;
+	_leave(" = t [%lu]", d_backing_inode(object->dentry)->i_ino);
+	return true;
 
 no_space_error:
 	fscache_object_mark_killed(&object->fscache, FSCACHE_OBJECT_NO_SPACE);
@@ -520,8 +534,10 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 error_out2:
 	dput(dir);
 error_out:
-	_leave(" = error %d", -ret);
-	return ret;
+	object->fscache.stage = FSCACHE_OBJECT_STAGE_DEAD;
+	wake_up_var(&object->fscache.stage);
+	_leave(" = f");
+	return false;
 }
 
 /*
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index a4f1eddebe6f..5b2f6da91cc8 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -122,7 +122,6 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 	if (len > 0)
 		memcpy(buf->data, fscache_get_aux(object->fscache.cookie), len);
 
-	clear_bit(FSCACHE_COOKIE_AUX_UPDATED, &object->fscache.cookie->flags);
 	ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
 			   buf, sizeof(struct cachefiles_xattr) + len,
 			   xattr_flags);
diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
index 7b10c6aad157..396e1b5fdc28 100644
--- a/fs/fscache/Makefile
+++ b/fs/fscache/Makefile
@@ -10,7 +10,8 @@ fscache-y := \
 	fsdef.o \
 	main.o \
 	netfs.o \
-	object.o
+	obj.o \
+	object_bits.o
 
 fscache-$(CONFIG_PROC_FS) += proc.o
 fscache-$(CONFIG_FSCACHE_STATS) += stats.o
diff --git a/fs/fscache/cache.c b/fs/fscache/cache.c
index 92f747b72ec3..8f2dd40f93fb 100644
--- a/fs/fscache/cache.c
+++ b/fs/fscache/cache.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 /* FS-Cache cache handling
  *
- * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Copyright (C) 2007, 2019 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  */
 
@@ -93,6 +93,7 @@ struct fscache_cache *fscache_select_cache_for_object(
 	struct fscache_cookie *cookie)
 {
 	struct fscache_cache_tag *tag;
+	struct fscache_cookie *parent = cookie->parent;
 	struct fscache_object *object;
 	struct fscache_cache *cache;
 
@@ -104,33 +105,32 @@ struct fscache_cache *fscache_select_cache_for_object(
 	}
 
 	/* we check the parent to determine the cache to use */
-	spin_lock(&cookie->lock);
+	spin_lock(&parent->lock);
 
 	/* the first in the parent's backing list should be the preferred
 	 * cache */
-	if (!hlist_empty(&cookie->backing_objects)) {
-		object = hlist_entry(cookie->backing_objects.first,
+	if (!hlist_empty(&parent->backing_objects)) {
+		object = hlist_entry(parent->backing_objects.first,
 				     struct fscache_object, cookie_link);
 
 		cache = object->cache;
-		if (fscache_object_is_dying(object) ||
-		    test_bit(FSCACHE_IOERROR, &cache->flags))
+		if (test_bit(FSCACHE_IOERROR, &cache->flags))
 			cache = NULL;
 
-		spin_unlock(&cookie->lock);
+		spin_unlock(&parent->lock);
 		_leave(" = %p [parent]", cache);
 		return cache;
 	}
 
 	/* the parent is unbacked */
-	if (cookie->type != FSCACHE_COOKIE_TYPE_INDEX) {
+	if (parent->type != FSCACHE_COOKIE_TYPE_INDEX) {
 		/* cookie not an index and is unbacked */
-		spin_unlock(&cookie->lock);
-		_leave(" = NULL [cookie ub,ni]");
+		spin_unlock(&parent->lock);
+		_leave(" = NULL [parent ub,ni]");
 		return NULL;
 	}
 
-	spin_unlock(&cookie->lock);
+	spin_unlock(&parent->lock);
 
 	tag = cookie->preferred_cache;
 	if (!tag)
@@ -210,10 +210,7 @@ int fscache_add_cache(struct fscache_cache *cache,
 	BUG_ON(!ifsdef);
 
 	cache->flags = 0;
-	ifsdef->event_mask =
-		((1 << NR_FSCACHE_OBJECT_EVENTS) - 1) &
-		~(1 << FSCACHE_OBJECT_EV_CLEARED);
-	__set_bit(FSCACHE_OBJECT_IS_AVAILABLE, &ifsdef->flags);
+	ifsdef->stage = FSCACHE_OBJECT_STAGE_LIVE;
 
 	if (!tagname)
 		tagname = cache->identifier;
@@ -308,33 +305,57 @@ void fscache_io_error(struct fscache_cache *cache)
 EXPORT_SYMBOL(fscache_io_error);
 
 /*
- * request withdrawal of all the objects in a cache
- * - all the objects being withdrawn are moved onto the supplied list
+ * Withdraw an object.
  */
-static void fscache_withdraw_all_objects(struct fscache_cache *cache,
-					 struct list_head *dying_objects)
+static void fscache_withdraw_object(struct fscache_cookie *cookie,
+				    struct fscache_object *object,
+				    int param)
 {
-	struct fscache_object *object;
+	_enter("c=%08x o=%08x", cookie ? cookie->debug_id : 0, object->debug_id);
 
-	while (!list_empty(&cache->object_list)) {
-		spin_lock(&cache->object_list_lock);
+	_debug("WITHDRAW %x", object->debug_id);
 
-		if (!list_empty(&cache->object_list)) {
-			object = list_entry(cache->object_list.next,
-					    struct fscache_object, cache_link);
-			list_move_tail(&object->cache_link, dying_objects);
+	spin_lock(&cookie->lock);
+	cookie->stage = FSCACHE_COOKIE_STAGE_DEAD;
+	hlist_del_init(&object->cookie_link);
+	spin_unlock(&cookie->lock);
 
-			_debug("withdraw %p", object->cookie);
+	fscache_drop_object(cookie, object, param);
+	object->cache->ops->put_object(object, fscache_obj_put_withdraw);
+}
 
-			/* This must be done under object_list_lock to prevent
-			 * a race with fscache_drop_object().
-			 */
-			fscache_raise_event(object, FSCACHE_OBJECT_EV_KILL);
-		}
+/*
+ * Request withdrawal of all the objects in a cache.
+ */
+static void fscache_withdraw_all_objects(struct fscache_cache *cache)
+{
+	struct fscache_object *object;
 
+	_enter("");
+
+	spin_lock(&cache->object_list_lock);
+	while (!list_empty(&cache->object_list)) {
+		/* Go through the list backwards so that we do children before
+		 * their parents.
+		 */
+		object = list_entry(cache->object_list.prev,
+				    struct fscache_object, cache_link);
+		list_del_init(&object->cache_link);
+		cache->ops->grab_object(object, fscache_obj_get_withdraw);
 		spin_unlock(&cache->object_list_lock);
+
+		_debug("o=%08x n=%u", object->debug_id, object->n_children);
+		wait_var_event(&object->n_children, object->n_children == 0);
+
+		fscache_dispatch(object->cookie, object, 0,
+				 fscache_withdraw_object);
+
 		cond_resched();
+		spin_lock(&cache->object_list_lock);
 	}
+	spin_unlock(&cache->object_list_lock);
+
+	_leave("");
 }
 
 /**
@@ -349,12 +370,10 @@ static void fscache_withdraw_all_objects(struct fscache_cache *cache,
  */
 void fscache_withdraw_cache(struct fscache_cache *cache)
 {
-	LIST_HEAD(dying_objects);
-
 	_enter("");
 
-	pr_notice("Withdrawing cache \"%s\"\n",
-		  cache->tag->name);
+	pr_notice("Withdrawing cache \"%s\" (%u objs)\n",
+		  cache->tag->name, atomic_read(&cache->object_count));
 
 	/* make the cache unavailable for cookie acquisition */
 	if (test_and_set_bit(FSCACHE_CACHE_WITHDRAWN, &cache->flags))
@@ -365,29 +384,27 @@ void fscache_withdraw_cache(struct fscache_cache *cache)
 	cache->tag->cache = NULL;
 	up_write(&fscache_addremove_sem);
 
-	/* make sure all pages pinned by operations on behalf of the netfs are
-	 * written to disk */
-	fscache_stat(&fscache_n_cop_sync_cache);
-	cache->ops->sync_cache(cache);
-	fscache_stat_d(&fscache_n_cop_sync_cache);
-
 	/* we now have to destroy all the active objects pertaining to this
 	 * cache - which we do by passing them off to thread pool to be
 	 * disposed of */
 	_debug("destroy");
 
-	fscache_withdraw_all_objects(cache, &dying_objects);
+	fscache_withdraw_all_objects(cache);
+
+	/* make sure all outstanding data is written to disk */
+	fscache_stat(&fscache_n_cop_sync_cache);
+	cache->ops->sync_cache(cache);
+	fscache_stat_d(&fscache_n_cop_sync_cache);
 
 	/* wait for all extant objects to finish their outstanding operations
 	 * and go away */
-	_debug("wait for finish");
+	_debug("wait for finish %u", atomic_read(&cache->object_count));
 	wait_event(fscache_cache_cleared_wq,
 		   atomic_read(&cache->object_count) == 0);
 	_debug("wait for clearance");
 	wait_event(fscache_cache_cleared_wq,
 		   list_empty(&cache->object_list));
 	_debug("cleared");
-	ASSERT(list_empty(&dying_objects));
 
 	kobject_put(cache->kobj);
 
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index aa33d16711e1..a8aa1639e93b 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 /* netfs cookie management
  *
- * Copyright (C) 2004-2007 Red Hat, Inc. All Rights Reserved.
+ * Copyright (C) 2004-2007, 2019 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
  * See Documentation/filesystems/caching/netfs-api.rst for more information on
@@ -15,19 +15,11 @@
 
 struct kmem_cache *fscache_cookie_jar;
 
-static atomic_t fscache_object_debug_id = ATOMIC_INIT(0);
-
 #define fscache_cookie_hash_shift 15
 static struct hlist_bl_head fscache_cookie_hash[1 << fscache_cookie_hash_shift];
 static LIST_HEAD(fscache_cookies);
 static DEFINE_RWLOCK(fscache_cookies_lock);
 
-static int fscache_acquire_non_index_cookie(struct fscache_cookie *cookie);
-static int fscache_alloc_object(struct fscache_cache *cache,
-				struct fscache_cookie *cookie);
-static int fscache_attach_object(struct fscache_cookie *cookie,
-				 struct fscache_object *object);
-
 static void fscache_print_cookie(struct fscache_cookie *cookie, char prefix)
 {
 	struct fscache_object *object;
@@ -162,8 +154,8 @@ struct fscache_cookie *fscache_alloc_cookie(
 	if (!cookie)
 		return NULL;
 
-	cookie->type = type;
-	cookie->advice = advice;
+	cookie->type	= type;
+	cookie->advice	= advice;
 	cookie->key_len = index_key_len;
 	cookie->aux_len = aux_data_len;
 	cookie->object_size = object_size;
@@ -180,19 +172,21 @@ struct fscache_cookie *fscache_alloc_cookie(
 			goto nomem;
 	}
 
+	cookie->parent = parent;
 	atomic_set(&cookie->usage, 1);
 	atomic_set(&cookie->n_children, 0);
+	atomic_set(&cookie->n_ops, 1);
 	cookie->debug_id = atomic_inc_return(&fscache_cookie_debug_id);
 
+	cookie->stage = FSCACHE_COOKIE_STAGE_QUIESCENT;
+
 	/* We keep the active count elevated until relinquishment to prevent an
 	 * attempt to wake up every time the object operations queue quiesces.
 	 */
 	atomic_set(&cookie->n_active, 1);
 
-	cookie->parent		= parent;
 	cookie->preferred_cache	= fscache_get_cache_tag(preferred_cache);
-	
-	cookie->flags		= (1 << FSCACHE_COOKIE_NO_DATA_YET);
+
 	spin_lock_init(&cookie->lock);
 	INIT_HLIST_HEAD(&cookie->backing_objects);
 
@@ -270,13 +264,12 @@ struct fscache_cookie *__fscache_acquire_cookie(
 	struct fscache_cache_tag *preferred_cache,
 	const void *index_key, size_t index_key_len,
 	const void *aux_data, size_t aux_data_len,
-	loff_t object_size,
-	bool enable)
+	loff_t object_size)
 {
 	struct fscache_cookie *candidate, *cookie;
 
-	_enter("{%s},{%s},%u",
-	       parent ? parent->type_name : "<no-parent>", type_name, enable);
+	_enter("{%s},{%s}",
+	       parent ? parent->type_name : "<no-parent>", type_name);
 
 	if (!index_key || !index_key_len || index_key_len > 255 || aux_data_len > 255)
 		return NULL;
@@ -308,6 +301,7 @@ struct fscache_cookie *__fscache_acquire_cookie(
 		_leave(" [ENOMEM]");
 		return NULL;
 	}
+	trace_fscache_cookie(candidate, fscache_cookie_new_acquire, 1);
 
 	cookie = fscache_hash_cookie(candidate);
 	if (!cookie) {
@@ -331,28 +325,8 @@ struct fscache_cookie *__fscache_acquire_cookie(
 	}
 
 	trace_fscache_acquire(cookie);
-
-	if (enable) {
-		/* if the object is an index then we need do nothing more here
-		 * - we create indices on disk when we need them as an index
-		 * may exist in multiple caches */
-		if (cookie->type != FSCACHE_COOKIE_TYPE_INDEX) {
-			if (fscache_acquire_non_index_cookie(cookie) == 0) {
-				set_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags);
-			} else {
-				atomic_dec(&parent->n_children);
-				fscache_cookie_put(cookie,
-						   fscache_cookie_put_acquire_nobufs);
-				fscache_stat(&fscache_n_acquires_nobufs);
-				_leave(" = NULL");
-				return NULL;
-			}
-		} else {
-			set_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags);
-		}
-	}
-
 	fscache_stat(&fscache_n_acquires_ok);
+	_leave(" = c=%08x", cookie->debug_id);
 
 out:
 	fscache_free_cookie(candidate);
@@ -361,211 +335,80 @@ struct fscache_cookie *__fscache_acquire_cookie(
 EXPORT_SYMBOL(__fscache_acquire_cookie);
 
 /*
- * Enable a cookie to permit it to accept new operations.
+ * Start using the cookie for I/O.  This prevents the backing object from being
+ * reaped by VM pressure.
  */
-void __fscache_enable_cookie(struct fscache_cookie *cookie,
-			     const void *aux_data,
-			     loff_t object_size,
-			     bool (*can_enable)(void *data),
-			     void *data)
+void __fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify)
 {
-	_enter("%p", cookie);
+	enum fscache_cookie_stage stage;
 
-	trace_fscache_enable(cookie);
+	_enter("c=%08x", cookie->debug_id);
 
-	wait_on_bit_lock(&cookie->flags, FSCACHE_COOKIE_ENABLEMENT_LOCK,
-			 TASK_UNINTERRUPTIBLE);
-
-	cookie->object_size = object_size;
-	fscache_update_aux(cookie, aux_data);
-
-	if (test_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags))
-		goto out_unlock;
-
-	if (can_enable && !can_enable(data)) {
-		/* The netfs decided it didn't want to enable after all */
-	} else if (cookie->type != FSCACHE_COOKIE_TYPE_INDEX) {
-		/* Wait for outstanding disablement to complete */
-		__fscache_wait_on_invalidate(cookie);
-
-		if (fscache_acquire_non_index_cookie(cookie) == 0)
-			set_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags);
-	} else {
-		set_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags);
-	}
-
-out_unlock:
-	clear_bit_unlock(FSCACHE_COOKIE_ENABLEMENT_LOCK, &cookie->flags);
-	wake_up_bit(&cookie->flags, FSCACHE_COOKIE_ENABLEMENT_LOCK);
-}
-EXPORT_SYMBOL(__fscache_enable_cookie);
+	ASSERT(cookie->type != FSCACHE_COOKIE_TYPE_INDEX);
 
-/*
- * acquire a non-index cookie
- * - this must make sure the index chain is instantiated and instantiate the
- *   object representation too
- */
-static int fscache_acquire_non_index_cookie(struct fscache_cookie *cookie)
-{
-	struct fscache_object *object;
-	struct fscache_cache *cache;
-	int ret;
-
-	_enter("");
-
-	set_bit(FSCACHE_COOKIE_UNAVAILABLE, &cookie->flags);
-
-	/* now we need to see whether the backing objects for this cookie yet
-	 * exist, if not there'll be nothing to search */
-	down_read(&fscache_addremove_sem);
-
-	if (list_empty(&fscache_cache_list)) {
-		up_read(&fscache_addremove_sem);
-		_leave(" = 0 [no caches]");
-		return 0;
-	}
-
-	/* select a cache in which to store the object */
-	cache = fscache_select_cache_for_object(cookie->parent);
-	if (!cache) {
-		up_read(&fscache_addremove_sem);
-		fscache_stat(&fscache_n_acquires_no_cache);
-		_leave(" = -ENOMEDIUM [no cache]");
-		return -ENOMEDIUM;
-	}
-
-	_debug("cache %s", cache->tag->name);
-
-	set_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags);
+	spin_lock(&cookie->lock);
 
-	/* ask the cache to allocate objects for this cookie and its parent
-	 * chain */
-	ret = fscache_alloc_object(cache, cookie);
-	if (ret < 0) {
-		up_read(&fscache_addremove_sem);
-		_leave(" = %d", ret);
-		return ret;
-	}
+	atomic_inc(&cookie->n_active);
 
-	spin_lock(&cookie->lock);
-	if (hlist_empty(&cookie->backing_objects)) {
+again:
+	stage = cookie->stage;
+	switch (stage) {
+	case FSCACHE_COOKIE_STAGE_QUIESCENT:
+		cookie->stage = FSCACHE_COOKIE_STAGE_INITIALISING;
 		spin_unlock(&cookie->lock);
-		goto unavailable;
-	}
 
-	object = hlist_entry(cookie->backing_objects.first,
-			     struct fscache_object, cookie_link);
+		/* The lookup job holds its own active increment */
+		atomic_inc(&cookie->n_active);
+		fscache_dispatch(cookie, NULL, 0, fscache_lookup_object);
+		break;
 
-	/* initiate the process of looking up all the objects in the chain
-	 * (done by fscache_initialise_object()) */
-	fscache_raise_event(object, FSCACHE_OBJECT_EV_NEW_CHILD);
+	case FSCACHE_COOKIE_STAGE_INITIALISING:
+	case FSCACHE_COOKIE_STAGE_LOOKING_UP:
+		spin_unlock(&cookie->lock);
+		wait_var_event(&cookie->stage, cookie->stage != stage);
+		spin_lock(&cookie->lock);
+		goto again;
 
-	spin_unlock(&cookie->lock);
+	case FSCACHE_COOKIE_STAGE_NO_DATA_YET:
+	case FSCACHE_COOKIE_STAGE_ACTIVE:
+	case FSCACHE_COOKIE_STAGE_INVALIDATING:
+		// TODO: Handle will_modify
+		spin_unlock(&cookie->lock);
+		break;
 
-	/* we may be required to wait for lookup to complete at this point */
-	if (!fscache_defer_lookup) {
-		_debug("non-deferred lookup %p", &cookie->flags);
-		wait_on_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP,
-			    TASK_UNINTERRUPTIBLE);
-		_debug("complete");
-		if (test_bit(FSCACHE_COOKIE_UNAVAILABLE, &cookie->flags))
-			goto unavailable;
+	case FSCACHE_COOKIE_STAGE_DEAD:
+		spin_unlock(&cookie->lock);
+		kdebug("dead");
+		break;
 	}
 
-	up_read(&fscache_addremove_sem);
-	_leave(" = 0 [deferred]");
-	return 0;
-
-unavailable:
-	up_read(&fscache_addremove_sem);
-	_leave(" = -ENOBUFS");
-	return -ENOBUFS;
+	_leave("");
 }
+EXPORT_SYMBOL(__fscache_use_cookie);
 
 /*
- * recursively allocate cache object records for a cookie/cache combination
- * - caller must be holding the addremove sem
+ * Stop using the cookie for I/O.
  */
-static int fscache_alloc_object(struct fscache_cache *cache,
-				struct fscache_cookie *cookie)
+void __fscache_unuse_cookie(struct fscache_cookie *cookie,
+			    const void *aux_data, const loff_t *object_size)
 {
-	struct fscache_object *object;
-	int ret;
-
-	_enter("%p,%p{%s}", cache, cookie, cookie->type_name);
-
-	spin_lock(&cookie->lock);
-	hlist_for_each_entry(object, &cookie->backing_objects,
-			     cookie_link) {
-		if (object->cache == cache)
-			goto object_already_extant;
-	}
-	spin_unlock(&cookie->lock);
-
-	/* ask the cache to allocate an object (we may end up with duplicate
-	 * objects at this stage, but we sort that out later) */
-	fscache_stat(&fscache_n_cop_alloc_object);
-	object = cache->ops->alloc_object(cache, cookie);
-	fscache_stat_d(&fscache_n_cop_alloc_object);
-	if (IS_ERR(object)) {
-		fscache_stat(&fscache_n_object_no_alloc);
-		ret = PTR_ERR(object);
-		goto error;
-	}
-
-	ASSERTCMP(object->cookie, ==, cookie);
-	fscache_stat(&fscache_n_object_alloc);
-
-	object->debug_id = atomic_inc_return(&fscache_object_debug_id);
-
-	_debug("ALLOC OBJ%x: %s {%lx}",
-	       object->debug_id, cookie->type_name, object->events);
-
-	ret = fscache_alloc_object(cache, cookie->parent);
-	if (ret < 0)
-		goto error_put;
-
-	/* only attach if we managed to allocate all we needed, otherwise
-	 * discard the object we just allocated and instead use the one
-	 * attached to the cookie */
-	if (fscache_attach_object(cookie, object) < 0) {
-		fscache_stat(&fscache_n_cop_put_object);
-		cache->ops->put_object(object, fscache_obj_put_attach_fail);
-		fscache_stat_d(&fscache_n_cop_put_object);
-	}
-
-	_leave(" = 0");
-	return 0;
-
-object_already_extant:
-	ret = -ENOBUFS;
-	if (fscache_object_is_dying(object) ||
-	    fscache_cache_is_broken(object)) {
-		spin_unlock(&cookie->lock);
-		goto error;
-	}
-	spin_unlock(&cookie->lock);
-	_leave(" = 0 [found]");
-	return 0;
-
-error_put:
-	fscache_stat(&fscache_n_cop_put_object);
-	cache->ops->put_object(object, fscache_obj_put_alloc_fail);
-	fscache_stat_d(&fscache_n_cop_put_object);
-error:
-	_leave(" = %d", ret);
-	return ret;
+	if (aux_data || object_size)
+		__fscache_update_cookie(cookie, aux_data, object_size);
+	if (atomic_dec_and_test(&cookie->n_active))
+		wake_up_var(&cookie->n_active);
 }
+EXPORT_SYMBOL(__fscache_unuse_cookie);
 
 /*
- * attach a cache object to a cookie
+ * Attempt to attach the object to the list on the cookie or, if there's an
+ * object already attached, then that is used instead and a ref is taken on it
+ * for the caller.  Returns a pointer to whichever object is selected.
  */
-static int fscache_attach_object(struct fscache_cookie *cookie,
-				 struct fscache_object *object)
+struct fscache_object *fscache_attach_object(struct fscache_cookie *cookie,
+					     struct fscache_object *object)
 {
-	struct fscache_object *p;
+	struct fscache_object *parent, *p, *ret = NULL;
 	struct fscache_cache *cache = object->cache;
-	int ret;
 
 	_enter("{%s},{OBJ%x}", cookie->type_name, object->debug_id);
 
@@ -575,38 +418,26 @@ static int fscache_attach_object(struct fscache_cookie *cookie,
 
 	/* there may be multiple initial creations of this object, but we only
 	 * want one */
-	ret = -EEXIST;
 	hlist_for_each_entry(p, &cookie->backing_objects, cookie_link) {
-		if (p->cache == object->cache) {
-			if (fscache_object_is_dying(p))
-				ret = -ENOBUFS;
-			goto cant_attach_object;
-		}
+		if (p->cache == object->cache)
+			goto exists;
 	}
 
 	/* pin the parent object */
 	spin_lock_nested(&cookie->parent->lock, 1);
-	hlist_for_each_entry(p, &cookie->parent->backing_objects,
-			     cookie_link) {
-		if (p->cache == object->cache) {
-			if (fscache_object_is_dying(p)) {
-				ret = -ENOBUFS;
-				spin_unlock(&cookie->parent->lock);
-				goto cant_attach_object;
-			}
-			object->parent = p;
-			spin_lock(&p->lock);
-			p->n_children++;
-			spin_unlock(&p->lock);
-			break;
-		}
-	}
+
+	parent = object->parent;
+
+	spin_lock(&parent->lock);
+	parent->n_children++;
+	spin_unlock(&parent->lock);
+
 	spin_unlock(&cookie->parent->lock);
 
 	/* attach to the cache's object list */
 	if (list_empty(&object->cache_link)) {
 		spin_lock(&cache->object_list_lock);
-		list_add(&object->cache_link, &cache->object_list);
+		list_add_tail(&object->cache_link, &cache->object_list);
 		spin_unlock(&cache->object_list_lock);
 	}
 
@@ -614,12 +445,29 @@ static int fscache_attach_object(struct fscache_cookie *cookie,
 	hlist_add_head(&object->cookie_link, &cookie->backing_objects);
 
 	fscache_objlist_add(object);
-	ret = 0;
+	cookie->stage = FSCACHE_COOKIE_STAGE_LOOKING_UP;
+	wake_up_var(&cookie->stage);
 
-cant_attach_object:
+out_grab:
+	ret = cache->ops->grab_object(object, fscache_obj_get_attach);
 	spin_unlock(&cookie->lock);
-	_leave(" = %d", ret);
+	_leave(" = %p", ret);
 	return ret;
+
+exists:
+	object = p;
+	goto out_grab;
+}
+
+void fscache_set_cookie_stage(struct fscache_cookie *cookie,
+			      enum fscache_cookie_stage stage)
+{
+	if (cookie->stage != stage) {
+		spin_lock(&cookie->lock);
+		cookie->stage = stage;
+		spin_unlock(&cookie->lock);
+		wake_up_var(&cookie->stage);
+	}
 }
 
 /*
@@ -627,8 +475,6 @@ static int fscache_attach_object(struct fscache_cookie *cookie,
  */
 void __fscache_invalidate(struct fscache_cookie *cookie)
 {
-	struct fscache_object *object;
-
 	_enter("{%s}", cookie->type_name);
 
 	fscache_stat(&fscache_n_invalidates);
@@ -640,26 +486,9 @@ void __fscache_invalidate(struct fscache_cookie *cookie)
 	 */
 	ASSERTCMP(cookie->type, ==, FSCACHE_COOKIE_TYPE_DATAFILE);
 
-	/* If there's an object, we tell the object state machine to handle the
-	 * invalidation on our behalf, otherwise there's nothing to do.
-	 */
-	if (!hlist_empty(&cookie->backing_objects)) {
-		spin_lock(&cookie->lock);
-
-		if (fscache_cookie_enabled(cookie) &&
-		    !hlist_empty(&cookie->backing_objects) &&
-		    !test_and_set_bit(FSCACHE_COOKIE_INVALIDATING,
-				      &cookie->flags)) {
-			object = hlist_entry(cookie->backing_objects.first,
-					     struct fscache_object,
-					     cookie_link);
-			/* TODO: Do invalidation */
-		}
-
-		spin_unlock(&cookie->lock);
-	}
-
-	_leave("");
+	if (!hlist_empty(&cookie->backing_objects) &&
+	    test_and_set_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags))
+		fscache_dispatch(cookie, NULL, 0, fscache_invalidate_object);
 }
 EXPORT_SYMBOL(__fscache_invalidate);
 
@@ -678,9 +507,10 @@ void __fscache_wait_on_invalidate(struct fscache_cookie *cookie)
 EXPORT_SYMBOL(__fscache_wait_on_invalidate);
 
 /*
- * update the index entries backing a cookie
+ * Update the index entries backing a cookie.  The writeback is done lazily.
  */
-void __fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data)
+void __fscache_update_cookie(struct fscache_cookie *cookie,
+			     const void *aux_data, const loff_t *object_size)
 {
 	struct fscache_object *object;
 
@@ -696,16 +526,9 @@ void __fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data
 
 	spin_lock(&cookie->lock);
 
-	fscache_update_aux(cookie, aux_data);
-
-	if (fscache_cookie_enabled(cookie)) {
-		/* update the index entry on disk in each cache backing this
-		 * cookie.
-		 */
-		hlist_for_each_entry(object,
-				     &cookie->backing_objects, cookie_link) {
-			fscache_raise_event(object, FSCACHE_OBJECT_EV_UPDATE);
-		}
+	fscache_update_aux(cookie, aux_data, object_size);
+	hlist_for_each_entry(object, &cookie->backing_objects, cookie_link) {
+		set_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->flags);
 	}
 
 	spin_unlock(&cookie->lock);
@@ -713,90 +536,13 @@ void __fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data
 }
 EXPORT_SYMBOL(__fscache_update_cookie);
 
-/*
- * Disable a cookie to stop it from accepting new requests from the netfs.
- */
-void __fscache_disable_cookie(struct fscache_cookie *cookie,
-			      const void *aux_data,
-			      bool invalidate)
-{
-	struct fscache_object *object;
-	bool awaken = false;
-
-	_enter("%p,%u", cookie, invalidate);
-
-	trace_fscache_disable(cookie);
-
-	ASSERTCMP(atomic_read(&cookie->n_active), >, 0);
-
-	if (atomic_read(&cookie->n_children) != 0) {
-		pr_err("Cookie '%s' still has children\n",
-		       cookie->type_name);
-		BUG();
-	}
-
-	wait_on_bit_lock(&cookie->flags, FSCACHE_COOKIE_ENABLEMENT_LOCK,
-			 TASK_UNINTERRUPTIBLE);
-
-	fscache_update_aux(cookie, aux_data);
-
-	if (!test_and_clear_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags))
-		goto out_unlock_enable;
-
-	/* If the cookie is being invalidated, wait for that to complete first
-	 * so that we can reuse the flag.
-	 */
-	__fscache_wait_on_invalidate(cookie);
-
-	/* Dispose of the backing objects */
-	set_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags);
-
-	spin_lock(&cookie->lock);
-	if (!hlist_empty(&cookie->backing_objects)) {
-		hlist_for_each_entry(object, &cookie->backing_objects, cookie_link) {
-			if (invalidate)
-				set_bit(FSCACHE_OBJECT_RETIRED, &object->flags);
-			fscache_raise_event(object, FSCACHE_OBJECT_EV_KILL);
-		}
-	} else {
-		if (test_and_clear_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags))
-			awaken = true;
-	}
-	spin_unlock(&cookie->lock);
-	if (awaken)
-		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING);
-
-	/* Wait for cessation of activity requiring access to the netfs (when
-	 * n_active reaches 0).  This makes sure outstanding reads and writes
-	 * have completed.
-	 */
-	if (!atomic_dec_and_test(&cookie->n_active)) {
-		wait_var_event(&cookie->n_active,
-			       !atomic_read(&cookie->n_active));
-	}
-
-	/* Reset the cookie state if it wasn't relinquished */
-	if (!test_bit(FSCACHE_COOKIE_RELINQUISHED, &cookie->flags)) {
-		atomic_inc(&cookie->n_active);
-		set_bit(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags);
-	}
-
-out_unlock_enable:
-	clear_bit_unlock(FSCACHE_COOKIE_ENABLEMENT_LOCK, &cookie->flags);
-	wake_up_bit(&cookie->flags, FSCACHE_COOKIE_ENABLEMENT_LOCK);
-	_leave("");
-}
-EXPORT_SYMBOL(__fscache_disable_cookie);
-
 /*
  * release a cookie back to the cache
  * - the object will be marked as recyclable on disk if retire is true
  * - all dependents of this cookie must have already been unregistered
  *   (indices/files/pages)
  */
-void __fscache_relinquish_cookie(struct fscache_cookie *cookie,
-				 const void *aux_data,
-				 bool retire)
+void __fscache_relinquish_cookie(struct fscache_cookie *cookie, bool retire)
 {
 	fscache_stat(&fscache_n_relinquishes);
 	if (retire)
@@ -818,7 +564,23 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie,
 	if (test_and_set_bit(FSCACHE_COOKIE_RELINQUISHED, &cookie->flags))
 		BUG();
 
-	__fscache_disable_cookie(cookie, aux_data, retire);
+	/* Mark the cookie out of service and wait for cessation of I/O. */
+	spin_lock(&cookie->lock);
+	cookie->stage = FSCACHE_COOKIE_STAGE_DEAD;
+	atomic_dec(&cookie->n_ops);
+	spin_unlock(&cookie->lock);
+
+	wait_var_event(&cookie->n_ops, atomic_read(&cookie->n_ops) == 0);
+
+	/* Dispose of the backing objects */
+	if (!hlist_empty(&cookie->backing_objects))
+		fscache_dispatch(cookie, NULL, retire, fscache_discard_objects);
+
+	if (atomic_read(&cookie->n_children) != 0) {
+		pr_err("Cookie '%s' still has children\n",
+		       cookie->type_name);
+		BUG();
+	}
 
 	if (cookie->parent) {
 		ASSERTCMP(atomic_read(&cookie->parent->usage), >, 0);
@@ -879,6 +641,7 @@ void fscache_cookie_put(struct fscache_cookie *cookie,
 
 	_leave("");
 }
+EXPORT_SYMBOL(fscache_cookie_put);
 
 /*
  * Generate a list of extant cookies in /proc/fs/fscache/cookies
@@ -892,8 +655,8 @@ static int fscache_cookies_seq_show(struct seq_file *m, void *v)
 
 	if (v == &fscache_cookies) {
 		seq_puts(m,
-			 "COOKIE   PARENT   USAGE CHILD ACT TY FL  DEF             \n"
-			 "======== ======== ===== ===== === == === ================\n"
+			 "COOKIE   PARENT   USAGE CHILD ACT OPS TY S FL  DEF             \n"
+			 "======== ======== ===== ===== === === == = === ================\n"
 			 );
 		return 0;
 	}
@@ -915,13 +678,15 @@ static int fscache_cookies_seq_show(struct seq_file *m, void *v)
 	}
 
 	seq_printf(m,
-		   "%08x %08x %5u %5u %3u %s %03lx %-16s",
+		   "%08x %08x %5d %5d %3d %3d %s %u %03lx %-16s",
 		   cookie->debug_id,
 		   cookie->parent ? cookie->parent->debug_id : 0,
 		   atomic_read(&cookie->usage),
 		   atomic_read(&cookie->n_children),
 		   atomic_read(&cookie->n_active),
+		   atomic_read(&cookie->n_ops) - 1,
 		   type,
+		   cookie->stage,
 		   cookie->flags,
 		   cookie->type_name);
 
diff --git a/fs/fscache/fsdef.c b/fs/fscache/fsdef.c
index b802cbddb578..377151a6a54a 100644
--- a/fs/fscache/fsdef.c
+++ b/fs/fscache/fsdef.c
@@ -40,7 +40,6 @@ struct fscache_cookie fscache_fsdef_index = {
 	.lock		= __SPIN_LOCK_UNLOCKED(fscache_fsdef_index.lock),
 	.backing_objects = HLIST_HEAD_INIT,
 	.type_name	= ".fscach",
-	.flags		= 1 << FSCACHE_COOKIE_ENABLED,
 	.type		= FSCACHE_COOKIE_TYPE_INDEX,
 };
 EXPORT_SYMBOL(fscache_fsdef_index);
diff --git a/fs/fscache/histogram.c b/fs/fscache/histogram.c
index 4e5beeaaf454..4fd5dda1d0ad 100644
--- a/fs/fscache/histogram.c
+++ b/fs/fscache/histogram.c
@@ -5,7 +5,7 @@
  * Written by David Howells (dhowells@redhat.com)
  */
 
-#define FSCACHE_DEBUG_LEVEL THREAD
+#define FSCACHE_DEBUG_LEVEL CACHE
 #include <linux/module.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 2100e2222884..443671310e31 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -74,6 +74,10 @@ extern struct fscache_cookie *fscache_alloc_cookie(struct fscache_cookie *,
 extern struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *);
 extern void fscache_cookie_put(struct fscache_cookie *,
 			       enum fscache_cookie_trace);
+extern struct fscache_object *fscache_attach_object(struct fscache_cookie *,
+						    struct fscache_object *);
+extern void fscache_set_cookie_stage(struct fscache_cookie *,
+				     enum fscache_cookie_stage);
 
 /*
  * dispatcher.c
@@ -119,19 +123,15 @@ extern unsigned fscache_defer_lookup;
 extern unsigned fscache_defer_create;
 extern unsigned fscache_debug;
 extern struct kobject *fscache_root;
-extern struct workqueue_struct *fscache_object_wq;
 extern struct workqueue_struct *fscache_op_wq;
-DECLARE_PER_CPU(wait_queue_head_t, fscache_object_cong_wait);
-
-static inline bool fscache_object_congested(void)
-{
-	return workqueue_congested(WORK_CPU_UNBOUND, fscache_object_wq);
-}
 
 /*
- * object.c
+ * obj.c
  */
-extern void fscache_enqueue_object(struct fscache_object *);
+extern void fscache_lookup_object(struct fscache_cookie *, struct fscache_object *, int);
+extern void fscache_invalidate_object(struct fscache_cookie *, struct fscache_object *, int);
+extern void fscache_drop_object(struct fscache_cookie *, struct fscache_object *, bool);
+extern void fscache_discard_objects(struct fscache_cookie *, struct fscache_object *, int);
 
 /*
  * object-list.c
@@ -239,44 +239,29 @@ int fscache_stats_show(struct seq_file *m, void *v);
 #define fscache_stat_d(stat) do {} while (0)
 #endif
 
-/*
- * raise an event on an object
- * - if the event is not masked for that object, then the object is
- *   queued for attention by the thread pool.
- */
-static inline void fscache_raise_event(struct fscache_object *object,
-				       unsigned event)
-{
-	BUG_ON(event >= NR_FSCACHE_OBJECT_EVENTS);
-#if 0
-	printk("*** fscache_raise_event(OBJ%d{%lx},%x)\n",
-	       object->debug_id, object->event_mask, (1 << event));
-#endif
-	if (!test_and_set_bit(event, &object->events) &&
-	    test_bit(event, &object->event_mask))
-		fscache_enqueue_object(object);
-}
-
-static inline void fscache_cookie_get(struct fscache_cookie *cookie,
-				      enum fscache_cookie_trace where)
+static inline
+struct fscache_cookie *fscache_cookie_get(struct fscache_cookie *cookie,
+					  enum fscache_cookie_trace where)
 {
 	int usage = atomic_inc_return(&cookie->usage);
 
 	trace_fscache_cookie(cookie, where, usage);
+	return cookie;
 }
 
 /*
  * Update the auxiliary data on a cookie.
  */
 static inline
-void fscache_update_aux(struct fscache_cookie *cookie, const void *aux_data)
+void fscache_update_aux(struct fscache_cookie *cookie,
+			const void *aux_data, const loff_t *object_size)
 {
 	void *p = fscache_get_aux(cookie);
 
-	if (p && memcmp(p, aux_data, cookie->aux_len) != 0) {
+	if (aux_data && p)
 		memcpy(p, aux_data, cookie->aux_len);
-		set_bit(FSCACHE_COOKIE_AUX_UPDATED, &cookie->flags);
-	}
+	if (object_size)
+		cookie->object_size = *object_size;
 }
 
 /*****************************************************************************/
@@ -337,7 +322,7 @@ do {						\
 
 #define FSCACHE_DEBUG_CACHE	0
 #define FSCACHE_DEBUG_COOKIE	1
-#define FSCACHE_DEBUG_PAGE	2
+#define FSCACHE_DEBUG_OBJECT	2
 #define FSCACHE_DEBUG_OPERATION	3
 
 #define FSCACHE_POINT_ENTER	1
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index c8f1beafa8e1..b2691439377b 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -38,11 +38,8 @@ MODULE_PARM_DESC(fscache_debug,
 		 "FS-Cache debugging mask");
 
 struct kobject *fscache_root;
-struct workqueue_struct *fscache_object_wq;
 struct workqueue_struct *fscache_op_wq;
 
-DEFINE_PER_CPU(wait_queue_head_t, fscache_object_cong_wait);
-
 /* these values serve as lower bounds, will be adjusted in fscache_init() */
 static unsigned fscache_object_max_active = 4;
 static unsigned fscache_op_max_active = 2;
@@ -64,14 +61,6 @@ static int fscache_max_active_sysctl(struct ctl_table *table, int write,
 }
 
 static struct ctl_table fscache_sysctls[] = {
-	{
-		.procname	= "object_max_active",
-		.data		= &fscache_object_max_active,
-		.maxlen		= sizeof(unsigned),
-		.mode		= 0644,
-		.proc_handler	= fscache_max_active_sysctl,
-		.extra1		= &fscache_object_wq,
-	},
 	{
 		.procname	= "operation_max_active",
 		.data		= &fscache_op_max_active,
@@ -98,20 +87,8 @@ static struct ctl_table fscache_sysctls_root[] = {
  */
 static int __init fscache_init(void)
 {
-	unsigned int nr_cpus = num_possible_cpus();
-	unsigned int cpu;
 	int ret;
 
-	fscache_object_max_active =
-		clamp_val(nr_cpus,
-			  fscache_object_max_active, WQ_UNBOUND_MAX_ACTIVE);
-
-	ret = -ENOMEM;
-	fscache_object_wq = alloc_workqueue("fscache_object", WQ_UNBOUND,
-					    fscache_object_max_active);
-	if (!fscache_object_wq)
-		goto error_object_wq;
-
 	fscache_op_max_active =
 		clamp_val(fscache_object_max_active / 2,
 			  fscache_op_max_active, WQ_UNBOUND_MAX_ACTIVE);
@@ -122,9 +99,6 @@ static int __init fscache_init(void)
 	if (!fscache_op_wq)
 		goto error_op_wq;
 
-	for_each_possible_cpu(cpu)
-		init_waitqueue_head(&per_cpu(fscache_object_cong_wait, cpu));
-
 	ret = fscache_init_dispatchers();
 	if (ret < 0)
 		goto error_dispatchers;
@@ -169,8 +143,6 @@ static int __init fscache_init(void)
 error_proc:
 	destroy_workqueue(fscache_op_wq);
 error_op_wq:
-	destroy_workqueue(fscache_object_wq);
-error_object_wq:
 	return ret;
 }
 
@@ -191,7 +163,6 @@ static void __exit fscache_exit(void)
 	fscache_proc_cleanup();
 	fscache_kill_dispatchers();
 	destroy_workqueue(fscache_op_wq);
-	destroy_workqueue(fscache_object_wq);
 	pr_notice("Unloaded\n");
 }
 
diff --git a/fs/fscache/netfs.c b/fs/fscache/netfs.c
index f8a816f844f6..7ebf733b068b 100644
--- a/fs/fscache/netfs.c
+++ b/fs/fscache/netfs.c
@@ -32,7 +32,7 @@ int __fscache_register_netfs(struct fscache_netfs *netfs)
 		return -ENOMEM;
 	}
 
-	candidate->flags = 1 << FSCACHE_COOKIE_ENABLED;
+	trace_fscache_cookie(candidate, fscache_cookie_new_netfs, 1);
 
 	/* check the netfs type is not already present */
 	cookie = fscache_hash_cookie(candidate);
@@ -68,7 +68,7 @@ void __fscache_unregister_netfs(struct fscache_netfs *netfs)
 {
 	_enter("{%s.%u}", netfs->name, netfs->version);
 
-	fscache_relinquish_cookie(netfs->primary_index, NULL, false);
+	fscache_relinquish_cookie(netfs->primary_index, false);
 	pr_notice("Netfs '%s' unregistered from caching\n", netfs->name);
 
 	_leave("");
diff --git a/fs/fscache/obj.c b/fs/fscache/obj.c
new file mode 100644
index 000000000000..63373b99ac34
--- /dev/null
+++ b/fs/fscache/obj.c
@@ -0,0 +1,352 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Cache object management
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for more information on
+ * the netfs API.
+ */
+
+#define FSCACHE_DEBUG_LEVEL OPERATION
+#include <linux/module.h>
+#include <linux/slab.h>
+#include "internal.h"
+
+static struct fscache_object *fscache_do_alloc_object(struct fscache_cookie *cookie,
+						      struct fscache_cache *cache,
+						      struct fscache_object *parent)
+{
+	struct fscache_object *object;
+
+	fscache_stat(&fscache_n_cop_alloc_object);
+	object = cache->ops->alloc_object(cookie, cache, parent);
+	fscache_stat_d(&fscache_n_cop_alloc_object);
+
+	if (!IS_ERR(object)) {
+		ASSERTCMP(object->cookie, ==, cookie);
+		_debug("ALLOC OBJ%x: %s", object->debug_id, cookie->type_name);
+		fscache_stat(&fscache_n_object_alloc);
+	} else {
+		fscache_stat(&fscache_n_object_no_alloc);
+	}
+
+	return object;
+}
+
+static int fscache_do_lookup_object(struct fscache_object *object, void *data)
+{
+	int ret;
+	fscache_stat(&fscache_n_object_lookups);
+	fscache_stat(&fscache_n_cop_lookup_object);
+	ret = object->cache->ops->lookup_object(object, data);
+	fscache_stat_d(&fscache_n_cop_lookup_object);
+	return ret;
+}
+
+static int fscache_do_create_object(struct fscache_object *object, void *data)
+{
+	int ret;
+	fscache_stat(&fscache_n_object_lookups);
+	fscache_stat(&fscache_n_cop_lookup_object);
+	ret = object->cache->ops->create_object(object, data);
+	fscache_stat_d(&fscache_n_cop_lookup_object);
+	return ret;
+}
+
+static void fscache_do_update_object(struct fscache_object *object)
+{
+	fscache_stat(&fscache_n_updates_run);
+	fscache_stat(&fscache_n_cop_update_object);
+	object->cache->ops->update_object(object);
+	fscache_stat_d(&fscache_n_cop_update_object);
+}
+
+static void fscache_do_drop_object(struct fscache_cache *cache,
+				   struct fscache_object *object,
+				   bool invalidate)
+{
+	fscache_stat(&fscache_n_cop_drop_object);
+	cache->ops->drop_object(object, invalidate);
+	fscache_stat_d(&fscache_n_cop_drop_object);
+}
+
+static void fscache_do_put_object(struct fscache_object *object,
+				  enum fscache_obj_ref_trace why)
+{
+	fscache_stat(&fscache_n_cop_put_object);
+	object->cache->ops->put_object(object, why);
+	fscache_stat_d(&fscache_n_cop_put_object);
+}
+
+/*
+ * Do the actual on-disk wrangling involved in object lookup/creation.
+ */
+static bool fscache_wrangle_object(struct fscache_cookie *cookie,
+				   struct fscache_cache *cache,
+				   struct fscache_object *object)
+{
+
+	void *lookup_data;
+	bool ret = false;
+
+	lookup_data = cache->ops->prepare_lookup_data(object);
+	if (IS_ERR(lookup_data))
+		goto out;
+
+	if (!fscache_do_lookup_object(object, lookup_data))
+		goto out_free;
+
+	if (object->stage < FSCACHE_OBJECT_STAGE_LIVE_EMPTY &&
+	    !fscache_do_create_object(object, lookup_data))
+		goto out_free;
+
+	fscache_set_cookie_stage(cookie,
+				 (object->stage < FSCACHE_OBJECT_STAGE_LIVE ?
+				  FSCACHE_COOKIE_STAGE_NO_DATA_YET :
+				  FSCACHE_COOKIE_STAGE_ACTIVE));
+	ret = true;
+
+out_free:
+	cache->ops->free_lookup_data(object, lookup_data);
+out:
+	return ret;
+}
+
+/*
+ * Create an object chain, making sure that the index chain is fully created.
+ */
+static struct fscache_object *fscache_lookup_object_chain(struct fscache_cookie *cookie,
+							  struct fscache_cache *cache)
+{
+	struct fscache_object *object = NULL, *parent, *xobject;
+
+	_enter("c=%08x", cookie->debug_id);
+
+	spin_lock(&cookie->lock);
+	hlist_for_each_entry(object, &cookie->backing_objects, cookie_link) {
+		if (object->cache == cache)
+			goto object_exists_grab;
+	}
+	spin_unlock(&cookie->lock);
+
+	/* Recurse to look up/create the parent index. */
+	parent = fscache_lookup_object_chain(cookie->parent, cache);
+	if (IS_ERR(parent))
+		goto error;
+
+	/* Ask the cache to allocate an object (we may end up with duplicate
+	 * objects at this stage, but we sort that out later).
+	 *
+	 * The object may be created, say, with O_TMPFILE at this point if the
+	 * parent index was unpopulated.  Note that this may race on index
+	 * creation with other callers.
+	 */
+	object = fscache_do_alloc_object(cookie, cache, parent);
+	if (!object)
+		goto error;
+
+	xobject = fscache_attach_object(cookie, object);
+	if (xobject != object) {
+		fscache_do_put_object(object, fscache_obj_put_alloc_dup);
+		object = xobject;
+		if (fscache_cache_is_broken(object))
+			goto error_put;
+		goto object_exists;
+	}
+
+	if (!fscache_wrangle_object(cookie, cache, object))
+		goto error_detach;
+
+	_leave(" = %u", object->debug_id);
+	return object;
+
+object_exists_grab:
+	object = cache->ops->grab_object(object, fscache_obj_get_exists);
+	if (fscache_cache_is_broken(object)) {
+		spin_unlock(&cookie->lock);
+		goto error_put;
+	}
+
+	spin_unlock(&cookie->lock);
+
+object_exists:
+	wait_var_event(&object->stage,
+		       object->stage >= FSCACHE_OBJECT_STAGE_LIVE_EMPTY);
+
+	if (object->stage >= FSCACHE_OBJECT_STAGE_DESTROYING)
+		goto error_put;
+
+	_leave(" = %u [found]", object->debug_id);
+	return object;
+
+error_detach:
+	spin_lock(&cookie->lock);
+	spin_lock(&object->lock);
+	object->parent = NULL;
+	object->stage = FSCACHE_OBJECT_STAGE_DEAD;
+	cookie->stage = FSCACHE_COOKIE_STAGE_DEAD;
+	hlist_del_init(&object->cookie_link);
+	spin_unlock(&object->lock);
+	spin_unlock(&cookie->lock);
+	fscache_drop_object(cookie, object, false);
+error_put:
+	fscache_do_put_object(object, fscache_obj_put_lookup_fail);
+error:
+	fscache_set_cookie_stage(cookie, FSCACHE_COOKIE_STAGE_QUIESCENT);
+	_leave(" = NULL");
+	return NULL;
+}
+
+/*
+ * Create an object in the cache.
+ * - this must make sure the index chain is instantiated and instantiate the
+ *   object representation too
+ */
+static void fscache_lookup_object_locked(struct fscache_cookie *cookie)
+{
+	struct fscache_object *object;
+	struct fscache_cache *cache;
+
+	_enter("");
+
+	/* select a cache in which to store the object */
+	cache = fscache_select_cache_for_object(cookie);
+	if (!cache) {
+		fscache_stat(&fscache_n_acquires_no_cache);
+		fscache_set_cookie_stage(cookie, FSCACHE_COOKIE_STAGE_QUIESCENT);
+		_leave(" [no cache]");
+		return;
+	}
+
+	_debug("cache %s", cache->tag->name);
+
+	object = fscache_lookup_object_chain(cookie, cache);
+	if (!object) {
+		_leave(" [fail]");
+		return;
+	}
+
+	fscache_do_put_object(object, fscache_obj_put);
+	_leave(" [done]");
+}
+
+void fscache_lookup_object(struct fscache_cookie *cookie,
+			   struct fscache_object *object, int param)
+{
+	down_read(&fscache_addremove_sem);
+	fscache_lookup_object_locked(cookie);
+	up_read(&fscache_addremove_sem);
+	__fscache_unuse_cookie(cookie, NULL, NULL);
+}
+
+/*
+ * Invalidate an object
+ */
+void fscache_invalidate_object(struct fscache_cookie *cookie,
+			       struct fscache_object *unused, int param)
+{
+	struct fscache_object *object = NULL;
+
+	spin_lock(&cookie->lock);
+
+	if (!hlist_empty(&cookie->backing_objects)) {
+		object = hlist_entry(cookie->backing_objects.first,
+				     struct fscache_object,
+				     cookie_link);
+		object = object->cache->ops->grab_object(object,
+							 fscache_obj_get_inval);
+	}
+
+	spin_unlock(&cookie->lock);
+
+	if (object) {
+		object->cache->ops->invalidate_object(object);
+		fscache_do_put_object(object, fscache_obj_put_inval);
+	}
+
+	clear_bit_unlock(FSCACHE_COOKIE_INVALIDATING, &cookie->flags);
+	wake_up_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING);
+}
+
+/*
+ * Drop an object's attachments
+ */
+void fscache_drop_object(struct fscache_cookie *cookie,
+			 struct fscache_object *object,
+			 bool invalidate)
+{
+	struct fscache_object *parent = object->parent;
+	struct fscache_cache *cache = object->cache;
+
+	ASSERTCMP(cookie->stage, ==, FSCACHE_COOKIE_STAGE_DEAD);
+
+	_enter("{OBJ%x,%d},%u",
+	       object->debug_id, object->n_children, invalidate);
+
+	if (!invalidate &&
+	    test_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->flags)) {
+		_debug("final update");
+		fscache_do_update_object(object);
+	}
+
+	spin_lock(&cache->object_list_lock);
+	list_del_init(&object->cache_link);
+	spin_unlock(&cache->object_list_lock);
+
+	fscache_do_drop_object(cache, object, invalidate);
+
+	/* The parent object wants to know when all it dependents have gone */
+	if (parent) {
+		_debug("release parent OBJ%x {%d}",
+		       parent->debug_id, parent->n_children);
+
+		spin_lock(&parent->lock);
+		parent->n_children--;
+		if (parent->n_children == 0)
+			wake_up_var(&parent->n_children);
+		spin_unlock(&parent->lock);
+		fscache_do_put_object(parent, fscache_obj_put_drop_child);
+	}
+
+	fscache_do_put_object(object, fscache_obj_put_drop_obj);
+	fscache_stat(&fscache_n_object_dead);
+	_leave("");
+}
+
+/*
+ * Discard objects.  param==1 to invalidate it at the same time.
+ */
+void fscache_discard_objects(struct fscache_cookie *cookie,
+			     struct fscache_object *unused, int param)
+{
+	_enter("c=%08x", cookie->debug_id);
+
+	/* Take the cache lock to prevent a fight with cache withdrawal */
+	down_read(&fscache_addremove_sem);
+
+	for (;;) {
+		struct fscache_object *object = NULL;
+
+		trace_fscache_cookie(cookie, fscache_cookie_see_discard,
+				     atomic_read(&cookie->usage));
+
+		spin_lock(&cookie->lock);
+		if (!hlist_empty(&cookie->backing_objects)) {
+			object = hlist_entry(cookie->backing_objects.first,
+					     struct fscache_object,
+					     cookie_link);
+			hlist_del_init(&object->cookie_link);
+		}
+		spin_unlock(&cookie->lock);
+
+		if (!object)
+			break;
+
+		_debug("DISCARD %x", object->debug_id);
+		fscache_drop_object(cookie, object, param);
+	}
+
+	up_read(&fscache_addremove_sem);
+	_leave("");
+}
diff --git a/fs/fscache/object-list.c b/fs/fscache/object-list.c
index 147c556ee01b..5777f909d31a 100644
--- a/fs/fscache/object-list.c
+++ b/fs/fscache/object-list.c
@@ -23,10 +23,6 @@ struct fscache_objlist_data {
 #define FSCACHE_OBJLIST_CONFIG_AUX	0x00000002	/* show object auxdata */
 #define FSCACHE_OBJLIST_CONFIG_BUSY	0x00000010	/* show busy objects */
 #define FSCACHE_OBJLIST_CONFIG_IDLE	0x00000020	/* show idle objects */
-#define FSCACHE_OBJLIST_CONFIG_EVENTS	0x00000400	/* show objects with events */
-#define FSCACHE_OBJLIST_CONFIG_NOEVENTS	0x00000800	/* show objects without no events */
-#define FSCACHE_OBJLIST_CONFIG_WORK	0x00001000	/* show objects with work */
-#define FSCACHE_OBJLIST_CONFIG_NOWORK	0x00002000	/* show objects without work */
 };
 
 /*
@@ -162,32 +158,15 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 	u8 *p;
 
 	if ((unsigned long) v == 1) {
-		seq_puts(m, "OBJECT   PARENT   STAT CHLDN OPS OOP"
-			 " EM EV FL S"
-			 " | COOKIE   TYPE    TY FL");
-		if (config & (FSCACHE_OBJLIST_CONFIG_KEY |
-			      FSCACHE_OBJLIST_CONFIG_AUX))
-			seq_puts(m, "       ");
-		if (config & FSCACHE_OBJLIST_CONFIG_KEY)
-			seq_puts(m, "OBJECT_KEY");
-		if ((config & (FSCACHE_OBJLIST_CONFIG_KEY |
-			       FSCACHE_OBJLIST_CONFIG_AUX)) ==
-		    (FSCACHE_OBJLIST_CONFIG_KEY | FSCACHE_OBJLIST_CONFIG_AUX))
-			seq_puts(m, ", ");
-		if (config & FSCACHE_OBJLIST_CONFIG_AUX)
-			seq_puts(m, "AUX_DATA");
+		seq_puts(m, "OBJECT   PARENT   USE CHLDN OPS FL  S"
+			 " | COOKIE   TYPE    TY S FLG CONTENT_MAP");
 		seq_puts(m, "\n");
 		return 0;
 	}
 
 	if ((unsigned long) v == 2) {
-		seq_puts(m, "======== ======== ==== ===== === ==="
-			 " == == == ="
-			 " | ======== ======= == ===");
-		if (config & (FSCACHE_OBJLIST_CONFIG_KEY |
-			      FSCACHE_OBJLIST_CONFIG_AUX))
-			seq_puts(m, " ================");
-		seq_puts(m, "\n");
+		seq_puts(m, "======== ======== === ===== === === ="
+			 " | ======== ======= == = === ================\n");
 		return 0;
 	}
 
@@ -207,29 +186,19 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 
 	cookie = obj->cookie;
 	if (~config) {
-		FILTER(fscache_object_is_active(obj) ||
-		       obj->n_ops != 0 ||
-		       obj->n_obj_ops != 0 ||
-		       obj->flags ||
-		       !list_empty(&obj->dependents),
+		FILTER(atomic_read(&cookie->n_ops) > 0,
 		       BUSY, IDLE);
-		FILTER(obj->events & obj->event_mask,
-		       EVENTS, NOEVENTS);
-		FILTER(work_busy(&obj->work), WORK, NOWORK);
 	}
 
 	seq_printf(m,
-		   "%08x %08x %s %5u %3u %3u %2lx %2lx %2lx %1x | ",
+		   "%08x %08x %3u %5u %3u %3lx %u | ",
 		   obj->debug_id,
 		   obj->parent ? obj->parent->debug_id : UINT_MAX,
-		   obj->state->short_name,
+		   obj->cache->ops->get_object_usage(obj),
 		   obj->n_children,
-		   obj->n_ops,
-		   obj->n_obj_ops,
-		   obj->event_mask,
-		   obj->events,
+		   atomic_read(&obj->cookie->n_ops),
 		   obj->flags,
-		   work_busy(&obj->work));
+		   obj->stage);
 
 	if (obj->cookie) {
 		uint16_t keylen = 0, auxlen = 0;
@@ -248,10 +217,11 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 			break;
 		}
 
-		seq_printf(m, "%08x %-7s %s %3lx",
+		seq_printf(m, "%08x %-7s %s %u %3lx",
 			   cookie->debug_id,
 			   cookie->type_name,
 			   type,
+			   cookie->stage,
 			   cookie->flags);
 
 		if (config & FSCACHE_OBJLIST_CONFIG_KEY)
@@ -325,8 +295,6 @@ static void fscache_objlist_config(struct fscache_objlist_data *data)
 		case 'A': config |= FSCACHE_OBJLIST_CONFIG_AUX;		break;
 		case 'B': config |= FSCACHE_OBJLIST_CONFIG_BUSY;	break;
 		case 'b': config |= FSCACHE_OBJLIST_CONFIG_IDLE;	break;
-		case 'S': config |= FSCACHE_OBJLIST_CONFIG_WORK;	break;
-		case 's': config |= FSCACHE_OBJLIST_CONFIG_NOWORK;	break;
 		}
 	}
 
@@ -335,10 +303,6 @@ static void fscache_objlist_config(struct fscache_objlist_data *data)
 
 	if (!(config & (FSCACHE_OBJLIST_CONFIG_BUSY | FSCACHE_OBJLIST_CONFIG_IDLE)))
 	    config   |= FSCACHE_OBJLIST_CONFIG_BUSY | FSCACHE_OBJLIST_CONFIG_IDLE;
-	if (!(config & (FSCACHE_OBJLIST_CONFIG_EVENTS | FSCACHE_OBJLIST_CONFIG_NOEVENTS)))
-	    config   |= FSCACHE_OBJLIST_CONFIG_EVENTS | FSCACHE_OBJLIST_CONFIG_NOEVENTS;
-	if (!(config & (FSCACHE_OBJLIST_CONFIG_WORK | FSCACHE_OBJLIST_CONFIG_NOWORK)))
-	    config   |= FSCACHE_OBJLIST_CONFIG_WORK | FSCACHE_OBJLIST_CONFIG_NOWORK;
 
 	data->config = config;
 	return;
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
deleted file mode 100644
index 5d50976bf379..000000000000
--- a/fs/fscache/object.c
+++ /dev/null
@@ -1,982 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/* FS-Cache object state machine handler
- *
- * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- *
- * See Documentation/filesystems/caching/object.rst for a description of the
- * object state machine and the in-kernel representations.
- */
-
-#define FSCACHE_DEBUG_LEVEL COOKIE
-#include <linux/module.h>
-#include <linux/slab.h>
-#include <linux/prefetch.h>
-#include "internal.h"
-
-static const struct fscache_state *fscache_abort_initialisation(struct fscache_object *, int);
-static const struct fscache_state *fscache_kill_dependents(struct fscache_object *, int);
-static const struct fscache_state *fscache_drop_object(struct fscache_object *, int);
-static const struct fscache_state *fscache_initialise_object(struct fscache_object *, int);
-static const struct fscache_state *fscache_invalidate_object(struct fscache_object *, int);
-static const struct fscache_state *fscache_jumpstart_dependents(struct fscache_object *, int);
-static const struct fscache_state *fscache_kill_object(struct fscache_object *, int);
-static const struct fscache_state *fscache_lookup_failure(struct fscache_object *, int);
-static const struct fscache_state *fscache_look_up_object(struct fscache_object *, int);
-static const struct fscache_state *fscache_object_available(struct fscache_object *, int);
-static const struct fscache_state *fscache_parent_ready(struct fscache_object *, int);
-static const struct fscache_state *fscache_update_object(struct fscache_object *, int);
-static const struct fscache_state *fscache_object_dead(struct fscache_object *, int);
-
-#define __STATE_NAME(n) fscache_osm_##n
-#define STATE(n) (&__STATE_NAME(n))
-
-/*
- * Define a work state.  Work states are execution states.  No event processing
- * is performed by them.  The function attached to a work state returns a
- * pointer indicating the next state to which the state machine should
- * transition.  Returning NO_TRANSIT repeats the current state, but goes back
- * to the scheduler first.
- */
-#define WORK_STATE(n, sn, f) \
-	const struct fscache_state __STATE_NAME(n) = {			\
-		.name = #n,						\
-		.short_name = sn,					\
-		.work = f						\
-	}
-
-/*
- * Returns from work states.
- */
-#define transit_to(state) ({ prefetch(&STATE(state)->work); STATE(state); })
-
-#define NO_TRANSIT ((struct fscache_state *)NULL)
-
-/*
- * Define a wait state.  Wait states are event processing states.  No execution
- * is performed by them.  Wait states are just tables of "if event X occurs,
- * clear it and transition to state Y".  The dispatcher returns to the
- * scheduler if none of the events in which the wait state has an interest are
- * currently pending.
- */
-#define WAIT_STATE(n, sn, ...) \
-	const struct fscache_state __STATE_NAME(n) = {			\
-		.name = #n,						\
-		.short_name = sn,					\
-		.work = NULL,						\
-		.transitions = { __VA_ARGS__, { 0, NULL } }		\
-	}
-
-#define TRANSIT_TO(state, emask) \
-	{ .events = (emask), .transit_to = STATE(state) }
-
-/*
- * The object state machine.
- */
-static WORK_STATE(INIT_OBJECT,		"INIT", fscache_initialise_object);
-static WORK_STATE(PARENT_READY,		"PRDY", fscache_parent_ready);
-static WORK_STATE(ABORT_INIT,		"ABRT", fscache_abort_initialisation);
-static WORK_STATE(LOOK_UP_OBJECT,	"LOOK", fscache_look_up_object);
-static WORK_STATE(CREATE_OBJECT,	"CRTO", fscache_look_up_object);
-static WORK_STATE(OBJECT_AVAILABLE,	"AVBL", fscache_object_available);
-static WORK_STATE(JUMPSTART_DEPS,	"JUMP", fscache_jumpstart_dependents);
-
-static WORK_STATE(INVALIDATE_OBJECT,	"INVL", fscache_invalidate_object);
-static WORK_STATE(UPDATE_OBJECT,	"UPDT", fscache_update_object);
-
-static WORK_STATE(LOOKUP_FAILURE,	"LCFL", fscache_lookup_failure);
-static WORK_STATE(KILL_OBJECT,		"KILL", fscache_kill_object);
-static WORK_STATE(KILL_DEPENDENTS,	"KDEP", fscache_kill_dependents);
-static WORK_STATE(DROP_OBJECT,		"DROP", fscache_drop_object);
-static WORK_STATE(OBJECT_DEAD,		"DEAD", fscache_object_dead);
-
-static WAIT_STATE(WAIT_FOR_INIT,	"?INI",
-		  TRANSIT_TO(INIT_OBJECT,	1 << FSCACHE_OBJECT_EV_NEW_CHILD));
-
-static WAIT_STATE(WAIT_FOR_PARENT,	"?PRN",
-		  TRANSIT_TO(PARENT_READY,	1 << FSCACHE_OBJECT_EV_PARENT_READY));
-
-static WAIT_STATE(WAIT_FOR_CMD,		"?CMD",
-		  TRANSIT_TO(INVALIDATE_OBJECT,	1 << FSCACHE_OBJECT_EV_INVALIDATE),
-		  TRANSIT_TO(UPDATE_OBJECT,	1 << FSCACHE_OBJECT_EV_UPDATE),
-		  TRANSIT_TO(JUMPSTART_DEPS,	1 << FSCACHE_OBJECT_EV_NEW_CHILD));
-
-static WAIT_STATE(WAIT_FOR_CLEARANCE,	"?CLR",
-		  TRANSIT_TO(KILL_OBJECT,	1 << FSCACHE_OBJECT_EV_CLEARED));
-
-/*
- * Out-of-band event transition tables.  These are for handling unexpected
- * events, such as an I/O error.  If an OOB event occurs, the state machine
- * clears and disables the event and forces a transition to the nominated work
- * state (acurrently executing work states will complete first).
- *
- * In such a situation, object->state remembers the state the machine should
- * have been in/gone to and returning NO_TRANSIT returns to that.
- */
-static const struct fscache_transition fscache_osm_init_oob[] = {
-	   TRANSIT_TO(ABORT_INIT,
-		      (1 << FSCACHE_OBJECT_EV_ERROR) |
-		      (1 << FSCACHE_OBJECT_EV_KILL)),
-	   { 0, NULL }
-};
-
-static const struct fscache_transition fscache_osm_lookup_oob[] = {
-	   TRANSIT_TO(LOOKUP_FAILURE,
-		      (1 << FSCACHE_OBJECT_EV_ERROR) |
-		      (1 << FSCACHE_OBJECT_EV_KILL)),
-	   { 0, NULL }
-};
-
-static const struct fscache_transition fscache_osm_run_oob[] = {
-	   TRANSIT_TO(KILL_OBJECT,
-		      (1 << FSCACHE_OBJECT_EV_ERROR) |
-		      (1 << FSCACHE_OBJECT_EV_KILL)),
-	   { 0, NULL }
-};
-
-static int  fscache_get_object(struct fscache_object *,
-			       enum fscache_obj_ref_trace);
-static void fscache_put_object(struct fscache_object *,
-			       enum fscache_obj_ref_trace);
-static bool fscache_enqueue_dependents(struct fscache_object *, int);
-static void fscache_dequeue_object(struct fscache_object *);
-static void fscache_update_aux_data(struct fscache_object *);
-
-/*
- * we need to notify the parent when an op completes that we had outstanding
- * upon it
- */
-static inline void fscache_done_parent_op(struct fscache_object *object)
-{
-	struct fscache_object *parent = object->parent;
-
-	_enter("OBJ%x {OBJ%x,%x}",
-	       object->debug_id, parent->debug_id, parent->n_ops);
-
-	spin_lock_nested(&parent->lock, 1);
-	parent->n_obj_ops--;
-	parent->n_ops--;
-	if (parent->n_ops == 0)
-		fscache_raise_event(parent, FSCACHE_OBJECT_EV_CLEARED);
-	spin_unlock(&parent->lock);
-}
-
-/*
- * Object state machine dispatcher.
- */
-static void fscache_object_sm_dispatcher(struct fscache_object *object)
-{
-	const struct fscache_transition *t;
-	const struct fscache_state *state, *new_state;
-	unsigned long events, event_mask;
-	bool oob;
-	int event = -1;
-
-	ASSERT(object != NULL);
-
-	_enter("{OBJ%x,%s,%lx}",
-	       object->debug_id, object->state->name, object->events);
-
-	event_mask = object->event_mask;
-restart:
-	object->event_mask = 0; /* Mask normal event handling */
-	state = object->state;
-restart_masked:
-	events = object->events;
-
-	/* Handle any out-of-band events (typically an error) */
-	if (events & object->oob_event_mask) {
-		_debug("{OBJ%x} oob %lx",
-		       object->debug_id, events & object->oob_event_mask);
-		oob = true;
-		for (t = object->oob_table; t->events; t++) {
-			if (events & t->events) {
-				state = t->transit_to;
-				ASSERT(state->work != NULL);
-				event = fls(events & t->events) - 1;
-				__clear_bit(event, &object->oob_event_mask);
-				clear_bit(event, &object->events);
-				goto execute_work_state;
-			}
-		}
-	}
-	oob = false;
-
-	/* Wait states are just transition tables */
-	if (!state->work) {
-		if (events & event_mask) {
-			for (t = state->transitions; t->events; t++) {
-				if (events & t->events) {
-					new_state = t->transit_to;
-					event = fls(events & t->events) - 1;
-					trace_fscache_osm(object, state,
-							  true, false, event);
-					clear_bit(event, &object->events);
-					_debug("{OBJ%x} ev %d: %s -> %s",
-					       object->debug_id, event,
-					       state->name, new_state->name);
-					object->state = state = new_state;
-					goto execute_work_state;
-				}
-			}
-
-			/* The event mask didn't include all the tabled bits */
-			BUG();
-		}
-		/* Randomly woke up */
-		goto unmask_events;
-	}
-
-execute_work_state:
-	_debug("{OBJ%x} exec %s", object->debug_id, state->name);
-
-	trace_fscache_osm(object, state, false, oob, event);
-	new_state = state->work(object, event);
-	event = -1;
-	if (new_state == NO_TRANSIT) {
-		_debug("{OBJ%x} %s notrans", object->debug_id, state->name);
-		if (unlikely(state == STATE(OBJECT_DEAD))) {
-			_leave(" [dead]");
-			return;
-		}
-		fscache_enqueue_object(object);
-		event_mask = object->oob_event_mask;
-		goto unmask_events;
-	}
-
-	_debug("{OBJ%x} %s -> %s",
-	       object->debug_id, state->name, new_state->name);
-	object->state = state = new_state;
-
-	if (state->work) {
-		if (unlikely(state == STATE(OBJECT_DEAD))) {
-			_leave(" [dead]");
-			return;
-		}
-		goto restart_masked;
-	}
-
-	/* Transited to wait state */
-	event_mask = object->oob_event_mask;
-	for (t = state->transitions; t->events; t++)
-		event_mask |= t->events;
-
-unmask_events:
-	object->event_mask = event_mask;
-	smp_mb();
-	events = object->events;
-	if (events & event_mask)
-		goto restart;
-	_leave(" [msk %lx]", event_mask);
-}
-
-/*
- * execute an object
- */
-static void fscache_object_work_func(struct work_struct *work)
-{
-	struct fscache_object *object =
-		container_of(work, struct fscache_object, work);
-	unsigned long start;
-
-	_enter("{OBJ%x}", object->debug_id);
-
-	start = jiffies;
-	fscache_object_sm_dispatcher(object);
-	fscache_hist(fscache_objs_histogram, start);
-	fscache_put_object(object, fscache_obj_put_work);
-}
-
-/**
- * fscache_object_init - Initialise a cache object description
- * @object: Object description
- * @cookie: Cookie object will be attached to
- * @cache: Cache in which backing object will be found
- *
- * Initialise a cache object description to its basic values.
- *
- * See Documentation/filesystems/caching/backend-api.rst for a complete
- * description.
- */
-void fscache_object_init(struct fscache_object *object,
-			 struct fscache_cookie *cookie,
-			 struct fscache_cache *cache)
-{
-	const struct fscache_transition *t;
-
-	atomic_inc(&cache->object_count);
-
-	object->state = STATE(WAIT_FOR_INIT);
-	object->oob_table = fscache_osm_init_oob;
-	object->flags = 1 << FSCACHE_OBJECT_IS_LIVE;
-	spin_lock_init(&object->lock);
-	INIT_LIST_HEAD(&object->cache_link);
-	INIT_HLIST_NODE(&object->cookie_link);
-	INIT_WORK(&object->work, fscache_object_work_func);
-	INIT_LIST_HEAD(&object->dependents);
-	INIT_LIST_HEAD(&object->dep_link);
-	object->n_children = 0;
-	object->n_ops = 0;
-	object->events = 0;
-	object->cache = cache;
-	object->cookie = cookie;
-	fscache_cookie_get(cookie, fscache_cookie_get_attach_object);
-	object->parent = NULL;
-#ifdef CONFIG_FSCACHE_OBJECT_LIST
-	RB_CLEAR_NODE(&object->objlist_link);
-#endif
-
-	object->oob_event_mask = 0;
-	for (t = object->oob_table; t->events; t++)
-		object->oob_event_mask |= t->events;
-	object->event_mask = object->oob_event_mask;
-	for (t = object->state->transitions; t->events; t++)
-		object->event_mask |= t->events;
-}
-EXPORT_SYMBOL(fscache_object_init);
-
-/*
- * Mark the object as no longer being live, making sure that we synchronise
- * against op submission.
- */
-static inline void fscache_mark_object_dead(struct fscache_object *object)
-{
-	spin_lock(&object->lock);
-	clear_bit(FSCACHE_OBJECT_IS_LIVE, &object->flags);
-	spin_unlock(&object->lock);
-}
-
-/*
- * Abort object initialisation before we start it.
- */
-static const struct fscache_state *fscache_abort_initialisation(struct fscache_object *object,
-								int event)
-{
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	object->oob_event_mask = 0;
-	fscache_dequeue_object(object);
-	return transit_to(KILL_OBJECT);
-}
-
-/*
- * initialise an object
- * - check the specified object's parent to see if we can make use of it
- *   immediately to do a creation
- * - we may need to start the process of creating a parent and we need to wait
- *   for the parent's lookup and creation to complete if it's not there yet
- */
-static const struct fscache_state *fscache_initialise_object(struct fscache_object *object,
-							     int event)
-{
-	struct fscache_object *parent;
-	bool success;
-
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	ASSERT(list_empty(&object->dep_link));
-
-	parent = object->parent;
-	if (!parent) {
-		_leave(" [no parent]");
-		return transit_to(DROP_OBJECT);
-	}
-
-	_debug("parent: %s of:%lx", parent->state->name, parent->flags);
-
-	if (fscache_object_is_dying(parent)) {
-		_leave(" [bad parent]");
-		return transit_to(DROP_OBJECT);
-	}
-
-	if (fscache_object_is_available(parent)) {
-		_leave(" [ready]");
-		return transit_to(PARENT_READY);
-	}
-
-	_debug("wait");
-
-	spin_lock(&parent->lock);
-	fscache_stat(&fscache_n_cop_grab_object);
-	success = false;
-	if (fscache_object_is_live(parent) &&
-	    object->cache->ops->grab_object(object, fscache_obj_get_add_to_deps)) {
-		list_add(&object->dep_link, &parent->dependents);
-		success = true;
-	}
-	fscache_stat_d(&fscache_n_cop_grab_object);
-	spin_unlock(&parent->lock);
-	if (!success) {
-		_leave(" [grab failed]");
-		return transit_to(DROP_OBJECT);
-	}
-
-	/* fscache_acquire_non_index_cookie() uses this
-	 * to wake the chain up */
-	fscache_raise_event(parent, FSCACHE_OBJECT_EV_NEW_CHILD);
-	_leave(" [wait]");
-	return transit_to(WAIT_FOR_PARENT);
-}
-
-/*
- * Once the parent object is ready, we should kick off our lookup op.
- */
-static const struct fscache_state *fscache_parent_ready(struct fscache_object *object,
-							int event)
-{
-	struct fscache_object *parent = object->parent;
-
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	ASSERT(parent != NULL);
-
-	spin_lock(&parent->lock);
-	parent->n_ops++;
-	parent->n_obj_ops++;
-	object->lookup_jif = jiffies;
-	spin_unlock(&parent->lock);
-
-	_leave("");
-	return transit_to(LOOK_UP_OBJECT);
-}
-
-/*
- * look an object up in the cache from which it was allocated
- * - we hold an "access lock" on the parent object, so the parent object cannot
- *   be withdrawn by either party till we've finished
- */
-static const struct fscache_state *fscache_look_up_object(struct fscache_object *object,
-							  int event)
-{
-	struct fscache_cookie *cookie = object->cookie;
-	struct fscache_object *parent = object->parent;
-	int ret;
-
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	object->oob_table = fscache_osm_lookup_oob;
-
-	ASSERT(parent != NULL);
-	ASSERTCMP(parent->n_ops, >, 0);
-	ASSERTCMP(parent->n_obj_ops, >, 0);
-
-	/* make sure the parent is still available */
-	ASSERT(fscache_object_is_available(parent));
-
-	if (fscache_object_is_dying(parent) ||
-	    test_bit(FSCACHE_IOERROR, &object->cache->flags) ||
-	    !fscache_use_cookie(object)) {
-		_leave(" [unavailable]");
-		return transit_to(LOOKUP_FAILURE);
-	}
-
-	_debug("LOOKUP \"%s\" in \"%s\"",
-	       cookie->type_name, object->cache->tag->name);
-
-	fscache_stat(&fscache_n_object_lookups);
-	fscache_stat(&fscache_n_cop_lookup_object);
-	ret = object->cache->ops->lookup_object(object);
-	fscache_stat_d(&fscache_n_cop_lookup_object);
-
-	fscache_unuse_cookie(object);
-
-	if (ret == -ETIMEDOUT) {
-		/* probably stuck behind another object, so move this one to
-		 * the back of the queue */
-		fscache_stat(&fscache_n_object_lookups_timed_out);
-		_leave(" [timeout]");
-		return NO_TRANSIT;
-	}
-
-	if (ret < 0) {
-		_leave(" [error]");
-		return transit_to(LOOKUP_FAILURE);
-	}
-
-	_leave(" [ok]");
-	return transit_to(OBJECT_AVAILABLE);
-}
-
-/**
- * fscache_object_lookup_negative - Note negative cookie lookup
- * @object: Object pointing to cookie to mark
- *
- * Note negative lookup, permitting those waiting to read data from an already
- * existing backing object to continue as there's no data for them to read.
- */
-void fscache_object_lookup_negative(struct fscache_object *object)
-{
-	struct fscache_cookie *cookie = object->cookie;
-
-	_enter("{OBJ%x,%s}", object->debug_id, object->state->name);
-
-	if (!test_and_set_bit(FSCACHE_OBJECT_IS_LOOKED_UP, &object->flags)) {
-		fscache_stat(&fscache_n_object_lookups_negative);
-
-		/* Allow write requests to begin stacking up and read requests to begin
-		 * returning ENODATA.
-		 */
-		set_bit(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags);
-		clear_bit(FSCACHE_COOKIE_UNAVAILABLE, &cookie->flags);
-
-		_debug("wake up lookup %p", &cookie->flags);
-		clear_bit_unlock(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags);
-		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP);
-	}
-	_leave("");
-}
-EXPORT_SYMBOL(fscache_object_lookup_negative);
-
-/**
- * fscache_obtained_object - Note successful object lookup or creation
- * @object: Object pointing to cookie to mark
- *
- * Note successful lookup and/or creation, permitting those waiting to write
- * data to a backing object to continue.
- *
- * Note that after calling this, an object's cookie may be relinquished by the
- * netfs, and so must be accessed with object lock held.
- */
-void fscache_obtained_object(struct fscache_object *object)
-{
-	struct fscache_cookie *cookie = object->cookie;
-
-	_enter("{OBJ%x,%s}", object->debug_id, object->state->name);
-
-	/* if we were still looking up, then we must have a positive lookup
-	 * result, in which case there may be data available */
-	if (!test_and_set_bit(FSCACHE_OBJECT_IS_LOOKED_UP, &object->flags)) {
-		fscache_stat(&fscache_n_object_lookups_positive);
-
-		/* We do (presumably) have data */
-		clear_bit_unlock(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags);
-		clear_bit(FSCACHE_COOKIE_UNAVAILABLE, &cookie->flags);
-
-		/* Allow write requests to begin stacking up and read requests
-		 * to begin shovelling data.
-		 */
-		clear_bit_unlock(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags);
-		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP);
-	} else {
-		fscache_stat(&fscache_n_object_created);
-	}
-
-	set_bit(FSCACHE_OBJECT_IS_AVAILABLE, &object->flags);
-	_leave("");
-}
-EXPORT_SYMBOL(fscache_obtained_object);
-
-/*
- * handle an object that has just become available
- */
-static const struct fscache_state *fscache_object_available(struct fscache_object *object,
-							    int event)
-{
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	object->oob_table = fscache_osm_run_oob;
-
-	spin_lock(&object->lock);
-
-	fscache_done_parent_op(object);
-	spin_unlock(&object->lock);
-
-	fscache_stat(&fscache_n_cop_lookup_complete);
-	object->cache->ops->lookup_complete(object);
-	fscache_stat_d(&fscache_n_cop_lookup_complete);
-
-	fscache_hist(fscache_obj_instantiate_histogram, object->lookup_jif);
-	fscache_stat(&fscache_n_object_avail);
-
-	_leave("");
-	return transit_to(JUMPSTART_DEPS);
-}
-
-/*
- * Wake up this object's dependent objects now that we've become available.
- */
-static const struct fscache_state *fscache_jumpstart_dependents(struct fscache_object *object,
-								int event)
-{
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	if (!fscache_enqueue_dependents(object, FSCACHE_OBJECT_EV_PARENT_READY))
-		return NO_TRANSIT; /* Not finished; requeue */
-	return transit_to(WAIT_FOR_CMD);
-}
-
-/*
- * Handle lookup or creation failute.
- */
-static const struct fscache_state *fscache_lookup_failure(struct fscache_object *object,
-							  int event)
-{
-	struct fscache_cookie *cookie;
-
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	object->oob_event_mask = 0;
-
-	fscache_stat(&fscache_n_cop_lookup_complete);
-	object->cache->ops->lookup_complete(object);
-	fscache_stat_d(&fscache_n_cop_lookup_complete);
-
-	set_bit(FSCACHE_OBJECT_KILLED_BY_CACHE, &object->flags);
-
-	cookie = object->cookie;
-	set_bit(FSCACHE_COOKIE_UNAVAILABLE, &cookie->flags);
-	if (test_and_clear_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags))
-		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP);
-
-	fscache_done_parent_op(object);
-	return transit_to(KILL_OBJECT);
-}
-
-/*
- * Wait for completion of all active operations on this object and the death of
- * all child objects of this object.
- */
-static const struct fscache_state *fscache_kill_object(struct fscache_object *object,
-						       int event)
-{
-	_enter("{OBJ%x,%d,%d},%d",
-	       object->debug_id, object->n_ops, object->n_children, event);
-
-	fscache_mark_object_dead(object);
-	object->oob_event_mask = 0;
-
-	if (list_empty(&object->dependents) &&
-	    object->n_ops == 0 &&
-	    object->n_children == 0)
-		return transit_to(DROP_OBJECT);
-
-	if (!list_empty(&object->dependents))
-		return transit_to(KILL_DEPENDENTS);
-
-	return transit_to(WAIT_FOR_CLEARANCE);
-}
-
-/*
- * Kill dependent objects.
- */
-static const struct fscache_state *fscache_kill_dependents(struct fscache_object *object,
-							   int event)
-{
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	if (!fscache_enqueue_dependents(object, FSCACHE_OBJECT_EV_KILL))
-		return NO_TRANSIT; /* Not finished */
-	return transit_to(WAIT_FOR_CLEARANCE);
-}
-
-/*
- * Drop an object's attachments
- */
-static const struct fscache_state *fscache_drop_object(struct fscache_object *object,
-						       int event)
-{
-	struct fscache_object *parent = object->parent;
-	struct fscache_cookie *cookie = object->cookie;
-	struct fscache_cache *cache = object->cache;
-	bool awaken = false;
-
-	_enter("{OBJ%x,%d},%d", object->debug_id, object->n_children, event);
-
-	ASSERT(cookie != NULL);
-	ASSERT(!hlist_unhashed(&object->cookie_link));
-
-	if (test_bit(FSCACHE_COOKIE_AUX_UPDATED, &cookie->flags)) {
-		_debug("final update");
-		fscache_update_aux_data(object);
-	}
-
-	/* Make sure the cookie no longer points here and that the netfs isn't
-	 * waiting for us.
-	 */
-	spin_lock(&cookie->lock);
-	hlist_del_init(&object->cookie_link);
-	if (hlist_empty(&cookie->backing_objects) &&
-	    test_and_clear_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags))
-		awaken = true;
-	spin_unlock(&cookie->lock);
-
-	if (awaken)
-		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING);
-	if (test_and_clear_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags))
-		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP);
-
-
-	/* Prevent a race with our last child, which has to signal EV_CLEARED
-	 * before dropping our spinlock.
-	 */
-	spin_lock(&object->lock);
-	spin_unlock(&object->lock);
-
-	/* Discard from the cache's collection of objects */
-	spin_lock(&cache->object_list_lock);
-	list_del_init(&object->cache_link);
-	spin_unlock(&cache->object_list_lock);
-
-	fscache_stat(&fscache_n_cop_drop_object);
-	cache->ops->drop_object(object);
-	fscache_stat_d(&fscache_n_cop_drop_object);
-
-	/* The parent object wants to know when all it dependents have gone */
-	if (parent) {
-		_debug("release parent OBJ%x {%d}",
-		       parent->debug_id, parent->n_children);
-
-		spin_lock(&parent->lock);
-		parent->n_children--;
-		if (parent->n_children == 0)
-			fscache_raise_event(parent, FSCACHE_OBJECT_EV_CLEARED);
-		spin_unlock(&parent->lock);
-		object->parent = NULL;
-	}
-
-	/* this just shifts the object release to the work processor */
-	fscache_put_object(object, fscache_obj_put_drop_obj);
-	fscache_stat(&fscache_n_object_dead);
-
-	_leave("");
-	return transit_to(OBJECT_DEAD);
-}
-
-/*
- * get a ref on an object
- */
-static int fscache_get_object(struct fscache_object *object,
-			      enum fscache_obj_ref_trace why)
-{
-	int ret;
-
-	fscache_stat(&fscache_n_cop_grab_object);
-	ret = object->cache->ops->grab_object(object, why) ? 0 : -EAGAIN;
-	fscache_stat_d(&fscache_n_cop_grab_object);
-	return ret;
-}
-
-/*
- * Discard a ref on an object
- */
-static void fscache_put_object(struct fscache_object *object,
-			       enum fscache_obj_ref_trace why)
-{
-	fscache_stat(&fscache_n_cop_put_object);
-	object->cache->ops->put_object(object, why);
-	fscache_stat_d(&fscache_n_cop_put_object);
-}
-
-/**
- * fscache_object_destroy - Note that a cache object is about to be destroyed
- * @object: The object to be destroyed
- *
- * Note the imminent destruction and deallocation of a cache object record.
- */
-void fscache_object_destroy(struct fscache_object *object)
-{
-	fscache_objlist_remove(object);
-
-	/* We can get rid of the cookie now */
-	fscache_cookie_put(object->cookie, fscache_cookie_put_object);
-	object->cookie = NULL;
-}
-EXPORT_SYMBOL(fscache_object_destroy);
-
-/*
- * enqueue an object for metadata-type processing
- */
-void fscache_enqueue_object(struct fscache_object *object)
-{
-	_enter("{OBJ%x}", object->debug_id);
-
-	if (fscache_get_object(object, fscache_obj_get_queue) >= 0) {
-		wait_queue_head_t *cong_wq =
-			&get_cpu_var(fscache_object_cong_wait);
-
-		if (queue_work(fscache_object_wq, &object->work)) {
-			if (fscache_object_congested())
-				wake_up(cong_wq);
-		} else
-			fscache_put_object(object, fscache_obj_put_queue);
-
-		put_cpu_var(fscache_object_cong_wait);
-	}
-}
-
-/**
- * fscache_object_sleep_till_congested - Sleep until object wq is congested
- * @timeoutp: Scheduler sleep timeout
- *
- * Allow an object handler to sleep until the object workqueue is congested.
- *
- * The caller must set up a wake up event before calling this and must have set
- * the appropriate sleep mode (such as TASK_UNINTERRUPTIBLE) and tested its own
- * condition before calling this function as no test is made here.
- *
- * %true is returned if the object wq is congested, %false otherwise.
- */
-bool fscache_object_sleep_till_congested(signed long *timeoutp)
-{
-	wait_queue_head_t *cong_wq = this_cpu_ptr(&fscache_object_cong_wait);
-	DEFINE_WAIT(wait);
-
-	if (fscache_object_congested())
-		return true;
-
-	add_wait_queue_exclusive(cong_wq, &wait);
-	if (!fscache_object_congested())
-		*timeoutp = schedule_timeout(*timeoutp);
-	finish_wait(cong_wq, &wait);
-
-	return fscache_object_congested();
-}
-EXPORT_SYMBOL_GPL(fscache_object_sleep_till_congested);
-
-/*
- * Enqueue the dependents of an object for metadata-type processing.
- *
- * If we don't manage to finish the list before the scheduler wants to run
- * again then return false immediately.  We return true if the list was
- * cleared.
- */
-static bool fscache_enqueue_dependents(struct fscache_object *object, int event)
-{
-	struct fscache_object *dep;
-	bool ret = true;
-
-	_enter("{OBJ%x}", object->debug_id);
-
-	if (list_empty(&object->dependents))
-		return true;
-
-	spin_lock(&object->lock);
-
-	while (!list_empty(&object->dependents)) {
-		dep = list_entry(object->dependents.next,
-				 struct fscache_object, dep_link);
-		list_del_init(&dep->dep_link);
-
-		fscache_raise_event(dep, event);
-		fscache_put_object(dep, fscache_obj_put_enq_dep);
-
-		if (!list_empty(&object->dependents) && need_resched()) {
-			ret = false;
-			break;
-		}
-	}
-
-	spin_unlock(&object->lock);
-	return ret;
-}
-
-/*
- * remove an object from whatever queue it's waiting on
- */
-static void fscache_dequeue_object(struct fscache_object *object)
-{
-	_enter("{OBJ%x}", object->debug_id);
-
-	if (!list_empty(&object->dep_link)) {
-		spin_lock(&object->parent->lock);
-		list_del_init(&object->dep_link);
-		spin_unlock(&object->parent->lock);
-	}
-
-	_leave("");
-}
-
-static const struct fscache_state *fscache_invalidate_object(struct fscache_object *object,
-							     int event)
-{
-	return transit_to(UPDATE_OBJECT);
-}
-
-/*
- * Update auxiliary data.
- */
-static void fscache_update_aux_data(struct fscache_object *object)
-{
-	fscache_stat(&fscache_n_updates_run);
-	fscache_stat(&fscache_n_cop_update_object);
-	object->cache->ops->update_object(object);
-	fscache_stat_d(&fscache_n_cop_update_object);
-}
-
-/*
- * Asynchronously update an object.
- */
-static const struct fscache_state *fscache_update_object(struct fscache_object *object,
-							 int event)
-{
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	fscache_update_aux_data(object);
-
-	_leave("");
-	return transit_to(WAIT_FOR_CMD);
-}
-
-/**
- * fscache_object_retrying_stale - Note retrying stale object
- * @object: The object that will be retried
- *
- * Note that an object lookup found an on-disk object that was adjudged to be
- * stale and has been deleted.  The lookup will be retried.
- */
-void fscache_object_retrying_stale(struct fscache_object *object)
-{
-	fscache_stat(&fscache_n_cache_no_space_reject);
-}
-EXPORT_SYMBOL(fscache_object_retrying_stale);
-
-/**
- * fscache_object_mark_killed - Note that an object was killed
- * @object: The object that was culled
- * @why: The reason the object was killed.
- *
- * Note that an object was killed.  Returns true if the object was
- * already marked killed, false if it wasn't.
- */
-void fscache_object_mark_killed(struct fscache_object *object,
-				enum fscache_why_object_killed why)
-{
-	if (test_and_set_bit(FSCACHE_OBJECT_KILLED_BY_CACHE, &object->flags)) {
-		pr_err("Error: Object already killed by cache [%s]\n",
-		       object->cache->identifier);
-		return;
-	}
-
-	switch (why) {
-	case FSCACHE_OBJECT_NO_SPACE:
-		fscache_stat(&fscache_n_cache_no_space_reject);
-		break;
-	case FSCACHE_OBJECT_IS_STALE:
-		fscache_stat(&fscache_n_cache_stale_objects);
-		break;
-	case FSCACHE_OBJECT_WAS_RETIRED:
-		fscache_stat(&fscache_n_cache_retired_objects);
-		break;
-	case FSCACHE_OBJECT_WAS_CULLED:
-		fscache_stat(&fscache_n_cache_culled_objects);
-		break;
-	}
-}
-EXPORT_SYMBOL(fscache_object_mark_killed);
-
-/*
- * The object is dead.  We can get here if an object gets queued by an event
- * that would lead to its death (such as EV_KILL) when the dispatcher is
- * already running (and so can be requeued) but hasn't yet cleared the event
- * mask.
- */
-static const struct fscache_state *fscache_object_dead(struct fscache_object *object,
-						       int event)
-{
-	if (!test_and_set_bit(FSCACHE_OBJECT_RUN_AFTER_DEAD,
-			      &object->flags))
-		return NO_TRANSIT;
-
-	WARN(true, "FS-Cache object redispatched after death");
-	return NO_TRANSIT;
-}
diff --git a/fs/fscache/object_bits.c b/fs/fscache/object_bits.c
new file mode 100644
index 000000000000..5316d7d33ce0
--- /dev/null
+++ b/fs/fscache/object_bits.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Miscellaneous object routines.
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for more information on
+ * the netfs API.
+ */
+
+#define FSCACHE_DEBUG_LEVEL OBJECT
+#include <linux/module.h>
+#include <linux/slab.h>
+#include "internal.h"
+
+static atomic_t fscache_object_debug_id;
+
+/**
+ * fscache_object_init - Initialise a cache object description
+ * @object: Object description
+ * @cookie: Cookie object will be attached to
+ * @cache: Cache in which backing object will be found
+ *
+ * Initialise a cache object description to its basic values.
+ *
+ * See Documentation/filesystems/caching/backend-api.txt for a complete
+ * description.
+ */
+void fscache_object_init(struct fscache_object *object,
+			 struct fscache_cookie *cookie,
+			 struct fscache_cache *cache)
+{
+	atomic_inc(&cache->object_count);
+
+	spin_lock_init(&object->lock);
+	INIT_LIST_HEAD(&object->cache_link);
+	INIT_HLIST_NODE(&object->cookie_link);
+	object->n_children = 0;
+	object->cache = cache;
+	object->cookie = fscache_cookie_get(cookie, fscache_cookie_get_attach_object);
+	object->parent = NULL;
+#ifdef CONFIG_FSCACHE_OBJECT_LIST
+	RB_CLEAR_NODE(&object->objlist_link);
+#endif
+	object->debug_id = atomic_inc_return(&fscache_object_debug_id);
+}
+EXPORT_SYMBOL(fscache_object_init);
+
+/**
+ * fscache_object_destroy - Note that a cache object is about to be destroyed
+ * @object: The object to be destroyed
+ *
+ * Note the imminent destruction and deallocation of a cache object record.
+ */
+void fscache_object_destroy(struct fscache_object *object)
+{
+	_enter("%u", atomic_read(&object->cache->object_count));
+
+	fscache_objlist_remove(object);
+
+	/* We can get rid of the cookie now */
+	fscache_cookie_put(object->cookie, fscache_cookie_put_object);
+	object->cookie = NULL;
+}
+EXPORT_SYMBOL(fscache_object_destroy);
+
+/**
+ * fscache_object_destroyed - Note destruction of an object in a cache
+ * @cache: The cache from which the object came
+ *
+ * Note the destruction and deallocation of an object record in a cache.
+ */
+void fscache_object_destroyed(struct fscache_cache *cache)
+{
+	_enter("%d", atomic_read(&cache->object_count));
+	if (atomic_dec_and_test(&cache->object_count))
+		wake_up_all(&fscache_cache_cleared_wq);
+}
+EXPORT_SYMBOL(fscache_object_destroyed);
+
+/**
+ * fscache_object_mark_killed - Note that an object was killed
+ * @object: The object that was culled
+ * @why: The reason the object was killed.
+ *
+ * Note that an object was killed.  Returns true if the object was
+ * already marked killed, false if it wasn't.
+ */
+void fscache_object_mark_killed(struct fscache_object *object,
+				enum fscache_why_object_killed why)
+{
+	switch (why) {
+	case FSCACHE_OBJECT_NO_SPACE:
+		fscache_stat(&fscache_n_cache_no_space_reject);
+		break;
+	case FSCACHE_OBJECT_IS_STALE:
+		fscache_stat(&fscache_n_cache_stale_objects);
+		break;
+	case FSCACHE_OBJECT_WAS_RETIRED:
+		fscache_stat(&fscache_n_cache_retired_objects);
+		break;
+	case FSCACHE_OBJECT_WAS_CULLED:
+		fscache_stat(&fscache_n_cache_culled_objects);
+		break;
+	}
+}
+EXPORT_SYMBOL(fscache_object_mark_killed);
+
+/**
+ * fscache_object_retrying_stale - Note retrying stale object
+ * @object: The object that will be retried
+ *
+ * Note that an object lookup found an on-disk object that was adjudged to be
+ * stale and has been deleted.  The lookup will be retried.
+ */
+void fscache_object_retrying_stale(struct fscache_object *object)
+{
+	fscache_stat(&fscache_n_cache_stale_objects);
+}
+EXPORT_SYMBOL(fscache_object_retrying_stale);
diff --git a/fs/fscache/proc.c b/fs/fscache/proc.c
index da51fdfc8641..729d083f1e91 100644
--- a/fs/fscache/proc.c
+++ b/fs/fscache/proc.c
@@ -5,7 +5,7 @@
  * Written by David Howells (dhowells@redhat.com)
  */
 
-#define FSCACHE_DEBUG_LEVEL OPERATION
+#define FSCACHE_DEBUG_LEVEL CACHE
 #include <linux/module.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
@@ -16,8 +16,6 @@
  */
 int __init fscache_proc_init(void)
 {
-	_enter("");
-
 	if (!proc_mkdir("fs/fscache", NULL))
 		goto error_dir;
 
@@ -43,7 +41,6 @@ int __init fscache_proc_init(void)
 		goto error_objects;
 #endif
 
-	_leave(" = 0");
 	return 0;
 
 #ifdef CONFIG_FSCACHE_OBJECT_LIST
@@ -61,7 +58,6 @@ int __init fscache_proc_init(void)
 error_cookies:
 	remove_proc_entry("fs/fscache", NULL);
 error_dir:
-	_leave(" = -ENOMEM");
 	return -ENOMEM;
 }
 
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 5b1cec456199..583817f4f113 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -5,7 +5,7 @@
  * Written by David Howells (dhowells@redhat.com)
  */
 
-#define FSCACHE_DEBUG_LEVEL THREAD
+#define FSCACHE_DEBUG_LEVEL CACHE
 #include <linux/module.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 8b6aa11dee54..de1cffb2558e 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -16,7 +16,6 @@
 
 #include <linux/fscache.h>
 #include <linux/sched.h>
-#include <linux/workqueue.h>
 
 #define NR_MAXCACHES BITS_PER_LONG
 
@@ -25,14 +24,21 @@ struct fscache_cache_ops;
 struct fscache_object;
 
 enum fscache_obj_ref_trace {
-	fscache_obj_get_add_to_deps,
-	fscache_obj_get_queue,
+	fscache_obj_get_attach,
+	fscache_obj_get_exists,
+	fscache_obj_get_inval,
+	fscache_obj_get_wait,
+	fscache_obj_get_withdraw,
+	fscache_obj_new,
+	fscache_obj_put,
+	fscache_obj_put_alloc_dup,
 	fscache_obj_put_alloc_fail,
 	fscache_obj_put_attach_fail,
+	fscache_obj_put_drop_child,
 	fscache_obj_put_drop_obj,
-	fscache_obj_put_enq_dep,
-	fscache_obj_put_queue,
-	fscache_obj_put_work,
+	fscache_obj_put_inval,
+	fscache_obj_put_lookup_fail,
+	fscache_obj_put_withdraw,
 	fscache_obj_ref__nr_traces
 };
 
@@ -80,16 +86,21 @@ struct fscache_cache_ops {
 	const char *name;
 
 	/* allocate an object record for a cookie */
-	struct fscache_object *(*alloc_object)(struct fscache_cache *cache,
-					       struct fscache_cookie *cookie);
+	struct fscache_object *(*alloc_object)(struct fscache_cookie *cookie,
+					       struct fscache_cache *cache,
+					       struct fscache_object *parent);
 
-	/* look up the object for a cookie
-	 * - return -ETIMEDOUT to be requeued
-	 */
-	int (*lookup_object)(struct fscache_object *object);
+	/* Prepare data used in lookup */
+	void *(*prepare_lookup_data)(struct fscache_object *object);
 
-	/* finished looking up */
-	void (*lookup_complete)(struct fscache_object *object);
+	/* Look up the object for a cookie */
+	bool (*lookup_object)(struct fscache_object *object, void *lookup_data);
+
+	/* Create the object for a cookie */
+	bool (*create_object)(struct fscache_object *object, void *lookup_data);
+
+	/* Clean up lookup data */
+	void (*free_lookup_data)(struct fscache_object *object, void *lookup_data);
 
 	/* increment the usage count on this object (may fail if unmounting) */
 	struct fscache_object *(*grab_object)(struct fscache_object *object,
@@ -109,12 +120,15 @@ struct fscache_cache_ops {
 
 	/* discard the resources pinned by an object and effect retirement if
 	 * necessary */
-	void (*drop_object)(struct fscache_object *object);
+	void (*drop_object)(struct fscache_object *object, bool invalidate);
 
 	/* dispose of a reference to an object */
 	void (*put_object)(struct fscache_object *object,
 			   enum fscache_obj_ref_trace why);
 
+	/* Get object usage count */
+	unsigned int (*get_object_usage)(const struct fscache_object *object);
+
 	/* sync a cache */
 	void (*sync_cache)(struct fscache_cache *cache);
 
@@ -124,74 +138,34 @@ struct fscache_cache_ops {
 
 extern struct fscache_cookie fscache_fsdef_index;
 
-/*
- * Event list for fscache_object::{event_mask,events}
- */
-enum {
-	FSCACHE_OBJECT_EV_NEW_CHILD,	/* T if object has a new child */
-	FSCACHE_OBJECT_EV_PARENT_READY,	/* T if object's parent is ready */
-	FSCACHE_OBJECT_EV_UPDATE,	/* T if object should be updated */
-	FSCACHE_OBJECT_EV_INVALIDATE,	/* T if cache requested object invalidation */
-	FSCACHE_OBJECT_EV_CLEARED,	/* T if accessors all gone */
-	FSCACHE_OBJECT_EV_ERROR,	/* T if fatal error occurred during processing */
-	FSCACHE_OBJECT_EV_KILL,		/* T if netfs relinquished or cache withdrew object */
-	NR_FSCACHE_OBJECT_EVENTS
-};
-
-#define FSCACHE_OBJECT_EVENTS_MASK ((1UL << NR_FSCACHE_OBJECT_EVENTS) - 1)
-
-/*
- * States for object state machine.
- */
-struct fscache_transition {
-	unsigned long events;
-	const struct fscache_state *transit_to;
-};
-
-struct fscache_state {
-	char name[24];
-	char short_name[8];
-	const struct fscache_state *(*work)(struct fscache_object *object,
-					    int event);
-	const struct fscache_transition transitions[];
+enum fscache_object_stage {
+	FSCACHE_OBJECT_STAGE_INITIAL,
+	FSCACHE_OBJECT_STAGE_LOOKING_UP,
+	FSCACHE_OBJECT_STAGE_UNCREATED,		/* Needs creation */
+	FSCACHE_OBJECT_STAGE_LIVE_TEMP,		/* Temporary object created, can be no hits */
+	FSCACHE_OBJECT_STAGE_LIVE_EMPTY,	/* Object was freshly created, can be no hits */
+	FSCACHE_OBJECT_STAGE_LIVE,		/* Object is populated */
+	FSCACHE_OBJECT_STAGE_DESTROYING,
+	FSCACHE_OBJECT_STAGE_DEAD,
 };
 
 /*
  * on-disk cache file or index handle
  */
 struct fscache_object {
-	const struct fscache_state *state;	/* Object state machine state */
-	const struct fscache_transition *oob_table; /* OOB state transition table */
 	int			debug_id;	/* debugging ID */
 	int			n_children;	/* number of child objects */
-	int			n_ops;		/* number of extant ops on object */
-	int			n_obj_ops;	/* number of object ops outstanding on object */
+	enum fscache_object_stage stage;	/* Stage of object's lifecycle */
 	spinlock_t		lock;		/* state and operations lock */
 
-	unsigned long		lookup_jif;	/* time at which lookup started */
-	unsigned long		oob_event_mask;	/* OOB events this object is interested in */
-	unsigned long		event_mask;	/* events this object is interested in */
-	unsigned long		events;		/* events to be processed by this object
-						 * (order is important - using fls) */
-
 	unsigned long		flags;
-#define FSCACHE_OBJECT_LOCK		0	/* T if object is busy being processed */
-#define FSCACHE_OBJECT_WAITING		2	/* T if object is waiting on its parent */
-#define FSCACHE_OBJECT_IS_LIVE		3	/* T if object is not withdrawn or relinquished */
-#define FSCACHE_OBJECT_IS_LOOKED_UP	4	/* T if object has been looked up */
-#define FSCACHE_OBJECT_IS_AVAILABLE	5	/* T if object has become active */
-#define FSCACHE_OBJECT_RETIRED		6	/* T if object was retired on relinquishment */
-#define FSCACHE_OBJECT_KILLED_BY_CACHE	7	/* T if object was killed by the cache */
-#define FSCACHE_OBJECT_RUN_AFTER_DEAD	8	/* T if object has been dispatched after death */
+#define FSCACHE_OBJECT_NEEDS_UPDATE	9	/* T if object attrs need writing to disk */
 
 	struct list_head	cache_link;	/* link in cache->object_list */
 	struct hlist_node	cookie_link;	/* link in cookie->backing_objects */
 	struct fscache_cache	*cache;		/* cache that supplied this object */
 	struct fscache_cookie	*cookie;	/* netfs's file/index object */
 	struct fscache_object	*parent;	/* parent object */
-	struct work_struct	work;		/* attention scheduling record */
-	struct list_head	dependents;	/* FIFO of dependent objects */
-	struct list_head	dep_link;	/* link in parent's dependents list */
 #ifdef CONFIG_FSCACHE_OBJECT_LIST
 	struct rb_node		objlist_link;	/* link in global object list */
 #endif
@@ -201,101 +175,12 @@ extern void fscache_object_init(struct fscache_object *, struct fscache_cookie *
 				struct fscache_cache *);
 extern void fscache_object_destroy(struct fscache_object *);
 
-extern void fscache_object_lookup_negative(struct fscache_object *object);
-extern void fscache_obtained_object(struct fscache_object *object);
-
-static inline bool fscache_object_is_live(struct fscache_object *object)
-{
-	return test_bit(FSCACHE_OBJECT_IS_LIVE, &object->flags);
-}
-
-static inline bool fscache_object_is_dying(struct fscache_object *object)
-{
-	return !fscache_object_is_live(object);
-}
-
-static inline bool fscache_object_is_available(struct fscache_object *object)
-{
-	return test_bit(FSCACHE_OBJECT_IS_AVAILABLE, &object->flags);
-}
-
 static inline bool fscache_cache_is_broken(struct fscache_object *object)
 {
 	return test_bit(FSCACHE_IOERROR, &object->cache->flags);
 }
 
-static inline bool fscache_object_is_active(struct fscache_object *object)
-{
-	return fscache_object_is_available(object) &&
-		fscache_object_is_live(object) &&
-		!fscache_cache_is_broken(object);
-}
-
-/**
- * fscache_object_destroyed - Note destruction of an object in a cache
- * @cache: The cache from which the object came
- *
- * Note the destruction and deallocation of an object record in a cache.
- */
-static inline void fscache_object_destroyed(struct fscache_cache *cache)
-{
-	if (atomic_dec_and_test(&cache->object_count))
-		wake_up_all(&fscache_cache_cleared_wq);
-}
-
-/**
- * fscache_object_lookup_error - Note an object encountered an error
- * @object: The object on which the error was encountered
- *
- * Note that an object encountered a fatal error (usually an I/O error) and
- * that it should be withdrawn as soon as possible.
- */
-static inline void fscache_object_lookup_error(struct fscache_object *object)
-{
-	set_bit(FSCACHE_OBJECT_EV_ERROR, &object->events);
-}
-
-static inline void __fscache_use_cookie(struct fscache_cookie *cookie)
-{
-	atomic_inc(&cookie->n_active);
-}
-
-/**
- * fscache_use_cookie - Request usage of cookie attached to an object
- * @object: Object description
- * 
- * Request usage of the cookie attached to an object.  NULL is returned if the
- * relinquishment had reduced the cookie usage count to 0.
- */
-static inline bool fscache_use_cookie(struct fscache_object *object)
-{
-	struct fscache_cookie *cookie = object->cookie;
-	return atomic_inc_not_zero(&cookie->n_active) != 0;
-}
-
-static inline bool __fscache_unuse_cookie(struct fscache_cookie *cookie)
-{
-	return atomic_dec_and_test(&cookie->n_active);
-}
-
-static inline void __fscache_wake_unused_cookie(struct fscache_cookie *cookie)
-{
-	wake_up_var(&cookie->n_active);
-}
-
-/**
- * fscache_unuse_cookie - Cease usage of cookie attached to an object
- * @object: Object description
- * 
- * Cease usage of the cookie attached to an object.  When the users count
- * reaches zero then the cookie relinquishment will be permitted to proceed.
- */
-static inline void fscache_unuse_cookie(struct fscache_object *object)
-{
-	struct fscache_cookie *cookie = object->cookie;
-	if (__fscache_unuse_cookie(cookie))
-		__fscache_wake_unused_cookie(cookie);
-}
+extern void fscache_object_destroyed(struct fscache_cache *cache);
 
 /*
  * out-of-line cache backend functions
@@ -312,8 +197,6 @@ extern void fscache_withdraw_cache(struct fscache_cache *cache);
 
 extern void fscache_io_error(struct fscache_cache *cache);
 
-extern bool fscache_object_sleep_till_congested(signed long *timeoutp);
-
 extern void fscache_object_retrying_stale(struct fscache_object *object);
 
 enum fscache_why_object_killed {
@@ -347,4 +230,13 @@ static inline void *fscache_get_aux(struct fscache_cookie *cookie)
 		return cookie->aux;
 }
 
+/*
+ * Complete an I/O operation
+ */
+static inline void fscache_end_io_operation(struct fscache_cookie *cookie)
+{
+	if (atomic_dec_and_test(&cookie->n_ops))
+		wake_up_var(&cookie->n_ops);
+}
+
 #endif /* _LINUX_FSCACHE_CACHE_H */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index c62b62938c4d..11b18761a3b6 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -31,7 +31,7 @@
 
 /*
  * overload PG_private_2 to give us PG_fscache - this is used to indicate that
- * a page is currently backed by a local disk cache
+ * a page is currently being written to the cache, possibly by direct I/O.
  */
 #define PageFsCache(page)		PagePrivate2((page))
 #define SetPageFsCache(page)		SetPagePrivate2((page))
@@ -52,6 +52,8 @@ enum fscache_cookie_type {
 };
 
 #define FSCACHE_ADV_SINGLE_CHUNK	0x01 /* The object is a single chunk of data */
+#define FSCACHE_ADV_WRITE_CACHE		0x00 /* Do cache if written to locally */
+#define FSCACHE_ADV_WRITE_NOCACHE	0x02 /* Don't cache if written to locally */
 
 /*
  * fscache cached network filesystem type
@@ -64,6 +66,19 @@ struct fscache_netfs {
 	struct fscache_cookie		*primary_index;
 };
 
+/*
+ * Data object state.
+ */
+enum fscache_cookie_stage {
+	FSCACHE_COOKIE_STAGE_QUIESCENT,		/* The cookie is uncached */
+	FSCACHE_COOKIE_STAGE_INITIALISING,	/* The in-memory structs are being inited */
+	FSCACHE_COOKIE_STAGE_LOOKING_UP,	/* The cache object is being looked up */
+	FSCACHE_COOKIE_STAGE_NO_DATA_YET,	/* The cache has no data, read to network */
+	FSCACHE_COOKIE_STAGE_ACTIVE,		/* The cache is active, readable and writable */
+	FSCACHE_COOKIE_STAGE_INVALIDATING,	/* The cache is being invalidated */
+	FSCACHE_COOKIE_STAGE_DEAD,		/* The cache object is dead */
+} __attribute__((mode(byte)));
+
 /*
  * data file or index object cookie
  * - a file will only appear in one cache
@@ -74,7 +89,8 @@ struct fscache_netfs {
 struct fscache_cookie {
 	atomic_t			usage;		/* number of users of this cookie */
 	atomic_t			n_children;	/* number of children of this cookie */
-	atomic_t			n_active;	/* number of active users of netfs ptrs */
+	atomic_t			n_active;	/* number of active users of cookie */
+	atomic_t			n_ops;		/* Number of active ops on this cookie */
 	unsigned int			debug_id;
 	spinlock_t			lock;
 	struct hlist_head		backing_objects; /* object(s) backing this file/index */
@@ -86,19 +102,13 @@ struct fscache_cookie {
 	loff_t				object_size;	/* Size of the netfs object */
 
 	unsigned long			flags;
-#define FSCACHE_COOKIE_LOOKING_UP	0	/* T if non-index cookie being looked up still */
-#define FSCACHE_COOKIE_NO_DATA_YET	1	/* T if new object with no cached data yet */
-#define FSCACHE_COOKIE_UNAVAILABLE	2	/* T if cookie is unavailable (error, etc) */
-#define FSCACHE_COOKIE_INVALIDATING	3	/* T if cookie is being invalidated */
-#define FSCACHE_COOKIE_RELINQUISHED	4	/* T if cookie has been relinquished */
-#define FSCACHE_COOKIE_ENABLED		5	/* T if cookie is enabled */
-#define FSCACHE_COOKIE_ENABLEMENT_LOCK	6	/* T if cookie is being en/disabled */
-#define FSCACHE_COOKIE_AUX_UPDATED	8	/* T if the auxiliary data was updated */
-#define FSCACHE_COOKIE_ACQUIRED		9	/* T if cookie is in use */
-#define FSCACHE_COOKIE_RELINQUISHING	10	/* T if cookie is being relinquished */
+#define FSCACHE_COOKIE_INVALIDATING	4	/* T if cookie is being invalidated */
+#define FSCACHE_COOKIE_ACQUIRED		5	/* T if cookie is in use */
+#define FSCACHE_COOKIE_RELINQUISHED	6	/* T if cookie has been relinquished */
 
+	enum fscache_cookie_stage	stage;
 	enum fscache_cookie_type	type:8;
-	u8				advice;		/* FSCACHE_COOKIE_ADV_* */
+	u8				advice;		/* FSCACHE_ADV_* */
 	u8				key_len;	/* Length of index key */
 	u8				aux_len;	/* Length of auxiliary data */
 	u32				key_hash;	/* Hash of parent, type, key, len */
@@ -112,11 +122,6 @@ struct fscache_cookie {
 	};
 };
 
-static inline bool fscache_cookie_enabled(struct fscache_cookie *cookie)
-{
-	return test_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags);
-}
-
 /*
  * slow-path functions for when there is actually caching available, and the
  * netfs does actually have a valid token
@@ -137,14 +142,13 @@ extern struct fscache_cookie *__fscache_acquire_cookie(
 	struct fscache_cache_tag *,
 	const void *, size_t,
 	const void *, size_t,
-	loff_t, bool);
-extern void __fscache_relinquish_cookie(struct fscache_cookie *, const void *, bool);
-extern void __fscache_update_cookie(struct fscache_cookie *, const void *);
+	loff_t);
+extern void __fscache_use_cookie(struct fscache_cookie *, bool);
+extern void __fscache_unuse_cookie(struct fscache_cookie *, const void *, const loff_t *);
+extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
+extern void __fscache_update_cookie(struct fscache_cookie *, const void *, const loff_t *);
 extern void __fscache_invalidate(struct fscache_cookie *);
 extern void __fscache_wait_on_invalidate(struct fscache_cookie *);
-extern void __fscache_disable_cookie(struct fscache_cookie *, const void *, bool);
-extern void __fscache_enable_cookie(struct fscache_cookie *, const void *, loff_t,
-				    bool (*)(void *), void *);
 
 /**
  * fscache_register_netfs - Register a filesystem as desiring caching services
@@ -231,7 +235,6 @@ void fscache_release_cache_tag(struct fscache_cache_tag *tag)
  * @netfs_data: An arbitrary piece of data to be kept in the cookie to
  * represent the cache object to the netfs
  * @object_size: The initial size of object
- * @enable: Whether or not to enable a data cookie immediately
  *
  * This function is used to inform FS-Cache about part of an index hierarchy
  * that can be used to locate files.  This is done by requesting a cookie for
@@ -251,60 +254,92 @@ struct fscache_cookie *fscache_acquire_cookie(
 	size_t index_key_len,
 	const void *aux_data,
 	size_t aux_data_len,
-	loff_t object_size,
-	bool enable)
+	loff_t object_size)
 {
-	if (fscache_cookie_valid(parent) && fscache_cookie_enabled(parent))
+	if (fscache_cookie_valid(parent))
 		return __fscache_acquire_cookie(parent, type, type_name, advice,
 						preferred_cache,
 						index_key, index_key_len,
 						aux_data, aux_data_len,
-						object_size, enable);
+						object_size);
 	else
 		return NULL;
 }
 
+/**
+ * fscache_use_cookie - Request usage of cookie attached to an object
+ * @object: Object description
+ * @will_modify: If cache is expected to be modified locally
+ *
+ * Request usage of the cookie attached to an object.  The caller should tell
+ * the cache if the object's contents are about to be modified locally and then
+ * the cache can apply the policy that has been set to handle this case.
+ */
+static inline void fscache_use_cookie(struct fscache_cookie *cookie,
+				      bool will_modify)
+{
+	if (fscache_cookie_valid(cookie) &&
+	    cookie->type != FSCACHE_COOKIE_TYPE_INDEX)
+		__fscache_use_cookie(cookie, will_modify);
+}
+
+/**
+ * fscache_unuse_cookie - Cease usage of cookie attached to an object
+ * @object: Object description
+ * @aux_data: Updated auxiliary data (or NULL)
+ * @object_size: Revised size of the object (or NULL)
+ *
+ * Cease usage of the cookie attached to an object.  When the users count
+ * reaches zero then the cookie relinquishment will be permitted to proceed.
+ */
+static inline void fscache_unuse_cookie(struct fscache_cookie *cookie,
+					const void *aux_data,
+					const loff_t *object_size)
+{
+	if (fscache_cookie_valid(cookie) &&
+	    cookie->type != FSCACHE_COOKIE_TYPE_INDEX)
+		__fscache_unuse_cookie(cookie, aux_data, object_size);
+}
+
 /**
  * fscache_relinquish_cookie - Return the cookie to the cache, maybe discarding
  * it
  * @cookie: The cookie being returned
- * @aux_data: The updated auxiliary data for the cookie (may be NULL)
  * @retire: True if the cache object the cookie represents is to be discarded
  *
  * This function returns a cookie to the cache, forcibly discarding the
- * associated cache object if retire is set to true.  The opportunity is
- * provided to update the auxiliary data in the cache before the object is
- * disconnected.
+ * associated cache object if retire is set to true.
  *
  * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
-void fscache_relinquish_cookie(struct fscache_cookie *cookie,
-			       const void *aux_data,
-			       bool retire)
+void fscache_relinquish_cookie(struct fscache_cookie *cookie, bool retire)
 {
 	if (fscache_cookie_valid(cookie))
-		__fscache_relinquish_cookie(cookie, aux_data, retire);
+		__fscache_relinquish_cookie(cookie, retire);
 }
 
 /**
  * fscache_update_cookie - Request that a cache object be updated
  * @cookie: The cookie representing the cache object
  * @aux_data: The updated auxiliary data for the cookie (may be NULL)
+ * @object_size: The current size of the object (may be NULL)
  *
  * Request an update of the index data for the cache object associated with the
  * cookie.  The auxiliary data on the cookie will be updated first if @aux_data
- * is set.
+ * is set and the object size will be updated and the object possibly trimmed
+ * if @object_size is set.
  *
  * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
-void fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data)
+void fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data,
+			   const loff_t *object_size)
 {
-	if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie))
-		__fscache_update_cookie(cookie, aux_data);
+	if (fscache_cookie_valid(cookie))
+		__fscache_update_cookie(cookie, aux_data, object_size);
 }
 
 /**
@@ -352,7 +387,7 @@ void fscache_unpin_cookie(struct fscache_cookie *cookie)
 static inline
 void fscache_invalidate(struct fscache_cookie *cookie)
 {
-	if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie))
+	if (fscache_cookie_valid(cookie))
 		__fscache_invalidate(cookie);
 }
 
@@ -372,58 +407,4 @@ void fscache_wait_on_invalidate(struct fscache_cookie *cookie)
 		__fscache_wait_on_invalidate(cookie);
 }
 
-/**
- * fscache_disable_cookie - Disable a cookie
- * @cookie: The cookie representing the cache object
- * @aux_data: The updated auxiliary data for the cookie (may be NULL)
- * @invalidate: Invalidate the backing object
- *
- * Disable a cookie from accepting further alloc, read, write, invalidate,
- * update or acquire operations.  Outstanding operations can still be waited
- * upon and pages can still be uncached and the cookie relinquished.
- *
- * This will not return until all outstanding operations have completed.
- *
- * If @invalidate is set, then the backing object will be invalidated and
- * detached, otherwise it will just be detached.
- *
- * If @aux_data is set, then auxiliary data will be updated from that.
- */
-static inline
-void fscache_disable_cookie(struct fscache_cookie *cookie,
-			    const void *aux_data,
-			    bool invalidate)
-{
-	if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie))
-		__fscache_disable_cookie(cookie, aux_data, invalidate);
-}
-
-/**
- * fscache_enable_cookie - Reenable a cookie
- * @cookie: The cookie representing the cache object
- * @aux_data: The updated auxiliary data for the cookie (may be NULL)
- * @object_size: Current size of object
- * @can_enable: A function to permit enablement once lock is held
- * @data: Data for can_enable()
- *
- * Reenable a previously disabled cookie, allowing it to accept further alloc,
- * read, write, invalidate, update or acquire operations.  An attempt will be
- * made to immediately reattach the cookie to a backing object.  If @aux_data
- * is set, the auxiliary data attached to the cookie will be updated.
- *
- * The can_enable() function is called (if not NULL) once the enablement lock
- * is held to rule on whether enablement is still permitted to go ahead.
- */
-static inline
-void fscache_enable_cookie(struct fscache_cookie *cookie,
-			   const void *aux_data,
-			   loff_t object_size,
-			   bool (*can_enable)(void *data),
-			   void *data)
-{
-	if (fscache_cookie_valid(cookie) && !fscache_cookie_enabled(cookie))
-		__fscache_enable_cookie(cookie, aux_data, object_size,
-					can_enable, data);
-}
-
 #endif /* _LINUX_FSCACHE_H */
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index c877035c2946..4fedc2e9c428 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -36,14 +36,21 @@ enum cachefiles_obj_ref_trace {
 	E_(FSCACHE_OBJECT_WAS_CULLED,	"was_culled")
 
 #define cachefiles_obj_ref_traces					\
-	EM(fscache_obj_get_add_to_deps,		"GET add_to_deps")	\
-	EM(fscache_obj_get_queue,		"GET queue")		\
+	EM(fscache_obj_get_attach,		"GET attach")		\
+	EM(fscache_obj_get_exists,		"GET exists")		\
+	EM(fscache_obj_get_inval,		"GET inval")		\
+	EM(fscache_obj_get_wait,		"GET wait")		\
+	EM(fscache_obj_get_withdraw,		"GET withdraw")		\
+	EM(fscache_obj_new,			"NEW obj")		\
+	EM(fscache_obj_put,			"PUT general")		\
+	EM(fscache_obj_put_alloc_dup,		"PUT alloc_dup")	\
 	EM(fscache_obj_put_alloc_fail,		"PUT alloc_fail")	\
 	EM(fscache_obj_put_attach_fail,		"PUT attach_fail")	\
+	EM(fscache_obj_put_drop_child,		"PUT drop_child")	\
 	EM(fscache_obj_put_drop_obj,		"PUT drop_obj")		\
-	EM(fscache_obj_put_enq_dep,		"PUT enq_dep")		\
-	EM(fscache_obj_put_queue,		"PUT queue")		\
-	EM(fscache_obj_put_work,		"PUT work")		\
+	EM(fscache_obj_put_inval,		"PUT inval")		\
+	EM(fscache_obj_put_withdraw,		"PUT withdraw")		\
+	EM(fscache_obj_put_lookup_fail,		"PUT lookup_fail")	\
 	EM(cachefiles_obj_put_wait_retry,	"PUT wait_retry")	\
 	E_(cachefiles_obj_put_wait_timeo,	"PUT wait_timeo")
 
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index fb3fdf2921ee..4ce81f3f4c5b 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -24,15 +24,19 @@ enum fscache_cookie_trace {
 	fscache_cookie_discard,
 	fscache_cookie_get_acquire_parent,
 	fscache_cookie_get_attach_object,
+	fscache_cookie_get_ioreq,
 	fscache_cookie_get_reacquire,
 	fscache_cookie_get_register_netfs,
 	fscache_cookie_get_work,
-	fscache_cookie_put_acquire_nobufs,
+	fscache_cookie_new_acquire,
+	fscache_cookie_new_netfs,
 	fscache_cookie_put_dup_netfs,
-	fscache_cookie_put_relinquish,
+	fscache_cookie_put_ioreq,
 	fscache_cookie_put_object,
 	fscache_cookie_put_parent,
+	fscache_cookie_put_relinquish,
 	fscache_cookie_put_work,
+	fscache_cookie_see_discard,
 };
 
 #endif
@@ -41,19 +45,23 @@ enum fscache_cookie_trace {
  * Declare tracing information enums and their string mappings for display.
  */
 #define fscache_cookie_traces						\
-	EM(fscache_cookie_collision,		"*COLLISION*")		\
-	EM(fscache_cookie_discard,		"DISCARD")		\
-	EM(fscache_cookie_get_acquire_parent,	"GET prn")		\
-	EM(fscache_cookie_get_attach_object,	"GET obj")		\
-	EM(fscache_cookie_get_reacquire,	"GET raq")		\
-	EM(fscache_cookie_get_register_netfs,	"GET net")		\
-	EM(fscache_cookie_get_work,		"GET wrk")		\
-	EM(fscache_cookie_put_acquire_nobufs,	"PUT nbf")		\
-	EM(fscache_cookie_put_dup_netfs,	"PUT dnt")		\
-	EM(fscache_cookie_put_relinquish,	"PUT rlq")		\
-	EM(fscache_cookie_put_object,		"PUT obj")		\
-	EM(fscache_cookie_put_parent,		"PUT prn")		\
-	E_(fscache_cookie_put_work,		"PUT wrk")
+	EM(fscache_cookie_collision,		"*COLLIDE*")		\
+	EM(fscache_cookie_discard,		"DISCARD  ")		\
+	EM(fscache_cookie_get_acquire_parent,	"GET paren")		\
+	EM(fscache_cookie_get_attach_object,	"GET obj  ")		\
+	EM(fscache_cookie_get_ioreq,		"GET ioreq")		\
+	EM(fscache_cookie_get_reacquire,	"GET reacq")		\
+	EM(fscache_cookie_get_register_netfs,	"GET rgstr")		\
+	EM(fscache_cookie_get_work,		"GET work ")		\
+	EM(fscache_cookie_new_acquire,		"NEW acq  ")		\
+	EM(fscache_cookie_new_netfs,		"NEW netfs")		\
+	EM(fscache_cookie_put_dup_netfs,	"PUT dupnf")		\
+	EM(fscache_cookie_put_ioreq,		"GET ioreq")		\
+	EM(fscache_cookie_put_object,		"PUT obj  ")		\
+	EM(fscache_cookie_put_parent,		"PUT paren")		\
+	EM(fscache_cookie_put_relinquish,	"PUT relnq")		\
+	EM(fscache_cookie_put_work,		"PUT work ")		\
+	E_(fscache_cookie_see_discard,		"SEE discd")
 
 /*
  * Export enum symbols via userspace.
@@ -102,9 +110,10 @@ TRACE_EVENT(fscache_cookie,
 		    __entry->flags	= cookie->flags;
 			   ),
 
-	    TP_printk("%s c=%08x u=%d p=%08x Nc=%d Na=%d f=%02x",
+	    TP_printk("c=%08x %s u=%d p=%08x Nc=%d Na=%d f=%02x",
+		      __entry->cookie,
 		      __print_symbolic(__entry->where, fscache_cookie_traces),
-		      __entry->cookie, __entry->usage,
+		      __entry->usage,
 		      __entry->parent, __entry->n_children, __entry->n_active,
 		      __entry->flags)
 	    );
@@ -189,96 +198,6 @@ TRACE_EVENT(fscache_relinquish,
 		      __entry->flags, __entry->retire)
 	    );
 
-TRACE_EVENT(fscache_enable,
-	    TP_PROTO(struct fscache_cookie *cookie),
-
-	    TP_ARGS(cookie),
-
-	    TP_STRUCT__entry(
-		    __field(unsigned int,		cookie		)
-		    __field(int,			usage		)
-		    __field(int,			n_children	)
-		    __field(int,			n_active	)
-		    __field(u8,				flags		)
-			     ),
-
-	    TP_fast_assign(
-		    __entry->cookie	= cookie->debug_id;
-		    __entry->usage	= atomic_read(&cookie->usage);
-		    __entry->n_children	= atomic_read(&cookie->n_children);
-		    __entry->n_active	= atomic_read(&cookie->n_active);
-		    __entry->flags	= cookie->flags;
-			   ),
-
-	    TP_printk("c=%08x u=%d Nc=%d Na=%d f=%02x",
-		      __entry->cookie, __entry->usage,
-		      __entry->n_children, __entry->n_active, __entry->flags)
-	    );
-
-TRACE_EVENT(fscache_disable,
-	    TP_PROTO(struct fscache_cookie *cookie),
-
-	    TP_ARGS(cookie),
-
-	    TP_STRUCT__entry(
-		    __field(unsigned int,		cookie		)
-		    __field(int,			usage		)
-		    __field(int,			n_children	)
-		    __field(int,			n_active	)
-		    __field(u8,				flags		)
-			     ),
-
-	    TP_fast_assign(
-		    __entry->cookie	= cookie->debug_id;
-		    __entry->usage	= atomic_read(&cookie->usage);
-		    __entry->n_children	= atomic_read(&cookie->n_children);
-		    __entry->n_active	= atomic_read(&cookie->n_active);
-		    __entry->flags	= cookie->flags;
-			   ),
-
-	    TP_printk("c=%08x u=%d Nc=%d Na=%d f=%02x",
-		      __entry->cookie, __entry->usage,
-		      __entry->n_children, __entry->n_active, __entry->flags)
-	    );
-
-TRACE_EVENT(fscache_osm,
-	    TP_PROTO(struct fscache_object *object,
-		     const struct fscache_state *state,
-		     bool wait, bool oob, s8 event_num),
-
-	    TP_ARGS(object, state, wait, oob, event_num),
-
-	    TP_STRUCT__entry(
-		    __field(unsigned int,		cookie		)
-		    __field(unsigned int,		object		)
-		    __array(char,			state, 8	)
-		    __field(bool,			wait		)
-		    __field(bool,			oob		)
-		    __field(s8,				event_num	)
-			     ),
-
-	    TP_fast_assign(
-		    __entry->cookie		= object->cookie->debug_id;
-		    __entry->object		= object->debug_id;
-		    __entry->wait		= wait;
-		    __entry->oob		= oob;
-		    __entry->event_num		= event_num;
-		    memcpy(__entry->state, state->short_name, 8);
-			   ),
-
-	    TP_printk("c=%08x o=%08d %s %s%sev=%d",
-		      __entry->cookie,
-		      __entry->object,
-		      __entry->state,
-		      __print_symbolic(__entry->wait,
-				       { true,  "WAIT" },
-				       { false, "WORK" }),
-		      __print_symbolic(__entry->oob,
-				       { true,  " OOB " },
-				       { false, " " }),
-		      __entry->event_num)
-	    );
-
 #endif /* _TRACE_FSCACHE_H */
 
 /* This part must be outside protection */



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 08/32] fscache: Replace the object management state machine
@ 2020-07-13 16:32   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:32 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs

Replace the object management state machine with something a lot simpler.
The entire process of setting up or tearing down a cookie is done in one
go, and the dispatcher either punts it to a worker thread, or if all the
worker threads are all busy, does it in the current thread.

fscache_enable_cookie() and fscache_disable_cookie() are replaced by 'use'
and 'unuse' routines to which the mode of access (readonly or writable) is
declared - these then impose the policy of what to do with the backing
object.

The policy for handling local writes is declared to
fscache_acquire_cookie() using FSACHE_ADV_WRITE_*CACHE flags.  This only
allows for the possibility of suspending caching whilst a file is open for
writing; policies such as write-through and write-back have to be handled
at the netfs level.

At some point in the future, object records that aren't in use will get put
on an LRU and discarded under memory pressure or if they haven't been used
for a while.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/cell.c                     |    4 
 fs/afs/file.c                     |   15 +
 fs/afs/inode.c                    |   13 
 fs/afs/volume.c                   |    4 
 fs/cachefiles/bind.c              |    4 
 fs/cachefiles/interface.c         |  160 +++---
 fs/cachefiles/internal.h          |   12 
 fs/cachefiles/main.c              |   11 
 fs/cachefiles/namei.c             |   60 +-
 fs/cachefiles/xattr.c             |    1 
 fs/fscache/Makefile               |    3 
 fs/fscache/cache.c                |  107 ++--
 fs/fscache/cookie.c               |  485 +++++-------------
 fs/fscache/fsdef.c                |    1 
 fs/fscache/histogram.c            |    2 
 fs/fscache/internal.h             |   53 +-
 fs/fscache/main.c                 |   29 -
 fs/fscache/netfs.c                |    4 
 fs/fscache/obj.c                  |  352 +++++++++++++
 fs/fscache/object-list.c          |   58 --
 fs/fscache/object.c               |  982 -------------------------------------
 fs/fscache/object_bits.c          |  120 +++++
 fs/fscache/proc.c                 |    6 
 fs/fscache/stats.c                |    2 
 include/linux/fscache-cache.h     |  208 ++------
 include/linux/fscache.h           |  173 +++----
 include/trace/events/cachefiles.h |   17 -
 include/trace/events/fscache.h    |  133 +----
 28 files changed, 1019 insertions(+), 2000 deletions(-)
 create mode 100644 fs/fscache/obj.c
 delete mode 100644 fs/fscache/object.c
 create mode 100644 fs/fscache/object_bits.c

diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index 7fdc39b1c965..1cdabbdeeb2c 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -613,7 +613,7 @@ static int afs_activate_cell(struct afs_net *net, struct afs_cell *cell)
 					     NULL,
 					     cell->name, strlen(cell->name),
 					     NULL, 0,
-					     0, true);
+					     0);
 #endif
 	ret = afs_proc_cell_setup(cell);
 	if (ret < 0)
@@ -652,7 +652,7 @@ static void afs_deactivate_cell(struct afs_net *net, struct afs_cell *cell)
 	mutex_unlock(&net->proc_cells_lock);
 
 #ifdef CONFIG_AFS_FSCACHE
-	fscache_relinquish_cookie(cell->cache, NULL, false);
+	fscache_relinquish_cookie(cell->cache, false);
 	cell->cache = NULL;
 #endif
 
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 13f078111bad..75058de366a1 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -145,7 +145,9 @@ int afs_open(struct inode *inode, struct file *file)
 
 	if (file->f_flags & O_TRUNC)
 		set_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags);
-	
+
+	fscache_use_cookie(afs_vnode_cache(vnode), file->f_mode & FMODE_WRITE);
+
 	file->private_data = af;
 	_leave(" = 0");
 	return 0;
@@ -164,8 +166,10 @@ int afs_open(struct inode *inode, struct file *file)
  */
 int afs_release(struct inode *inode, struct file *file)
 {
+	struct afs_vnode_cache_aux aux;
 	struct afs_vnode *vnode = AFS_FS_I(inode);
 	struct afs_file *af = file->private_data;
+	loff_t i_size;
 	int ret = 0;
 
 	_enter("{%llx:%llu},", vnode->fid.vid, vnode->fid.vnode);
@@ -176,6 +180,15 @@ int afs_release(struct inode *inode, struct file *file)
 	file->private_data = NULL;
 	if (af->wb)
 		afs_put_wb_key(af->wb);
+
+	if ((file->f_mode & FMODE_WRITE)) {
+		i_size = i_size_read(&vnode->vfs_inode);
+		aux.data_version = vnode->status.data_version;
+		fscache_unuse_cookie(afs_vnode_cache(vnode), &aux, &i_size);
+	} else {
+		fscache_unuse_cookie(afs_vnode_cache(vnode), NULL, NULL);
+	}
+
 	key_put(af->key);
 	kfree(af);
 	afs_prune_wb_keys(vnode);
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 0a4623c38cf5..49d897437998 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -439,7 +439,7 @@ static void afs_get_inode_cache(struct afs_vnode *vnode)
 		NULL,
 		&key, sizeof(key),
 		&aux, sizeof(aux),
-		vnode->status.size, true);
+		vnode->status.size);
 #endif
 }
 
@@ -793,14 +793,9 @@ void afs_evict_inode(struct inode *inode)
 	}
 
 #ifdef CONFIG_AFS_FSCACHE
-	{
-		struct afs_vnode_cache_aux aux;
-
-		aux.data_version = vnode->status.data_version;
-		fscache_relinquish_cookie(vnode->cache, &aux,
-					  test_bit(AFS_VNODE_DELETED, &vnode->flags));
-		vnode->cache = NULL;
-	}
+	fscache_relinquish_cookie(vnode->cache,
+				  test_bit(AFS_VNODE_DELETED, &vnode->flags));
+	vnode->cache = NULL;
 #endif
 
 	afs_prune_wb_keys(vnode);
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index 5e3fe894a0f9..13a953df5d75 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -276,7 +276,7 @@ void afs_activate_volume(struct afs_volume *volume)
 					       "AFS.vol",
 					       0, NULL,
 					       &volume->vid, sizeof(volume->vid),
-					       NULL, 0, 0, true);
+					       NULL, 0, 0);
 #endif
 }
 
@@ -288,7 +288,7 @@ void afs_deactivate_volume(struct afs_volume *volume)
 	_enter("%s", volume->name);
 
 #ifdef CONFIG_AFS_FSCACHE
-	fscache_relinquish_cookie(volume->cache, NULL,
+	fscache_relinquish_cookie(volume->cache,
 				  test_bit(AFS_VOLUME_DELETED, &volume->flags));
 	volume->cache = NULL;
 #endif
diff --git a/fs/cachefiles/bind.c b/fs/cachefiles/bind.c
index dfb14dbddf51..4c59e1ef4500 100644
--- a/fs/cachefiles/bind.c
+++ b/fs/cachefiles/bind.c
@@ -99,12 +99,10 @@ static int cachefiles_daemon_add_cache(struct cachefiles_cache *cache)
 	/* allocate the root index object */
 	ret = -ENOMEM;
 
-	fsdef = kmem_cache_alloc(cachefiles_object_jar, GFP_KERNEL);
+	fsdef = kmem_cache_zalloc(cachefiles_object_jar, GFP_KERNEL);
 	if (!fsdef)
 		goto error_root_object;
 
-	ASSERTCMP(fsdef->backer, ==, NULL);
-
 	atomic_set(&fsdef->usage, 1);
 	fsdef->type = FSCACHE_COOKIE_TYPE_INDEX;
 
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index b868afb970ad..e4d1a82b9f33 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -8,39 +8,56 @@
 #include <linux/slab.h>
 #include <linux/mount.h>
 #include <linux/xattr.h>
+#include <linux/file.h>
 #include "internal.h"
 
 static int cachefiles_attr_changed(struct cachefiles_object *object);
+static void cachefiles_put_object(struct fscache_object *_object,
+				  enum fscache_obj_ref_trace why);
 
 /*
- * allocate an object record for a cookie lookup and prepare the lookup data
+ * Allocate an object record for a cookie lookup and prepare the lookup data.
+ * Eats the caller's ref on parent.
  */
-static struct fscache_object *cachefiles_alloc_object(
-	struct fscache_cache *_cache,
-	struct fscache_cookie *cookie)
+static
+struct fscache_object *cachefiles_alloc_object(struct fscache_cookie *cookie,
+					       struct fscache_cache *_cache,
+					       struct fscache_object *parent)
 {
 	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
-	unsigned keylen;
-	void *buffer, *p;
-	char *key;
 
 	cache = container_of(_cache, struct cachefiles_cache, cache);
 
 	_enter("{%s},%p,", cache->cache.identifier, cookie);
 
-	/* create a new object record and a temporary leaf image */
-	object = kmem_cache_alloc(cachefiles_object_jar, cachefiles_gfp);
-	if (!object)
-		goto nomem_object;
-
-	ASSERTCMP(object->backer, ==, NULL);
-
-	atomic_set(&object->usage, 1);
+	object = kmem_cache_zalloc(cachefiles_object_jar, cachefiles_gfp);
+	if (!object) {
+		cachefiles_put_object(parent, fscache_obj_put_alloc_fail);
+		return NULL;
+	}
 
 	fscache_object_init(&object->fscache, cookie, &cache->cache);
+	object->fscache.parent = parent;
+	object->fscache.stage = FSCACHE_OBJECT_STAGE_LOOKING_UP;
+	atomic_set(&object->usage, 1);
 
 	object->type = cookie->type;
+	trace_cachefiles_ref(object, cookie,
+			     (enum cachefiles_obj_ref_trace)fscache_obj_new, 1);
+	return &object->fscache;
+}
+
+/*
+ * Prepare data for use in lookup.  This involves cooking the binary key into
+ * something that can be used as a filename.
+ */
+static void *cachefiles_prepare_lookup_data(struct fscache_object *object)
+{
+	struct fscache_cookie *cookie = object->cookie;
+	unsigned keylen;
+	void *buffer, *p;
+	char *key;
 
 	/* get hold of the raw key
 	 * - stick the length on the front and leave space on the back for the
@@ -48,7 +65,7 @@ static struct fscache_object *cachefiles_alloc_object(
 	 */
 	buffer = kmalloc((2 + 512) + 3, cachefiles_gfp);
 	if (!buffer)
-		goto nomem_buffer;
+		goto nomem;
 
 	keylen = cookie->key_len;
 	p = fscache_get_key(cookie);
@@ -60,36 +77,29 @@ static struct fscache_object *cachefiles_alloc_object(
 	((char *)buffer)[keylen + 4] = 0;
 
 	/* turn the raw key into something that can work with as a filename */
-	key = cachefiles_cook_key(buffer, keylen + 2, object->type);
+	key = cachefiles_cook_key(buffer, keylen + 2, cookie->type);
 	kfree(buffer);
 	if (!key)
-		goto nomem_key;
+		goto nomem;
 
-	object->lookup_key = key;
+	_leave(" = %s", key);
+	return key;
 
-	_leave(" = %p [%s]", &object->fscache, key);
-	return &object->fscache;
-
-nomem_key:
-	kfree(buffer);
-nomem_buffer:
-	kmem_cache_free(cachefiles_object_jar, object);
-	fscache_object_destroyed(&cache->cache);
-nomem_object:
-	_leave(" = -ENOMEM");
+nomem:
 	return ERR_PTR(-ENOMEM);
 }
 
 /*
- * attempt to look up the nominated node in this cache
- * - return -ETIMEDOUT to be scheduled again
+ * Attempt to look up the nominated node in this cache
  */
-static int cachefiles_lookup_object(struct fscache_object *_object)
+static bool cachefiles_lookup_object(struct fscache_object *_object,
+				     void *lookup_data)
 {
 	struct cachefiles_object *parent, *object;
 	struct cachefiles_cache *cache;
 	const struct cred *saved_cred;
-	int ret;
+	char *lookup_key = lookup_data;
+	bool success;
 
 	_enter("{OBJ%x}", _object->debug_id);
 
@@ -98,47 +108,30 @@ static int cachefiles_lookup_object(struct fscache_object *_object)
 			      struct cachefiles_object, fscache);
 	object = container_of(_object, struct cachefiles_object, fscache);
 
-	ASSERTCMP(object->lookup_key, !=, NULL);
+	ASSERTCMP(lookup_key, !=, NULL);
 
 	/* look up the key, creating any missing bits */
 	cachefiles_begin_secure(cache, &saved_cred);
-	ret = cachefiles_walk_to_object(parent, object, object->lookup_key);
+	success = cachefiles_walk_to_object(parent, object, lookup_key);
 	cachefiles_end_secure(cache, saved_cred);
 
 	/* polish off by setting the attributes of non-index files */
-	if (ret == 0 &&
+	if (success &&
 	    object->fscache.cookie->type != FSCACHE_COOKIE_TYPE_INDEX)
 		cachefiles_attr_changed(object);
 
-	if (ret < 0 && ret != -ETIMEDOUT) {
-		if (ret != -ENOBUFS)
-			pr_warn("Lookup failed error %d\n", ret);
-		fscache_object_lookup_error(&object->fscache);
-	}
-
-	_leave(" [%d]", ret);
-	return ret;
+	_leave(" [%d]", success);
+	return success;
 }
 
-/*
- * indication of lookup completion
- */
-static void cachefiles_lookup_complete(struct fscache_object *_object)
+static void cachefiles_free_lookup_data(struct fscache_object *object, void *lookup_data)
 {
-	struct cachefiles_object *object;
-
-	object = container_of(_object, struct cachefiles_object, fscache);
-
-	_enter("{OBJ%x}", object->fscache.debug_id);
-
-	kfree(object->lookup_key);
-	object->lookup_key = NULL;
+	kfree(lookup_data);
 }
 
 /*
  * increment the usage count on an inode object (may fail if unmounting)
  */
-static
 struct fscache_object *cachefiles_grab_object(struct fscache_object *_object,
 					      enum fscache_obj_ref_trace why)
 {
@@ -166,6 +159,8 @@ static void cachefiles_update_object(struct fscache_object *_object)
 	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
 	const struct cred *saved_cred;
+	loff_t object_size;
+	int ret;
 
 	_enter("{OBJ%x}", _object->debug_id);
 
@@ -174,7 +169,25 @@ static void cachefiles_update_object(struct fscache_object *_object)
 			     cache);
 
 	cachefiles_begin_secure(cache, &saved_cred);
+
+	object_size = object->fscache.cookie->object_size;
+	if (i_size_read(d_inode(object->backer)) > object_size) {
+		struct path path = {
+			.mnt	= cache->mnt,
+			.dentry	= object->backer
+		};
+		_debug("trunc %llx -> %llx", i_size_read(d_inode(object->backer)), object_size);
+		ret = vfs_truncate(&path, object_size);
+		if (ret < 0) {
+			cachefiles_io_error_obj(object, "Trunc-to-size failed");
+			cachefiles_remove_object_xattr(cache, object->backer);
+			goto out;
+		}
+	}
+
 	cachefiles_set_object_xattr(object, XATTR_REPLACE);
+
+out:
 	cachefiles_end_secure(cache, saved_cred);
 	_leave("");
 }
@@ -183,7 +196,8 @@ static void cachefiles_update_object(struct fscache_object *_object)
  * discard the resources pinned by an object and effect retirement if
  * requested
  */
-static void cachefiles_drop_object(struct fscache_object *_object)
+static void cachefiles_drop_object(struct fscache_object *_object,
+				   bool invalidate)
 {
 	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
@@ -209,17 +223,18 @@ static void cachefiles_drop_object(struct fscache_object *_object)
 	 * before we set it up.
 	 */
 	if (object->dentry) {
-		/* delete retired objects */
-		if (test_bit(FSCACHE_OBJECT_RETIRED, &object->fscache.flags) &&
-		    _object != cache->cache.fsdef
-		    ) {
-			_debug("- retire object OBJ%x", object->fscache.debug_id);
+		if (invalidate && _object != cache->cache.fsdef) {
+			_debug("- inval object OBJ%x", object->fscache.debug_id);
 			cachefiles_begin_secure(cache, &saved_cred);
 			cachefiles_delete_object(cache, object);
 			cachefiles_end_secure(cache, saved_cred);
 		}
 
 		/* close the filesystem stuff attached to the object */
+		if (object->backing_file)
+			fput(object->backing_file);
+		object->backing_file = NULL;
+
 		if (object->backer != object->dentry)
 			dput(object->backer);
 		object->backer = NULL;
@@ -253,9 +268,6 @@ static void cachefiles_put_object(struct fscache_object *_object,
 	ASSERT((atomic_read(&object->usage) & 0xffff0000) != 0x6b6b0000);
 #endif
 
-	ASSERTIFCMP(object->fscache.parent,
-		    object->fscache.parent->n_children, >, 0);
-
 	u = atomic_dec_return(&object->usage);
 	trace_cachefiles_ref(object, _object->cookie,
 			     (enum cachefiles_obj_ref_trace)why, u);
@@ -263,14 +275,10 @@ static void cachefiles_put_object(struct fscache_object *_object,
 	if (u == 0) {
 		_debug("- kill object OBJ%x", object->fscache.debug_id);
 
-		ASSERTCMP(object->fscache.parent, ==, NULL);
 		ASSERTCMP(object->backer, ==, NULL);
 		ASSERTCMP(object->dentry, ==, NULL);
-		ASSERTCMP(object->fscache.n_ops, ==, 0);
 		ASSERTCMP(object->fscache.n_children, ==, 0);
 
-		kfree(object->lookup_key);
-
 		cache = object->fscache.cache;
 		fscache_object_destroy(&object->fscache);
 		kmem_cache_free(cachefiles_object_jar, object);
@@ -416,15 +424,25 @@ static void cachefiles_invalidate_object(struct fscache_object *_object)
 	_leave("");
 }
 
+static unsigned int cachefiles_get_object_usage(const struct fscache_object *_object)
+{
+	struct cachefiles_object *object;
+
+	object = container_of(_object, struct cachefiles_object, fscache);
+	return atomic_read(&object->usage);
+}
+
 const struct fscache_cache_ops cachefiles_cache_ops = {
 	.name			= "cachefiles",
 	.alloc_object		= cachefiles_alloc_object,
+	.prepare_lookup_data	= cachefiles_prepare_lookup_data,
 	.lookup_object		= cachefiles_lookup_object,
-	.lookup_complete	= cachefiles_lookup_complete,
+	.free_lookup_data	= cachefiles_free_lookup_data,
 	.grab_object		= cachefiles_grab_object,
 	.update_object		= cachefiles_update_object,
 	.invalidate_object	= cachefiles_invalidate_object,
 	.drop_object		= cachefiles_drop_object,
 	.put_object		= cachefiles_put_object,
+	.get_object_usage	= cachefiles_get_object_usage,
 	.sync_cache		= cachefiles_sync_cache,
 };
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index f8f308ce7385..b89f76a03546 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -34,13 +34,13 @@ extern unsigned cachefiles_debug;
  */
 struct cachefiles_object {
 	struct fscache_object		fscache;	/* fscache handle */
-	char				*lookup_key;	/* key to look up */
 	struct dentry			*dentry;	/* the file/dir representing this object */
 	struct dentry			*backer;	/* backing file */
+	struct file			*backing_file;	/* File open on backing storage */
 	loff_t				i_size;		/* object size */
 	atomic_t			usage;		/* object usage count */
 	uint8_t				type;		/* object type */
-	uint8_t				new;		/* T if object new */
+	bool				new;		/* T if object new */
 };
 
 extern struct kmem_cache *cachefiles_object_jar;
@@ -112,6 +112,8 @@ extern int cachefiles_has_space(struct cachefiles_cache *cache,
  * interface.c
  */
 extern const struct fscache_cache_ops cachefiles_cache_ops;
+extern struct fscache_object *cachefiles_grab_object(struct fscache_object *_object,
+						     enum fscache_obj_ref_trace why);
 
 /*
  * key.c
@@ -125,9 +127,9 @@ extern void cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
 				    struct dentry *dentry);
 extern int cachefiles_delete_object(struct cachefiles_cache *cache,
 				    struct cachefiles_object *object);
-extern int cachefiles_walk_to_object(struct cachefiles_object *parent,
-				     struct cachefiles_object *object,
-				     const char *key);
+extern bool cachefiles_walk_to_object(struct cachefiles_object *parent,
+				      struct cachefiles_object *object,
+				      const char *key);
 extern struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
 					       struct dentry *dir,
 					       const char *name);
diff --git a/fs/cachefiles/main.c b/fs/cachefiles/main.c
index 3f0101a74809..3373f3eddbf8 100644
--- a/fs/cachefiles/main.c
+++ b/fs/cachefiles/main.c
@@ -37,13 +37,6 @@ static struct miscdevice cachefiles_dev = {
 	.fops	= &cachefiles_daemon_fops,
 };
 
-static void cachefiles_object_init_once(void *_object)
-{
-	struct cachefiles_object *object = _object;
-
-	memset(object, 0, sizeof(*object));
-}
-
 /*
  * initialise the fs caching module
  */
@@ -60,9 +53,7 @@ static int __init cachefiles_init(void)
 	cachefiles_object_jar =
 		kmem_cache_create("cachefiles_object_jar",
 				  sizeof(struct cachefiles_object),
-				  0,
-				  SLAB_HWCACHE_ALIGN,
-				  cachefiles_object_init_once);
+				  0, SLAB_HWCACHE_ALIGN, NULL);
 	if (!cachefiles_object_jar) {
 		pr_notice("Failed to allocate an object jar\n");
 		goto error_object_jar;
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 818d1bca1904..d1c8828ebbbb 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -18,8 +18,6 @@
 #include <linux/slab.h>
 #include "internal.h"
 
-#define CACHEFILES_KEYBUF_SIZE 512
-
 /*
  * Mark the backing file as being a cache file if it's not already in use so.
  */
@@ -257,13 +255,24 @@ int cachefiles_delete_object(struct cachefiles_cache *cache,
 	return ret;
 }
 
+/*
+ * Handle negative lookup.  We create a temporary file for a datafile to use
+ * which we will link into place later.
+ */
+static void cachefiles_negative_lookup(struct cachefiles_object *parent,
+				       struct cachefiles_object *object)
+{
+	// TODO: Use vfs_tmpfile() for datafiles
+	//fscache_object_lookup_negative(&object->fscache);
+}
+
 /*
  * walk from the parent object to the child object through the backing
  * filesystem, creating directories as we go
  */
-int cachefiles_walk_to_object(struct cachefiles_object *parent,
-			      struct cachefiles_object *object,
-			      const char *key)
+bool cachefiles_walk_to_object(struct cachefiles_object *parent,
+			       struct cachefiles_object *object,
+			       const char *key)
 {
 	struct cachefiles_cache *cache;
 	struct dentry *dir, *next = NULL;
@@ -271,7 +280,7 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	struct path path;
 	unsigned long start;
 	const char *name;
-	bool marked = false;
+	bool marked = false, negated = false;
 	int ret, nlen;
 
 	_enter("OBJ%x{%p},OBJ%x,%s,",
@@ -288,7 +297,7 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	if (!(d_is_dir(parent->dentry))) {
 		// TODO: convert file to dir
 		_leave("looking up in none directory");
-		return -ENOBUFS;
+		return false;
 	}
 
 	dir = dget(parent->dentry);
@@ -328,8 +337,10 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	/* if this element of the path doesn't exist, then the lookup phase
 	 * failed, and we can release any readers in the certain knowledge that
 	 * there's nothing for them to actually read */
-	if (d_is_negative(next))
-		fscache_object_lookup_negative(&object->fscache);
+	if (d_is_negative(next) && !negated) {
+		cachefiles_negative_lookup(object, parent);
+		negated = true;
+	}
 
 	/* we need to create the object if it's negative */
 	if (key || object->type == FSCACHE_COOKIE_TYPE_INDEX) {
@@ -444,6 +455,9 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 			fscache_object_retrying_stale(&object->fscache);
 			goto lookup_again;
 		}
+
+		if (ret < 0)
+			goto check_error_unlock;
 	}
 
 	inode_unlock(d_inode(dir));
@@ -469,14 +483,10 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	/* open a file interface onto a data file */
 	if (object->type != FSCACHE_COOKIE_TYPE_INDEX) {
 		if (d_is_reg(object->dentry)) {
-			const struct address_space_operations *aops;
-
-			ret = -EPERM;
-			aops = d_backing_inode(object->dentry)->i_mapping->a_ops;
-			if (!aops->bmap)
-				goto check_error;
-			if (object->dentry->d_sb->s_blocksize > PAGE_SIZE)
+			if (object->dentry->d_sb->s_blocksize > PAGE_SIZE) {
+				pr_warn("cachefiles: Block size too large\n");
 				goto check_error;
+			}
 
 			object->backer = object->dentry;
 		} else {
@@ -484,11 +494,15 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 		}
 	}
 
-	object->new = 0;
-	fscache_obtained_object(&object->fscache);
+	if (object->new)
+		object->fscache.stage = FSCACHE_OBJECT_STAGE_LIVE_EMPTY;
+	else
+		object->fscache.stage = FSCACHE_OBJECT_STAGE_LIVE;
+	wake_up_var(&object->fscache.stage);
 
-	_leave(" = 0 [%lu]", d_backing_inode(object->dentry)->i_ino);
-	return 0;
+	object->new = false;
+	_leave(" = t [%lu]", d_backing_inode(object->dentry)->i_ino);
+	return true;
 
 no_space_error:
 	fscache_object_mark_killed(&object->fscache, FSCACHE_OBJECT_NO_SPACE);
@@ -520,8 +534,10 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 error_out2:
 	dput(dir);
 error_out:
-	_leave(" = error %d", -ret);
-	return ret;
+	object->fscache.stage = FSCACHE_OBJECT_STAGE_DEAD;
+	wake_up_var(&object->fscache.stage);
+	_leave(" = f");
+	return false;
 }
 
 /*
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index a4f1eddebe6f..5b2f6da91cc8 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -122,7 +122,6 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 	if (len > 0)
 		memcpy(buf->data, fscache_get_aux(object->fscache.cookie), len);
 
-	clear_bit(FSCACHE_COOKIE_AUX_UPDATED, &object->fscache.cookie->flags);
 	ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
 			   buf, sizeof(struct cachefiles_xattr) + len,
 			   xattr_flags);
diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
index 7b10c6aad157..396e1b5fdc28 100644
--- a/fs/fscache/Makefile
+++ b/fs/fscache/Makefile
@@ -10,7 +10,8 @@ fscache-y := \
 	fsdef.o \
 	main.o \
 	netfs.o \
-	object.o
+	obj.o \
+	object_bits.o
 
 fscache-$(CONFIG_PROC_FS) += proc.o
 fscache-$(CONFIG_FSCACHE_STATS) += stats.o
diff --git a/fs/fscache/cache.c b/fs/fscache/cache.c
index 92f747b72ec3..8f2dd40f93fb 100644
--- a/fs/fscache/cache.c
+++ b/fs/fscache/cache.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 /* FS-Cache cache handling
  *
- * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Copyright (C) 2007, 2019 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  */
 
@@ -93,6 +93,7 @@ struct fscache_cache *fscache_select_cache_for_object(
 	struct fscache_cookie *cookie)
 {
 	struct fscache_cache_tag *tag;
+	struct fscache_cookie *parent = cookie->parent;
 	struct fscache_object *object;
 	struct fscache_cache *cache;
 
@@ -104,33 +105,32 @@ struct fscache_cache *fscache_select_cache_for_object(
 	}
 
 	/* we check the parent to determine the cache to use */
-	spin_lock(&cookie->lock);
+	spin_lock(&parent->lock);
 
 	/* the first in the parent's backing list should be the preferred
 	 * cache */
-	if (!hlist_empty(&cookie->backing_objects)) {
-		object = hlist_entry(cookie->backing_objects.first,
+	if (!hlist_empty(&parent->backing_objects)) {
+		object = hlist_entry(parent->backing_objects.first,
 				     struct fscache_object, cookie_link);
 
 		cache = object->cache;
-		if (fscache_object_is_dying(object) ||
-		    test_bit(FSCACHE_IOERROR, &cache->flags))
+		if (test_bit(FSCACHE_IOERROR, &cache->flags))
 			cache = NULL;
 
-		spin_unlock(&cookie->lock);
+		spin_unlock(&parent->lock);
 		_leave(" = %p [parent]", cache);
 		return cache;
 	}
 
 	/* the parent is unbacked */
-	if (cookie->type != FSCACHE_COOKIE_TYPE_INDEX) {
+	if (parent->type != FSCACHE_COOKIE_TYPE_INDEX) {
 		/* cookie not an index and is unbacked */
-		spin_unlock(&cookie->lock);
-		_leave(" = NULL [cookie ub,ni]");
+		spin_unlock(&parent->lock);
+		_leave(" = NULL [parent ub,ni]");
 		return NULL;
 	}
 
-	spin_unlock(&cookie->lock);
+	spin_unlock(&parent->lock);
 
 	tag = cookie->preferred_cache;
 	if (!tag)
@@ -210,10 +210,7 @@ int fscache_add_cache(struct fscache_cache *cache,
 	BUG_ON(!ifsdef);
 
 	cache->flags = 0;
-	ifsdef->event_mask =
-		((1 << NR_FSCACHE_OBJECT_EVENTS) - 1) &
-		~(1 << FSCACHE_OBJECT_EV_CLEARED);
-	__set_bit(FSCACHE_OBJECT_IS_AVAILABLE, &ifsdef->flags);
+	ifsdef->stage = FSCACHE_OBJECT_STAGE_LIVE;
 
 	if (!tagname)
 		tagname = cache->identifier;
@@ -308,33 +305,57 @@ void fscache_io_error(struct fscache_cache *cache)
 EXPORT_SYMBOL(fscache_io_error);
 
 /*
- * request withdrawal of all the objects in a cache
- * - all the objects being withdrawn are moved onto the supplied list
+ * Withdraw an object.
  */
-static void fscache_withdraw_all_objects(struct fscache_cache *cache,
-					 struct list_head *dying_objects)
+static void fscache_withdraw_object(struct fscache_cookie *cookie,
+				    struct fscache_object *object,
+				    int param)
 {
-	struct fscache_object *object;
+	_enter("c=%08x o=%08x", cookie ? cookie->debug_id : 0, object->debug_id);
 
-	while (!list_empty(&cache->object_list)) {
-		spin_lock(&cache->object_list_lock);
+	_debug("WITHDRAW %x", object->debug_id);
 
-		if (!list_empty(&cache->object_list)) {
-			object = list_entry(cache->object_list.next,
-					    struct fscache_object, cache_link);
-			list_move_tail(&object->cache_link, dying_objects);
+	spin_lock(&cookie->lock);
+	cookie->stage = FSCACHE_COOKIE_STAGE_DEAD;
+	hlist_del_init(&object->cookie_link);
+	spin_unlock(&cookie->lock);
 
-			_debug("withdraw %p", object->cookie);
+	fscache_drop_object(cookie, object, param);
+	object->cache->ops->put_object(object, fscache_obj_put_withdraw);
+}
 
-			/* This must be done under object_list_lock to prevent
-			 * a race with fscache_drop_object().
-			 */
-			fscache_raise_event(object, FSCACHE_OBJECT_EV_KILL);
-		}
+/*
+ * Request withdrawal of all the objects in a cache.
+ */
+static void fscache_withdraw_all_objects(struct fscache_cache *cache)
+{
+	struct fscache_object *object;
 
+	_enter("");
+
+	spin_lock(&cache->object_list_lock);
+	while (!list_empty(&cache->object_list)) {
+		/* Go through the list backwards so that we do children before
+		 * their parents.
+		 */
+		object = list_entry(cache->object_list.prev,
+				    struct fscache_object, cache_link);
+		list_del_init(&object->cache_link);
+		cache->ops->grab_object(object, fscache_obj_get_withdraw);
 		spin_unlock(&cache->object_list_lock);
+
+		_debug("o=%08x n=%u", object->debug_id, object->n_children);
+		wait_var_event(&object->n_children, object->n_children == 0);
+
+		fscache_dispatch(object->cookie, object, 0,
+				 fscache_withdraw_object);
+
 		cond_resched();
+		spin_lock(&cache->object_list_lock);
 	}
+	spin_unlock(&cache->object_list_lock);
+
+	_leave("");
 }
 
 /**
@@ -349,12 +370,10 @@ static void fscache_withdraw_all_objects(struct fscache_cache *cache,
  */
 void fscache_withdraw_cache(struct fscache_cache *cache)
 {
-	LIST_HEAD(dying_objects);
-
 	_enter("");
 
-	pr_notice("Withdrawing cache \"%s\"\n",
-		  cache->tag->name);
+	pr_notice("Withdrawing cache \"%s\" (%u objs)\n",
+		  cache->tag->name, atomic_read(&cache->object_count));
 
 	/* make the cache unavailable for cookie acquisition */
 	if (test_and_set_bit(FSCACHE_CACHE_WITHDRAWN, &cache->flags))
@@ -365,29 +384,27 @@ void fscache_withdraw_cache(struct fscache_cache *cache)
 	cache->tag->cache = NULL;
 	up_write(&fscache_addremove_sem);
 
-	/* make sure all pages pinned by operations on behalf of the netfs are
-	 * written to disk */
-	fscache_stat(&fscache_n_cop_sync_cache);
-	cache->ops->sync_cache(cache);
-	fscache_stat_d(&fscache_n_cop_sync_cache);
-
 	/* we now have to destroy all the active objects pertaining to this
 	 * cache - which we do by passing them off to thread pool to be
 	 * disposed of */
 	_debug("destroy");
 
-	fscache_withdraw_all_objects(cache, &dying_objects);
+	fscache_withdraw_all_objects(cache);
+
+	/* make sure all outstanding data is written to disk */
+	fscache_stat(&fscache_n_cop_sync_cache);
+	cache->ops->sync_cache(cache);
+	fscache_stat_d(&fscache_n_cop_sync_cache);
 
 	/* wait for all extant objects to finish their outstanding operations
 	 * and go away */
-	_debug("wait for finish");
+	_debug("wait for finish %u", atomic_read(&cache->object_count));
 	wait_event(fscache_cache_cleared_wq,
 		   atomic_read(&cache->object_count) == 0);
 	_debug("wait for clearance");
 	wait_event(fscache_cache_cleared_wq,
 		   list_empty(&cache->object_list));
 	_debug("cleared");
-	ASSERT(list_empty(&dying_objects));
 
 	kobject_put(cache->kobj);
 
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index aa33d16711e1..a8aa1639e93b 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 /* netfs cookie management
  *
- * Copyright (C) 2004-2007 Red Hat, Inc. All Rights Reserved.
+ * Copyright (C) 2004-2007, 2019 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
  * See Documentation/filesystems/caching/netfs-api.rst for more information on
@@ -15,19 +15,11 @@
 
 struct kmem_cache *fscache_cookie_jar;
 
-static atomic_t fscache_object_debug_id = ATOMIC_INIT(0);
-
 #define fscache_cookie_hash_shift 15
 static struct hlist_bl_head fscache_cookie_hash[1 << fscache_cookie_hash_shift];
 static LIST_HEAD(fscache_cookies);
 static DEFINE_RWLOCK(fscache_cookies_lock);
 
-static int fscache_acquire_non_index_cookie(struct fscache_cookie *cookie);
-static int fscache_alloc_object(struct fscache_cache *cache,
-				struct fscache_cookie *cookie);
-static int fscache_attach_object(struct fscache_cookie *cookie,
-				 struct fscache_object *object);
-
 static void fscache_print_cookie(struct fscache_cookie *cookie, char prefix)
 {
 	struct fscache_object *object;
@@ -162,8 +154,8 @@ struct fscache_cookie *fscache_alloc_cookie(
 	if (!cookie)
 		return NULL;
 
-	cookie->type = type;
-	cookie->advice = advice;
+	cookie->type	= type;
+	cookie->advice	= advice;
 	cookie->key_len = index_key_len;
 	cookie->aux_len = aux_data_len;
 	cookie->object_size = object_size;
@@ -180,19 +172,21 @@ struct fscache_cookie *fscache_alloc_cookie(
 			goto nomem;
 	}
 
+	cookie->parent = parent;
 	atomic_set(&cookie->usage, 1);
 	atomic_set(&cookie->n_children, 0);
+	atomic_set(&cookie->n_ops, 1);
 	cookie->debug_id = atomic_inc_return(&fscache_cookie_debug_id);
 
+	cookie->stage = FSCACHE_COOKIE_STAGE_QUIESCENT;
+
 	/* We keep the active count elevated until relinquishment to prevent an
 	 * attempt to wake up every time the object operations queue quiesces.
 	 */
 	atomic_set(&cookie->n_active, 1);
 
-	cookie->parent		= parent;
 	cookie->preferred_cache	= fscache_get_cache_tag(preferred_cache);
-	
-	cookie->flags		= (1 << FSCACHE_COOKIE_NO_DATA_YET);
+
 	spin_lock_init(&cookie->lock);
 	INIT_HLIST_HEAD(&cookie->backing_objects);
 
@@ -270,13 +264,12 @@ struct fscache_cookie *__fscache_acquire_cookie(
 	struct fscache_cache_tag *preferred_cache,
 	const void *index_key, size_t index_key_len,
 	const void *aux_data, size_t aux_data_len,
-	loff_t object_size,
-	bool enable)
+	loff_t object_size)
 {
 	struct fscache_cookie *candidate, *cookie;
 
-	_enter("{%s},{%s},%u",
-	       parent ? parent->type_name : "<no-parent>", type_name, enable);
+	_enter("{%s},{%s}",
+	       parent ? parent->type_name : "<no-parent>", type_name);
 
 	if (!index_key || !index_key_len || index_key_len > 255 || aux_data_len > 255)
 		return NULL;
@@ -308,6 +301,7 @@ struct fscache_cookie *__fscache_acquire_cookie(
 		_leave(" [ENOMEM]");
 		return NULL;
 	}
+	trace_fscache_cookie(candidate, fscache_cookie_new_acquire, 1);
 
 	cookie = fscache_hash_cookie(candidate);
 	if (!cookie) {
@@ -331,28 +325,8 @@ struct fscache_cookie *__fscache_acquire_cookie(
 	}
 
 	trace_fscache_acquire(cookie);
-
-	if (enable) {
-		/* if the object is an index then we need do nothing more here
-		 * - we create indices on disk when we need them as an index
-		 * may exist in multiple caches */
-		if (cookie->type != FSCACHE_COOKIE_TYPE_INDEX) {
-			if (fscache_acquire_non_index_cookie(cookie) == 0) {
-				set_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags);
-			} else {
-				atomic_dec(&parent->n_children);
-				fscache_cookie_put(cookie,
-						   fscache_cookie_put_acquire_nobufs);
-				fscache_stat(&fscache_n_acquires_nobufs);
-				_leave(" = NULL");
-				return NULL;
-			}
-		} else {
-			set_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags);
-		}
-	}
-
 	fscache_stat(&fscache_n_acquires_ok);
+	_leave(" = c=%08x", cookie->debug_id);
 
 out:
 	fscache_free_cookie(candidate);
@@ -361,211 +335,80 @@ struct fscache_cookie *__fscache_acquire_cookie(
 EXPORT_SYMBOL(__fscache_acquire_cookie);
 
 /*
- * Enable a cookie to permit it to accept new operations.
+ * Start using the cookie for I/O.  This prevents the backing object from being
+ * reaped by VM pressure.
  */
-void __fscache_enable_cookie(struct fscache_cookie *cookie,
-			     const void *aux_data,
-			     loff_t object_size,
-			     bool (*can_enable)(void *data),
-			     void *data)
+void __fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify)
 {
-	_enter("%p", cookie);
+	enum fscache_cookie_stage stage;
 
-	trace_fscache_enable(cookie);
+	_enter("c=%08x", cookie->debug_id);
 
-	wait_on_bit_lock(&cookie->flags, FSCACHE_COOKIE_ENABLEMENT_LOCK,
-			 TASK_UNINTERRUPTIBLE);
-
-	cookie->object_size = object_size;
-	fscache_update_aux(cookie, aux_data);
-
-	if (test_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags))
-		goto out_unlock;
-
-	if (can_enable && !can_enable(data)) {
-		/* The netfs decided it didn't want to enable after all */
-	} else if (cookie->type != FSCACHE_COOKIE_TYPE_INDEX) {
-		/* Wait for outstanding disablement to complete */
-		__fscache_wait_on_invalidate(cookie);
-
-		if (fscache_acquire_non_index_cookie(cookie) == 0)
-			set_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags);
-	} else {
-		set_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags);
-	}
-
-out_unlock:
-	clear_bit_unlock(FSCACHE_COOKIE_ENABLEMENT_LOCK, &cookie->flags);
-	wake_up_bit(&cookie->flags, FSCACHE_COOKIE_ENABLEMENT_LOCK);
-}
-EXPORT_SYMBOL(__fscache_enable_cookie);
+	ASSERT(cookie->type != FSCACHE_COOKIE_TYPE_INDEX);
 
-/*
- * acquire a non-index cookie
- * - this must make sure the index chain is instantiated and instantiate the
- *   object representation too
- */
-static int fscache_acquire_non_index_cookie(struct fscache_cookie *cookie)
-{
-	struct fscache_object *object;
-	struct fscache_cache *cache;
-	int ret;
-
-	_enter("");
-
-	set_bit(FSCACHE_COOKIE_UNAVAILABLE, &cookie->flags);
-
-	/* now we need to see whether the backing objects for this cookie yet
-	 * exist, if not there'll be nothing to search */
-	down_read(&fscache_addremove_sem);
-
-	if (list_empty(&fscache_cache_list)) {
-		up_read(&fscache_addremove_sem);
-		_leave(" = 0 [no caches]");
-		return 0;
-	}
-
-	/* select a cache in which to store the object */
-	cache = fscache_select_cache_for_object(cookie->parent);
-	if (!cache) {
-		up_read(&fscache_addremove_sem);
-		fscache_stat(&fscache_n_acquires_no_cache);
-		_leave(" = -ENOMEDIUM [no cache]");
-		return -ENOMEDIUM;
-	}
-
-	_debug("cache %s", cache->tag->name);
-
-	set_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags);
+	spin_lock(&cookie->lock);
 
-	/* ask the cache to allocate objects for this cookie and its parent
-	 * chain */
-	ret = fscache_alloc_object(cache, cookie);
-	if (ret < 0) {
-		up_read(&fscache_addremove_sem);
-		_leave(" = %d", ret);
-		return ret;
-	}
+	atomic_inc(&cookie->n_active);
 
-	spin_lock(&cookie->lock);
-	if (hlist_empty(&cookie->backing_objects)) {
+again:
+	stage = cookie->stage;
+	switch (stage) {
+	case FSCACHE_COOKIE_STAGE_QUIESCENT:
+		cookie->stage = FSCACHE_COOKIE_STAGE_INITIALISING;
 		spin_unlock(&cookie->lock);
-		goto unavailable;
-	}
 
-	object = hlist_entry(cookie->backing_objects.first,
-			     struct fscache_object, cookie_link);
+		/* The lookup job holds its own active increment */
+		atomic_inc(&cookie->n_active);
+		fscache_dispatch(cookie, NULL, 0, fscache_lookup_object);
+		break;
 
-	/* initiate the process of looking up all the objects in the chain
-	 * (done by fscache_initialise_object()) */
-	fscache_raise_event(object, FSCACHE_OBJECT_EV_NEW_CHILD);
+	case FSCACHE_COOKIE_STAGE_INITIALISING:
+	case FSCACHE_COOKIE_STAGE_LOOKING_UP:
+		spin_unlock(&cookie->lock);
+		wait_var_event(&cookie->stage, cookie->stage != stage);
+		spin_lock(&cookie->lock);
+		goto again;
 
-	spin_unlock(&cookie->lock);
+	case FSCACHE_COOKIE_STAGE_NO_DATA_YET:
+	case FSCACHE_COOKIE_STAGE_ACTIVE:
+	case FSCACHE_COOKIE_STAGE_INVALIDATING:
+		// TODO: Handle will_modify
+		spin_unlock(&cookie->lock);
+		break;
 
-	/* we may be required to wait for lookup to complete at this point */
-	if (!fscache_defer_lookup) {
-		_debug("non-deferred lookup %p", &cookie->flags);
-		wait_on_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP,
-			    TASK_UNINTERRUPTIBLE);
-		_debug("complete");
-		if (test_bit(FSCACHE_COOKIE_UNAVAILABLE, &cookie->flags))
-			goto unavailable;
+	case FSCACHE_COOKIE_STAGE_DEAD:
+		spin_unlock(&cookie->lock);
+		kdebug("dead");
+		break;
 	}
 
-	up_read(&fscache_addremove_sem);
-	_leave(" = 0 [deferred]");
-	return 0;
-
-unavailable:
-	up_read(&fscache_addremove_sem);
-	_leave(" = -ENOBUFS");
-	return -ENOBUFS;
+	_leave("");
 }
+EXPORT_SYMBOL(__fscache_use_cookie);
 
 /*
- * recursively allocate cache object records for a cookie/cache combination
- * - caller must be holding the addremove sem
+ * Stop using the cookie for I/O.
  */
-static int fscache_alloc_object(struct fscache_cache *cache,
-				struct fscache_cookie *cookie)
+void __fscache_unuse_cookie(struct fscache_cookie *cookie,
+			    const void *aux_data, const loff_t *object_size)
 {
-	struct fscache_object *object;
-	int ret;
-
-	_enter("%p,%p{%s}", cache, cookie, cookie->type_name);
-
-	spin_lock(&cookie->lock);
-	hlist_for_each_entry(object, &cookie->backing_objects,
-			     cookie_link) {
-		if (object->cache == cache)
-			goto object_already_extant;
-	}
-	spin_unlock(&cookie->lock);
-
-	/* ask the cache to allocate an object (we may end up with duplicate
-	 * objects at this stage, but we sort that out later) */
-	fscache_stat(&fscache_n_cop_alloc_object);
-	object = cache->ops->alloc_object(cache, cookie);
-	fscache_stat_d(&fscache_n_cop_alloc_object);
-	if (IS_ERR(object)) {
-		fscache_stat(&fscache_n_object_no_alloc);
-		ret = PTR_ERR(object);
-		goto error;
-	}
-
-	ASSERTCMP(object->cookie, ==, cookie);
-	fscache_stat(&fscache_n_object_alloc);
-
-	object->debug_id = atomic_inc_return(&fscache_object_debug_id);
-
-	_debug("ALLOC OBJ%x: %s {%lx}",
-	       object->debug_id, cookie->type_name, object->events);
-
-	ret = fscache_alloc_object(cache, cookie->parent);
-	if (ret < 0)
-		goto error_put;
-
-	/* only attach if we managed to allocate all we needed, otherwise
-	 * discard the object we just allocated and instead use the one
-	 * attached to the cookie */
-	if (fscache_attach_object(cookie, object) < 0) {
-		fscache_stat(&fscache_n_cop_put_object);
-		cache->ops->put_object(object, fscache_obj_put_attach_fail);
-		fscache_stat_d(&fscache_n_cop_put_object);
-	}
-
-	_leave(" = 0");
-	return 0;
-
-object_already_extant:
-	ret = -ENOBUFS;
-	if (fscache_object_is_dying(object) ||
-	    fscache_cache_is_broken(object)) {
-		spin_unlock(&cookie->lock);
-		goto error;
-	}
-	spin_unlock(&cookie->lock);
-	_leave(" = 0 [found]");
-	return 0;
-
-error_put:
-	fscache_stat(&fscache_n_cop_put_object);
-	cache->ops->put_object(object, fscache_obj_put_alloc_fail);
-	fscache_stat_d(&fscache_n_cop_put_object);
-error:
-	_leave(" = %d", ret);
-	return ret;
+	if (aux_data || object_size)
+		__fscache_update_cookie(cookie, aux_data, object_size);
+	if (atomic_dec_and_test(&cookie->n_active))
+		wake_up_var(&cookie->n_active);
 }
+EXPORT_SYMBOL(__fscache_unuse_cookie);
 
 /*
- * attach a cache object to a cookie
+ * Attempt to attach the object to the list on the cookie or, if there's an
+ * object already attached, then that is used instead and a ref is taken on it
+ * for the caller.  Returns a pointer to whichever object is selected.
  */
-static int fscache_attach_object(struct fscache_cookie *cookie,
-				 struct fscache_object *object)
+struct fscache_object *fscache_attach_object(struct fscache_cookie *cookie,
+					     struct fscache_object *object)
 {
-	struct fscache_object *p;
+	struct fscache_object *parent, *p, *ret = NULL;
 	struct fscache_cache *cache = object->cache;
-	int ret;
 
 	_enter("{%s},{OBJ%x}", cookie->type_name, object->debug_id);
 
@@ -575,38 +418,26 @@ static int fscache_attach_object(struct fscache_cookie *cookie,
 
 	/* there may be multiple initial creations of this object, but we only
 	 * want one */
-	ret = -EEXIST;
 	hlist_for_each_entry(p, &cookie->backing_objects, cookie_link) {
-		if (p->cache == object->cache) {
-			if (fscache_object_is_dying(p))
-				ret = -ENOBUFS;
-			goto cant_attach_object;
-		}
+		if (p->cache == object->cache)
+			goto exists;
 	}
 
 	/* pin the parent object */
 	spin_lock_nested(&cookie->parent->lock, 1);
-	hlist_for_each_entry(p, &cookie->parent->backing_objects,
-			     cookie_link) {
-		if (p->cache == object->cache) {
-			if (fscache_object_is_dying(p)) {
-				ret = -ENOBUFS;
-				spin_unlock(&cookie->parent->lock);
-				goto cant_attach_object;
-			}
-			object->parent = p;
-			spin_lock(&p->lock);
-			p->n_children++;
-			spin_unlock(&p->lock);
-			break;
-		}
-	}
+
+	parent = object->parent;
+
+	spin_lock(&parent->lock);
+	parent->n_children++;
+	spin_unlock(&parent->lock);
+
 	spin_unlock(&cookie->parent->lock);
 
 	/* attach to the cache's object list */
 	if (list_empty(&object->cache_link)) {
 		spin_lock(&cache->object_list_lock);
-		list_add(&object->cache_link, &cache->object_list);
+		list_add_tail(&object->cache_link, &cache->object_list);
 		spin_unlock(&cache->object_list_lock);
 	}
 
@@ -614,12 +445,29 @@ static int fscache_attach_object(struct fscache_cookie *cookie,
 	hlist_add_head(&object->cookie_link, &cookie->backing_objects);
 
 	fscache_objlist_add(object);
-	ret = 0;
+	cookie->stage = FSCACHE_COOKIE_STAGE_LOOKING_UP;
+	wake_up_var(&cookie->stage);
 
-cant_attach_object:
+out_grab:
+	ret = cache->ops->grab_object(object, fscache_obj_get_attach);
 	spin_unlock(&cookie->lock);
-	_leave(" = %d", ret);
+	_leave(" = %p", ret);
 	return ret;
+
+exists:
+	object = p;
+	goto out_grab;
+}
+
+void fscache_set_cookie_stage(struct fscache_cookie *cookie,
+			      enum fscache_cookie_stage stage)
+{
+	if (cookie->stage != stage) {
+		spin_lock(&cookie->lock);
+		cookie->stage = stage;
+		spin_unlock(&cookie->lock);
+		wake_up_var(&cookie->stage);
+	}
 }
 
 /*
@@ -627,8 +475,6 @@ static int fscache_attach_object(struct fscache_cookie *cookie,
  */
 void __fscache_invalidate(struct fscache_cookie *cookie)
 {
-	struct fscache_object *object;
-
 	_enter("{%s}", cookie->type_name);
 
 	fscache_stat(&fscache_n_invalidates);
@@ -640,26 +486,9 @@ void __fscache_invalidate(struct fscache_cookie *cookie)
 	 */
 	ASSERTCMP(cookie->type, ==, FSCACHE_COOKIE_TYPE_DATAFILE);
 
-	/* If there's an object, we tell the object state machine to handle the
-	 * invalidation on our behalf, otherwise there's nothing to do.
-	 */
-	if (!hlist_empty(&cookie->backing_objects)) {
-		spin_lock(&cookie->lock);
-
-		if (fscache_cookie_enabled(cookie) &&
-		    !hlist_empty(&cookie->backing_objects) &&
-		    !test_and_set_bit(FSCACHE_COOKIE_INVALIDATING,
-				      &cookie->flags)) {
-			object = hlist_entry(cookie->backing_objects.first,
-					     struct fscache_object,
-					     cookie_link);
-			/* TODO: Do invalidation */
-		}
-
-		spin_unlock(&cookie->lock);
-	}
-
-	_leave("");
+	if (!hlist_empty(&cookie->backing_objects) &&
+	    test_and_set_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags))
+		fscache_dispatch(cookie, NULL, 0, fscache_invalidate_object);
 }
 EXPORT_SYMBOL(__fscache_invalidate);
 
@@ -678,9 +507,10 @@ void __fscache_wait_on_invalidate(struct fscache_cookie *cookie)
 EXPORT_SYMBOL(__fscache_wait_on_invalidate);
 
 /*
- * update the index entries backing a cookie
+ * Update the index entries backing a cookie.  The writeback is done lazily.
  */
-void __fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data)
+void __fscache_update_cookie(struct fscache_cookie *cookie,
+			     const void *aux_data, const loff_t *object_size)
 {
 	struct fscache_object *object;
 
@@ -696,16 +526,9 @@ void __fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data
 
 	spin_lock(&cookie->lock);
 
-	fscache_update_aux(cookie, aux_data);
-
-	if (fscache_cookie_enabled(cookie)) {
-		/* update the index entry on disk in each cache backing this
-		 * cookie.
-		 */
-		hlist_for_each_entry(object,
-				     &cookie->backing_objects, cookie_link) {
-			fscache_raise_event(object, FSCACHE_OBJECT_EV_UPDATE);
-		}
+	fscache_update_aux(cookie, aux_data, object_size);
+	hlist_for_each_entry(object, &cookie->backing_objects, cookie_link) {
+		set_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->flags);
 	}
 
 	spin_unlock(&cookie->lock);
@@ -713,90 +536,13 @@ void __fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data
 }
 EXPORT_SYMBOL(__fscache_update_cookie);
 
-/*
- * Disable a cookie to stop it from accepting new requests from the netfs.
- */
-void __fscache_disable_cookie(struct fscache_cookie *cookie,
-			      const void *aux_data,
-			      bool invalidate)
-{
-	struct fscache_object *object;
-	bool awaken = false;
-
-	_enter("%p,%u", cookie, invalidate);
-
-	trace_fscache_disable(cookie);
-
-	ASSERTCMP(atomic_read(&cookie->n_active), >, 0);
-
-	if (atomic_read(&cookie->n_children) != 0) {
-		pr_err("Cookie '%s' still has children\n",
-		       cookie->type_name);
-		BUG();
-	}
-
-	wait_on_bit_lock(&cookie->flags, FSCACHE_COOKIE_ENABLEMENT_LOCK,
-			 TASK_UNINTERRUPTIBLE);
-
-	fscache_update_aux(cookie, aux_data);
-
-	if (!test_and_clear_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags))
-		goto out_unlock_enable;
-
-	/* If the cookie is being invalidated, wait for that to complete first
-	 * so that we can reuse the flag.
-	 */
-	__fscache_wait_on_invalidate(cookie);
-
-	/* Dispose of the backing objects */
-	set_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags);
-
-	spin_lock(&cookie->lock);
-	if (!hlist_empty(&cookie->backing_objects)) {
-		hlist_for_each_entry(object, &cookie->backing_objects, cookie_link) {
-			if (invalidate)
-				set_bit(FSCACHE_OBJECT_RETIRED, &object->flags);
-			fscache_raise_event(object, FSCACHE_OBJECT_EV_KILL);
-		}
-	} else {
-		if (test_and_clear_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags))
-			awaken = true;
-	}
-	spin_unlock(&cookie->lock);
-	if (awaken)
-		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING);
-
-	/* Wait for cessation of activity requiring access to the netfs (when
-	 * n_active reaches 0).  This makes sure outstanding reads and writes
-	 * have completed.
-	 */
-	if (!atomic_dec_and_test(&cookie->n_active)) {
-		wait_var_event(&cookie->n_active,
-			       !atomic_read(&cookie->n_active));
-	}
-
-	/* Reset the cookie state if it wasn't relinquished */
-	if (!test_bit(FSCACHE_COOKIE_RELINQUISHED, &cookie->flags)) {
-		atomic_inc(&cookie->n_active);
-		set_bit(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags);
-	}
-
-out_unlock_enable:
-	clear_bit_unlock(FSCACHE_COOKIE_ENABLEMENT_LOCK, &cookie->flags);
-	wake_up_bit(&cookie->flags, FSCACHE_COOKIE_ENABLEMENT_LOCK);
-	_leave("");
-}
-EXPORT_SYMBOL(__fscache_disable_cookie);
-
 /*
  * release a cookie back to the cache
  * - the object will be marked as recyclable on disk if retire is true
  * - all dependents of this cookie must have already been unregistered
  *   (indices/files/pages)
  */
-void __fscache_relinquish_cookie(struct fscache_cookie *cookie,
-				 const void *aux_data,
-				 bool retire)
+void __fscache_relinquish_cookie(struct fscache_cookie *cookie, bool retire)
 {
 	fscache_stat(&fscache_n_relinquishes);
 	if (retire)
@@ -818,7 +564,23 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie,
 	if (test_and_set_bit(FSCACHE_COOKIE_RELINQUISHED, &cookie->flags))
 		BUG();
 
-	__fscache_disable_cookie(cookie, aux_data, retire);
+	/* Mark the cookie out of service and wait for cessation of I/O. */
+	spin_lock(&cookie->lock);
+	cookie->stage = FSCACHE_COOKIE_STAGE_DEAD;
+	atomic_dec(&cookie->n_ops);
+	spin_unlock(&cookie->lock);
+
+	wait_var_event(&cookie->n_ops, atomic_read(&cookie->n_ops) == 0);
+
+	/* Dispose of the backing objects */
+	if (!hlist_empty(&cookie->backing_objects))
+		fscache_dispatch(cookie, NULL, retire, fscache_discard_objects);
+
+	if (atomic_read(&cookie->n_children) != 0) {
+		pr_err("Cookie '%s' still has children\n",
+		       cookie->type_name);
+		BUG();
+	}
 
 	if (cookie->parent) {
 		ASSERTCMP(atomic_read(&cookie->parent->usage), >, 0);
@@ -879,6 +641,7 @@ void fscache_cookie_put(struct fscache_cookie *cookie,
 
 	_leave("");
 }
+EXPORT_SYMBOL(fscache_cookie_put);
 
 /*
  * Generate a list of extant cookies in /proc/fs/fscache/cookies
@@ -892,8 +655,8 @@ static int fscache_cookies_seq_show(struct seq_file *m, void *v)
 
 	if (v == &fscache_cookies) {
 		seq_puts(m,
-			 "COOKIE   PARENT   USAGE CHILD ACT TY FL  DEF             \n"
-			 "======== ======== ===== ===== === == === ================\n"
+			 "COOKIE   PARENT   USAGE CHILD ACT OPS TY S FL  DEF             \n"
+			 "======== ======== ===== ===== === === == = === ================\n"
 			 );
 		return 0;
 	}
@@ -915,13 +678,15 @@ static int fscache_cookies_seq_show(struct seq_file *m, void *v)
 	}
 
 	seq_printf(m,
-		   "%08x %08x %5u %5u %3u %s %03lx %-16s",
+		   "%08x %08x %5d %5d %3d %3d %s %u %03lx %-16s",
 		   cookie->debug_id,
 		   cookie->parent ? cookie->parent->debug_id : 0,
 		   atomic_read(&cookie->usage),
 		   atomic_read(&cookie->n_children),
 		   atomic_read(&cookie->n_active),
+		   atomic_read(&cookie->n_ops) - 1,
 		   type,
+		   cookie->stage,
 		   cookie->flags,
 		   cookie->type_name);
 
diff --git a/fs/fscache/fsdef.c b/fs/fscache/fsdef.c
index b802cbddb578..377151a6a54a 100644
--- a/fs/fscache/fsdef.c
+++ b/fs/fscache/fsdef.c
@@ -40,7 +40,6 @@ struct fscache_cookie fscache_fsdef_index = {
 	.lock		= __SPIN_LOCK_UNLOCKED(fscache_fsdef_index.lock),
 	.backing_objects = HLIST_HEAD_INIT,
 	.type_name	= ".fscach",
-	.flags		= 1 << FSCACHE_COOKIE_ENABLED,
 	.type		= FSCACHE_COOKIE_TYPE_INDEX,
 };
 EXPORT_SYMBOL(fscache_fsdef_index);
diff --git a/fs/fscache/histogram.c b/fs/fscache/histogram.c
index 4e5beeaaf454..4fd5dda1d0ad 100644
--- a/fs/fscache/histogram.c
+++ b/fs/fscache/histogram.c
@@ -5,7 +5,7 @@
  * Written by David Howells (dhowells@redhat.com)
  */
 
-#define FSCACHE_DEBUG_LEVEL THREAD
+#define FSCACHE_DEBUG_LEVEL CACHE
 #include <linux/module.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 2100e2222884..443671310e31 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -74,6 +74,10 @@ extern struct fscache_cookie *fscache_alloc_cookie(struct fscache_cookie *,
 extern struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *);
 extern void fscache_cookie_put(struct fscache_cookie *,
 			       enum fscache_cookie_trace);
+extern struct fscache_object *fscache_attach_object(struct fscache_cookie *,
+						    struct fscache_object *);
+extern void fscache_set_cookie_stage(struct fscache_cookie *,
+				     enum fscache_cookie_stage);
 
 /*
  * dispatcher.c
@@ -119,19 +123,15 @@ extern unsigned fscache_defer_lookup;
 extern unsigned fscache_defer_create;
 extern unsigned fscache_debug;
 extern struct kobject *fscache_root;
-extern struct workqueue_struct *fscache_object_wq;
 extern struct workqueue_struct *fscache_op_wq;
-DECLARE_PER_CPU(wait_queue_head_t, fscache_object_cong_wait);
-
-static inline bool fscache_object_congested(void)
-{
-	return workqueue_congested(WORK_CPU_UNBOUND, fscache_object_wq);
-}
 
 /*
- * object.c
+ * obj.c
  */
-extern void fscache_enqueue_object(struct fscache_object *);
+extern void fscache_lookup_object(struct fscache_cookie *, struct fscache_object *, int);
+extern void fscache_invalidate_object(struct fscache_cookie *, struct fscache_object *, int);
+extern void fscache_drop_object(struct fscache_cookie *, struct fscache_object *, bool);
+extern void fscache_discard_objects(struct fscache_cookie *, struct fscache_object *, int);
 
 /*
  * object-list.c
@@ -239,44 +239,29 @@ int fscache_stats_show(struct seq_file *m, void *v);
 #define fscache_stat_d(stat) do {} while (0)
 #endif
 
-/*
- * raise an event on an object
- * - if the event is not masked for that object, then the object is
- *   queued for attention by the thread pool.
- */
-static inline void fscache_raise_event(struct fscache_object *object,
-				       unsigned event)
-{
-	BUG_ON(event >= NR_FSCACHE_OBJECT_EVENTS);
-#if 0
-	printk("*** fscache_raise_event(OBJ%d{%lx},%x)\n",
-	       object->debug_id, object->event_mask, (1 << event));
-#endif
-	if (!test_and_set_bit(event, &object->events) &&
-	    test_bit(event, &object->event_mask))
-		fscache_enqueue_object(object);
-}
-
-static inline void fscache_cookie_get(struct fscache_cookie *cookie,
-				      enum fscache_cookie_trace where)
+static inline
+struct fscache_cookie *fscache_cookie_get(struct fscache_cookie *cookie,
+					  enum fscache_cookie_trace where)
 {
 	int usage = atomic_inc_return(&cookie->usage);
 
 	trace_fscache_cookie(cookie, where, usage);
+	return cookie;
 }
 
 /*
  * Update the auxiliary data on a cookie.
  */
 static inline
-void fscache_update_aux(struct fscache_cookie *cookie, const void *aux_data)
+void fscache_update_aux(struct fscache_cookie *cookie,
+			const void *aux_data, const loff_t *object_size)
 {
 	void *p = fscache_get_aux(cookie);
 
-	if (p && memcmp(p, aux_data, cookie->aux_len) != 0) {
+	if (aux_data && p)
 		memcpy(p, aux_data, cookie->aux_len);
-		set_bit(FSCACHE_COOKIE_AUX_UPDATED, &cookie->flags);
-	}
+	if (object_size)
+		cookie->object_size = *object_size;
 }
 
 /*****************************************************************************/
@@ -337,7 +322,7 @@ do {						\
 
 #define FSCACHE_DEBUG_CACHE	0
 #define FSCACHE_DEBUG_COOKIE	1
-#define FSCACHE_DEBUG_PAGE	2
+#define FSCACHE_DEBUG_OBJECT	2
 #define FSCACHE_DEBUG_OPERATION	3
 
 #define FSCACHE_POINT_ENTER	1
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index c8f1beafa8e1..b2691439377b 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -38,11 +38,8 @@ MODULE_PARM_DESC(fscache_debug,
 		 "FS-Cache debugging mask");
 
 struct kobject *fscache_root;
-struct workqueue_struct *fscache_object_wq;
 struct workqueue_struct *fscache_op_wq;
 
-DEFINE_PER_CPU(wait_queue_head_t, fscache_object_cong_wait);
-
 /* these values serve as lower bounds, will be adjusted in fscache_init() */
 static unsigned fscache_object_max_active = 4;
 static unsigned fscache_op_max_active = 2;
@@ -64,14 +61,6 @@ static int fscache_max_active_sysctl(struct ctl_table *table, int write,
 }
 
 static struct ctl_table fscache_sysctls[] = {
-	{
-		.procname	= "object_max_active",
-		.data		= &fscache_object_max_active,
-		.maxlen		= sizeof(unsigned),
-		.mode		= 0644,
-		.proc_handler	= fscache_max_active_sysctl,
-		.extra1		= &fscache_object_wq,
-	},
 	{
 		.procname	= "operation_max_active",
 		.data		= &fscache_op_max_active,
@@ -98,20 +87,8 @@ static struct ctl_table fscache_sysctls_root[] = {
  */
 static int __init fscache_init(void)
 {
-	unsigned int nr_cpus = num_possible_cpus();
-	unsigned int cpu;
 	int ret;
 
-	fscache_object_max_active =
-		clamp_val(nr_cpus,
-			  fscache_object_max_active, WQ_UNBOUND_MAX_ACTIVE);
-
-	ret = -ENOMEM;
-	fscache_object_wq = alloc_workqueue("fscache_object", WQ_UNBOUND,
-					    fscache_object_max_active);
-	if (!fscache_object_wq)
-		goto error_object_wq;
-
 	fscache_op_max_active =
 		clamp_val(fscache_object_max_active / 2,
 			  fscache_op_max_active, WQ_UNBOUND_MAX_ACTIVE);
@@ -122,9 +99,6 @@ static int __init fscache_init(void)
 	if (!fscache_op_wq)
 		goto error_op_wq;
 
-	for_each_possible_cpu(cpu)
-		init_waitqueue_head(&per_cpu(fscache_object_cong_wait, cpu));
-
 	ret = fscache_init_dispatchers();
 	if (ret < 0)
 		goto error_dispatchers;
@@ -169,8 +143,6 @@ static int __init fscache_init(void)
 error_proc:
 	destroy_workqueue(fscache_op_wq);
 error_op_wq:
-	destroy_workqueue(fscache_object_wq);
-error_object_wq:
 	return ret;
 }
 
@@ -191,7 +163,6 @@ static void __exit fscache_exit(void)
 	fscache_proc_cleanup();
 	fscache_kill_dispatchers();
 	destroy_workqueue(fscache_op_wq);
-	destroy_workqueue(fscache_object_wq);
 	pr_notice("Unloaded\n");
 }
 
diff --git a/fs/fscache/netfs.c b/fs/fscache/netfs.c
index f8a816f844f6..7ebf733b068b 100644
--- a/fs/fscache/netfs.c
+++ b/fs/fscache/netfs.c
@@ -32,7 +32,7 @@ int __fscache_register_netfs(struct fscache_netfs *netfs)
 		return -ENOMEM;
 	}
 
-	candidate->flags = 1 << FSCACHE_COOKIE_ENABLED;
+	trace_fscache_cookie(candidate, fscache_cookie_new_netfs, 1);
 
 	/* check the netfs type is not already present */
 	cookie = fscache_hash_cookie(candidate);
@@ -68,7 +68,7 @@ void __fscache_unregister_netfs(struct fscache_netfs *netfs)
 {
 	_enter("{%s.%u}", netfs->name, netfs->version);
 
-	fscache_relinquish_cookie(netfs->primary_index, NULL, false);
+	fscache_relinquish_cookie(netfs->primary_index, false);
 	pr_notice("Netfs '%s' unregistered from caching\n", netfs->name);
 
 	_leave("");
diff --git a/fs/fscache/obj.c b/fs/fscache/obj.c
new file mode 100644
index 000000000000..63373b99ac34
--- /dev/null
+++ b/fs/fscache/obj.c
@@ -0,0 +1,352 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Cache object management
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for more information on
+ * the netfs API.
+ */
+
+#define FSCACHE_DEBUG_LEVEL OPERATION
+#include <linux/module.h>
+#include <linux/slab.h>
+#include "internal.h"
+
+static struct fscache_object *fscache_do_alloc_object(struct fscache_cookie *cookie,
+						      struct fscache_cache *cache,
+						      struct fscache_object *parent)
+{
+	struct fscache_object *object;
+
+	fscache_stat(&fscache_n_cop_alloc_object);
+	object = cache->ops->alloc_object(cookie, cache, parent);
+	fscache_stat_d(&fscache_n_cop_alloc_object);
+
+	if (!IS_ERR(object)) {
+		ASSERTCMP(object->cookie, ==, cookie);
+		_debug("ALLOC OBJ%x: %s", object->debug_id, cookie->type_name);
+		fscache_stat(&fscache_n_object_alloc);
+	} else {
+		fscache_stat(&fscache_n_object_no_alloc);
+	}
+
+	return object;
+}
+
+static int fscache_do_lookup_object(struct fscache_object *object, void *data)
+{
+	int ret;
+	fscache_stat(&fscache_n_object_lookups);
+	fscache_stat(&fscache_n_cop_lookup_object);
+	ret = object->cache->ops->lookup_object(object, data);
+	fscache_stat_d(&fscache_n_cop_lookup_object);
+	return ret;
+}
+
+static int fscache_do_create_object(struct fscache_object *object, void *data)
+{
+	int ret;
+	fscache_stat(&fscache_n_object_lookups);
+	fscache_stat(&fscache_n_cop_lookup_object);
+	ret = object->cache->ops->create_object(object, data);
+	fscache_stat_d(&fscache_n_cop_lookup_object);
+	return ret;
+}
+
+static void fscache_do_update_object(struct fscache_object *object)
+{
+	fscache_stat(&fscache_n_updates_run);
+	fscache_stat(&fscache_n_cop_update_object);
+	object->cache->ops->update_object(object);
+	fscache_stat_d(&fscache_n_cop_update_object);
+}
+
+static void fscache_do_drop_object(struct fscache_cache *cache,
+				   struct fscache_object *object,
+				   bool invalidate)
+{
+	fscache_stat(&fscache_n_cop_drop_object);
+	cache->ops->drop_object(object, invalidate);
+	fscache_stat_d(&fscache_n_cop_drop_object);
+}
+
+static void fscache_do_put_object(struct fscache_object *object,
+				  enum fscache_obj_ref_trace why)
+{
+	fscache_stat(&fscache_n_cop_put_object);
+	object->cache->ops->put_object(object, why);
+	fscache_stat_d(&fscache_n_cop_put_object);
+}
+
+/*
+ * Do the actual on-disk wrangling involved in object lookup/creation.
+ */
+static bool fscache_wrangle_object(struct fscache_cookie *cookie,
+				   struct fscache_cache *cache,
+				   struct fscache_object *object)
+{
+
+	void *lookup_data;
+	bool ret = false;
+
+	lookup_data = cache->ops->prepare_lookup_data(object);
+	if (IS_ERR(lookup_data))
+		goto out;
+
+	if (!fscache_do_lookup_object(object, lookup_data))
+		goto out_free;
+
+	if (object->stage < FSCACHE_OBJECT_STAGE_LIVE_EMPTY &&
+	    !fscache_do_create_object(object, lookup_data))
+		goto out_free;
+
+	fscache_set_cookie_stage(cookie,
+				 (object->stage < FSCACHE_OBJECT_STAGE_LIVE ?
+				  FSCACHE_COOKIE_STAGE_NO_DATA_YET :
+				  FSCACHE_COOKIE_STAGE_ACTIVE));
+	ret = true;
+
+out_free:
+	cache->ops->free_lookup_data(object, lookup_data);
+out:
+	return ret;
+}
+
+/*
+ * Create an object chain, making sure that the index chain is fully created.
+ */
+static struct fscache_object *fscache_lookup_object_chain(struct fscache_cookie *cookie,
+							  struct fscache_cache *cache)
+{
+	struct fscache_object *object = NULL, *parent, *xobject;
+
+	_enter("c=%08x", cookie->debug_id);
+
+	spin_lock(&cookie->lock);
+	hlist_for_each_entry(object, &cookie->backing_objects, cookie_link) {
+		if (object->cache == cache)
+			goto object_exists_grab;
+	}
+	spin_unlock(&cookie->lock);
+
+	/* Recurse to look up/create the parent index. */
+	parent = fscache_lookup_object_chain(cookie->parent, cache);
+	if (IS_ERR(parent))
+		goto error;
+
+	/* Ask the cache to allocate an object (we may end up with duplicate
+	 * objects at this stage, but we sort that out later).
+	 *
+	 * The object may be created, say, with O_TMPFILE at this point if the
+	 * parent index was unpopulated.  Note that this may race on index
+	 * creation with other callers.
+	 */
+	object = fscache_do_alloc_object(cookie, cache, parent);
+	if (!object)
+		goto error;
+
+	xobject = fscache_attach_object(cookie, object);
+	if (xobject != object) {
+		fscache_do_put_object(object, fscache_obj_put_alloc_dup);
+		object = xobject;
+		if (fscache_cache_is_broken(object))
+			goto error_put;
+		goto object_exists;
+	}
+
+	if (!fscache_wrangle_object(cookie, cache, object))
+		goto error_detach;
+
+	_leave(" = %u", object->debug_id);
+	return object;
+
+object_exists_grab:
+	object = cache->ops->grab_object(object, fscache_obj_get_exists);
+	if (fscache_cache_is_broken(object)) {
+		spin_unlock(&cookie->lock);
+		goto error_put;
+	}
+
+	spin_unlock(&cookie->lock);
+
+object_exists:
+	wait_var_event(&object->stage,
+		       object->stage >= FSCACHE_OBJECT_STAGE_LIVE_EMPTY);
+
+	if (object->stage >= FSCACHE_OBJECT_STAGE_DESTROYING)
+		goto error_put;
+
+	_leave(" = %u [found]", object->debug_id);
+	return object;
+
+error_detach:
+	spin_lock(&cookie->lock);
+	spin_lock(&object->lock);
+	object->parent = NULL;
+	object->stage = FSCACHE_OBJECT_STAGE_DEAD;
+	cookie->stage = FSCACHE_COOKIE_STAGE_DEAD;
+	hlist_del_init(&object->cookie_link);
+	spin_unlock(&object->lock);
+	spin_unlock(&cookie->lock);
+	fscache_drop_object(cookie, object, false);
+error_put:
+	fscache_do_put_object(object, fscache_obj_put_lookup_fail);
+error:
+	fscache_set_cookie_stage(cookie, FSCACHE_COOKIE_STAGE_QUIESCENT);
+	_leave(" = NULL");
+	return NULL;
+}
+
+/*
+ * Create an object in the cache.
+ * - this must make sure the index chain is instantiated and instantiate the
+ *   object representation too
+ */
+static void fscache_lookup_object_locked(struct fscache_cookie *cookie)
+{
+	struct fscache_object *object;
+	struct fscache_cache *cache;
+
+	_enter("");
+
+	/* select a cache in which to store the object */
+	cache = fscache_select_cache_for_object(cookie);
+	if (!cache) {
+		fscache_stat(&fscache_n_acquires_no_cache);
+		fscache_set_cookie_stage(cookie, FSCACHE_COOKIE_STAGE_QUIESCENT);
+		_leave(" [no cache]");
+		return;
+	}
+
+	_debug("cache %s", cache->tag->name);
+
+	object = fscache_lookup_object_chain(cookie, cache);
+	if (!object) {
+		_leave(" [fail]");
+		return;
+	}
+
+	fscache_do_put_object(object, fscache_obj_put);
+	_leave(" [done]");
+}
+
+void fscache_lookup_object(struct fscache_cookie *cookie,
+			   struct fscache_object *object, int param)
+{
+	down_read(&fscache_addremove_sem);
+	fscache_lookup_object_locked(cookie);
+	up_read(&fscache_addremove_sem);
+	__fscache_unuse_cookie(cookie, NULL, NULL);
+}
+
+/*
+ * Invalidate an object
+ */
+void fscache_invalidate_object(struct fscache_cookie *cookie,
+			       struct fscache_object *unused, int param)
+{
+	struct fscache_object *object = NULL;
+
+	spin_lock(&cookie->lock);
+
+	if (!hlist_empty(&cookie->backing_objects)) {
+		object = hlist_entry(cookie->backing_objects.first,
+				     struct fscache_object,
+				     cookie_link);
+		object = object->cache->ops->grab_object(object,
+							 fscache_obj_get_inval);
+	}
+
+	spin_unlock(&cookie->lock);
+
+	if (object) {
+		object->cache->ops->invalidate_object(object);
+		fscache_do_put_object(object, fscache_obj_put_inval);
+	}
+
+	clear_bit_unlock(FSCACHE_COOKIE_INVALIDATING, &cookie->flags);
+	wake_up_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING);
+}
+
+/*
+ * Drop an object's attachments
+ */
+void fscache_drop_object(struct fscache_cookie *cookie,
+			 struct fscache_object *object,
+			 bool invalidate)
+{
+	struct fscache_object *parent = object->parent;
+	struct fscache_cache *cache = object->cache;
+
+	ASSERTCMP(cookie->stage, ==, FSCACHE_COOKIE_STAGE_DEAD);
+
+	_enter("{OBJ%x,%d},%u",
+	       object->debug_id, object->n_children, invalidate);
+
+	if (!invalidate &&
+	    test_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->flags)) {
+		_debug("final update");
+		fscache_do_update_object(object);
+	}
+
+	spin_lock(&cache->object_list_lock);
+	list_del_init(&object->cache_link);
+	spin_unlock(&cache->object_list_lock);
+
+	fscache_do_drop_object(cache, object, invalidate);
+
+	/* The parent object wants to know when all it dependents have gone */
+	if (parent) {
+		_debug("release parent OBJ%x {%d}",
+		       parent->debug_id, parent->n_children);
+
+		spin_lock(&parent->lock);
+		parent->n_children--;
+		if (parent->n_children == 0)
+			wake_up_var(&parent->n_children);
+		spin_unlock(&parent->lock);
+		fscache_do_put_object(parent, fscache_obj_put_drop_child);
+	}
+
+	fscache_do_put_object(object, fscache_obj_put_drop_obj);
+	fscache_stat(&fscache_n_object_dead);
+	_leave("");
+}
+
+/*
+ * Discard objects.  param==1 to invalidate it at the same time.
+ */
+void fscache_discard_objects(struct fscache_cookie *cookie,
+			     struct fscache_object *unused, int param)
+{
+	_enter("c=%08x", cookie->debug_id);
+
+	/* Take the cache lock to prevent a fight with cache withdrawal */
+	down_read(&fscache_addremove_sem);
+
+	for (;;) {
+		struct fscache_object *object = NULL;
+
+		trace_fscache_cookie(cookie, fscache_cookie_see_discard,
+				     atomic_read(&cookie->usage));
+
+		spin_lock(&cookie->lock);
+		if (!hlist_empty(&cookie->backing_objects)) {
+			object = hlist_entry(cookie->backing_objects.first,
+					     struct fscache_object,
+					     cookie_link);
+			hlist_del_init(&object->cookie_link);
+		}
+		spin_unlock(&cookie->lock);
+
+		if (!object)
+			break;
+
+		_debug("DISCARD %x", object->debug_id);
+		fscache_drop_object(cookie, object, param);
+	}
+
+	up_read(&fscache_addremove_sem);
+	_leave("");
+}
diff --git a/fs/fscache/object-list.c b/fs/fscache/object-list.c
index 147c556ee01b..5777f909d31a 100644
--- a/fs/fscache/object-list.c
+++ b/fs/fscache/object-list.c
@@ -23,10 +23,6 @@ struct fscache_objlist_data {
 #define FSCACHE_OBJLIST_CONFIG_AUX	0x00000002	/* show object auxdata */
 #define FSCACHE_OBJLIST_CONFIG_BUSY	0x00000010	/* show busy objects */
 #define FSCACHE_OBJLIST_CONFIG_IDLE	0x00000020	/* show idle objects */
-#define FSCACHE_OBJLIST_CONFIG_EVENTS	0x00000400	/* show objects with events */
-#define FSCACHE_OBJLIST_CONFIG_NOEVENTS	0x00000800	/* show objects without no events */
-#define FSCACHE_OBJLIST_CONFIG_WORK	0x00001000	/* show objects with work */
-#define FSCACHE_OBJLIST_CONFIG_NOWORK	0x00002000	/* show objects without work */
 };
 
 /*
@@ -162,32 +158,15 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 	u8 *p;
 
 	if ((unsigned long) v == 1) {
-		seq_puts(m, "OBJECT   PARENT   STAT CHLDN OPS OOP"
-			 " EM EV FL S"
-			 " | COOKIE   TYPE    TY FL");
-		if (config & (FSCACHE_OBJLIST_CONFIG_KEY |
-			      FSCACHE_OBJLIST_CONFIG_AUX))
-			seq_puts(m, "       ");
-		if (config & FSCACHE_OBJLIST_CONFIG_KEY)
-			seq_puts(m, "OBJECT_KEY");
-		if ((config & (FSCACHE_OBJLIST_CONFIG_KEY |
-			       FSCACHE_OBJLIST_CONFIG_AUX)) ==
-		    (FSCACHE_OBJLIST_CONFIG_KEY | FSCACHE_OBJLIST_CONFIG_AUX))
-			seq_puts(m, ", ");
-		if (config & FSCACHE_OBJLIST_CONFIG_AUX)
-			seq_puts(m, "AUX_DATA");
+		seq_puts(m, "OBJECT   PARENT   USE CHLDN OPS FL  S"
+			 " | COOKIE   TYPE    TY S FLG CONTENT_MAP");
 		seq_puts(m, "\n");
 		return 0;
 	}
 
 	if ((unsigned long) v == 2) {
-		seq_puts(m, "======== ======== ==== ===== === ==="
-			 " == == == ="
-			 " | ======== ======= == ===");
-		if (config & (FSCACHE_OBJLIST_CONFIG_KEY |
-			      FSCACHE_OBJLIST_CONFIG_AUX))
-			seq_puts(m, " ================");
-		seq_puts(m, "\n");
+		seq_puts(m, "======== ======== === ===== === === ="
+			 " | ======== ======= == = === ================\n");
 		return 0;
 	}
 
@@ -207,29 +186,19 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 
 	cookie = obj->cookie;
 	if (~config) {
-		FILTER(fscache_object_is_active(obj) ||
-		       obj->n_ops != 0 ||
-		       obj->n_obj_ops != 0 ||
-		       obj->flags ||
-		       !list_empty(&obj->dependents),
+		FILTER(atomic_read(&cookie->n_ops) > 0,
 		       BUSY, IDLE);
-		FILTER(obj->events & obj->event_mask,
-		       EVENTS, NOEVENTS);
-		FILTER(work_busy(&obj->work), WORK, NOWORK);
 	}
 
 	seq_printf(m,
-		   "%08x %08x %s %5u %3u %3u %2lx %2lx %2lx %1x | ",
+		   "%08x %08x %3u %5u %3u %3lx %u | ",
 		   obj->debug_id,
 		   obj->parent ? obj->parent->debug_id : UINT_MAX,
-		   obj->state->short_name,
+		   obj->cache->ops->get_object_usage(obj),
 		   obj->n_children,
-		   obj->n_ops,
-		   obj->n_obj_ops,
-		   obj->event_mask,
-		   obj->events,
+		   atomic_read(&obj->cookie->n_ops),
 		   obj->flags,
-		   work_busy(&obj->work));
+		   obj->stage);
 
 	if (obj->cookie) {
 		uint16_t keylen = 0, auxlen = 0;
@@ -248,10 +217,11 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 			break;
 		}
 
-		seq_printf(m, "%08x %-7s %s %3lx",
+		seq_printf(m, "%08x %-7s %s %u %3lx",
 			   cookie->debug_id,
 			   cookie->type_name,
 			   type,
+			   cookie->stage,
 			   cookie->flags);
 
 		if (config & FSCACHE_OBJLIST_CONFIG_KEY)
@@ -325,8 +295,6 @@ static void fscache_objlist_config(struct fscache_objlist_data *data)
 		case 'A': config |= FSCACHE_OBJLIST_CONFIG_AUX;		break;
 		case 'B': config |= FSCACHE_OBJLIST_CONFIG_BUSY;	break;
 		case 'b': config |= FSCACHE_OBJLIST_CONFIG_IDLE;	break;
-		case 'S': config |= FSCACHE_OBJLIST_CONFIG_WORK;	break;
-		case 's': config |= FSCACHE_OBJLIST_CONFIG_NOWORK;	break;
 		}
 	}
 
@@ -335,10 +303,6 @@ static void fscache_objlist_config(struct fscache_objlist_data *data)
 
 	if (!(config & (FSCACHE_OBJLIST_CONFIG_BUSY | FSCACHE_OBJLIST_CONFIG_IDLE)))
 	    config   |= FSCACHE_OBJLIST_CONFIG_BUSY | FSCACHE_OBJLIST_CONFIG_IDLE;
-	if (!(config & (FSCACHE_OBJLIST_CONFIG_EVENTS | FSCACHE_OBJLIST_CONFIG_NOEVENTS)))
-	    config   |= FSCACHE_OBJLIST_CONFIG_EVENTS | FSCACHE_OBJLIST_CONFIG_NOEVENTS;
-	if (!(config & (FSCACHE_OBJLIST_CONFIG_WORK | FSCACHE_OBJLIST_CONFIG_NOWORK)))
-	    config   |= FSCACHE_OBJLIST_CONFIG_WORK | FSCACHE_OBJLIST_CONFIG_NOWORK;
 
 	data->config = config;
 	return;
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
deleted file mode 100644
index 5d50976bf379..000000000000
--- a/fs/fscache/object.c
+++ /dev/null
@@ -1,982 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/* FS-Cache object state machine handler
- *
- * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- *
- * See Documentation/filesystems/caching/object.rst for a description of the
- * object state machine and the in-kernel representations.
- */
-
-#define FSCACHE_DEBUG_LEVEL COOKIE
-#include <linux/module.h>
-#include <linux/slab.h>
-#include <linux/prefetch.h>
-#include "internal.h"
-
-static const struct fscache_state *fscache_abort_initialisation(struct fscache_object *, int);
-static const struct fscache_state *fscache_kill_dependents(struct fscache_object *, int);
-static const struct fscache_state *fscache_drop_object(struct fscache_object *, int);
-static const struct fscache_state *fscache_initialise_object(struct fscache_object *, int);
-static const struct fscache_state *fscache_invalidate_object(struct fscache_object *, int);
-static const struct fscache_state *fscache_jumpstart_dependents(struct fscache_object *, int);
-static const struct fscache_state *fscache_kill_object(struct fscache_object *, int);
-static const struct fscache_state *fscache_lookup_failure(struct fscache_object *, int);
-static const struct fscache_state *fscache_look_up_object(struct fscache_object *, int);
-static const struct fscache_state *fscache_object_available(struct fscache_object *, int);
-static const struct fscache_state *fscache_parent_ready(struct fscache_object *, int);
-static const struct fscache_state *fscache_update_object(struct fscache_object *, int);
-static const struct fscache_state *fscache_object_dead(struct fscache_object *, int);
-
-#define __STATE_NAME(n) fscache_osm_##n
-#define STATE(n) (&__STATE_NAME(n))
-
-/*
- * Define a work state.  Work states are execution states.  No event processing
- * is performed by them.  The function attached to a work state returns a
- * pointer indicating the next state to which the state machine should
- * transition.  Returning NO_TRANSIT repeats the current state, but goes back
- * to the scheduler first.
- */
-#define WORK_STATE(n, sn, f) \
-	const struct fscache_state __STATE_NAME(n) = {			\
-		.name = #n,						\
-		.short_name = sn,					\
-		.work = f						\
-	}
-
-/*
- * Returns from work states.
- */
-#define transit_to(state) ({ prefetch(&STATE(state)->work); STATE(state); })
-
-#define NO_TRANSIT ((struct fscache_state *)NULL)
-
-/*
- * Define a wait state.  Wait states are event processing states.  No execution
- * is performed by them.  Wait states are just tables of "if event X occurs,
- * clear it and transition to state Y".  The dispatcher returns to the
- * scheduler if none of the events in which the wait state has an interest are
- * currently pending.
- */
-#define WAIT_STATE(n, sn, ...) \
-	const struct fscache_state __STATE_NAME(n) = {			\
-		.name = #n,						\
-		.short_name = sn,					\
-		.work = NULL,						\
-		.transitions = { __VA_ARGS__, { 0, NULL } }		\
-	}
-
-#define TRANSIT_TO(state, emask) \
-	{ .events = (emask), .transit_to = STATE(state) }
-
-/*
- * The object state machine.
- */
-static WORK_STATE(INIT_OBJECT,		"INIT", fscache_initialise_object);
-static WORK_STATE(PARENT_READY,		"PRDY", fscache_parent_ready);
-static WORK_STATE(ABORT_INIT,		"ABRT", fscache_abort_initialisation);
-static WORK_STATE(LOOK_UP_OBJECT,	"LOOK", fscache_look_up_object);
-static WORK_STATE(CREATE_OBJECT,	"CRTO", fscache_look_up_object);
-static WORK_STATE(OBJECT_AVAILABLE,	"AVBL", fscache_object_available);
-static WORK_STATE(JUMPSTART_DEPS,	"JUMP", fscache_jumpstart_dependents);
-
-static WORK_STATE(INVALIDATE_OBJECT,	"INVL", fscache_invalidate_object);
-static WORK_STATE(UPDATE_OBJECT,	"UPDT", fscache_update_object);
-
-static WORK_STATE(LOOKUP_FAILURE,	"LCFL", fscache_lookup_failure);
-static WORK_STATE(KILL_OBJECT,		"KILL", fscache_kill_object);
-static WORK_STATE(KILL_DEPENDENTS,	"KDEP", fscache_kill_dependents);
-static WORK_STATE(DROP_OBJECT,		"DROP", fscache_drop_object);
-static WORK_STATE(OBJECT_DEAD,		"DEAD", fscache_object_dead);
-
-static WAIT_STATE(WAIT_FOR_INIT,	"?INI",
-		  TRANSIT_TO(INIT_OBJECT,	1 << FSCACHE_OBJECT_EV_NEW_CHILD));
-
-static WAIT_STATE(WAIT_FOR_PARENT,	"?PRN",
-		  TRANSIT_TO(PARENT_READY,	1 << FSCACHE_OBJECT_EV_PARENT_READY));
-
-static WAIT_STATE(WAIT_FOR_CMD,		"?CMD",
-		  TRANSIT_TO(INVALIDATE_OBJECT,	1 << FSCACHE_OBJECT_EV_INVALIDATE),
-		  TRANSIT_TO(UPDATE_OBJECT,	1 << FSCACHE_OBJECT_EV_UPDATE),
-		  TRANSIT_TO(JUMPSTART_DEPS,	1 << FSCACHE_OBJECT_EV_NEW_CHILD));
-
-static WAIT_STATE(WAIT_FOR_CLEARANCE,	"?CLR",
-		  TRANSIT_TO(KILL_OBJECT,	1 << FSCACHE_OBJECT_EV_CLEARED));
-
-/*
- * Out-of-band event transition tables.  These are for handling unexpected
- * events, such as an I/O error.  If an OOB event occurs, the state machine
- * clears and disables the event and forces a transition to the nominated work
- * state (acurrently executing work states will complete first).
- *
- * In such a situation, object->state remembers the state the machine should
- * have been in/gone to and returning NO_TRANSIT returns to that.
- */
-static const struct fscache_transition fscache_osm_init_oob[] = {
-	   TRANSIT_TO(ABORT_INIT,
-		      (1 << FSCACHE_OBJECT_EV_ERROR) |
-		      (1 << FSCACHE_OBJECT_EV_KILL)),
-	   { 0, NULL }
-};
-
-static const struct fscache_transition fscache_osm_lookup_oob[] = {
-	   TRANSIT_TO(LOOKUP_FAILURE,
-		      (1 << FSCACHE_OBJECT_EV_ERROR) |
-		      (1 << FSCACHE_OBJECT_EV_KILL)),
-	   { 0, NULL }
-};
-
-static const struct fscache_transition fscache_osm_run_oob[] = {
-	   TRANSIT_TO(KILL_OBJECT,
-		      (1 << FSCACHE_OBJECT_EV_ERROR) |
-		      (1 << FSCACHE_OBJECT_EV_KILL)),
-	   { 0, NULL }
-};
-
-static int  fscache_get_object(struct fscache_object *,
-			       enum fscache_obj_ref_trace);
-static void fscache_put_object(struct fscache_object *,
-			       enum fscache_obj_ref_trace);
-static bool fscache_enqueue_dependents(struct fscache_object *, int);
-static void fscache_dequeue_object(struct fscache_object *);
-static void fscache_update_aux_data(struct fscache_object *);
-
-/*
- * we need to notify the parent when an op completes that we had outstanding
- * upon it
- */
-static inline void fscache_done_parent_op(struct fscache_object *object)
-{
-	struct fscache_object *parent = object->parent;
-
-	_enter("OBJ%x {OBJ%x,%x}",
-	       object->debug_id, parent->debug_id, parent->n_ops);
-
-	spin_lock_nested(&parent->lock, 1);
-	parent->n_obj_ops--;
-	parent->n_ops--;
-	if (parent->n_ops == 0)
-		fscache_raise_event(parent, FSCACHE_OBJECT_EV_CLEARED);
-	spin_unlock(&parent->lock);
-}
-
-/*
- * Object state machine dispatcher.
- */
-static void fscache_object_sm_dispatcher(struct fscache_object *object)
-{
-	const struct fscache_transition *t;
-	const struct fscache_state *state, *new_state;
-	unsigned long events, event_mask;
-	bool oob;
-	int event = -1;
-
-	ASSERT(object != NULL);
-
-	_enter("{OBJ%x,%s,%lx}",
-	       object->debug_id, object->state->name, object->events);
-
-	event_mask = object->event_mask;
-restart:
-	object->event_mask = 0; /* Mask normal event handling */
-	state = object->state;
-restart_masked:
-	events = object->events;
-
-	/* Handle any out-of-band events (typically an error) */
-	if (events & object->oob_event_mask) {
-		_debug("{OBJ%x} oob %lx",
-		       object->debug_id, events & object->oob_event_mask);
-		oob = true;
-		for (t = object->oob_table; t->events; t++) {
-			if (events & t->events) {
-				state = t->transit_to;
-				ASSERT(state->work != NULL);
-				event = fls(events & t->events) - 1;
-				__clear_bit(event, &object->oob_event_mask);
-				clear_bit(event, &object->events);
-				goto execute_work_state;
-			}
-		}
-	}
-	oob = false;
-
-	/* Wait states are just transition tables */
-	if (!state->work) {
-		if (events & event_mask) {
-			for (t = state->transitions; t->events; t++) {
-				if (events & t->events) {
-					new_state = t->transit_to;
-					event = fls(events & t->events) - 1;
-					trace_fscache_osm(object, state,
-							  true, false, event);
-					clear_bit(event, &object->events);
-					_debug("{OBJ%x} ev %d: %s -> %s",
-					       object->debug_id, event,
-					       state->name, new_state->name);
-					object->state = state = new_state;
-					goto execute_work_state;
-				}
-			}
-
-			/* The event mask didn't include all the tabled bits */
-			BUG();
-		}
-		/* Randomly woke up */
-		goto unmask_events;
-	}
-
-execute_work_state:
-	_debug("{OBJ%x} exec %s", object->debug_id, state->name);
-
-	trace_fscache_osm(object, state, false, oob, event);
-	new_state = state->work(object, event);
-	event = -1;
-	if (new_state == NO_TRANSIT) {
-		_debug("{OBJ%x} %s notrans", object->debug_id, state->name);
-		if (unlikely(state == STATE(OBJECT_DEAD))) {
-			_leave(" [dead]");
-			return;
-		}
-		fscache_enqueue_object(object);
-		event_mask = object->oob_event_mask;
-		goto unmask_events;
-	}
-
-	_debug("{OBJ%x} %s -> %s",
-	       object->debug_id, state->name, new_state->name);
-	object->state = state = new_state;
-
-	if (state->work) {
-		if (unlikely(state == STATE(OBJECT_DEAD))) {
-			_leave(" [dead]");
-			return;
-		}
-		goto restart_masked;
-	}
-
-	/* Transited to wait state */
-	event_mask = object->oob_event_mask;
-	for (t = state->transitions; t->events; t++)
-		event_mask |= t->events;
-
-unmask_events:
-	object->event_mask = event_mask;
-	smp_mb();
-	events = object->events;
-	if (events & event_mask)
-		goto restart;
-	_leave(" [msk %lx]", event_mask);
-}
-
-/*
- * execute an object
- */
-static void fscache_object_work_func(struct work_struct *work)
-{
-	struct fscache_object *object =
-		container_of(work, struct fscache_object, work);
-	unsigned long start;
-
-	_enter("{OBJ%x}", object->debug_id);
-
-	start = jiffies;
-	fscache_object_sm_dispatcher(object);
-	fscache_hist(fscache_objs_histogram, start);
-	fscache_put_object(object, fscache_obj_put_work);
-}
-
-/**
- * fscache_object_init - Initialise a cache object description
- * @object: Object description
- * @cookie: Cookie object will be attached to
- * @cache: Cache in which backing object will be found
- *
- * Initialise a cache object description to its basic values.
- *
- * See Documentation/filesystems/caching/backend-api.rst for a complete
- * description.
- */
-void fscache_object_init(struct fscache_object *object,
-			 struct fscache_cookie *cookie,
-			 struct fscache_cache *cache)
-{
-	const struct fscache_transition *t;
-
-	atomic_inc(&cache->object_count);
-
-	object->state = STATE(WAIT_FOR_INIT);
-	object->oob_table = fscache_osm_init_oob;
-	object->flags = 1 << FSCACHE_OBJECT_IS_LIVE;
-	spin_lock_init(&object->lock);
-	INIT_LIST_HEAD(&object->cache_link);
-	INIT_HLIST_NODE(&object->cookie_link);
-	INIT_WORK(&object->work, fscache_object_work_func);
-	INIT_LIST_HEAD(&object->dependents);
-	INIT_LIST_HEAD(&object->dep_link);
-	object->n_children = 0;
-	object->n_ops = 0;
-	object->events = 0;
-	object->cache = cache;
-	object->cookie = cookie;
-	fscache_cookie_get(cookie, fscache_cookie_get_attach_object);
-	object->parent = NULL;
-#ifdef CONFIG_FSCACHE_OBJECT_LIST
-	RB_CLEAR_NODE(&object->objlist_link);
-#endif
-
-	object->oob_event_mask = 0;
-	for (t = object->oob_table; t->events; t++)
-		object->oob_event_mask |= t->events;
-	object->event_mask = object->oob_event_mask;
-	for (t = object->state->transitions; t->events; t++)
-		object->event_mask |= t->events;
-}
-EXPORT_SYMBOL(fscache_object_init);
-
-/*
- * Mark the object as no longer being live, making sure that we synchronise
- * against op submission.
- */
-static inline void fscache_mark_object_dead(struct fscache_object *object)
-{
-	spin_lock(&object->lock);
-	clear_bit(FSCACHE_OBJECT_IS_LIVE, &object->flags);
-	spin_unlock(&object->lock);
-}
-
-/*
- * Abort object initialisation before we start it.
- */
-static const struct fscache_state *fscache_abort_initialisation(struct fscache_object *object,
-								int event)
-{
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	object->oob_event_mask = 0;
-	fscache_dequeue_object(object);
-	return transit_to(KILL_OBJECT);
-}
-
-/*
- * initialise an object
- * - check the specified object's parent to see if we can make use of it
- *   immediately to do a creation
- * - we may need to start the process of creating a parent and we need to wait
- *   for the parent's lookup and creation to complete if it's not there yet
- */
-static const struct fscache_state *fscache_initialise_object(struct fscache_object *object,
-							     int event)
-{
-	struct fscache_object *parent;
-	bool success;
-
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	ASSERT(list_empty(&object->dep_link));
-
-	parent = object->parent;
-	if (!parent) {
-		_leave(" [no parent]");
-		return transit_to(DROP_OBJECT);
-	}
-
-	_debug("parent: %s of:%lx", parent->state->name, parent->flags);
-
-	if (fscache_object_is_dying(parent)) {
-		_leave(" [bad parent]");
-		return transit_to(DROP_OBJECT);
-	}
-
-	if (fscache_object_is_available(parent)) {
-		_leave(" [ready]");
-		return transit_to(PARENT_READY);
-	}
-
-	_debug("wait");
-
-	spin_lock(&parent->lock);
-	fscache_stat(&fscache_n_cop_grab_object);
-	success = false;
-	if (fscache_object_is_live(parent) &&
-	    object->cache->ops->grab_object(object, fscache_obj_get_add_to_deps)) {
-		list_add(&object->dep_link, &parent->dependents);
-		success = true;
-	}
-	fscache_stat_d(&fscache_n_cop_grab_object);
-	spin_unlock(&parent->lock);
-	if (!success) {
-		_leave(" [grab failed]");
-		return transit_to(DROP_OBJECT);
-	}
-
-	/* fscache_acquire_non_index_cookie() uses this
-	 * to wake the chain up */
-	fscache_raise_event(parent, FSCACHE_OBJECT_EV_NEW_CHILD);
-	_leave(" [wait]");
-	return transit_to(WAIT_FOR_PARENT);
-}
-
-/*
- * Once the parent object is ready, we should kick off our lookup op.
- */
-static const struct fscache_state *fscache_parent_ready(struct fscache_object *object,
-							int event)
-{
-	struct fscache_object *parent = object->parent;
-
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	ASSERT(parent != NULL);
-
-	spin_lock(&parent->lock);
-	parent->n_ops++;
-	parent->n_obj_ops++;
-	object->lookup_jif = jiffies;
-	spin_unlock(&parent->lock);
-
-	_leave("");
-	return transit_to(LOOK_UP_OBJECT);
-}
-
-/*
- * look an object up in the cache from which it was allocated
- * - we hold an "access lock" on the parent object, so the parent object cannot
- *   be withdrawn by either party till we've finished
- */
-static const struct fscache_state *fscache_look_up_object(struct fscache_object *object,
-							  int event)
-{
-	struct fscache_cookie *cookie = object->cookie;
-	struct fscache_object *parent = object->parent;
-	int ret;
-
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	object->oob_table = fscache_osm_lookup_oob;
-
-	ASSERT(parent != NULL);
-	ASSERTCMP(parent->n_ops, >, 0);
-	ASSERTCMP(parent->n_obj_ops, >, 0);
-
-	/* make sure the parent is still available */
-	ASSERT(fscache_object_is_available(parent));
-
-	if (fscache_object_is_dying(parent) ||
-	    test_bit(FSCACHE_IOERROR, &object->cache->flags) ||
-	    !fscache_use_cookie(object)) {
-		_leave(" [unavailable]");
-		return transit_to(LOOKUP_FAILURE);
-	}
-
-	_debug("LOOKUP \"%s\" in \"%s\"",
-	       cookie->type_name, object->cache->tag->name);
-
-	fscache_stat(&fscache_n_object_lookups);
-	fscache_stat(&fscache_n_cop_lookup_object);
-	ret = object->cache->ops->lookup_object(object);
-	fscache_stat_d(&fscache_n_cop_lookup_object);
-
-	fscache_unuse_cookie(object);
-
-	if (ret == -ETIMEDOUT) {
-		/* probably stuck behind another object, so move this one to
-		 * the back of the queue */
-		fscache_stat(&fscache_n_object_lookups_timed_out);
-		_leave(" [timeout]");
-		return NO_TRANSIT;
-	}
-
-	if (ret < 0) {
-		_leave(" [error]");
-		return transit_to(LOOKUP_FAILURE);
-	}
-
-	_leave(" [ok]");
-	return transit_to(OBJECT_AVAILABLE);
-}
-
-/**
- * fscache_object_lookup_negative - Note negative cookie lookup
- * @object: Object pointing to cookie to mark
- *
- * Note negative lookup, permitting those waiting to read data from an already
- * existing backing object to continue as there's no data for them to read.
- */
-void fscache_object_lookup_negative(struct fscache_object *object)
-{
-	struct fscache_cookie *cookie = object->cookie;
-
-	_enter("{OBJ%x,%s}", object->debug_id, object->state->name);
-
-	if (!test_and_set_bit(FSCACHE_OBJECT_IS_LOOKED_UP, &object->flags)) {
-		fscache_stat(&fscache_n_object_lookups_negative);
-
-		/* Allow write requests to begin stacking up and read requests to begin
-		 * returning ENODATA.
-		 */
-		set_bit(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags);
-		clear_bit(FSCACHE_COOKIE_UNAVAILABLE, &cookie->flags);
-
-		_debug("wake up lookup %p", &cookie->flags);
-		clear_bit_unlock(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags);
-		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP);
-	}
-	_leave("");
-}
-EXPORT_SYMBOL(fscache_object_lookup_negative);
-
-/**
- * fscache_obtained_object - Note successful object lookup or creation
- * @object: Object pointing to cookie to mark
- *
- * Note successful lookup and/or creation, permitting those waiting to write
- * data to a backing object to continue.
- *
- * Note that after calling this, an object's cookie may be relinquished by the
- * netfs, and so must be accessed with object lock held.
- */
-void fscache_obtained_object(struct fscache_object *object)
-{
-	struct fscache_cookie *cookie = object->cookie;
-
-	_enter("{OBJ%x,%s}", object->debug_id, object->state->name);
-
-	/* if we were still looking up, then we must have a positive lookup
-	 * result, in which case there may be data available */
-	if (!test_and_set_bit(FSCACHE_OBJECT_IS_LOOKED_UP, &object->flags)) {
-		fscache_stat(&fscache_n_object_lookups_positive);
-
-		/* We do (presumably) have data */
-		clear_bit_unlock(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags);
-		clear_bit(FSCACHE_COOKIE_UNAVAILABLE, &cookie->flags);
-
-		/* Allow write requests to begin stacking up and read requests
-		 * to begin shovelling data.
-		 */
-		clear_bit_unlock(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags);
-		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP);
-	} else {
-		fscache_stat(&fscache_n_object_created);
-	}
-
-	set_bit(FSCACHE_OBJECT_IS_AVAILABLE, &object->flags);
-	_leave("");
-}
-EXPORT_SYMBOL(fscache_obtained_object);
-
-/*
- * handle an object that has just become available
- */
-static const struct fscache_state *fscache_object_available(struct fscache_object *object,
-							    int event)
-{
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	object->oob_table = fscache_osm_run_oob;
-
-	spin_lock(&object->lock);
-
-	fscache_done_parent_op(object);
-	spin_unlock(&object->lock);
-
-	fscache_stat(&fscache_n_cop_lookup_complete);
-	object->cache->ops->lookup_complete(object);
-	fscache_stat_d(&fscache_n_cop_lookup_complete);
-
-	fscache_hist(fscache_obj_instantiate_histogram, object->lookup_jif);
-	fscache_stat(&fscache_n_object_avail);
-
-	_leave("");
-	return transit_to(JUMPSTART_DEPS);
-}
-
-/*
- * Wake up this object's dependent objects now that we've become available.
- */
-static const struct fscache_state *fscache_jumpstart_dependents(struct fscache_object *object,
-								int event)
-{
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	if (!fscache_enqueue_dependents(object, FSCACHE_OBJECT_EV_PARENT_READY))
-		return NO_TRANSIT; /* Not finished; requeue */
-	return transit_to(WAIT_FOR_CMD);
-}
-
-/*
- * Handle lookup or creation failute.
- */
-static const struct fscache_state *fscache_lookup_failure(struct fscache_object *object,
-							  int event)
-{
-	struct fscache_cookie *cookie;
-
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	object->oob_event_mask = 0;
-
-	fscache_stat(&fscache_n_cop_lookup_complete);
-	object->cache->ops->lookup_complete(object);
-	fscache_stat_d(&fscache_n_cop_lookup_complete);
-
-	set_bit(FSCACHE_OBJECT_KILLED_BY_CACHE, &object->flags);
-
-	cookie = object->cookie;
-	set_bit(FSCACHE_COOKIE_UNAVAILABLE, &cookie->flags);
-	if (test_and_clear_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags))
-		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP);
-
-	fscache_done_parent_op(object);
-	return transit_to(KILL_OBJECT);
-}
-
-/*
- * Wait for completion of all active operations on this object and the death of
- * all child objects of this object.
- */
-static const struct fscache_state *fscache_kill_object(struct fscache_object *object,
-						       int event)
-{
-	_enter("{OBJ%x,%d,%d},%d",
-	       object->debug_id, object->n_ops, object->n_children, event);
-
-	fscache_mark_object_dead(object);
-	object->oob_event_mask = 0;
-
-	if (list_empty(&object->dependents) &&
-	    object->n_ops == 0 &&
-	    object->n_children == 0)
-		return transit_to(DROP_OBJECT);
-
-	if (!list_empty(&object->dependents))
-		return transit_to(KILL_DEPENDENTS);
-
-	return transit_to(WAIT_FOR_CLEARANCE);
-}
-
-/*
- * Kill dependent objects.
- */
-static const struct fscache_state *fscache_kill_dependents(struct fscache_object *object,
-							   int event)
-{
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	if (!fscache_enqueue_dependents(object, FSCACHE_OBJECT_EV_KILL))
-		return NO_TRANSIT; /* Not finished */
-	return transit_to(WAIT_FOR_CLEARANCE);
-}
-
-/*
- * Drop an object's attachments
- */
-static const struct fscache_state *fscache_drop_object(struct fscache_object *object,
-						       int event)
-{
-	struct fscache_object *parent = object->parent;
-	struct fscache_cookie *cookie = object->cookie;
-	struct fscache_cache *cache = object->cache;
-	bool awaken = false;
-
-	_enter("{OBJ%x,%d},%d", object->debug_id, object->n_children, event);
-
-	ASSERT(cookie != NULL);
-	ASSERT(!hlist_unhashed(&object->cookie_link));
-
-	if (test_bit(FSCACHE_COOKIE_AUX_UPDATED, &cookie->flags)) {
-		_debug("final update");
-		fscache_update_aux_data(object);
-	}
-
-	/* Make sure the cookie no longer points here and that the netfs isn't
-	 * waiting for us.
-	 */
-	spin_lock(&cookie->lock);
-	hlist_del_init(&object->cookie_link);
-	if (hlist_empty(&cookie->backing_objects) &&
-	    test_and_clear_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags))
-		awaken = true;
-	spin_unlock(&cookie->lock);
-
-	if (awaken)
-		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING);
-	if (test_and_clear_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags))
-		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP);
-
-
-	/* Prevent a race with our last child, which has to signal EV_CLEARED
-	 * before dropping our spinlock.
-	 */
-	spin_lock(&object->lock);
-	spin_unlock(&object->lock);
-
-	/* Discard from the cache's collection of objects */
-	spin_lock(&cache->object_list_lock);
-	list_del_init(&object->cache_link);
-	spin_unlock(&cache->object_list_lock);
-
-	fscache_stat(&fscache_n_cop_drop_object);
-	cache->ops->drop_object(object);
-	fscache_stat_d(&fscache_n_cop_drop_object);
-
-	/* The parent object wants to know when all it dependents have gone */
-	if (parent) {
-		_debug("release parent OBJ%x {%d}",
-		       parent->debug_id, parent->n_children);
-
-		spin_lock(&parent->lock);
-		parent->n_children--;
-		if (parent->n_children == 0)
-			fscache_raise_event(parent, FSCACHE_OBJECT_EV_CLEARED);
-		spin_unlock(&parent->lock);
-		object->parent = NULL;
-	}
-
-	/* this just shifts the object release to the work processor */
-	fscache_put_object(object, fscache_obj_put_drop_obj);
-	fscache_stat(&fscache_n_object_dead);
-
-	_leave("");
-	return transit_to(OBJECT_DEAD);
-}
-
-/*
- * get a ref on an object
- */
-static int fscache_get_object(struct fscache_object *object,
-			      enum fscache_obj_ref_trace why)
-{
-	int ret;
-
-	fscache_stat(&fscache_n_cop_grab_object);
-	ret = object->cache->ops->grab_object(object, why) ? 0 : -EAGAIN;
-	fscache_stat_d(&fscache_n_cop_grab_object);
-	return ret;
-}
-
-/*
- * Discard a ref on an object
- */
-static void fscache_put_object(struct fscache_object *object,
-			       enum fscache_obj_ref_trace why)
-{
-	fscache_stat(&fscache_n_cop_put_object);
-	object->cache->ops->put_object(object, why);
-	fscache_stat_d(&fscache_n_cop_put_object);
-}
-
-/**
- * fscache_object_destroy - Note that a cache object is about to be destroyed
- * @object: The object to be destroyed
- *
- * Note the imminent destruction and deallocation of a cache object record.
- */
-void fscache_object_destroy(struct fscache_object *object)
-{
-	fscache_objlist_remove(object);
-
-	/* We can get rid of the cookie now */
-	fscache_cookie_put(object->cookie, fscache_cookie_put_object);
-	object->cookie = NULL;
-}
-EXPORT_SYMBOL(fscache_object_destroy);
-
-/*
- * enqueue an object for metadata-type processing
- */
-void fscache_enqueue_object(struct fscache_object *object)
-{
-	_enter("{OBJ%x}", object->debug_id);
-
-	if (fscache_get_object(object, fscache_obj_get_queue) >= 0) {
-		wait_queue_head_t *cong_wq =
-			&get_cpu_var(fscache_object_cong_wait);
-
-		if (queue_work(fscache_object_wq, &object->work)) {
-			if (fscache_object_congested())
-				wake_up(cong_wq);
-		} else
-			fscache_put_object(object, fscache_obj_put_queue);
-
-		put_cpu_var(fscache_object_cong_wait);
-	}
-}
-
-/**
- * fscache_object_sleep_till_congested - Sleep until object wq is congested
- * @timeoutp: Scheduler sleep timeout
- *
- * Allow an object handler to sleep until the object workqueue is congested.
- *
- * The caller must set up a wake up event before calling this and must have set
- * the appropriate sleep mode (such as TASK_UNINTERRUPTIBLE) and tested its own
- * condition before calling this function as no test is made here.
- *
- * %true is returned if the object wq is congested, %false otherwise.
- */
-bool fscache_object_sleep_till_congested(signed long *timeoutp)
-{
-	wait_queue_head_t *cong_wq = this_cpu_ptr(&fscache_object_cong_wait);
-	DEFINE_WAIT(wait);
-
-	if (fscache_object_congested())
-		return true;
-
-	add_wait_queue_exclusive(cong_wq, &wait);
-	if (!fscache_object_congested())
-		*timeoutp = schedule_timeout(*timeoutp);
-	finish_wait(cong_wq, &wait);
-
-	return fscache_object_congested();
-}
-EXPORT_SYMBOL_GPL(fscache_object_sleep_till_congested);
-
-/*
- * Enqueue the dependents of an object for metadata-type processing.
- *
- * If we don't manage to finish the list before the scheduler wants to run
- * again then return false immediately.  We return true if the list was
- * cleared.
- */
-static bool fscache_enqueue_dependents(struct fscache_object *object, int event)
-{
-	struct fscache_object *dep;
-	bool ret = true;
-
-	_enter("{OBJ%x}", object->debug_id);
-
-	if (list_empty(&object->dependents))
-		return true;
-
-	spin_lock(&object->lock);
-
-	while (!list_empty(&object->dependents)) {
-		dep = list_entry(object->dependents.next,
-				 struct fscache_object, dep_link);
-		list_del_init(&dep->dep_link);
-
-		fscache_raise_event(dep, event);
-		fscache_put_object(dep, fscache_obj_put_enq_dep);
-
-		if (!list_empty(&object->dependents) && need_resched()) {
-			ret = false;
-			break;
-		}
-	}
-
-	spin_unlock(&object->lock);
-	return ret;
-}
-
-/*
- * remove an object from whatever queue it's waiting on
- */
-static void fscache_dequeue_object(struct fscache_object *object)
-{
-	_enter("{OBJ%x}", object->debug_id);
-
-	if (!list_empty(&object->dep_link)) {
-		spin_lock(&object->parent->lock);
-		list_del_init(&object->dep_link);
-		spin_unlock(&object->parent->lock);
-	}
-
-	_leave("");
-}
-
-static const struct fscache_state *fscache_invalidate_object(struct fscache_object *object,
-							     int event)
-{
-	return transit_to(UPDATE_OBJECT);
-}
-
-/*
- * Update auxiliary data.
- */
-static void fscache_update_aux_data(struct fscache_object *object)
-{
-	fscache_stat(&fscache_n_updates_run);
-	fscache_stat(&fscache_n_cop_update_object);
-	object->cache->ops->update_object(object);
-	fscache_stat_d(&fscache_n_cop_update_object);
-}
-
-/*
- * Asynchronously update an object.
- */
-static const struct fscache_state *fscache_update_object(struct fscache_object *object,
-							 int event)
-{
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	fscache_update_aux_data(object);
-
-	_leave("");
-	return transit_to(WAIT_FOR_CMD);
-}
-
-/**
- * fscache_object_retrying_stale - Note retrying stale object
- * @object: The object that will be retried
- *
- * Note that an object lookup found an on-disk object that was adjudged to be
- * stale and has been deleted.  The lookup will be retried.
- */
-void fscache_object_retrying_stale(struct fscache_object *object)
-{
-	fscache_stat(&fscache_n_cache_no_space_reject);
-}
-EXPORT_SYMBOL(fscache_object_retrying_stale);
-
-/**
- * fscache_object_mark_killed - Note that an object was killed
- * @object: The object that was culled
- * @why: The reason the object was killed.
- *
- * Note that an object was killed.  Returns true if the object was
- * already marked killed, false if it wasn't.
- */
-void fscache_object_mark_killed(struct fscache_object *object,
-				enum fscache_why_object_killed why)
-{
-	if (test_and_set_bit(FSCACHE_OBJECT_KILLED_BY_CACHE, &object->flags)) {
-		pr_err("Error: Object already killed by cache [%s]\n",
-		       object->cache->identifier);
-		return;
-	}
-
-	switch (why) {
-	case FSCACHE_OBJECT_NO_SPACE:
-		fscache_stat(&fscache_n_cache_no_space_reject);
-		break;
-	case FSCACHE_OBJECT_IS_STALE:
-		fscache_stat(&fscache_n_cache_stale_objects);
-		break;
-	case FSCACHE_OBJECT_WAS_RETIRED:
-		fscache_stat(&fscache_n_cache_retired_objects);
-		break;
-	case FSCACHE_OBJECT_WAS_CULLED:
-		fscache_stat(&fscache_n_cache_culled_objects);
-		break;
-	}
-}
-EXPORT_SYMBOL(fscache_object_mark_killed);
-
-/*
- * The object is dead.  We can get here if an object gets queued by an event
- * that would lead to its death (such as EV_KILL) when the dispatcher is
- * already running (and so can be requeued) but hasn't yet cleared the event
- * mask.
- */
-static const struct fscache_state *fscache_object_dead(struct fscache_object *object,
-						       int event)
-{
-	if (!test_and_set_bit(FSCACHE_OBJECT_RUN_AFTER_DEAD,
-			      &object->flags))
-		return NO_TRANSIT;
-
-	WARN(true, "FS-Cache object redispatched after death");
-	return NO_TRANSIT;
-}
diff --git a/fs/fscache/object_bits.c b/fs/fscache/object_bits.c
new file mode 100644
index 000000000000..5316d7d33ce0
--- /dev/null
+++ b/fs/fscache/object_bits.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Miscellaneous object routines.
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for more information on
+ * the netfs API.
+ */
+
+#define FSCACHE_DEBUG_LEVEL OBJECT
+#include <linux/module.h>
+#include <linux/slab.h>
+#include "internal.h"
+
+static atomic_t fscache_object_debug_id;
+
+/**
+ * fscache_object_init - Initialise a cache object description
+ * @object: Object description
+ * @cookie: Cookie object will be attached to
+ * @cache: Cache in which backing object will be found
+ *
+ * Initialise a cache object description to its basic values.
+ *
+ * See Documentation/filesystems/caching/backend-api.txt for a complete
+ * description.
+ */
+void fscache_object_init(struct fscache_object *object,
+			 struct fscache_cookie *cookie,
+			 struct fscache_cache *cache)
+{
+	atomic_inc(&cache->object_count);
+
+	spin_lock_init(&object->lock);
+	INIT_LIST_HEAD(&object->cache_link);
+	INIT_HLIST_NODE(&object->cookie_link);
+	object->n_children = 0;
+	object->cache = cache;
+	object->cookie = fscache_cookie_get(cookie, fscache_cookie_get_attach_object);
+	object->parent = NULL;
+#ifdef CONFIG_FSCACHE_OBJECT_LIST
+	RB_CLEAR_NODE(&object->objlist_link);
+#endif
+	object->debug_id = atomic_inc_return(&fscache_object_debug_id);
+}
+EXPORT_SYMBOL(fscache_object_init);
+
+/**
+ * fscache_object_destroy - Note that a cache object is about to be destroyed
+ * @object: The object to be destroyed
+ *
+ * Note the imminent destruction and deallocation of a cache object record.
+ */
+void fscache_object_destroy(struct fscache_object *object)
+{
+	_enter("%u", atomic_read(&object->cache->object_count));
+
+	fscache_objlist_remove(object);
+
+	/* We can get rid of the cookie now */
+	fscache_cookie_put(object->cookie, fscache_cookie_put_object);
+	object->cookie = NULL;
+}
+EXPORT_SYMBOL(fscache_object_destroy);
+
+/**
+ * fscache_object_destroyed - Note destruction of an object in a cache
+ * @cache: The cache from which the object came
+ *
+ * Note the destruction and deallocation of an object record in a cache.
+ */
+void fscache_object_destroyed(struct fscache_cache *cache)
+{
+	_enter("%d", atomic_read(&cache->object_count));
+	if (atomic_dec_and_test(&cache->object_count))
+		wake_up_all(&fscache_cache_cleared_wq);
+}
+EXPORT_SYMBOL(fscache_object_destroyed);
+
+/**
+ * fscache_object_mark_killed - Note that an object was killed
+ * @object: The object that was culled
+ * @why: The reason the object was killed.
+ *
+ * Note that an object was killed.  Returns true if the object was
+ * already marked killed, false if it wasn't.
+ */
+void fscache_object_mark_killed(struct fscache_object *object,
+				enum fscache_why_object_killed why)
+{
+	switch (why) {
+	case FSCACHE_OBJECT_NO_SPACE:
+		fscache_stat(&fscache_n_cache_no_space_reject);
+		break;
+	case FSCACHE_OBJECT_IS_STALE:
+		fscache_stat(&fscache_n_cache_stale_objects);
+		break;
+	case FSCACHE_OBJECT_WAS_RETIRED:
+		fscache_stat(&fscache_n_cache_retired_objects);
+		break;
+	case FSCACHE_OBJECT_WAS_CULLED:
+		fscache_stat(&fscache_n_cache_culled_objects);
+		break;
+	}
+}
+EXPORT_SYMBOL(fscache_object_mark_killed);
+
+/**
+ * fscache_object_retrying_stale - Note retrying stale object
+ * @object: The object that will be retried
+ *
+ * Note that an object lookup found an on-disk object that was adjudged to be
+ * stale and has been deleted.  The lookup will be retried.
+ */
+void fscache_object_retrying_stale(struct fscache_object *object)
+{
+	fscache_stat(&fscache_n_cache_stale_objects);
+}
+EXPORT_SYMBOL(fscache_object_retrying_stale);
diff --git a/fs/fscache/proc.c b/fs/fscache/proc.c
index da51fdfc8641..729d083f1e91 100644
--- a/fs/fscache/proc.c
+++ b/fs/fscache/proc.c
@@ -5,7 +5,7 @@
  * Written by David Howells (dhowells@redhat.com)
  */
 
-#define FSCACHE_DEBUG_LEVEL OPERATION
+#define FSCACHE_DEBUG_LEVEL CACHE
 #include <linux/module.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
@@ -16,8 +16,6 @@
  */
 int __init fscache_proc_init(void)
 {
-	_enter("");
-
 	if (!proc_mkdir("fs/fscache", NULL))
 		goto error_dir;
 
@@ -43,7 +41,6 @@ int __init fscache_proc_init(void)
 		goto error_objects;
 #endif
 
-	_leave(" = 0");
 	return 0;
 
 #ifdef CONFIG_FSCACHE_OBJECT_LIST
@@ -61,7 +58,6 @@ int __init fscache_proc_init(void)
 error_cookies:
 	remove_proc_entry("fs/fscache", NULL);
 error_dir:
-	_leave(" = -ENOMEM");
 	return -ENOMEM;
 }
 
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 5b1cec456199..583817f4f113 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -5,7 +5,7 @@
  * Written by David Howells (dhowells@redhat.com)
  */
 
-#define FSCACHE_DEBUG_LEVEL THREAD
+#define FSCACHE_DEBUG_LEVEL CACHE
 #include <linux/module.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 8b6aa11dee54..de1cffb2558e 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -16,7 +16,6 @@
 
 #include <linux/fscache.h>
 #include <linux/sched.h>
-#include <linux/workqueue.h>
 
 #define NR_MAXCACHES BITS_PER_LONG
 
@@ -25,14 +24,21 @@ struct fscache_cache_ops;
 struct fscache_object;
 
 enum fscache_obj_ref_trace {
-	fscache_obj_get_add_to_deps,
-	fscache_obj_get_queue,
+	fscache_obj_get_attach,
+	fscache_obj_get_exists,
+	fscache_obj_get_inval,
+	fscache_obj_get_wait,
+	fscache_obj_get_withdraw,
+	fscache_obj_new,
+	fscache_obj_put,
+	fscache_obj_put_alloc_dup,
 	fscache_obj_put_alloc_fail,
 	fscache_obj_put_attach_fail,
+	fscache_obj_put_drop_child,
 	fscache_obj_put_drop_obj,
-	fscache_obj_put_enq_dep,
-	fscache_obj_put_queue,
-	fscache_obj_put_work,
+	fscache_obj_put_inval,
+	fscache_obj_put_lookup_fail,
+	fscache_obj_put_withdraw,
 	fscache_obj_ref__nr_traces
 };
 
@@ -80,16 +86,21 @@ struct fscache_cache_ops {
 	const char *name;
 
 	/* allocate an object record for a cookie */
-	struct fscache_object *(*alloc_object)(struct fscache_cache *cache,
-					       struct fscache_cookie *cookie);
+	struct fscache_object *(*alloc_object)(struct fscache_cookie *cookie,
+					       struct fscache_cache *cache,
+					       struct fscache_object *parent);
 
-	/* look up the object for a cookie
-	 * - return -ETIMEDOUT to be requeued
-	 */
-	int (*lookup_object)(struct fscache_object *object);
+	/* Prepare data used in lookup */
+	void *(*prepare_lookup_data)(struct fscache_object *object);
 
-	/* finished looking up */
-	void (*lookup_complete)(struct fscache_object *object);
+	/* Look up the object for a cookie */
+	bool (*lookup_object)(struct fscache_object *object, void *lookup_data);
+
+	/* Create the object for a cookie */
+	bool (*create_object)(struct fscache_object *object, void *lookup_data);
+
+	/* Clean up lookup data */
+	void (*free_lookup_data)(struct fscache_object *object, void *lookup_data);
 
 	/* increment the usage count on this object (may fail if unmounting) */
 	struct fscache_object *(*grab_object)(struct fscache_object *object,
@@ -109,12 +120,15 @@ struct fscache_cache_ops {
 
 	/* discard the resources pinned by an object and effect retirement if
 	 * necessary */
-	void (*drop_object)(struct fscache_object *object);
+	void (*drop_object)(struct fscache_object *object, bool invalidate);
 
 	/* dispose of a reference to an object */
 	void (*put_object)(struct fscache_object *object,
 			   enum fscache_obj_ref_trace why);
 
+	/* Get object usage count */
+	unsigned int (*get_object_usage)(const struct fscache_object *object);
+
 	/* sync a cache */
 	void (*sync_cache)(struct fscache_cache *cache);
 
@@ -124,74 +138,34 @@ struct fscache_cache_ops {
 
 extern struct fscache_cookie fscache_fsdef_index;
 
-/*
- * Event list for fscache_object::{event_mask,events}
- */
-enum {
-	FSCACHE_OBJECT_EV_NEW_CHILD,	/* T if object has a new child */
-	FSCACHE_OBJECT_EV_PARENT_READY,	/* T if object's parent is ready */
-	FSCACHE_OBJECT_EV_UPDATE,	/* T if object should be updated */
-	FSCACHE_OBJECT_EV_INVALIDATE,	/* T if cache requested object invalidation */
-	FSCACHE_OBJECT_EV_CLEARED,	/* T if accessors all gone */
-	FSCACHE_OBJECT_EV_ERROR,	/* T if fatal error occurred during processing */
-	FSCACHE_OBJECT_EV_KILL,		/* T if netfs relinquished or cache withdrew object */
-	NR_FSCACHE_OBJECT_EVENTS
-};
-
-#define FSCACHE_OBJECT_EVENTS_MASK ((1UL << NR_FSCACHE_OBJECT_EVENTS) - 1)
-
-/*
- * States for object state machine.
- */
-struct fscache_transition {
-	unsigned long events;
-	const struct fscache_state *transit_to;
-};
-
-struct fscache_state {
-	char name[24];
-	char short_name[8];
-	const struct fscache_state *(*work)(struct fscache_object *object,
-					    int event);
-	const struct fscache_transition transitions[];
+enum fscache_object_stage {
+	FSCACHE_OBJECT_STAGE_INITIAL,
+	FSCACHE_OBJECT_STAGE_LOOKING_UP,
+	FSCACHE_OBJECT_STAGE_UNCREATED,		/* Needs creation */
+	FSCACHE_OBJECT_STAGE_LIVE_TEMP,		/* Temporary object created, can be no hits */
+	FSCACHE_OBJECT_STAGE_LIVE_EMPTY,	/* Object was freshly created, can be no hits */
+	FSCACHE_OBJECT_STAGE_LIVE,		/* Object is populated */
+	FSCACHE_OBJECT_STAGE_DESTROYING,
+	FSCACHE_OBJECT_STAGE_DEAD,
 };
 
 /*
  * on-disk cache file or index handle
  */
 struct fscache_object {
-	const struct fscache_state *state;	/* Object state machine state */
-	const struct fscache_transition *oob_table; /* OOB state transition table */
 	int			debug_id;	/* debugging ID */
 	int			n_children;	/* number of child objects */
-	int			n_ops;		/* number of extant ops on object */
-	int			n_obj_ops;	/* number of object ops outstanding on object */
+	enum fscache_object_stage stage;	/* Stage of object's lifecycle */
 	spinlock_t		lock;		/* state and operations lock */
 
-	unsigned long		lookup_jif;	/* time at which lookup started */
-	unsigned long		oob_event_mask;	/* OOB events this object is interested in */
-	unsigned long		event_mask;	/* events this object is interested in */
-	unsigned long		events;		/* events to be processed by this object
-						 * (order is important - using fls) */
-
 	unsigned long		flags;
-#define FSCACHE_OBJECT_LOCK		0	/* T if object is busy being processed */
-#define FSCACHE_OBJECT_WAITING		2	/* T if object is waiting on its parent */
-#define FSCACHE_OBJECT_IS_LIVE		3	/* T if object is not withdrawn or relinquished */
-#define FSCACHE_OBJECT_IS_LOOKED_UP	4	/* T if object has been looked up */
-#define FSCACHE_OBJECT_IS_AVAILABLE	5	/* T if object has become active */
-#define FSCACHE_OBJECT_RETIRED		6	/* T if object was retired on relinquishment */
-#define FSCACHE_OBJECT_KILLED_BY_CACHE	7	/* T if object was killed by the cache */
-#define FSCACHE_OBJECT_RUN_AFTER_DEAD	8	/* T if object has been dispatched after death */
+#define FSCACHE_OBJECT_NEEDS_UPDATE	9	/* T if object attrs need writing to disk */
 
 	struct list_head	cache_link;	/* link in cache->object_list */
 	struct hlist_node	cookie_link;	/* link in cookie->backing_objects */
 	struct fscache_cache	*cache;		/* cache that supplied this object */
 	struct fscache_cookie	*cookie;	/* netfs's file/index object */
 	struct fscache_object	*parent;	/* parent object */
-	struct work_struct	work;		/* attention scheduling record */
-	struct list_head	dependents;	/* FIFO of dependent objects */
-	struct list_head	dep_link;	/* link in parent's dependents list */
 #ifdef CONFIG_FSCACHE_OBJECT_LIST
 	struct rb_node		objlist_link;	/* link in global object list */
 #endif
@@ -201,101 +175,12 @@ extern void fscache_object_init(struct fscache_object *, struct fscache_cookie *
 				struct fscache_cache *);
 extern void fscache_object_destroy(struct fscache_object *);
 
-extern void fscache_object_lookup_negative(struct fscache_object *object);
-extern void fscache_obtained_object(struct fscache_object *object);
-
-static inline bool fscache_object_is_live(struct fscache_object *object)
-{
-	return test_bit(FSCACHE_OBJECT_IS_LIVE, &object->flags);
-}
-
-static inline bool fscache_object_is_dying(struct fscache_object *object)
-{
-	return !fscache_object_is_live(object);
-}
-
-static inline bool fscache_object_is_available(struct fscache_object *object)
-{
-	return test_bit(FSCACHE_OBJECT_IS_AVAILABLE, &object->flags);
-}
-
 static inline bool fscache_cache_is_broken(struct fscache_object *object)
 {
 	return test_bit(FSCACHE_IOERROR, &object->cache->flags);
 }
 
-static inline bool fscache_object_is_active(struct fscache_object *object)
-{
-	return fscache_object_is_available(object) &&
-		fscache_object_is_live(object) &&
-		!fscache_cache_is_broken(object);
-}
-
-/**
- * fscache_object_destroyed - Note destruction of an object in a cache
- * @cache: The cache from which the object came
- *
- * Note the destruction and deallocation of an object record in a cache.
- */
-static inline void fscache_object_destroyed(struct fscache_cache *cache)
-{
-	if (atomic_dec_and_test(&cache->object_count))
-		wake_up_all(&fscache_cache_cleared_wq);
-}
-
-/**
- * fscache_object_lookup_error - Note an object encountered an error
- * @object: The object on which the error was encountered
- *
- * Note that an object encountered a fatal error (usually an I/O error) and
- * that it should be withdrawn as soon as possible.
- */
-static inline void fscache_object_lookup_error(struct fscache_object *object)
-{
-	set_bit(FSCACHE_OBJECT_EV_ERROR, &object->events);
-}
-
-static inline void __fscache_use_cookie(struct fscache_cookie *cookie)
-{
-	atomic_inc(&cookie->n_active);
-}
-
-/**
- * fscache_use_cookie - Request usage of cookie attached to an object
- * @object: Object description
- * 
- * Request usage of the cookie attached to an object.  NULL is returned if the
- * relinquishment had reduced the cookie usage count to 0.
- */
-static inline bool fscache_use_cookie(struct fscache_object *object)
-{
-	struct fscache_cookie *cookie = object->cookie;
-	return atomic_inc_not_zero(&cookie->n_active) != 0;
-}
-
-static inline bool __fscache_unuse_cookie(struct fscache_cookie *cookie)
-{
-	return atomic_dec_and_test(&cookie->n_active);
-}
-
-static inline void __fscache_wake_unused_cookie(struct fscache_cookie *cookie)
-{
-	wake_up_var(&cookie->n_active);
-}
-
-/**
- * fscache_unuse_cookie - Cease usage of cookie attached to an object
- * @object: Object description
- * 
- * Cease usage of the cookie attached to an object.  When the users count
- * reaches zero then the cookie relinquishment will be permitted to proceed.
- */
-static inline void fscache_unuse_cookie(struct fscache_object *object)
-{
-	struct fscache_cookie *cookie = object->cookie;
-	if (__fscache_unuse_cookie(cookie))
-		__fscache_wake_unused_cookie(cookie);
-}
+extern void fscache_object_destroyed(struct fscache_cache *cache);
 
 /*
  * out-of-line cache backend functions
@@ -312,8 +197,6 @@ extern void fscache_withdraw_cache(struct fscache_cache *cache);
 
 extern void fscache_io_error(struct fscache_cache *cache);
 
-extern bool fscache_object_sleep_till_congested(signed long *timeoutp);
-
 extern void fscache_object_retrying_stale(struct fscache_object *object);
 
 enum fscache_why_object_killed {
@@ -347,4 +230,13 @@ static inline void *fscache_get_aux(struct fscache_cookie *cookie)
 		return cookie->aux;
 }
 
+/*
+ * Complete an I/O operation
+ */
+static inline void fscache_end_io_operation(struct fscache_cookie *cookie)
+{
+	if (atomic_dec_and_test(&cookie->n_ops))
+		wake_up_var(&cookie->n_ops);
+}
+
 #endif /* _LINUX_FSCACHE_CACHE_H */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index c62b62938c4d..11b18761a3b6 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -31,7 +31,7 @@
 
 /*
  * overload PG_private_2 to give us PG_fscache - this is used to indicate that
- * a page is currently backed by a local disk cache
+ * a page is currently being written to the cache, possibly by direct I/O.
  */
 #define PageFsCache(page)		PagePrivate2((page))
 #define SetPageFsCache(page)		SetPagePrivate2((page))
@@ -52,6 +52,8 @@ enum fscache_cookie_type {
 };
 
 #define FSCACHE_ADV_SINGLE_CHUNK	0x01 /* The object is a single chunk of data */
+#define FSCACHE_ADV_WRITE_CACHE		0x00 /* Do cache if written to locally */
+#define FSCACHE_ADV_WRITE_NOCACHE	0x02 /* Don't cache if written to locally */
 
 /*
  * fscache cached network filesystem type
@@ -64,6 +66,19 @@ struct fscache_netfs {
 	struct fscache_cookie		*primary_index;
 };
 
+/*
+ * Data object state.
+ */
+enum fscache_cookie_stage {
+	FSCACHE_COOKIE_STAGE_QUIESCENT,		/* The cookie is uncached */
+	FSCACHE_COOKIE_STAGE_INITIALISING,	/* The in-memory structs are being inited */
+	FSCACHE_COOKIE_STAGE_LOOKING_UP,	/* The cache object is being looked up */
+	FSCACHE_COOKIE_STAGE_NO_DATA_YET,	/* The cache has no data, read to network */
+	FSCACHE_COOKIE_STAGE_ACTIVE,		/* The cache is active, readable and writable */
+	FSCACHE_COOKIE_STAGE_INVALIDATING,	/* The cache is being invalidated */
+	FSCACHE_COOKIE_STAGE_DEAD,		/* The cache object is dead */
+} __attribute__((mode(byte)));
+
 /*
  * data file or index object cookie
  * - a file will only appear in one cache
@@ -74,7 +89,8 @@ struct fscache_netfs {
 struct fscache_cookie {
 	atomic_t			usage;		/* number of users of this cookie */
 	atomic_t			n_children;	/* number of children of this cookie */
-	atomic_t			n_active;	/* number of active users of netfs ptrs */
+	atomic_t			n_active;	/* number of active users of cookie */
+	atomic_t			n_ops;		/* Number of active ops on this cookie */
 	unsigned int			debug_id;
 	spinlock_t			lock;
 	struct hlist_head		backing_objects; /* object(s) backing this file/index */
@@ -86,19 +102,13 @@ struct fscache_cookie {
 	loff_t				object_size;	/* Size of the netfs object */
 
 	unsigned long			flags;
-#define FSCACHE_COOKIE_LOOKING_UP	0	/* T if non-index cookie being looked up still */
-#define FSCACHE_COOKIE_NO_DATA_YET	1	/* T if new object with no cached data yet */
-#define FSCACHE_COOKIE_UNAVAILABLE	2	/* T if cookie is unavailable (error, etc) */
-#define FSCACHE_COOKIE_INVALIDATING	3	/* T if cookie is being invalidated */
-#define FSCACHE_COOKIE_RELINQUISHED	4	/* T if cookie has been relinquished */
-#define FSCACHE_COOKIE_ENABLED		5	/* T if cookie is enabled */
-#define FSCACHE_COOKIE_ENABLEMENT_LOCK	6	/* T if cookie is being en/disabled */
-#define FSCACHE_COOKIE_AUX_UPDATED	8	/* T if the auxiliary data was updated */
-#define FSCACHE_COOKIE_ACQUIRED		9	/* T if cookie is in use */
-#define FSCACHE_COOKIE_RELINQUISHING	10	/* T if cookie is being relinquished */
+#define FSCACHE_COOKIE_INVALIDATING	4	/* T if cookie is being invalidated */
+#define FSCACHE_COOKIE_ACQUIRED		5	/* T if cookie is in use */
+#define FSCACHE_COOKIE_RELINQUISHED	6	/* T if cookie has been relinquished */
 
+	enum fscache_cookie_stage	stage;
 	enum fscache_cookie_type	type:8;
-	u8				advice;		/* FSCACHE_COOKIE_ADV_* */
+	u8				advice;		/* FSCACHE_ADV_* */
 	u8				key_len;	/* Length of index key */
 	u8				aux_len;	/* Length of auxiliary data */
 	u32				key_hash;	/* Hash of parent, type, key, len */
@@ -112,11 +122,6 @@ struct fscache_cookie {
 	};
 };
 
-static inline bool fscache_cookie_enabled(struct fscache_cookie *cookie)
-{
-	return test_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags);
-}
-
 /*
  * slow-path functions for when there is actually caching available, and the
  * netfs does actually have a valid token
@@ -137,14 +142,13 @@ extern struct fscache_cookie *__fscache_acquire_cookie(
 	struct fscache_cache_tag *,
 	const void *, size_t,
 	const void *, size_t,
-	loff_t, bool);
-extern void __fscache_relinquish_cookie(struct fscache_cookie *, const void *, bool);
-extern void __fscache_update_cookie(struct fscache_cookie *, const void *);
+	loff_t);
+extern void __fscache_use_cookie(struct fscache_cookie *, bool);
+extern void __fscache_unuse_cookie(struct fscache_cookie *, const void *, const loff_t *);
+extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
+extern void __fscache_update_cookie(struct fscache_cookie *, const void *, const loff_t *);
 extern void __fscache_invalidate(struct fscache_cookie *);
 extern void __fscache_wait_on_invalidate(struct fscache_cookie *);
-extern void __fscache_disable_cookie(struct fscache_cookie *, const void *, bool);
-extern void __fscache_enable_cookie(struct fscache_cookie *, const void *, loff_t,
-				    bool (*)(void *), void *);
 
 /**
  * fscache_register_netfs - Register a filesystem as desiring caching services
@@ -231,7 +235,6 @@ void fscache_release_cache_tag(struct fscache_cache_tag *tag)
  * @netfs_data: An arbitrary piece of data to be kept in the cookie to
  * represent the cache object to the netfs
  * @object_size: The initial size of object
- * @enable: Whether or not to enable a data cookie immediately
  *
  * This function is used to inform FS-Cache about part of an index hierarchy
  * that can be used to locate files.  This is done by requesting a cookie for
@@ -251,60 +254,92 @@ struct fscache_cookie *fscache_acquire_cookie(
 	size_t index_key_len,
 	const void *aux_data,
 	size_t aux_data_len,
-	loff_t object_size,
-	bool enable)
+	loff_t object_size)
 {
-	if (fscache_cookie_valid(parent) && fscache_cookie_enabled(parent))
+	if (fscache_cookie_valid(parent))
 		return __fscache_acquire_cookie(parent, type, type_name, advice,
 						preferred_cache,
 						index_key, index_key_len,
 						aux_data, aux_data_len,
-						object_size, enable);
+						object_size);
 	else
 		return NULL;
 }
 
+/**
+ * fscache_use_cookie - Request usage of cookie attached to an object
+ * @object: Object description
+ * @will_modify: If cache is expected to be modified locally
+ *
+ * Request usage of the cookie attached to an object.  The caller should tell
+ * the cache if the object's contents are about to be modified locally and then
+ * the cache can apply the policy that has been set to handle this case.
+ */
+static inline void fscache_use_cookie(struct fscache_cookie *cookie,
+				      bool will_modify)
+{
+	if (fscache_cookie_valid(cookie) &&
+	    cookie->type != FSCACHE_COOKIE_TYPE_INDEX)
+		__fscache_use_cookie(cookie, will_modify);
+}
+
+/**
+ * fscache_unuse_cookie - Cease usage of cookie attached to an object
+ * @object: Object description
+ * @aux_data: Updated auxiliary data (or NULL)
+ * @object_size: Revised size of the object (or NULL)
+ *
+ * Cease usage of the cookie attached to an object.  When the users count
+ * reaches zero then the cookie relinquishment will be permitted to proceed.
+ */
+static inline void fscache_unuse_cookie(struct fscache_cookie *cookie,
+					const void *aux_data,
+					const loff_t *object_size)
+{
+	if (fscache_cookie_valid(cookie) &&
+	    cookie->type != FSCACHE_COOKIE_TYPE_INDEX)
+		__fscache_unuse_cookie(cookie, aux_data, object_size);
+}
+
 /**
  * fscache_relinquish_cookie - Return the cookie to the cache, maybe discarding
  * it
  * @cookie: The cookie being returned
- * @aux_data: The updated auxiliary data for the cookie (may be NULL)
  * @retire: True if the cache object the cookie represents is to be discarded
  *
  * This function returns a cookie to the cache, forcibly discarding the
- * associated cache object if retire is set to true.  The opportunity is
- * provided to update the auxiliary data in the cache before the object is
- * disconnected.
+ * associated cache object if retire is set to true.
  *
  * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
-void fscache_relinquish_cookie(struct fscache_cookie *cookie,
-			       const void *aux_data,
-			       bool retire)
+void fscache_relinquish_cookie(struct fscache_cookie *cookie, bool retire)
 {
 	if (fscache_cookie_valid(cookie))
-		__fscache_relinquish_cookie(cookie, aux_data, retire);
+		__fscache_relinquish_cookie(cookie, retire);
 }
 
 /**
  * fscache_update_cookie - Request that a cache object be updated
  * @cookie: The cookie representing the cache object
  * @aux_data: The updated auxiliary data for the cookie (may be NULL)
+ * @object_size: The current size of the object (may be NULL)
  *
  * Request an update of the index data for the cache object associated with the
  * cookie.  The auxiliary data on the cookie will be updated first if @aux_data
- * is set.
+ * is set and the object size will be updated and the object possibly trimmed
+ * if @object_size is set.
  *
  * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
-void fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data)
+void fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data,
+			   const loff_t *object_size)
 {
-	if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie))
-		__fscache_update_cookie(cookie, aux_data);
+	if (fscache_cookie_valid(cookie))
+		__fscache_update_cookie(cookie, aux_data, object_size);
 }
 
 /**
@@ -352,7 +387,7 @@ void fscache_unpin_cookie(struct fscache_cookie *cookie)
 static inline
 void fscache_invalidate(struct fscache_cookie *cookie)
 {
-	if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie))
+	if (fscache_cookie_valid(cookie))
 		__fscache_invalidate(cookie);
 }
 
@@ -372,58 +407,4 @@ void fscache_wait_on_invalidate(struct fscache_cookie *cookie)
 		__fscache_wait_on_invalidate(cookie);
 }
 
-/**
- * fscache_disable_cookie - Disable a cookie
- * @cookie: The cookie representing the cache object
- * @aux_data: The updated auxiliary data for the cookie (may be NULL)
- * @invalidate: Invalidate the backing object
- *
- * Disable a cookie from accepting further alloc, read, write, invalidate,
- * update or acquire operations.  Outstanding operations can still be waited
- * upon and pages can still be uncached and the cookie relinquished.
- *
- * This will not return until all outstanding operations have completed.
- *
- * If @invalidate is set, then the backing object will be invalidated and
- * detached, otherwise it will just be detached.
- *
- * If @aux_data is set, then auxiliary data will be updated from that.
- */
-static inline
-void fscache_disable_cookie(struct fscache_cookie *cookie,
-			    const void *aux_data,
-			    bool invalidate)
-{
-	if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie))
-		__fscache_disable_cookie(cookie, aux_data, invalidate);
-}
-
-/**
- * fscache_enable_cookie - Reenable a cookie
- * @cookie: The cookie representing the cache object
- * @aux_data: The updated auxiliary data for the cookie (may be NULL)
- * @object_size: Current size of object
- * @can_enable: A function to permit enablement once lock is held
- * @data: Data for can_enable()
- *
- * Reenable a previously disabled cookie, allowing it to accept further alloc,
- * read, write, invalidate, update or acquire operations.  An attempt will be
- * made to immediately reattach the cookie to a backing object.  If @aux_data
- * is set, the auxiliary data attached to the cookie will be updated.
- *
- * The can_enable() function is called (if not NULL) once the enablement lock
- * is held to rule on whether enablement is still permitted to go ahead.
- */
-static inline
-void fscache_enable_cookie(struct fscache_cookie *cookie,
-			   const void *aux_data,
-			   loff_t object_size,
-			   bool (*can_enable)(void *data),
-			   void *data)
-{
-	if (fscache_cookie_valid(cookie) && !fscache_cookie_enabled(cookie))
-		__fscache_enable_cookie(cookie, aux_data, object_size,
-					can_enable, data);
-}
-
 #endif /* _LINUX_FSCACHE_H */
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index c877035c2946..4fedc2e9c428 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -36,14 +36,21 @@ enum cachefiles_obj_ref_trace {
 	E_(FSCACHE_OBJECT_WAS_CULLED,	"was_culled")
 
 #define cachefiles_obj_ref_traces					\
-	EM(fscache_obj_get_add_to_deps,		"GET add_to_deps")	\
-	EM(fscache_obj_get_queue,		"GET queue")		\
+	EM(fscache_obj_get_attach,		"GET attach")		\
+	EM(fscache_obj_get_exists,		"GET exists")		\
+	EM(fscache_obj_get_inval,		"GET inval")		\
+	EM(fscache_obj_get_wait,		"GET wait")		\
+	EM(fscache_obj_get_withdraw,		"GET withdraw")		\
+	EM(fscache_obj_new,			"NEW obj")		\
+	EM(fscache_obj_put,			"PUT general")		\
+	EM(fscache_obj_put_alloc_dup,		"PUT alloc_dup")	\
 	EM(fscache_obj_put_alloc_fail,		"PUT alloc_fail")	\
 	EM(fscache_obj_put_attach_fail,		"PUT attach_fail")	\
+	EM(fscache_obj_put_drop_child,		"PUT drop_child")	\
 	EM(fscache_obj_put_drop_obj,		"PUT drop_obj")		\
-	EM(fscache_obj_put_enq_dep,		"PUT enq_dep")		\
-	EM(fscache_obj_put_queue,		"PUT queue")		\
-	EM(fscache_obj_put_work,		"PUT work")		\
+	EM(fscache_obj_put_inval,		"PUT inval")		\
+	EM(fscache_obj_put_withdraw,		"PUT withdraw")		\
+	EM(fscache_obj_put_lookup_fail,		"PUT lookup_fail")	\
 	EM(cachefiles_obj_put_wait_retry,	"PUT wait_retry")	\
 	E_(cachefiles_obj_put_wait_timeo,	"PUT wait_timeo")
 
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index fb3fdf2921ee..4ce81f3f4c5b 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -24,15 +24,19 @@ enum fscache_cookie_trace {
 	fscache_cookie_discard,
 	fscache_cookie_get_acquire_parent,
 	fscache_cookie_get_attach_object,
+	fscache_cookie_get_ioreq,
 	fscache_cookie_get_reacquire,
 	fscache_cookie_get_register_netfs,
 	fscache_cookie_get_work,
-	fscache_cookie_put_acquire_nobufs,
+	fscache_cookie_new_acquire,
+	fscache_cookie_new_netfs,
 	fscache_cookie_put_dup_netfs,
-	fscache_cookie_put_relinquish,
+	fscache_cookie_put_ioreq,
 	fscache_cookie_put_object,
 	fscache_cookie_put_parent,
+	fscache_cookie_put_relinquish,
 	fscache_cookie_put_work,
+	fscache_cookie_see_discard,
 };
 
 #endif
@@ -41,19 +45,23 @@ enum fscache_cookie_trace {
  * Declare tracing information enums and their string mappings for display.
  */
 #define fscache_cookie_traces						\
-	EM(fscache_cookie_collision,		"*COLLISION*")		\
-	EM(fscache_cookie_discard,		"DISCARD")		\
-	EM(fscache_cookie_get_acquire_parent,	"GET prn")		\
-	EM(fscache_cookie_get_attach_object,	"GET obj")		\
-	EM(fscache_cookie_get_reacquire,	"GET raq")		\
-	EM(fscache_cookie_get_register_netfs,	"GET net")		\
-	EM(fscache_cookie_get_work,		"GET wrk")		\
-	EM(fscache_cookie_put_acquire_nobufs,	"PUT nbf")		\
-	EM(fscache_cookie_put_dup_netfs,	"PUT dnt")		\
-	EM(fscache_cookie_put_relinquish,	"PUT rlq")		\
-	EM(fscache_cookie_put_object,		"PUT obj")		\
-	EM(fscache_cookie_put_parent,		"PUT prn")		\
-	E_(fscache_cookie_put_work,		"PUT wrk")
+	EM(fscache_cookie_collision,		"*COLLIDE*")		\
+	EM(fscache_cookie_discard,		"DISCARD  ")		\
+	EM(fscache_cookie_get_acquire_parent,	"GET paren")		\
+	EM(fscache_cookie_get_attach_object,	"GET obj  ")		\
+	EM(fscache_cookie_get_ioreq,		"GET ioreq")		\
+	EM(fscache_cookie_get_reacquire,	"GET reacq")		\
+	EM(fscache_cookie_get_register_netfs,	"GET rgstr")		\
+	EM(fscache_cookie_get_work,		"GET work ")		\
+	EM(fscache_cookie_new_acquire,		"NEW acq  ")		\
+	EM(fscache_cookie_new_netfs,		"NEW netfs")		\
+	EM(fscache_cookie_put_dup_netfs,	"PUT dupnf")		\
+	EM(fscache_cookie_put_ioreq,		"GET ioreq")		\
+	EM(fscache_cookie_put_object,		"PUT obj  ")		\
+	EM(fscache_cookie_put_parent,		"PUT paren")		\
+	EM(fscache_cookie_put_relinquish,	"PUT relnq")		\
+	EM(fscache_cookie_put_work,		"PUT work ")		\
+	E_(fscache_cookie_see_discard,		"SEE discd")
 
 /*
  * Export enum symbols via userspace.
@@ -102,9 +110,10 @@ TRACE_EVENT(fscache_cookie,
 		    __entry->flags	= cookie->flags;
 			   ),
 
-	    TP_printk("%s c=%08x u=%d p=%08x Nc=%d Na=%d f=%02x",
+	    TP_printk("c=%08x %s u=%d p=%08x Nc=%d Na=%d f=%02x",
+		      __entry->cookie,
 		      __print_symbolic(__entry->where, fscache_cookie_traces),
-		      __entry->cookie, __entry->usage,
+		      __entry->usage,
 		      __entry->parent, __entry->n_children, __entry->n_active,
 		      __entry->flags)
 	    );
@@ -189,96 +198,6 @@ TRACE_EVENT(fscache_relinquish,
 		      __entry->flags, __entry->retire)
 	    );
 
-TRACE_EVENT(fscache_enable,
-	    TP_PROTO(struct fscache_cookie *cookie),
-
-	    TP_ARGS(cookie),
-
-	    TP_STRUCT__entry(
-		    __field(unsigned int,		cookie		)
-		    __field(int,			usage		)
-		    __field(int,			n_children	)
-		    __field(int,			n_active	)
-		    __field(u8,				flags		)
-			     ),
-
-	    TP_fast_assign(
-		    __entry->cookie	= cookie->debug_id;
-		    __entry->usage	= atomic_read(&cookie->usage);
-		    __entry->n_children	= atomic_read(&cookie->n_children);
-		    __entry->n_active	= atomic_read(&cookie->n_active);
-		    __entry->flags	= cookie->flags;
-			   ),
-
-	    TP_printk("c=%08x u=%d Nc=%d Na=%d f=%02x",
-		      __entry->cookie, __entry->usage,
-		      __entry->n_children, __entry->n_active, __entry->flags)
-	    );
-
-TRACE_EVENT(fscache_disable,
-	    TP_PROTO(struct fscache_cookie *cookie),
-
-	    TP_ARGS(cookie),
-
-	    TP_STRUCT__entry(
-		    __field(unsigned int,		cookie		)
-		    __field(int,			usage		)
-		    __field(int,			n_children	)
-		    __field(int,			n_active	)
-		    __field(u8,				flags		)
-			     ),
-
-	    TP_fast_assign(
-		    __entry->cookie	= cookie->debug_id;
-		    __entry->usage	= atomic_read(&cookie->usage);
-		    __entry->n_children	= atomic_read(&cookie->n_children);
-		    __entry->n_active	= atomic_read(&cookie->n_active);
-		    __entry->flags	= cookie->flags;
-			   ),
-
-	    TP_printk("c=%08x u=%d Nc=%d Na=%d f=%02x",
-		      __entry->cookie, __entry->usage,
-		      __entry->n_children, __entry->n_active, __entry->flags)
-	    );
-
-TRACE_EVENT(fscache_osm,
-	    TP_PROTO(struct fscache_object *object,
-		     const struct fscache_state *state,
-		     bool wait, bool oob, s8 event_num),
-
-	    TP_ARGS(object, state, wait, oob, event_num),
-
-	    TP_STRUCT__entry(
-		    __field(unsigned int,		cookie		)
-		    __field(unsigned int,		object		)
-		    __array(char,			state, 8	)
-		    __field(bool,			wait		)
-		    __field(bool,			oob		)
-		    __field(s8,				event_num	)
-			     ),
-
-	    TP_fast_assign(
-		    __entry->cookie		= object->cookie->debug_id;
-		    __entry->object		= object->debug_id;
-		    __entry->wait		= wait;
-		    __entry->oob		= oob;
-		    __entry->event_num		= event_num;
-		    memcpy(__entry->state, state->short_name, 8);
-			   ),
-
-	    TP_printk("c=%08x o=%08d %s %s%sev=%d",
-		      __entry->cookie,
-		      __entry->object,
-		      __entry->state,
-		      __print_symbolic(__entry->wait,
-				       { true,  "WAIT" },
-				       { false, "WORK" }),
-		      __print_symbolic(__entry->oob,
-				       { true,  " OOB " },
-				       { false, " " }),
-		      __entry->event_num)
-	    );
-
 #endif /* _TRACE_FSCACHE_H */
 
 /* This part must be outside protection */

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 09/32] fscache: Rewrite the I/O API based on iov_iter
  2020-07-13 16:30 ` David Howells
                   ` (8 preceding siblings ...)
  (?)
@ 2020-07-13 16:32 ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:32 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Rewrite the fscache I/O API by introducing a number of new routines based
on a number of principles:

 (1) The cache provides *only* write-to-cache and read-from-cache calls for
     transferring data to/from the cache.

 (2) The bufferage for I/O to/from the cache is supplied with an iov_iter.
     There is no requirement that the iov_iters involved have anything to
     do with an inode's pagecache, though if it does, an ITER_MAPPING
     iterator is available.

 (3) I/O to/from any particular cache object is done in one of a number of
     modes, set for the cache object at cookie acquisition time:

	(A) Single blob.  The blob must be written in its entirety in one
	    go.

	(B) Granular.  Writes to the cache should be done in granule
	    sized-blocks, where, for the moment, a granule will be 256KiB,
	    but could be variable.  This allows the metadata indicating
	    which granules are present to be smaller at the cost of using
	    more disk space.

     In both cases, reads from the cache may be done in smaller chunks and
     small update writes may be done inside a block that exists.

 (4) I/O to/from the cache must be aligned to the DIO block size of the
     backing filesystem.  The cache tells the caller what it should
     consider the DIO block size to be.  This will never be larger than
     page size.

 (5) Completion of the I/O results in a callback - after which the cache
     no longer knows about it.

 (6) The cache doesn't retain any pointers back into the netfs, either the
     code, its state or its pagecache.


To do granular I/O, the netfs has to take the read or write request it got
from the VFS/VM and 'shape' it to fit the caching parameters.  It does this
by filling in a form to indicate the extent of the operation it might like
to make:

	struct fscache_request_shape {
		/* Parameters */
		loff_t		i_size;
		pgoff_t		proposed_start;
		unsigned int	proposed_nr_pages;
		unsigned int	max_io_pages;
		bool		for_write;

		/* Result */
		unsigned int	to_be_done;
		unsigned int	granularity;
		unsigned int	dio_block_size;
		unsigned int	actual_nr_pages;
		pgoff_t		actual_start;
	};

and then it calls:

	void fscache_shape_request(struct fscache_cookie *cookie,
				   struct fscache_request_shape *shape);

to shape it.

The netfs should set 'proposed_start' to be the first page to read,
'proposed_nr_pages' to indicate the size of the request and 'i_size' to
indicate the size that the file should be considered to be.  'max_io_pages'
should be set to the maximum size of a transaction, up to UINT_MAX, and
'for_write' should be set to true if this is for a write to the cache.

The cache will then shape the proposed read to fit a blocking factor
appropriate for the cache object and region of the file.  It may extend
start forward and may shrink or extend the request to fit the granularity
of the cache.  This will be trimmed to the end of file as specified by the
proposed file size.

Upon return, 'to_be_done' will be set to one of FSCACHE_READ_FROM_SERVER,
FSCACHE_READ_FROM_CACHE, FSCACHE_FILL_WITH_ZERO, and may have
FSCACHE_WRITE_TO_CACHE bitwise-OR'd onto it.

'actual_start' and 'actual_nr_pages' will be set to indicate
the cache's proposal for the desired size and position of the operation.
'granularity' will be set to hold the cache block granularity (in pages)
and transaction can be shortened to a multiple of this.  Note that the
shaped request will always include the proposed_start page.

'dio_block_size' will be set to whatever I/O size the cache must
communicate with its storage in.  This is necessary to set up the iov_iter
to be passed to the cache for reading and writing so that it can do direct
I/O.

Once the netfs has set up its request, if FSCACHE_READ_FROM_CACHE was set,
it should then call:

	void fscache_read(struct fscache_io_request *req)

to read data from the cache.  To do this, it needs to fill out a request
descriptor:

	struct fscache_io_request {
		const struct fscache_io_request_ops *ops;
		struct fscache_cookie	*cookie;
		loff_t			pos;
		loff_t			len;
		int			error;
		bool (*is_still_valid)(struct fscache_io_request *);
		void (*done)(struct fscache_io_request *);
		...
	};

The ops pointer, cookie, position and length should be set to describe the
I/O operation to be performed.  An 'is_still_valid' function may be
provided to check whether the operation should still go ahead after a wait
in case it got invalidated by the server.

A 'done' function may be provided that will be called to finalise the
operation.  If provided, the 'done' function will be always be called, even
when the operation doesn't take place because there's no cache.  If no done
function is called, the operation will be synchronous.

Note that the pages must be pinned - typically by locking them.

If FSCACHE_WRITE_TO_CACHE was set, then once the data is read from the
server, the netfs should write it to the cache by calling:

	void fscache_write(struct fscache_io_request *req)

The request descriptor is set as for fscache_read().  Note that the pages
must be pinned.  In this case, PG_fscache can be set on the page and the
pages can be unlocked; the bit can then be cleared by the done handler.
The releasepage, invalidatepage, launderpage and page_mkwrite functions
should be used to suspend progress until the bit is cleared.  The following
functions are made available in an earlier patch for this:

	void unlock_page_fscache(struct page *page);
	void wait_on_page_fscache(struct page *page)

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/Makefile               |    1 
 fs/fscache/io.c                   |  170 +++++++++++++++++++++++++++++++
 include/linux/fscache-cache.h     |   28 +++++
 include/linux/fscache.h           |  201 +++++++++++++++++++++++++++++++++++++
 include/trace/events/cachefiles.h |    2 
 5 files changed, 402 insertions(+)
 create mode 100644 fs/fscache/io.c

diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
index 396e1b5fdc28..3caf66810e7b 100644
--- a/fs/fscache/Makefile
+++ b/fs/fscache/Makefile
@@ -8,6 +8,7 @@ fscache-y := \
 	cookie.o \
 	dispatcher.o \
 	fsdef.o \
+	io.o \
 	main.o \
 	netfs.o \
 	obj.o \
diff --git a/fs/fscache/io.c b/fs/fscache/io.c
new file mode 100644
index 000000000000..8d7f79551699
--- /dev/null
+++ b/fs/fscache/io.c
@@ -0,0 +1,170 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Data I/O routines
+ *
+ * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#define FSCACHE_DEBUG_LEVEL OPERATION
+#include <linux/module.h>
+#include <linux/fscache-cache.h>
+#include <linux/slab.h>
+#include "internal.h"
+
+/*
+ * Initialise an I/O request
+ */
+void __fscache_init_io_request(struct fscache_io_request *req,
+			       struct fscache_cookie *cookie)
+{
+	req->cookie = fscache_cookie_get(cookie, fscache_cookie_get_ioreq);
+}
+EXPORT_SYMBOL(__fscache_init_io_request);
+
+/*
+ * Clean up an I/O request
+ */
+void __fscache_free_io_request(struct fscache_io_request *req)
+{
+	if (req->object)
+		req->object->cache->ops->put_object(req->object,
+						    fscache_obj_put_ioreq);
+	fscache_cookie_put(req->cookie, fscache_cookie_put_ioreq);
+}
+EXPORT_SYMBOL(__fscache_free_io_request);
+
+enum fscache_want_stage {
+	FSCACHE_WANT_PARAMS,
+	FSCACHE_WANT_WRITE,
+	FSCACHE_WANT_READ,
+};
+
+/*
+ * Begin an I/O operation on the cache, waiting till we reach the right state.
+ *
+ * Returns a pointer to the object to use or an error.  If an object is
+ * returned, it will have an extra ref on it.
+ */
+static struct fscache_object *fscache_begin_io_operation(
+	struct fscache_cookie *cookie,
+	enum fscache_want_stage want,
+	struct fscache_io_request *req)
+{
+	struct fscache_object *object;
+	enum fscache_cookie_stage stage;
+
+again:
+	spin_lock(&cookie->lock);
+
+	stage = cookie->stage;
+	_enter("c=%08x{%u},%x", cookie->debug_id, stage, want);
+
+	switch (stage) {
+	case FSCACHE_COOKIE_STAGE_QUIESCENT:
+	case FSCACHE_COOKIE_STAGE_DEAD:
+		goto not_live;
+	case FSCACHE_COOKIE_STAGE_INITIALISING:
+	case FSCACHE_COOKIE_STAGE_LOOKING_UP:
+	case FSCACHE_COOKIE_STAGE_INVALIDATING:
+		goto wait_and_validate;
+
+	case FSCACHE_COOKIE_STAGE_NO_DATA_YET:
+		if (want == FSCACHE_WANT_READ)
+			goto no_data_yet;
+		/* Fall through */
+	case FSCACHE_COOKIE_STAGE_ACTIVE:
+		goto ready;
+	}
+
+ready:
+	object = hlist_entry(cookie->backing_objects.first,
+			     struct fscache_object, cookie_link);
+
+	if (fscache_cache_is_broken(object))
+		goto not_live;
+
+	object->cache->ops->grab_object(object, fscache_obj_get_ioreq);
+
+	atomic_inc(&cookie->n_ops);
+	spin_unlock(&cookie->lock);
+	return object;
+
+wait_and_validate:
+	spin_unlock(&cookie->lock);
+	wait_var_event(&cookie->stage, cookie->stage != stage);
+	if (req &&
+	    req->ops->is_still_valid &&
+	    !req->ops->is_still_valid(req)) {
+		_leave(" = -ESTALE");
+		return ERR_PTR(-ESTALE);
+	}
+	goto again;
+
+no_data_yet:
+	spin_unlock(&cookie->lock);
+	_leave(" = -ENODATA");
+	return ERR_PTR(-ENODATA);
+
+not_live:
+	spin_unlock(&cookie->lock);
+	_leave(" = -ENOBUFS");
+	return ERR_PTR(-ENOBUFS);
+}
+
+/*
+ * Determine the size of an allocation granule or a region of data in the
+ * cache.
+ */
+void __fscache_shape_request(struct fscache_cookie *cookie,
+			     struct fscache_request_shape *shape)
+{
+	struct fscache_object *object =
+		fscache_begin_io_operation(cookie, FSCACHE_WANT_PARAMS, NULL);
+
+	if (!IS_ERR(object)) {
+		object->cache->ops->shape_request(object, shape);
+		object->cache->ops->put_object(object, fscache_obj_put_ioreq);
+		fscache_end_io_operation(cookie);
+	}
+}
+EXPORT_SYMBOL(__fscache_shape_request);
+
+/*
+ * Read data from the cache.
+ */
+int __fscache_read(struct fscache_io_request *req, struct iov_iter *iter)
+{
+	struct fscache_object *object =
+		fscache_begin_io_operation(req->cookie, FSCACHE_WANT_READ, req);
+
+	if (!IS_ERR(object)) {
+		req->object = object;
+		return object->cache->ops->read(object, req, iter);
+	} else {
+		req->error = PTR_ERR(object);
+		if (req->io_done)
+			req->io_done(req);
+		return req->error;
+	}
+}
+EXPORT_SYMBOL(__fscache_read);
+
+/*
+ * Write data to the cache.
+ */
+int __fscache_write(struct fscache_io_request *req, struct iov_iter *iter)
+{
+	struct fscache_object *object =
+		fscache_begin_io_operation(req->cookie, FSCACHE_WANT_WRITE, req);
+
+	if (!IS_ERR(object)) {
+		req->object = object;
+		return object->cache->ops->write(object, req, iter);
+	} else {
+		req->error = PTR_ERR(object);
+		if (req->io_done)
+			req->io_done(req);
+		return req->error;
+	}
+}
+EXPORT_SYMBOL(__fscache_write);
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index de1cffb2558e..81a41e37f07b 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -22,11 +22,13 @@
 struct fscache_cache;
 struct fscache_cache_ops;
 struct fscache_object;
+struct fscache_io_operations;
 
 enum fscache_obj_ref_trace {
 	fscache_obj_get_attach,
 	fscache_obj_get_exists,
 	fscache_obj_get_inval,
+	fscache_obj_get_ioreq,
 	fscache_obj_get_wait,
 	fscache_obj_get_withdraw,
 	fscache_obj_new,
@@ -37,6 +39,7 @@ enum fscache_obj_ref_trace {
 	fscache_obj_put_drop_child,
 	fscache_obj_put_drop_obj,
 	fscache_obj_put_inval,
+	fscache_obj_put_ioreq,
 	fscache_obj_put_lookup_fail,
 	fscache_obj_put_withdraw,
 	fscache_obj_ref__nr_traces
@@ -134,6 +137,20 @@ struct fscache_cache_ops {
 
 	/* reserve space for an object's data and associated metadata */
 	int (*reserve_space)(struct fscache_object *object, loff_t i_size);
+
+	/* Shape the extent of a read or write */
+	void (*shape_request)(struct fscache_object *object,
+			      struct fscache_request_shape *shape);
+
+	/* Read data from the cache */
+	int (*read)(struct fscache_object *object,
+		    struct fscache_io_request *req,
+		    struct iov_iter *iter);
+
+	/* Write data to the cache */
+	int (*write)(struct fscache_object *object,
+		     struct fscache_io_request *req,
+		     struct iov_iter *iter);
 };
 
 extern struct fscache_cookie fscache_fsdef_index;
@@ -239,4 +256,15 @@ static inline void fscache_end_io_operation(struct fscache_cookie *cookie)
 		wake_up_var(&cookie->n_ops);
 }
 
+static inline void fscache_get_io_request(struct fscache_io_request *req)
+{
+	req->ops->get(req);
+}
+
+static inline void fscache_put_io_request(struct fscache_io_request *req)
+{
+	if (req)
+		req->ops->put(req);
+}
+
 #endif /* _LINUX_FSCACHE_CACHE_H */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 11b18761a3b6..aec75fc0d297 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -42,9 +42,11 @@
 /* pattern used to fill dead space in an index entry */
 #define FSCACHE_INDEX_DEADFILL_PATTERN 0x79
 
+struct iov_iter;
 struct fscache_cache_tag;
 struct fscache_cookie;
 struct fscache_netfs;
+struct fscache_io_request_ops;
 
 enum fscache_cookie_type {
 	FSCACHE_COOKIE_TYPE_INDEX,
@@ -122,6 +124,73 @@ struct fscache_cookie {
 	};
 };
 
+/*
+ * The size and shape of a request to the cache, adjusted for cache
+ * granularity, for the data available on doing a read, the page size and
+ * non-contiguities and for the netfs's own I/O patterning.
+ *
+ * Before calling fscache_shape_request(), @proposed_start and @proposed_end
+ * must be set to indicate the bounds of the request and @max_io_pages to the
+ * limit the netfs is willing to accept on the size of an I/O operation.
+ * @i_size should be set to the size the file should be considered to be and
+ * @for_write should be set if a write request is being shaped.
+ *
+ * After shaping, @actual_start and @actual_end will mark out the size of the
+ * shaped request.  @granularity will convey the size a the cache block, should
+ * the request need to be reduced in scope, either due to memory constraints or
+ * netfs I/O constraints.  @dio_block_size will be set to the direct I/O size
+ * for the cache - fscache_read/write() can't be expected read/write chunks
+ * smaller than this or at positions that aren't aligned to this.
+ *
+ * Finally, @to_be_done will be set by the shaper to indicate whether the
+ * region can be read from the cache or filled with zeros and whether it should
+ * be written to the cache after being read from the server or cleared.
+ */
+struct fscache_request_shape {
+	/* Parameters */
+	loff_t		i_size;		/* The file size to use in calculations */
+	pgoff_t		proposed_start;	/* First page in the proposed request */
+	unsigned int	proposed_nr_pages; /* Number of pages in the proposed request */
+	unsigned int	max_io_pages;	/* Max pages in a netfs I/O request (or UINT_MAX) */
+	bool		for_write;	/* Set if shaping a write */
+
+	/* Result */
+#define FSCACHE_READ_FROM_SERVER 0x00
+#define FSCACHE_READ_FROM_CACHE	0x01
+#define FSCACHE_WRITE_TO_CACHE	0x02
+#define FSCACHE_FILL_WITH_ZERO	0x04
+	unsigned int	to_be_done;	/* What should be done by the caller */
+	unsigned int	granularity;	/* Cache granularity in pages */
+	unsigned int	dio_block_size;	/* Block size required for direct I/O */
+	unsigned int	actual_nr_pages; /* Number of pages in the shaped request */
+	pgoff_t		actual_start;	/* First page in the shaped request */
+};
+
+/*
+ * Descriptor for an fscache I/O request.
+ */
+struct fscache_io_request {
+	const struct fscache_io_request_ops *ops;
+	struct fscache_cookie	*cookie;
+	struct fscache_object	*object;
+	loff_t			pos;		/* Where to start the I/O */
+	loff_t			len;		/* Size of the I/O */
+	loff_t			transferred;	/* Amount of data transferred */
+	short			error;		/* 0 or error that occurred */
+	unsigned long		flags;
+#define FSCACHE_IO_DATA_FROM_SERVER	0	/* Set if data was read from server */
+#define FSCACHE_IO_DATA_FROM_CACHE	1	/* Set if data was read from the cache */
+	void (*io_done)(struct fscache_io_request *);
+};
+
+struct fscache_io_request_ops {
+	bool (*is_still_valid)(struct fscache_io_request *);
+	void (*issue_op)(struct fscache_io_request *);
+	void (*done)(struct fscache_io_request *);
+	void (*get)(struct fscache_io_request *);
+	void (*put)(struct fscache_io_request *);
+};
+
 /*
  * slow-path functions for when there is actually caching available, and the
  * netfs does actually have a valid token
@@ -149,6 +218,12 @@ extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
 extern void __fscache_update_cookie(struct fscache_cookie *, const void *, const loff_t *);
 extern void __fscache_invalidate(struct fscache_cookie *);
 extern void __fscache_wait_on_invalidate(struct fscache_cookie *);
+extern void __fscache_shape_request(struct fscache_cookie *, struct fscache_request_shape *);
+extern void __fscache_init_io_request(struct fscache_io_request *,
+				      struct fscache_cookie *);
+extern void __fscache_free_io_request(struct fscache_io_request *);
+extern int __fscache_read(struct fscache_io_request *, struct iov_iter *);
+extern int __fscache_write(struct fscache_io_request *, struct iov_iter *);
 
 /**
  * fscache_register_netfs - Register a filesystem as desiring caching services
@@ -407,4 +482,130 @@ void fscache_wait_on_invalidate(struct fscache_cookie *cookie)
 		__fscache_wait_on_invalidate(cookie);
 }
 
+/**
+ * fscache_init_io_request - Initialise an I/O request
+ * @req: The I/O request to initialise
+ * @cookie: The I/O cookie to access
+ * @ops: The operations table to set
+ */
+static inline void fscache_init_io_request(struct fscache_io_request *req,
+					   struct fscache_cookie *cookie,
+					   const struct fscache_io_request_ops *ops)
+{
+	req->ops = ops;
+	if (fscache_cookie_valid(cookie))
+		__fscache_init_io_request(req, cookie);
+}
+
+/**
+ * fscache_free_io_request - Clean up an I/O request
+ * @req: The I/O request to clean
+ */
+static inline
+void fscache_free_io_request(struct fscache_io_request *req)
+{
+	if (req->cookie)
+		__fscache_free_io_request(req);
+}
+
+/**
+ * fscache_shape_request - Shape an request to fit cache granulation
+ * @cookie: The cache cookie to access
+ * @shape: The request proposed by the VM/filesystem (gets modified).
+ *
+ * Shape the size and position of a cache I/O request such that either the
+ * region will entirely be read from the server or entirely read from the
+ * cache.  The proposed region may be adjusted by a combination of extending
+ * the front forward and/or extending or shrinking the end.  In any case, the
+ * first page of the proposed request will be contained in the revised extent.
+ *
+ * The function sets shape->to_be_done to FSCACHE_READ_FROM_CACHE to indicate
+ * that the data is resident in the cache and can be read from there,
+ * FSCACHE_WRITE_TO_CACHE to indicate that the data isn't present, but the
+ * netfs should write it, FSCACHE_FILL_WITH_ZERO to indicate that the data
+ * should be all zeros on the server and can just be fabricated locally or
+ * FSCACHE_READ_FROM_SERVER to indicate that there's no cache or an error
+ * occurred and the netfs should just read from the server.
+ */
+static inline
+void fscache_shape_request(struct fscache_cookie *cookie,
+			   struct fscache_request_shape *shape)
+{
+	shape->to_be_done	= FSCACHE_READ_FROM_SERVER;
+	shape->granularity	= 1;
+	shape->dio_block_size	= 1;
+	shape->actual_nr_pages	= shape->proposed_nr_pages;
+	shape->actual_start	= shape->proposed_start;
+
+	if (fscache_cookie_valid(cookie))
+		__fscache_shape_request(cookie, shape);
+}
+
+/**
+ * fscache_read - Read data from the cache.
+ * @req: The I/O request descriptor
+ * @iter: The buffer to read into
+ *
+ * The cache will attempt to read from the object referred to by the cookie,
+ * using the size and position described in the request.  The data will be
+ * transferred to the buffer described by the iterator specified in the request.
+ *
+ * If this fails or can't be done, an error will be set in the request
+ * descriptor and the netfs must reissue the read to the server.
+ *
+ * Note that the length and position of the request should be aligned to the DIO
+ * block size returned by fscache_shape_request().
+ *
+ * If req->done is set, the request will be submitted as asynchronous I/O and
+ * -EIOCBQUEUED may be returned to indicate that the operation is in progress.
+ * The done function will be called when the operation is concluded either way.
+ *
+ * If req->done is not set, the request will be submitted as synchronous I/O and
+ * will be completed before the function returns.
+ */
+static inline
+int fscache_read(struct fscache_io_request *req, struct iov_iter *iter)
+{
+	if (fscache_cookie_valid(req->cookie))
+		return __fscache_read(req, iter);
+	req->error = -ENODATA;
+	if (req->io_done)
+		req->io_done(req);
+	return -ENODATA;
+}
+
+
+/**
+ * fscache_write - Write data to the cache.
+ * @req: The I/O request description
+ * @iter: The data to write
+ *
+ * The cache will attempt to write to the object referred to by the cookie,
+ * using the size and position described in the request.  The data will be
+ * transferred from the iterator specified in the request.
+ *
+ * If this fails or can't be done, an error will be set in the request
+ * descriptor.
+ *
+ * Note that the length and position of the request should be aligned to the DIO
+ * block size returned by fscache_shape_request().
+ *
+ * If req->io_done is set, the request will be submitted as asynchronous I/O and
+ * -EIOCBQUEUED may be returned to indicate that the operation is in progress.
+ * The done function will be called when the operation is concluded either way.
+ *
+ * If req->io_done is not set, the request will be submitted as synchronous I/O and
+ * will be completed before the function returns.
+ */
+static inline
+int fscache_write(struct fscache_io_request *req, struct iov_iter *iter)
+{
+	if (fscache_cookie_valid(req->cookie))
+		return __fscache_write(req, iter);
+	req->error = -ENOBUFS;
+	if (req->io_done)
+		req->io_done(req);
+	return -ENOBUFS;
+}
+
 #endif /* _LINUX_FSCACHE_H */
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 4fedc2e9c428..0aa3f3126f6e 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -39,6 +39,7 @@ enum cachefiles_obj_ref_trace {
 	EM(fscache_obj_get_attach,		"GET attach")		\
 	EM(fscache_obj_get_exists,		"GET exists")		\
 	EM(fscache_obj_get_inval,		"GET inval")		\
+	EM(fscache_obj_get_ioreq,		"GET ioreq")		\
 	EM(fscache_obj_get_wait,		"GET wait")		\
 	EM(fscache_obj_get_withdraw,		"GET withdraw")		\
 	EM(fscache_obj_new,			"NEW obj")		\
@@ -49,6 +50,7 @@ enum cachefiles_obj_ref_trace {
 	EM(fscache_obj_put_drop_child,		"PUT drop_child")	\
 	EM(fscache_obj_put_drop_obj,		"PUT drop_obj")		\
 	EM(fscache_obj_put_inval,		"PUT inval")		\
+	EM(fscache_obj_put_ioreq,		"PUT ioreq")		\
 	EM(fscache_obj_put_withdraw,		"PUT withdraw")		\
 	EM(fscache_obj_put_lookup_fail,		"PUT lookup_fail")	\
 	EM(cachefiles_obj_put_wait_retry,	"PUT wait_retry")	\



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 10/32] fscache: Remove fscache_wait_on_invalidate()
  2020-07-13 16:30 ` David Howells
@ 2020-07-13 16:32   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:32 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Remove fscache_wait_on_invalidate() as the invalidation wait is now built into
the I/O path.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/cookie.c     |   14 --------------
 include/linux/fscache.h |   17 -----------------
 2 files changed, 31 deletions(-)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index a8aa1639e93b..a1eba3be9ce8 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -492,20 +492,6 @@ void __fscache_invalidate(struct fscache_cookie *cookie)
 }
 EXPORT_SYMBOL(__fscache_invalidate);
 
-/*
- * Wait for object invalidation to complete.
- */
-void __fscache_wait_on_invalidate(struct fscache_cookie *cookie)
-{
-	_enter("%p", cookie);
-
-	wait_on_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING,
-		    TASK_UNINTERRUPTIBLE);
-
-	_leave("");
-}
-EXPORT_SYMBOL(__fscache_wait_on_invalidate);
-
 /*
  * Update the index entries backing a cookie.  The writeback is done lazily.
  */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index aec75fc0d297..56fdd0e74a88 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -217,7 +217,6 @@ extern void __fscache_unuse_cookie(struct fscache_cookie *, const void *, const
 extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
 extern void __fscache_update_cookie(struct fscache_cookie *, const void *, const loff_t *);
 extern void __fscache_invalidate(struct fscache_cookie *);
-extern void __fscache_wait_on_invalidate(struct fscache_cookie *);
 extern void __fscache_shape_request(struct fscache_cookie *, struct fscache_request_shape *);
 extern void __fscache_init_io_request(struct fscache_io_request *,
 				      struct fscache_cookie *);
@@ -466,22 +465,6 @@ void fscache_invalidate(struct fscache_cookie *cookie)
 		__fscache_invalidate(cookie);
 }
 
-/**
- * fscache_wait_on_invalidate - Wait for invalidation to complete
- * @cookie: The cookie representing the cache object
- *
- * Wait for the invalidation of an object to complete.
- *
- * See Documentation/filesystems/caching/netfs-api.rst for a complete
- * description.
- */
-static inline
-void fscache_wait_on_invalidate(struct fscache_cookie *cookie)
-{
-	if (fscache_cookie_valid(cookie))
-		__fscache_wait_on_invalidate(cookie);
-}
-
 /**
  * fscache_init_io_request - Initialise an I/O request
  * @req: The I/O request to initialise



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 10/32] fscache: Remove fscache_wait_on_invalidate()
@ 2020-07-13 16:32   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:32 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs

Remove fscache_wait_on_invalidate() as the invalidation wait is now built into
the I/O path.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/cookie.c     |   14 --------------
 include/linux/fscache.h |   17 -----------------
 2 files changed, 31 deletions(-)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index a8aa1639e93b..a1eba3be9ce8 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -492,20 +492,6 @@ void __fscache_invalidate(struct fscache_cookie *cookie)
 }
 EXPORT_SYMBOL(__fscache_invalidate);
 
-/*
- * Wait for object invalidation to complete.
- */
-void __fscache_wait_on_invalidate(struct fscache_cookie *cookie)
-{
-	_enter("%p", cookie);
-
-	wait_on_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING,
-		    TASK_UNINTERRUPTIBLE);
-
-	_leave("");
-}
-EXPORT_SYMBOL(__fscache_wait_on_invalidate);
-
 /*
  * Update the index entries backing a cookie.  The writeback is done lazily.
  */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index aec75fc0d297..56fdd0e74a88 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -217,7 +217,6 @@ extern void __fscache_unuse_cookie(struct fscache_cookie *, const void *, const
 extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
 extern void __fscache_update_cookie(struct fscache_cookie *, const void *, const loff_t *);
 extern void __fscache_invalidate(struct fscache_cookie *);
-extern void __fscache_wait_on_invalidate(struct fscache_cookie *);
 extern void __fscache_shape_request(struct fscache_cookie *, struct fscache_request_shape *);
 extern void __fscache_init_io_request(struct fscache_io_request *,
 				      struct fscache_cookie *);
@@ -466,22 +465,6 @@ void fscache_invalidate(struct fscache_cookie *cookie)
 		__fscache_invalidate(cookie);
 }
 
-/**
- * fscache_wait_on_invalidate - Wait for invalidation to complete
- * @cookie: The cookie representing the cache object
- *
- * Wait for the invalidation of an object to complete.
- *
- * See Documentation/filesystems/caching/netfs-api.rst for a complete
- * description.
- */
-static inline
-void fscache_wait_on_invalidate(struct fscache_cookie *cookie)
-{
-	if (fscache_cookie_valid(cookie))
-		__fscache_wait_on_invalidate(cookie);
-}
-
 /**
  * fscache_init_io_request - Initialise an I/O request
  * @req: The I/O request to initialise

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 11/32] fscache: Keep track of size of a file last set independently on the server
  2020-07-13 16:30 ` David Howells
@ 2020-07-13 16:32   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:32 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Keep track of the size of a file that we're caching as last set
independently on the server by another client.  As long as this does not
change, we can make the assumption that anything over that boundary, if not
represented in the local cache, will not be represented on the server
either and can be just cleared rather than being read, thereby saving a
trip to the server.

This only works if we make space in the cache by zapping whole files and
not just punching bits out of them as if we write to the server but don't
keep a copy in the cache, the assumption mentioned above no longer holds
true.

We also need to update this size when invalidation occurs.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/inode.c          |    2 +-
 fs/fscache/cookie.c     |    8 +++++++-
 include/linux/fscache.h |    8 +++++---
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 49d897437998..b0772e64a844 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -569,7 +569,7 @@ static void afs_zap_data(struct afs_vnode *vnode)
 	_enter("{%llx:%llu}", vnode->fid.vid, vnode->fid.vnode);
 
 #ifdef CONFIG_AFS_FSCACHE
-	fscache_invalidate(vnode->cache);
+	fscache_invalidate(vnode->cache, i_size_read(&vnode->vfs_inode));
 #endif
 
 	/* nuke all the non-dirty pages that aren't locked, mapped or being
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index a1eba3be9ce8..5c53027d3f53 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -159,6 +159,7 @@ struct fscache_cookie *fscache_alloc_cookie(
 	cookie->key_len = index_key_len;
 	cookie->aux_len = aux_data_len;
 	cookie->object_size = object_size;
+	cookie->zero_point = object_size;
 	strlcpy(cookie->type_name, type_name, sizeof(cookie->type_name));
 
 	if (fscache_set_key(cookie, index_key, index_key_len) < 0)
@@ -473,7 +474,7 @@ void fscache_set_cookie_stage(struct fscache_cookie *cookie,
 /*
  * Invalidate an object.  Callable with spinlocks held.
  */
-void __fscache_invalidate(struct fscache_cookie *cookie)
+void __fscache_invalidate(struct fscache_cookie *cookie, loff_t new_size)
 {
 	_enter("{%s}", cookie->type_name);
 
@@ -486,6 +487,11 @@ void __fscache_invalidate(struct fscache_cookie *cookie)
 	 */
 	ASSERTCMP(cookie->type, ==, FSCACHE_COOKIE_TYPE_DATAFILE);
 
+	spin_lock(&cookie->lock);
+	cookie->object_size = new_size;
+	cookie->zero_point = new_size;
+	spin_unlock(&cookie->lock);
+
 	if (!hlist_empty(&cookie->backing_objects) &&
 	    test_and_set_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags))
 		fscache_dispatch(cookie, NULL, 0, fscache_invalidate_object);
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 56fdd0e74a88..bfb28cebfcfd 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -102,6 +102,7 @@ struct fscache_cookie {
 	struct list_head		proc_link;	/* Link in proc list */
 	char				type_name[8];	/* Cookie type name */
 	loff_t				object_size;	/* Size of the netfs object */
+	loff_t				zero_point;	/* Size after which no data on server */
 
 	unsigned long			flags;
 #define FSCACHE_COOKIE_INVALIDATING	4	/* T if cookie is being invalidated */
@@ -216,8 +217,8 @@ extern void __fscache_use_cookie(struct fscache_cookie *, bool);
 extern void __fscache_unuse_cookie(struct fscache_cookie *, const void *, const loff_t *);
 extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
 extern void __fscache_update_cookie(struct fscache_cookie *, const void *, const loff_t *);
-extern void __fscache_invalidate(struct fscache_cookie *);
 extern void __fscache_shape_request(struct fscache_cookie *, struct fscache_request_shape *);
+extern void __fscache_invalidate(struct fscache_cookie *, loff_t);
 extern void __fscache_init_io_request(struct fscache_io_request *,
 				      struct fscache_cookie *);
 extern void __fscache_free_io_request(struct fscache_io_request *);
@@ -448,6 +449,7 @@ void fscache_unpin_cookie(struct fscache_cookie *cookie)
 /**
  * fscache_invalidate - Notify cache that an object needs invalidation
  * @cookie: The cookie representing the cache object
+ * @size: The revised size of the object.
  *
  * Notify the cache that an object is needs to be invalidated and that it
  * should abort any retrievals or stores it is doing on the cache.  The object
@@ -459,10 +461,10 @@ void fscache_unpin_cookie(struct fscache_cookie *cookie)
  * description.
  */
 static inline
-void fscache_invalidate(struct fscache_cookie *cookie)
+void fscache_invalidate(struct fscache_cookie *cookie, loff_t size)
 {
 	if (fscache_cookie_valid(cookie))
-		__fscache_invalidate(cookie);
+		__fscache_invalidate(cookie, size);
 }
 
 /**



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 11/32] fscache: Keep track of size of a file last set independently on the server
@ 2020-07-13 16:32   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:32 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs

Keep track of the size of a file that we're caching as last set
independently on the server by another client.  As long as this does not
change, we can make the assumption that anything over that boundary, if not
represented in the local cache, will not be represented on the server
either and can be just cleared rather than being read, thereby saving a
trip to the server.

This only works if we make space in the cache by zapping whole files and
not just punching bits out of them as if we write to the server but don't
keep a copy in the cache, the assumption mentioned above no longer holds
true.

We also need to update this size when invalidation occurs.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/inode.c          |    2 +-
 fs/fscache/cookie.c     |    8 +++++++-
 include/linux/fscache.h |    8 +++++---
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 49d897437998..b0772e64a844 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -569,7 +569,7 @@ static void afs_zap_data(struct afs_vnode *vnode)
 	_enter("{%llx:%llu}", vnode->fid.vid, vnode->fid.vnode);
 
 #ifdef CONFIG_AFS_FSCACHE
-	fscache_invalidate(vnode->cache);
+	fscache_invalidate(vnode->cache, i_size_read(&vnode->vfs_inode));
 #endif
 
 	/* nuke all the non-dirty pages that aren't locked, mapped or being
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index a1eba3be9ce8..5c53027d3f53 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -159,6 +159,7 @@ struct fscache_cookie *fscache_alloc_cookie(
 	cookie->key_len = index_key_len;
 	cookie->aux_len = aux_data_len;
 	cookie->object_size = object_size;
+	cookie->zero_point = object_size;
 	strlcpy(cookie->type_name, type_name, sizeof(cookie->type_name));
 
 	if (fscache_set_key(cookie, index_key, index_key_len) < 0)
@@ -473,7 +474,7 @@ void fscache_set_cookie_stage(struct fscache_cookie *cookie,
 /*
  * Invalidate an object.  Callable with spinlocks held.
  */
-void __fscache_invalidate(struct fscache_cookie *cookie)
+void __fscache_invalidate(struct fscache_cookie *cookie, loff_t new_size)
 {
 	_enter("{%s}", cookie->type_name);
 
@@ -486,6 +487,11 @@ void __fscache_invalidate(struct fscache_cookie *cookie)
 	 */
 	ASSERTCMP(cookie->type, ==, FSCACHE_COOKIE_TYPE_DATAFILE);
 
+	spin_lock(&cookie->lock);
+	cookie->object_size = new_size;
+	cookie->zero_point = new_size;
+	spin_unlock(&cookie->lock);
+
 	if (!hlist_empty(&cookie->backing_objects) &&
 	    test_and_set_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags))
 		fscache_dispatch(cookie, NULL, 0, fscache_invalidate_object);
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 56fdd0e74a88..bfb28cebfcfd 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -102,6 +102,7 @@ struct fscache_cookie {
 	struct list_head		proc_link;	/* Link in proc list */
 	char				type_name[8];	/* Cookie type name */
 	loff_t				object_size;	/* Size of the netfs object */
+	loff_t				zero_point;	/* Size after which no data on server */
 
 	unsigned long			flags;
 #define FSCACHE_COOKIE_INVALIDATING	4	/* T if cookie is being invalidated */
@@ -216,8 +217,8 @@ extern void __fscache_use_cookie(struct fscache_cookie *, bool);
 extern void __fscache_unuse_cookie(struct fscache_cookie *, const void *, const loff_t *);
 extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
 extern void __fscache_update_cookie(struct fscache_cookie *, const void *, const loff_t *);
-extern void __fscache_invalidate(struct fscache_cookie *);
 extern void __fscache_shape_request(struct fscache_cookie *, struct fscache_request_shape *);
+extern void __fscache_invalidate(struct fscache_cookie *, loff_t);
 extern void __fscache_init_io_request(struct fscache_io_request *,
 				      struct fscache_cookie *);
 extern void __fscache_free_io_request(struct fscache_io_request *);
@@ -448,6 +449,7 @@ void fscache_unpin_cookie(struct fscache_cookie *cookie)
 /**
  * fscache_invalidate - Notify cache that an object needs invalidation
  * @cookie: The cookie representing the cache object
+ * @size: The revised size of the object.
  *
  * Notify the cache that an object is needs to be invalidated and that it
  * should abort any retrievals or stores it is doing on the cache.  The object
@@ -459,10 +461,10 @@ void fscache_unpin_cookie(struct fscache_cookie *cookie)
  * description.
  */
 static inline
-void fscache_invalidate(struct fscache_cookie *cookie)
+void fscache_invalidate(struct fscache_cookie *cookie, loff_t size)
 {
 	if (fscache_cookie_valid(cookie))
-		__fscache_invalidate(cookie);
+		__fscache_invalidate(cookie, size);
 }
 
 /**

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 12/32] fscache, cachefiles: Fix disabled histogram warnings
  2020-07-13 16:30 ` David Howells
@ 2020-07-13 16:32   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:32 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Fix variable unused warnings due to disabled histogram stuff.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/internal.h |    7 +++++--
 fs/fscache/internal.h    |    6 ++++--
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index b89f76a03546..16d15291a629 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -143,11 +143,11 @@ extern int cachefiles_check_in_use(struct cachefiles_cache *cache,
 /*
  * proc.c
  */
-#ifdef CONFIG_CACHEFILES_HISTOGRAM
 extern atomic_t cachefiles_lookup_histogram[HZ];
 extern atomic_t cachefiles_mkdir_histogram[HZ];
 extern atomic_t cachefiles_create_histogram[HZ];
 
+#ifdef CONFIG_CACHEFILES_HISTOGRAM
 extern int __init cachefiles_proc_init(void);
 extern void cachefiles_proc_cleanup(void);
 static inline
@@ -162,7 +162,10 @@ void cachefiles_hist(atomic_t histogram[], unsigned long start_jif)
 #else
 #define cachefiles_proc_init()		(0)
 #define cachefiles_proc_cleanup()	do {} while (0)
-#define cachefiles_hist(hist, start_jif) do {} while (0)
+static inline
+void cachefiles_hist(atomic_t histogram[], unsigned long start_jif)
+{
+}
 #endif
 
 /*
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 443671310e31..a70c1a612309 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -95,13 +95,13 @@ extern struct fscache_cookie fscache_fsdef_index;
 /*
  * histogram.c
  */
-#ifdef CONFIG_FSCACHE_HISTOGRAM
 extern atomic_t fscache_obj_instantiate_histogram[HZ];
 extern atomic_t fscache_objs_histogram[HZ];
 extern atomic_t fscache_ops_histogram[HZ];
 extern atomic_t fscache_retrieval_delay_histogram[HZ];
 extern atomic_t fscache_retrieval_histogram[HZ];
 
+#ifdef CONFIG_FSCACHE_HISTOGRAM
 static inline void fscache_hist(atomic_t histogram[], unsigned long start_jif)
 {
 	unsigned long jif = jiffies - start_jif;
@@ -113,7 +113,9 @@ static inline void fscache_hist(atomic_t histogram[], unsigned long start_jif)
 extern const struct seq_operations fscache_histogram_ops;
 
 #else
-#define fscache_hist(hist, start_jif) do {} while (0)
+static inline void fscache_hist(atomic_t histogram[], unsigned long start_jif)
+{
+}
 #endif
 
 /*



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 12/32] fscache, cachefiles: Fix disabled histogram warnings
@ 2020-07-13 16:32   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:32 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs

Fix variable unused warnings due to disabled histogram stuff.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/internal.h |    7 +++++--
 fs/fscache/internal.h    |    6 ++++--
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index b89f76a03546..16d15291a629 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -143,11 +143,11 @@ extern int cachefiles_check_in_use(struct cachefiles_cache *cache,
 /*
  * proc.c
  */
-#ifdef CONFIG_CACHEFILES_HISTOGRAM
 extern atomic_t cachefiles_lookup_histogram[HZ];
 extern atomic_t cachefiles_mkdir_histogram[HZ];
 extern atomic_t cachefiles_create_histogram[HZ];
 
+#ifdef CONFIG_CACHEFILES_HISTOGRAM
 extern int __init cachefiles_proc_init(void);
 extern void cachefiles_proc_cleanup(void);
 static inline
@@ -162,7 +162,10 @@ void cachefiles_hist(atomic_t histogram[], unsigned long start_jif)
 #else
 #define cachefiles_proc_init()		(0)
 #define cachefiles_proc_cleanup()	do {} while (0)
-#define cachefiles_hist(hist, start_jif) do {} while (0)
+static inline
+void cachefiles_hist(atomic_t histogram[], unsigned long start_jif)
+{
+}
 #endif
 
 /*
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 443671310e31..a70c1a612309 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -95,13 +95,13 @@ extern struct fscache_cookie fscache_fsdef_index;
 /*
  * histogram.c
  */
-#ifdef CONFIG_FSCACHE_HISTOGRAM
 extern atomic_t fscache_obj_instantiate_histogram[HZ];
 extern atomic_t fscache_objs_histogram[HZ];
 extern atomic_t fscache_ops_histogram[HZ];
 extern atomic_t fscache_retrieval_delay_histogram[HZ];
 extern atomic_t fscache_retrieval_histogram[HZ];
 
+#ifdef CONFIG_FSCACHE_HISTOGRAM
 static inline void fscache_hist(atomic_t histogram[], unsigned long start_jif)
 {
 	unsigned long jif = jiffies - start_jif;
@@ -113,7 +113,9 @@ static inline void fscache_hist(atomic_t histogram[], unsigned long start_jif)
 extern const struct seq_operations fscache_histogram_ops;
 
 #else
-#define fscache_hist(hist, start_jif) do {} while (0)
+static inline void fscache_hist(atomic_t histogram[], unsigned long start_jif)
+{
+}
 #endif
 
 /*

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 13/32] fscache: Recast assertion in terms of cookie not being an index
  2020-07-13 16:30 ` David Howells
                   ` (12 preceding siblings ...)
  (?)
@ 2020-07-13 16:33 ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:33 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Recast assertion in terms of cookie not being an index rather than being a
datafile.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/cookie.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 5c53027d3f53..2d9d147411cd 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -485,7 +485,7 @@ void __fscache_invalidate(struct fscache_cookie *cookie, loff_t new_size)
 	 * there, and if it's doing that, it may as well just retire the
 	 * cookie.
 	 */
-	ASSERTCMP(cookie->type, ==, FSCACHE_COOKIE_TYPE_DATAFILE);
+	ASSERTCMP(cookie->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
 
 	spin_lock(&cookie->lock);
 	cookie->object_size = new_size;



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 14/32] cachefiles: Remove some redundant checks on unsigned values
  2020-07-13 16:30 ` David Howells
@ 2020-07-13 16:33   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:33 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Remove some redundant checks for unsigned values being >= 0.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/bind.c   |    6 ++----
 fs/cachefiles/daemon.c |    6 +++---
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/cachefiles/bind.c b/fs/cachefiles/bind.c
index 4c59e1ef4500..84fe89d5999e 100644
--- a/fs/cachefiles/bind.c
+++ b/fs/cachefiles/bind.c
@@ -36,13 +36,11 @@ int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
 	       args);
 
 	/* start by checking things over */
-	ASSERT(cache->fstop_percent >= 0 &&
-	       cache->fstop_percent < cache->fcull_percent &&
+	ASSERT(cache->fstop_percent < cache->fcull_percent &&
 	       cache->fcull_percent < cache->frun_percent &&
 	       cache->frun_percent  < 100);
 
-	ASSERT(cache->bstop_percent >= 0 &&
-	       cache->bstop_percent < cache->bcull_percent &&
+	ASSERT(cache->bstop_percent < cache->bcull_percent &&
 	       cache->bcull_percent < cache->brun_percent &&
 	       cache->brun_percent  < 100);
 
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 8a937d6d5e22..e8ab3ab57147 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -221,7 +221,7 @@ static ssize_t cachefiles_daemon_write(struct file *file,
 	if (test_bit(CACHEFILES_DEAD, &cache->flags))
 		return -EIO;
 
-	if (datalen < 0 || datalen > PAGE_SIZE - 1)
+	if (datalen > PAGE_SIZE - 1)
 		return -EOPNOTSUPP;
 
 	/* drag the command string into the kernel so we can parse it */
@@ -378,7 +378,7 @@ static int cachefiles_daemon_fstop(struct cachefiles_cache *cache, char *args)
 	if (args[0] != '%' || args[1] != '\0')
 		return -EINVAL;
 
-	if (fstop < 0 || fstop >= cache->fcull_percent)
+	if (fstop >= cache->fcull_percent)
 		return cachefiles_daemon_range_error(cache, args);
 
 	cache->fstop_percent = fstop;
@@ -450,7 +450,7 @@ static int cachefiles_daemon_bstop(struct cachefiles_cache *cache, char *args)
 	if (args[0] != '%' || args[1] != '\0')
 		return -EINVAL;
 
-	if (bstop < 0 || bstop >= cache->bcull_percent)
+	if (bstop >= cache->bcull_percent)
 		return cachefiles_daemon_range_error(cache, args);
 
 	cache->bstop_percent = bstop;



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 14/32] cachefiles: Remove some redundant checks on unsigned values
@ 2020-07-13 16:33   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:33 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs

Remove some redundant checks for unsigned values being >= 0.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/bind.c   |    6 ++----
 fs/cachefiles/daemon.c |    6 +++---
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/cachefiles/bind.c b/fs/cachefiles/bind.c
index 4c59e1ef4500..84fe89d5999e 100644
--- a/fs/cachefiles/bind.c
+++ b/fs/cachefiles/bind.c
@@ -36,13 +36,11 @@ int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
 	       args);
 
 	/* start by checking things over */
-	ASSERT(cache->fstop_percent >= 0 &&
-	       cache->fstop_percent < cache->fcull_percent &&
+	ASSERT(cache->fstop_percent < cache->fcull_percent &&
 	       cache->fcull_percent < cache->frun_percent &&
 	       cache->frun_percent  < 100);
 
-	ASSERT(cache->bstop_percent >= 0 &&
-	       cache->bstop_percent < cache->bcull_percent &&
+	ASSERT(cache->bstop_percent < cache->bcull_percent &&
 	       cache->bcull_percent < cache->brun_percent &&
 	       cache->brun_percent  < 100);
 
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 8a937d6d5e22..e8ab3ab57147 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -221,7 +221,7 @@ static ssize_t cachefiles_daemon_write(struct file *file,
 	if (test_bit(CACHEFILES_DEAD, &cache->flags))
 		return -EIO;
 
-	if (datalen < 0 || datalen > PAGE_SIZE - 1)
+	if (datalen > PAGE_SIZE - 1)
 		return -EOPNOTSUPP;
 
 	/* drag the command string into the kernel so we can parse it */
@@ -378,7 +378,7 @@ static int cachefiles_daemon_fstop(struct cachefiles_cache *cache, char *args)
 	if (args[0] != '%' || args[1] != '\0')
 		return -EINVAL;
 
-	if (fstop < 0 || fstop >= cache->fcull_percent)
+	if (fstop >= cache->fcull_percent)
 		return cachefiles_daemon_range_error(cache, args);
 
 	cache->fstop_percent = fstop;
@@ -450,7 +450,7 @@ static int cachefiles_daemon_bstop(struct cachefiles_cache *cache, char *args)
 	if (args[0] != '%' || args[1] != '\0')
 		return -EINVAL;
 
-	if (bstop < 0 || bstop >= cache->bcull_percent)
+	if (bstop >= cache->bcull_percent)
 		return cachefiles_daemon_range_error(cache, args);
 
 	cache->bstop_percent = bstop;

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 15/32] cachefiles: trace: Log coherency checks
  2020-07-13 16:30 ` David Howells
@ 2020-07-13 16:33   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:33 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Add a cachefiles tracepoint that logs the result of coherency management
when the coherency data on a file in the cache is checked or committed.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/xattr.c             |   45 ++++++++++++++++++++++--------
 include/trace/events/cachefiles.h |   56 +++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+), 12 deletions(-)

diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index 5b2f6da91cc8..17c16c2bd07e 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -125,12 +125,21 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 	ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
 			   buf, sizeof(struct cachefiles_xattr) + len,
 			   xattr_flags);
-	kfree(buf);
-	if (ret < 0 && ret != -ENOMEM)
-		cachefiles_io_error_obj(
-			object,
-			"Failed to set xattr with error %d", ret);
+	if (ret < 0) {
+		trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
+					   0,
+					   cachefiles_coherency_set_fail);
+		if (ret != -ENOMEM)
+			cachefiles_io_error_obj(
+				object,
+				"Failed to set xattr with error %d", ret);
+	} else {
+		trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
+					   0,
+					   cachefiles_coherency_set_ok);
+	}
 
+	kfree(buf);
 	_leave(" = %d", ret);
 	return ret;
 }
@@ -144,7 +153,9 @@ int cachefiles_check_auxdata(struct cachefiles_object *object)
 	struct dentry *dentry = object->dentry;
 	unsigned int len = object->fscache.cookie->aux_len, tlen;
 	const void *p = fscache_get_aux(object->fscache.cookie);
-	ssize_t ret;
+	enum cachefiles_coherency_trace why;
+	ssize_t xlen;
+	int ret = -ESTALE;
 
 	ASSERT(dentry);
 	ASSERT(d_backing_inode(dentry));
@@ -154,14 +165,24 @@ int cachefiles_check_auxdata(struct cachefiles_object *object)
 	if (!buf)
 		return -ENOMEM;
 
-	ret = vfs_getxattr(dentry, cachefiles_xattr_cache, buf, tlen);
-	if (ret == tlen &&
-	    buf->type == object->fscache.cookie->type &&
-	    memcmp(buf->data, p, len) == 0)
+	xlen = vfs_getxattr(dentry, cachefiles_xattr_cache, buf, tlen);
+	if (xlen != tlen) {
+		if (xlen == -EIO)
+			cachefiles_io_error_obj(
+				object,
+				"Failed to read aux with error %zd", xlen);
+		why = cachefiles_coherency_check_xattr;
+	} else if (buf->type != object->fscache.cookie->type) {
+		why = cachefiles_coherency_check_type;
+	} else if (memcmp(buf->data, p, len) != 0) {
+		why = cachefiles_coherency_check_aux;
+	} else {
+		why = cachefiles_coherency_check_ok;
 		ret = 0;
-	else
-		ret = -ESTALE;
+	}
 
+	trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
+				   0, why);
 	kfree(buf);
 	return ret;
 }
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 0aa3f3126f6e..bf588c3f4a07 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -24,6 +24,19 @@ enum cachefiles_obj_ref_trace {
 	cachefiles_obj_ref__nr_traces
 };
 
+enum cachefiles_coherency_trace {
+	cachefiles_coherency_check_aux,
+	cachefiles_coherency_check_content,
+	cachefiles_coherency_check_dirty,
+	cachefiles_coherency_check_len,
+	cachefiles_coherency_check_objsize,
+	cachefiles_coherency_check_ok,
+	cachefiles_coherency_check_type,
+	cachefiles_coherency_check_xattr,
+	cachefiles_coherency_set_fail,
+	cachefiles_coherency_set_ok,
+};
+
 #endif
 
 /*
@@ -56,6 +69,18 @@ enum cachefiles_obj_ref_trace {
 	EM(cachefiles_obj_put_wait_retry,	"PUT wait_retry")	\
 	E_(cachefiles_obj_put_wait_timeo,	"PUT wait_timeo")
 
+#define cachefiles_coherency_traces					\
+	EM(cachefiles_coherency_check_aux,	"BAD aux ")		\
+	EM(cachefiles_coherency_check_content,	"BAD cont")		\
+	EM(cachefiles_coherency_check_dirty,	"BAD dirt")		\
+	EM(cachefiles_coherency_check_len,	"BAD len ")		\
+	EM(cachefiles_coherency_check_objsize,	"BAD osiz")		\
+	EM(cachefiles_coherency_check_ok,	"OK      ")		\
+	EM(cachefiles_coherency_check_type,	"BAD type")		\
+	EM(cachefiles_coherency_check_xattr,	"BAD xatt")		\
+	EM(cachefiles_coherency_set_fail,	"SET fail")		\
+	E_(cachefiles_coherency_set_ok,		"SET ok  ")
+
 /*
  * Export enum symbols via userspace.
  */
@@ -66,6 +91,7 @@ enum cachefiles_obj_ref_trace {
 
 cachefiles_obj_kill_traces;
 cachefiles_obj_ref_traces;
+cachefiles_coherency_traces;
 
 /*
  * Now redefine the EM() and E_() macros to map the enums to the strings that
@@ -295,6 +321,36 @@ TRACE_EVENT(cachefiles_mark_buried,
 		      __print_symbolic(__entry->why, cachefiles_obj_kill_traces))
 	    );
 
+TRACE_EVENT(cachefiles_coherency,
+	    TP_PROTO(struct cachefiles_object *obj,
+		     ino_t ino,
+		     int content,
+		     enum cachefiles_coherency_trace why),
+
+	    TP_ARGS(obj, ino, content, why),
+
+	    /* Note that obj may be NULL */
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(enum cachefiles_coherency_trace,	why	)
+		    __field(int,				content	)
+		    __field(u64,				ino	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->why	= why;
+		    __entry->content	= content;
+		    __entry->ino	= ino;
+			   ),
+
+	    TP_printk("o=%08x %s i=%llx c=%u",
+		      __entry->obj,
+		      __print_symbolic(__entry->why, cachefiles_coherency_traces),
+		      __entry->ino,
+		      __entry->content)
+	    );
+
 #endif /* _TRACE_CACHEFILES_H */
 
 /* This part must be outside protection */



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 15/32] cachefiles: trace: Log coherency checks
@ 2020-07-13 16:33   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:33 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs

Add a cachefiles tracepoint that logs the result of coherency management
when the coherency data on a file in the cache is checked or committed.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/xattr.c             |   45 ++++++++++++++++++++++--------
 include/trace/events/cachefiles.h |   56 +++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+), 12 deletions(-)

diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index 5b2f6da91cc8..17c16c2bd07e 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -125,12 +125,21 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 	ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
 			   buf, sizeof(struct cachefiles_xattr) + len,
 			   xattr_flags);
-	kfree(buf);
-	if (ret < 0 && ret != -ENOMEM)
-		cachefiles_io_error_obj(
-			object,
-			"Failed to set xattr with error %d", ret);
+	if (ret < 0) {
+		trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
+					   0,
+					   cachefiles_coherency_set_fail);
+		if (ret != -ENOMEM)
+			cachefiles_io_error_obj(
+				object,
+				"Failed to set xattr with error %d", ret);
+	} else {
+		trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
+					   0,
+					   cachefiles_coherency_set_ok);
+	}
 
+	kfree(buf);
 	_leave(" = %d", ret);
 	return ret;
 }
@@ -144,7 +153,9 @@ int cachefiles_check_auxdata(struct cachefiles_object *object)
 	struct dentry *dentry = object->dentry;
 	unsigned int len = object->fscache.cookie->aux_len, tlen;
 	const void *p = fscache_get_aux(object->fscache.cookie);
-	ssize_t ret;
+	enum cachefiles_coherency_trace why;
+	ssize_t xlen;
+	int ret = -ESTALE;
 
 	ASSERT(dentry);
 	ASSERT(d_backing_inode(dentry));
@@ -154,14 +165,24 @@ int cachefiles_check_auxdata(struct cachefiles_object *object)
 	if (!buf)
 		return -ENOMEM;
 
-	ret = vfs_getxattr(dentry, cachefiles_xattr_cache, buf, tlen);
-	if (ret == tlen &&
-	    buf->type == object->fscache.cookie->type &&
-	    memcmp(buf->data, p, len) == 0)
+	xlen = vfs_getxattr(dentry, cachefiles_xattr_cache, buf, tlen);
+	if (xlen != tlen) {
+		if (xlen == -EIO)
+			cachefiles_io_error_obj(
+				object,
+				"Failed to read aux with error %zd", xlen);
+		why = cachefiles_coherency_check_xattr;
+	} else if (buf->type != object->fscache.cookie->type) {
+		why = cachefiles_coherency_check_type;
+	} else if (memcmp(buf->data, p, len) != 0) {
+		why = cachefiles_coherency_check_aux;
+	} else {
+		why = cachefiles_coherency_check_ok;
 		ret = 0;
-	else
-		ret = -ESTALE;
+	}
 
+	trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
+				   0, why);
 	kfree(buf);
 	return ret;
 }
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 0aa3f3126f6e..bf588c3f4a07 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -24,6 +24,19 @@ enum cachefiles_obj_ref_trace {
 	cachefiles_obj_ref__nr_traces
 };
 
+enum cachefiles_coherency_trace {
+	cachefiles_coherency_check_aux,
+	cachefiles_coherency_check_content,
+	cachefiles_coherency_check_dirty,
+	cachefiles_coherency_check_len,
+	cachefiles_coherency_check_objsize,
+	cachefiles_coherency_check_ok,
+	cachefiles_coherency_check_type,
+	cachefiles_coherency_check_xattr,
+	cachefiles_coherency_set_fail,
+	cachefiles_coherency_set_ok,
+};
+
 #endif
 
 /*
@@ -56,6 +69,18 @@ enum cachefiles_obj_ref_trace {
 	EM(cachefiles_obj_put_wait_retry,	"PUT wait_retry")	\
 	E_(cachefiles_obj_put_wait_timeo,	"PUT wait_timeo")
 
+#define cachefiles_coherency_traces					\
+	EM(cachefiles_coherency_check_aux,	"BAD aux ")		\
+	EM(cachefiles_coherency_check_content,	"BAD cont")		\
+	EM(cachefiles_coherency_check_dirty,	"BAD dirt")		\
+	EM(cachefiles_coherency_check_len,	"BAD len ")		\
+	EM(cachefiles_coherency_check_objsize,	"BAD osiz")		\
+	EM(cachefiles_coherency_check_ok,	"OK      ")		\
+	EM(cachefiles_coherency_check_type,	"BAD type")		\
+	EM(cachefiles_coherency_check_xattr,	"BAD xatt")		\
+	EM(cachefiles_coherency_set_fail,	"SET fail")		\
+	E_(cachefiles_coherency_set_ok,		"SET ok  ")
+
 /*
  * Export enum symbols via userspace.
  */
@@ -66,6 +91,7 @@ enum cachefiles_obj_ref_trace {
 
 cachefiles_obj_kill_traces;
 cachefiles_obj_ref_traces;
+cachefiles_coherency_traces;
 
 /*
  * Now redefine the EM() and E_() macros to map the enums to the strings that
@@ -295,6 +321,36 @@ TRACE_EVENT(cachefiles_mark_buried,
 		      __print_symbolic(__entry->why, cachefiles_obj_kill_traces))
 	    );
 
+TRACE_EVENT(cachefiles_coherency,
+	    TP_PROTO(struct cachefiles_object *obj,
+		     ino_t ino,
+		     int content,
+		     enum cachefiles_coherency_trace why),
+
+	    TP_ARGS(obj, ino, content, why),
+
+	    /* Note that obj may be NULL */
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(enum cachefiles_coherency_trace,	why	)
+		    __field(int,				content	)
+		    __field(u64,				ino	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->why	= why;
+		    __entry->content	= content;
+		    __entry->ino	= ino;
+			   ),
+
+	    TP_printk("o=%08x %s i=%llx c=%u",
+		      __entry->obj,
+		      __print_symbolic(__entry->why, cachefiles_coherency_traces),
+		      __entry->ino,
+		      __entry->content)
+	    );
+
 #endif /* _TRACE_CACHEFILES_H */
 
 /* This part must be outside protection */

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 16/32] cachefiles: Split cachefiles_drop_object() up a bit
@ 2020-07-13 16:33   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:33 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Split cachefiles_drop_object() up a bit to make it easier to modify later.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c |   58 ++++++++++++++++++++++++++++++---------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index e4d1a82b9f33..56ed6f203e1c 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -192,6 +192,42 @@ static void cachefiles_update_object(struct fscache_object *_object)
 	_leave("");
 }
 
+/*
+ * Commit changes to the object as we drop it.
+ */
+static void cachefiles_commit_object(struct cachefiles_object *object,
+				     struct cachefiles_cache *cache)
+{
+}
+
+/*
+ * Finalise and object and close the VFS structs that we have.
+ */
+static void cachefiles_clean_up_object(struct cachefiles_object *object,
+				       struct cachefiles_cache *cache,
+				       bool invalidate)
+{
+	if (invalidate && &object->fscache != cache->cache.fsdef) {
+		_debug("- inval object OBJ%x", object->fscache.debug_id);
+		cachefiles_delete_object(cache, object);
+	} else {
+		cachefiles_commit_object(object, cache);
+ 	}
+
+	/* close the filesystem stuff attached to the object */
+	if (object->backing_file)
+		fput(object->backing_file);
+	object->backing_file = NULL;
+
+	if (object->backer != object->dentry)
+		dput(object->backer);
+	object->backer = NULL;
+
+	cachefiles_unmark_inode_in_use(object, object->dentry);
+	dput(object->dentry);
+	object->dentry = NULL;
+}
+
 /*
  * discard the resources pinned by an object and effect retirement if
  * requested
@@ -223,25 +259,9 @@ static void cachefiles_drop_object(struct fscache_object *_object,
 	 * before we set it up.
 	 */
 	if (object->dentry) {
-		if (invalidate && _object != cache->cache.fsdef) {
-			_debug("- inval object OBJ%x", object->fscache.debug_id);
-			cachefiles_begin_secure(cache, &saved_cred);
-			cachefiles_delete_object(cache, object);
-			cachefiles_end_secure(cache, saved_cred);
-		}
-
-		/* close the filesystem stuff attached to the object */
-		if (object->backing_file)
-			fput(object->backing_file);
-		object->backing_file = NULL;
-
-		if (object->backer != object->dentry)
-			dput(object->backer);
-		object->backer = NULL;
-
-		cachefiles_unmark_inode_in_use(object, object->dentry);
-		dput(object->dentry);
-		object->dentry = NULL;
+		cachefiles_begin_secure(cache, &saved_cred);
+		cachefiles_clean_up_object(object, cache, invalidate);
+		cachefiles_end_secure(cache, saved_cred);
 	}
 
 	_leave("");



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 16/32] cachefiles: Split cachefiles_drop_object() up a bit
@ 2020-07-13 16:33   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:33 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells-H+wXaHxf7aLQT0dZR+AlfA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA,
	linux-afs-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	v9fs-developer-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Split cachefiles_drop_object() up a bit to make it easier to modify later.

Signed-off-by: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---

 fs/cachefiles/interface.c |   58 ++++++++++++++++++++++++++++++---------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index e4d1a82b9f33..56ed6f203e1c 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -192,6 +192,42 @@ static void cachefiles_update_object(struct fscache_object *_object)
 	_leave("");
 }
 
+/*
+ * Commit changes to the object as we drop it.
+ */
+static void cachefiles_commit_object(struct cachefiles_object *object,
+				     struct cachefiles_cache *cache)
+{
+}
+
+/*
+ * Finalise and object and close the VFS structs that we have.
+ */
+static void cachefiles_clean_up_object(struct cachefiles_object *object,
+				       struct cachefiles_cache *cache,
+				       bool invalidate)
+{
+	if (invalidate && &object->fscache != cache->cache.fsdef) {
+		_debug("- inval object OBJ%x", object->fscache.debug_id);
+		cachefiles_delete_object(cache, object);
+	} else {
+		cachefiles_commit_object(object, cache);
+ 	}
+
+	/* close the filesystem stuff attached to the object */
+	if (object->backing_file)
+		fput(object->backing_file);
+	object->backing_file = NULL;
+
+	if (object->backer != object->dentry)
+		dput(object->backer);
+	object->backer = NULL;
+
+	cachefiles_unmark_inode_in_use(object, object->dentry);
+	dput(object->dentry);
+	object->dentry = NULL;
+}
+
 /*
  * discard the resources pinned by an object and effect retirement if
  * requested
@@ -223,25 +259,9 @@ static void cachefiles_drop_object(struct fscache_object *_object,
 	 * before we set it up.
 	 */
 	if (object->dentry) {
-		if (invalidate && _object != cache->cache.fsdef) {
-			_debug("- inval object OBJ%x", object->fscache.debug_id);
-			cachefiles_begin_secure(cache, &saved_cred);
-			cachefiles_delete_object(cache, object);
-			cachefiles_end_secure(cache, saved_cred);
-		}
-
-		/* close the filesystem stuff attached to the object */
-		if (object->backing_file)
-			fput(object->backing_file);
-		object->backing_file = NULL;
-
-		if (object->backer != object->dentry)
-			dput(object->backer);
-		object->backer = NULL;
-
-		cachefiles_unmark_inode_in_use(object, object->dentry);
-		dput(object->dentry);
-		object->dentry = NULL;
+		cachefiles_begin_secure(cache, &saved_cred);
+		cachefiles_clean_up_object(object, cache, invalidate);
+		cachefiles_end_secure(cache, saved_cred);
 	}
 
 	_leave("");

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 17/32] cachefiles: Implement new fscache I/O backend API
  2020-07-13 16:30 ` David Howells
                   ` (16 preceding siblings ...)
  (?)
@ 2020-07-13 16:33 ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:33 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Implement the new fscache I/O backend API in cachefiles.  The
cachefiles_object struct carries a non-accounted file to the cachefiles
object (so that it doesn't cause ENFILE).

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/Makefile    |    1 +
 fs/cachefiles/interface.c |    3 ++
 fs/cachefiles/internal.h  |   13 +++++++
 fs/cachefiles/io.c        |   87 +++++++++++++++++++++++++++++++++++++++++++++
 fs/cachefiles/namei.c     |    3 ++
 5 files changed, 107 insertions(+)
 create mode 100644 fs/cachefiles/io.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index 3455d3646547..d894d317d6e7 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -7,6 +7,7 @@ cachefiles-y := \
 	bind.o \
 	daemon.o \
 	interface.o \
+	io.o \
 	key.o \
 	main.o \
 	namei.o \
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 56ed6f203e1c..4ce7ab5c75db 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -465,4 +465,7 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
 	.put_object		= cachefiles_put_object,
 	.get_object_usage	= cachefiles_get_object_usage,
 	.sync_cache		= cachefiles_sync_cache,
+	.shape_request		= cachefiles_shape_request,
+	.read			= cachefiles_read,
+	.write			= cachefiles_write,
 };
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 16d15291a629..b82e7f8b00bd 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -115,6 +115,19 @@ extern const struct fscache_cache_ops cachefiles_cache_ops;
 extern struct fscache_object *cachefiles_grab_object(struct fscache_object *_object,
 						     enum fscache_obj_ref_trace why);
 
+/*
+ * io.c
+ */
+extern void cachefiles_shape_request(struct fscache_object *object,
+				     struct fscache_request_shape *shape);
+extern int cachefiles_read(struct fscache_object *object,
+			   struct fscache_io_request *req,
+			   struct iov_iter *iter);
+extern int cachefiles_write(struct fscache_object *object,
+			    struct fscache_io_request *req,
+			    struct iov_iter *iter);
+extern bool cachefiles_open_object(struct cachefiles_object *obj);
+
 /*
  * key.c
  */
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
new file mode 100644
index 000000000000..89fd4a24e613
--- /dev/null
+++ b/fs/cachefiles/io.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Data I/O routines
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/mount.h>
+#include <linux/slab.h>
+#include <linux/file.h>
+#include <linux/uio.h>
+#include <linux/xattr.h>
+#include "internal.h"
+
+/*
+ * Determine the size of a data extent in a cache object.  This must be written
+ * as a whole unit, but can be read piecemeal.
+ */
+void cachefiles_shape_request(struct fscache_object *object,
+			      struct fscache_request_shape *shape)
+{
+	return 0;
+}
+
+/*
+ * Initiate a read from the cache.
+ */
+int cachefiles_read(struct fscache_object *object,
+		    struct fscache_io_request *req,
+		    struct iov_iter *iter)
+{
+	req->error = -ENODATA;
+	if (req->io_done)
+		req->io_done(req);
+	return -ENODATA;
+}
+
+/*
+ * Initiate a write to the cache.
+ */
+int cachefiles_write(struct fscache_object *object,
+		     struct fscache_io_request *req,
+		     struct iov_iter *iter)
+{
+	req->error = -ENOBUFS;
+	if (req->io_done)
+		req->io_done(req);
+	return -ENOBUFS;
+}
+
+/*
+ * Open a cache object.
+ */
+bool cachefiles_open_object(struct cachefiles_object *object)
+{
+	struct cachefiles_cache *cache =
+		container_of(object->fscache.cache, struct cachefiles_cache, cache);
+	struct file *file;
+	struct path path;
+
+	path.mnt = cache->mnt;
+	path.dentry = object->backer;
+
+	file = open_with_fake_path(&path,
+				   O_RDWR | O_LARGEFILE | O_DIRECT,
+				   d_backing_inode(object->backer),
+				   cache->cache_cred);
+	if (IS_ERR(file))
+		goto error;
+
+	if (!S_ISREG(file_inode(file)->i_mode))
+		goto error_file;
+
+	if (unlikely(!file->f_op->read_iter) ||
+	    unlikely(!file->f_op->write_iter)) {
+		pr_notice("Cache does not support read_iter and write_iter\n");
+		goto error_file;
+	}
+
+	object->backing_file = file;
+	return true;
+
+error_file:
+	fput(file);
+error:
+	return false;
+}
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index d1c8828ebbbb..d9c9a7d7eb8a 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -492,6 +492,9 @@ bool cachefiles_walk_to_object(struct cachefiles_object *parent,
 		} else {
 			BUG(); // TODO: open file in data-class subdir
 		}
+
+		if (!cachefiles_open_object(object))
+			goto check_error;
 	}
 
 	if (object->new)



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 18/32] cachefiles: Merge object->backer into object->dentry
  2020-07-13 16:30 ` David Howells
                   ` (17 preceding siblings ...)
  (?)
@ 2020-07-13 16:33 ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:33 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Merge the object->backer pointer into the object->dentry pointer and assume
that data objects are always going to be just regular files.

object->dentry can then more easily be overridden later by invalidation
without having two different things to update the xattrs on.

object->old maintains a pointer to the old file so that we can unlink the
it later.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c |   35 +++++++++++++++++------------------
 fs/cachefiles/internal.h  |    2 +-
 fs/cachefiles/io.c        |    4 ++--
 fs/cachefiles/namei.c     |    4 +++-
 4 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 4ce7ab5c75db..6384fba652eb 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -171,16 +171,16 @@ static void cachefiles_update_object(struct fscache_object *_object)
 	cachefiles_begin_secure(cache, &saved_cred);
 
 	object_size = object->fscache.cookie->object_size;
-	if (i_size_read(d_inode(object->backer)) > object_size) {
+	if (i_size_read(d_inode(object->dentry)) > object_size) {
 		struct path path = {
 			.mnt	= cache->mnt,
-			.dentry	= object->backer
+			.dentry	= object->dentry
 		};
-		_debug("trunc %llx -> %llx", i_size_read(d_inode(object->backer)), object_size);
+		_debug("trunc %llx -> %llx", i_size_read(d_inode(object->dentry)), object_size);
 		ret = vfs_truncate(&path, object_size);
 		if (ret < 0) {
 			cachefiles_io_error_obj(object, "Trunc-to-size failed");
-			cachefiles_remove_object_xattr(cache, object->backer);
+			cachefiles_remove_object_xattr(cache, object->dentry);
 			goto out;
 		}
 	}
@@ -219,9 +219,8 @@ static void cachefiles_clean_up_object(struct cachefiles_object *object,
 		fput(object->backing_file);
 	object->backing_file = NULL;
 
-	if (object->backer != object->dentry)
-		dput(object->backer);
-	object->backer = NULL;
+	dput(object->old);
+	object->old = NULL;
 
 	cachefiles_unmark_inode_in_use(object, object->dentry);
 	dput(object->dentry);
@@ -295,7 +294,7 @@ static void cachefiles_put_object(struct fscache_object *_object,
 	if (u == 0) {
 		_debug("- kill object OBJ%x", object->fscache.debug_id);
 
-		ASSERTCMP(object->backer, ==, NULL);
+		ASSERTCMP(object->old, ==, NULL);
 		ASSERTCMP(object->dentry, ==, NULL);
 		ASSERTCMP(object->fscache.n_children, ==, 0);
 
@@ -360,17 +359,17 @@ static int cachefiles_attr_changed(struct cachefiles_object *object)
 	if (ni_size == object->i_size)
 		return 0;
 
-	if (!object->backer)
+	if (!object->dentry)
 		return -ENOBUFS;
 
-	ASSERT(d_is_reg(object->backer));
+	ASSERT(d_is_reg(object->dentry));
 
-	oi_size = i_size_read(d_backing_inode(object->backer));
+	oi_size = i_size_read(d_backing_inode(object->dentry));
 	if (oi_size == ni_size)
 		return 0;
 
 	cachefiles_begin_secure(cache, &saved_cred);
-	inode_lock(d_inode(object->backer));
+	inode_lock(d_inode(object->dentry));
 
 	/* if there's an extension to a partial page at the end of the backing
 	 * file, we need to discard the partial page so that we pick up new
@@ -379,17 +378,17 @@ static int cachefiles_attr_changed(struct cachefiles_object *object)
 		_debug("discard tail %llx", oi_size);
 		newattrs.ia_valid = ATTR_SIZE;
 		newattrs.ia_size = oi_size & PAGE_MASK;
-		ret = notify_change(object->backer, &newattrs, NULL);
+		ret = notify_change(object->dentry, &newattrs, NULL);
 		if (ret < 0)
 			goto truncate_failed;
 	}
 
 	newattrs.ia_valid = ATTR_SIZE;
 	newattrs.ia_size = ni_size;
-	ret = notify_change(object->backer, &newattrs, NULL);
+	ret = notify_change(object->dentry, &newattrs, NULL);
 
 truncate_failed:
-	inode_unlock(d_inode(object->backer));
+	inode_unlock(d_inode(object->dentry));
 	cachefiles_end_secure(cache, saved_cred);
 
 	if (ret == -EIO) {
@@ -422,10 +421,10 @@ static void cachefiles_invalidate_object(struct fscache_object *_object)
 	_enter("{OBJ%x},[%llu]",
 	       object->fscache.debug_id, (unsigned long long)ni_size);
 
-	if (object->backer) {
-		ASSERT(d_is_reg(object->backer));
+	if (object->dentry) {
+		ASSERT(d_is_reg(object->dentry));
 
-		path.dentry = object->backer;
+		path.dentry = object->dentry;
 		path.mnt = cache->mnt;
 
 		cachefiles_begin_secure(cache, &saved_cred);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index b82e7f8b00bd..a00ffb63baf4 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -35,7 +35,7 @@ extern unsigned cachefiles_debug;
 struct cachefiles_object {
 	struct fscache_object		fscache;	/* fscache handle */
 	struct dentry			*dentry;	/* the file/dir representing this object */
-	struct dentry			*backer;	/* backing file */
+	struct dentry			*old;		/* backing file */
 	struct file			*backing_file;	/* File open on backing storage */
 	loff_t				i_size;		/* object size */
 	atomic_t			usage;		/* object usage count */
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 89fd4a24e613..d17734455af2 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -59,11 +59,11 @@ bool cachefiles_open_object(struct cachefiles_object *object)
 	struct path path;
 
 	path.mnt = cache->mnt;
-	path.dentry = object->backer;
+	path.dentry = object->dentry;
 
 	file = open_with_fake_path(&path,
 				   O_RDWR | O_LARGEFILE | O_DIRECT,
-				   d_backing_inode(object->backer),
+				   d_backing_inode(object->dentry),
 				   cache->cache_cred);
 	if (IS_ERR(file))
 		goto error;
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index d9c9a7d7eb8a..3dc64ae5dde8 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -488,7 +488,7 @@ bool cachefiles_walk_to_object(struct cachefiles_object *parent,
 				goto check_error;
 			}
 
-			object->backer = object->dentry;
+			object->old = dget(object->dentry);
 		} else {
 			BUG(); // TODO: open file in data-class subdir
 		}
@@ -523,7 +523,9 @@ bool cachefiles_walk_to_object(struct cachefiles_object *parent,
 		cachefiles_unmark_inode_in_use(object, object->dentry);
 	cachefiles_mark_object_inactive(cache, object);
 	dput(object->dentry);
+	dput(object->old);
 	object->dentry = NULL;
+	object->old = NULL;
 	goto error_out;
 
 lookup_error:



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 19/32] cachefiles: Implement a content-present indicator and bitmap
  2020-07-13 16:30 ` David Howells
@ 2020-07-13 16:34   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:34 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Implement a content indicator that indicates the presence or absence of
content and a bitmap that indicates which blocks of granular content are
present in a granular file.  This is added to the xattr that stores the
netfs coherency data, along with the file size and the file zero point (the
point after which it can be assumed that the server doesn't have any data).

In the content bitmap, if present, each bit indicates which 256KiB granules
of a cache file are present.  This is stored in a separate xattr, which is
loaded when the first I/O handle is created on that cache object and saved
when the object is discarded from memory.

Non-index objects in the cache can be monolithic or granular.  The content
map isn't used for monolithic objects (FSCACHE_COOKIE_ADV_SINGLE_CHUNK) as
they are expected to be all-or-nothing, so the content indicator alone
suffices.  Examples of this would be AFS directory or symlink content.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/Makefile            |    1 
 fs/cachefiles/bind.c              |    1 
 fs/cachefiles/content-map.c       |  251 +++++++++++++++++++++++++++++++++++++
 fs/cachefiles/interface.c         |    5 +
 fs/cachefiles/internal.h          |   31 +++++
 fs/cachefiles/io.c                |    4 +
 fs/cachefiles/xattr.c             |   24 +++-
 include/trace/events/cachefiles.h |    4 -
 8 files changed, 313 insertions(+), 8 deletions(-)
 create mode 100644 fs/cachefiles/content-map.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index d894d317d6e7..84615aca866a 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -5,6 +5,7 @@
 
 cachefiles-y := \
 	bind.o \
+	content-map.o \
 	daemon.o \
 	interface.o \
 	io.o \
diff --git a/fs/cachefiles/bind.c b/fs/cachefiles/bind.c
index 84fe89d5999e..40377633e3d9 100644
--- a/fs/cachefiles/bind.c
+++ b/fs/cachefiles/bind.c
@@ -102,6 +102,7 @@ static int cachefiles_daemon_add_cache(struct cachefiles_cache *cache)
 		goto error_root_object;
 
 	atomic_set(&fsdef->usage, 1);
+	rwlock_init(&fsdef->content_map_lock);
 	fsdef->type = FSCACHE_COOKIE_TYPE_INDEX;
 
 	_debug("- fsdef %p", fsdef);
diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
new file mode 100644
index 000000000000..594624cb1cb9
--- /dev/null
+++ b/fs/cachefiles/content-map.c
@@ -0,0 +1,251 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Datafile content management
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/mount.h>
+#include <linux/slab.h>
+#include <linux/file.h>
+#include <linux/swap.h>
+#include <linux/xattr.h>
+#include "internal.h"
+
+static const char cachefiles_xattr_content_map[] =
+	XATTR_USER_PREFIX "CacheFiles.content";
+
+static bool cachefiles_granule_is_present(struct cachefiles_object *object,
+					  size_t granule)
+{
+	bool res;
+
+	if (granule / 8 >= object->content_map_size)
+		return false;
+	read_lock_bh(&object->content_map_lock);
+	res = test_bit_le(granule, object->content_map);
+	read_unlock_bh(&object->content_map_lock);
+	return res;
+}
+
+/*
+ * Mark the content map to indicate stored granule.
+ */
+void cachefiles_mark_content_map(struct fscache_io_request *req)
+{
+	struct cachefiles_object *object =
+		container_of(req->object, struct cachefiles_object, fscache);
+	loff_t pos = req->pos;
+
+	_enter("%llx", pos);
+
+	read_lock_bh(&object->content_map_lock);
+
+	if (object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK) {
+		if (pos == 0) {
+			object->content_info = CACHEFILES_CONTENT_SINGLE;
+			set_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->fscache.flags);
+		}
+	} else {
+		pgoff_t granule;
+		loff_t end = pos + req->len;
+
+		pos = round_down(pos, CACHEFILES_GRAN_SIZE);
+		do {
+			granule = pos / CACHEFILES_GRAN_SIZE;
+			if (granule / 8 >= object->content_map_size)
+				break;
+
+			set_bit_le(granule, object->content_map);
+			object->content_map_changed = true;
+			pos += CACHEFILES_GRAN_SIZE;
+
+		} while (pos < end);
+
+		if (object->content_info != CACHEFILES_CONTENT_MAP) {
+			object->content_info = CACHEFILES_CONTENT_MAP;
+			set_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->fscache.flags);
+		}
+	}
+
+	read_unlock_bh(&object->content_map_lock);
+}
+
+/*
+ * Expand the content map to a larger file size.
+ */
+void cachefiles_expand_content_map(struct cachefiles_object *object, loff_t size)
+{
+	u8 *map, *zap;
+
+	/* Determine the size.  There's one bit per granule.  We size it in
+	 * terms of 8-byte chunks, where a 64-bit span * 256KiB bytes granules
+	 * covers 16MiB of file space.  At that, 512B will cover 1GiB.
+	 */
+	if (size > 0) {
+		size += CACHEFILES_GRAN_SIZE - 1;
+		size /= CACHEFILES_GRAN_SIZE;
+		size += 8 - 1;
+		size /= 8;
+		size = roundup_pow_of_two(size);
+	} else {
+		size = 8;
+	}
+
+	if (size <= object->content_map_size)
+		return;
+
+	map = kzalloc(size, GFP_KERNEL);
+	if (!map)
+		return;
+
+	write_lock_bh(&object->content_map_lock);
+	if (size > object->content_map_size) {
+		zap = object->content_map;
+		memcpy(map, zap, object->content_map_size);
+		object->content_map = map;
+		object->content_map_size = size;
+	} else {
+		zap = map;
+	}
+	write_unlock_bh(&object->content_map_lock);
+
+	kfree(zap);
+}
+
+/*
+ * Adjust the content map when we shorten a backing object.
+ *
+ * We need to unmark any granules that are going to be discarded.
+ */
+void cachefiles_shorten_content_map(struct cachefiles_object *object,
+				    loff_t new_size)
+{
+	struct fscache_cookie *cookie = object->fscache.cookie;
+	loff_t granule, o_granule;
+
+	if (object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK)
+		return;
+
+	write_lock_bh(&object->content_map_lock);
+
+	if (object->content_info == CACHEFILES_CONTENT_MAP) {
+		if (cookie->zero_point > new_size)
+			cookie->zero_point = new_size;
+
+		granule = new_size;
+		granule += CACHEFILES_GRAN_SIZE - 1;
+		granule /= CACHEFILES_GRAN_SIZE;
+
+		o_granule = cookie->object_size;
+		o_granule += CACHEFILES_GRAN_SIZE - 1;
+		o_granule /= CACHEFILES_GRAN_SIZE;
+
+		for (; o_granule > granule; o_granule--)
+			clear_bit_le(o_granule, object->content_map);
+	}
+
+	write_unlock_bh(&object->content_map_lock);
+}
+
+/*
+ * Load the content map.
+ */
+bool cachefiles_load_content_map(struct cachefiles_object *object)
+{
+	struct cachefiles_cache *cache = container_of(object->fscache.cache,
+						      struct cachefiles_cache, cache);
+	const struct cred *saved_cred;
+	ssize_t got;
+	loff_t size;
+	u8 *map = NULL;
+
+	_enter("c=%08x,%llx",
+	       object->fscache.cookie->debug_id,
+	       object->fscache.cookie->object_size);
+
+	object->content_info = CACHEFILES_CONTENT_NO_DATA;
+	if (object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK) {
+		/* Single-chunk object.  The presence or absence of the content
+		 * map xattr is sufficient indication.
+		 */
+		size = 0;
+	} else {
+		/* Granulated object.  There's one bit per granule.  We size it
+		 * in terms of 8-byte chunks, where a 64-bit span * 256KiB
+		 * bytes granules covers 16MiB of file space.  At that, 512B
+		 * will cover 1GiB.
+		 */
+		size = object->fscache.cookie->object_size;
+		if (size > 0) {
+			size += CACHEFILES_GRAN_SIZE - 1;
+			size /= CACHEFILES_GRAN_SIZE;
+			size += 8 - 1;
+			size /= 8;
+			if (size < 8)
+				size = 8;
+			size = roundup_pow_of_two(size);
+		} else {
+			size = 8;
+		}
+
+		map = kzalloc(size, GFP_KERNEL);
+		if (!map)
+			return false;
+	}
+
+	cachefiles_begin_secure(cache, &saved_cred);
+	got = vfs_getxattr(object->dentry, cachefiles_xattr_content_map,
+			   map, size);
+	cachefiles_end_secure(cache, saved_cred);
+	if (got < 0 && got != -ENODATA) {
+		kfree(map);
+		_leave(" = f [%zd]", got);
+		return false;
+	}
+
+	if (size == 0) {
+		if (got != -ENODATA)
+			object->content_info = CACHEFILES_CONTENT_SINGLE;
+		_leave(" = t [%zd]", got);
+	} else {
+		object->content_map = map;
+		object->content_map_size = size;
+		object->content_info = CACHEFILES_CONTENT_MAP;
+		_leave(" = t [%zd/%llu %*phN]", got, size, (int)size, map);
+	}
+
+	return true;
+}
+
+/*
+ * Save the content map.
+ */
+void cachefiles_save_content_map(struct cachefiles_object *object)
+{
+	ssize_t ret;
+	size_t size;
+	u8 *map;
+
+	_enter("c=%08x", object->fscache.cookie->debug_id);
+
+	if (object->content_info != CACHEFILES_CONTENT_MAP)
+		return;
+
+	size = object->content_map_size;
+	map = object->content_map;
+
+	/* Don't save trailing zeros, but do save at least one byte */
+	for (; size > 0; size--)
+		if (map[size - 1])
+			break;
+
+	ret = vfs_setxattr(object->dentry, cachefiles_xattr_content_map,
+			   map, size, 0);
+	if (ret < 0) {
+		cachefiles_io_error_obj(object, "Unable to set xattr");
+		return;
+	}
+
+	_leave(" = %zd", ret);
+}
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 6384fba652eb..de4fb41103a6 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -37,6 +37,7 @@ struct fscache_object *cachefiles_alloc_object(struct fscache_cookie *cookie,
 		return NULL;
 	}
 
+	rwlock_init(&object->content_map_lock);
 	fscache_object_init(&object->fscache, cookie, &cache->cache);
 	object->fscache.parent = parent;
 	object->fscache.stage = FSCACHE_OBJECT_STAGE_LOOKING_UP;
@@ -198,6 +199,8 @@ static void cachefiles_update_object(struct fscache_object *_object)
 static void cachefiles_commit_object(struct cachefiles_object *object,
 				     struct cachefiles_cache *cache)
 {
+	if (object->content_map_changed)
+		cachefiles_save_content_map(object);
 }
 
 /*
@@ -298,6 +301,8 @@ static void cachefiles_put_object(struct fscache_object *_object,
 		ASSERTCMP(object->dentry, ==, NULL);
 		ASSERTCMP(object->fscache.n_children, ==, 0);
 
+		kfree(object->content_map);
+
 		cache = object->fscache.cache;
 		fscache_object_destroy(&object->fscache);
 		kmem_cache_free(cachefiles_object_jar, object);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index a00ffb63baf4..4085c1185693 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -19,6 +19,11 @@
 #include <linux/workqueue.h>
 #include <linux/security.h>
 
+/* Cachefile granularity */
+#define CACHEFILES_GRAN_SIZE	(256 * 1024)
+#define CACHEFILES_GRAN_PAGES	(CACHEFILES_GRAN_SIZE / PAGE_SIZE)
+#define CACHEFILES_DIO_BLOCK_SIZE 4096
+
 struct cachefiles_cache;
 struct cachefiles_object;
 
@@ -29,6 +34,16 @@ extern unsigned cachefiles_debug;
 
 #define cachefiles_gfp (__GFP_RECLAIM | __GFP_NORETRY | __GFP_NOMEMALLOC)
 
+enum cachefiles_content {
+	/* These values are saved on disk */
+	CACHEFILES_CONTENT_NO_DATA	= 0, /* No content stored */
+	CACHEFILES_CONTENT_SINGLE	= 1, /* Content is monolithic, all is present */
+	CACHEFILES_CONTENT_ALL		= 2, /* Content is all present, no map */
+	CACHEFILES_CONTENT_MAP		= 3, /* Content is piecemeal, map in use */
+	CACHEFILES_CONTENT_DIRTY	= 4, /* Content is dirty (only seen on disk) */
+	nr__cachefiles_content
+};
+
 /*
  * node records
  */
@@ -41,6 +56,13 @@ struct cachefiles_object {
 	atomic_t			usage;		/* object usage count */
 	uint8_t				type;		/* object type */
 	bool				new;		/* T if object new */
+
+	/* Map of the content blocks in the object */
+	enum cachefiles_content		content_info:8;	/* Info about content presence */
+	bool				content_map_changed;
+	u8				*content_map;		/* Content present bitmap */
+	unsigned int			content_map_size;	/* Size of buffer */
+	rwlock_t			content_map_lock;
 };
 
 extern struct kmem_cache *cachefiles_object_jar;
@@ -100,6 +122,15 @@ static inline void cachefiles_state_changed(struct cachefiles_cache *cache)
 extern int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args);
 extern void cachefiles_daemon_unbind(struct cachefiles_cache *cache);
 
+/*
+ * content-map.c
+ */
+extern void cachefiles_mark_content_map(struct fscache_io_request *req);
+extern void cachefiles_expand_content_map(struct cachefiles_object *object, loff_t size);
+extern void cachefiles_shorten_content_map(struct cachefiles_object *object, loff_t new_size);
+extern bool cachefiles_load_content_map(struct cachefiles_object *object);
+extern void cachefiles_save_content_map(struct cachefiles_object *object);
+
 /*
  * daemon.c
  */
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index d17734455af2..e324b835b1a0 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -61,6 +61,10 @@ bool cachefiles_open_object(struct cachefiles_object *object)
 	path.mnt = cache->mnt;
 	path.dentry = object->dentry;
 
+	if (object->content_info == CACHEFILES_CONTENT_MAP &&
+	    !cachefiles_load_content_map(object))
+		goto error;
+
 	file = open_with_fake_path(&path,
 				   O_RDWR | O_LARGEFILE | O_DIRECT,
 				   d_backing_inode(object->dentry),
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index 17c16c2bd07e..a1d4a3d1db69 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -16,8 +16,11 @@
 #include "internal.h"
 
 struct cachefiles_xattr {
-	uint8_t				type;
-	uint8_t				data[];
+	__be64	object_size;	/* Actual size of the object */
+	__be64	zero_point;	/* Size after which server has no data not written by us */
+	__u8	type;		/* Type of object */
+	__u8	content;	/* Content presence (enum cachefiles_content) */
+	__u8	data[];		/* netfs coherency data */
 } __packed;
 
 static const char cachefiles_xattr_cache[] =
@@ -118,7 +121,10 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 	if (!buf)
 		return -ENOMEM;
 
-	buf->type = object->fscache.cookie->type;
+	buf->object_size	= cpu_to_be64(object->fscache.cookie->object_size);
+	buf->zero_point		= cpu_to_be64(object->fscache.cookie->zero_point);
+	buf->type		= object->fscache.cookie->type;
+	buf->content		= object->content_info;
 	if (len > 0)
 		memcpy(buf->data, fscache_get_aux(object->fscache.cookie), len);
 
@@ -127,7 +133,7 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 			   xattr_flags);
 	if (ret < 0) {
 		trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
-					   0,
+					   buf->content,
 					   cachefiles_coherency_set_fail);
 		if (ret != -ENOMEM)
 			cachefiles_io_error_obj(
@@ -135,7 +141,7 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 				"Failed to set xattr with error %d", ret);
 	} else {
 		trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
-					   0,
+					   buf->content,
 					   cachefiles_coherency_set_ok);
 	}
 
@@ -174,15 +180,21 @@ int cachefiles_check_auxdata(struct cachefiles_object *object)
 		why = cachefiles_coherency_check_xattr;
 	} else if (buf->type != object->fscache.cookie->type) {
 		why = cachefiles_coherency_check_type;
+	} else if (buf->content >= nr__cachefiles_content) {
+		why = cachefiles_coherency_check_content;
 	} else if (memcmp(buf->data, p, len) != 0) {
 		why = cachefiles_coherency_check_aux;
+	} else if (be64_to_cpu(buf->object_size) != object->fscache.cookie->object_size) {
+		why = cachefiles_coherency_check_objsize;
 	} else {
+		object->fscache.cookie->zero_point = be64_to_cpu(buf->zero_point);
+		object->content_info = buf->content;
 		why = cachefiles_coherency_check_ok;
 		ret = 0;
 	}
 
 	trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
-				   0, why);
+				   buf->content, why);
 	kfree(buf);
 	return ret;
 }
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index bf588c3f4a07..e7af1d683009 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -324,7 +324,7 @@ TRACE_EVENT(cachefiles_mark_buried,
 TRACE_EVENT(cachefiles_coherency,
 	    TP_PROTO(struct cachefiles_object *obj,
 		     ino_t ino,
-		     int content,
+		     enum cachefiles_content content,
 		     enum cachefiles_coherency_trace why),
 
 	    TP_ARGS(obj, ino, content, why),
@@ -333,7 +333,7 @@ TRACE_EVENT(cachefiles_coherency,
 	    TP_STRUCT__entry(
 		    __field(unsigned int,			obj	)
 		    __field(enum cachefiles_coherency_trace,	why	)
-		    __field(int,				content	)
+		    __field(enum cachefiles_content,		content	)
 		    __field(u64,				ino	)
 			     ),
 



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 19/32] cachefiles: Implement a content-present indicator and bitmap
@ 2020-07-13 16:34   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:34 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs

Implement a content indicator that indicates the presence or absence of
content and a bitmap that indicates which blocks of granular content are
present in a granular file.  This is added to the xattr that stores the
netfs coherency data, along with the file size and the file zero point (the
point after which it can be assumed that the server doesn't have any data).

In the content bitmap, if present, each bit indicates which 256KiB granules
of a cache file are present.  This is stored in a separate xattr, which is
loaded when the first I/O handle is created on that cache object and saved
when the object is discarded from memory.

Non-index objects in the cache can be monolithic or granular.  The content
map isn't used for monolithic objects (FSCACHE_COOKIE_ADV_SINGLE_CHUNK) as
they are expected to be all-or-nothing, so the content indicator alone
suffices.  Examples of this would be AFS directory or symlink content.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/Makefile            |    1 
 fs/cachefiles/bind.c              |    1 
 fs/cachefiles/content-map.c       |  251 +++++++++++++++++++++++++++++++++++++
 fs/cachefiles/interface.c         |    5 +
 fs/cachefiles/internal.h          |   31 +++++
 fs/cachefiles/io.c                |    4 +
 fs/cachefiles/xattr.c             |   24 +++-
 include/trace/events/cachefiles.h |    4 -
 8 files changed, 313 insertions(+), 8 deletions(-)
 create mode 100644 fs/cachefiles/content-map.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index d894d317d6e7..84615aca866a 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -5,6 +5,7 @@
 
 cachefiles-y := \
 	bind.o \
+	content-map.o \
 	daemon.o \
 	interface.o \
 	io.o \
diff --git a/fs/cachefiles/bind.c b/fs/cachefiles/bind.c
index 84fe89d5999e..40377633e3d9 100644
--- a/fs/cachefiles/bind.c
+++ b/fs/cachefiles/bind.c
@@ -102,6 +102,7 @@ static int cachefiles_daemon_add_cache(struct cachefiles_cache *cache)
 		goto error_root_object;
 
 	atomic_set(&fsdef->usage, 1);
+	rwlock_init(&fsdef->content_map_lock);
 	fsdef->type = FSCACHE_COOKIE_TYPE_INDEX;
 
 	_debug("- fsdef %p", fsdef);
diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
new file mode 100644
index 000000000000..594624cb1cb9
--- /dev/null
+++ b/fs/cachefiles/content-map.c
@@ -0,0 +1,251 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Datafile content management
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/mount.h>
+#include <linux/slab.h>
+#include <linux/file.h>
+#include <linux/swap.h>
+#include <linux/xattr.h>
+#include "internal.h"
+
+static const char cachefiles_xattr_content_map[] =
+	XATTR_USER_PREFIX "CacheFiles.content";
+
+static bool cachefiles_granule_is_present(struct cachefiles_object *object,
+					  size_t granule)
+{
+	bool res;
+
+	if (granule / 8 >= object->content_map_size)
+		return false;
+	read_lock_bh(&object->content_map_lock);
+	res = test_bit_le(granule, object->content_map);
+	read_unlock_bh(&object->content_map_lock);
+	return res;
+}
+
+/*
+ * Mark the content map to indicate stored granule.
+ */
+void cachefiles_mark_content_map(struct fscache_io_request *req)
+{
+	struct cachefiles_object *object =
+		container_of(req->object, struct cachefiles_object, fscache);
+	loff_t pos = req->pos;
+
+	_enter("%llx", pos);
+
+	read_lock_bh(&object->content_map_lock);
+
+	if (object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK) {
+		if (pos == 0) {
+			object->content_info = CACHEFILES_CONTENT_SINGLE;
+			set_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->fscache.flags);
+		}
+	} else {
+		pgoff_t granule;
+		loff_t end = pos + req->len;
+
+		pos = round_down(pos, CACHEFILES_GRAN_SIZE);
+		do {
+			granule = pos / CACHEFILES_GRAN_SIZE;
+			if (granule / 8 >= object->content_map_size)
+				break;
+
+			set_bit_le(granule, object->content_map);
+			object->content_map_changed = true;
+			pos += CACHEFILES_GRAN_SIZE;
+
+		} while (pos < end);
+
+		if (object->content_info != CACHEFILES_CONTENT_MAP) {
+			object->content_info = CACHEFILES_CONTENT_MAP;
+			set_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->fscache.flags);
+		}
+	}
+
+	read_unlock_bh(&object->content_map_lock);
+}
+
+/*
+ * Expand the content map to a larger file size.
+ */
+void cachefiles_expand_content_map(struct cachefiles_object *object, loff_t size)
+{
+	u8 *map, *zap;
+
+	/* Determine the size.  There's one bit per granule.  We size it in
+	 * terms of 8-byte chunks, where a 64-bit span * 256KiB bytes granules
+	 * covers 16MiB of file space.  At that, 512B will cover 1GiB.
+	 */
+	if (size > 0) {
+		size += CACHEFILES_GRAN_SIZE - 1;
+		size /= CACHEFILES_GRAN_SIZE;
+		size += 8 - 1;
+		size /= 8;
+		size = roundup_pow_of_two(size);
+	} else {
+		size = 8;
+	}
+
+	if (size <= object->content_map_size)
+		return;
+
+	map = kzalloc(size, GFP_KERNEL);
+	if (!map)
+		return;
+
+	write_lock_bh(&object->content_map_lock);
+	if (size > object->content_map_size) {
+		zap = object->content_map;
+		memcpy(map, zap, object->content_map_size);
+		object->content_map = map;
+		object->content_map_size = size;
+	} else {
+		zap = map;
+	}
+	write_unlock_bh(&object->content_map_lock);
+
+	kfree(zap);
+}
+
+/*
+ * Adjust the content map when we shorten a backing object.
+ *
+ * We need to unmark any granules that are going to be discarded.
+ */
+void cachefiles_shorten_content_map(struct cachefiles_object *object,
+				    loff_t new_size)
+{
+	struct fscache_cookie *cookie = object->fscache.cookie;
+	loff_t granule, o_granule;
+
+	if (object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK)
+		return;
+
+	write_lock_bh(&object->content_map_lock);
+
+	if (object->content_info == CACHEFILES_CONTENT_MAP) {
+		if (cookie->zero_point > new_size)
+			cookie->zero_point = new_size;
+
+		granule = new_size;
+		granule += CACHEFILES_GRAN_SIZE - 1;
+		granule /= CACHEFILES_GRAN_SIZE;
+
+		o_granule = cookie->object_size;
+		o_granule += CACHEFILES_GRAN_SIZE - 1;
+		o_granule /= CACHEFILES_GRAN_SIZE;
+
+		for (; o_granule > granule; o_granule--)
+			clear_bit_le(o_granule, object->content_map);
+	}
+
+	write_unlock_bh(&object->content_map_lock);
+}
+
+/*
+ * Load the content map.
+ */
+bool cachefiles_load_content_map(struct cachefiles_object *object)
+{
+	struct cachefiles_cache *cache = container_of(object->fscache.cache,
+						      struct cachefiles_cache, cache);
+	const struct cred *saved_cred;
+	ssize_t got;
+	loff_t size;
+	u8 *map = NULL;
+
+	_enter("c=%08x,%llx",
+	       object->fscache.cookie->debug_id,
+	       object->fscache.cookie->object_size);
+
+	object->content_info = CACHEFILES_CONTENT_NO_DATA;
+	if (object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK) {
+		/* Single-chunk object.  The presence or absence of the content
+		 * map xattr is sufficient indication.
+		 */
+		size = 0;
+	} else {
+		/* Granulated object.  There's one bit per granule.  We size it
+		 * in terms of 8-byte chunks, where a 64-bit span * 256KiB
+		 * bytes granules covers 16MiB of file space.  At that, 512B
+		 * will cover 1GiB.
+		 */
+		size = object->fscache.cookie->object_size;
+		if (size > 0) {
+			size += CACHEFILES_GRAN_SIZE - 1;
+			size /= CACHEFILES_GRAN_SIZE;
+			size += 8 - 1;
+			size /= 8;
+			if (size < 8)
+				size = 8;
+			size = roundup_pow_of_two(size);
+		} else {
+			size = 8;
+		}
+
+		map = kzalloc(size, GFP_KERNEL);
+		if (!map)
+			return false;
+	}
+
+	cachefiles_begin_secure(cache, &saved_cred);
+	got = vfs_getxattr(object->dentry, cachefiles_xattr_content_map,
+			   map, size);
+	cachefiles_end_secure(cache, saved_cred);
+	if (got < 0 && got != -ENODATA) {
+		kfree(map);
+		_leave(" = f [%zd]", got);
+		return false;
+	}
+
+	if (size == 0) {
+		if (got != -ENODATA)
+			object->content_info = CACHEFILES_CONTENT_SINGLE;
+		_leave(" = t [%zd]", got);
+	} else {
+		object->content_map = map;
+		object->content_map_size = size;
+		object->content_info = CACHEFILES_CONTENT_MAP;
+		_leave(" = t [%zd/%llu %*phN]", got, size, (int)size, map);
+	}
+
+	return true;
+}
+
+/*
+ * Save the content map.
+ */
+void cachefiles_save_content_map(struct cachefiles_object *object)
+{
+	ssize_t ret;
+	size_t size;
+	u8 *map;
+
+	_enter("c=%08x", object->fscache.cookie->debug_id);
+
+	if (object->content_info != CACHEFILES_CONTENT_MAP)
+		return;
+
+	size = object->content_map_size;
+	map = object->content_map;
+
+	/* Don't save trailing zeros, but do save at least one byte */
+	for (; size > 0; size--)
+		if (map[size - 1])
+			break;
+
+	ret = vfs_setxattr(object->dentry, cachefiles_xattr_content_map,
+			   map, size, 0);
+	if (ret < 0) {
+		cachefiles_io_error_obj(object, "Unable to set xattr");
+		return;
+	}
+
+	_leave(" = %zd", ret);
+}
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 6384fba652eb..de4fb41103a6 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -37,6 +37,7 @@ struct fscache_object *cachefiles_alloc_object(struct fscache_cookie *cookie,
 		return NULL;
 	}
 
+	rwlock_init(&object->content_map_lock);
 	fscache_object_init(&object->fscache, cookie, &cache->cache);
 	object->fscache.parent = parent;
 	object->fscache.stage = FSCACHE_OBJECT_STAGE_LOOKING_UP;
@@ -198,6 +199,8 @@ static void cachefiles_update_object(struct fscache_object *_object)
 static void cachefiles_commit_object(struct cachefiles_object *object,
 				     struct cachefiles_cache *cache)
 {
+	if (object->content_map_changed)
+		cachefiles_save_content_map(object);
 }
 
 /*
@@ -298,6 +301,8 @@ static void cachefiles_put_object(struct fscache_object *_object,
 		ASSERTCMP(object->dentry, ==, NULL);
 		ASSERTCMP(object->fscache.n_children, ==, 0);
 
+		kfree(object->content_map);
+
 		cache = object->fscache.cache;
 		fscache_object_destroy(&object->fscache);
 		kmem_cache_free(cachefiles_object_jar, object);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index a00ffb63baf4..4085c1185693 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -19,6 +19,11 @@
 #include <linux/workqueue.h>
 #include <linux/security.h>
 
+/* Cachefile granularity */
+#define CACHEFILES_GRAN_SIZE	(256 * 1024)
+#define CACHEFILES_GRAN_PAGES	(CACHEFILES_GRAN_SIZE / PAGE_SIZE)
+#define CACHEFILES_DIO_BLOCK_SIZE 4096
+
 struct cachefiles_cache;
 struct cachefiles_object;
 
@@ -29,6 +34,16 @@ extern unsigned cachefiles_debug;
 
 #define cachefiles_gfp (__GFP_RECLAIM | __GFP_NORETRY | __GFP_NOMEMALLOC)
 
+enum cachefiles_content {
+	/* These values are saved on disk */
+	CACHEFILES_CONTENT_NO_DATA	= 0, /* No content stored */
+	CACHEFILES_CONTENT_SINGLE	= 1, /* Content is monolithic, all is present */
+	CACHEFILES_CONTENT_ALL		= 2, /* Content is all present, no map */
+	CACHEFILES_CONTENT_MAP		= 3, /* Content is piecemeal, map in use */
+	CACHEFILES_CONTENT_DIRTY	= 4, /* Content is dirty (only seen on disk) */
+	nr__cachefiles_content
+};
+
 /*
  * node records
  */
@@ -41,6 +56,13 @@ struct cachefiles_object {
 	atomic_t			usage;		/* object usage count */
 	uint8_t				type;		/* object type */
 	bool				new;		/* T if object new */
+
+	/* Map of the content blocks in the object */
+	enum cachefiles_content		content_info:8;	/* Info about content presence */
+	bool				content_map_changed;
+	u8				*content_map;		/* Content present bitmap */
+	unsigned int			content_map_size;	/* Size of buffer */
+	rwlock_t			content_map_lock;
 };
 
 extern struct kmem_cache *cachefiles_object_jar;
@@ -100,6 +122,15 @@ static inline void cachefiles_state_changed(struct cachefiles_cache *cache)
 extern int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args);
 extern void cachefiles_daemon_unbind(struct cachefiles_cache *cache);
 
+/*
+ * content-map.c
+ */
+extern void cachefiles_mark_content_map(struct fscache_io_request *req);
+extern void cachefiles_expand_content_map(struct cachefiles_object *object, loff_t size);
+extern void cachefiles_shorten_content_map(struct cachefiles_object *object, loff_t new_size);
+extern bool cachefiles_load_content_map(struct cachefiles_object *object);
+extern void cachefiles_save_content_map(struct cachefiles_object *object);
+
 /*
  * daemon.c
  */
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index d17734455af2..e324b835b1a0 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -61,6 +61,10 @@ bool cachefiles_open_object(struct cachefiles_object *object)
 	path.mnt = cache->mnt;
 	path.dentry = object->dentry;
 
+	if (object->content_info == CACHEFILES_CONTENT_MAP &&
+	    !cachefiles_load_content_map(object))
+		goto error;
+
 	file = open_with_fake_path(&path,
 				   O_RDWR | O_LARGEFILE | O_DIRECT,
 				   d_backing_inode(object->dentry),
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index 17c16c2bd07e..a1d4a3d1db69 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -16,8 +16,11 @@
 #include "internal.h"
 
 struct cachefiles_xattr {
-	uint8_t				type;
-	uint8_t				data[];
+	__be64	object_size;	/* Actual size of the object */
+	__be64	zero_point;	/* Size after which server has no data not written by us */
+	__u8	type;		/* Type of object */
+	__u8	content;	/* Content presence (enum cachefiles_content) */
+	__u8	data[];		/* netfs coherency data */
 } __packed;
 
 static const char cachefiles_xattr_cache[] =
@@ -118,7 +121,10 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 	if (!buf)
 		return -ENOMEM;
 
-	buf->type = object->fscache.cookie->type;
+	buf->object_size	= cpu_to_be64(object->fscache.cookie->object_size);
+	buf->zero_point		= cpu_to_be64(object->fscache.cookie->zero_point);
+	buf->type		= object->fscache.cookie->type;
+	buf->content		= object->content_info;
 	if (len > 0)
 		memcpy(buf->data, fscache_get_aux(object->fscache.cookie), len);
 
@@ -127,7 +133,7 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 			   xattr_flags);
 	if (ret < 0) {
 		trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
-					   0,
+					   buf->content,
 					   cachefiles_coherency_set_fail);
 		if (ret != -ENOMEM)
 			cachefiles_io_error_obj(
@@ -135,7 +141,7 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 				"Failed to set xattr with error %d", ret);
 	} else {
 		trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
-					   0,
+					   buf->content,
 					   cachefiles_coherency_set_ok);
 	}
 
@@ -174,15 +180,21 @@ int cachefiles_check_auxdata(struct cachefiles_object *object)
 		why = cachefiles_coherency_check_xattr;
 	} else if (buf->type != object->fscache.cookie->type) {
 		why = cachefiles_coherency_check_type;
+	} else if (buf->content >= nr__cachefiles_content) {
+		why = cachefiles_coherency_check_content;
 	} else if (memcmp(buf->data, p, len) != 0) {
 		why = cachefiles_coherency_check_aux;
+	} else if (be64_to_cpu(buf->object_size) != object->fscache.cookie->object_size) {
+		why = cachefiles_coherency_check_objsize;
 	} else {
+		object->fscache.cookie->zero_point = be64_to_cpu(buf->zero_point);
+		object->content_info = buf->content;
 		why = cachefiles_coherency_check_ok;
 		ret = 0;
 	}
 
 	trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
-				   0, why);
+				   buf->content, why);
 	kfree(buf);
 	return ret;
 }
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index bf588c3f4a07..e7af1d683009 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -324,7 +324,7 @@ TRACE_EVENT(cachefiles_mark_buried,
 TRACE_EVENT(cachefiles_coherency,
 	    TP_PROTO(struct cachefiles_object *obj,
 		     ino_t ino,
-		     int content,
+		     enum cachefiles_content content,
 		     enum cachefiles_coherency_trace why),
 
 	    TP_ARGS(obj, ino, content, why),
@@ -333,7 +333,7 @@ TRACE_EVENT(cachefiles_coherency,
 	    TP_STRUCT__entry(
 		    __field(unsigned int,			obj	)
 		    __field(enum cachefiles_coherency_trace,	why	)
-		    __field(int,				content	)
+		    __field(enum cachefiles_content,		content	)
 		    __field(u64,				ino	)
 			     ),
 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 20/32] cachefiles: Implement extent shaper
  2020-07-13 16:30 ` David Howells
                   ` (19 preceding siblings ...)
  (?)
@ 2020-07-13 16:34 ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:34 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Implement the function that shapes extents to map onto the granules in a
cache file.

When setting to fetch data from the server to be cached, the extent will be
expanded to align with granule size and cut down so that it doesn't cross
the boundary between a non-present extent and a present extent.

When setting to read data from the cache, the extent will be cut down so
that it doesn't cross the boundary between a present extent and a
non-present extent.

If no caching is taking place, whatever was requested goes.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/content-map.c |  217 ++++++++++++++++++++++++++++++++++++-------
 fs/cachefiles/internal.h    |    4 -
 fs/cachefiles/io.c          |   10 --
 3 files changed, 184 insertions(+), 47 deletions(-)

diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index 594624cb1cb9..91c44bb39a93 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -15,6 +15,31 @@
 static const char cachefiles_xattr_content_map[] =
 	XATTR_USER_PREFIX "CacheFiles.content";
 
+/*
+ * Determine the map size for a granulated object.
+ *
+ * There's one bit per granule.  We size it in terms of 8-byte chunks, where a
+ * 64-bit span * 256KiB bytes granules covers 16MiB of file space.  At that,
+ * 512B will cover 1GiB.
+ */
+static size_t cachefiles_map_size(loff_t i_size)
+{
+	loff_t size;
+	size_t granules, bits, bytes, map_size;
+
+	if (i_size <= CACHEFILES_GRAN_SIZE * 64)
+		return 8;
+
+	size = i_size + CACHEFILES_GRAN_SIZE - 1;
+	granules = size / CACHEFILES_GRAN_SIZE;
+	bits = granules + (64 - 1);
+	bits &= ~(64 - 1);
+	bytes = bits / 8;
+	map_size = roundup_pow_of_two(bytes);
+	_leave(" = %zx [i=%llx g=%zu b=%zu]", map_size, i_size, granules, bits);
+	return map_size;
+}
+
 static bool cachefiles_granule_is_present(struct cachefiles_object *object,
 					  size_t granule)
 {
@@ -28,6 +53,145 @@ static bool cachefiles_granule_is_present(struct cachefiles_object *object,
 	return res;
 }
 
+/*
+ * Shape the extent of a single-chunk data object.
+ */
+static void cachefiles_shape_single(struct fscache_object *obj,
+				    struct fscache_request_shape *shape)
+{
+	struct cachefiles_object *object =
+		container_of(obj, struct cachefiles_object, fscache);
+	pgoff_t eof;
+
+	_enter("{%lx,%x,%x},%llx,%d",
+	       shape->proposed_start, shape->proposed_nr_pages,
+	       shape->max_io_pages, shape->i_size, shape->for_write);
+
+	shape->dio_block_size = CACHEFILES_DIO_BLOCK_SIZE;
+
+	if (object->content_info == CACHEFILES_CONTENT_SINGLE) {
+		shape->to_be_done = FSCACHE_READ_FROM_CACHE;
+	} else {
+		eof = (shape->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+		shape->actual_start = 0;
+		shape->actual_nr_pages = eof;
+		shape->granularity = 0;
+		shape->to_be_done = FSCACHE_WRITE_TO_CACHE;
+	}
+}
+
+/*
+ * Determine the size of a data extent in a cache object.
+ *
+ * In cachefiles, a data cache object is divided into granules of 256KiB, each
+ * of which must be written as a whole unit when the cache is being loaded.
+ * Data may be read out piecemeal.
+ *
+ * The extent is resized, but the result will always contain the starting page
+ * from the extent.
+ *
+ * If the granule does not exist in the cachefile, the start may be brought
+ * forward to align with the beginning of a granule boundary, and the end may be
+ * moved either way to align also.  The extent will be cut off it it would cross
+ * over the boundary between what's cached and what's not.
+ *
+ * If the starting granule does exist in the cachefile, the extent will be
+ * shortened, if necessary, so that it doesn't cross over into a region that is
+ * not present.
+ *
+ * If the granule does not exist and we cannot cache it for lack of space, the
+ * requested extent is left unaltered.
+ */
+void cachefiles_shape_request(struct fscache_object *obj,
+			      struct fscache_request_shape *shape)
+{
+	struct cachefiles_object *object =
+		container_of(obj, struct cachefiles_object, fscache);
+	unsigned int max_pages;
+	pgoff_t start, end, eof, bend;
+	size_t granule;
+
+	if (object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK) {
+		cachefiles_shape_single(obj, shape);
+		goto out;
+	}
+
+	start	= shape->proposed_start;
+	end	= shape->proposed_start + shape->proposed_nr_pages;
+	max_pages = shape->max_io_pages;
+	_enter("{%lx,%lx,%x},%llx,%d",
+	       start, end, max_pages, shape->i_size, shape->for_write);
+
+	max_pages = round_down(max_pages, CACHEFILES_GRAN_PAGES);
+	if (end - start > max_pages)
+		end = start + max_pages;
+
+	/* If the content map didn't get expanded for some reason - simply
+	 * ignore this granule.
+	 */
+	granule = start / CACHEFILES_GRAN_PAGES;
+	if (granule / 8 >= object->content_map_size)
+		return;
+
+	if (cachefiles_granule_is_present(object, granule)) {
+		/* The start of the requested extent is present in the cache -
+		 * restrict the returned extent to the maximum length of what's
+		 * available.
+		 */
+		bend = round_up(start + 1, CACHEFILES_GRAN_PAGES);
+		while (bend < end) {
+			pgoff_t i = round_up(bend + 1, CACHEFILES_GRAN_PAGES);
+			granule = i / CACHEFILES_GRAN_PAGES;
+			if (!cachefiles_granule_is_present(object, granule))
+				break;
+			bend = i;
+		}
+
+		if (end > bend)
+			end = bend;
+		shape->to_be_done = FSCACHE_READ_FROM_CACHE;
+	} else {
+		/* Otherwise expand the extent in both directions to cover what
+		 * we want for caching purposes.
+		 */
+		start = round_down(start, CACHEFILES_GRAN_PAGES);
+		end   = round_up(end, CACHEFILES_GRAN_PAGES);
+
+		/* But trim to the end of the file and the starting page */
+		eof = (shape->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		if (eof <= shape->proposed_start)
+			eof = shape->proposed_start + 1;
+		if (end > eof)
+			end = eof;
+
+		if ((start << PAGE_SHIFT) >= object->fscache.cookie->zero_point) {
+			/* The start of the requested extent is beyond the
+			 * original EOF of the file on the server - therefore
+			 * it's not going to be found on the server.
+			 */
+			end = round_up(start + 1, CACHEFILES_GRAN_PAGES);
+			shape->to_be_done = FSCACHE_FILL_WITH_ZERO;
+		} else {
+			end = start + CACHEFILES_GRAN_PAGES;
+			if (end > eof)
+				end = eof;
+			shape->to_be_done = FSCACHE_WRITE_TO_CACHE;
+		}
+
+		/* TODO: Check we have space in the cache */
+	}
+
+	shape->actual_start	= start;
+	shape->actual_nr_pages	= end - start;
+	shape->granularity	= CACHEFILES_GRAN_PAGES;
+	shape->dio_block_size	= CACHEFILES_DIO_BLOCK_SIZE;
+
+out:
+	_leave(" [%x,%lx,%x]",
+	       shape->to_be_done, shape->actual_start, shape->actual_nr_pages);
+}
+
 /*
  * Mark the content map to indicate stored granule.
  */
@@ -74,23 +238,14 @@ void cachefiles_mark_content_map(struct fscache_io_request *req)
 /*
  * Expand the content map to a larger file size.
  */
-void cachefiles_expand_content_map(struct cachefiles_object *object, loff_t size)
+void cachefiles_expand_content_map(struct cachefiles_object *object, loff_t i_size)
 {
+	size_t size;
 	u8 *map, *zap;
 
-	/* Determine the size.  There's one bit per granule.  We size it in
-	 * terms of 8-byte chunks, where a 64-bit span * 256KiB bytes granules
-	 * covers 16MiB of file space.  At that, 512B will cover 1GiB.
-	 */
-	if (size > 0) {
-		size += CACHEFILES_GRAN_SIZE - 1;
-		size /= CACHEFILES_GRAN_SIZE;
-		size += 8 - 1;
-		size /= 8;
-		size = roundup_pow_of_two(size);
-	} else {
-		size = 8;
-	}
+	size = cachefiles_map_size(i_size);
+
+	_enter("%llx,%zx,%x", i_size, size, object->content_map_size);
 
 	if (size <= object->content_map_size)
 		return;
@@ -122,7 +277,7 @@ void cachefiles_shorten_content_map(struct cachefiles_object *object,
 				    loff_t new_size)
 {
 	struct fscache_cookie *cookie = object->fscache.cookie;
-	loff_t granule, o_granule;
+	size_t granule, tmp, bytes;
 
 	if (object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK)
 		return;
@@ -137,12 +292,16 @@ void cachefiles_shorten_content_map(struct cachefiles_object *object,
 		granule += CACHEFILES_GRAN_SIZE - 1;
 		granule /= CACHEFILES_GRAN_SIZE;
 
-		o_granule = cookie->object_size;
-		o_granule += CACHEFILES_GRAN_SIZE - 1;
-		o_granule /= CACHEFILES_GRAN_SIZE;
+		tmp = granule;
+		tmp = round_up(granule, 64);
+		bytes = tmp / 8;
+		if (bytes < object->content_map_size)
+			memset(object->content_map + bytes, 0,
+			       object->content_map_size - bytes);
 
-		for (; o_granule > granule; o_granule--)
-			clear_bit_le(o_granule, object->content_map);
+		if (tmp > granule)
+			for (tmp--; tmp > granule; tmp--)
+				clear_bit_le(tmp, object->content_map);
 	}
 
 	write_unlock_bh(&object->content_map_lock);
@@ -157,7 +316,7 @@ bool cachefiles_load_content_map(struct cachefiles_object *object)
 						      struct cachefiles_cache, cache);
 	const struct cred *saved_cred;
 	ssize_t got;
-	loff_t size;
+	size_t size;
 	u8 *map = NULL;
 
 	_enter("c=%08x,%llx",
@@ -176,19 +335,7 @@ bool cachefiles_load_content_map(struct cachefiles_object *object)
 		 * bytes granules covers 16MiB of file space.  At that, 512B
 		 * will cover 1GiB.
 		 */
-		size = object->fscache.cookie->object_size;
-		if (size > 0) {
-			size += CACHEFILES_GRAN_SIZE - 1;
-			size /= CACHEFILES_GRAN_SIZE;
-			size += 8 - 1;
-			size /= 8;
-			if (size < 8)
-				size = 8;
-			size = roundup_pow_of_two(size);
-		} else {
-			size = 8;
-		}
-
+		size = cachefiles_map_size(object->fscache.cookie->object_size);
 		map = kzalloc(size, GFP_KERNEL);
 		if (!map)
 			return false;
@@ -212,7 +359,7 @@ bool cachefiles_load_content_map(struct cachefiles_object *object)
 		object->content_map = map;
 		object->content_map_size = size;
 		object->content_info = CACHEFILES_CONTENT_MAP;
-		_leave(" = t [%zd/%llu %*phN]", got, size, (int)size, map);
+		_leave(" = t [%zd/%zu %*phN]", got, size, (int)size, map);
 	}
 
 	return true;
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 4085c1185693..2ea469b77712 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -125,6 +125,8 @@ extern void cachefiles_daemon_unbind(struct cachefiles_cache *cache);
 /*
  * content-map.c
  */
+extern void cachefiles_shape_request(struct fscache_object *object,
+				     struct fscache_request_shape *shape);
 extern void cachefiles_mark_content_map(struct fscache_io_request *req);
 extern void cachefiles_expand_content_map(struct cachefiles_object *object, loff_t size);
 extern void cachefiles_shorten_content_map(struct cachefiles_object *object, loff_t new_size);
@@ -149,8 +151,6 @@ extern struct fscache_object *cachefiles_grab_object(struct fscache_object *_obj
 /*
  * io.c
  */
-extern void cachefiles_shape_request(struct fscache_object *object,
-				     struct fscache_request_shape *shape);
 extern int cachefiles_read(struct fscache_object *object,
 			   struct fscache_io_request *req,
 			   struct iov_iter *iter);
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index e324b835b1a0..ddb44ec5a199 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -12,16 +12,6 @@
 #include <linux/xattr.h>
 #include "internal.h"
 
-/*
- * Determine the size of a data extent in a cache object.  This must be written
- * as a whole unit, but can be read piecemeal.
- */
-void cachefiles_shape_request(struct fscache_object *object,
-			      struct fscache_request_shape *shape)
-{
-	return 0;
-}
-
 /*
  * Initiate a read from the cache.
  */



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 21/32] cachefiles: Round the cachefile size up to DIO block size
  2020-07-13 16:30 ` David Howells
                   ` (20 preceding siblings ...)
  (?)
@ 2020-07-13 16:34 ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:34 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Round the size of a cachefile up to DIO block size so that we can always
read back the last partial page of a file using direct I/O.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index de4fb41103a6..054d5cc794b5 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -184,6 +184,17 @@ static void cachefiles_update_object(struct fscache_object *_object)
 			cachefiles_remove_object_xattr(cache, object->dentry);
 			goto out;
 		}
+
+		object_size = round_up(object_size, CACHEFILES_DIO_BLOCK_SIZE);
+		_debug("trunc %llx -> %llx", i_size_read(d_inode(object->dentry)), object_size);
+		if (i_size_read(d_inode(object->dentry)) < object_size) {
+			ret = vfs_truncate(&path, object_size);
+			if (ret < 0) {
+				cachefiles_io_error_obj(object, "Trunc-to-dio-size failed");
+				cachefiles_remove_object_xattr(cache, object->dentry);
+				goto out;
+			}
+		}
 	}
 
 	cachefiles_set_object_xattr(object, XATTR_REPLACE);
@@ -354,6 +365,7 @@ static int cachefiles_attr_changed(struct cachefiles_object *object)
 	int ret;
 
 	ni_size = object->fscache.cookie->object_size;
+	ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
 
 	_enter("{OBJ%x},[%llu]",
 	       object->fscache.debug_id, (unsigned long long) ni_size);
@@ -422,6 +434,7 @@ static void cachefiles_invalidate_object(struct fscache_object *_object)
 			     struct cachefiles_cache, cache);
 
 	ni_size = object->fscache.cookie->object_size;
+	ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
 
 	_enter("{OBJ%x},[%llu]",
 	       object->fscache.debug_id, (unsigned long long)ni_size);



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 22/32] cachefiles: Implement read and write parts of new I/O API
  2020-07-13 16:30 ` David Howells
@ 2020-07-13 16:34   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:34 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Implement writing into the cache and reading back from the cache inside
cachefiles using asynchronous direct I/O from the specified iterator.  The
size and position of the request should be aligned to the reported
dio_block_size.

Errors and completion are reported by callback.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/io.c |  208 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 202 insertions(+), 6 deletions(-)

diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index ddb44ec5a199..42e0d620d778 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -12,30 +12,226 @@
 #include <linux/xattr.h>
 #include "internal.h"
 
+struct cachefiles_kiocb {
+	struct kiocb			iocb;
+	struct fscache_io_request	*req;
+	refcount_t			ki_refcnt;
+};
+
+static inline void cachefiles_put_kiocb(struct cachefiles_kiocb *ki)
+{
+	if (refcount_dec_and_test(&ki->ki_refcnt)) {
+		fscache_put_io_request(ki->req);
+		fput(ki->iocb.ki_filp);
+		kfree(ki);
+	}
+}
+
+/*
+ * Handle completion of a read from the cache.
+ */
+static void cachefiles_read_complete(struct kiocb *iocb, long ret, long ret2)
+{
+	struct cachefiles_kiocb *ki = container_of(iocb, struct cachefiles_kiocb, iocb);
+	struct fscache_io_request *req = ki->req;
+
+	_enter("%llx,%ld,%ld", req->len, ret, ret2);
+
+	fscache_end_io_operation(req->cookie);
+
+	if (ret < 0) {
+		req->error = ret;
+	} else if (ret != req->len) {
+		req->error = -ENODATA;
+	} else {
+		req->transferred = ret;
+		set_bit(FSCACHE_IO_DATA_FROM_CACHE, &req->flags);
+	}
+	if (req->io_done)
+		req->io_done(req);
+	cachefiles_put_kiocb(ki);
+}
+
 /*
  * Initiate a read from the cache.
  */
-int cachefiles_read(struct fscache_object *object,
+int cachefiles_read(struct fscache_object *obj,
 		    struct fscache_io_request *req,
 		    struct iov_iter *iter)
 {
-	req->error = -ENODATA;
+	struct cachefiles_object *object =
+		container_of(obj, struct cachefiles_object, fscache);
+	struct cachefiles_kiocb *ki;
+	struct file *file = object->backing_file;
+	ssize_t ret = -ENOBUFS;
+
+	_enter("%pD,%li,%llx,%llx/%llx",
+	       file, file_inode(file)->i_ino, req->pos, req->len, i_size_read(file->f_inode));
+
+	ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL);
+	if (!ki)
+		goto presubmission_error;
+
+	refcount_set(&ki->ki_refcnt, 2);
+	ki->iocb.ki_filp	= get_file(file);
+	ki->iocb.ki_pos		= req->pos;
+	ki->iocb.ki_flags	= IOCB_DIRECT;
+	ki->iocb.ki_hint	= ki_hint_validate(file_write_hint(file));
+	ki->iocb.ki_ioprio	= get_current_ioprio();
+	ki->req			= req;
+
+	if (req->io_done)
+		ki->iocb.ki_complete = cachefiles_read_complete;
+
+	ret = rw_verify_area(READ, file, &ki->iocb.ki_pos, iov_iter_count(iter));
+	if (ret < 0)
+		goto presubmission_error_free;
+
+	fscache_get_io_request(req);
+	ret = call_read_iter(file, &ki->iocb, iter);
+	switch (ret) {
+	case -EIOCBQUEUED:
+		goto in_progress;
+
+	case -ERESTARTSYS:
+	case -ERESTARTNOINTR:
+	case -ERESTARTNOHAND:
+	case -ERESTART_RESTARTBLOCK:
+		/* There's no easy way to restart the syscall since other AIO's
+		 * may be already running. Just fail this IO with EINTR.
+		 */
+		ret = -EINTR;
+		/* Fall through */
+	default:
+		cachefiles_read_complete(&ki->iocb, ret, 0);
+		if (ret > 0)
+			ret = 0;
+		break;
+	}
+
+in_progress:
+	cachefiles_put_kiocb(ki);
+	_leave(" = %zd", ret);
+	return ret;
+
+presubmission_error_free:
+	fput(file);
+	kfree(ki);
+presubmission_error:
+	req->error = -ENOMEM;
+	if (req->io_done)
+		req->io_done(req);
+	return -ENOMEM;
+}
+
+/*
+ * Handle completion of a write to the cache.
+ */
+static void cachefiles_write_complete(struct kiocb *iocb, long ret, long ret2)
+{
+	struct cachefiles_kiocb *ki = container_of(iocb, struct cachefiles_kiocb, iocb);
+	struct fscache_io_request *req = ki->req;
+	struct inode *inode = file_inode(ki->iocb.ki_filp);
+
+	_enter("%llx,%ld,%ld", req->len, ret, ret2);
+
+	/* Tell lockdep we inherited freeze protection from submission thread */
+	__sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
+	__sb_end_write(inode->i_sb, SB_FREEZE_WRITE);
+
+	fscache_end_io_operation(req->cookie);
+
+	if (ret < 0)
+		req->error = ret;
+	else if (ret != req->len)
+		req->error = -ENOBUFS;
+	else
+		cachefiles_mark_content_map(req);
 	if (req->io_done)
 		req->io_done(req);
-	return -ENODATA;
+	cachefiles_put_kiocb(ki);
 }
 
 /*
  * Initiate a write to the cache.
  */
-int cachefiles_write(struct fscache_object *object,
+int cachefiles_write(struct fscache_object *obj,
 		     struct fscache_io_request *req,
 		     struct iov_iter *iter)
 {
-	req->error = -ENOBUFS;
+	struct cachefiles_object *object =
+		container_of(obj, struct cachefiles_object, fscache);
+	struct cachefiles_kiocb *ki;
+	struct inode *inode;
+	struct file *file = object->backing_file;
+	ssize_t ret = -ENOBUFS;
+
+	_enter("%pD,%li,%llx,%llx/%llx",
+	       file, file_inode(file)->i_ino, req->pos, req->len, i_size_read(file->f_inode));
+
+	ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL);
+	if (!ki)
+		goto presubmission_error;
+
+	refcount_set(&ki->ki_refcnt, 2);
+	ki->iocb.ki_filp	= get_file(file);
+	ki->iocb.ki_pos		= req->pos;
+	ki->iocb.ki_flags	= IOCB_DIRECT | IOCB_WRITE;
+	ki->iocb.ki_hint	= ki_hint_validate(file_write_hint(file));
+	ki->iocb.ki_ioprio	= get_current_ioprio();
+	ki->req			= req;
+
+	if (req->io_done)
+		ki->iocb.ki_complete = cachefiles_write_complete;
+
+	ret = rw_verify_area(WRITE, file, &ki->iocb.ki_pos, iov_iter_count(iter));
+	if (ret < 0)
+		goto presubmission_error_free;
+
+	/* Open-code file_start_write here to grab freeze protection, which
+	 * will be released by another thread in aio_complete_rw().  Fool
+	 * lockdep by telling it the lock got released so that it doesn't
+	 * complain about the held lock when we return to userspace.
+	 */
+	inode = file_inode(file);
+	__sb_start_write(inode->i_sb, SB_FREEZE_WRITE, true);
+	__sb_writers_release(inode->i_sb, SB_FREEZE_WRITE);
+
+	fscache_get_io_request(req);
+	ret = call_write_iter(file, &ki->iocb, iter);
+	switch (ret) {
+	case -EIOCBQUEUED:
+		goto in_progress;
+
+	case -ERESTARTSYS:
+	case -ERESTARTNOINTR:
+	case -ERESTARTNOHAND:
+	case -ERESTART_RESTARTBLOCK:
+		/* There's no easy way to restart the syscall since other AIO's
+		 * may be already running. Just fail this IO with EINTR.
+		 */
+		ret = -EINTR;
+		/* Fall through */
+	default:
+		cachefiles_write_complete(&ki->iocb, ret, 0);
+		if (ret > 0)
+			ret = 0;
+		break;
+	}
+
+in_progress:
+	cachefiles_put_kiocb(ki);
+	_leave(" = %zd", ret);
+	return ret;
+
+presubmission_error_free:
+	fput(file);
+	kfree(ki);
+presubmission_error:
+	req->error = -ENOMEM;
 	if (req->io_done)
 		req->io_done(req);
-	return -ENOBUFS;
+	return -ENOMEM;
 }
 
 /*



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 22/32] cachefiles: Implement read and write parts of new I/O API
@ 2020-07-13 16:34   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:34 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs

Implement writing into the cache and reading back from the cache inside
cachefiles using asynchronous direct I/O from the specified iterator.  The
size and position of the request should be aligned to the reported
dio_block_size.

Errors and completion are reported by callback.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/io.c |  208 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 202 insertions(+), 6 deletions(-)

diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index ddb44ec5a199..42e0d620d778 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -12,30 +12,226 @@
 #include <linux/xattr.h>
 #include "internal.h"
 
+struct cachefiles_kiocb {
+	struct kiocb			iocb;
+	struct fscache_io_request	*req;
+	refcount_t			ki_refcnt;
+};
+
+static inline void cachefiles_put_kiocb(struct cachefiles_kiocb *ki)
+{
+	if (refcount_dec_and_test(&ki->ki_refcnt)) {
+		fscache_put_io_request(ki->req);
+		fput(ki->iocb.ki_filp);
+		kfree(ki);
+	}
+}
+
+/*
+ * Handle completion of a read from the cache.
+ */
+static void cachefiles_read_complete(struct kiocb *iocb, long ret, long ret2)
+{
+	struct cachefiles_kiocb *ki = container_of(iocb, struct cachefiles_kiocb, iocb);
+	struct fscache_io_request *req = ki->req;
+
+	_enter("%llx,%ld,%ld", req->len, ret, ret2);
+
+	fscache_end_io_operation(req->cookie);
+
+	if (ret < 0) {
+		req->error = ret;
+	} else if (ret != req->len) {
+		req->error = -ENODATA;
+	} else {
+		req->transferred = ret;
+		set_bit(FSCACHE_IO_DATA_FROM_CACHE, &req->flags);
+	}
+	if (req->io_done)
+		req->io_done(req);
+	cachefiles_put_kiocb(ki);
+}
+
 /*
  * Initiate a read from the cache.
  */
-int cachefiles_read(struct fscache_object *object,
+int cachefiles_read(struct fscache_object *obj,
 		    struct fscache_io_request *req,
 		    struct iov_iter *iter)
 {
-	req->error = -ENODATA;
+	struct cachefiles_object *object =
+		container_of(obj, struct cachefiles_object, fscache);
+	struct cachefiles_kiocb *ki;
+	struct file *file = object->backing_file;
+	ssize_t ret = -ENOBUFS;
+
+	_enter("%pD,%li,%llx,%llx/%llx",
+	       file, file_inode(file)->i_ino, req->pos, req->len, i_size_read(file->f_inode));
+
+	ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL);
+	if (!ki)
+		goto presubmission_error;
+
+	refcount_set(&ki->ki_refcnt, 2);
+	ki->iocb.ki_filp	= get_file(file);
+	ki->iocb.ki_pos		= req->pos;
+	ki->iocb.ki_flags	= IOCB_DIRECT;
+	ki->iocb.ki_hint	= ki_hint_validate(file_write_hint(file));
+	ki->iocb.ki_ioprio	= get_current_ioprio();
+	ki->req			= req;
+
+	if (req->io_done)
+		ki->iocb.ki_complete = cachefiles_read_complete;
+
+	ret = rw_verify_area(READ, file, &ki->iocb.ki_pos, iov_iter_count(iter));
+	if (ret < 0)
+		goto presubmission_error_free;
+
+	fscache_get_io_request(req);
+	ret = call_read_iter(file, &ki->iocb, iter);
+	switch (ret) {
+	case -EIOCBQUEUED:
+		goto in_progress;
+
+	case -ERESTARTSYS:
+	case -ERESTARTNOINTR:
+	case -ERESTARTNOHAND:
+	case -ERESTART_RESTARTBLOCK:
+		/* There's no easy way to restart the syscall since other AIO's
+		 * may be already running. Just fail this IO with EINTR.
+		 */
+		ret = -EINTR;
+		/* Fall through */
+	default:
+		cachefiles_read_complete(&ki->iocb, ret, 0);
+		if (ret > 0)
+			ret = 0;
+		break;
+	}
+
+in_progress:
+	cachefiles_put_kiocb(ki);
+	_leave(" = %zd", ret);
+	return ret;
+
+presubmission_error_free:
+	fput(file);
+	kfree(ki);
+presubmission_error:
+	req->error = -ENOMEM;
+	if (req->io_done)
+		req->io_done(req);
+	return -ENOMEM;
+}
+
+/*
+ * Handle completion of a write to the cache.
+ */
+static void cachefiles_write_complete(struct kiocb *iocb, long ret, long ret2)
+{
+	struct cachefiles_kiocb *ki = container_of(iocb, struct cachefiles_kiocb, iocb);
+	struct fscache_io_request *req = ki->req;
+	struct inode *inode = file_inode(ki->iocb.ki_filp);
+
+	_enter("%llx,%ld,%ld", req->len, ret, ret2);
+
+	/* Tell lockdep we inherited freeze protection from submission thread */
+	__sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
+	__sb_end_write(inode->i_sb, SB_FREEZE_WRITE);
+
+	fscache_end_io_operation(req->cookie);
+
+	if (ret < 0)
+		req->error = ret;
+	else if (ret != req->len)
+		req->error = -ENOBUFS;
+	else
+		cachefiles_mark_content_map(req);
 	if (req->io_done)
 		req->io_done(req);
-	return -ENODATA;
+	cachefiles_put_kiocb(ki);
 }
 
 /*
  * Initiate a write to the cache.
  */
-int cachefiles_write(struct fscache_object *object,
+int cachefiles_write(struct fscache_object *obj,
 		     struct fscache_io_request *req,
 		     struct iov_iter *iter)
 {
-	req->error = -ENOBUFS;
+	struct cachefiles_object *object =
+		container_of(obj, struct cachefiles_object, fscache);
+	struct cachefiles_kiocb *ki;
+	struct inode *inode;
+	struct file *file = object->backing_file;
+	ssize_t ret = -ENOBUFS;
+
+	_enter("%pD,%li,%llx,%llx/%llx",
+	       file, file_inode(file)->i_ino, req->pos, req->len, i_size_read(file->f_inode));
+
+	ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL);
+	if (!ki)
+		goto presubmission_error;
+
+	refcount_set(&ki->ki_refcnt, 2);
+	ki->iocb.ki_filp	= get_file(file);
+	ki->iocb.ki_pos		= req->pos;
+	ki->iocb.ki_flags	= IOCB_DIRECT | IOCB_WRITE;
+	ki->iocb.ki_hint	= ki_hint_validate(file_write_hint(file));
+	ki->iocb.ki_ioprio	= get_current_ioprio();
+	ki->req			= req;
+
+	if (req->io_done)
+		ki->iocb.ki_complete = cachefiles_write_complete;
+
+	ret = rw_verify_area(WRITE, file, &ki->iocb.ki_pos, iov_iter_count(iter));
+	if (ret < 0)
+		goto presubmission_error_free;
+
+	/* Open-code file_start_write here to grab freeze protection, which
+	 * will be released by another thread in aio_complete_rw().  Fool
+	 * lockdep by telling it the lock got released so that it doesn't
+	 * complain about the held lock when we return to userspace.
+	 */
+	inode = file_inode(file);
+	__sb_start_write(inode->i_sb, SB_FREEZE_WRITE, true);
+	__sb_writers_release(inode->i_sb, SB_FREEZE_WRITE);
+
+	fscache_get_io_request(req);
+	ret = call_write_iter(file, &ki->iocb, iter);
+	switch (ret) {
+	case -EIOCBQUEUED:
+		goto in_progress;
+
+	case -ERESTARTSYS:
+	case -ERESTARTNOINTR:
+	case -ERESTARTNOHAND:
+	case -ERESTART_RESTARTBLOCK:
+		/* There's no easy way to restart the syscall since other AIO's
+		 * may be already running. Just fail this IO with EINTR.
+		 */
+		ret = -EINTR;
+		/* Fall through */
+	default:
+		cachefiles_write_complete(&ki->iocb, ret, 0);
+		if (ret > 0)
+			ret = 0;
+		break;
+	}
+
+in_progress:
+	cachefiles_put_kiocb(ki);
+	_leave(" = %zd", ret);
+	return ret;
+
+presubmission_error_free:
+	fput(file);
+	kfree(ki);
+presubmission_error:
+	req->error = -ENOMEM;
 	if (req->io_done)
 		req->io_done(req);
-	return -ENOBUFS;
+	return -ENOMEM;
 }
 
 /*

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 23/32] cachefiles: Add I/O tracepoints
  2020-07-13 16:30 ` David Howells
@ 2020-07-13 16:34   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:34 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel


---

 fs/cachefiles/interface.c         |   16 +++--
 fs/cachefiles/io.c                |    2 +
 include/trace/events/cachefiles.h |  123 +++++++++++++++++++++++++++++++++++++
 3 files changed, 136 insertions(+), 5 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 054d5cc794b5..e73de62d0e73 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -160,7 +160,8 @@ static void cachefiles_update_object(struct fscache_object *_object)
 	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
 	const struct cred *saved_cred;
-	loff_t object_size;
+	struct inode *inode;
+	loff_t object_size, i_size;
 	int ret;
 
 	_enter("{OBJ%x}", _object->debug_id);
@@ -172,12 +173,15 @@ static void cachefiles_update_object(struct fscache_object *_object)
 	cachefiles_begin_secure(cache, &saved_cred);
 
 	object_size = object->fscache.cookie->object_size;
-	if (i_size_read(d_inode(object->dentry)) > object_size) {
+	inode = d_inode(object->dentry);
+	i_size = i_size_read(inode);
+	if (i_size > object_size) {
 		struct path path = {
 			.mnt	= cache->mnt,
 			.dentry	= object->dentry
 		};
-		_debug("trunc %llx -> %llx", i_size_read(d_inode(object->dentry)), object_size);
+		_debug("trunc %llx -> %llx", i_size, object_size);
+		trace_cachefiles_trunc(object, inode, i_size, object_size);
 		ret = vfs_truncate(&path, object_size);
 		if (ret < 0) {
 			cachefiles_io_error_obj(object, "Trunc-to-size failed");
@@ -186,8 +190,10 @@ static void cachefiles_update_object(struct fscache_object *_object)
 		}
 
 		object_size = round_up(object_size, CACHEFILES_DIO_BLOCK_SIZE);
-		_debug("trunc %llx -> %llx", i_size_read(d_inode(object->dentry)), object_size);
-		if (i_size_read(d_inode(object->dentry)) < object_size) {
+		i_size = i_size_read(inode);
+		_debug("trunc %llx -> %llx", i_size, object_size);
+		if (i_size < object_size) {
+			trace_cachefiles_trunc(object, inode, i_size, object_size);
 			ret = vfs_truncate(&path, object_size);
 			if (ret < 0) {
 				cachefiles_io_error_obj(object, "Trunc-to-dio-size failed");
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 42e0d620d778..268e6f69ba9c 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -88,6 +88,7 @@ int cachefiles_read(struct fscache_object *obj,
 		goto presubmission_error_free;
 
 	fscache_get_io_request(req);
+	trace_cachefiles_read(object, file_inode(file), req);
 	ret = call_read_iter(file, &ki->iocb, iter);
 	switch (ret) {
 	case -EIOCBQUEUED:
@@ -198,6 +199,7 @@ int cachefiles_write(struct fscache_object *obj,
 	__sb_writers_release(inode->i_sb, SB_FREEZE_WRITE);
 
 	fscache_get_io_request(req);
+	trace_cachefiles_write(object, inode, req);
 	ret = call_write_iter(file, &ki->iocb, iter);
 	switch (ret) {
 	case -EIOCBQUEUED:
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index e7af1d683009..d83568e8fee8 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -351,6 +351,129 @@ TRACE_EVENT(cachefiles_coherency,
 		      __entry->content)
 	    );
 
+TRACE_EVENT(cachefiles_read,
+	    TP_PROTO(struct cachefiles_object *obj,
+		     struct inode *backer,
+		     struct fscache_io_request *req),
+
+	    TP_ARGS(obj, backer, req),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(unsigned int,			backer	)
+		    __field(unsigned int,			len	)
+		    __field(loff_t,				pos	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->backer	= backer->i_ino;
+		    __entry->pos	= req->pos;
+		    __entry->len	= req->len;
+			   ),
+
+	    TP_printk("o=%08x b=%08x p=%llx l=%x",
+		      __entry->obj,
+		      __entry->backer,
+		      __entry->pos,
+		      __entry->len)
+	    );
+
+TRACE_EVENT(cachefiles_write,
+	    TP_PROTO(struct cachefiles_object *obj,
+		     struct inode *backer,
+		     struct fscache_io_request *req),
+
+	    TP_ARGS(obj, backer, req),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(unsigned int,			backer	)
+		    __field(unsigned int,			len	)
+		    __field(loff_t,				pos	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->backer	= backer->i_ino;
+		    __entry->pos	= req->pos;
+		    __entry->len	= req->len;
+			   ),
+
+	    TP_printk("o=%08x b=%08x p=%llx l=%x",
+		      __entry->obj,
+		      __entry->backer,
+		      __entry->pos,
+		      __entry->len)
+	    );
+
+TRACE_EVENT(cachefiles_trunc,
+	    TP_PROTO(struct cachefiles_object *obj, struct inode *backer,
+		     loff_t from, loff_t to),
+
+	    TP_ARGS(obj, backer, from, to),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(unsigned int,			backer	)
+		    __field(loff_t,				from	)
+		    __field(loff_t,				to	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->backer	= backer->i_ino;
+		    __entry->from	= from;
+		    __entry->to		= to;
+			   ),
+
+	    TP_printk("o=%08x b=%08x l=%llx->%llx",
+		      __entry->obj,
+		      __entry->backer,
+		      __entry->from,
+		      __entry->to)
+	    );
+
+TRACE_EVENT(cachefiles_tmpfile,
+	    TP_PROTO(struct cachefiles_object *obj, struct inode *backer),
+
+	    TP_ARGS(obj, backer),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(unsigned int,			backer	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->backer	= backer->i_ino;
+			   ),
+
+	    TP_printk("o=%08x b=%08x",
+		      __entry->obj,
+		      __entry->backer)
+	    );
+
+TRACE_EVENT(cachefiles_link,
+	    TP_PROTO(struct cachefiles_object *obj, struct inode *backer),
+
+	    TP_ARGS(obj, backer),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(unsigned int,			backer	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->backer	= backer->i_ino;
+			   ),
+
+	    TP_printk("o=%08x b=%08x",
+		      __entry->obj,
+		      __entry->backer)
+	    );
+
 #endif /* _TRACE_CACHEFILES_H */
 
 /* This part must be outside protection */



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 23/32] cachefiles: Add I/O tracepoints
@ 2020-07-13 16:34   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:34 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs


---

 fs/cachefiles/interface.c         |   16 +++--
 fs/cachefiles/io.c                |    2 +
 include/trace/events/cachefiles.h |  123 +++++++++++++++++++++++++++++++++++++
 3 files changed, 136 insertions(+), 5 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 054d5cc794b5..e73de62d0e73 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -160,7 +160,8 @@ static void cachefiles_update_object(struct fscache_object *_object)
 	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
 	const struct cred *saved_cred;
-	loff_t object_size;
+	struct inode *inode;
+	loff_t object_size, i_size;
 	int ret;
 
 	_enter("{OBJ%x}", _object->debug_id);
@@ -172,12 +173,15 @@ static void cachefiles_update_object(struct fscache_object *_object)
 	cachefiles_begin_secure(cache, &saved_cred);
 
 	object_size = object->fscache.cookie->object_size;
-	if (i_size_read(d_inode(object->dentry)) > object_size) {
+	inode = d_inode(object->dentry);
+	i_size = i_size_read(inode);
+	if (i_size > object_size) {
 		struct path path = {
 			.mnt	= cache->mnt,
 			.dentry	= object->dentry
 		};
-		_debug("trunc %llx -> %llx", i_size_read(d_inode(object->dentry)), object_size);
+		_debug("trunc %llx -> %llx", i_size, object_size);
+		trace_cachefiles_trunc(object, inode, i_size, object_size);
 		ret = vfs_truncate(&path, object_size);
 		if (ret < 0) {
 			cachefiles_io_error_obj(object, "Trunc-to-size failed");
@@ -186,8 +190,10 @@ static void cachefiles_update_object(struct fscache_object *_object)
 		}
 
 		object_size = round_up(object_size, CACHEFILES_DIO_BLOCK_SIZE);
-		_debug("trunc %llx -> %llx", i_size_read(d_inode(object->dentry)), object_size);
-		if (i_size_read(d_inode(object->dentry)) < object_size) {
+		i_size = i_size_read(inode);
+		_debug("trunc %llx -> %llx", i_size, object_size);
+		if (i_size < object_size) {
+			trace_cachefiles_trunc(object, inode, i_size, object_size);
 			ret = vfs_truncate(&path, object_size);
 			if (ret < 0) {
 				cachefiles_io_error_obj(object, "Trunc-to-dio-size failed");
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 42e0d620d778..268e6f69ba9c 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -88,6 +88,7 @@ int cachefiles_read(struct fscache_object *obj,
 		goto presubmission_error_free;
 
 	fscache_get_io_request(req);
+	trace_cachefiles_read(object, file_inode(file), req);
 	ret = call_read_iter(file, &ki->iocb, iter);
 	switch (ret) {
 	case -EIOCBQUEUED:
@@ -198,6 +199,7 @@ int cachefiles_write(struct fscache_object *obj,
 	__sb_writers_release(inode->i_sb, SB_FREEZE_WRITE);
 
 	fscache_get_io_request(req);
+	trace_cachefiles_write(object, inode, req);
 	ret = call_write_iter(file, &ki->iocb, iter);
 	switch (ret) {
 	case -EIOCBQUEUED:
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index e7af1d683009..d83568e8fee8 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -351,6 +351,129 @@ TRACE_EVENT(cachefiles_coherency,
 		      __entry->content)
 	    );
 
+TRACE_EVENT(cachefiles_read,
+	    TP_PROTO(struct cachefiles_object *obj,
+		     struct inode *backer,
+		     struct fscache_io_request *req),
+
+	    TP_ARGS(obj, backer, req),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(unsigned int,			backer	)
+		    __field(unsigned int,			len	)
+		    __field(loff_t,				pos	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->backer	= backer->i_ino;
+		    __entry->pos	= req->pos;
+		    __entry->len	= req->len;
+			   ),
+
+	    TP_printk("o=%08x b=%08x p=%llx l=%x",
+		      __entry->obj,
+		      __entry->backer,
+		      __entry->pos,
+		      __entry->len)
+	    );
+
+TRACE_EVENT(cachefiles_write,
+	    TP_PROTO(struct cachefiles_object *obj,
+		     struct inode *backer,
+		     struct fscache_io_request *req),
+
+	    TP_ARGS(obj, backer, req),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(unsigned int,			backer	)
+		    __field(unsigned int,			len	)
+		    __field(loff_t,				pos	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->backer	= backer->i_ino;
+		    __entry->pos	= req->pos;
+		    __entry->len	= req->len;
+			   ),
+
+	    TP_printk("o=%08x b=%08x p=%llx l=%x",
+		      __entry->obj,
+		      __entry->backer,
+		      __entry->pos,
+		      __entry->len)
+	    );
+
+TRACE_EVENT(cachefiles_trunc,
+	    TP_PROTO(struct cachefiles_object *obj, struct inode *backer,
+		     loff_t from, loff_t to),
+
+	    TP_ARGS(obj, backer, from, to),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(unsigned int,			backer	)
+		    __field(loff_t,				from	)
+		    __field(loff_t,				to	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->backer	= backer->i_ino;
+		    __entry->from	= from;
+		    __entry->to		= to;
+			   ),
+
+	    TP_printk("o=%08x b=%08x l=%llx->%llx",
+		      __entry->obj,
+		      __entry->backer,
+		      __entry->from,
+		      __entry->to)
+	    );
+
+TRACE_EVENT(cachefiles_tmpfile,
+	    TP_PROTO(struct cachefiles_object *obj, struct inode *backer),
+
+	    TP_ARGS(obj, backer),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(unsigned int,			backer	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->backer	= backer->i_ino;
+			   ),
+
+	    TP_printk("o=%08x b=%08x",
+		      __entry->obj,
+		      __entry->backer)
+	    );
+
+TRACE_EVENT(cachefiles_link,
+	    TP_PROTO(struct cachefiles_object *obj, struct inode *backer),
+
+	    TP_ARGS(obj, backer),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(unsigned int,			backer	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->backer	= backer->i_ino;
+			   ),
+
+	    TP_printk("o=%08x b=%08x",
+		      __entry->obj,
+		      __entry->backer)
+	    );
+
 #endif /* _TRACE_CACHEFILES_H */
 
 /* This part must be outside protection */

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 24/32] fscache: Add read helper
  2020-07-13 16:30 ` David Howells
@ 2020-07-13 16:35   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:35 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Add a trio of helper functions:

	fscache_read_helper_page_list();
	fscache_read_helper_locked_page();
	fscache_read_helper_for_write();

to do the work of shaping read requests, attempting to read from the cache,
issuing or reissuing requests to the filesystem to pass to the server and
writing back to the filesystem.

The filesystem passes in a prepared request descriptor with the fscache
descriptor embedded in it to one of the helper functions.  The caller also
indicates which page(s) it is interested in and provides some operations to
issue reads and manage the request descriptor.

The helper is placed into its own module, fsinfo_support.ko, which must be
enabled unconditionally by any filesystem which wishes to use the helper
even if CONFIG_FSCACHE=no.  This module is selected by
CONFIG_FSCACHE_SUPPORT.  About half of the code is optimised away by
CONFIG_FSCACHE=no.

Also add a tracepoint to track calls.  A set of 'notes' are taken to record
the path through the function and this is dumped into the trace.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/Makefile                            |    2 
 fs/fscache/Kconfig                     |    4 
 fs/fscache/Makefile                    |    3 
 fs/fscache/internal.h                  |    8 
 fs/fscache/main.c                      |    1 
 fs/fscache/read_helper.c               |  656 ++++++++++++++++++++++++++++++++
 include/linux/fscache.h                |   26 +
 include/trace/events/fscache_support.h |   91 ++++
 8 files changed, 789 insertions(+), 2 deletions(-)
 create mode 100644 fs/fscache/read_helper.c
 create mode 100644 include/trace/events/fscache_support.h

diff --git a/fs/Makefile b/fs/Makefile
index 2ce5112b02c8..8b0a5b5b1d86 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -68,7 +68,7 @@ obj-$(CONFIG_PROFILING)		+= dcookies.o
 obj-$(CONFIG_DLM)		+= dlm/
  
 # Do not add any filesystems before this line
-obj-$(CONFIG_FSCACHE)		+= fscache/
+obj-$(CONFIG_FSCACHE_SUPPORT)	+= fscache/
 obj-$(CONFIG_REISERFS_FS)	+= reiserfs/
 obj-$(CONFIG_EXT4_FS)		+= ext4/
 # We place ext4 before ext2 so that clean ext3 root fs's do NOT mount using the
diff --git a/fs/fscache/Kconfig b/fs/fscache/Kconfig
index ce6f731065d0..369c12ef0167 100644
--- a/fs/fscache/Kconfig
+++ b/fs/fscache/Kconfig
@@ -1,7 +1,11 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
+config FSCACHE_SUPPORT
+	tristate "Support for local caching of network filesystems"
+
 config FSCACHE
 	tristate "General filesystem local caching manager"
+	depends on FSCACHE_SUPPORT
 	help
 	  This option enables a generic filesystem caching manager that can be
 	  used by various network and other filesystems to cache data locally.
diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
index 3caf66810e7b..0a5c8c654942 100644
--- a/fs/fscache/Makefile
+++ b/fs/fscache/Makefile
@@ -20,3 +20,6 @@ fscache-$(CONFIG_FSCACHE_HISTOGRAM) += histogram.o
 fscache-$(CONFIG_FSCACHE_OBJECT_LIST) += object-list.o
 
 obj-$(CONFIG_FSCACHE) := fscache.o
+
+fscache_support-y := read_helper.o
+obj-$(CONFIG_FSCACHE_SUPPORT) += fscache_support.o
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index a70c1a612309..2674438ccafd 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -30,6 +30,8 @@
 #include <linux/sched.h>
 #include <linux/seq_file.h>
 
+#if IS_ENABLED(CONFIG_FSCACHE)
+
 #define FSCACHE_MIN_THREADS	4
 #define FSCACHE_MAX_THREADS	32
 
@@ -266,6 +268,12 @@ void fscache_update_aux(struct fscache_cookie *cookie,
 		cookie->object_size = *object_size;
 }
 
+#else /* CONFIG_FSCACHE */
+
+#define fscache_op_wq system_wq
+
+#endif /* CONFIG_FSCACHE */
+
 /*****************************************************************************/
 /*
  * debug tracing
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index b2691439377b..ac4fd4d59479 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -39,6 +39,7 @@ MODULE_PARM_DESC(fscache_debug,
 
 struct kobject *fscache_root;
 struct workqueue_struct *fscache_op_wq;
+EXPORT_SYMBOL(fscache_op_wq);
 
 /* these values serve as lower bounds, will be adjusted in fscache_init() */
 static unsigned fscache_object_max_active = 4;
diff --git a/fs/fscache/read_helper.c b/fs/fscache/read_helper.c
new file mode 100644
index 000000000000..62fed27aa938
--- /dev/null
+++ b/fs/fscache/read_helper.c
@@ -0,0 +1,656 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Read helper.
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#define FSCACHE_DEBUG_LEVEL OPERATION
+#include <linux/module.h>
+#include <linux/export.h>
+#include <linux/slab.h>
+#include <linux/uio.h>
+#include <linux/task_io_accounting_ops.h>
+#include <linux/fscache-cache.h>
+#include "internal.h"
+#define CREATE_TRACE_POINTS
+#include <trace/events/fscache_support.h>
+
+MODULE_DESCRIPTION("FS Cache Manager Support");
+MODULE_AUTHOR("Red Hat, Inc.");
+MODULE_LICENSE("GPL");
+
+#define FSCACHE_RHLP_NOTE_READ_FROM_CACHE	FSCACHE_READ_FROM_CACHE
+#define FSCACHE_RHLP_NOTE_WRITE_TO_CACHE	FSCACHE_WRITE_TO_CACHE
+#define FSCACHE_RHLP_NOTE_FILL_WITH_ZERO	FSCACHE_FILL_WITH_ZERO
+#define FSCACHE_RHLP_NOTE_READ_FOR_WRITE	0x00000100 /* Type: FSCACHE_READ_FOR_WRITE */
+#define FSCACHE_RHLP_NOTE_READ_LOCKED_PAGE	0x00000200 /* Type: FSCACHE_READ_LOCKED_PAGE */
+#define FSCACHE_RHLP_NOTE_READ_PAGE_LIST	0x00000300 /* Type: FSCACHE_READ_PAGE_LIST */
+#define FSCACHE_RHLP_NOTE_LIST_NOMEM		0x00001000 /* Page list: ENOMEM */
+#define FSCACHE_RHLP_NOTE_LIST_U2D		0x00002000 /* Page list: page uptodate */
+#define FSCACHE_RHLP_NOTE_LIST_ERROR		0x00004000 /* Page list: add error */
+#define FSCACHE_RHLP_NOTE_TRAILER_ADD		0x00010000 /* Trailer: Creating */
+#define FSCACHE_RHLP_NOTE_TRAILER_NOMEM		0x00020000 /* Trailer: ENOMEM */
+#define FSCACHE_RHLP_NOTE_TRAILER_U2D		0x00040000 /* Trailer: Uptodate */
+#define FSCACHE_RHLP_NOTE_U2D_IN_PREFACE	0x00100000 /* Uptodate page in preface */
+#define FSCACHE_RHLP_NOTE_UNDERSIZED		0x00200000 /* Undersized block */
+#define FSCACHE_RHLP_NOTE_AFTER_EOF		0x00400000 /* After EOF */
+#define FSCACHE_RHLP_NOTE_DO_WRITE_TO_CACHE	0x00800000 /* Actually write to the cache */
+#define FSCACHE_RHLP_NOTE_CANCELLED		0x80000000 /* Operation cancelled by netfs */
+
+unsigned fscache_rh_debug;
+module_param_named(debug, fscache_rh_debug, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(fscache_rh_debug, "FS-Cache read helper debugging mask");
+#define fscache_debug fscache_rh_debug
+
+enum fscache_read_type {
+	FSCACHE_READ_PAGE_LIST,		/* Read the list of pages (readpages) */
+	FSCACHE_READ_LOCKED_PAGE,	/* requested_page is added and locked */
+	FSCACHE_READ_FOR_WRITE,		/* This read is a prelude to write_begin */
+};
+
+static void fscache_read_from_server(struct fscache_io_request *req)
+{
+	req->ops->issue_op(req);
+}
+
+/*
+ * Deal with the completion of writing the data to the cache.  We have to clear
+ * the PG_fscache bits on the pages involved and releases the caller's ref.
+ */
+static void fscache_read_copy_done(struct fscache_io_request *req)
+{
+	struct page *page;
+	pgoff_t index = req->pos >> PAGE_SHIFT;
+	pgoff_t last = index + req->nr_pages - 1;
+
+	XA_STATE(xas, &req->mapping->i_pages, index);
+
+	_enter("%lx,%x,%llx", index, req->nr_pages, req->transferred);
+
+	/* Clear PG_fscache on the pages that were being written out. */
+	rcu_read_lock();
+	xas_for_each(&xas, page, last) {
+		BUG_ON(xa_is_value(page));
+		BUG_ON(PageCompound(page));
+
+		unlock_page_fscache(page);
+	}
+	rcu_read_unlock();
+}
+
+/*
+ * Write a completed read request to the cache.
+ */
+static void fscache_do_read_copy_to_cache(struct work_struct *work)
+{
+	struct fscache_io_request *req =
+		container_of(work, struct fscache_io_request, work);
+	struct iov_iter iter;
+
+	_enter("");
+
+	iov_iter_mapping(&iter, WRITE, req->mapping, req->pos,
+			 round_up(req->len, req->dio_block_size));
+
+	req->io_done = fscache_read_copy_done;
+	fscache_write(req, &iter);
+	fscache_put_io_request(req);
+}
+
+static void fscache_read_copy_to_cache(struct fscache_io_request *req)
+{
+	fscache_get_io_request(req);
+
+	if (!in_softirq())
+		return fscache_do_read_copy_to_cache(&req->work);
+
+	BUG_ON(work_pending(&req->work));
+	INIT_WORK(&req->work, fscache_do_read_copy_to_cache);
+	if (!queue_work(fscache_op_wq, &req->work))
+		BUG();
+}
+
+/*
+ * Clear the unread part of the file on a short read.
+ */
+static void fscache_clear_unread(struct fscache_io_request *req)
+{
+	struct iov_iter iter;
+
+	iov_iter_mapping(&iter, WRITE, req->mapping,
+			 req->pos + req->transferred,
+			 req->len - req->transferred);
+
+	_debug("clear %zx @%llx", iov_iter_count(&iter), iter.mapping_start);
+
+	iov_iter_zero(iov_iter_count(&iter), &iter);
+}
+
+/*
+ * Handle completion of a read operation.  This may be called in softirq
+ * context.
+ */
+static void fscache_read_done(struct fscache_io_request *req)
+{
+	struct page *page;
+	pgoff_t start = req->pos >> PAGE_SHIFT;
+	pgoff_t last = start + req->nr_pages - 1;
+
+	XA_STATE(xas, &req->mapping->i_pages, start);
+
+	_enter("%lx,%x,%llx,%d",
+	       start, req->nr_pages, req->transferred, req->error);
+
+	if (req->transferred < req->len)
+		fscache_clear_unread(req);
+
+	if (!test_bit(FSCACHE_IO_DONT_UNLOCK_PAGES, &req->flags)) {
+		rcu_read_lock();
+		xas_for_each(&xas, page, last) {
+			if (test_bit(FSCACHE_IO_WRITE_TO_CACHE, &req->flags))
+				SetPageFsCache(page);
+			if (page == req->no_unlock_page)
+				SetPageUptodate(page);
+			else
+				page_endio(page, false, 0);
+			put_page(page);
+		}
+		rcu_read_unlock();
+	}
+
+	task_io_account_read(req->transferred);
+	req->ops->done(req);
+	if (test_and_clear_bit(FSCACHE_IO_READ_IN_PROGRESS, &req->flags))
+		wake_up_bit(&req->flags, FSCACHE_IO_READ_IN_PROGRESS);
+
+	if (test_bit(FSCACHE_IO_WRITE_TO_CACHE, &req->flags))
+		fscache_read_copy_to_cache(req);
+}
+
+/*
+ * Reissue the read against the server.
+ */
+static void fscache_reissue_read(struct work_struct *work)
+{
+	struct fscache_io_request *req =
+		container_of(work, struct fscache_io_request, work);
+
+	_debug("DOWNLOAD: %llu", req->len);
+
+	req->io_done = fscache_read_done;
+	fscache_read_from_server(req);
+	fscache_put_io_request(req);
+}
+
+/*
+ * Handle completion of a read from cache operation.  If the read failed, we
+ * need to reissue the request against the server.  We might, however, be
+ * called in softirq mode and need to punt.
+ */
+static void fscache_file_read_maybe_reissue(struct fscache_io_request *req)
+{
+	_enter("%d", req->error);
+
+	if (req->error == 0) {
+		fscache_read_done(req);
+	} else {
+		INIT_WORK(&req->work, fscache_reissue_read);
+		fscache_get_io_request(req);
+		queue_work(fscache_op_wq, &req->work);
+	}
+}
+
+/*
+ * Issue a read against the cache.
+ */
+static void fscache_read_from_cache(struct fscache_io_request *req)
+{
+	struct iov_iter iter;
+
+	iov_iter_mapping(&iter, READ, req->mapping, req->pos, req->len);
+	fscache_read(req, &iter);
+}
+
+/*
+ * Discard the locks and page refs that we obtained on a sequence of pages.
+ */
+static void fscache_ignore_pages(struct address_space *mapping,
+				  pgoff_t start, pgoff_t end)
+{
+	struct page *page;
+
+	_enter("%lx,%lx", start, end);
+
+	if (end > start) {
+		XA_STATE(xas, &mapping->i_pages, start);
+
+		rcu_read_lock();
+		xas_for_each(&xas, page, end - 1) {
+			_debug("- ignore %lx", page->index);
+			BUG_ON(xa_is_value(page));
+			BUG_ON(PageCompound(page));
+
+			unlock_page(page);
+			put_page(page);
+		}
+		rcu_read_unlock();
+	}
+}
+
+/**
+ * fscache_read_helper - Helper to manage a read request
+ * @req: The initialised request structure to use
+ * @requested_page: Singular page to include (LOCKED_PAGE/FOR_WRITE)
+ * @pages: Unattached pages to include (PAGE_LIST)
+ * @page_to_be_written: The index of the primary page (FOR_WRITE)
+ * @max_pages: The maximum number of pages to read in one transaction
+ * @type: FSCACHE_READ_*
+ * @aop_flags: AOP_FLAG_*
+ *
+ * Read a sequence of pages appropriately sized for an fscache allocation
+ * block.  Pages are added at both ends and to fill in the gaps as appropriate
+ * to make it the right size.
+ *
+ * req->mapping should indicate the mapping to which the pages will be attached.
+ *
+ * The operations pointed to by req->ops will be used to issue or reissue a
+ * read against the server in case the cache is unavailable, incomplete or
+ * generates an error.  req->iter will be set up to point to the iterator
+ * representing the buffer to be filled in.
+ *
+ * A ref on @req is consumed eventually by this function or one of its
+ * eventually-dispatched callees.
+ */
+static int fscache_read_helper(struct fscache_io_request *req,
+			       struct page **requested_page,
+			       struct list_head *pages,
+			       pgoff_t page_to_be_written,
+			       pgoff_t max_pages,
+			       enum fscache_read_type type,
+			       unsigned int aop_flags)
+{
+	struct fscache_request_shape shape;
+	struct address_space *mapping = req->mapping;
+	struct page *page;
+	enum fscache_read_helper_trace what;
+	unsigned int notes;
+	pgoff_t eof, cursor, start;
+	loff_t new_size;
+	int ret;
+
+	shape.granularity	= 1;
+	shape.max_io_pages	= max_pages;
+	shape.i_size		= i_size_read(mapping->host);
+	shape.for_write		= false;
+
+	switch (type) {
+	case FSCACHE_READ_PAGE_LIST:
+		shape.proposed_start = lru_to_page(pages)->index;
+		shape.proposed_nr_pages =
+			lru_to_last_page(pages)->index - shape.proposed_start + 1;
+		break;
+
+	case FSCACHE_READ_LOCKED_PAGE:
+		shape.proposed_start = (*requested_page)->index;
+		shape.proposed_nr_pages = 1;
+		break;
+
+	case FSCACHE_READ_FOR_WRITE:
+		new_size = (loff_t)(page_to_be_written + 1) << PAGE_SHIFT;
+		if (new_size > shape.i_size)
+			shape.i_size = new_size;
+		shape.proposed_start = page_to_be_written;
+		shape.proposed_nr_pages = 1;
+		break;
+
+	default:
+		BUG();
+	}
+
+	_enter("%lx,%x", shape.proposed_start, shape.proposed_nr_pages);
+
+	eof = (shape.i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+	fscache_shape_request(req->cookie, &shape);
+	if (req->ops->reshape)
+		req->ops->reshape(req, &shape);
+	notes = shape.to_be_done;
+
+	req->dio_block_size = shape.dio_block_size;
+
+	start = cursor = shape.actual_start;
+
+	/* Add pages to the pagecache.  We keep the pages ref'd and locked
+	 * until the read is complete.  We may also need to add pages to both
+	 * sides of the request to make it up to the cache allocation granule
+	 * alignment and size.
+	 *
+	 * Note that it's possible for the file size to change whilst we're
+	 * doing this, but we rely on the server returning less than we asked
+	 * for if the file shrank.  We also rely on this to deal with a partial
+	 * page at the end of the file.
+	 *
+	 * If we're going to end up loading from the server and writing to the
+	 * cache, we start by inserting blank pages before the first page being
+	 * examined.  If we can fetch from the cache or we're not going to
+	 * write to the cache, it's unnecessary.
+	 */
+	if (notes & FSCACHE_RHLP_NOTE_WRITE_TO_CACHE) {
+		notes |= FSCACHE_RHLP_NOTE_DO_WRITE_TO_CACHE;
+		while (cursor < shape.proposed_start) {
+			page = find_or_create_page(mapping, cursor,
+						   readahead_gfp_mask(mapping));
+			if (!page)
+				goto nomem;
+			if (!PageUptodate(page)) {
+				req->nr_pages++; /* Add to the reading list */
+				cursor++;
+				continue;
+			}
+
+			/* There's an up-to-date page in the preface - just
+			 * fetch the requested pages and skip saving to the
+			 * cache.
+			 */
+			notes |= FSCACHE_RHLP_NOTE_U2D_IN_PREFACE;
+			notes &= ~FSCACHE_RHLP_NOTE_DO_WRITE_TO_CACHE;
+			fscache_ignore_pages(mapping, start, cursor + 1);
+			start = cursor = shape.proposed_start;
+			req->nr_pages = 0;
+			break;
+		}
+		page = NULL;
+	} else {
+		notes &= ~FSCACHE_RHLP_NOTE_DO_WRITE_TO_CACHE;
+		start = cursor = shape.proposed_start;
+		req->nr_pages = 0;
+	}
+
+	switch (type) {
+	case FSCACHE_READ_FOR_WRITE:
+		/* We're doing a prefetch for a write on a single page.  We get
+		 * or create the requested page if we weren't given it and lock
+		 * it.
+		 */
+		notes |= FSCACHE_RHLP_NOTE_READ_FOR_WRITE;
+		if (*requested_page) {
+			_debug("prewrite req %lx", cursor);
+			page = *requested_page;
+			ret = -ERESTARTSYS;
+			if (lock_page_killable(page) < 0)
+				goto dont;
+		} else {
+			_debug("prewrite new %lx %lx", cursor, eof);
+			page = grab_cache_page_write_begin(mapping, shape.proposed_start,
+							   aop_flags);
+			if (!page)
+				goto nomem;
+			*requested_page = page;
+		}
+
+		if (PageUptodate(page)) {
+			notes |= FSCACHE_RHLP_NOTE_LIST_U2D;
+
+			trace_fscache_read_helper(req->cookie,
+						  start, start + req->nr_pages,
+						  notes, fscache_read_helper_race);
+			req->ops->done(req);
+			ret = 0;
+			goto cancelled;
+		}
+
+		get_page(page);
+		req->no_unlock_page = page;
+		req->nr_pages++;
+		cursor++;
+		page = NULL;
+		ret = 0;
+		break;
+
+	case FSCACHE_READ_LOCKED_PAGE:
+		/* We've got a single page preattached to the inode and locked.
+		 * Get our own ref on it.
+		 */
+		_debug("locked");
+		notes |= FSCACHE_RHLP_NOTE_READ_LOCKED_PAGE;
+		get_page(*requested_page);
+		req->nr_pages++;
+		cursor++;
+		ret = 0;
+		break;
+
+	case FSCACHE_READ_PAGE_LIST:
+		/* We've been given a contiguous list of pages to add. */
+		notes |= FSCACHE_RHLP_NOTE_READ_PAGE_LIST;
+		do {
+			_debug("given %lx", cursor);
+
+			page = lru_to_page(pages);
+			if (WARN_ON(page->index != cursor))
+				break;
+
+			list_del(&page->lru);
+
+			ret = add_to_page_cache_lru(page, mapping, cursor,
+						    readahead_gfp_mask(mapping));
+			switch (ret) {
+			case 0:
+				/* Add to the reading list */
+				req->nr_pages++;
+				cursor++;
+				page = NULL;
+				break;
+
+			case -EEXIST:
+				put_page(page);
+
+				_debug("conflict %lx %d", cursor, ret);
+				page = find_or_create_page(mapping, cursor,
+							   readahead_gfp_mask(mapping));
+				if (!page) {
+					notes |= FSCACHE_RHLP_NOTE_LIST_NOMEM;
+					goto stop;
+				}
+
+				if (PageUptodate(page)) {
+					unlock_page(page);
+					put_page(page); /* Avoid overwriting */
+					ret = 0;
+					notes |= FSCACHE_RHLP_NOTE_LIST_U2D;
+					goto stop;
+				}
+
+				req->nr_pages++; /* Add to the reading list */
+				cursor++;
+				break;
+
+			default:
+				_debug("add fail %lx %d", cursor, ret);
+				put_page(page);
+				page = NULL;
+				notes |= FSCACHE_RHLP_NOTE_LIST_ERROR;
+				goto stop;
+			}
+
+			/* Trim the fetch to the cache granularity so we don't
+			 * get a chain-failure of blocks being unable to be
+			 * used because the previous uncached read spilt over.
+			 */
+			if ((notes & FSCACHE_RHLP_NOTE_U2D_IN_PREFACE) &&
+			    cursor == shape.actual_start + shape.granularity)
+				break;
+
+		} while (!list_empty(pages) && req->nr_pages < shape.actual_nr_pages);
+		ret = 0;
+		break;
+
+	default:
+		BUG();
+	}
+
+	/* If we're going to be writing to the cache, insert pages after the
+	 * requested block to make up the numbers.
+	 */
+	if (notes & FSCACHE_RHLP_NOTE_DO_WRITE_TO_CACHE) {
+		notes |= FSCACHE_RHLP_NOTE_TRAILER_ADD;
+		while (req->nr_pages < shape.actual_nr_pages) {
+			_debug("after %lx", cursor);
+			page = find_or_create_page(mapping, cursor,
+						   readahead_gfp_mask(mapping));
+			if (!page) {
+				notes |= FSCACHE_RHLP_NOTE_TRAILER_NOMEM;
+				goto stop;
+			}
+			if (PageUptodate(page)) {
+				unlock_page(page);
+				put_page(page); /* Avoid overwriting */
+				notes |= FSCACHE_RHLP_NOTE_TRAILER_U2D;
+				goto stop;
+			}
+
+			req->nr_pages++; /* Add to the reading list */
+			cursor++;
+		}
+	}
+
+stop:
+	_debug("have %u", req->nr_pages);
+	if (req->nr_pages == 0)
+		goto dont;
+
+	if (cursor <= shape.proposed_start) {
+		_debug("v.short");
+		goto nomem_unlock; /* We wouldn't've included the first page */
+	}
+
+submit_anyway:
+	if ((notes & FSCACHE_RHLP_NOTE_DO_WRITE_TO_CACHE) &&
+	    req->nr_pages < shape.actual_nr_pages) {
+		/* The request is short of what we need to be able to cache the
+		 * entire set of pages and the trailer, so trim it to cache
+		 * granularity if we can without reducing it to nothing.
+		 */
+		unsigned int down_to = round_down(req->nr_pages, shape.granularity);
+		_debug("short %u", down_to);
+
+		notes |= FSCACHE_RHLP_NOTE_UNDERSIZED;
+
+		if (down_to > 0) {
+			fscache_ignore_pages(mapping, shape.actual_start + down_to, cursor);
+			req->nr_pages = down_to;
+		} else {
+			notes &= ~FSCACHE_RHLP_NOTE_DO_WRITE_TO_CACHE;
+		}
+	}
+
+	req->len = req->nr_pages * PAGE_SIZE;
+	req->pos = start;
+	req->pos <<= PAGE_SHIFT;
+
+	if (start >= eof) {
+		notes |= FSCACHE_RHLP_NOTE_AFTER_EOF;
+		what = fscache_read_helper_skip;
+	} else if (notes & FSCACHE_RHLP_NOTE_FILL_WITH_ZERO) {
+		what = fscache_read_helper_zero;
+	} else if (notes & FSCACHE_RHLP_NOTE_READ_FROM_CACHE) {
+		what = fscache_read_helper_read;
+	} else {
+		what = fscache_read_helper_download;
+	}
+
+	ret = 0;
+	if (req->ops->is_req_valid) {
+		/* Allow the netfs to decide if the request is still valid
+		 * after all the pages are locked.
+		 */
+		ret = req->ops->is_req_valid(req);
+		if (ret < 0)
+			notes |= FSCACHE_RHLP_NOTE_CANCELLED;
+	}
+
+	trace_fscache_read_helper(req->cookie, start, start + req->nr_pages,
+				  notes, what);
+
+	if (notes & FSCACHE_RHLP_NOTE_CANCELLED)
+		goto cancelled;
+
+	if (notes & FSCACHE_RHLP_NOTE_DO_WRITE_TO_CACHE)
+		__set_bit(FSCACHE_IO_WRITE_TO_CACHE, &req->flags);
+
+	__set_bit(FSCACHE_IO_READ_IN_PROGRESS, &req->flags);
+
+	switch (what) {
+	case fscache_read_helper_skip:
+		/* The read is entirely beyond the end of the file, so skip the
+		 * actual operation and let the done handler deal with clearing
+		 * the pages.
+		 */
+		_debug("SKIP READ: %llu", req->len);
+		fscache_read_done(req);
+		break;
+	case fscache_read_helper_zero:
+		_debug("ZERO READ: %llu", req->len);
+		fscache_read_done(req);
+		break;
+	case fscache_read_helper_read:
+		req->io_done = fscache_file_read_maybe_reissue;
+		fscache_read_from_cache(req);
+		break;
+	case fscache_read_helper_download:
+		_debug("DOWNLOAD: %llu", req->len);
+		req->io_done = fscache_read_done;
+		fscache_read_from_server(req);
+		break;
+	default:
+		BUG();
+	}
+
+	_leave(" = 0");
+	return 0;
+
+nomem:
+	if (cursor > shape.proposed_start)
+		goto submit_anyway;
+nomem_unlock:
+	ret = -ENOMEM;
+cancelled:
+	fscache_ignore_pages(mapping, start, cursor);
+dont:
+	_leave(" = %d", ret);
+	return ret;
+}
+
+int fscache_read_helper_page_list(struct fscache_io_request *req,
+				  struct list_head *pages,
+				  pgoff_t max_pages)
+{
+	ASSERT(pages);
+	ASSERT(!list_empty(pages));
+	return fscache_read_helper(req, NULL, pages, 0, max_pages,
+				   FSCACHE_READ_PAGE_LIST, 0);
+}
+EXPORT_SYMBOL(fscache_read_helper_page_list);
+
+int fscache_read_helper_locked_page(struct fscache_io_request *req,
+				    struct page *page,
+				    pgoff_t max_pages)
+{
+	ASSERT(page);
+	return fscache_read_helper(req, &page, NULL, 0, max_pages,
+				   FSCACHE_READ_LOCKED_PAGE, 0);
+}
+EXPORT_SYMBOL(fscache_read_helper_locked_page);
+
+int fscache_read_helper_for_write(struct fscache_io_request *req,
+				  struct page **page,
+				  pgoff_t index,
+				  pgoff_t max_pages,
+				  unsigned int aop_flags)
+{
+	ASSERT(page);
+	ASSERTIF(*page, (*page)->index == index);
+	return fscache_read_helper(req, page, NULL, index, max_pages,
+				   FSCACHE_READ_FOR_WRITE, aop_flags);
+}
+EXPORT_SYMBOL(fscache_read_helper_for_write);
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index bfb28cebfcfd..0aee6edef672 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -181,12 +181,24 @@ struct fscache_io_request {
 	unsigned long		flags;
 #define FSCACHE_IO_DATA_FROM_SERVER	0	/* Set if data was read from server */
 #define FSCACHE_IO_DATA_FROM_CACHE	1	/* Set if data was read from the cache */
+#define FSCACHE_IO_DONT_UNLOCK_PAGES	2	/* Don't unlock the pages on completion */
+#define FSCACHE_IO_READ_IN_PROGRESS	3	/* Cleared and woken upon completion of the read */
+#define FSCACHE_IO_WRITE_TO_CACHE	4	/* Set if should write to cache */
 	void (*io_done)(struct fscache_io_request *);
+	struct work_struct	work;
+
+	/* Bits for readpages helper */
+	struct address_space	*mapping;	/* The mapping being accessed */
+	unsigned int		nr_pages;	/* Number of pages involved in the I/O */
+	unsigned int		dio_block_size;	/* Rounding for direct I/O in the cache */
+	struct page		*no_unlock_page; /* Don't unlock this page after read */
 };
 
 struct fscache_io_request_ops {
+	int (*is_req_valid)(struct fscache_io_request *);
 	bool (*is_still_valid)(struct fscache_io_request *);
 	void (*issue_op)(struct fscache_io_request *);
+	void (*reshape)(struct fscache_io_request *, struct fscache_request_shape *);
 	void (*done)(struct fscache_io_request *);
 	void (*get)(struct fscache_io_request *);
 	void (*put)(struct fscache_io_request *);
@@ -489,7 +501,7 @@ static inline void fscache_init_io_request(struct fscache_io_request *req,
 static inline
 void fscache_free_io_request(struct fscache_io_request *req)
 {
-	if (req->cookie)
+	if (fscache_cookie_valid(req->cookie))
 		__fscache_free_io_request(req);
 }
 
@@ -593,4 +605,16 @@ int fscache_write(struct fscache_io_request *req, struct iov_iter *iter)
 	return -ENOBUFS;
 }
 
+extern int fscache_read_helper_page_list(struct fscache_io_request *,
+					 struct list_head *,
+					 pgoff_t);
+extern int fscache_read_helper_locked_page(struct fscache_io_request *,
+					   struct page *,
+					   pgoff_t);
+extern int fscache_read_helper_for_write(struct fscache_io_request *,
+					 struct page **,
+					 pgoff_t,
+					 pgoff_t,
+					 unsigned int);
+
 #endif /* _LINUX_FSCACHE_H */
diff --git a/include/trace/events/fscache_support.h b/include/trace/events/fscache_support.h
new file mode 100644
index 000000000000..0d94650ad637
--- /dev/null
+++ b/include/trace/events/fscache_support.h
@@ -0,0 +1,91 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* FS-Cache support module tracepoints
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM fscache_support
+
+#if !defined(_TRACE_FSCACHE_SUPPORT_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_FSCACHE_SUPPORT_H
+
+#include <linux/fscache.h>
+#include <linux/tracepoint.h>
+
+/*
+ * Define enums for tracing information.
+ */
+#ifndef __FSCACHE_SUPPORT_DECLARE_TRACE_ENUMS_ONCE_ONLY
+#define __FSCACHE_SUPPORT_DECLARE_TRACE_ENUMS_ONCE_ONLY
+
+enum fscache_read_helper_trace {
+	fscache_read_helper_download,
+	fscache_read_helper_race,
+	fscache_read_helper_read,
+	fscache_read_helper_skip,
+	fscache_read_helper_zero,
+};
+
+#endif
+
+#define fscache_read_helper_traces				\
+	EM(fscache_read_helper_download,	"DOWN")		\
+	EM(fscache_read_helper_race,		"RACE")		\
+	EM(fscache_read_helper_read,		"READ")		\
+	EM(fscache_read_helper_skip,		"SKIP")		\
+	E_(fscache_read_helper_zero,		"ZERO")
+
+
+/*
+ * Export enum symbols via userspace.
+ */
+#undef EM
+#undef E_
+#define EM(a, b) TRACE_DEFINE_ENUM(a);
+#define E_(a, b) TRACE_DEFINE_ENUM(a);
+
+fscache_read_helper_traces;
+
+/*
+ * Now redefine the EM() and E_() macros to map the enums to the strings that
+ * will be printed in the output.
+ */
+#undef EM
+#undef E_
+#define EM(a, b)	{ a, b },
+#define E_(a, b)	{ a, b }
+
+TRACE_EVENT(fscache_read_helper,
+	    TP_PROTO(struct fscache_cookie *cookie, pgoff_t start, pgoff_t end,
+		     unsigned int notes, enum fscache_read_helper_trace what),
+
+	    TP_ARGS(cookie, start, end, notes, what),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,		cookie		)
+		    __field(pgoff_t,			start		)
+		    __field(pgoff_t,			end		)
+		    __field(unsigned int,		notes		)
+		    __field(enum fscache_read_helper_trace, what	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->cookie	= cookie ? cookie->debug_id : 0;
+		    __entry->start	= start;
+		    __entry->end	= end;
+		    __entry->what	= what;
+		    __entry->notes	= notes;
+			   ),
+
+	    TP_printk("c=%08x %s n=%08x p=%lx-%lx",
+		      __entry->cookie,
+		      __print_symbolic(__entry->what, fscache_read_helper_traces),
+		      __entry->notes,
+		      __entry->start, __entry->end)
+	    );
+
+#endif /* _TRACE_FSCACHE_SUPPORT_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 24/32] fscache: Add read helper
@ 2020-07-13 16:35   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:35 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs

Add a trio of helper functions:

	fscache_read_helper_page_list();
	fscache_read_helper_locked_page();
	fscache_read_helper_for_write();

to do the work of shaping read requests, attempting to read from the cache,
issuing or reissuing requests to the filesystem to pass to the server and
writing back to the filesystem.

The filesystem passes in a prepared request descriptor with the fscache
descriptor embedded in it to one of the helper functions.  The caller also
indicates which page(s) it is interested in and provides some operations to
issue reads and manage the request descriptor.

The helper is placed into its own module, fsinfo_support.ko, which must be
enabled unconditionally by any filesystem which wishes to use the helper
even if CONFIG_FSCACHE=no.  This module is selected by
CONFIG_FSCACHE_SUPPORT.  About half of the code is optimised away by
CONFIG_FSCACHE=no.

Also add a tracepoint to track calls.  A set of 'notes' are taken to record
the path through the function and this is dumped into the trace.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/Makefile                            |    2 
 fs/fscache/Kconfig                     |    4 
 fs/fscache/Makefile                    |    3 
 fs/fscache/internal.h                  |    8 
 fs/fscache/main.c                      |    1 
 fs/fscache/read_helper.c               |  656 ++++++++++++++++++++++++++++++++
 include/linux/fscache.h                |   26 +
 include/trace/events/fscache_support.h |   91 ++++
 8 files changed, 789 insertions(+), 2 deletions(-)
 create mode 100644 fs/fscache/read_helper.c
 create mode 100644 include/trace/events/fscache_support.h

diff --git a/fs/Makefile b/fs/Makefile
index 2ce5112b02c8..8b0a5b5b1d86 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -68,7 +68,7 @@ obj-$(CONFIG_PROFILING)		+= dcookies.o
 obj-$(CONFIG_DLM)		+= dlm/
  
 # Do not add any filesystems before this line
-obj-$(CONFIG_FSCACHE)		+= fscache/
+obj-$(CONFIG_FSCACHE_SUPPORT)	+= fscache/
 obj-$(CONFIG_REISERFS_FS)	+= reiserfs/
 obj-$(CONFIG_EXT4_FS)		+= ext4/
 # We place ext4 before ext2 so that clean ext3 root fs's do NOT mount using the
diff --git a/fs/fscache/Kconfig b/fs/fscache/Kconfig
index ce6f731065d0..369c12ef0167 100644
--- a/fs/fscache/Kconfig
+++ b/fs/fscache/Kconfig
@@ -1,7 +1,11 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
+config FSCACHE_SUPPORT
+	tristate "Support for local caching of network filesystems"
+
 config FSCACHE
 	tristate "General filesystem local caching manager"
+	depends on FSCACHE_SUPPORT
 	help
 	  This option enables a generic filesystem caching manager that can be
 	  used by various network and other filesystems to cache data locally.
diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
index 3caf66810e7b..0a5c8c654942 100644
--- a/fs/fscache/Makefile
+++ b/fs/fscache/Makefile
@@ -20,3 +20,6 @@ fscache-$(CONFIG_FSCACHE_HISTOGRAM) += histogram.o
 fscache-$(CONFIG_FSCACHE_OBJECT_LIST) += object-list.o
 
 obj-$(CONFIG_FSCACHE) := fscache.o
+
+fscache_support-y := read_helper.o
+obj-$(CONFIG_FSCACHE_SUPPORT) += fscache_support.o
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index a70c1a612309..2674438ccafd 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -30,6 +30,8 @@
 #include <linux/sched.h>
 #include <linux/seq_file.h>
 
+#if IS_ENABLED(CONFIG_FSCACHE)
+
 #define FSCACHE_MIN_THREADS	4
 #define FSCACHE_MAX_THREADS	32
 
@@ -266,6 +268,12 @@ void fscache_update_aux(struct fscache_cookie *cookie,
 		cookie->object_size = *object_size;
 }
 
+#else /* CONFIG_FSCACHE */
+
+#define fscache_op_wq system_wq
+
+#endif /* CONFIG_FSCACHE */
+
 /*****************************************************************************/
 /*
  * debug tracing
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index b2691439377b..ac4fd4d59479 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -39,6 +39,7 @@ MODULE_PARM_DESC(fscache_debug,
 
 struct kobject *fscache_root;
 struct workqueue_struct *fscache_op_wq;
+EXPORT_SYMBOL(fscache_op_wq);
 
 /* these values serve as lower bounds, will be adjusted in fscache_init() */
 static unsigned fscache_object_max_active = 4;
diff --git a/fs/fscache/read_helper.c b/fs/fscache/read_helper.c
new file mode 100644
index 000000000000..62fed27aa938
--- /dev/null
+++ b/fs/fscache/read_helper.c
@@ -0,0 +1,656 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Read helper.
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#define FSCACHE_DEBUG_LEVEL OPERATION
+#include <linux/module.h>
+#include <linux/export.h>
+#include <linux/slab.h>
+#include <linux/uio.h>
+#include <linux/task_io_accounting_ops.h>
+#include <linux/fscache-cache.h>
+#include "internal.h"
+#define CREATE_TRACE_POINTS
+#include <trace/events/fscache_support.h>
+
+MODULE_DESCRIPTION("FS Cache Manager Support");
+MODULE_AUTHOR("Red Hat, Inc.");
+MODULE_LICENSE("GPL");
+
+#define FSCACHE_RHLP_NOTE_READ_FROM_CACHE	FSCACHE_READ_FROM_CACHE
+#define FSCACHE_RHLP_NOTE_WRITE_TO_CACHE	FSCACHE_WRITE_TO_CACHE
+#define FSCACHE_RHLP_NOTE_FILL_WITH_ZERO	FSCACHE_FILL_WITH_ZERO
+#define FSCACHE_RHLP_NOTE_READ_FOR_WRITE	0x00000100 /* Type: FSCACHE_READ_FOR_WRITE */
+#define FSCACHE_RHLP_NOTE_READ_LOCKED_PAGE	0x00000200 /* Type: FSCACHE_READ_LOCKED_PAGE */
+#define FSCACHE_RHLP_NOTE_READ_PAGE_LIST	0x00000300 /* Type: FSCACHE_READ_PAGE_LIST */
+#define FSCACHE_RHLP_NOTE_LIST_NOMEM		0x00001000 /* Page list: ENOMEM */
+#define FSCACHE_RHLP_NOTE_LIST_U2D		0x00002000 /* Page list: page uptodate */
+#define FSCACHE_RHLP_NOTE_LIST_ERROR		0x00004000 /* Page list: add error */
+#define FSCACHE_RHLP_NOTE_TRAILER_ADD		0x00010000 /* Trailer: Creating */
+#define FSCACHE_RHLP_NOTE_TRAILER_NOMEM		0x00020000 /* Trailer: ENOMEM */
+#define FSCACHE_RHLP_NOTE_TRAILER_U2D		0x00040000 /* Trailer: Uptodate */
+#define FSCACHE_RHLP_NOTE_U2D_IN_PREFACE	0x00100000 /* Uptodate page in preface */
+#define FSCACHE_RHLP_NOTE_UNDERSIZED		0x00200000 /* Undersized block */
+#define FSCACHE_RHLP_NOTE_AFTER_EOF		0x00400000 /* After EOF */
+#define FSCACHE_RHLP_NOTE_DO_WRITE_TO_CACHE	0x00800000 /* Actually write to the cache */
+#define FSCACHE_RHLP_NOTE_CANCELLED		0x80000000 /* Operation cancelled by netfs */
+
+unsigned fscache_rh_debug;
+module_param_named(debug, fscache_rh_debug, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(fscache_rh_debug, "FS-Cache read helper debugging mask");
+#define fscache_debug fscache_rh_debug
+
+enum fscache_read_type {
+	FSCACHE_READ_PAGE_LIST,		/* Read the list of pages (readpages) */
+	FSCACHE_READ_LOCKED_PAGE,	/* requested_page is added and locked */
+	FSCACHE_READ_FOR_WRITE,		/* This read is a prelude to write_begin */
+};
+
+static void fscache_read_from_server(struct fscache_io_request *req)
+{
+	req->ops->issue_op(req);
+}
+
+/*
+ * Deal with the completion of writing the data to the cache.  We have to clear
+ * the PG_fscache bits on the pages involved and releases the caller's ref.
+ */
+static void fscache_read_copy_done(struct fscache_io_request *req)
+{
+	struct page *page;
+	pgoff_t index = req->pos >> PAGE_SHIFT;
+	pgoff_t last = index + req->nr_pages - 1;
+
+	XA_STATE(xas, &req->mapping->i_pages, index);
+
+	_enter("%lx,%x,%llx", index, req->nr_pages, req->transferred);
+
+	/* Clear PG_fscache on the pages that were being written out. */
+	rcu_read_lock();
+	xas_for_each(&xas, page, last) {
+		BUG_ON(xa_is_value(page));
+		BUG_ON(PageCompound(page));
+
+		unlock_page_fscache(page);
+	}
+	rcu_read_unlock();
+}
+
+/*
+ * Write a completed read request to the cache.
+ */
+static void fscache_do_read_copy_to_cache(struct work_struct *work)
+{
+	struct fscache_io_request *req =
+		container_of(work, struct fscache_io_request, work);
+	struct iov_iter iter;
+
+	_enter("");
+
+	iov_iter_mapping(&iter, WRITE, req->mapping, req->pos,
+			 round_up(req->len, req->dio_block_size));
+
+	req->io_done = fscache_read_copy_done;
+	fscache_write(req, &iter);
+	fscache_put_io_request(req);
+}
+
+static void fscache_read_copy_to_cache(struct fscache_io_request *req)
+{
+	fscache_get_io_request(req);
+
+	if (!in_softirq())
+		return fscache_do_read_copy_to_cache(&req->work);
+
+	BUG_ON(work_pending(&req->work));
+	INIT_WORK(&req->work, fscache_do_read_copy_to_cache);
+	if (!queue_work(fscache_op_wq, &req->work))
+		BUG();
+}
+
+/*
+ * Clear the unread part of the file on a short read.
+ */
+static void fscache_clear_unread(struct fscache_io_request *req)
+{
+	struct iov_iter iter;
+
+	iov_iter_mapping(&iter, WRITE, req->mapping,
+			 req->pos + req->transferred,
+			 req->len - req->transferred);
+
+	_debug("clear %zx @%llx", iov_iter_count(&iter), iter.mapping_start);
+
+	iov_iter_zero(iov_iter_count(&iter), &iter);
+}
+
+/*
+ * Handle completion of a read operation.  This may be called in softirq
+ * context.
+ */
+static void fscache_read_done(struct fscache_io_request *req)
+{
+	struct page *page;
+	pgoff_t start = req->pos >> PAGE_SHIFT;
+	pgoff_t last = start + req->nr_pages - 1;
+
+	XA_STATE(xas, &req->mapping->i_pages, start);
+
+	_enter("%lx,%x,%llx,%d",
+	       start, req->nr_pages, req->transferred, req->error);
+
+	if (req->transferred < req->len)
+		fscache_clear_unread(req);
+
+	if (!test_bit(FSCACHE_IO_DONT_UNLOCK_PAGES, &req->flags)) {
+		rcu_read_lock();
+		xas_for_each(&xas, page, last) {
+			if (test_bit(FSCACHE_IO_WRITE_TO_CACHE, &req->flags))
+				SetPageFsCache(page);
+			if (page == req->no_unlock_page)
+				SetPageUptodate(page);
+			else
+				page_endio(page, false, 0);
+			put_page(page);
+		}
+		rcu_read_unlock();
+	}
+
+	task_io_account_read(req->transferred);
+	req->ops->done(req);
+	if (test_and_clear_bit(FSCACHE_IO_READ_IN_PROGRESS, &req->flags))
+		wake_up_bit(&req->flags, FSCACHE_IO_READ_IN_PROGRESS);
+
+	if (test_bit(FSCACHE_IO_WRITE_TO_CACHE, &req->flags))
+		fscache_read_copy_to_cache(req);
+}
+
+/*
+ * Reissue the read against the server.
+ */
+static void fscache_reissue_read(struct work_struct *work)
+{
+	struct fscache_io_request *req =
+		container_of(work, struct fscache_io_request, work);
+
+	_debug("DOWNLOAD: %llu", req->len);
+
+	req->io_done = fscache_read_done;
+	fscache_read_from_server(req);
+	fscache_put_io_request(req);
+}
+
+/*
+ * Handle completion of a read from cache operation.  If the read failed, we
+ * need to reissue the request against the server.  We might, however, be
+ * called in softirq mode and need to punt.
+ */
+static void fscache_file_read_maybe_reissue(struct fscache_io_request *req)
+{
+	_enter("%d", req->error);
+
+	if (req->error == 0) {
+		fscache_read_done(req);
+	} else {
+		INIT_WORK(&req->work, fscache_reissue_read);
+		fscache_get_io_request(req);
+		queue_work(fscache_op_wq, &req->work);
+	}
+}
+
+/*
+ * Issue a read against the cache.
+ */
+static void fscache_read_from_cache(struct fscache_io_request *req)
+{
+	struct iov_iter iter;
+
+	iov_iter_mapping(&iter, READ, req->mapping, req->pos, req->len);
+	fscache_read(req, &iter);
+}
+
+/*
+ * Discard the locks and page refs that we obtained on a sequence of pages.
+ */
+static void fscache_ignore_pages(struct address_space *mapping,
+				  pgoff_t start, pgoff_t end)
+{
+	struct page *page;
+
+	_enter("%lx,%lx", start, end);
+
+	if (end > start) {
+		XA_STATE(xas, &mapping->i_pages, start);
+
+		rcu_read_lock();
+		xas_for_each(&xas, page, end - 1) {
+			_debug("- ignore %lx", page->index);
+			BUG_ON(xa_is_value(page));
+			BUG_ON(PageCompound(page));
+
+			unlock_page(page);
+			put_page(page);
+		}
+		rcu_read_unlock();
+	}
+}
+
+/**
+ * fscache_read_helper - Helper to manage a read request
+ * @req: The initialised request structure to use
+ * @requested_page: Singular page to include (LOCKED_PAGE/FOR_WRITE)
+ * @pages: Unattached pages to include (PAGE_LIST)
+ * @page_to_be_written: The index of the primary page (FOR_WRITE)
+ * @max_pages: The maximum number of pages to read in one transaction
+ * @type: FSCACHE_READ_*
+ * @aop_flags: AOP_FLAG_*
+ *
+ * Read a sequence of pages appropriately sized for an fscache allocation
+ * block.  Pages are added at both ends and to fill in the gaps as appropriate
+ * to make it the right size.
+ *
+ * req->mapping should indicate the mapping to which the pages will be attached.
+ *
+ * The operations pointed to by req->ops will be used to issue or reissue a
+ * read against the server in case the cache is unavailable, incomplete or
+ * generates an error.  req->iter will be set up to point to the iterator
+ * representing the buffer to be filled in.
+ *
+ * A ref on @req is consumed eventually by this function or one of its
+ * eventually-dispatched callees.
+ */
+static int fscache_read_helper(struct fscache_io_request *req,
+			       struct page **requested_page,
+			       struct list_head *pages,
+			       pgoff_t page_to_be_written,
+			       pgoff_t max_pages,
+			       enum fscache_read_type type,
+			       unsigned int aop_flags)
+{
+	struct fscache_request_shape shape;
+	struct address_space *mapping = req->mapping;
+	struct page *page;
+	enum fscache_read_helper_trace what;
+	unsigned int notes;
+	pgoff_t eof, cursor, start;
+	loff_t new_size;
+	int ret;
+
+	shape.granularity	= 1;
+	shape.max_io_pages	= max_pages;
+	shape.i_size		= i_size_read(mapping->host);
+	shape.for_write		= false;
+
+	switch (type) {
+	case FSCACHE_READ_PAGE_LIST:
+		shape.proposed_start = lru_to_page(pages)->index;
+		shape.proposed_nr_pages =
+			lru_to_last_page(pages)->index - shape.proposed_start + 1;
+		break;
+
+	case FSCACHE_READ_LOCKED_PAGE:
+		shape.proposed_start = (*requested_page)->index;
+		shape.proposed_nr_pages = 1;
+		break;
+
+	case FSCACHE_READ_FOR_WRITE:
+		new_size = (loff_t)(page_to_be_written + 1) << PAGE_SHIFT;
+		if (new_size > shape.i_size)
+			shape.i_size = new_size;
+		shape.proposed_start = page_to_be_written;
+		shape.proposed_nr_pages = 1;
+		break;
+
+	default:
+		BUG();
+	}
+
+	_enter("%lx,%x", shape.proposed_start, shape.proposed_nr_pages);
+
+	eof = (shape.i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+	fscache_shape_request(req->cookie, &shape);
+	if (req->ops->reshape)
+		req->ops->reshape(req, &shape);
+	notes = shape.to_be_done;
+
+	req->dio_block_size = shape.dio_block_size;
+
+	start = cursor = shape.actual_start;
+
+	/* Add pages to the pagecache.  We keep the pages ref'd and locked
+	 * until the read is complete.  We may also need to add pages to both
+	 * sides of the request to make it up to the cache allocation granule
+	 * alignment and size.
+	 *
+	 * Note that it's possible for the file size to change whilst we're
+	 * doing this, but we rely on the server returning less than we asked
+	 * for if the file shrank.  We also rely on this to deal with a partial
+	 * page at the end of the file.
+	 *
+	 * If we're going to end up loading from the server and writing to the
+	 * cache, we start by inserting blank pages before the first page being
+	 * examined.  If we can fetch from the cache or we're not going to
+	 * write to the cache, it's unnecessary.
+	 */
+	if (notes & FSCACHE_RHLP_NOTE_WRITE_TO_CACHE) {
+		notes |= FSCACHE_RHLP_NOTE_DO_WRITE_TO_CACHE;
+		while (cursor < shape.proposed_start) {
+			page = find_or_create_page(mapping, cursor,
+						   readahead_gfp_mask(mapping));
+			if (!page)
+				goto nomem;
+			if (!PageUptodate(page)) {
+				req->nr_pages++; /* Add to the reading list */
+				cursor++;
+				continue;
+			}
+
+			/* There's an up-to-date page in the preface - just
+			 * fetch the requested pages and skip saving to the
+			 * cache.
+			 */
+			notes |= FSCACHE_RHLP_NOTE_U2D_IN_PREFACE;
+			notes &= ~FSCACHE_RHLP_NOTE_DO_WRITE_TO_CACHE;
+			fscache_ignore_pages(mapping, start, cursor + 1);
+			start = cursor = shape.proposed_start;
+			req->nr_pages = 0;
+			break;
+		}
+		page = NULL;
+	} else {
+		notes &= ~FSCACHE_RHLP_NOTE_DO_WRITE_TO_CACHE;
+		start = cursor = shape.proposed_start;
+		req->nr_pages = 0;
+	}
+
+	switch (type) {
+	case FSCACHE_READ_FOR_WRITE:
+		/* We're doing a prefetch for a write on a single page.  We get
+		 * or create the requested page if we weren't given it and lock
+		 * it.
+		 */
+		notes |= FSCACHE_RHLP_NOTE_READ_FOR_WRITE;
+		if (*requested_page) {
+			_debug("prewrite req %lx", cursor);
+			page = *requested_page;
+			ret = -ERESTARTSYS;
+			if (lock_page_killable(page) < 0)
+				goto dont;
+		} else {
+			_debug("prewrite new %lx %lx", cursor, eof);
+			page = grab_cache_page_write_begin(mapping, shape.proposed_start,
+							   aop_flags);
+			if (!page)
+				goto nomem;
+			*requested_page = page;
+		}
+
+		if (PageUptodate(page)) {
+			notes |= FSCACHE_RHLP_NOTE_LIST_U2D;
+
+			trace_fscache_read_helper(req->cookie,
+						  start, start + req->nr_pages,
+						  notes, fscache_read_helper_race);
+			req->ops->done(req);
+			ret = 0;
+			goto cancelled;
+		}
+
+		get_page(page);
+		req->no_unlock_page = page;
+		req->nr_pages++;
+		cursor++;
+		page = NULL;
+		ret = 0;
+		break;
+
+	case FSCACHE_READ_LOCKED_PAGE:
+		/* We've got a single page preattached to the inode and locked.
+		 * Get our own ref on it.
+		 */
+		_debug("locked");
+		notes |= FSCACHE_RHLP_NOTE_READ_LOCKED_PAGE;
+		get_page(*requested_page);
+		req->nr_pages++;
+		cursor++;
+		ret = 0;
+		break;
+
+	case FSCACHE_READ_PAGE_LIST:
+		/* We've been given a contiguous list of pages to add. */
+		notes |= FSCACHE_RHLP_NOTE_READ_PAGE_LIST;
+		do {
+			_debug("given %lx", cursor);
+
+			page = lru_to_page(pages);
+			if (WARN_ON(page->index != cursor))
+				break;
+
+			list_del(&page->lru);
+
+			ret = add_to_page_cache_lru(page, mapping, cursor,
+						    readahead_gfp_mask(mapping));
+			switch (ret) {
+			case 0:
+				/* Add to the reading list */
+				req->nr_pages++;
+				cursor++;
+				page = NULL;
+				break;
+
+			case -EEXIST:
+				put_page(page);
+
+				_debug("conflict %lx %d", cursor, ret);
+				page = find_or_create_page(mapping, cursor,
+							   readahead_gfp_mask(mapping));
+				if (!page) {
+					notes |= FSCACHE_RHLP_NOTE_LIST_NOMEM;
+					goto stop;
+				}
+
+				if (PageUptodate(page)) {
+					unlock_page(page);
+					put_page(page); /* Avoid overwriting */
+					ret = 0;
+					notes |= FSCACHE_RHLP_NOTE_LIST_U2D;
+					goto stop;
+				}
+
+				req->nr_pages++; /* Add to the reading list */
+				cursor++;
+				break;
+
+			default:
+				_debug("add fail %lx %d", cursor, ret);
+				put_page(page);
+				page = NULL;
+				notes |= FSCACHE_RHLP_NOTE_LIST_ERROR;
+				goto stop;
+			}
+
+			/* Trim the fetch to the cache granularity so we don't
+			 * get a chain-failure of blocks being unable to be
+			 * used because the previous uncached read spilt over.
+			 */
+			if ((notes & FSCACHE_RHLP_NOTE_U2D_IN_PREFACE) &&
+			    cursor == shape.actual_start + shape.granularity)
+				break;
+
+		} while (!list_empty(pages) && req->nr_pages < shape.actual_nr_pages);
+		ret = 0;
+		break;
+
+	default:
+		BUG();
+	}
+
+	/* If we're going to be writing to the cache, insert pages after the
+	 * requested block to make up the numbers.
+	 */
+	if (notes & FSCACHE_RHLP_NOTE_DO_WRITE_TO_CACHE) {
+		notes |= FSCACHE_RHLP_NOTE_TRAILER_ADD;
+		while (req->nr_pages < shape.actual_nr_pages) {
+			_debug("after %lx", cursor);
+			page = find_or_create_page(mapping, cursor,
+						   readahead_gfp_mask(mapping));
+			if (!page) {
+				notes |= FSCACHE_RHLP_NOTE_TRAILER_NOMEM;
+				goto stop;
+			}
+			if (PageUptodate(page)) {
+				unlock_page(page);
+				put_page(page); /* Avoid overwriting */
+				notes |= FSCACHE_RHLP_NOTE_TRAILER_U2D;
+				goto stop;
+			}
+
+			req->nr_pages++; /* Add to the reading list */
+			cursor++;
+		}
+	}
+
+stop:
+	_debug("have %u", req->nr_pages);
+	if (req->nr_pages == 0)
+		goto dont;
+
+	if (cursor <= shape.proposed_start) {
+		_debug("v.short");
+		goto nomem_unlock; /* We wouldn't've included the first page */
+	}
+
+submit_anyway:
+	if ((notes & FSCACHE_RHLP_NOTE_DO_WRITE_TO_CACHE) &&
+	    req->nr_pages < shape.actual_nr_pages) {
+		/* The request is short of what we need to be able to cache the
+		 * entire set of pages and the trailer, so trim it to cache
+		 * granularity if we can without reducing it to nothing.
+		 */
+		unsigned int down_to = round_down(req->nr_pages, shape.granularity);
+		_debug("short %u", down_to);
+
+		notes |= FSCACHE_RHLP_NOTE_UNDERSIZED;
+
+		if (down_to > 0) {
+			fscache_ignore_pages(mapping, shape.actual_start + down_to, cursor);
+			req->nr_pages = down_to;
+		} else {
+			notes &= ~FSCACHE_RHLP_NOTE_DO_WRITE_TO_CACHE;
+		}
+	}
+
+	req->len = req->nr_pages * PAGE_SIZE;
+	req->pos = start;
+	req->pos <<= PAGE_SHIFT;
+
+	if (start >= eof) {
+		notes |= FSCACHE_RHLP_NOTE_AFTER_EOF;
+		what = fscache_read_helper_skip;
+	} else if (notes & FSCACHE_RHLP_NOTE_FILL_WITH_ZERO) {
+		what = fscache_read_helper_zero;
+	} else if (notes & FSCACHE_RHLP_NOTE_READ_FROM_CACHE) {
+		what = fscache_read_helper_read;
+	} else {
+		what = fscache_read_helper_download;
+	}
+
+	ret = 0;
+	if (req->ops->is_req_valid) {
+		/* Allow the netfs to decide if the request is still valid
+		 * after all the pages are locked.
+		 */
+		ret = req->ops->is_req_valid(req);
+		if (ret < 0)
+			notes |= FSCACHE_RHLP_NOTE_CANCELLED;
+	}
+
+	trace_fscache_read_helper(req->cookie, start, start + req->nr_pages,
+				  notes, what);
+
+	if (notes & FSCACHE_RHLP_NOTE_CANCELLED)
+		goto cancelled;
+
+	if (notes & FSCACHE_RHLP_NOTE_DO_WRITE_TO_CACHE)
+		__set_bit(FSCACHE_IO_WRITE_TO_CACHE, &req->flags);
+
+	__set_bit(FSCACHE_IO_READ_IN_PROGRESS, &req->flags);
+
+	switch (what) {
+	case fscache_read_helper_skip:
+		/* The read is entirely beyond the end of the file, so skip the
+		 * actual operation and let the done handler deal with clearing
+		 * the pages.
+		 */
+		_debug("SKIP READ: %llu", req->len);
+		fscache_read_done(req);
+		break;
+	case fscache_read_helper_zero:
+		_debug("ZERO READ: %llu", req->len);
+		fscache_read_done(req);
+		break;
+	case fscache_read_helper_read:
+		req->io_done = fscache_file_read_maybe_reissue;
+		fscache_read_from_cache(req);
+		break;
+	case fscache_read_helper_download:
+		_debug("DOWNLOAD: %llu", req->len);
+		req->io_done = fscache_read_done;
+		fscache_read_from_server(req);
+		break;
+	default:
+		BUG();
+	}
+
+	_leave(" = 0");
+	return 0;
+
+nomem:
+	if (cursor > shape.proposed_start)
+		goto submit_anyway;
+nomem_unlock:
+	ret = -ENOMEM;
+cancelled:
+	fscache_ignore_pages(mapping, start, cursor);
+dont:
+	_leave(" = %d", ret);
+	return ret;
+}
+
+int fscache_read_helper_page_list(struct fscache_io_request *req,
+				  struct list_head *pages,
+				  pgoff_t max_pages)
+{
+	ASSERT(pages);
+	ASSERT(!list_empty(pages));
+	return fscache_read_helper(req, NULL, pages, 0, max_pages,
+				   FSCACHE_READ_PAGE_LIST, 0);
+}
+EXPORT_SYMBOL(fscache_read_helper_page_list);
+
+int fscache_read_helper_locked_page(struct fscache_io_request *req,
+				    struct page *page,
+				    pgoff_t max_pages)
+{
+	ASSERT(page);
+	return fscache_read_helper(req, &page, NULL, 0, max_pages,
+				   FSCACHE_READ_LOCKED_PAGE, 0);
+}
+EXPORT_SYMBOL(fscache_read_helper_locked_page);
+
+int fscache_read_helper_for_write(struct fscache_io_request *req,
+				  struct page **page,
+				  pgoff_t index,
+				  pgoff_t max_pages,
+				  unsigned int aop_flags)
+{
+	ASSERT(page);
+	ASSERTIF(*page, (*page)->index == index);
+	return fscache_read_helper(req, page, NULL, index, max_pages,
+				   FSCACHE_READ_FOR_WRITE, aop_flags);
+}
+EXPORT_SYMBOL(fscache_read_helper_for_write);
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index bfb28cebfcfd..0aee6edef672 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -181,12 +181,24 @@ struct fscache_io_request {
 	unsigned long		flags;
 #define FSCACHE_IO_DATA_FROM_SERVER	0	/* Set if data was read from server */
 #define FSCACHE_IO_DATA_FROM_CACHE	1	/* Set if data was read from the cache */
+#define FSCACHE_IO_DONT_UNLOCK_PAGES	2	/* Don't unlock the pages on completion */
+#define FSCACHE_IO_READ_IN_PROGRESS	3	/* Cleared and woken upon completion of the read */
+#define FSCACHE_IO_WRITE_TO_CACHE	4	/* Set if should write to cache */
 	void (*io_done)(struct fscache_io_request *);
+	struct work_struct	work;
+
+	/* Bits for readpages helper */
+	struct address_space	*mapping;	/* The mapping being accessed */
+	unsigned int		nr_pages;	/* Number of pages involved in the I/O */
+	unsigned int		dio_block_size;	/* Rounding for direct I/O in the cache */
+	struct page		*no_unlock_page; /* Don't unlock this page after read */
 };
 
 struct fscache_io_request_ops {
+	int (*is_req_valid)(struct fscache_io_request *);
 	bool (*is_still_valid)(struct fscache_io_request *);
 	void (*issue_op)(struct fscache_io_request *);
+	void (*reshape)(struct fscache_io_request *, struct fscache_request_shape *);
 	void (*done)(struct fscache_io_request *);
 	void (*get)(struct fscache_io_request *);
 	void (*put)(struct fscache_io_request *);
@@ -489,7 +501,7 @@ static inline void fscache_init_io_request(struct fscache_io_request *req,
 static inline
 void fscache_free_io_request(struct fscache_io_request *req)
 {
-	if (req->cookie)
+	if (fscache_cookie_valid(req->cookie))
 		__fscache_free_io_request(req);
 }
 
@@ -593,4 +605,16 @@ int fscache_write(struct fscache_io_request *req, struct iov_iter *iter)
 	return -ENOBUFS;
 }
 
+extern int fscache_read_helper_page_list(struct fscache_io_request *,
+					 struct list_head *,
+					 pgoff_t);
+extern int fscache_read_helper_locked_page(struct fscache_io_request *,
+					   struct page *,
+					   pgoff_t);
+extern int fscache_read_helper_for_write(struct fscache_io_request *,
+					 struct page **,
+					 pgoff_t,
+					 pgoff_t,
+					 unsigned int);
+
 #endif /* _LINUX_FSCACHE_H */
diff --git a/include/trace/events/fscache_support.h b/include/trace/events/fscache_support.h
new file mode 100644
index 000000000000..0d94650ad637
--- /dev/null
+++ b/include/trace/events/fscache_support.h
@@ -0,0 +1,91 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* FS-Cache support module tracepoints
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM fscache_support
+
+#if !defined(_TRACE_FSCACHE_SUPPORT_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_FSCACHE_SUPPORT_H
+
+#include <linux/fscache.h>
+#include <linux/tracepoint.h>
+
+/*
+ * Define enums for tracing information.
+ */
+#ifndef __FSCACHE_SUPPORT_DECLARE_TRACE_ENUMS_ONCE_ONLY
+#define __FSCACHE_SUPPORT_DECLARE_TRACE_ENUMS_ONCE_ONLY
+
+enum fscache_read_helper_trace {
+	fscache_read_helper_download,
+	fscache_read_helper_race,
+	fscache_read_helper_read,
+	fscache_read_helper_skip,
+	fscache_read_helper_zero,
+};
+
+#endif
+
+#define fscache_read_helper_traces				\
+	EM(fscache_read_helper_download,	"DOWN")		\
+	EM(fscache_read_helper_race,		"RACE")		\
+	EM(fscache_read_helper_read,		"READ")		\
+	EM(fscache_read_helper_skip,		"SKIP")		\
+	E_(fscache_read_helper_zero,		"ZERO")
+
+
+/*
+ * Export enum symbols via userspace.
+ */
+#undef EM
+#undef E_
+#define EM(a, b) TRACE_DEFINE_ENUM(a);
+#define E_(a, b) TRACE_DEFINE_ENUM(a);
+
+fscache_read_helper_traces;
+
+/*
+ * Now redefine the EM() and E_() macros to map the enums to the strings that
+ * will be printed in the output.
+ */
+#undef EM
+#undef E_
+#define EM(a, b)	{ a, b },
+#define E_(a, b)	{ a, b }
+
+TRACE_EVENT(fscache_read_helper,
+	    TP_PROTO(struct fscache_cookie *cookie, pgoff_t start, pgoff_t end,
+		     unsigned int notes, enum fscache_read_helper_trace what),
+
+	    TP_ARGS(cookie, start, end, notes, what),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,		cookie		)
+		    __field(pgoff_t,			start		)
+		    __field(pgoff_t,			end		)
+		    __field(unsigned int,		notes		)
+		    __field(enum fscache_read_helper_trace, what	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->cookie	= cookie ? cookie->debug_id : 0;
+		    __entry->start	= start;
+		    __entry->end	= end;
+		    __entry->what	= what;
+		    __entry->notes	= notes;
+			   ),
+
+	    TP_printk("c=%08x %s n=%08x p=%lx-%lx",
+		      __entry->cookie,
+		      __print_symbolic(__entry->what, fscache_read_helper_traces),
+		      __entry->notes,
+		      __entry->start, __entry->end)
+	    );
+
+#endif /* _TRACE_FSCACHE_SUPPORT_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 25/32] fscache: Display cache-specific data in /proc/fs/fscache/objects
  2020-07-13 16:30 ` David Howells
@ 2020-07-13 16:35   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:35 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Allow the cache to add information in /proc/fs/fscache/objects instead of
displaying cookie key and aux data - which can be seen in the cookies file.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/content-map.c   |   41 +++++++++++++++++++++++++++++++++++++++++
 fs/cachefiles/interface.c     |    1 +
 fs/cachefiles/internal.h      |    1 +
 fs/fscache/object-list.c      |   33 +++------------------------------
 include/linux/fscache-cache.h |    4 ++++
 5 files changed, 50 insertions(+), 30 deletions(-)

diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index 91c44bb39a93..f2a10e8d8d6d 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -396,3 +396,44 @@ void cachefiles_save_content_map(struct cachefiles_object *object)
 
 	_leave(" = %zd", ret);
 }
+
+/*
+ * Display object information in proc.
+ */
+int cachefiles_display_object(struct seq_file *m, struct fscache_object *_object)
+{
+	struct cachefiles_object *object =
+		container_of(_object, struct cachefiles_object, fscache);
+
+	if (object->fscache.cookie->type == FSCACHE_COOKIE_TYPE_INDEX) {
+		if (object->content_info != CACHEFILES_CONTENT_NO_DATA)
+			seq_printf(m, " ???%u???", object->content_info);
+	} else {
+		switch (object->content_info) {
+		case CACHEFILES_CONTENT_NO_DATA:
+			seq_puts(m, " <n>");
+			break;
+		case CACHEFILES_CONTENT_SINGLE:
+			seq_puts(m, " <s>");
+			break;
+		case CACHEFILES_CONTENT_ALL:
+			seq_puts(m, " <a>");
+			break;
+		case CACHEFILES_CONTENT_MAP:
+			read_lock_bh(&object->content_map_lock);
+			if (object->content_map) {
+				seq_printf(m, " %*phN",
+					   object->content_map_size,
+					   object->content_map);
+			}
+			read_unlock_bh(&object->content_map_lock);
+			break;
+		default:
+			seq_printf(m, " <%u>", object->content_info);
+			break;
+		}
+	}
+
+	seq_putc(m, '\n');
+	return 0;
+}
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index e73de62d0e73..78180d269c5f 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -491,4 +491,5 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
 	.shape_request		= cachefiles_shape_request,
 	.read			= cachefiles_read,
 	.write			= cachefiles_write,
+	.display_object		= cachefiles_display_object,
 };
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 2ea469b77712..c91a9b3c5bd5 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -132,6 +132,7 @@ extern void cachefiles_expand_content_map(struct cachefiles_object *object, loff
 extern void cachefiles_shorten_content_map(struct cachefiles_object *object, loff_t new_size);
 extern bool cachefiles_load_content_map(struct cachefiles_object *object);
 extern void cachefiles_save_content_map(struct cachefiles_object *object);
+extern int cachefiles_display_object(struct seq_file *m, struct fscache_object *object);
 
 /*
  * daemon.c
diff --git a/fs/fscache/object-list.c b/fs/fscache/object-list.c
index 5777f909d31a..361610e124bd 100644
--- a/fs/fscache/object-list.c
+++ b/fs/fscache/object-list.c
@@ -155,7 +155,6 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 	struct fscache_cookie *cookie;
 	unsigned long config = data->config;
 	char _type[3], *type;
-	u8 *p;
 
 	if ((unsigned long) v == 1) {
 		seq_puts(m, "OBJECT   PARENT   USE CHLDN OPS FL  S"
@@ -201,8 +200,6 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 		   obj->stage);
 
 	if (obj->cookie) {
-		uint16_t keylen = 0, auxlen = 0;
-
 		switch (cookie->type) {
 		case 0:
 			type = "IX";
@@ -211,8 +208,7 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 			type = "DT";
 			break;
 		default:
-			snprintf(_type, sizeof(_type), "%02u",
-				 cookie->type);
+			snprintf(_type, sizeof(_type), "%02x", cookie->type);
 			type = _type;
 			break;
 		}
@@ -223,34 +219,11 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 			   type,
 			   cookie->stage,
 			   cookie->flags);
-
-		if (config & FSCACHE_OBJLIST_CONFIG_KEY)
-			keylen = cookie->key_len;
-
-		if (config & FSCACHE_OBJLIST_CONFIG_AUX)
-			auxlen = cookie->aux_len;
-
-		if (keylen > 0 || auxlen > 0) {
-			seq_puts(m, " ");
-			p = keylen <= sizeof(cookie->inline_key) ?
-				cookie->inline_key : cookie->key;
-			for (; keylen > 0; keylen--)
-				seq_printf(m, "%02x", *p++);
-			if (auxlen > 0) {
-				if (config & FSCACHE_OBJLIST_CONFIG_KEY)
-					seq_puts(m, ", ");
-				p = auxlen <= sizeof(cookie->inline_aux) ?
-					cookie->inline_aux : cookie->aux;
-				for (; auxlen > 0; auxlen--)
-					seq_printf(m, "%02x", *p++);
-			}
-		}
-
-		seq_puts(m, "\n");
 	} else {
 		seq_puts(m, "<no_netfs>\n");
 	}
-	return 0;
+
+	return obj->cache->ops->display_object(m, obj);
 }
 
 static const struct seq_operations fscache_objlist_ops = {
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 81a41e37f07b..1357c44d371b 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -19,6 +19,7 @@
 
 #define NR_MAXCACHES BITS_PER_LONG
 
+struct seq_file;
 struct fscache_cache;
 struct fscache_cache_ops;
 struct fscache_object;
@@ -151,6 +152,9 @@ struct fscache_cache_ops {
 	int (*write)(struct fscache_object *object,
 		     struct fscache_io_request *req,
 		     struct iov_iter *iter);
+
+	/* Display object info in /proc/fs/fscache/objects */
+	int (*display_object)(struct seq_file *m, struct fscache_object *object);
 };
 
 extern struct fscache_cookie fscache_fsdef_index;



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 25/32] fscache: Display cache-specific data in /proc/fs/fscache/objects
@ 2020-07-13 16:35   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:35 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs

Allow the cache to add information in /proc/fs/fscache/objects instead of
displaying cookie key and aux data - which can be seen in the cookies file.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/content-map.c   |   41 +++++++++++++++++++++++++++++++++++++++++
 fs/cachefiles/interface.c     |    1 +
 fs/cachefiles/internal.h      |    1 +
 fs/fscache/object-list.c      |   33 +++------------------------------
 include/linux/fscache-cache.h |    4 ++++
 5 files changed, 50 insertions(+), 30 deletions(-)

diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index 91c44bb39a93..f2a10e8d8d6d 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -396,3 +396,44 @@ void cachefiles_save_content_map(struct cachefiles_object *object)
 
 	_leave(" = %zd", ret);
 }
+
+/*
+ * Display object information in proc.
+ */
+int cachefiles_display_object(struct seq_file *m, struct fscache_object *_object)
+{
+	struct cachefiles_object *object =
+		container_of(_object, struct cachefiles_object, fscache);
+
+	if (object->fscache.cookie->type == FSCACHE_COOKIE_TYPE_INDEX) {
+		if (object->content_info != CACHEFILES_CONTENT_NO_DATA)
+			seq_printf(m, " ???%u???", object->content_info);
+	} else {
+		switch (object->content_info) {
+		case CACHEFILES_CONTENT_NO_DATA:
+			seq_puts(m, " <n>");
+			break;
+		case CACHEFILES_CONTENT_SINGLE:
+			seq_puts(m, " <s>");
+			break;
+		case CACHEFILES_CONTENT_ALL:
+			seq_puts(m, " <a>");
+			break;
+		case CACHEFILES_CONTENT_MAP:
+			read_lock_bh(&object->content_map_lock);
+			if (object->content_map) {
+				seq_printf(m, " %*phN",
+					   object->content_map_size,
+					   object->content_map);
+			}
+			read_unlock_bh(&object->content_map_lock);
+			break;
+		default:
+			seq_printf(m, " <%u>", object->content_info);
+			break;
+		}
+	}
+
+	seq_putc(m, '\n');
+	return 0;
+}
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index e73de62d0e73..78180d269c5f 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -491,4 +491,5 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
 	.shape_request		= cachefiles_shape_request,
 	.read			= cachefiles_read,
 	.write			= cachefiles_write,
+	.display_object		= cachefiles_display_object,
 };
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 2ea469b77712..c91a9b3c5bd5 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -132,6 +132,7 @@ extern void cachefiles_expand_content_map(struct cachefiles_object *object, loff
 extern void cachefiles_shorten_content_map(struct cachefiles_object *object, loff_t new_size);
 extern bool cachefiles_load_content_map(struct cachefiles_object *object);
 extern void cachefiles_save_content_map(struct cachefiles_object *object);
+extern int cachefiles_display_object(struct seq_file *m, struct fscache_object *object);
 
 /*
  * daemon.c
diff --git a/fs/fscache/object-list.c b/fs/fscache/object-list.c
index 5777f909d31a..361610e124bd 100644
--- a/fs/fscache/object-list.c
+++ b/fs/fscache/object-list.c
@@ -155,7 +155,6 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 	struct fscache_cookie *cookie;
 	unsigned long config = data->config;
 	char _type[3], *type;
-	u8 *p;
 
 	if ((unsigned long) v == 1) {
 		seq_puts(m, "OBJECT   PARENT   USE CHLDN OPS FL  S"
@@ -201,8 +200,6 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 		   obj->stage);
 
 	if (obj->cookie) {
-		uint16_t keylen = 0, auxlen = 0;
-
 		switch (cookie->type) {
 		case 0:
 			type = "IX";
@@ -211,8 +208,7 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 			type = "DT";
 			break;
 		default:
-			snprintf(_type, sizeof(_type), "%02u",
-				 cookie->type);
+			snprintf(_type, sizeof(_type), "%02x", cookie->type);
 			type = _type;
 			break;
 		}
@@ -223,34 +219,11 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 			   type,
 			   cookie->stage,
 			   cookie->flags);
-
-		if (config & FSCACHE_OBJLIST_CONFIG_KEY)
-			keylen = cookie->key_len;
-
-		if (config & FSCACHE_OBJLIST_CONFIG_AUX)
-			auxlen = cookie->aux_len;
-
-		if (keylen > 0 || auxlen > 0) {
-			seq_puts(m, " ");
-			p = keylen <= sizeof(cookie->inline_key) ?
-				cookie->inline_key : cookie->key;
-			for (; keylen > 0; keylen--)
-				seq_printf(m, "%02x", *p++);
-			if (auxlen > 0) {
-				if (config & FSCACHE_OBJLIST_CONFIG_KEY)
-					seq_puts(m, ", ");
-				p = auxlen <= sizeof(cookie->inline_aux) ?
-					cookie->inline_aux : cookie->aux;
-				for (; auxlen > 0; auxlen--)
-					seq_printf(m, "%02x", *p++);
-			}
-		}
-
-		seq_puts(m, "\n");
 	} else {
 		seq_puts(m, "<no_netfs>\n");
 	}
-	return 0;
+
+	return obj->cache->ops->display_object(m, obj);
 }
 
 static const struct seq_operations fscache_objlist_ops = {
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 81a41e37f07b..1357c44d371b 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -19,6 +19,7 @@
 
 #define NR_MAXCACHES BITS_PER_LONG
 
+struct seq_file;
 struct fscache_cache;
 struct fscache_cache_ops;
 struct fscache_object;
@@ -151,6 +152,9 @@ struct fscache_cache_ops {
 	int (*write)(struct fscache_object *object,
 		     struct fscache_io_request *req,
 		     struct iov_iter *iter);
+
+	/* Display object info in /proc/fs/fscache/objects */
+	int (*display_object)(struct seq_file *m, struct fscache_object *object);
 };
 
 extern struct fscache_cookie fscache_fsdef_index;

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 26/32] fscache: Remove more obsolete stats
  2020-07-13 16:30 ` David Howells
@ 2020-07-13 16:35   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:35 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Remove some more stats that have become obsolete.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/internal.h |   18 ++----------------
 fs/fscache/obj.c      |    6 +++---
 fs/fscache/stats.c    |   50 +++++++++----------------------------------------
 3 files changed, 14 insertions(+), 60 deletions(-)

diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 2674438ccafd..d2b856aa5f0e 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -165,25 +165,13 @@ extern void fscache_proc_cleanup(void);
  * stats.c
  */
 #ifdef CONFIG_FSCACHE_STATS
-extern atomic_t fscache_n_op_pend;
-extern atomic_t fscache_n_op_run;
-extern atomic_t fscache_n_op_enqueue;
-extern atomic_t fscache_n_op_deferred_release;
-extern atomic_t fscache_n_op_initialised;
-extern atomic_t fscache_n_op_release;
-extern atomic_t fscache_n_op_gc;
-extern atomic_t fscache_n_op_cancelled;
-extern atomic_t fscache_n_op_rejected;
-
 extern atomic_t fscache_n_acquires;
 extern atomic_t fscache_n_acquires_null;
 extern atomic_t fscache_n_acquires_no_cache;
 extern atomic_t fscache_n_acquires_ok;
-extern atomic_t fscache_n_acquires_nobufs;
 extern atomic_t fscache_n_acquires_oom;
 
 extern atomic_t fscache_n_invalidates;
-extern atomic_t fscache_n_invalidates_run;
 
 extern atomic_t fscache_n_updates;
 extern atomic_t fscache_n_updates_null;
@@ -202,15 +190,13 @@ extern atomic_t fscache_n_object_no_alloc;
 extern atomic_t fscache_n_object_lookups;
 extern atomic_t fscache_n_object_lookups_negative;
 extern atomic_t fscache_n_object_lookups_positive;
-extern atomic_t fscache_n_object_lookups_timed_out;
-extern atomic_t fscache_n_object_created;
+extern atomic_t fscache_n_object_creates;
 extern atomic_t fscache_n_object_avail;
 extern atomic_t fscache_n_object_dead;
 
 extern atomic_t fscache_n_cop_alloc_object;
 extern atomic_t fscache_n_cop_lookup_object;
-extern atomic_t fscache_n_cop_lookup_complete;
-extern atomic_t fscache_n_cop_grab_object;
+extern atomic_t fscache_n_cop_create_object;
 extern atomic_t fscache_n_cop_invalidate_object;
 extern atomic_t fscache_n_cop_update_object;
 extern atomic_t fscache_n_cop_drop_object;
diff --git a/fs/fscache/obj.c b/fs/fscache/obj.c
index 63373b99ac34..baab7c465142 100644
--- a/fs/fscache/obj.c
+++ b/fs/fscache/obj.c
@@ -47,10 +47,10 @@ static int fscache_do_lookup_object(struct fscache_object *object, void *data)
 static int fscache_do_create_object(struct fscache_object *object, void *data)
 {
 	int ret;
-	fscache_stat(&fscache_n_object_lookups);
-	fscache_stat(&fscache_n_cop_lookup_object);
+	fscache_stat(&fscache_n_object_creates);
+	fscache_stat(&fscache_n_cop_create_object);
 	ret = object->cache->ops->create_object(object, data);
-	fscache_stat_d(&fscache_n_cop_lookup_object);
+	fscache_stat_d(&fscache_n_cop_create_object);
 	return ret;
 }
 
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 583817f4f113..ccca0016fd26 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -14,25 +14,13 @@
 /*
  * operation counters
  */
-atomic_t fscache_n_op_pend;
-atomic_t fscache_n_op_run;
-atomic_t fscache_n_op_enqueue;
-atomic_t fscache_n_op_deferred_release;
-atomic_t fscache_n_op_initialised;
-atomic_t fscache_n_op_release;
-atomic_t fscache_n_op_gc;
-atomic_t fscache_n_op_cancelled;
-atomic_t fscache_n_op_rejected;
-
 atomic_t fscache_n_acquires;
 atomic_t fscache_n_acquires_null;
 atomic_t fscache_n_acquires_no_cache;
 atomic_t fscache_n_acquires_ok;
-atomic_t fscache_n_acquires_nobufs;
 atomic_t fscache_n_acquires_oom;
 
 atomic_t fscache_n_invalidates;
-atomic_t fscache_n_invalidates_run;
 
 atomic_t fscache_n_updates;
 atomic_t fscache_n_updates_null;
@@ -51,15 +39,13 @@ atomic_t fscache_n_object_no_alloc;
 atomic_t fscache_n_object_lookups;
 atomic_t fscache_n_object_lookups_negative;
 atomic_t fscache_n_object_lookups_positive;
-atomic_t fscache_n_object_lookups_timed_out;
-atomic_t fscache_n_object_created;
+atomic_t fscache_n_object_creates;
 atomic_t fscache_n_object_avail;
 atomic_t fscache_n_object_dead;
 
 atomic_t fscache_n_cop_alloc_object;
 atomic_t fscache_n_cop_lookup_object;
-atomic_t fscache_n_cop_lookup_complete;
-atomic_t fscache_n_cop_grab_object;
+atomic_t fscache_n_cop_create_object;
 atomic_t fscache_n_cop_invalidate_object;
 atomic_t fscache_n_cop_update_object;
 atomic_t fscache_n_cop_drop_object;
@@ -90,25 +76,21 @@ int fscache_stats_show(struct seq_file *m, void *v)
 		   atomic_read(&fscache_n_object_avail),
 		   atomic_read(&fscache_n_object_dead));
 
-	seq_printf(m, "Acquire: n=%u nul=%u noc=%u ok=%u nbf=%u"
-		   " oom=%u\n",
+	seq_printf(m, "Acquire: n=%u nul=%u noc=%u ok=%u oom=%u\n",
 		   atomic_read(&fscache_n_acquires),
 		   atomic_read(&fscache_n_acquires_null),
 		   atomic_read(&fscache_n_acquires_no_cache),
 		   atomic_read(&fscache_n_acquires_ok),
-		   atomic_read(&fscache_n_acquires_nobufs),
 		   atomic_read(&fscache_n_acquires_oom));
 
-	seq_printf(m, "Lookups: n=%u neg=%u pos=%u crt=%u tmo=%u\n",
+	seq_printf(m, "Lookups: n=%u neg=%u pos=%u crt=%u\n",
 		   atomic_read(&fscache_n_object_lookups),
 		   atomic_read(&fscache_n_object_lookups_negative),
 		   atomic_read(&fscache_n_object_lookups_positive),
-		   atomic_read(&fscache_n_object_created),
-		   atomic_read(&fscache_n_object_lookups_timed_out));
+		   atomic_read(&fscache_n_object_creates));
 
-	seq_printf(m, "Invals : n=%u run=%u\n",
-		   atomic_read(&fscache_n_invalidates),
-		   atomic_read(&fscache_n_invalidates_run));
+	seq_printf(m, "Invals : n=%u\n",
+		   atomic_read(&fscache_n_invalidates));
 
 	seq_printf(m, "Updates: n=%u nul=%u run=%u\n",
 		   atomic_read(&fscache_n_updates),
@@ -120,23 +102,9 @@ int fscache_stats_show(struct seq_file *m, void *v)
 		   atomic_read(&fscache_n_relinquishes_null),
 		   atomic_read(&fscache_n_relinquishes_retire));
 
-	seq_printf(m, "Ops    : pend=%u run=%u enq=%u can=%u rej=%u\n",
-		   atomic_read(&fscache_n_op_pend),
-		   atomic_read(&fscache_n_op_run),
-		   atomic_read(&fscache_n_op_enqueue),
-		   atomic_read(&fscache_n_op_cancelled),
-		   atomic_read(&fscache_n_op_rejected));
-	seq_printf(m, "Ops    : ini=%u dfr=%u rel=%u gc=%u\n",
-		   atomic_read(&fscache_n_op_initialised),
-		   atomic_read(&fscache_n_op_deferred_release),
-		   atomic_read(&fscache_n_op_release),
-		   atomic_read(&fscache_n_op_gc));
-
-	seq_printf(m, "CacheOp: alo=%d luo=%d luc=%d gro=%d\n",
+	seq_printf(m, "CacheOp: alo=%d luo=%d\n",
 		   atomic_read(&fscache_n_cop_alloc_object),
-		   atomic_read(&fscache_n_cop_lookup_object),
-		   atomic_read(&fscache_n_cop_lookup_complete),
-		   atomic_read(&fscache_n_cop_grab_object));
+		   atomic_read(&fscache_n_cop_lookup_object));
 	seq_printf(m, "CacheOp: inv=%d upo=%d dro=%d pto=%d atc=%d syn=%d\n",
 		   atomic_read(&fscache_n_cop_invalidate_object),
 		   atomic_read(&fscache_n_cop_update_object),



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 26/32] fscache: Remove more obsolete stats
@ 2020-07-13 16:35   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:35 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs

Remove some more stats that have become obsolete.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/internal.h |   18 ++----------------
 fs/fscache/obj.c      |    6 +++---
 fs/fscache/stats.c    |   50 +++++++++----------------------------------------
 3 files changed, 14 insertions(+), 60 deletions(-)

diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 2674438ccafd..d2b856aa5f0e 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -165,25 +165,13 @@ extern void fscache_proc_cleanup(void);
  * stats.c
  */
 #ifdef CONFIG_FSCACHE_STATS
-extern atomic_t fscache_n_op_pend;
-extern atomic_t fscache_n_op_run;
-extern atomic_t fscache_n_op_enqueue;
-extern atomic_t fscache_n_op_deferred_release;
-extern atomic_t fscache_n_op_initialised;
-extern atomic_t fscache_n_op_release;
-extern atomic_t fscache_n_op_gc;
-extern atomic_t fscache_n_op_cancelled;
-extern atomic_t fscache_n_op_rejected;
-
 extern atomic_t fscache_n_acquires;
 extern atomic_t fscache_n_acquires_null;
 extern atomic_t fscache_n_acquires_no_cache;
 extern atomic_t fscache_n_acquires_ok;
-extern atomic_t fscache_n_acquires_nobufs;
 extern atomic_t fscache_n_acquires_oom;
 
 extern atomic_t fscache_n_invalidates;
-extern atomic_t fscache_n_invalidates_run;
 
 extern atomic_t fscache_n_updates;
 extern atomic_t fscache_n_updates_null;
@@ -202,15 +190,13 @@ extern atomic_t fscache_n_object_no_alloc;
 extern atomic_t fscache_n_object_lookups;
 extern atomic_t fscache_n_object_lookups_negative;
 extern atomic_t fscache_n_object_lookups_positive;
-extern atomic_t fscache_n_object_lookups_timed_out;
-extern atomic_t fscache_n_object_created;
+extern atomic_t fscache_n_object_creates;
 extern atomic_t fscache_n_object_avail;
 extern atomic_t fscache_n_object_dead;
 
 extern atomic_t fscache_n_cop_alloc_object;
 extern atomic_t fscache_n_cop_lookup_object;
-extern atomic_t fscache_n_cop_lookup_complete;
-extern atomic_t fscache_n_cop_grab_object;
+extern atomic_t fscache_n_cop_create_object;
 extern atomic_t fscache_n_cop_invalidate_object;
 extern atomic_t fscache_n_cop_update_object;
 extern atomic_t fscache_n_cop_drop_object;
diff --git a/fs/fscache/obj.c b/fs/fscache/obj.c
index 63373b99ac34..baab7c465142 100644
--- a/fs/fscache/obj.c
+++ b/fs/fscache/obj.c
@@ -47,10 +47,10 @@ static int fscache_do_lookup_object(struct fscache_object *object, void *data)
 static int fscache_do_create_object(struct fscache_object *object, void *data)
 {
 	int ret;
-	fscache_stat(&fscache_n_object_lookups);
-	fscache_stat(&fscache_n_cop_lookup_object);
+	fscache_stat(&fscache_n_object_creates);
+	fscache_stat(&fscache_n_cop_create_object);
 	ret = object->cache->ops->create_object(object, data);
-	fscache_stat_d(&fscache_n_cop_lookup_object);
+	fscache_stat_d(&fscache_n_cop_create_object);
 	return ret;
 }
 
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 583817f4f113..ccca0016fd26 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -14,25 +14,13 @@
 /*
  * operation counters
  */
-atomic_t fscache_n_op_pend;
-atomic_t fscache_n_op_run;
-atomic_t fscache_n_op_enqueue;
-atomic_t fscache_n_op_deferred_release;
-atomic_t fscache_n_op_initialised;
-atomic_t fscache_n_op_release;
-atomic_t fscache_n_op_gc;
-atomic_t fscache_n_op_cancelled;
-atomic_t fscache_n_op_rejected;
-
 atomic_t fscache_n_acquires;
 atomic_t fscache_n_acquires_null;
 atomic_t fscache_n_acquires_no_cache;
 atomic_t fscache_n_acquires_ok;
-atomic_t fscache_n_acquires_nobufs;
 atomic_t fscache_n_acquires_oom;
 
 atomic_t fscache_n_invalidates;
-atomic_t fscache_n_invalidates_run;
 
 atomic_t fscache_n_updates;
 atomic_t fscache_n_updates_null;
@@ -51,15 +39,13 @@ atomic_t fscache_n_object_no_alloc;
 atomic_t fscache_n_object_lookups;
 atomic_t fscache_n_object_lookups_negative;
 atomic_t fscache_n_object_lookups_positive;
-atomic_t fscache_n_object_lookups_timed_out;
-atomic_t fscache_n_object_created;
+atomic_t fscache_n_object_creates;
 atomic_t fscache_n_object_avail;
 atomic_t fscache_n_object_dead;
 
 atomic_t fscache_n_cop_alloc_object;
 atomic_t fscache_n_cop_lookup_object;
-atomic_t fscache_n_cop_lookup_complete;
-atomic_t fscache_n_cop_grab_object;
+atomic_t fscache_n_cop_create_object;
 atomic_t fscache_n_cop_invalidate_object;
 atomic_t fscache_n_cop_update_object;
 atomic_t fscache_n_cop_drop_object;
@@ -90,25 +76,21 @@ int fscache_stats_show(struct seq_file *m, void *v)
 		   atomic_read(&fscache_n_object_avail),
 		   atomic_read(&fscache_n_object_dead));
 
-	seq_printf(m, "Acquire: n=%u nul=%u noc=%u ok=%u nbf=%u"
-		   " oom=%u\n",
+	seq_printf(m, "Acquire: n=%u nul=%u noc=%u ok=%u oom=%u\n",
 		   atomic_read(&fscache_n_acquires),
 		   atomic_read(&fscache_n_acquires_null),
 		   atomic_read(&fscache_n_acquires_no_cache),
 		   atomic_read(&fscache_n_acquires_ok),
-		   atomic_read(&fscache_n_acquires_nobufs),
 		   atomic_read(&fscache_n_acquires_oom));
 
-	seq_printf(m, "Lookups: n=%u neg=%u pos=%u crt=%u tmo=%u\n",
+	seq_printf(m, "Lookups: n=%u neg=%u pos=%u crt=%u\n",
 		   atomic_read(&fscache_n_object_lookups),
 		   atomic_read(&fscache_n_object_lookups_negative),
 		   atomic_read(&fscache_n_object_lookups_positive),
-		   atomic_read(&fscache_n_object_created),
-		   atomic_read(&fscache_n_object_lookups_timed_out));
+		   atomic_read(&fscache_n_object_creates));
 
-	seq_printf(m, "Invals : n=%u run=%u\n",
-		   atomic_read(&fscache_n_invalidates),
-		   atomic_read(&fscache_n_invalidates_run));
+	seq_printf(m, "Invals : n=%u\n",
+		   atomic_read(&fscache_n_invalidates));
 
 	seq_printf(m, "Updates: n=%u nul=%u run=%u\n",
 		   atomic_read(&fscache_n_updates),
@@ -120,23 +102,9 @@ int fscache_stats_show(struct seq_file *m, void *v)
 		   atomic_read(&fscache_n_relinquishes_null),
 		   atomic_read(&fscache_n_relinquishes_retire));
 
-	seq_printf(m, "Ops    : pend=%u run=%u enq=%u can=%u rej=%u\n",
-		   atomic_read(&fscache_n_op_pend),
-		   atomic_read(&fscache_n_op_run),
-		   atomic_read(&fscache_n_op_enqueue),
-		   atomic_read(&fscache_n_op_cancelled),
-		   atomic_read(&fscache_n_op_rejected));
-	seq_printf(m, "Ops    : ini=%u dfr=%u rel=%u gc=%u\n",
-		   atomic_read(&fscache_n_op_initialised),
-		   atomic_read(&fscache_n_op_deferred_release),
-		   atomic_read(&fscache_n_op_release),
-		   atomic_read(&fscache_n_op_gc));
-
-	seq_printf(m, "CacheOp: alo=%d luo=%d luc=%d gro=%d\n",
+	seq_printf(m, "CacheOp: alo=%d luo=%d\n",
 		   atomic_read(&fscache_n_cop_alloc_object),
-		   atomic_read(&fscache_n_cop_lookup_object),
-		   atomic_read(&fscache_n_cop_lookup_complete),
-		   atomic_read(&fscache_n_cop_grab_object));
+		   atomic_read(&fscache_n_cop_lookup_object));
 	seq_printf(m, "CacheOp: inv=%d upo=%d dro=%d pto=%d atc=%d syn=%d\n",
 		   atomic_read(&fscache_n_cop_invalidate_object),
 		   atomic_read(&fscache_n_cop_update_object),

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 27/32] fscache: New stats
  2020-07-13 16:30 ` David Howells
                   ` (26 preceding siblings ...)
  (?)
@ 2020-07-13 16:35 ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:35 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Create some new stat counters appropriate to the new routines and display
them in /proc/fs/fscache/stats.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/dispatcher.c  |    6 ++++
 fs/fscache/internal.h    |   25 +++++++++++++++++
 fs/fscache/io.c          |    2 +
 fs/fscache/read_helper.c |   38 +++++++++++++++++++++++--
 fs/fscache/stats.c       |   69 ++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 137 insertions(+), 3 deletions(-)

diff --git a/fs/fscache/dispatcher.c b/fs/fscache/dispatcher.c
index fba71b99c951..489b8ab8cccd 100644
--- a/fs/fscache/dispatcher.c
+++ b/fs/fscache/dispatcher.c
@@ -41,6 +41,8 @@ void fscache_dispatch(struct fscache_cookie *cookie,
 	struct fscache_work *work;
 	bool queued = false;
 
+	fscache_stat(&fscache_n_dispatch_count);
+
 	work = kzalloc(sizeof(struct fscache_work), GFP_KERNEL);
 	if (work) {
 		work->cookie = cookie;
@@ -57,10 +59,13 @@ void fscache_dispatch(struct fscache_cookie *cookie,
 			queued = true;
 		}
 		spin_unlock(&fscache_work_lock);
+		if (queued)
+			fscache_stat(&fscache_n_dispatch_deferred);
 	}
 
 	if (!queued) {
 		kfree(work);
+		fscache_stat(&fscache_n_dispatch_inline);
 		func(cookie, object, param);
 	}
 }
@@ -86,6 +91,7 @@ static int fscache_dispatcher(void *data)
 
 			if (work) {
 				work->func(work->cookie, work->object, work->param);
+				fscache_stat(&fscache_n_dispatch_in_pool);
 				fscache_cookie_put(work->cookie, fscache_cookie_put_work);
 				kfree(work);
 			}
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index d2b856aa5f0e..d9391d3974d1 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -209,6 +209,30 @@ extern atomic_t fscache_n_cache_stale_objects;
 extern atomic_t fscache_n_cache_retired_objects;
 extern atomic_t fscache_n_cache_culled_objects;
 
+extern atomic_t fscache_n_dispatch_count;
+extern atomic_t fscache_n_dispatch_deferred;
+extern atomic_t fscache_n_dispatch_inline;
+extern atomic_t fscache_n_dispatch_in_pool;
+
+extern atomic_t fscache_n_read;
+extern atomic_t fscache_n_write;
+
+extern atomic_t fscache_n_read_helper;
+extern atomic_t fscache_n_read_helper_stop_nomem;
+extern atomic_t fscache_n_read_helper_stop_uptodate;
+extern atomic_t fscache_n_read_helper_stop_exist;
+extern atomic_t fscache_n_read_helper_stop_kill;
+extern atomic_t fscache_n_read_helper_read;
+extern atomic_t fscache_n_read_helper_download;
+extern atomic_t fscache_n_read_helper_zero;
+extern atomic_t fscache_n_read_helper_beyond_eof;
+extern atomic_t fscache_n_read_helper_reissue;
+extern atomic_t fscache_n_read_helper_read_done;
+extern atomic_t fscache_n_read_helper_read_failed;
+extern atomic_t fscache_n_read_helper_copy;
+extern atomic_t fscache_n_read_helper_copy_done;
+extern atomic_t fscache_n_read_helper_copy_failed;
+
 static inline void fscache_stat(atomic_t *stat)
 {
 	atomic_inc(stat);
@@ -256,6 +280,7 @@ void fscache_update_aux(struct fscache_cookie *cookie,
 
 #else /* CONFIG_FSCACHE */
 
+#define fscache_stat(stat) do {} while(0)
 #define fscache_op_wq system_wq
 
 #endif /* CONFIG_FSCACHE */
diff --git a/fs/fscache/io.c b/fs/fscache/io.c
index 8d7f79551699..d38101d77d27 100644
--- a/fs/fscache/io.c
+++ b/fs/fscache/io.c
@@ -138,6 +138,7 @@ int __fscache_read(struct fscache_io_request *req, struct iov_iter *iter)
 		fscache_begin_io_operation(req->cookie, FSCACHE_WANT_READ, req);
 
 	if (!IS_ERR(object)) {
+		fscache_stat(&fscache_n_read);
 		req->object = object;
 		return object->cache->ops->read(object, req, iter);
 	} else {
@@ -158,6 +159,7 @@ int __fscache_write(struct fscache_io_request *req, struct iov_iter *iter)
 		fscache_begin_io_operation(req->cookie, FSCACHE_WANT_WRITE, req);
 
 	if (!IS_ERR(object)) {
+		fscache_stat(&fscache_n_write);
 		req->object = object;
 		return object->cache->ops->write(object, req, iter);
 	} else {
diff --git a/fs/fscache/read_helper.c b/fs/fscache/read_helper.c
index 62fed27aa938..227b326a54e2 100644
--- a/fs/fscache/read_helper.c
+++ b/fs/fscache/read_helper.c
@@ -68,6 +68,11 @@ static void fscache_read_copy_done(struct fscache_io_request *req)
 
 	_enter("%lx,%x,%llx", index, req->nr_pages, req->transferred);
 
+	if (req->error == 0)
+		fscache_stat(&fscache_n_read_helper_copy_done);
+	else
+		fscache_stat(&fscache_n_read_helper_copy_failed);
+
 	/* Clear PG_fscache on the pages that were being written out. */
 	rcu_read_lock();
 	xas_for_each(&xas, page, last) {
@@ -90,6 +95,8 @@ static void fscache_do_read_copy_to_cache(struct work_struct *work)
 
 	_enter("");
 
+	fscache_stat(&fscache_n_read_helper_copy);
+
 	iov_iter_mapping(&iter, WRITE, req->mapping, req->pos,
 			 round_up(req->len, req->dio_block_size));
 
@@ -142,6 +149,11 @@ static void fscache_read_done(struct fscache_io_request *req)
 	_enter("%lx,%x,%llx,%d",
 	       start, req->nr_pages, req->transferred, req->error);
 
+	if (req->error == 0)
+		fscache_stat(&fscache_n_read_helper_read_done);
+	else
+		fscache_stat(&fscache_n_read_helper_read_failed);
+
 	if (req->transferred < req->len)
 		fscache_clear_unread(req);
 
@@ -195,6 +207,7 @@ static void fscache_file_read_maybe_reissue(struct fscache_io_request *req)
 	if (req->error == 0) {
 		fscache_read_done(req);
 	} else {
+		fscache_stat(&fscache_n_read_helper_reissue);
 		INIT_WORK(&req->work, fscache_reissue_read);
 		fscache_get_io_request(req);
 		queue_work(fscache_op_wq, &req->work);
@@ -279,6 +292,8 @@ static int fscache_read_helper(struct fscache_io_request *req,
 	loff_t new_size;
 	int ret;
 
+	fscache_stat(&fscache_n_read_helper);
+
 	shape.granularity	= 1;
 	shape.max_io_pages	= max_pages;
 	shape.i_size		= i_size_read(mapping->host);
@@ -341,8 +356,10 @@ static int fscache_read_helper(struct fscache_io_request *req,
 		while (cursor < shape.proposed_start) {
 			page = find_or_create_page(mapping, cursor,
 						   readahead_gfp_mask(mapping));
-			if (!page)
+			if (!page) {
+				fscache_stat(&fscache_n_read_helper_stop_nomem);
 				goto nomem;
+			}
 			if (!PageUptodate(page)) {
 				req->nr_pages++; /* Add to the reading list */
 				cursor++;
@@ -355,6 +372,7 @@ static int fscache_read_helper(struct fscache_io_request *req,
 			 */
 			notes |= FSCACHE_RHLP_NOTE_U2D_IN_PREFACE;
 			notes &= ~FSCACHE_RHLP_NOTE_DO_WRITE_TO_CACHE;
+			fscache_stat(&fscache_n_read_helper_stop_uptodate);
 			fscache_ignore_pages(mapping, start, cursor + 1);
 			start = cursor = shape.proposed_start;
 			req->nr_pages = 0;
@@ -378,18 +396,23 @@ static int fscache_read_helper(struct fscache_io_request *req,
 			_debug("prewrite req %lx", cursor);
 			page = *requested_page;
 			ret = -ERESTARTSYS;
-			if (lock_page_killable(page) < 0)
+			if (lock_page_killable(page) < 0) {
+				fscache_stat(&fscache_n_read_helper_stop_kill);
 				goto dont;
+			}
 		} else {
 			_debug("prewrite new %lx %lx", cursor, eof);
 			page = grab_cache_page_write_begin(mapping, shape.proposed_start,
 							   aop_flags);
-			if (!page)
+			if (!page) {
+				fscache_stat(&fscache_n_read_helper_stop_nomem);
 				goto nomem;
+			}
 			*requested_page = page;
 		}
 
 		if (PageUptodate(page)) {
+			fscache_stat(&fscache_n_read_helper_stop_uptodate);
 			notes |= FSCACHE_RHLP_NOTE_LIST_U2D;
 
 			trace_fscache_read_helper(req->cookie,
@@ -450,12 +473,14 @@ static int fscache_read_helper(struct fscache_io_request *req,
 							   readahead_gfp_mask(mapping));
 				if (!page) {
 					notes |= FSCACHE_RHLP_NOTE_LIST_NOMEM;
+					fscache_stat(&fscache_n_read_helper_stop_nomem);
 					goto stop;
 				}
 
 				if (PageUptodate(page)) {
 					unlock_page(page);
 					put_page(page); /* Avoid overwriting */
+					fscache_stat(&fscache_n_read_helper_stop_exist);
 					ret = 0;
 					notes |= FSCACHE_RHLP_NOTE_LIST_U2D;
 					goto stop;
@@ -468,6 +493,7 @@ static int fscache_read_helper(struct fscache_io_request *req,
 			default:
 				_debug("add fail %lx %d", cursor, ret);
 				put_page(page);
+				fscache_stat(&fscache_n_read_helper_stop_nomem);
 				page = NULL;
 				notes |= FSCACHE_RHLP_NOTE_LIST_ERROR;
 				goto stop;
@@ -500,12 +526,14 @@ static int fscache_read_helper(struct fscache_io_request *req,
 						   readahead_gfp_mask(mapping));
 			if (!page) {
 				notes |= FSCACHE_RHLP_NOTE_TRAILER_NOMEM;
+				fscache_stat(&fscache_n_read_helper_stop_nomem);
 				goto stop;
 			}
 			if (PageUptodate(page)) {
 				unlock_page(page);
 				put_page(page); /* Avoid overwriting */
 				notes |= FSCACHE_RHLP_NOTE_TRAILER_U2D;
+				fscache_stat(&fscache_n_read_helper_stop_uptodate);
 				goto stop;
 			}
 
@@ -587,18 +615,22 @@ static int fscache_read_helper(struct fscache_io_request *req,
 		 * the pages.
 		 */
 		_debug("SKIP READ: %llu", req->len);
+		fscache_stat(&fscache_n_read_helper_beyond_eof);
 		fscache_read_done(req);
 		break;
 	case fscache_read_helper_zero:
 		_debug("ZERO READ: %llu", req->len);
+		fscache_stat(&fscache_n_read_helper_zero);
 		fscache_read_done(req);
 		break;
 	case fscache_read_helper_read:
+		fscache_stat(&fscache_n_read_helper_read);
 		req->io_done = fscache_file_read_maybe_reissue;
 		fscache_read_from_cache(req);
 		break;
 	case fscache_read_helper_download:
 		_debug("DOWNLOAD: %llu", req->len);
+		fscache_stat(&fscache_n_read_helper_download);
 		req->io_done = fscache_read_done;
 		fscache_read_from_server(req);
 		break;
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index ccca0016fd26..63fb4d831f4d 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -58,6 +58,46 @@ atomic_t fscache_n_cache_stale_objects;
 atomic_t fscache_n_cache_retired_objects;
 atomic_t fscache_n_cache_culled_objects;
 
+atomic_t fscache_n_dispatch_count;
+atomic_t fscache_n_dispatch_deferred;
+atomic_t fscache_n_dispatch_inline;
+atomic_t fscache_n_dispatch_in_pool;
+
+atomic_t fscache_n_read;
+atomic_t fscache_n_write;
+
+atomic_t fscache_n_read_helper;
+atomic_t fscache_n_read_helper_stop_nomem;
+atomic_t fscache_n_read_helper_stop_uptodate;
+atomic_t fscache_n_read_helper_stop_exist;
+atomic_t fscache_n_read_helper_stop_kill;
+atomic_t fscache_n_read_helper_read;
+atomic_t fscache_n_read_helper_download;
+atomic_t fscache_n_read_helper_zero;
+atomic_t fscache_n_read_helper_beyond_eof;
+atomic_t fscache_n_read_helper_reissue;
+atomic_t fscache_n_read_helper_read_done;
+atomic_t fscache_n_read_helper_read_failed;
+atomic_t fscache_n_read_helper_copy;
+atomic_t fscache_n_read_helper_copy_done;
+atomic_t fscache_n_read_helper_copy_failed;
+
+EXPORT_SYMBOL(fscache_n_read_helper);
+EXPORT_SYMBOL(fscache_n_read_helper_stop_nomem);
+EXPORT_SYMBOL(fscache_n_read_helper_stop_uptodate);
+EXPORT_SYMBOL(fscache_n_read_helper_stop_exist);
+EXPORT_SYMBOL(fscache_n_read_helper_stop_kill);
+EXPORT_SYMBOL(fscache_n_read_helper_read);
+EXPORT_SYMBOL(fscache_n_read_helper_download);
+EXPORT_SYMBOL(fscache_n_read_helper_zero);
+EXPORT_SYMBOL(fscache_n_read_helper_beyond_eof);
+EXPORT_SYMBOL(fscache_n_read_helper_reissue);
+EXPORT_SYMBOL(fscache_n_read_helper_read_done);
+EXPORT_SYMBOL(fscache_n_read_helper_read_failed);
+EXPORT_SYMBOL(fscache_n_read_helper_copy);
+EXPORT_SYMBOL(fscache_n_read_helper_copy_done);
+EXPORT_SYMBOL(fscache_n_read_helper_copy_failed);
+
 /*
  * display the general statistics
  */
@@ -117,5 +157,34 @@ int fscache_stats_show(struct seq_file *m, void *v)
 		   atomic_read(&fscache_n_cache_stale_objects),
 		   atomic_read(&fscache_n_cache_retired_objects),
 		   atomic_read(&fscache_n_cache_culled_objects));
+
+	seq_printf(m, "Disp   : n=%u il=%u df=%u pl=%u\n",
+		   atomic_read(&fscache_n_dispatch_count),
+		   atomic_read(&fscache_n_dispatch_inline),
+		   atomic_read(&fscache_n_dispatch_deferred),
+		   atomic_read(&fscache_n_dispatch_in_pool));
+
+	seq_printf(m, "IO     : rd=%u wr=%u\n",
+		   atomic_read(&fscache_n_read),
+		   atomic_read(&fscache_n_write));
+
+	seq_printf(m, "RdHelp : nm=%u ud=%u ex=%u kl=%u\n",
+		   atomic_read(&fscache_n_read_helper_stop_nomem),
+		   atomic_read(&fscache_n_read_helper_stop_uptodate),
+		   atomic_read(&fscache_n_read_helper_stop_exist),
+		   atomic_read(&fscache_n_read_helper_stop_kill));
+	seq_printf(m, "RdHelp : n=%u rd=%u dl=%u zr=%u eo=%u\n",
+		   atomic_read(&fscache_n_read_helper),
+		   atomic_read(&fscache_n_read_helper_read),
+		   atomic_read(&fscache_n_read_helper_download),
+		   atomic_read(&fscache_n_read_helper_zero),
+		   atomic_read(&fscache_n_read_helper_beyond_eof));
+	seq_printf(m, "RdHelp : ri=%u dn=%u fl=%u cp=%u cd=%u cf=%u\n",
+		   atomic_read(&fscache_n_read_helper_reissue),
+		   atomic_read(&fscache_n_read_helper_read_done),
+		   atomic_read(&fscache_n_read_helper_read_failed),
+		   atomic_read(&fscache_n_read_helper_copy),
+		   atomic_read(&fscache_n_read_helper_copy_done),
+		   atomic_read(&fscache_n_read_helper_copy_failed));
 	return 0;
 }



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 28/32] fscache, cachefiles: Rewrite invalidation
  2020-07-13 16:30 ` David Howells
@ 2020-07-13 16:35   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:35 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Rewrite the cache object invalidation code in fscache and cachefiles.  The
following changes are made to fscache:

 (1) Invalidation is now ignored or allowed to proceed depending on the
     'stage' a non-index cookie is in with respect to the backing object.

 (2) If invalidation is proceeds, it pins the object and holds an operation
     count  for the duration.

 (3) The fscache_object struct is given an invalidation counter that is
     incremented any time fscache_invalidate() is called, even if the
     cookie is at a stage in which it cannot be applied.  The counter,
     however, can be noted and applied retroactively later.

 (4) The invalidation counter is noted in the operation struct when a cache
     operation is begun and can be checked on operation completion to find
     out if any consequent metadata changes should be dropped.

 (5) New operations aren't allowed to proceed if the object is being
     invalidated.

and to cachefiles:

 (1) If an open object is invalidated, the open backing file is replaced
     with a tmpfile (as if opened O_TMPFILE).  This is held unlinked until
     the object released from memory, at which point the file is simply
     abandoned if it was retired or the old file is unlinked and the new
     one linked into its place.

     Note: This would be easier if linkat() could be given a flag to
     indicate the destination should be overwritten or if RENAME_EXCHANGE
     could be applied to tmpfiles, effectively unlinking the destination.

 (2) Upon invalidation, the content map is replaced with a blank one.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/inode.c                |    8 ++-
 fs/cachefiles/content-map.c   |   32 ++++++++++
 fs/cachefiles/interface.c     |  130 ++++++++++++++++++++++++++++++++++-------
 fs/cachefiles/internal.h      |    9 ++-
 fs/cachefiles/namei.c         |   69 ++++++++++++++++++++--
 fs/cachefiles/xattr.c         |    6 +-
 fs/fscache/cookie.c           |   47 +++++++++++++--
 fs/fscache/io.c               |    2 +
 fs/fscache/obj.c              |   31 +++-------
 include/linux/fscache-cache.h |    5 +-
 include/linux/fscache.h       |   15 +++--
 11 files changed, 283 insertions(+), 71 deletions(-)

diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index b0772e64a844..eab191b9c01d 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -569,7 +569,13 @@ static void afs_zap_data(struct afs_vnode *vnode)
 	_enter("{%llx:%llu}", vnode->fid.vid, vnode->fid.vnode);
 
 #ifdef CONFIG_AFS_FSCACHE
-	fscache_invalidate(vnode->cache, i_size_read(&vnode->vfs_inode));
+	{
+		struct afs_vnode_cache_aux aux = {
+			.data_version = vnode->status.data_version,
+		};
+		fscache_invalidate(afs_vnode_cache(vnode), &aux,
+				   i_size_read(&vnode->vfs_inode), 0);
+	}
 #endif
 
 	/* nuke all the non-dirty pages that aren't locked, mapped or being
diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index f2a10e8d8d6d..3e310fd58497 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -192,6 +192,34 @@ void cachefiles_shape_request(struct fscache_object *obj,
 	       shape->to_be_done, shape->actual_start, shape->actual_nr_pages);
 }
 
+/*
+ * Allocate a new content map.
+ */
+u8 *cachefiles_new_content_map(struct cachefiles_object *object,
+			       unsigned int *_size)
+{
+	size_t size;
+	u8 *map = NULL;
+
+	_enter("");
+
+	if (!(object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK)) {
+		/* Single-chunk object.  The presence or absence of the content
+		 * map xattr is sufficient indication.
+		 */
+		*_size = 0;
+		return NULL;
+	}
+
+	/* Granular object. */
+	size = cachefiles_map_size(object->fscache.cookie->object_size);
+	map = kzalloc(size, GFP_KERNEL);
+	if (!map)
+		return ERR_PTR(-ENOMEM);
+	*_size = size;
+	return map;
+}
+
 /*
  * Mark the content map to indicate stored granule.
  */
@@ -205,7 +233,9 @@ void cachefiles_mark_content_map(struct fscache_io_request *req)
 
 	read_lock_bh(&object->content_map_lock);
 
-	if (object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK) {
+	if (req->inval_counter != object->fscache.inval_counter) {
+		_debug("inval mark");
+	} else if (object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK) {
 		if (pos == 0) {
 			object->content_info = CACHEFILES_CONTENT_SINGLE;
 			set_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->fscache.flags);
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 78180d269c5f..76f3a89d3e6c 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -203,7 +203,7 @@ static void cachefiles_update_object(struct fscache_object *_object)
 		}
 	}
 
-	cachefiles_set_object_xattr(object, XATTR_REPLACE);
+	cachefiles_set_object_xattr(object);
 
 out:
 	cachefiles_end_secure(cache, saved_cred);
@@ -213,11 +213,15 @@ static void cachefiles_update_object(struct fscache_object *_object)
 /*
  * Commit changes to the object as we drop it.
  */
-static void cachefiles_commit_object(struct cachefiles_object *object,
+static bool cachefiles_commit_object(struct cachefiles_object *object,
 				     struct cachefiles_cache *cache)
 {
 	if (object->content_map_changed)
 		cachefiles_save_content_map(object);
+
+	if (test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags))
+		return cachefiles_commit_tmpfile(cache, object);
+	return true;
 }
 
 /*
@@ -424,47 +428,127 @@ static int cachefiles_attr_changed(struct cachefiles_object *object)
 }
 
 /*
- * Invalidate an object
+ * Create a temporary file and leave it unattached and un-xattr'd until the
+ * time comes to discard the object from memory.
  */
-static void cachefiles_invalidate_object(struct fscache_object *_object)
+static struct file *cachefiles_create_tmpfile(struct cachefiles_object *object)
 {
-	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
 	const struct cred *saved_cred;
+	struct file *file;
 	struct path path;
 	uint64_t ni_size;
-	int ret;
+	long ret;
 
-	object = container_of(_object, struct cachefiles_object, fscache);
 	cache = container_of(object->fscache.cache,
 			     struct cachefiles_cache, cache);
 
 	ni_size = object->fscache.cookie->object_size;
 	ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
 
+	cachefiles_begin_secure(cache, &saved_cred);
+
+	path.mnt = cache->mnt;
+	path.dentry = vfs_tmpfile(cache->graveyard, S_IFREG, O_RDWR);
+	if (IS_ERR(path.dentry)) {
+		if (PTR_ERR(path.dentry) == -EIO)
+			cachefiles_io_error_obj(object, "Failed to create tmpfile");
+		file = ERR_CAST(path.dentry);
+		goto out;
+	}
+
+	trace_cachefiles_tmpfile(object, d_inode(path.dentry));
+
+	if (ni_size > 0) {
+		trace_cachefiles_trunc(object, d_inode(path.dentry), 0, ni_size);
+		ret = vfs_truncate(&path, ni_size);
+		if (ret < 0) {
+			file = ERR_PTR(ret);
+			goto out_dput;
+		}
+	}
+
+	file = open_with_fake_path(&path,
+				   O_RDWR | O_LARGEFILE | O_DIRECT,
+				   d_backing_inode(path.dentry),
+				   cache->cache_cred);
+out_dput:
+	dput(path.dentry);
+out:
+	cachefiles_end_secure(cache, saved_cred);
+	return file;
+}
+
+/*
+ * Invalidate an object
+ */
+static bool cachefiles_invalidate_object(struct fscache_object *_object,
+					 unsigned int flags)
+{
+	struct cachefiles_object *object;
+	struct file *file, *old_file;
+	u8 *map, *old_map;
+	unsigned int map_size;
+
+	object = container_of(_object, struct cachefiles_object, fscache);
+
 	_enter("{OBJ%x},[%llu]",
-	       object->fscache.debug_id, (unsigned long long)ni_size);
+	       object->fscache.debug_id, _object->cookie->object_size);
+
+	if ((flags & FSCACHE_INVAL_LIGHT) &&
+	    test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags)) {
+		_leave(" = t [light]");
+		return true;
+	}
 
 	if (object->dentry) {
 		ASSERT(d_is_reg(object->dentry));
 
-		path.dentry = object->dentry;
-		path.mnt = cache->mnt;
-
-		cachefiles_begin_secure(cache, &saved_cred);
-		ret = vfs_truncate(&path, 0);
-		if (ret == 0)
-			ret = vfs_truncate(&path, ni_size);
-		cachefiles_end_secure(cache, saved_cred);
-
-		if (ret != 0) {
-			if (ret == -EIO)
-				cachefiles_io_error_obj(object,
-							"Invalidate failed");
-		}
+		file = cachefiles_create_tmpfile(object);
+		if (IS_ERR(file))
+			goto failed;
+
+		map = cachefiles_new_content_map(object, &map_size);
+		if (IS_ERR(map))
+			goto failed_fput;
+
+		/* Substitute the VFS target */
+		_debug("sub");
+		dget(file->f_path.dentry); /* Do outside of content_map_lock */
+		spin_lock(&object->fscache.lock);
+		write_lock_bh(&object->content_map_lock);
+
+		if (!object->old)
+			/* Save the dentry carrying the path information */
+			object->old = object->dentry;
+
+		old_file = object->backing_file;
+		old_map = object->content_map;
+		object->backing_file = file;
+		object->dentry = file->f_path.dentry;
+		object->content_info = CACHEFILES_CONTENT_NO_DATA;
+		object->content_map = map;
+		object->content_map_size = map_size;
+		object->content_map_changed = true;
+		set_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags);
+		set_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->fscache.flags);
+
+		write_unlock_bh(&object->content_map_lock);
+		spin_unlock(&object->fscache.lock);
+		_debug("subbed");
+
+		kfree(old_map);
+		fput(old_file);
 	}
 
-	_leave("");
+	_leave(" = t [tmpfile]");
+	return true;
+
+failed_fput:
+	fput(file);
+failed:
+	_leave(" = f");
+	return false;
 }
 
 static unsigned int cachefiles_get_object_usage(const struct fscache_object *_object)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index c91a9b3c5bd5..ba60fc9dda0a 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -54,6 +54,8 @@ struct cachefiles_object {
 	struct file			*backing_file;	/* File open on backing storage */
 	loff_t				i_size;		/* object size */
 	atomic_t			usage;		/* object usage count */
+	unsigned long			flags;
+#define CACHEFILES_OBJECT_USING_TMPFILE	0		/* Object has a tmpfile that need linking */
 	uint8_t				type;		/* object type */
 	bool				new;		/* T if object new */
 
@@ -127,6 +129,7 @@ extern void cachefiles_daemon_unbind(struct cachefiles_cache *cache);
  */
 extern void cachefiles_shape_request(struct fscache_object *object,
 				     struct fscache_request_shape *shape);
+extern u8 *cachefiles_new_content_map(struct cachefiles_object *object, unsigned int *_size);
 extern void cachefiles_mark_content_map(struct fscache_io_request *req);
 extern void cachefiles_expand_content_map(struct cachefiles_object *object, loff_t size);
 extern void cachefiles_shorten_content_map(struct cachefiles_object *object, loff_t new_size);
@@ -185,6 +188,9 @@ extern int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
 extern int cachefiles_check_in_use(struct cachefiles_cache *cache,
 				   struct dentry *dir, char *filename);
 
+extern bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
+				      struct cachefiles_object *object);
+
 /*
  * proc.c
  */
@@ -237,8 +243,7 @@ static inline void cachefiles_end_secure(struct cachefiles_cache *cache,
  * xattr.c
  */
 extern int cachefiles_check_object_type(struct cachefiles_object *object);
-extern int cachefiles_set_object_xattr(struct cachefiles_object *object,
-				       unsigned int xattr_flags);
+extern int cachefiles_set_object_xattr(struct cachefiles_object *object);
 extern int cachefiles_check_auxdata(struct cachefiles_object *object);
 extern int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
 					  struct dentry *dentry);
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 3dc64ae5dde8..e63ee4b88268 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -468,7 +468,7 @@ bool cachefiles_walk_to_object(struct cachefiles_object *parent,
 
 	if (object->new) {
 		/* attach data to a newly constructed terminal object */
-		ret = cachefiles_set_object_xattr(object, XATTR_CREATE);
+		ret = cachefiles_set_object_xattr(object);
 		if (ret < 0)
 			goto check_error;
 	} else {
@@ -487,8 +487,6 @@ bool cachefiles_walk_to_object(struct cachefiles_object *parent,
 				pr_warn("cachefiles: Block size too large\n");
 				goto check_error;
 			}
-
-			object->old = dget(object->dentry);
 		} else {
 			BUG(); // TODO: open file in data-class subdir
 		}
@@ -523,9 +521,7 @@ bool cachefiles_walk_to_object(struct cachefiles_object *parent,
 		cachefiles_unmark_inode_in_use(object, object->dentry);
 	cachefiles_mark_object_inactive(cache, object);
 	dput(object->dentry);
-	dput(object->old);
 	object->dentry = NULL;
-	object->old = NULL;
 	goto error_out;
 
 lookup_error:
@@ -811,3 +807,66 @@ int cachefiles_check_in_use(struct cachefiles_cache *cache, struct dentry *dir,
 	//_leave(" = 0");
 	return ret;
 }
+
+/*
+ * Attempt to link a temporary file into its rightful place in the cache.
+ */
+bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
+			       struct cachefiles_object *object)
+{
+	struct dentry *dir, *dentry, *old;
+	char *name;
+	unsigned int namelen;
+	bool success = false;
+	int ret;
+
+	_enter(",%pd", object->old);
+
+	namelen = object->old->d_name.len;
+	name = kmemdup_nul(object->old->d_name.name, namelen, GFP_KERNEL);
+	if (!name)
+		goto out;
+
+	dir = dget_parent(object->old);
+
+	inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
+	ret = cachefiles_bury_object(cache, object, dir, object->old,
+				     FSCACHE_OBJECT_IS_STALE);
+	dput(object->old);
+	object->old = NULL;
+	if (ret < 0 && ret != -ENOENT) {
+		_debug("bury fail %d", ret);
+		goto out_name;
+	}
+
+	inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
+	dentry = lookup_one_len(name, dir, namelen);
+	if (IS_ERR(dentry)) {
+		_debug("lookup fail %ld", PTR_ERR(dentry));
+		goto out_unlock;
+	}
+
+	ret = vfs_link(object->dentry, d_inode(dir), dentry, NULL);
+	if (ret < 0) {
+		_debug("link fail %d", ret);
+		dput(dentry);
+	} else {
+		trace_cachefiles_link(object, d_inode(object->dentry));
+		spin_lock(&object->fscache.lock);
+		old = object->dentry;
+		object->dentry = dentry;
+		success = true;
+		clear_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags);
+		spin_unlock(&object->fscache.lock);
+		dput(old);
+	}
+
+out_unlock:
+	inode_unlock(d_inode(dir));
+out_name:
+	kfree(name);
+	dput(dir);
+out:
+	_leave(" = %u", success);
+	return success;
+}
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index a1d4a3d1db69..22c56ca2fd0b 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -104,8 +104,7 @@ int cachefiles_check_object_type(struct cachefiles_object *object)
 /*
  * set the state xattr on a cache file
  */
-int cachefiles_set_object_xattr(struct cachefiles_object *object,
-				unsigned int xattr_flags)
+int cachefiles_set_object_xattr(struct cachefiles_object *object)
 {
 	struct cachefiles_xattr *buf;
 	struct dentry *dentry = object->dentry;
@@ -129,8 +128,7 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 		memcpy(buf->data, fscache_get_aux(object->fscache.cookie), len);
 
 	ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
-			   buf, sizeof(struct cachefiles_xattr) + len,
-			   xattr_flags);
+			   buf, sizeof(struct cachefiles_xattr) + len, 0);
 	if (ret < 0) {
 		trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
 					   buf->content,
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 2d9d147411cd..fc93f4b69198 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -472,10 +472,14 @@ void fscache_set_cookie_stage(struct fscache_cookie *cookie,
 }
 
 /*
- * Invalidate an object.  Callable with spinlocks held.
+ * Invalidate an object.
  */
-void __fscache_invalidate(struct fscache_cookie *cookie, loff_t new_size)
+void __fscache_invalidate(struct fscache_cookie *cookie,
+			  const void *aux_data, loff_t new_size,
+			  unsigned int flags)
 {
+	struct fscache_object *object = NULL;
+
 	_enter("{%s}", cookie->type_name);
 
 	fscache_stat(&fscache_n_invalidates);
@@ -488,13 +492,42 @@ void __fscache_invalidate(struct fscache_cookie *cookie, loff_t new_size)
 	ASSERTCMP(cookie->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
 
 	spin_lock(&cookie->lock);
-	cookie->object_size = new_size;
+	fscache_update_aux(cookie, aux_data, &new_size);
 	cookie->zero_point = new_size;
-	spin_unlock(&cookie->lock);
 
-	if (!hlist_empty(&cookie->backing_objects) &&
-	    test_and_set_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags))
-		fscache_dispatch(cookie, NULL, 0, fscache_invalidate_object);
+	if (!hlist_empty(&cookie->backing_objects)) {
+		object = hlist_entry(cookie->backing_objects.first,
+				     struct fscache_object, cookie_link);
+		object->inval_counter++;
+	}
+
+	switch (cookie->stage) {
+	case FSCACHE_COOKIE_STAGE_QUIESCENT:
+	case FSCACHE_COOKIE_STAGE_DEAD:
+	case FSCACHE_COOKIE_STAGE_INITIALISING: /* Assume later checks will catch it */
+	case FSCACHE_COOKIE_STAGE_INVALIDATING: /* is_still_valid will catch it */
+		spin_unlock(&cookie->lock);
+		_leave(" [no %u]", cookie->stage);
+		return;
+
+	case FSCACHE_COOKIE_STAGE_LOOKING_UP:
+		spin_unlock(&cookie->lock);
+		_leave(" [look %x]", object->inval_counter);
+		return;
+
+	case FSCACHE_COOKIE_STAGE_NO_DATA_YET:
+	case FSCACHE_COOKIE_STAGE_ACTIVE:
+		cookie->stage = FSCACHE_COOKIE_STAGE_INVALIDATING;
+		wake_up_var(&cookie->stage);
+
+		atomic_inc(&cookie->n_ops);
+		object->cache->ops->grab_object(object, fscache_obj_get_inval);
+		spin_unlock(&cookie->lock);
+
+		fscache_dispatch(cookie, object, flags, fscache_invalidate_object);
+		_leave(" [inv]");
+		return;
+	}
 }
 EXPORT_SYMBOL(__fscache_invalidate);
 
diff --git a/fs/fscache/io.c b/fs/fscache/io.c
index d38101d77d27..1885cfbe7f04 100644
--- a/fs/fscache/io.c
+++ b/fs/fscache/io.c
@@ -84,6 +84,8 @@ static struct fscache_object *fscache_begin_io_operation(
 		goto not_live;
 
 	object->cache->ops->grab_object(object, fscache_obj_get_ioreq);
+	if (req)
+		req->inval_counter = object->inval_counter;
 
 	atomic_inc(&cookie->n_ops);
 	spin_unlock(&cookie->lock);
diff --git a/fs/fscache/obj.c b/fs/fscache/obj.c
index baab7c465142..a7064a4cb486 100644
--- a/fs/fscache/obj.c
+++ b/fs/fscache/obj.c
@@ -241,32 +241,21 @@ void fscache_lookup_object(struct fscache_cookie *cookie,
 }
 
 /*
- * Invalidate an object
+ * Invalidate an object.  param passes the invalidation flags.
  */
 void fscache_invalidate_object(struct fscache_cookie *cookie,
-			       struct fscache_object *unused, int param)
+			       struct fscache_object *object, int flags)
 {
-	struct fscache_object *object = NULL;
+	bool success;
 
-	spin_lock(&cookie->lock);
-
-	if (!hlist_empty(&cookie->backing_objects)) {
-		object = hlist_entry(cookie->backing_objects.first,
-				     struct fscache_object,
-				     cookie_link);
-		object = object->cache->ops->grab_object(object,
-							 fscache_obj_get_inval);
-	}
-
-	spin_unlock(&cookie->lock);
-
-	if (object) {
-		object->cache->ops->invalidate_object(object);
-		fscache_do_put_object(object, fscache_obj_put_inval);
-	}
+	success = object->cache->ops->invalidate_object(object, flags);
+	fscache_do_put_object(object, fscache_obj_put_inval);
 
-	clear_bit_unlock(FSCACHE_COOKIE_INVALIDATING, &cookie->flags);
-	wake_up_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING);
+	if (success)
+		fscache_set_cookie_stage(cookie, FSCACHE_COOKIE_STAGE_NO_DATA_YET);
+	else
+		fscache_set_cookie_stage(cookie, FSCACHE_COOKIE_STAGE_DEAD);
+	fscache_end_io_operation(cookie);
 }
 
 /*
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 1357c44d371b..da85eb15b3c9 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -120,7 +120,8 @@ struct fscache_cache_ops {
 	void (*update_object)(struct fscache_object *object);
 
 	/* Invalidate an object */
-	void (*invalidate_object)(struct fscache_object *object);
+	bool (*invalidate_object)(struct fscache_object *object,
+				  unsigned int flags);
 
 	/* discard the resources pinned by an object and effect retirement if
 	 * necessary */
@@ -176,10 +177,12 @@ enum fscache_object_stage {
 struct fscache_object {
 	int			debug_id;	/* debugging ID */
 	int			n_children;	/* number of child objects */
+	unsigned int		inval_counter;	/* Number of invalidations applied */
 	enum fscache_object_stage stage;	/* Stage of object's lifecycle */
 	spinlock_t		lock;		/* state and operations lock */
 
 	unsigned long		flags;
+#define FSCACHE_OBJECT_NEEDS_INVAL	8	/* T if object needs invalidation */
 #define FSCACHE_OBJECT_NEEDS_UPDATE	9	/* T if object attrs need writing to disk */
 
 	struct list_head	cache_link;	/* link in cache->object_list */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 0aee6edef672..c313950afd8a 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -57,6 +57,8 @@ enum fscache_cookie_type {
 #define FSCACHE_ADV_WRITE_CACHE		0x00 /* Do cache if written to locally */
 #define FSCACHE_ADV_WRITE_NOCACHE	0x02 /* Don't cache if written to locally */
 
+#define FSCACHE_INVAL_LIGHT		0x01 /* Don't re-invalidate if temp object */
+
 /*
  * fscache cached network filesystem type
  * - name, version and ops must be filled in before registration
@@ -105,7 +107,6 @@ struct fscache_cookie {
 	loff_t				zero_point;	/* Size after which no data on server */
 
 	unsigned long			flags;
-#define FSCACHE_COOKIE_INVALIDATING	4	/* T if cookie is being invalidated */
 #define FSCACHE_COOKIE_ACQUIRED		5	/* T if cookie is in use */
 #define FSCACHE_COOKIE_RELINQUISHED	6	/* T if cookie has been relinquished */
 
@@ -178,6 +179,7 @@ struct fscache_io_request {
 	loff_t			len;		/* Size of the I/O */
 	loff_t			transferred;	/* Amount of data transferred */
 	short			error;		/* 0 or error that occurred */
+	unsigned int		inval_counter;	/* object->inval_counter at begin_op */
 	unsigned long		flags;
 #define FSCACHE_IO_DATA_FROM_SERVER	0	/* Set if data was read from server */
 #define FSCACHE_IO_DATA_FROM_CACHE	1	/* Set if data was read from the cache */
@@ -230,7 +232,7 @@ extern void __fscache_unuse_cookie(struct fscache_cookie *, const void *, const
 extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
 extern void __fscache_update_cookie(struct fscache_cookie *, const void *, const loff_t *);
 extern void __fscache_shape_request(struct fscache_cookie *, struct fscache_request_shape *);
-extern void __fscache_invalidate(struct fscache_cookie *, loff_t);
+extern void __fscache_invalidate(struct fscache_cookie *, const void *, loff_t, unsigned int);
 extern void __fscache_init_io_request(struct fscache_io_request *,
 				      struct fscache_cookie *);
 extern void __fscache_free_io_request(struct fscache_io_request *);
@@ -461,22 +463,23 @@ void fscache_unpin_cookie(struct fscache_cookie *cookie)
 /**
  * fscache_invalidate - Notify cache that an object needs invalidation
  * @cookie: The cookie representing the cache object
+ * @aux_data: The updated auxiliary data for the cookie (may be NULL)
  * @size: The revised size of the object.
+ * @flags: Invalidation flags (FSCACHE_INVAL_*)
  *
  * Notify the cache that an object is needs to be invalidated and that it
  * should abort any retrievals or stores it is doing on the cache.  The object
  * is then marked non-caching until such time as the invalidation is complete.
  *
- * This can be called with spinlocks held.
- *
  * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
-void fscache_invalidate(struct fscache_cookie *cookie, loff_t size)
+void fscache_invalidate(struct fscache_cookie *cookie,
+			const void *aux_data, loff_t size, unsigned int flags)
 {
 	if (fscache_cookie_valid(cookie))
-		__fscache_invalidate(cookie, size);
+		__fscache_invalidate(cookie, aux_data, size, flags);
 }
 
 /**



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 28/32] fscache, cachefiles: Rewrite invalidation
@ 2020-07-13 16:35   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:35 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs

Rewrite the cache object invalidation code in fscache and cachefiles.  The
following changes are made to fscache:

 (1) Invalidation is now ignored or allowed to proceed depending on the
     'stage' a non-index cookie is in with respect to the backing object.

 (2) If invalidation is proceeds, it pins the object and holds an operation
     count  for the duration.

 (3) The fscache_object struct is given an invalidation counter that is
     incremented any time fscache_invalidate() is called, even if the
     cookie is at a stage in which it cannot be applied.  The counter,
     however, can be noted and applied retroactively later.

 (4) The invalidation counter is noted in the operation struct when a cache
     operation is begun and can be checked on operation completion to find
     out if any consequent metadata changes should be dropped.

 (5) New operations aren't allowed to proceed if the object is being
     invalidated.

and to cachefiles:

 (1) If an open object is invalidated, the open backing file is replaced
     with a tmpfile (as if opened O_TMPFILE).  This is held unlinked until
     the object released from memory, at which point the file is simply
     abandoned if it was retired or the old file is unlinked and the new
     one linked into its place.

     Note: This would be easier if linkat() could be given a flag to
     indicate the destination should be overwritten or if RENAME_EXCHANGE
     could be applied to tmpfiles, effectively unlinking the destination.

 (2) Upon invalidation, the content map is replaced with a blank one.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/inode.c                |    8 ++-
 fs/cachefiles/content-map.c   |   32 ++++++++++
 fs/cachefiles/interface.c     |  130 ++++++++++++++++++++++++++++++++++-------
 fs/cachefiles/internal.h      |    9 ++-
 fs/cachefiles/namei.c         |   69 ++++++++++++++++++++--
 fs/cachefiles/xattr.c         |    6 +-
 fs/fscache/cookie.c           |   47 +++++++++++++--
 fs/fscache/io.c               |    2 +
 fs/fscache/obj.c              |   31 +++-------
 include/linux/fscache-cache.h |    5 +-
 include/linux/fscache.h       |   15 +++--
 11 files changed, 283 insertions(+), 71 deletions(-)

diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index b0772e64a844..eab191b9c01d 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -569,7 +569,13 @@ static void afs_zap_data(struct afs_vnode *vnode)
 	_enter("{%llx:%llu}", vnode->fid.vid, vnode->fid.vnode);
 
 #ifdef CONFIG_AFS_FSCACHE
-	fscache_invalidate(vnode->cache, i_size_read(&vnode->vfs_inode));
+	{
+		struct afs_vnode_cache_aux aux = {
+			.data_version = vnode->status.data_version,
+		};
+		fscache_invalidate(afs_vnode_cache(vnode), &aux,
+				   i_size_read(&vnode->vfs_inode), 0);
+	}
 #endif
 
 	/* nuke all the non-dirty pages that aren't locked, mapped or being
diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index f2a10e8d8d6d..3e310fd58497 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -192,6 +192,34 @@ void cachefiles_shape_request(struct fscache_object *obj,
 	       shape->to_be_done, shape->actual_start, shape->actual_nr_pages);
 }
 
+/*
+ * Allocate a new content map.
+ */
+u8 *cachefiles_new_content_map(struct cachefiles_object *object,
+			       unsigned int *_size)
+{
+	size_t size;
+	u8 *map = NULL;
+
+	_enter("");
+
+	if (!(object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK)) {
+		/* Single-chunk object.  The presence or absence of the content
+		 * map xattr is sufficient indication.
+		 */
+		*_size = 0;
+		return NULL;
+	}
+
+	/* Granular object. */
+	size = cachefiles_map_size(object->fscache.cookie->object_size);
+	map = kzalloc(size, GFP_KERNEL);
+	if (!map)
+		return ERR_PTR(-ENOMEM);
+	*_size = size;
+	return map;
+}
+
 /*
  * Mark the content map to indicate stored granule.
  */
@@ -205,7 +233,9 @@ void cachefiles_mark_content_map(struct fscache_io_request *req)
 
 	read_lock_bh(&object->content_map_lock);
 
-	if (object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK) {
+	if (req->inval_counter != object->fscache.inval_counter) {
+		_debug("inval mark");
+	} else if (object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK) {
 		if (pos == 0) {
 			object->content_info = CACHEFILES_CONTENT_SINGLE;
 			set_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->fscache.flags);
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 78180d269c5f..76f3a89d3e6c 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -203,7 +203,7 @@ static void cachefiles_update_object(struct fscache_object *_object)
 		}
 	}
 
-	cachefiles_set_object_xattr(object, XATTR_REPLACE);
+	cachefiles_set_object_xattr(object);
 
 out:
 	cachefiles_end_secure(cache, saved_cred);
@@ -213,11 +213,15 @@ static void cachefiles_update_object(struct fscache_object *_object)
 /*
  * Commit changes to the object as we drop it.
  */
-static void cachefiles_commit_object(struct cachefiles_object *object,
+static bool cachefiles_commit_object(struct cachefiles_object *object,
 				     struct cachefiles_cache *cache)
 {
 	if (object->content_map_changed)
 		cachefiles_save_content_map(object);
+
+	if (test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags))
+		return cachefiles_commit_tmpfile(cache, object);
+	return true;
 }
 
 /*
@@ -424,47 +428,127 @@ static int cachefiles_attr_changed(struct cachefiles_object *object)
 }
 
 /*
- * Invalidate an object
+ * Create a temporary file and leave it unattached and un-xattr'd until the
+ * time comes to discard the object from memory.
  */
-static void cachefiles_invalidate_object(struct fscache_object *_object)
+static struct file *cachefiles_create_tmpfile(struct cachefiles_object *object)
 {
-	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
 	const struct cred *saved_cred;
+	struct file *file;
 	struct path path;
 	uint64_t ni_size;
-	int ret;
+	long ret;
 
-	object = container_of(_object, struct cachefiles_object, fscache);
 	cache = container_of(object->fscache.cache,
 			     struct cachefiles_cache, cache);
 
 	ni_size = object->fscache.cookie->object_size;
 	ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
 
+	cachefiles_begin_secure(cache, &saved_cred);
+
+	path.mnt = cache->mnt;
+	path.dentry = vfs_tmpfile(cache->graveyard, S_IFREG, O_RDWR);
+	if (IS_ERR(path.dentry)) {
+		if (PTR_ERR(path.dentry) == -EIO)
+			cachefiles_io_error_obj(object, "Failed to create tmpfile");
+		file = ERR_CAST(path.dentry);
+		goto out;
+	}
+
+	trace_cachefiles_tmpfile(object, d_inode(path.dentry));
+
+	if (ni_size > 0) {
+		trace_cachefiles_trunc(object, d_inode(path.dentry), 0, ni_size);
+		ret = vfs_truncate(&path, ni_size);
+		if (ret < 0) {
+			file = ERR_PTR(ret);
+			goto out_dput;
+		}
+	}
+
+	file = open_with_fake_path(&path,
+				   O_RDWR | O_LARGEFILE | O_DIRECT,
+				   d_backing_inode(path.dentry),
+				   cache->cache_cred);
+out_dput:
+	dput(path.dentry);
+out:
+	cachefiles_end_secure(cache, saved_cred);
+	return file;
+}
+
+/*
+ * Invalidate an object
+ */
+static bool cachefiles_invalidate_object(struct fscache_object *_object,
+					 unsigned int flags)
+{
+	struct cachefiles_object *object;
+	struct file *file, *old_file;
+	u8 *map, *old_map;
+	unsigned int map_size;
+
+	object = container_of(_object, struct cachefiles_object, fscache);
+
 	_enter("{OBJ%x},[%llu]",
-	       object->fscache.debug_id, (unsigned long long)ni_size);
+	       object->fscache.debug_id, _object->cookie->object_size);
+
+	if ((flags & FSCACHE_INVAL_LIGHT) &&
+	    test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags)) {
+		_leave(" = t [light]");
+		return true;
+	}
 
 	if (object->dentry) {
 		ASSERT(d_is_reg(object->dentry));
 
-		path.dentry = object->dentry;
-		path.mnt = cache->mnt;
-
-		cachefiles_begin_secure(cache, &saved_cred);
-		ret = vfs_truncate(&path, 0);
-		if (ret == 0)
-			ret = vfs_truncate(&path, ni_size);
-		cachefiles_end_secure(cache, saved_cred);
-
-		if (ret != 0) {
-			if (ret == -EIO)
-				cachefiles_io_error_obj(object,
-							"Invalidate failed");
-		}
+		file = cachefiles_create_tmpfile(object);
+		if (IS_ERR(file))
+			goto failed;
+
+		map = cachefiles_new_content_map(object, &map_size);
+		if (IS_ERR(map))
+			goto failed_fput;
+
+		/* Substitute the VFS target */
+		_debug("sub");
+		dget(file->f_path.dentry); /* Do outside of content_map_lock */
+		spin_lock(&object->fscache.lock);
+		write_lock_bh(&object->content_map_lock);
+
+		if (!object->old)
+			/* Save the dentry carrying the path information */
+			object->old = object->dentry;
+
+		old_file = object->backing_file;
+		old_map = object->content_map;
+		object->backing_file = file;
+		object->dentry = file->f_path.dentry;
+		object->content_info = CACHEFILES_CONTENT_NO_DATA;
+		object->content_map = map;
+		object->content_map_size = map_size;
+		object->content_map_changed = true;
+		set_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags);
+		set_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->fscache.flags);
+
+		write_unlock_bh(&object->content_map_lock);
+		spin_unlock(&object->fscache.lock);
+		_debug("subbed");
+
+		kfree(old_map);
+		fput(old_file);
 	}
 
-	_leave("");
+	_leave(" = t [tmpfile]");
+	return true;
+
+failed_fput:
+	fput(file);
+failed:
+	_leave(" = f");
+	return false;
 }
 
 static unsigned int cachefiles_get_object_usage(const struct fscache_object *_object)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index c91a9b3c5bd5..ba60fc9dda0a 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -54,6 +54,8 @@ struct cachefiles_object {
 	struct file			*backing_file;	/* File open on backing storage */
 	loff_t				i_size;		/* object size */
 	atomic_t			usage;		/* object usage count */
+	unsigned long			flags;
+#define CACHEFILES_OBJECT_USING_TMPFILE	0		/* Object has a tmpfile that need linking */
 	uint8_t				type;		/* object type */
 	bool				new;		/* T if object new */
 
@@ -127,6 +129,7 @@ extern void cachefiles_daemon_unbind(struct cachefiles_cache *cache);
  */
 extern void cachefiles_shape_request(struct fscache_object *object,
 				     struct fscache_request_shape *shape);
+extern u8 *cachefiles_new_content_map(struct cachefiles_object *object, unsigned int *_size);
 extern void cachefiles_mark_content_map(struct fscache_io_request *req);
 extern void cachefiles_expand_content_map(struct cachefiles_object *object, loff_t size);
 extern void cachefiles_shorten_content_map(struct cachefiles_object *object, loff_t new_size);
@@ -185,6 +188,9 @@ extern int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
 extern int cachefiles_check_in_use(struct cachefiles_cache *cache,
 				   struct dentry *dir, char *filename);
 
+extern bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
+				      struct cachefiles_object *object);
+
 /*
  * proc.c
  */
@@ -237,8 +243,7 @@ static inline void cachefiles_end_secure(struct cachefiles_cache *cache,
  * xattr.c
  */
 extern int cachefiles_check_object_type(struct cachefiles_object *object);
-extern int cachefiles_set_object_xattr(struct cachefiles_object *object,
-				       unsigned int xattr_flags);
+extern int cachefiles_set_object_xattr(struct cachefiles_object *object);
 extern int cachefiles_check_auxdata(struct cachefiles_object *object);
 extern int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
 					  struct dentry *dentry);
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 3dc64ae5dde8..e63ee4b88268 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -468,7 +468,7 @@ bool cachefiles_walk_to_object(struct cachefiles_object *parent,
 
 	if (object->new) {
 		/* attach data to a newly constructed terminal object */
-		ret = cachefiles_set_object_xattr(object, XATTR_CREATE);
+		ret = cachefiles_set_object_xattr(object);
 		if (ret < 0)
 			goto check_error;
 	} else {
@@ -487,8 +487,6 @@ bool cachefiles_walk_to_object(struct cachefiles_object *parent,
 				pr_warn("cachefiles: Block size too large\n");
 				goto check_error;
 			}
-
-			object->old = dget(object->dentry);
 		} else {
 			BUG(); // TODO: open file in data-class subdir
 		}
@@ -523,9 +521,7 @@ bool cachefiles_walk_to_object(struct cachefiles_object *parent,
 		cachefiles_unmark_inode_in_use(object, object->dentry);
 	cachefiles_mark_object_inactive(cache, object);
 	dput(object->dentry);
-	dput(object->old);
 	object->dentry = NULL;
-	object->old = NULL;
 	goto error_out;
 
 lookup_error:
@@ -811,3 +807,66 @@ int cachefiles_check_in_use(struct cachefiles_cache *cache, struct dentry *dir,
 	//_leave(" = 0");
 	return ret;
 }
+
+/*
+ * Attempt to link a temporary file into its rightful place in the cache.
+ */
+bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
+			       struct cachefiles_object *object)
+{
+	struct dentry *dir, *dentry, *old;
+	char *name;
+	unsigned int namelen;
+	bool success = false;
+	int ret;
+
+	_enter(",%pd", object->old);
+
+	namelen = object->old->d_name.len;
+	name = kmemdup_nul(object->old->d_name.name, namelen, GFP_KERNEL);
+	if (!name)
+		goto out;
+
+	dir = dget_parent(object->old);
+
+	inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
+	ret = cachefiles_bury_object(cache, object, dir, object->old,
+				     FSCACHE_OBJECT_IS_STALE);
+	dput(object->old);
+	object->old = NULL;
+	if (ret < 0 && ret != -ENOENT) {
+		_debug("bury fail %d", ret);
+		goto out_name;
+	}
+
+	inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
+	dentry = lookup_one_len(name, dir, namelen);
+	if (IS_ERR(dentry)) {
+		_debug("lookup fail %ld", PTR_ERR(dentry));
+		goto out_unlock;
+	}
+
+	ret = vfs_link(object->dentry, d_inode(dir), dentry, NULL);
+	if (ret < 0) {
+		_debug("link fail %d", ret);
+		dput(dentry);
+	} else {
+		trace_cachefiles_link(object, d_inode(object->dentry));
+		spin_lock(&object->fscache.lock);
+		old = object->dentry;
+		object->dentry = dentry;
+		success = true;
+		clear_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags);
+		spin_unlock(&object->fscache.lock);
+		dput(old);
+	}
+
+out_unlock:
+	inode_unlock(d_inode(dir));
+out_name:
+	kfree(name);
+	dput(dir);
+out:
+	_leave(" = %u", success);
+	return success;
+}
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index a1d4a3d1db69..22c56ca2fd0b 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -104,8 +104,7 @@ int cachefiles_check_object_type(struct cachefiles_object *object)
 /*
  * set the state xattr on a cache file
  */
-int cachefiles_set_object_xattr(struct cachefiles_object *object,
-				unsigned int xattr_flags)
+int cachefiles_set_object_xattr(struct cachefiles_object *object)
 {
 	struct cachefiles_xattr *buf;
 	struct dentry *dentry = object->dentry;
@@ -129,8 +128,7 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 		memcpy(buf->data, fscache_get_aux(object->fscache.cookie), len);
 
 	ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
-			   buf, sizeof(struct cachefiles_xattr) + len,
-			   xattr_flags);
+			   buf, sizeof(struct cachefiles_xattr) + len, 0);
 	if (ret < 0) {
 		trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
 					   buf->content,
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 2d9d147411cd..fc93f4b69198 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -472,10 +472,14 @@ void fscache_set_cookie_stage(struct fscache_cookie *cookie,
 }
 
 /*
- * Invalidate an object.  Callable with spinlocks held.
+ * Invalidate an object.
  */
-void __fscache_invalidate(struct fscache_cookie *cookie, loff_t new_size)
+void __fscache_invalidate(struct fscache_cookie *cookie,
+			  const void *aux_data, loff_t new_size,
+			  unsigned int flags)
 {
+	struct fscache_object *object = NULL;
+
 	_enter("{%s}", cookie->type_name);
 
 	fscache_stat(&fscache_n_invalidates);
@@ -488,13 +492,42 @@ void __fscache_invalidate(struct fscache_cookie *cookie, loff_t new_size)
 	ASSERTCMP(cookie->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
 
 	spin_lock(&cookie->lock);
-	cookie->object_size = new_size;
+	fscache_update_aux(cookie, aux_data, &new_size);
 	cookie->zero_point = new_size;
-	spin_unlock(&cookie->lock);
 
-	if (!hlist_empty(&cookie->backing_objects) &&
-	    test_and_set_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags))
-		fscache_dispatch(cookie, NULL, 0, fscache_invalidate_object);
+	if (!hlist_empty(&cookie->backing_objects)) {
+		object = hlist_entry(cookie->backing_objects.first,
+				     struct fscache_object, cookie_link);
+		object->inval_counter++;
+	}
+
+	switch (cookie->stage) {
+	case FSCACHE_COOKIE_STAGE_QUIESCENT:
+	case FSCACHE_COOKIE_STAGE_DEAD:
+	case FSCACHE_COOKIE_STAGE_INITIALISING: /* Assume later checks will catch it */
+	case FSCACHE_COOKIE_STAGE_INVALIDATING: /* is_still_valid will catch it */
+		spin_unlock(&cookie->lock);
+		_leave(" [no %u]", cookie->stage);
+		return;
+
+	case FSCACHE_COOKIE_STAGE_LOOKING_UP:
+		spin_unlock(&cookie->lock);
+		_leave(" [look %x]", object->inval_counter);
+		return;
+
+	case FSCACHE_COOKIE_STAGE_NO_DATA_YET:
+	case FSCACHE_COOKIE_STAGE_ACTIVE:
+		cookie->stage = FSCACHE_COOKIE_STAGE_INVALIDATING;
+		wake_up_var(&cookie->stage);
+
+		atomic_inc(&cookie->n_ops);
+		object->cache->ops->grab_object(object, fscache_obj_get_inval);
+		spin_unlock(&cookie->lock);
+
+		fscache_dispatch(cookie, object, flags, fscache_invalidate_object);
+		_leave(" [inv]");
+		return;
+	}
 }
 EXPORT_SYMBOL(__fscache_invalidate);
 
diff --git a/fs/fscache/io.c b/fs/fscache/io.c
index d38101d77d27..1885cfbe7f04 100644
--- a/fs/fscache/io.c
+++ b/fs/fscache/io.c
@@ -84,6 +84,8 @@ static struct fscache_object *fscache_begin_io_operation(
 		goto not_live;
 
 	object->cache->ops->grab_object(object, fscache_obj_get_ioreq);
+	if (req)
+		req->inval_counter = object->inval_counter;
 
 	atomic_inc(&cookie->n_ops);
 	spin_unlock(&cookie->lock);
diff --git a/fs/fscache/obj.c b/fs/fscache/obj.c
index baab7c465142..a7064a4cb486 100644
--- a/fs/fscache/obj.c
+++ b/fs/fscache/obj.c
@@ -241,32 +241,21 @@ void fscache_lookup_object(struct fscache_cookie *cookie,
 }
 
 /*
- * Invalidate an object
+ * Invalidate an object.  param passes the invalidation flags.
  */
 void fscache_invalidate_object(struct fscache_cookie *cookie,
-			       struct fscache_object *unused, int param)
+			       struct fscache_object *object, int flags)
 {
-	struct fscache_object *object = NULL;
+	bool success;
 
-	spin_lock(&cookie->lock);
-
-	if (!hlist_empty(&cookie->backing_objects)) {
-		object = hlist_entry(cookie->backing_objects.first,
-				     struct fscache_object,
-				     cookie_link);
-		object = object->cache->ops->grab_object(object,
-							 fscache_obj_get_inval);
-	}
-
-	spin_unlock(&cookie->lock);
-
-	if (object) {
-		object->cache->ops->invalidate_object(object);
-		fscache_do_put_object(object, fscache_obj_put_inval);
-	}
+	success = object->cache->ops->invalidate_object(object, flags);
+	fscache_do_put_object(object, fscache_obj_put_inval);
 
-	clear_bit_unlock(FSCACHE_COOKIE_INVALIDATING, &cookie->flags);
-	wake_up_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING);
+	if (success)
+		fscache_set_cookie_stage(cookie, FSCACHE_COOKIE_STAGE_NO_DATA_YET);
+	else
+		fscache_set_cookie_stage(cookie, FSCACHE_COOKIE_STAGE_DEAD);
+	fscache_end_io_operation(cookie);
 }
 
 /*
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 1357c44d371b..da85eb15b3c9 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -120,7 +120,8 @@ struct fscache_cache_ops {
 	void (*update_object)(struct fscache_object *object);
 
 	/* Invalidate an object */
-	void (*invalidate_object)(struct fscache_object *object);
+	bool (*invalidate_object)(struct fscache_object *object,
+				  unsigned int flags);
 
 	/* discard the resources pinned by an object and effect retirement if
 	 * necessary */
@@ -176,10 +177,12 @@ enum fscache_object_stage {
 struct fscache_object {
 	int			debug_id;	/* debugging ID */
 	int			n_children;	/* number of child objects */
+	unsigned int		inval_counter;	/* Number of invalidations applied */
 	enum fscache_object_stage stage;	/* Stage of object's lifecycle */
 	spinlock_t		lock;		/* state and operations lock */
 
 	unsigned long		flags;
+#define FSCACHE_OBJECT_NEEDS_INVAL	8	/* T if object needs invalidation */
 #define FSCACHE_OBJECT_NEEDS_UPDATE	9	/* T if object attrs need writing to disk */
 
 	struct list_head	cache_link;	/* link in cache->object_list */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 0aee6edef672..c313950afd8a 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -57,6 +57,8 @@ enum fscache_cookie_type {
 #define FSCACHE_ADV_WRITE_CACHE		0x00 /* Do cache if written to locally */
 #define FSCACHE_ADV_WRITE_NOCACHE	0x02 /* Don't cache if written to locally */
 
+#define FSCACHE_INVAL_LIGHT		0x01 /* Don't re-invalidate if temp object */
+
 /*
  * fscache cached network filesystem type
  * - name, version and ops must be filled in before registration
@@ -105,7 +107,6 @@ struct fscache_cookie {
 	loff_t				zero_point;	/* Size after which no data on server */
 
 	unsigned long			flags;
-#define FSCACHE_COOKIE_INVALIDATING	4	/* T if cookie is being invalidated */
 #define FSCACHE_COOKIE_ACQUIRED		5	/* T if cookie is in use */
 #define FSCACHE_COOKIE_RELINQUISHED	6	/* T if cookie has been relinquished */
 
@@ -178,6 +179,7 @@ struct fscache_io_request {
 	loff_t			len;		/* Size of the I/O */
 	loff_t			transferred;	/* Amount of data transferred */
 	short			error;		/* 0 or error that occurred */
+	unsigned int		inval_counter;	/* object->inval_counter at begin_op */
 	unsigned long		flags;
 #define FSCACHE_IO_DATA_FROM_SERVER	0	/* Set if data was read from server */
 #define FSCACHE_IO_DATA_FROM_CACHE	1	/* Set if data was read from the cache */
@@ -230,7 +232,7 @@ extern void __fscache_unuse_cookie(struct fscache_cookie *, const void *, const
 extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
 extern void __fscache_update_cookie(struct fscache_cookie *, const void *, const loff_t *);
 extern void __fscache_shape_request(struct fscache_cookie *, struct fscache_request_shape *);
-extern void __fscache_invalidate(struct fscache_cookie *, loff_t);
+extern void __fscache_invalidate(struct fscache_cookie *, const void *, loff_t, unsigned int);
 extern void __fscache_init_io_request(struct fscache_io_request *,
 				      struct fscache_cookie *);
 extern void __fscache_free_io_request(struct fscache_io_request *);
@@ -461,22 +463,23 @@ void fscache_unpin_cookie(struct fscache_cookie *cookie)
 /**
  * fscache_invalidate - Notify cache that an object needs invalidation
  * @cookie: The cookie representing the cache object
+ * @aux_data: The updated auxiliary data for the cookie (may be NULL)
  * @size: The revised size of the object.
+ * @flags: Invalidation flags (FSCACHE_INVAL_*)
  *
  * Notify the cache that an object is needs to be invalidated and that it
  * should abort any retrievals or stores it is doing on the cache.  The object
  * is then marked non-caching until such time as the invalidation is complete.
  *
- * This can be called with spinlocks held.
- *
  * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
-void fscache_invalidate(struct fscache_cookie *cookie, loff_t size)
+void fscache_invalidate(struct fscache_cookie *cookie,
+			const void *aux_data, loff_t size, unsigned int flags)
 {
 	if (fscache_cookie_valid(cookie))
-		__fscache_invalidate(cookie, size);
+		__fscache_invalidate(cookie, aux_data, size, flags);
 }
 
 /**

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 29/32] fscache: Implement "will_modify" parameter on fscache_use_cookie()
  2020-07-13 16:30 ` David Howells
                   ` (28 preceding siblings ...)
  (?)
@ 2020-07-13 16:36 ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:36 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Implement the "will_modify" parameter passed to fscache_use_cookie().

Setting this to true will henceforth cause the affected object to be marked
as dirty on disk, subject to conflict resolution in the event that power
failure or a crash occurs or the filesystem operates in disconnected mode.

The dirty flag is removed when the fscache_object is discarded from memory.

A cache hook is provided to prepare for writing - and this can be used to
mark the object on disk.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c     |   65 +++++++++++++++++++++++++++++++++++++++++
 fs/cachefiles/internal.h      |    2 +
 fs/cachefiles/xattr.c         |   21 +++++++++++++
 fs/fscache/cookie.c           |   17 +++++++++--
 fs/fscache/internal.h         |    1 +
 fs/fscache/obj.c              |   28 +++++++++++++++---
 include/linux/fscache-cache.h |    4 +++
 7 files changed, 129 insertions(+), 9 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 76f3a89d3e6c..c626cc4248a7 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -210,14 +210,78 @@ static void cachefiles_update_object(struct fscache_object *_object)
 	_leave("");
 }
 
+/*
+ * Shorten the backing object to discard any dirty data and free up
+ * any unused granules.
+ */
+static bool cachefiles_shorten_object(struct cachefiles_object *object, loff_t new_size)
+{
+	struct cachefiles_cache *cache;
+	struct inode *inode;
+	struct path path;
+	loff_t i_size;
+
+	cache = container_of(object->fscache.cache,
+			     struct cachefiles_cache, cache);
+	path.mnt = cache->mnt;
+	path.dentry = object->dentry;
+
+	inode = d_inode(object->dentry);
+	trace_cachefiles_trunc(object, inode, i_size_read(inode), new_size);
+	if (vfs_truncate(&path, new_size) < 0) {
+		cachefiles_io_error_obj(object, "Trunc-to-size failed");
+		cachefiles_remove_object_xattr(cache, object->dentry);
+		return false;
+	}
+
+	new_size = round_up(new_size, CACHEFILES_DIO_BLOCK_SIZE);
+	i_size = i_size_read(inode);
+	if (i_size < new_size) {
+		trace_cachefiles_trunc(object, inode, i_size, new_size);
+		if (vfs_truncate(&path, new_size) < 0) {
+			cachefiles_io_error_obj(object, "Trunc-to-dio-size failed");
+			cachefiles_remove_object_xattr(cache, object->dentry);
+			return false;
+		}
+	}
+
+	return true;
+}
+
+/*
+ * Trim excess stored data off of an object.
+ */
+static bool cachefiles_trim_object(struct cachefiles_object *object)
+{
+	loff_t object_size;
+
+	_enter("{OBJ%x}", object->fscache.debug_id);
+
+	object_size = object->fscache.cookie->object_size;
+	if (i_size_read(d_inode(object->dentry)) <= object_size)
+		return true;
+
+	return cachefiles_shorten_object(object, object_size);
+}
+
 /*
  * Commit changes to the object as we drop it.
  */
 static bool cachefiles_commit_object(struct cachefiles_object *object,
 				     struct cachefiles_cache *cache)
 {
+	bool update = false;
+
 	if (object->content_map_changed)
 		cachefiles_save_content_map(object);
+	if (test_and_clear_bit(FSCACHE_OBJECT_LOCAL_WRITE, &object->fscache.flags))
+		update = true;
+	if (test_and_clear_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->fscache.flags))
+		update = true;
+	if (update) {
+		if (cachefiles_trim_object(object))
+			cachefiles_set_object_xattr(object);
+	}
 
 	if (test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags))
 		return cachefiles_commit_tmpfile(cache, object);
@@ -575,5 +639,6 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
 	.shape_request		= cachefiles_shape_request,
 	.read			= cachefiles_read,
 	.write			= cachefiles_write,
+	.prepare_to_write	= cachefiles_prepare_to_write,
 	.display_object		= cachefiles_display_object,
 };
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index ba60fc9dda0a..bfe56eb53104 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -247,7 +247,7 @@ extern int cachefiles_set_object_xattr(struct cachefiles_object *object);
 extern int cachefiles_check_auxdata(struct cachefiles_object *object);
 extern int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
 					  struct dentry *dentry);
-
+extern int cachefiles_prepare_to_write(struct fscache_object *object);
 
 /*
  * error handling
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index 22c56ca2fd0b..456301b7abb0 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -124,6 +124,8 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object)
 	buf->zero_point		= cpu_to_be64(object->fscache.cookie->zero_point);
 	buf->type		= object->fscache.cookie->type;
 	buf->content		= object->content_info;
+	if (test_bit(FSCACHE_OBJECT_LOCAL_WRITE, &object->fscache.flags))
+		buf->content	= CACHEFILES_CONTENT_DIRTY;
 	if (len > 0)
 		memcpy(buf->data, fscache_get_aux(object->fscache.cookie), len);
 
@@ -184,10 +186,16 @@ int cachefiles_check_auxdata(struct cachefiles_object *object)
 		why = cachefiles_coherency_check_aux;
 	} else if (be64_to_cpu(buf->object_size) != object->fscache.cookie->object_size) {
 		why = cachefiles_coherency_check_objsize;
+	} else if (buf->content == CACHEFILES_CONTENT_DIRTY) {
+		// TODO: Begin conflict resolution
+		pr_warn("Dirty object in cache\n");
+		why = cachefiles_coherency_check_dirty;
 	} else {
 		object->fscache.cookie->zero_point = be64_to_cpu(buf->zero_point);
 		object->content_info = buf->content;
 		why = cachefiles_coherency_check_ok;
+		object->fscache.cookie->zero_point = be64_to_cpu(buf->zero_point);
+		object->content_info = buf->content;
 		ret = 0;
 	}
 
@@ -219,3 +227,16 @@ int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
 	_leave(" = %d", ret);
 	return ret;
 }
+
+/*
+ * Stick a marker on the cache object to indicate that it's dirty.
+ */
+int cachefiles_prepare_to_write(struct fscache_object *_object)
+{
+	struct cachefiles_object *object =
+		container_of(_object, struct cachefiles_object, fscache);
+
+	_enter("c=%08x", object->fscache.cookie->debug_id);
+
+	return cachefiles_set_object_xattr(object);
+}
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index fc93f4b69198..22cb8efe261f 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -342,6 +342,8 @@ EXPORT_SYMBOL(__fscache_acquire_cookie);
 void __fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify)
 {
 	enum fscache_cookie_stage stage;
+	struct fscache_object *object;
+	bool write_set;
 
 	_enter("c=%08x", cookie->debug_id);
 
@@ -360,7 +362,7 @@ void __fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify)
 
 		/* The lookup job holds its own active increment */
 		atomic_inc(&cookie->n_active);
-		fscache_dispatch(cookie, NULL, 0, fscache_lookup_object);
+		fscache_dispatch(cookie, NULL, will_modify, fscache_lookup_object);
 		break;
 
 	case FSCACHE_COOKIE_STAGE_INITIALISING:
@@ -373,8 +375,17 @@ void __fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify)
 	case FSCACHE_COOKIE_STAGE_NO_DATA_YET:
 	case FSCACHE_COOKIE_STAGE_ACTIVE:
 	case FSCACHE_COOKIE_STAGE_INVALIDATING:
-		// TODO: Handle will_modify
-		spin_unlock(&cookie->lock);
+		if (will_modify) {
+			object = hlist_entry(cookie->backing_objects.first,
+					     struct fscache_object, cookie_link);
+			write_set = test_and_set_bit(FSCACHE_OBJECT_LOCAL_WRITE,
+						     &object->flags);
+			spin_unlock(&cookie->lock);
+			if (!write_set)
+				fscache_dispatch(cookie, object, 0, fscache_prepare_to_write);
+		} else {
+			spin_unlock(&cookie->lock);
+		}
 		break;
 
 	case FSCACHE_COOKIE_STAGE_DEAD:
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index d9391d3974d1..120bb68f74b1 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -136,6 +136,7 @@ extern void fscache_lookup_object(struct fscache_cookie *, struct fscache_object
 extern void fscache_invalidate_object(struct fscache_cookie *, struct fscache_object *, int);
 extern void fscache_drop_object(struct fscache_cookie *, struct fscache_object *, bool);
 extern void fscache_discard_objects(struct fscache_cookie *, struct fscache_object *, int);
+extern void fscache_prepare_to_write(struct fscache_cookie *, struct fscache_object *, int);
 
 /*
  * object-list.c
diff --git a/fs/fscache/obj.c b/fs/fscache/obj.c
index a7064a4cb486..139b59472628 100644
--- a/fs/fscache/obj.c
+++ b/fs/fscache/obj.c
@@ -117,7 +117,8 @@ static bool fscache_wrangle_object(struct fscache_cookie *cookie,
  * Create an object chain, making sure that the index chain is fully created.
  */
 static struct fscache_object *fscache_lookup_object_chain(struct fscache_cookie *cookie,
-							  struct fscache_cache *cache)
+							  struct fscache_cache *cache,
+							  bool will_modify)
 {
 	struct fscache_object *object = NULL, *parent, *xobject;
 
@@ -131,7 +132,7 @@ static struct fscache_object *fscache_lookup_object_chain(struct fscache_cookie
 	spin_unlock(&cookie->lock);
 
 	/* Recurse to look up/create the parent index. */
-	parent = fscache_lookup_object_chain(cookie->parent, cache);
+	parent = fscache_lookup_object_chain(cookie->parent, cache, false);
 	if (IS_ERR(parent))
 		goto error;
 
@@ -146,6 +147,9 @@ static struct fscache_object *fscache_lookup_object_chain(struct fscache_cookie
 	if (!object)
 		goto error;
 
+	if (will_modify)
+		__set_bit(FSCACHE_OBJECT_LOCAL_WRITE, &object->flags);
+
 	xobject = fscache_attach_object(cookie, object);
 	if (xobject != object) {
 		fscache_do_put_object(object, fscache_obj_put_alloc_dup);
@@ -203,7 +207,8 @@ static struct fscache_object *fscache_lookup_object_chain(struct fscache_cookie
  * - this must make sure the index chain is instantiated and instantiate the
  *   object representation too
  */
-static void fscache_lookup_object_locked(struct fscache_cookie *cookie)
+static void fscache_lookup_object_locked(struct fscache_cookie *cookie,
+					 bool will_modify)
 {
 	struct fscache_object *object;
 	struct fscache_cache *cache;
@@ -221,12 +226,16 @@ static void fscache_lookup_object_locked(struct fscache_cookie *cookie)
 
 	_debug("cache %s", cache->tag->name);
 
-	object = fscache_lookup_object_chain(cookie, cache);
+	object = fscache_lookup_object_chain(cookie, cache, will_modify);
 	if (!object) {
 		_leave(" [fail]");
 		return;
 	}
 
+	if (will_modify &&
+	    test_and_set_bit(FSCACHE_OBJECT_LOCAL_WRITE, &object->flags))
+		fscache_prepare_to_write(cookie, object, 0);
+
 	fscache_do_put_object(object, fscache_obj_put);
 	_leave(" [done]");
 }
@@ -235,7 +244,7 @@ void fscache_lookup_object(struct fscache_cookie *cookie,
 			   struct fscache_object *object, int param)
 {
 	down_read(&fscache_addremove_sem);
-	fscache_lookup_object_locked(cookie);
+	fscache_lookup_object_locked(cookie, param);
 	up_read(&fscache_addremove_sem);
 	__fscache_unuse_cookie(cookie, NULL, NULL);
 }
@@ -339,3 +348,12 @@ void fscache_discard_objects(struct fscache_cookie *cookie,
 	up_read(&fscache_addremove_sem);
 	_leave("");
 }
+
+/*
+ * Prepare a cache object to be written to.
+ */
+void fscache_prepare_to_write(struct fscache_cookie *cookie,
+			      struct fscache_object *object, int param)
+{
+	object->cache->ops->prepare_to_write(object);
+}
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index da85eb15b3c9..3625fd431d9f 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -154,6 +154,9 @@ struct fscache_cache_ops {
 		     struct fscache_io_request *req,
 		     struct iov_iter *iter);
 
+	/* Prepare to write to a live cache object */
+	int (*prepare_to_write)(struct fscache_object *object);
+
 	/* Display object info in /proc/fs/fscache/objects */
 	int (*display_object)(struct seq_file *m, struct fscache_object *object);
 };
@@ -182,6 +185,7 @@ struct fscache_object {
 	spinlock_t		lock;		/* state and operations lock */
 
 	unsigned long		flags;
+#define FSCACHE_OBJECT_LOCAL_WRITE	1	/* T if the object is being modified locally */
 #define FSCACHE_OBJECT_NEEDS_INVAL	8	/* T if object needs invalidation */
 #define FSCACHE_OBJECT_NEEDS_UPDATE	9	/* T if object attrs need writing to disk */
 



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 30/32] fscache: Provide resize operation
  2020-07-13 16:30 ` David Howells
                   ` (29 preceding siblings ...)
  (?)
@ 2020-07-13 16:36 ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:36 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Provide a cache operation to resize an object.  This is intended to be run
synchronously rather than being deferred as it really needs to run inside
the inode lock on the netfs inode from ->setattr() to correctly order with
respect to other truncates and writes.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c     |   24 ++++++++++++++++++++++++
 fs/fscache/internal.h         |    3 +++
 fs/fscache/io.c               |   27 +++++++++++++++++++++++++++
 fs/fscache/stats.c            |    9 +++++++--
 include/linux/fscache-cache.h |    2 ++
 include/linux/fscache.h       |   18 ++++++++++++++++++
 6 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index c626cc4248a7..d4172a40ddc9 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -248,6 +248,29 @@ static bool cachefiles_shorten_object(struct cachefiles_object *object, loff_t n
 	return true;
 }
 
+/*
+ * Resize the backing object.
+ */
+static void cachefiles_resize_object(struct fscache_object *_object, loff_t new_size)
+{
+	struct cachefiles_object *object =
+		container_of(_object, struct cachefiles_object, fscache);
+	loff_t old_size = object->fscache.cookie->object_size;
+
+	_enter("%llu->%llu", old_size, new_size);
+
+	if (new_size < old_size) {
+		cachefiles_shorten_content_map(object, new_size);
+		cachefiles_shorten_object(object, new_size);
+		return;
+	}
+
+	/* The file is being expanded.  We don't need to do anything
+	 * particularly.  cookie->initial_size doesn't change and so the point
+	 * at which we have to download before doesn't change.
+	 */
+}
+
 /*
  * Trim excess stored data off of an object.
  */
@@ -631,6 +654,7 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
 	.free_lookup_data	= cachefiles_free_lookup_data,
 	.grab_object		= cachefiles_grab_object,
 	.update_object		= cachefiles_update_object,
+	.resize_object		= cachefiles_resize_object,
 	.invalidate_object	= cachefiles_invalidate_object,
 	.drop_object		= cachefiles_drop_object,
 	.put_object		= cachefiles_put_object,
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 120bb68f74b1..eb61e0716e20 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -178,6 +178,9 @@ extern atomic_t fscache_n_updates;
 extern atomic_t fscache_n_updates_null;
 extern atomic_t fscache_n_updates_run;
 
+extern atomic_t fscache_n_resizes;
+extern atomic_t fscache_n_resizes_null;
+
 extern atomic_t fscache_n_relinquishes;
 extern atomic_t fscache_n_relinquishes_null;
 extern atomic_t fscache_n_relinquishes_retire;
diff --git a/fs/fscache/io.c b/fs/fscache/io.c
index 1885cfbe7f04..1a074f9c4bbe 100644
--- a/fs/fscache/io.c
+++ b/fs/fscache/io.c
@@ -172,3 +172,30 @@ int __fscache_write(struct fscache_io_request *req, struct iov_iter *iter)
 	}
 }
 EXPORT_SYMBOL(__fscache_write);
+
+/*
+ * Change the size of a backing object.
+ */
+void __fscache_resize_cookie(struct fscache_cookie *cookie, loff_t new_size)
+{
+	struct fscache_object *object;
+
+	ASSERT(cookie->type != FSCACHE_COOKIE_TYPE_INDEX);
+
+	object = fscache_begin_io_operation(cookie, FSCACHE_WANT_WRITE, NULL);
+	if (!IS_ERR(object)) {
+		fscache_stat(&fscache_n_resizes);
+		set_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->flags);
+
+		/* We cannot defer a resize as we need to do it inside the
+		 * netfs's inode lock so that we're serialised with respect to
+		 * writes.
+		 */
+		object->cache->ops->resize_object(object, new_size);
+		object->cache->ops->put_object(object, fscache_obj_put_ioreq);
+		fscache_end_io_operation(cookie);
+	} else {
+		fscache_stat(&fscache_n_resizes_null);
+	}
+}
+EXPORT_SYMBOL(__fscache_resize_cookie);
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 63fb4d831f4d..33cea7f527db 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -26,6 +26,9 @@ atomic_t fscache_n_updates;
 atomic_t fscache_n_updates_null;
 atomic_t fscache_n_updates_run;
 
+atomic_t fscache_n_resizes;
+atomic_t fscache_n_resizes_null;
+
 atomic_t fscache_n_relinquishes;
 atomic_t fscache_n_relinquishes_null;
 atomic_t fscache_n_relinquishes_retire;
@@ -132,10 +135,12 @@ int fscache_stats_show(struct seq_file *m, void *v)
 	seq_printf(m, "Invals : n=%u\n",
 		   atomic_read(&fscache_n_invalidates));
 
-	seq_printf(m, "Updates: n=%u nul=%u run=%u\n",
+	seq_printf(m, "Updates: n=%u nul=%u run=%u rsz=%u rsn=%u\n",
 		   atomic_read(&fscache_n_updates),
 		   atomic_read(&fscache_n_updates_null),
-		   atomic_read(&fscache_n_updates_run));
+		   atomic_read(&fscache_n_updates_run),
+		   atomic_read(&fscache_n_resizes),
+		   atomic_read(&fscache_n_resizes_null));
 
 	seq_printf(m, "Relinqs: n=%u nul=%u rtr=%u\n",
 		   atomic_read(&fscache_n_relinquishes),
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 3625fd431d9f..ba0ad89a968e 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -118,6 +118,8 @@ struct fscache_cache_ops {
 
 	/* store the updated auxiliary data on an object */
 	void (*update_object)(struct fscache_object *object);
+	/* Change the size of a data object */
+	void (*resize_object)(struct fscache_object *object, loff_t new_size);
 
 	/* Invalidate an object */
 	bool (*invalidate_object)(struct fscache_object *object,
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index c313950afd8a..cd8b6dc81c52 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -232,6 +232,7 @@ extern void __fscache_unuse_cookie(struct fscache_cookie *, const void *, const
 extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
 extern void __fscache_update_cookie(struct fscache_cookie *, const void *, const loff_t *);
 extern void __fscache_shape_request(struct fscache_cookie *, struct fscache_request_shape *);
+extern void __fscache_resize_cookie(struct fscache_cookie *, loff_t);
 extern void __fscache_invalidate(struct fscache_cookie *, const void *, loff_t, unsigned int);
 extern void __fscache_init_io_request(struct fscache_io_request *,
 				      struct fscache_cookie *);
@@ -431,6 +432,23 @@ void fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data,
 		__fscache_update_cookie(cookie, aux_data, object_size);
 }
 
+/**
+ * fscache_resize_cookie - Request that a cache object be resized
+ * @cookie: The cookie representing the cache object
+ * @new_size: The new size of the object (may be NULL)
+ *
+ * Request that the size of an object be changed.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+void fscache_resize_cookie(struct fscache_cookie *cookie, loff_t new_size)
+{
+	if (fscache_cookie_valid(cookie))
+		__fscache_resize_cookie(cookie, new_size);
+}
+
 /**
  * fscache_pin_cookie - Pin a data-storage cache object in its cache
  * @cookie: The cookie representing the cache object



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 31/32] fscache: Remove the update operation
  2020-07-13 16:30 ` David Howells
                   ` (30 preceding siblings ...)
  (?)
@ 2020-07-13 16:36 ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:36 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Remove the cache-side of the object update operation as it doesn't
serialise with other setattr, O_TRUNC and write operations.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c     |   59 -----------------------------------------
 fs/fscache/internal.h         |    1 -
 fs/fscache/obj.c              |   14 ----------
 fs/fscache/stats.c            |    4 +--
 include/linux/fscache-cache.h |    2 -
 5 files changed, 1 insertion(+), 79 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index d4172a40ddc9..21a06dd575ca 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -152,64 +152,6 @@ struct fscache_object *cachefiles_grab_object(struct fscache_object *_object,
 	return &object->fscache;
 }
 
-/*
- * update the auxiliary data for an object object on disk
- */
-static void cachefiles_update_object(struct fscache_object *_object)
-{
-	struct cachefiles_object *object;
-	struct cachefiles_cache *cache;
-	const struct cred *saved_cred;
-	struct inode *inode;
-	loff_t object_size, i_size;
-	int ret;
-
-	_enter("{OBJ%x}", _object->debug_id);
-
-	object = container_of(_object, struct cachefiles_object, fscache);
-	cache = container_of(object->fscache.cache, struct cachefiles_cache,
-			     cache);
-
-	cachefiles_begin_secure(cache, &saved_cred);
-
-	object_size = object->fscache.cookie->object_size;
-	inode = d_inode(object->dentry);
-	i_size = i_size_read(inode);
-	if (i_size > object_size) {
-		struct path path = {
-			.mnt	= cache->mnt,
-			.dentry	= object->dentry
-		};
-		_debug("trunc %llx -> %llx", i_size, object_size);
-		trace_cachefiles_trunc(object, inode, i_size, object_size);
-		ret = vfs_truncate(&path, object_size);
-		if (ret < 0) {
-			cachefiles_io_error_obj(object, "Trunc-to-size failed");
-			cachefiles_remove_object_xattr(cache, object->dentry);
-			goto out;
-		}
-
-		object_size = round_up(object_size, CACHEFILES_DIO_BLOCK_SIZE);
-		i_size = i_size_read(inode);
-		_debug("trunc %llx -> %llx", i_size, object_size);
-		if (i_size < object_size) {
-			trace_cachefiles_trunc(object, inode, i_size, object_size);
-			ret = vfs_truncate(&path, object_size);
-			if (ret < 0) {
-				cachefiles_io_error_obj(object, "Trunc-to-dio-size failed");
-				cachefiles_remove_object_xattr(cache, object->dentry);
-				goto out;
-			}
-		}
-	}
-
-	cachefiles_set_object_xattr(object);
-
-out:
-	cachefiles_end_secure(cache, saved_cred);
-	_leave("");
-}
-
 /*
  * Shorten the backing object to discard any dirty data and free up
  * any unused granules.
@@ -653,7 +595,6 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
 	.lookup_object		= cachefiles_lookup_object,
 	.free_lookup_data	= cachefiles_free_lookup_data,
 	.grab_object		= cachefiles_grab_object,
-	.update_object		= cachefiles_update_object,
 	.resize_object		= cachefiles_resize_object,
 	.invalidate_object	= cachefiles_invalidate_object,
 	.drop_object		= cachefiles_drop_object,
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index eb61e0716e20..ca74b0090e15 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -202,7 +202,6 @@ extern atomic_t fscache_n_cop_alloc_object;
 extern atomic_t fscache_n_cop_lookup_object;
 extern atomic_t fscache_n_cop_create_object;
 extern atomic_t fscache_n_cop_invalidate_object;
-extern atomic_t fscache_n_cop_update_object;
 extern atomic_t fscache_n_cop_drop_object;
 extern atomic_t fscache_n_cop_put_object;
 extern atomic_t fscache_n_cop_sync_cache;
diff --git a/fs/fscache/obj.c b/fs/fscache/obj.c
index 139b59472628..a2306f32044c 100644
--- a/fs/fscache/obj.c
+++ b/fs/fscache/obj.c
@@ -54,14 +54,6 @@ static int fscache_do_create_object(struct fscache_object *object, void *data)
 	return ret;
 }
 
-static void fscache_do_update_object(struct fscache_object *object)
-{
-	fscache_stat(&fscache_n_updates_run);
-	fscache_stat(&fscache_n_cop_update_object);
-	object->cache->ops->update_object(object);
-	fscache_stat_d(&fscache_n_cop_update_object);
-}
-
 static void fscache_do_drop_object(struct fscache_cache *cache,
 				   struct fscache_object *object,
 				   bool invalidate)
@@ -282,12 +274,6 @@ void fscache_drop_object(struct fscache_cookie *cookie,
 	_enter("{OBJ%x,%d},%u",
 	       object->debug_id, object->n_children, invalidate);
 
-	if (!invalidate &&
-	    test_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->flags)) {
-		_debug("final update");
-		fscache_do_update_object(object);
-	}
-
 	spin_lock(&cache->object_list_lock);
 	list_del_init(&object->cache_link);
 	spin_unlock(&cache->object_list_lock);
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 33cea7f527db..f35f22f9a7f3 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -50,7 +50,6 @@ atomic_t fscache_n_cop_alloc_object;
 atomic_t fscache_n_cop_lookup_object;
 atomic_t fscache_n_cop_create_object;
 atomic_t fscache_n_cop_invalidate_object;
-atomic_t fscache_n_cop_update_object;
 atomic_t fscache_n_cop_drop_object;
 atomic_t fscache_n_cop_put_object;
 atomic_t fscache_n_cop_sync_cache;
@@ -150,9 +149,8 @@ int fscache_stats_show(struct seq_file *m, void *v)
 	seq_printf(m, "CacheOp: alo=%d luo=%d\n",
 		   atomic_read(&fscache_n_cop_alloc_object),
 		   atomic_read(&fscache_n_cop_lookup_object));
-	seq_printf(m, "CacheOp: inv=%d upo=%d dro=%d pto=%d atc=%d syn=%d\n",
+	seq_printf(m, "CacheOp: inv=%d dro=%d pto=%d atc=%d syn=%d\n",
 		   atomic_read(&fscache_n_cop_invalidate_object),
-		   atomic_read(&fscache_n_cop_update_object),
 		   atomic_read(&fscache_n_cop_drop_object),
 		   atomic_read(&fscache_n_cop_put_object),
 		   atomic_read(&fscache_n_cop_attr_changed),
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index ba0ad89a968e..14ee82de2d79 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -116,8 +116,6 @@ struct fscache_cache_ops {
 	/* unpin an object in the cache */
 	void (*unpin_object)(struct fscache_object *object);
 
-	/* store the updated auxiliary data on an object */
-	void (*update_object)(struct fscache_object *object);
 	/* Change the size of a data object */
 	void (*resize_object)(struct fscache_object *object, loff_t new_size);
 



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 32/32] cachefiles: Shape write requests
@ 2020-07-13 16:36   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:36 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

In cachefiles_shape_extent(), shape a write request to always write to the
cache.  The assumption is made that the caller has read the entire cache
granule beforehand if necessary.

Possibly this should be amended so that writes will only take place to
granules that are marked present and granules that lie beyond the EOF.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/content-map.c |   21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index 3e310fd58497..592fc426500b 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -69,7 +69,8 @@ static void cachefiles_shape_single(struct fscache_object *obj,
 
 	shape->dio_block_size = CACHEFILES_DIO_BLOCK_SIZE;
 
-	if (object->content_info == CACHEFILES_CONTENT_SINGLE) {
+	if (!shape->for_write &&
+	    object->content_info == CACHEFILES_CONTENT_SINGLE) {
 		shape->to_be_done = FSCACHE_READ_FROM_CACHE;
 	} else {
 		eof = (shape->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
@@ -127,14 +128,20 @@ void cachefiles_shape_request(struct fscache_object *obj,
 	if (end - start > max_pages)
 		end = start + max_pages;
 
-	/* If the content map didn't get expanded for some reason - simply
-	 * ignore this granule.
-	 */
 	granule = start / CACHEFILES_GRAN_PAGES;
-	if (granule / 8 >= object->content_map_size)
-		return;
+	if (granule / 8 >= object->content_map_size) {
+		cachefiles_expand_content_map(object, shape->i_size);
+		if (granule / 8 >= object->content_map_size)
+			return;
+	}
 
-	if (cachefiles_granule_is_present(object, granule)) {
+	if (shape->for_write) {
+		/* Assume that the preparation to write involved preloading any
+		 * bits of the cache that weren't to be written and filling any
+		 * gaps that didn't end up being written.
+		 */
+		shape->to_be_done = FSCACHE_WRITE_TO_CACHE;
+	} else if (cachefiles_granule_is_present(object, granule)) {
 		/* The start of the requested extent is present in the cache -
 		 * restrict the returned extent to the maximum length of what's
 		 * available.



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 32/32] cachefiles: Shape write requests
@ 2020-07-13 16:36   ` David Howells
  0 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-13 16:36 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: Jeff Layton, Dave Wysochanski, dhowells-H+wXaHxf7aLQT0dZR+AlfA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA,
	linux-afs-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	v9fs-developer-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

In cachefiles_shape_extent(), shape a write request to always write to the
cache.  The assumption is made that the caller has read the entire cache
granule beforehand if necessary.

Possibly this should be amended so that writes will only take place to
granules that are marked present and granules that lie beyond the EOF.

Signed-off-by: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---

 fs/cachefiles/content-map.c |   21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index 3e310fd58497..592fc426500b 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -69,7 +69,8 @@ static void cachefiles_shape_single(struct fscache_object *obj,
 
 	shape->dio_block_size = CACHEFILES_DIO_BLOCK_SIZE;
 
-	if (object->content_info == CACHEFILES_CONTENT_SINGLE) {
+	if (!shape->for_write &&
+	    object->content_info == CACHEFILES_CONTENT_SINGLE) {
 		shape->to_be_done = FSCACHE_READ_FROM_CACHE;
 	} else {
 		eof = (shape->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
@@ -127,14 +128,20 @@ void cachefiles_shape_request(struct fscache_object *obj,
 	if (end - start > max_pages)
 		end = start + max_pages;
 
-	/* If the content map didn't get expanded for some reason - simply
-	 * ignore this granule.
-	 */
 	granule = start / CACHEFILES_GRAN_PAGES;
-	if (granule / 8 >= object->content_map_size)
-		return;
+	if (granule / 8 >= object->content_map_size) {
+		cachefiles_expand_content_map(object, shape->i_size);
+		if (granule / 8 >= object->content_map_size)
+			return;
+	}
 
-	if (cachefiles_granule_is_present(object, granule)) {
+	if (shape->for_write) {
+		/* Assume that the preparation to write involved preloading any
+		 * bits of the cache that weren't to be written and filling any
+		 * gaps that didn't end up being written.
+		 */
+		shape->to_be_done = FSCACHE_WRITE_TO_CACHE;
+	} else if (cachefiles_granule_is_present(object, granule)) {
 		/* The start of the requested extent is present in the cache -
 		 * restrict the returned extent to the maximum length of what's
 		 * available.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/32] iov_iter: Add ITER_MAPPING
  2020-07-13 16:30   ` David Howells
  (?)
@ 2020-07-19  1:44   ` Al Viro
  -1 siblings, 0 replies; 55+ messages in thread
From: Al Viro @ 2020-07-19  1:44 UTC (permalink / raw)
  To: David Howells
  Cc: Trond Myklebust, Anna Schumaker, Steve French, Matthew Wilcox,
	Jeff Layton, Dave Wysochanski, linux-cachefs, linux-afs,
	linux-nfs, linux-cifs, ceph-devel, v9fs-developer, linux-fsdevel,
	linux-kernel

On Mon, Jul 13, 2020 at 05:30:52PM +0100, David Howells wrote:
> Add an iterator, ITER_MAPPING, that walks through a set of pages attached
> to an address_space, starting at a given page and offset and walking for
> the specified amount of bytes.
> 
> The caller must guarantee that the pages are all present and they must be
> locked using PG_locked, PG_writeback or PG_fscache to prevent them from
> going away or being migrated whilst they're being accessed.
> 
> This is useful for copying data from socket buffers to inodes in network
> filesystems and for transferring data between those inodes and the cache
> using direct I/O.
> 
> Whilst it is true that ITER_BVEC could be used instead, that would require
> a bio_vec array to be allocated to refer to all the pages - which should be
> redundant if inode->i_pages also points to all these pages.
> 
> This could also be turned into an ITER_XARRAY, taking and xarray pointer
> instead of a mapping pointer.  It would be mostly trivial, except for the
> use of find_get_pages_contig() by iov_iter_get_pages*().
> 

My main problem here is that your iterate_mapping() assumes that STEP is
safe under rcu_read_lock(), with no visible mentioning of that fact.
Note, BTW, that iov_iter_for_each_range() quietly calls user-supplied
callback in such context.

Incidentally, do you ever have different steps for bvec and mapping?

> +	if (unlikely(iov_iter_is_mapping(i))) {
> +		/* We really don't want to fetch pages if we can avoid it */
> +		i->iov_offset += size;
> +		i->count -= size;
> +		return;

That's... not nice.  At the very least you want to cap size by i->count here
(and for discard case as well, while we are at it).

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/32] iov_iter: Add ITER_MAPPING
  2020-07-13 16:30   ` David Howells
  (?)
  (?)
@ 2020-07-19  9:51   ` David Howells
  -1 siblings, 0 replies; 55+ messages in thread
From: David Howells @ 2020-07-19  9:51 UTC (permalink / raw)
  To: Al Viro
  Cc: dhowells, Trond Myklebust, Anna Schumaker, Steve French,
	Matthew Wilcox, Jeff Layton, Dave Wysochanski, linux-cachefs,
	linux-afs, linux-nfs, linux-cifs, ceph-devel, v9fs-developer,
	linux-fsdevel, linux-kernel

Al Viro <viro@zeniv.linux.org.uk> wrote:

> My main problem here is that your iterate_mapping() assumes that STEP is
> safe under rcu_read_lock(), with no visible mentioning of that fact.

Yeah, that's probably the biggest objection to this.

> Note, BTW, that iov_iter_for_each_range() quietly calls user-supplied
> callback in such context.

And calls kmap(), but should probably use kmap_atomic().  git grep doesn't
show any users of this, so can it be removed?

> Incidentally, do you ever have different steps for bvec and mapping?

Yes:

	csum_and_copy_from_iter_full()
	iov_iter_npages()
	iov_iter_get_pages()
	iov_iter_get_pages_alloc()

But I've tried to use the internal representation struct for bvec where I can
rather than inventing a new one.

David


^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2020-07-19  9:52 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-13 16:30 [PATCH 00/32] fscache: Rewrite 2: Make the I/O interface use kiocb/iov_iter David Howells
2020-07-13 16:30 ` David Howells
2020-07-13 16:30 ` [PATCH 01/32] iov_iter: Add ITER_MAPPING David Howells
2020-07-13 16:30   ` David Howells
2020-07-19  1:44   ` Al Viro
2020-07-19  9:51   ` David Howells
2020-07-13 16:31 ` [PATCH 02/32] vm: Add wait/unlock functions for PG_fscache David Howells
2020-07-13 16:31   ` David Howells
2020-07-13 16:31 ` [PATCH 03/32] vfs: Export rw_verify_area() for use by cachefiles David Howells
2020-07-13 16:31 ` [PATCH 04/32] vfs: Provide S_CACHE_FILE inode flag David Howells
2020-07-13 16:31   ` David Howells
2020-07-13 16:31 ` [PATCH 05/32] mm: Provide lru_to_last_page() to get last of a page list David Howells
2020-07-13 16:31 ` [PATCH 06/32] cachefiles: Remove tree of active files and use S_CACHE_FILE inode flag David Howells
2020-07-13 16:31   ` David Howells
2020-07-13 16:31 ` [PATCH 07/32] fscache: Provide a simple thread pool for running ops asynchronously David Howells
2020-07-13 16:32 ` [PATCH 08/32] fscache: Replace the object management state machine David Howells
2020-07-13 16:32   ` David Howells
2020-07-13 16:32 ` [PATCH 09/32] fscache: Rewrite the I/O API based on iov_iter David Howells
2020-07-13 16:32 ` [PATCH 10/32] fscache: Remove fscache_wait_on_invalidate() David Howells
2020-07-13 16:32   ` David Howells
2020-07-13 16:32 ` [PATCH 11/32] fscache: Keep track of size of a file last set independently on the server David Howells
2020-07-13 16:32   ` David Howells
2020-07-13 16:32 ` [PATCH 12/32] fscache, cachefiles: Fix disabled histogram warnings David Howells
2020-07-13 16:32   ` David Howells
2020-07-13 16:33 ` [PATCH 13/32] fscache: Recast assertion in terms of cookie not being an index David Howells
2020-07-13 16:33 ` [PATCH 14/32] cachefiles: Remove some redundant checks on unsigned values David Howells
2020-07-13 16:33   ` David Howells
2020-07-13 16:33 ` [PATCH 15/32] cachefiles: trace: Log coherency checks David Howells
2020-07-13 16:33   ` David Howells
2020-07-13 16:33 ` [PATCH 16/32] cachefiles: Split cachefiles_drop_object() up a bit David Howells
2020-07-13 16:33   ` David Howells
2020-07-13 16:33 ` [PATCH 17/32] cachefiles: Implement new fscache I/O backend API David Howells
2020-07-13 16:33 ` [PATCH 18/32] cachefiles: Merge object->backer into object->dentry David Howells
2020-07-13 16:34 ` [PATCH 19/32] cachefiles: Implement a content-present indicator and bitmap David Howells
2020-07-13 16:34   ` David Howells
2020-07-13 16:34 ` [PATCH 20/32] cachefiles: Implement extent shaper David Howells
2020-07-13 16:34 ` [PATCH 21/32] cachefiles: Round the cachefile size up to DIO block size David Howells
2020-07-13 16:34 ` [PATCH 22/32] cachefiles: Implement read and write parts of new I/O API David Howells
2020-07-13 16:34   ` David Howells
2020-07-13 16:34 ` [PATCH 23/32] cachefiles: Add I/O tracepoints David Howells
2020-07-13 16:34   ` David Howells
2020-07-13 16:35 ` [PATCH 24/32] fscache: Add read helper David Howells
2020-07-13 16:35   ` David Howells
2020-07-13 16:35 ` [PATCH 25/32] fscache: Display cache-specific data in /proc/fs/fscache/objects David Howells
2020-07-13 16:35   ` David Howells
2020-07-13 16:35 ` [PATCH 26/32] fscache: Remove more obsolete stats David Howells
2020-07-13 16:35   ` David Howells
2020-07-13 16:35 ` [PATCH 27/32] fscache: New stats David Howells
2020-07-13 16:35 ` [PATCH 28/32] fscache, cachefiles: Rewrite invalidation David Howells
2020-07-13 16:35   ` David Howells
2020-07-13 16:36 ` [PATCH 29/32] fscache: Implement "will_modify" parameter on fscache_use_cookie() David Howells
2020-07-13 16:36 ` [PATCH 30/32] fscache: Provide resize operation David Howells
2020-07-13 16:36 ` [PATCH 31/32] fscache: Remove the update operation David Howells
2020-07-13 16:36 ` [PATCH 32/32] cachefiles: Shape write requests David Howells
2020-07-13 16:36   ` David Howells

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.