* [RFC PATCH 00/76] fscache: Modernisation
@ 2020-11-20 15:02 David Howells
  2020-11-20 15:02 ` [RFC PATCH 01/76] nfs, cifs, ceph, 9p: Disable use of fscache prior to its rewrite David Howells
                   ` (74 more replies)
  0 siblings, 75 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:02 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: linux-cifs, linux-afs, v9fs-developer, Matthew Wilcox (Oracle),
	Dave Wysochanski, Jeff Layton, Trond Myklebust, Ilya Dryomov,
	ceph-devel, Latchesar Ionkov, Matthew Wilcox, linux-nfs,
	Eric Van Hensbergen, dhowells, Jeff Layton, Matthew Wilcox,
	Alexander Viro, linux-cachefs, linux-afs, linux-nfs, linux-cifs,
	ceph-devel, v9fs-developer, linux-fsdevel, linux-kernel


Here's a set of patches that modernises the fscache API by switching it to
use the kiocb interface to do async DIO to/from the cache rather than
snooping the pagecache of the backing filesystem - something that doesn't
seem totally reliable.  As a result, the cache uses a lot less memory than
it did and is faster than it was.

A netfs helper library is provided that handles most of the readpage,
readahead and write_begin ops on behalf of the filesystem.  This is
intended to be used unconditionally by the filesystem and provides a common
framework for doing caching, transparent huge pages (which works with AFS)
and, in the future, possibly fscrypt.  It also allows the netfs and the
cache to align, expand and slice up a read request from the VM in various
ways; the netfs need only provide a function to read a stretch of data into
the pagecache and the helper takes care of the rest.
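
To give a rough idea of the shape of this, a filesystem might wire the
helpers in along the following lines.  This is a sketch only: the ops-table
fields and helper prototypes here are assumptions based on the description
above, not necessarily the final API.

    /* Sketch: ops-table fields and helper prototypes are assumptions. */
    #include <linux/netfs.h>

    static void my_issue_read(struct netfs_read_subrequest *subreq)
    {
            /* Read a slice of the request from the server, then tell
             * the helper library how it went.
             */
    }

    static const struct netfs_read_request_ops my_netfs_ops = {
            .issue_op       = my_issue_read,
    };

    static int my_readpage(struct file *file, struct page *page)
    {
            return netfs_readpage(file, page, &my_netfs_ops, NULL);
    }

    static void my_readahead(struct readahead_control *ractl)
    {
            netfs_readahead(ractl, &my_netfs_ops, NULL);
    }

    const struct address_space_operations my_aops = {
            .readpage       = my_readpage,
            .readahead      = my_readahead,
    };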

This patch series ports the AFS filesystem to the new API and disables
caching in NFS, Ceph, CIFS and 9P.  Jeff Layton has a port for Ceph that is
solid and Dave Wysochanski has one for NFS that is mostly solid.  I've
given partial ports to 9P and CIFS to Dominique Martinet and Steve French
respectively.

-~-

To this end, the fscache API has been massively overhauled.  Code that was
using the old API is now disabled.  It's not practical to have the two APIs
coexist because they do things in totally different ways, yet they would
have to share common data.

The following parts have been removed:

    - The object state machine
    - The I/O operation manager
    - All non-transient references from fscache to the netfs's data
    - All non-transient callbacks from fscache to the netfs
    - The backing page I/O monitoring
    - The tracking of netfs pages that fscache knows about
    - The tracking of netfs pages that need writing to the cache
    - The use of bmap to work out if a page is stored in the cache
    - The copying of data between backing pages and netfs pages.

Instead, the I/O to the cache is driven much more from the netfs and the
netfs helpers.  There are a number of aspects to the I/O API:

 (1) The lowest-level I/O primitives just take an iov_iter and start async
     DIO on the cache objects (see the sketch after this list).  The caller
     gets a callback upon completion.  The PG_fscache bit is now just used
     to indicate that a write to the cache is in progress.  The cache keeps
     track, in xattrs, of which areas of the backing file are occupied.

     - fscache_begin_operation(), fscache_end_operation()
     - fscache_read() and fscache_write().

 (2) The cookie that's obtained when an inode is set up must be 'used' when a
     file is opened (with an indication as to whether it might be modified)
     and 'unused' when it is done with.  At the point of unuse, the auxdata
     and file size can be specified.

     - fscache_use_cookie(), fscache_unuse_cookie()

 (3) The cookie can be invalidated at any time, and new auxiliary data and a
     new size provided.  If there's I/O in progress, either the new I/O will
     wait for it to finish, or a replacement tmpfile will be created and the
     in-progress I/O simply abandoned.  The on-disk auxdata (in xattrs, say)
     is updated lazily.

     - fscache_invalidate()

 (4) Netfs helpers for read are provided that combine (1) and (2) above and
     reissue reads to the network when the data isn't present in the cache
     or the cache fails.  This requires an operation descriptor to be
     allocated and given a set of operations.  The helpers need to be used
     for ->readpage(), ->readahead() and prefetching for ->write_begin();
     ->readpages() is obsolete and soon to go away.

     - include/linux/netfs.h
     - netfs_readpage(), netfs_readahead(), netfs_begin_write()

 (5) There are some helpers to keep a cookie in use once a file has been
     closed so that writeback can write to the cache.  ->write_inode() and
     inode eviction need to clear this.

     - fscache_set_page_dirty(), fscache_unpin_writeback()
     - fscache_write_to_cache()
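
As a very rough sketch of how these pieces might fit together in a netfs -
hypothetical only: every type, helper and parameter list below is an
assumption drawn from the descriptions above, not the actual API in these
patches:

    #include <linux/fs.h>
    #include <linux/fscache.h>
    #include <linux/uio.h>

    struct my_op { int placeholder; };      /* hypothetical op descriptor */
    struct my_aux { u64 data_version; };    /* hypothetical auxdata */

    static void my_read_done(void *priv, ssize_t transferred_or_error)
    {
            /* (1) Completion callback for the async DIO from the cache. */
    }

    static int my_example(struct inode *inode, struct fscache_cookie *cookie,
                          loff_t start, size_t len)
    {
            struct my_op op;
            struct my_aux aux = { .data_version = 1 };
            struct iov_iter iter;
            loff_t size = i_size_read(inode);
            int ret;

            /* (2) Pin the cookie at open time, hinting that the data
             * might be modified.
             */
            fscache_use_cookie(cookie, true);

            /* (1) Aim an iterator at the pagecache and start async DIO
             * from the cache; completion arrives via my_read_done().
             */
            iov_iter_xarray(&iter, READ, &inode->i_mapping->i_pages,
                            start, len);
            ret = fscache_begin_operation(&op, cookie);
            if (ret == 0) {
                    ret = fscache_read(&op, start, &iter,
                                       my_read_done, NULL);
                    fscache_end_operation(&op);
            }

            /* (3) If a third party changed the file, junk the cached data
             * and supply fresh coherency info; on-disk xattrs are updated
             * lazily.
             */
            fscache_invalidate(cookie, &aux, size, 0);

            /* (2) Unpin at final close, handing over auxdata and size. */
            fscache_unuse_cookie(cookie, &aux, &size);
            return ret;
    }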

I've also simplified the cookie management API to remove struct
fscache_cookie_def.  Instead, the pertinent details are supplied when a
cookie is created and the file size, key and auxdata are stored in the
cookie.  Callbacks and backpointers are simply removed.
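
For illustration, acquiring a cookie might then look something like the
fragment below - the argument list is an assumption, not the actual
prototype from these patches:

    /* Hypothetical sketch: the argument list is an assumption.  The key,
     * auxdata and object size are passed in directly rather than being
     * fetched back through struct fscache_cookie_def callbacks.
     */
    cookie = fscache_acquire_cookie(parent_cookie, 0 /* flags */,
                                    &key, sizeof(key),
                                    &aux, sizeof(aux),
                                    i_size_read(inode));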

I've added some pieces outside of the API also:

 (1) An inode flag to mark a backing cachefile as being in use by the
     kernel.  This prevents multiple caches mounted in the same directory
     from fighting over the same files.  It can also be extended to exclude
     other kernel users (such as swap) and could also be used to prevent
     userspace from interfering with the file.

 (2) A new I/O iterator class, ITER_XARRAY, that iterates over the
     specified byte range of an xarray (see the sketch after this list).
     The caller is required to make sure that the pages don't evaporate
     under the callee (eg. by pinning them with PG_locked, PG_writeback,
     PG_fscache or a usage count).

     This is better than an ITER_BVEC as no allocation of bio_vec structs
     is required, since the xarray itself holds pointers to all the pages
     involved.

 (3) Wait and unlock functions for PG_fscache.  These live in the core
     kernel, so there's no need to call into fscache for them.
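
A sketch of how (2) and (3) might be used together; the helper names here
are assumptions taken from the patch titles ("vm: Add wait/unlock functions
for PG_fscache"), not confirmed prototypes:

    /* Sketch: helper names are assumptions from the patch titles. */
    struct iov_iter iter;

    /* (2) Iterate over bytes [start, start + len) of the pagecache
     * xarray; the caller must keep the pages pinned (eg. PG_locked,
     * PG_writeback or PG_fscache) for the duration.
     */
    iov_iter_xarray(&iter, WRITE, &mapping->i_pages, start, len);

    /* (3) Mark a write to the cache as being in progress on a page... */
    SetPageFsCache(page);

    /* ...the cache's completion handler clears the bit and wakes
     * waiters...
     */
    unlock_page_fscache(page);

    /* ...and anyone about to modify or release the page waits first. */
    wait_on_page_fscache(page);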

Future stuff:

 (1) Avoid the need to double-truncate a backing file on relinquishment.
     The first truncate cuts off rubbish data from the end so that it's not
     retrieved in future; the second expands the EOF back out to the DIO
     block size.

 (2) Add a write helper to improve on generic_perform_write(), allowing a
     span of pages to be modified as a block and taking account of cache
     granularity.

 (3) Put in support for versioned monolithic objects (eg. AFS directories).

 (4) Currently, cachefiles only caches files up to 1GiB; file data beyond
     that isn't cached.  The problem here is that I'm using an xattr to hold
     the content map, xattrs may be limited in size, and I've limited myself
     to using a 512-byte xattr.  I can easily make it cache a non-sparse
     file of any size with no map, but as soon as it becomes sparse, I need
     a different strategy.

 (5) Change the indexing strategy so that the culling mechanism is brought
     into the kernel, rather than being done in userspace, and use an index
     table of files with an LRU list.

 (6) Add support for throttling readahead when the server pushes back (cifs
     credits).

These patches can be found also on:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter

David
---
David Howells (75):
      nfs, cifs, ceph, 9p: Disable use of fscache prior to its rewrite
      afs: Disable use of the fscache I/O routines
      fscache: Add a cookie debug ID and use that in traces
      fscache: Procfile to display cookies
      fscache: Remove the old I/O API
      fscache: Remove the netfs data from the cookie
      fscache: Remove struct fscache_cookie_def
      fscache: Remove store_limit* from struct fscache_object
      fscache: Remove fscache_check_consistency()
      fscache: Remove fscache_attr_changed()
      fscache: Remove obsolete stats
      fscache: Remove old I/O tracepoints
      fscache: Temporarily disable fscache_invalidate()
      fscache: Remove the I/O operation manager
      fscache: Change %p in format strings to something else
      cachefiles: Change %p in format strings to something else
      iov_iter: Add ITER_XARRAY
      vm: Add wait/unlock functions for PG_fscache
      mm: Implement readahead_control pageset expansion
      mm: Stop generic_file_buffered_read() from grabbing a superfluous page
      vfs: Export rw_verify_area() for use by cachefiles
      vfs: Provide S_CACHE_FILE inode flag
      cachefiles: Remove tree of active files and use S_CACHE_FILE inode flag
      fscache: Provide a simple thread pool for running ops asynchronously
      fscache: Replace the object management state machine
      fscache: Rewrite the I/O API based on iov_iter
      fscache: Keep track of size of a file last set independently on the server
      fscache, cachefiles: Fix disabled histogram warnings
      fscache: Recast assertion in terms of cookie not being an index
      vfs, fscache: Force ->write_inode() to occur if cookie pinned for writeback
      fscache: Allow ->put_super() to be used to wait for cache operations
      netfs: Make a netfs helper module
      netfs: Provide readahead and readpage netfs helpers
      netfs: Use the cache
      fscache: read-helper: Add tracepoints
      cachefiles: Remove some redundant checks on unsigned values
      cachefiles: trace: Log coherency checks
      cachefiles: Split cachefiles_drop_object() up a bit
      cachefiles: Implement new fscache I/O backend API
      cachefiles: Merge object->backer into object->dentry
      cachefiles: Implement a content-present indicator and bitmap
      cachefiles: Shape requests from the fscache read helper
      cachefiles: Round the cachefile size up to DIO block size
      cachefiles: Implement read and write parts of new I/O API
      cachefiles: Add I/O tracepoints
      fscache: Display cache-specific data in /proc/fs/fscache/objects
      fscache: Remove more obsolete stats
      fscache: Always create /proc/fs/fscache/stats if configured
      netfs: Stats
      fscache: New stats
      fscache, cachefiles: Rewrite invalidation
      fscache: Implement "will_modify" parameter on fscache_use_cookie()
      fscache: Provide resize operation
      fscache: Remove the update operation
      afs: Pass page into dirty region helpers to provide THP size
      afs: Print the operation debug_id when logging an unexpected data version
      afs: Move key to afs_read struct
      afs: Don't truncate iter during data fetch
      afs: Log remote unmarshalling errors
      afs: Set up the iov_iter before calling afs_extract_data()
      afs: Use ITER_XARRAY for writing
      afs: Wait on PG_fscache before modifying/releasing a page
      afs: Extract writeback extension into its own function
      afs: Prepare for use of THPs
      afs: Use the fs operation ops to handle FetchData completion
      afs: Use new fscache read helper API
      netfs: Add write_begin helper
      fscache: Add support for writing to the cache
      afs: Use the fscache_write_begin() helper
      afs: Copy local writes to the cache when writing to the server
      afs: Invoke fscache_resize_cookie() when handling ATTR_SIZE for setattr
      afs: Add O_DIRECT read support
      afs: Skip truncation on the server of data we haven't written yet
      afs: Make afs_write_begin() return the THP subpage
      afs: Fix speculative status fetch going out of order wrt to modifications

Jeff Layton (1):
      fscache: disable cookie when doing an invalidation for DIO write


 fs/Kconfig                        |    1 +
 fs/Makefile                       |    1 +
 fs/afs/Kconfig                    |    1 +
 fs/afs/cache.c                    |   54 --
 fs/afs/cell.c                     |    9 +-
 fs/afs/dir.c                      |  226 ++++--
 fs/afs/file.c                     |  562 +++++--------
 fs/afs/fs_operation.c             |    4 +-
 fs/afs/fsclient.c                 |  126 ++-
 fs/afs/inode.c                    |  108 ++-
 fs/afs/internal.h                 |   69 +-
 fs/afs/rxrpc.c                    |  150 ++--
 fs/afs/super.c                    |   25 +
 fs/afs/volume.c                   |    9 +-
 fs/afs/write.c                    |  819 +++++++++++--------
 fs/afs/yfsclient.c                |   94 +--
 fs/cachefiles/Makefile            |    3 +-
 fs/cachefiles/bind.c              |   13 +-
 fs/cachefiles/content-map.c       |  515 ++++++++++++
 fs/cachefiles/daemon.c            |   10 +-
 fs/cachefiles/interface.c         |  597 ++++++++------
 fs/cachefiles/internal.h          |  155 ++--
 fs/cachefiles/io.c                |  360 +++++++++
 fs/cachefiles/key.c               |    2 +-
 fs/cachefiles/main.c              |   12 +-
 fs/cachefiles/namei.c             |  538 +++++--------
 fs/cachefiles/rdwr.c              |  975 ----------------------
 fs/cachefiles/xattr.c             |  271 +++----
 fs/fs-writeback.c                 |    8 +
 fs/fscache/Kconfig                |    2 +
 fs/fscache/Makefile               |   10 +-
 fs/fscache/cache.c                |  171 ++--
 fs/fscache/cache_init.c           |  139 ++++
 fs/fscache/cookie.c               |  939 ++++++++++------------
 fs/fscache/dispatcher.c           |  151 ++++
 fs/fscache/fsdef.c                |   56 +-
 fs/fscache/histogram.c            |    2 +-
 fs/fscache/internal.h             |  268 ++-----
 fs/fscache/io.c                   |  390 +++++++++
 fs/fscache/main.c                 |  151 +---
 fs/fscache/netfs.c                |   10 +-
 fs/fscache/obj.c                  |  360 +++++++++
 fs/fscache/object-list.c          |  129 +--
 fs/fscache/object.c               | 1133 --------------------------
 fs/fscache/object_bits.c          |  120 +++
 fs/fscache/operation.c            |  633 ---------------
 fs/fscache/page.c                 | 1248 -----------------------------
 fs/fscache/proc.c                 |   55 +-
 fs/fscache/stats.c                |  233 ++----
 fs/internal.h                     |    5 -
 fs/netfs/Kconfig                  |   23 +
 fs/netfs/Makefile                 |    5 +
 fs/netfs/internal.h               |   97 +++
 fs/netfs/read_helper.c            | 1125 ++++++++++++++++++++++++++
 fs/netfs/stats.c                  |   57 ++
 fs/nfs/fscache-index.c            |    4 +-
 fs/read_write.c                   |    1 +
 include/linux/fs.h                |    5 +
 include/linux/fscache-cache.h     |  510 +++---------
 include/linux/fscache-obsolete.h  |   13 +
 include/linux/fscache.h           |  814 ++++++++-----------
 include/linux/netfs.h             |  111 +++
 include/linux/pagemap.h           |   16 +
 include/linux/uio.h               |   11 +
 include/linux/writeback.h         |    1 +
 include/net/af_rxrpc.h            |    2 +-
 include/trace/events/afs.h        |   74 +-
 include/trace/events/cachefiles.h |  287 +++++--
 include/trace/events/fscache.h    |  428 ++--------
 include/trace/events/netfs.h      |  202 +++++
 lib/iov_iter.c                    |  312 +++++++-
 mm/filemap.c                      |   20 +
 mm/readahead.c                    |   70 ++
 net/rxrpc/recvmsg.c               |    9 +-
 74 files changed, 7421 insertions(+), 8698 deletions(-)
 create mode 100644 fs/cachefiles/content-map.c
 create mode 100644 fs/cachefiles/io.c
 delete mode 100644 fs/cachefiles/rdwr.c
 create mode 100644 fs/fscache/cache_init.c
 create mode 100644 fs/fscache/dispatcher.c
 create mode 100644 fs/fscache/io.c
 create mode 100644 fs/fscache/obj.c
 delete mode 100644 fs/fscache/object.c
 create mode 100644 fs/fscache/object_bits.c
 delete mode 100644 fs/fscache/operation.c
 delete mode 100644 fs/fscache/page.c
 create mode 100644 fs/netfs/Kconfig
 create mode 100644 fs/netfs/Makefile
 create mode 100644 fs/netfs/internal.h
 create mode 100644 fs/netfs/read_helper.c
 create mode 100644 fs/netfs/stats.c
 create mode 100644 include/linux/fscache-obsolete.h
 create mode 100644 include/linux/netfs.h
 create mode 100644 include/trace/events/netfs.h




* [RFC PATCH 01/76] nfs, cifs, ceph, 9p: Disable use of fscache prior to its rewrite
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
@ 2020-11-20 15:02 ` David Howells
  2020-11-20 15:02 ` [RFC PATCH 02/76] afs: Disable use of the fscache I/O routines David Howells
                   ` (73 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:02 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: Dave Wysochanski, Trond Myklebust, linux-nfs, Jeff Layton,
	Ilya Dryomov, ceph-devel, linux-cifs, Eric Van Hensbergen,
	Latchesar Ionkov, v9fs-developer, dhowells, Jeff Layton,
	Matthew Wilcox, Alexander Viro, linux-cachefs, linux-afs,
	linux-nfs, linux-cifs, ceph-devel, v9fs-developer, linux-fsdevel,
	linux-kernel

Temporarily disable the use of fscache by the various Linux network
filesystems, apart from afs, so that the fscache core can be rewritten.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Dave Wysochanski <dwysocha@redhat.com>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: Anna Schumaker <anna.schumaker@netapp.com>
cc: linux-nfs@vger.kernel.org
cc: Jeff Layton <jlayton@kernel.org>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: ceph-devel@vger.kernel.org
cc: Steve French <sfrench@samba.org>
cc: linux-cifs@vger.kernel.org
cc: Eric Van Hensbergen <ericvh@gmail.com>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: v9fs-developer@lists.sourceforge.net
---

 fs/9p/Kconfig      |    2 +-
 fs/ceph/Kconfig    |    2 +-
 fs/cifs/Kconfig    |    2 +-
 fs/fscache/Kconfig |    4 ++++
 fs/nfs/Kconfig     |    2 +-
 5 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/9p/Kconfig b/fs/9p/Kconfig
index 09fd4a185fd2..66ca72422961 100644
--- a/fs/9p/Kconfig
+++ b/fs/9p/Kconfig
@@ -13,7 +13,7 @@ config 9P_FS
 if 9P_FS
 config 9P_FSCACHE
 	bool "Enable 9P client caching support"
-	depends on 9P_FS=m && FSCACHE || 9P_FS=y && FSCACHE=y
+	depends on 9P_FS=m && FSCACHE_OLD || 9P_FS=y && FSCACHE_OLD=y
 	help
 	  Choose Y here to enable persistent, read-only local
 	  caching support for 9p clients using FS-Cache
diff --git a/fs/ceph/Kconfig b/fs/ceph/Kconfig
index 471e40156065..e955a38be3c8 100644
--- a/fs/ceph/Kconfig
+++ b/fs/ceph/Kconfig
@@ -20,7 +20,7 @@ config CEPH_FS
 if CEPH_FS
 config CEPH_FSCACHE
 	bool "Enable Ceph client caching support"
-	depends on CEPH_FS=m && FSCACHE || CEPH_FS=y && FSCACHE=y
+	depends on CEPH_FS=m && FSCACHE_OLD || CEPH_FS=y && FSCACHE_OLD=y
 	help
 	  Choose Y here to enable persistent, read-only local
 	  caching support for Ceph clients using FS-Cache
diff --git a/fs/cifs/Kconfig b/fs/cifs/Kconfig
index 604f65f4b6c5..441f24bff49c 100644
--- a/fs/cifs/Kconfig
+++ b/fs/cifs/Kconfig
@@ -206,7 +206,7 @@ config CIFS_SMB_DIRECT
 
 config CIFS_FSCACHE
 	bool "Provide CIFS client caching support"
-	depends on CIFS=m && FSCACHE || CIFS=y && FSCACHE=y
+	depends on CIFS=m && FSCACHE_OLD || CIFS=y && FSCACHE_OLD=y
 	help
 	  Makes CIFS FS-Cache capable. Say Y here if you want your CIFS data
 	  to be cached locally on disk through the general filesystem cache
diff --git a/fs/fscache/Kconfig b/fs/fscache/Kconfig
index 5e796e6c38e5..ce6f731065d0 100644
--- a/fs/fscache/Kconfig
+++ b/fs/fscache/Kconfig
@@ -60,3 +60,7 @@ config FSCACHE_OBJECT_LIST
 	help
 	  Maintain a global list of active fscache objects that can be
 	  retrieved through /proc/fs/fscache/objects for debugging purposes
+
+config FSCACHE_OLD
+	bool
+	default n
diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index 88e1763e02f3..8909ef506073 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -170,7 +170,7 @@ config ROOT_NFS
 
 config NFS_FSCACHE
 	bool "Provide NFS client caching support"
-	depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
+	depends on NFS_FS=m && FSCACHE_OLD || NFS_FS=y && FSCACHE_OLD=y
 	help
 	  Say Y here if you want NFS data to be cached locally on disc through
 	  the general filesystem cache manager




* [RFC PATCH 02/76] afs: Disable use of the fscache I/O routines
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
  2020-11-20 15:02 ` [RFC PATCH 01/76] nfs, cifs, ceph, 9p: Disable use of fscache prior to its rewrite David Howells
@ 2020-11-20 15:02 ` David Howells
  2020-11-20 15:03 ` [RFC PATCH 03/76] fscache: Add a cookie debug ID and use that in traces David Howells
                   ` (72 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:02 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: linux-afs, dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Disable use of the fscache I/O routines by the AFS filesystem.  It's about
to be transitioned to passing iov_iters down, and fscache is about to have
its I/O path converted to use iov_iter, so all of that needs to change.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
---

 fs/afs/file.c  |  199 ++++++++++----------------------------------------------
 fs/afs/inode.c |    2 -
 fs/afs/write.c |   10 ---
 3 files changed, 36 insertions(+), 175 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 85f5adf21aa0..6d43713fde01 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -203,24 +203,6 @@ void afs_put_read(struct afs_read *req)
 	}
 }
 
-#ifdef CONFIG_AFS_FSCACHE
-/*
- * deal with notification that a page was read from the cache
- */
-static void afs_file_readpage_read_complete(struct page *page,
-					    void *data,
-					    int error)
-{
-	_enter("%p,%p,%d", page, data, error);
-
-	/* if the read completes with an error, we just unlock the page and let
-	 * the VM reissue the readpage */
-	if (!error)
-		SetPageUptodate(page);
-	unlock_page(page);
-}
-#endif
-
 static void afs_fetch_data_success(struct afs_operation *op)
 {
 	struct afs_vnode *vnode = op->file[0].vnode;
@@ -288,89 +270,46 @@ int afs_page_filler(void *data, struct page *page)
 	if (test_bit(AFS_VNODE_DELETED, &vnode->flags))
 		goto error;
 
-	/* is it cached? */
-#ifdef CONFIG_AFS_FSCACHE
-	ret = fscache_read_or_alloc_page(vnode->cache,
-					 page,
-					 afs_file_readpage_read_complete,
-					 NULL,
-					 GFP_KERNEL);
-#else
-	ret = -ENOBUFS;
-#endif
-	switch (ret) {
-		/* read BIO submitted (page in cache) */
-	case 0:
-		break;
-
-		/* page not yet cached */
-	case -ENODATA:
-		_debug("cache said ENODATA");
-		goto go_on;
-
-		/* page will not be cached */
-	case -ENOBUFS:
-		_debug("cache said ENOBUFS");
-
-		fallthrough;
-	default:
-	go_on:
-		req = kzalloc(struct_size(req, array, 1), GFP_KERNEL);
-		if (!req)
-			goto enomem;
-
-		/* We request a full page.  If the page is a partial one at the
-		 * end of the file, the server will return a short read and the
-		 * unmarshalling code will clear the unfilled space.
-		 */
-		refcount_set(&req->usage, 1);
-		req->pos = (loff_t)page->index << PAGE_SHIFT;
-		req->len = PAGE_SIZE;
-		req->nr_pages = 1;
-		req->pages = req->array;
-		req->pages[0] = page;
-		get_page(page);
-
-		/* read the contents of the file from the server into the
-		 * page */
-		ret = afs_fetch_data(vnode, key, req);
-		afs_put_read(req);
-
-		if (ret < 0) {
-			if (ret == -ENOENT) {
-				_debug("got NOENT from server"
-				       " - marking file deleted and stale");
-				set_bit(AFS_VNODE_DELETED, &vnode->flags);
-				ret = -ESTALE;
-			}
-
-#ifdef CONFIG_AFS_FSCACHE
-			fscache_uncache_page(vnode->cache, page);
-#endif
-			BUG_ON(PageFsCache(page));
-
-			if (ret == -EINTR ||
-			    ret == -ENOMEM ||
-			    ret == -ERESTARTSYS ||
-			    ret == -EAGAIN)
-				goto error;
-			goto io_error;
-		}
+	req = kzalloc(struct_size(req, array, 1), GFP_KERNEL);
+	if (!req)
+		goto enomem;
 
-		SetPageUptodate(page);
+	/* We request a full page.  If the page is a partial one at the
+	 * end of the file, the server will return a short read and the
+	 * unmarshalling code will clear the unfilled space.
+	 */
+	refcount_set(&req->usage, 1);
+	req->pos = (loff_t)page->index << PAGE_SHIFT;
+	req->len = PAGE_SIZE;
+	req->nr_pages = 1;
+	req->pages = req->array;
+	req->pages[0] = page;
+	get_page(page);
+
+	/* read the contents of the file from the server into the
+	 * page */
+	ret = afs_fetch_data(vnode, key, req);
+	afs_put_read(req);
 
-		/* send the page to the cache */
-#ifdef CONFIG_AFS_FSCACHE
-		if (PageFsCache(page) &&
-		    fscache_write_page(vnode->cache, page, vnode->status.size,
-				       GFP_KERNEL) != 0) {
-			fscache_uncache_page(vnode->cache, page);
-			BUG_ON(PageFsCache(page));
+	if (ret < 0) {
+		if (ret == -ENOENT) {
+			_debug("got NOENT from server"
+			       " - marking file deleted and stale");
+			set_bit(AFS_VNODE_DELETED, &vnode->flags);
+			ret = -ESTALE;
 		}
-#endif
-		unlock_page(page);
+
+		if (ret == -EINTR ||
+		    ret == -ENOMEM ||
+		    ret == -ERESTARTSYS ||
+		    ret == -EAGAIN)
+			goto error;
+		goto io_error;
 	}
 
+	SetPageUptodate(page);
+	unlock_page(page);
+
 	_leave(" = 0");
 	return 0;
 
@@ -416,23 +355,10 @@ static int afs_readpage(struct file *file, struct page *page)
  */
 static void afs_readpages_page_done(struct afs_read *req)
 {
-#ifdef CONFIG_AFS_FSCACHE
-	struct afs_vnode *vnode = req->vnode;
-#endif
 	struct page *page = req->pages[req->index];
 
 	req->pages[req->index] = NULL;
 	SetPageUptodate(page);
-
-	/* send the page to the cache */
-#ifdef CONFIG_AFS_FSCACHE
-	if (PageFsCache(page) &&
-	    fscache_write_page(vnode->cache, page, vnode->status.size,
-			       GFP_KERNEL) != 0) {
-		fscache_uncache_page(vnode->cache, page);
-		BUG_ON(PageFsCache(page));
-	}
-#endif
 	unlock_page(page);
 	put_page(page);
 }
@@ -491,9 +417,6 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping,
 		index = page->index;
 		if (add_to_page_cache_lru(page, mapping, index,
 					  readahead_gfp_mask(mapping))) {
-#ifdef CONFIG_AFS_FSCACHE
-			fscache_uncache_page(vnode->cache, page);
-#endif
 			put_page(page);
 			break;
 		}
@@ -526,9 +449,6 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping,
 	for (i = 0; i < req->nr_pages; i++) {
 		page = req->pages[i];
 		if (page) {
-#ifdef CONFIG_AFS_FSCACHE
-			fscache_uncache_page(vnode->cache, page);
-#endif
 			SetPageError(page);
 			unlock_page(page);
 		}
@@ -560,37 +480,6 @@ static int afs_readpages(struct file *file, struct address_space *mapping,
 	}
 
 	/* attempt to read as many of the pages as possible */
-#ifdef CONFIG_AFS_FSCACHE
-	ret = fscache_read_or_alloc_pages(vnode->cache,
-					  mapping,
-					  pages,
-					  &nr_pages,
-					  afs_file_readpage_read_complete,
-					  NULL,
-					  mapping_gfp_mask(mapping));
-#else
-	ret = -ENOBUFS;
-#endif
-
-	switch (ret) {
-		/* all pages are being read from the cache */
-	case 0:
-		BUG_ON(!list_empty(pages));
-		BUG_ON(nr_pages != 0);
-		_leave(" = 0 [reading all]");
-		return 0;
-
-		/* there were pages that couldn't be read from the cache */
-	case -ENODATA:
-	case -ENOBUFS:
-		break;
-
-		/* other error */
-	default:
-		_leave(" = %d", ret);
-		return ret;
-	}
-
 	while (!list_empty(pages)) {
 		ret = afs_readpages_one(file, mapping, pages);
 		if (ret < 0)
@@ -670,17 +559,6 @@ static void afs_invalidatepage(struct page *page, unsigned int offset,
 
 	BUG_ON(!PageLocked(page));
 
-#ifdef CONFIG_AFS_FSCACHE
-	/* we clean up only if the entire page is being invalidated */
-	if (offset == 0 && length == PAGE_SIZE) {
-		if (PageFsCache(page)) {
-			struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
-			fscache_wait_on_page_write(vnode->cache, page);
-			fscache_uncache_page(vnode->cache, page);
-		}
-	}
-#endif
-
 	if (PagePrivate(page))
 		afs_invalidate_dirty(page, offset, length);
 
@@ -702,13 +580,6 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)
 
 	/* deny if page is being written to the cache and the caller hasn't
 	 * elected to wait */
-#ifdef CONFIG_AFS_FSCACHE
-	if (!fscache_maybe_release_page(vnode->cache, page, gfp_flags)) {
-		_leave(" = F [cache busy]");
-		return 0;
-	}
-#endif
-
 	if (PagePrivate(page)) {
 		priv = (unsigned long)detach_page_private(page);
 		trace_afs_page_dirty(vnode, tracepoint_string("rel"),
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 0fe8844b4bee..79c8dc06c0b8 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -420,7 +420,7 @@ static void afs_get_inode_cache(struct afs_vnode *vnode)
 	} __packed key;
 	struct afs_vnode_cache_aux aux;
 
-	if (vnode->status.type == AFS_FTYPE_DIR) {
+	if (vnode->status.type != AFS_FTYPE_FILE) {
 		vnode->cache = NULL;
 		return;
 	}
diff --git a/fs/afs/write.c b/fs/afs/write.c
index c9195fc67fd8..92eaa88000d7 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -847,9 +847,6 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 	/* Wait for the page to be written to the cache before we allow it to
 	 * be modified.  We then assume the entire page will need writing back.
 	 */
-#ifdef CONFIG_AFS_FSCACHE
-	fscache_wait_on_page_write(vnode->cache, vmf->page);
-#endif
 
 	if (PageWriteback(vmf->page) &&
 	    wait_on_page_bit_killable(vmf->page, PG_writeback) < 0)
@@ -936,12 +933,5 @@ int afs_launder_page(struct page *page)
 	priv = (unsigned long)detach_page_private(page);
 	trace_afs_page_dirty(vnode, tracepoint_string("laundered"),
 			     page->index, priv);
-
-#ifdef CONFIG_AFS_FSCACHE
-	if (PageFsCache(page)) {
-		fscache_wait_on_page_write(vnode->cache, page);
-		fscache_uncache_page(vnode->cache, page);
-	}
-#endif
 	return ret;
 }




* [RFC PATCH 03/76] fscache: Add a cookie debug ID and use that in traces
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
  2020-11-20 15:02 ` [RFC PATCH 01/76] nfs, cifs, ceph, 9p: Disable use of fscache prior to its rewrite David Howells
  2020-11-20 15:02 ` [RFC PATCH 02/76] afs: Disable use of the fscache I/O routines David Howells
@ 2020-11-20 15:03 ` David Howells
  2020-11-20 15:03 ` [RFC PATCH 04/76] fscache: Procfile to display cookies David Howells
                   ` (71 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:03 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Add a cookie debug ID and use that in traces and in procfiles rather than
displaying the (hashed) pointer to the cookie.  This is easier to correlate
and we don't lose anything when interpreting oops output since that shows
unhashed addresses and registers that aren't comparable to the hashed
values.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/cookie.c               |   29 ++++++---
 fs/fscache/fsdef.c                |    1 
 fs/fscache/object-list.c          |   14 ++--
 include/linux/fscache.h           |    1 
 include/trace/events/cachefiles.h |   68 +++++++++++-----------
 include/trace/events/fscache.h    |  116 +++++++++++++++++++------------------
 6 files changed, 121 insertions(+), 108 deletions(-)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 751bc5b1cddf..f2be98d2c64d 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -29,21 +29,29 @@ static int fscache_attach_object(struct fscache_cookie *cookie,
 
 static void fscache_print_cookie(struct fscache_cookie *cookie, char prefix)
 {
-	struct hlist_node *object;
+	struct fscache_object *object;
+	struct hlist_node *o;
 	const u8 *k;
 	unsigned loop;
 
-	pr_err("%c-cookie c=%p [p=%p fl=%lx nc=%u na=%u]\n",
-	       prefix, cookie, cookie->parent, cookie->flags,
+	pr_err("%c-cookie c=%08x [p=%08x fl=%lx nc=%u na=%u]\n",
+	       prefix,
+	       cookie->debug_id,
+	       cookie->parent ? cookie->parent->debug_id : 0,
+	       cookie->flags,
 	       atomic_read(&cookie->n_children),
 	       atomic_read(&cookie->n_active));
-	pr_err("%c-cookie d=%p n=%p\n",
-	       prefix, cookie->def, cookie->netfs_data);
+	pr_err("%c-cookie d=%p{%s} n=%p\n",
+	       prefix,
+	       cookie->def,
+	       cookie->def ? cookie->def->name : "?",
+	       cookie->netfs_data);
 
-	object = READ_ONCE(cookie->backing_objects.first);
-	if (object)
-		pr_err("%c-cookie o=%p\n",
-		       prefix, hlist_entry(object, struct fscache_object, cookie_link));
+	o = READ_ONCE(cookie->backing_objects.first);
+	if (o) {
+		object = hlist_entry(o, struct fscache_object, cookie_link);
+		pr_err("%c-cookie o=%u\n", prefix, object->debug_id);
+	}
 
 	pr_err("%c-key=[%u] '", prefix, cookie->key_len);
 	k = (cookie->key_len <= sizeof(cookie->inline_key)) ?
@@ -129,6 +137,8 @@ static long fscache_compare_cookie(const struct fscache_cookie *a,
 	return memcmp(ka, kb, a->key_len);
 }
 
+static atomic_t fscache_cookie_debug_id = ATOMIC_INIT(1);
+
 /*
  * Allocate a cookie.
  */
@@ -163,6 +173,7 @@ struct fscache_cookie *fscache_alloc_cookie(
 
 	atomic_set(&cookie->usage, 1);
 	atomic_set(&cookie->n_children, 0);
+	cookie->debug_id = atomic_inc_return(&fscache_cookie_debug_id);
 
 	/* We keep the active count elevated until relinquishment to prevent an
 	 * attempt to wake up every time the object operations queue quiesces.
diff --git a/fs/fscache/fsdef.c b/fs/fscache/fsdef.c
index 09ed8795ad86..5f8f6fe243fe 100644
--- a/fs/fscache/fsdef.c
+++ b/fs/fscache/fsdef.c
@@ -45,6 +45,7 @@ static struct fscache_cookie_def fscache_fsdef_index_def = {
 };
 
 struct fscache_cookie fscache_fsdef_index = {
+	.debug_id	= 1,
 	.usage		= ATOMIC_INIT(1),
 	.n_active	= ATOMIC_INIT(1),
 	.lock		= __SPIN_LOCK_UNLOCKED(fscache_fsdef_index.lock),
diff --git a/fs/fscache/object-list.c b/fs/fscache/object-list.c
index e106a1a1600d..1a0dc32c0a33 100644
--- a/fs/fscache/object-list.c
+++ b/fs/fscache/object-list.c
@@ -170,7 +170,7 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 	if ((unsigned long) v == 1) {
 		seq_puts(m, "OBJECT   PARENT   STAT CHLDN OPS OOP IPR EX READS"
 			 " EM EV FL S"
-			 " | NETFS_COOKIE_DEF TY FL NETFS_DATA");
+			 " | COOKIE   NETFS_COOKIE_DEF TY FL NETFS_DATA");
 		if (config & (FSCACHE_OBJLIST_CONFIG_KEY |
 			      FSCACHE_OBJLIST_CONFIG_AUX))
 			seq_puts(m, "       ");
@@ -189,7 +189,7 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 	if ((unsigned long) v == 2) {
 		seq_puts(m, "======== ======== ==== ===== === === === == ====="
 			 " == == == ="
-			 " | ================ == == ================");
+			 " | ======== ================ == === ================");
 		if (config & (FSCACHE_OBJLIST_CONFIG_KEY |
 			      FSCACHE_OBJLIST_CONFIG_AUX))
 			seq_puts(m, " ================");
@@ -231,9 +231,9 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 	}
 
 	seq_printf(m,
-		   "%8x %8x %s %5u %3u %3u %3u %2u %5u %2lx %2lx %2lx %1x | ",
+		   "%08x %08x %s %5u %3u %3u %3u %2u %5u %2lx %2lx %2lx %1x | ",
 		   obj->debug_id,
-		   obj->parent ? obj->parent->debug_id : -1,
+		   obj->parent ? obj->parent->debug_id : UINT_MAX,
 		   obj->state->short_name,
 		   obj->n_children,
 		   obj->n_ops,
@@ -246,7 +246,7 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 		   obj->flags,
 		   work_busy(&obj->work));
 
-	if (fscache_use_cookie(obj)) {
+	if (obj->cookie) {
 		uint16_t keylen = 0, auxlen = 0;
 
 		switch (cookie->type) {
@@ -263,7 +263,8 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 			break;
 		}
 
-		seq_printf(m, "%-16s %s %2lx %16p",
+		seq_printf(m, "%08x %-16s %s %3lx %16p",
+			   cookie->debug_id,
 			   cookie->def->name,
 			   type,
 			   cookie->flags,
@@ -292,7 +293,6 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 		}
 
 		seq_puts(m, "\n");
-		fscache_unuse_cookie(obj);
 	} else {
 		seq_puts(m, "<no_netfs>\n");
 	}
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index a1c928fe98e7..f31df9a075f6 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -134,6 +134,7 @@ struct fscache_cookie {
 	atomic_t			usage;		/* number of users of this cookie */
 	atomic_t			n_children;	/* number of children of this cookie */
 	atomic_t			n_active;	/* number of active users of netfs ptrs */
+	unsigned int			debug_id;
 	spinlock_t			lock;
 	spinlock_t			stores_lock;	/* lock on page store tree */
 	struct hlist_head		backing_objects; /* object(s) backing this file/index */
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 5d9de24cb9c0..9a448fe9355d 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -78,20 +78,20 @@ TRACE_EVENT(cachefiles_ref,
 
 	    /* Note that obj may be NULL */
 	    TP_STRUCT__entry(
-		    __field(struct cachefiles_object *,		obj		)
-		    __field(struct fscache_cookie *,		cookie		)
+		    __field(unsigned int,			obj		)
+		    __field(unsigned int,			cookie		)
 		    __field(enum cachefiles_obj_ref_trace,	why		)
 		    __field(int,				usage		)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->obj	= obj;
-		    __entry->cookie	= cookie;
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->cookie	= cookie->debug_id;
 		    __entry->usage	= usage;
 		    __entry->why	= why;
 			   ),
 
-	    TP_printk("c=%p o=%p u=%d %s",
+	    TP_printk("c=%08x o=%08x u=%d %s",
 		      __entry->cookie, __entry->obj, __entry->usage,
 		      __print_symbolic(__entry->why, cachefiles_obj_ref_traces))
 	    );
@@ -104,18 +104,18 @@ TRACE_EVENT(cachefiles_lookup,
 	    TP_ARGS(obj, de, inode),
 
 	    TP_STRUCT__entry(
-		    __field(struct cachefiles_object *,	obj	)
+		    __field(unsigned int,		obj	)
 		    __field(struct dentry *,		de	)
 		    __field(struct inode *,		inode	)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->obj	= obj;
+		    __entry->obj	= obj->fscache.debug_id;
 		    __entry->de		= de;
 		    __entry->inode	= inode;
 			   ),
 
-	    TP_printk("o=%p d=%p i=%p",
+	    TP_printk("o=%08x d=%p i=%p",
 		      __entry->obj, __entry->de, __entry->inode)
 	    );
 
@@ -126,18 +126,18 @@ TRACE_EVENT(cachefiles_mkdir,
 	    TP_ARGS(obj, de, ret),
 
 	    TP_STRUCT__entry(
-		    __field(struct cachefiles_object *,	obj	)
+		    __field(unsigned int,		obj	)
 		    __field(struct dentry *,		de	)
 		    __field(int,			ret	)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->obj	= obj;
+		    __entry->obj	= obj->fscache.debug_id;
 		    __entry->de		= de;
 		    __entry->ret	= ret;
 			   ),
 
-	    TP_printk("o=%p d=%p r=%u",
+	    TP_printk("o=%08x d=%p r=%u",
 		      __entry->obj, __entry->de, __entry->ret)
 	    );
 
@@ -148,18 +148,18 @@ TRACE_EVENT(cachefiles_create,
 	    TP_ARGS(obj, de, ret),
 
 	    TP_STRUCT__entry(
-		    __field(struct cachefiles_object *,	obj	)
+		    __field(unsigned int,		obj	)
 		    __field(struct dentry *,		de	)
 		    __field(int,			ret	)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->obj	= obj;
+		    __entry->obj	= obj->fscache.debug_id;
 		    __entry->de		= de;
 		    __entry->ret	= ret;
 			   ),
 
-	    TP_printk("o=%p d=%p r=%u",
+	    TP_printk("o=%08x d=%p r=%u",
 		      __entry->obj, __entry->de, __entry->ret)
 	    );
 
@@ -172,18 +172,18 @@ TRACE_EVENT(cachefiles_unlink,
 
 	    /* Note that obj may be NULL */
 	    TP_STRUCT__entry(
-		    __field(struct cachefiles_object *,	obj		)
+		    __field(unsigned int,		obj		)
 		    __field(struct dentry *,		de		)
 		    __field(enum fscache_why_object_killed, why		)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->obj	= obj;
+		    __entry->obj	= obj->fscache.debug_id;
 		    __entry->de		= de;
 		    __entry->why	= why;
 			   ),
 
-	    TP_printk("o=%p d=%p w=%s",
+	    TP_printk("o=%08x d=%p w=%s",
 		      __entry->obj, __entry->de,
 		      __print_symbolic(__entry->why, cachefiles_obj_kill_traces))
 	    );
@@ -198,20 +198,20 @@ TRACE_EVENT(cachefiles_rename,
 
 	    /* Note that obj may be NULL */
 	    TP_STRUCT__entry(
-		    __field(struct cachefiles_object *,	obj		)
+		    __field(unsigned int,		obj		)
 		    __field(struct dentry *,		de		)
 		    __field(struct dentry *,		to		)
 		    __field(enum fscache_why_object_killed, why		)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->obj	= obj;
+		    __entry->obj	= obj->fscache.debug_id;
 		    __entry->de		= de;
 		    __entry->to		= to;
 		    __entry->why	= why;
 			   ),
 
-	    TP_printk("o=%p d=%p t=%p w=%s",
+	    TP_printk("o=%08x d=%p t=%p w=%s",
 		      __entry->obj, __entry->de, __entry->to,
 		      __print_symbolic(__entry->why, cachefiles_obj_kill_traces))
 	    );
@@ -224,16 +224,16 @@ TRACE_EVENT(cachefiles_mark_active,
 
 	    /* Note that obj may be NULL */
 	    TP_STRUCT__entry(
-		    __field(struct cachefiles_object *,	obj		)
+		    __field(unsigned int,		obj		)
 		    __field(struct dentry *,		de		)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->obj	= obj;
+		    __entry->obj	= obj->fscache.debug_id;
 		    __entry->de		= de;
 			   ),
 
-	    TP_printk("o=%p d=%p",
+	    TP_printk("o=%08x d=%p",
 		      __entry->obj, __entry->de)
 	    );
 
@@ -246,22 +246,22 @@ TRACE_EVENT(cachefiles_wait_active,
 
 	    /* Note that obj may be NULL */
 	    TP_STRUCT__entry(
-		    __field(struct cachefiles_object *,	obj		)
+		    __field(unsigned int,		obj		)
+		    __field(unsigned int,		xobj		)
 		    __field(struct dentry *,		de		)
-		    __field(struct cachefiles_object *,	xobj		)
 		    __field(u16,			flags		)
 		    __field(u16,			fsc_flags	)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->obj	= obj;
+		    __entry->obj	= obj->fscache.debug_id;
 		    __entry->de		= de;
-		    __entry->xobj	= xobj;
+		    __entry->xobj	= xobj->fscache.debug_id;
 		    __entry->flags	= xobj->flags;
 		    __entry->fsc_flags	= xobj->fscache.flags;
 			   ),
 
-	    TP_printk("o=%p d=%p wo=%p wf=%x wff=%x",
+	    TP_printk("o=%08x d=%p wo=%08x wf=%x wff=%x",
 		      __entry->obj, __entry->de, __entry->xobj,
 		      __entry->flags, __entry->fsc_flags)
 	    );
@@ -275,18 +275,18 @@ TRACE_EVENT(cachefiles_mark_inactive,
 
 	    /* Note that obj may be NULL */
 	    TP_STRUCT__entry(
-		    __field(struct cachefiles_object *,	obj		)
+		    __field(unsigned int,		obj		)
 		    __field(struct dentry *,		de		)
 		    __field(struct inode *,		inode		)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->obj	= obj;
+		    __entry->obj	= obj->fscache.debug_id;
 		    __entry->de		= de;
 		    __entry->inode	= inode;
 			   ),
 
-	    TP_printk("o=%p d=%p i=%p",
+	    TP_printk("o=%08x d=%p i=%p",
 		      __entry->obj, __entry->de, __entry->inode)
 	    );
 
@@ -299,18 +299,18 @@ TRACE_EVENT(cachefiles_mark_buried,
 
 	    /* Note that obj may be NULL */
 	    TP_STRUCT__entry(
-		    __field(struct cachefiles_object *,	obj		)
+		    __field(unsigned int,		obj		)
 		    __field(struct dentry *,		de		)
 		    __field(enum fscache_why_object_killed, why		)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->obj	= obj;
+		    __entry->obj	= obj->fscache.debug_id;
 		    __entry->de		= de;
 		    __entry->why	= why;
 			   ),
 
-	    TP_printk("o=%p d=%p w=%s",
+	    TP_printk("o=%08x d=%p w=%s",
 		      __entry->obj, __entry->de,
 		      __print_symbolic(__entry->why, cachefiles_obj_kill_traces))
 	    );
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index d16fe6ed78a2..0b9e058aba4d 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -167,8 +167,8 @@ TRACE_EVENT(fscache_cookie,
 	    TP_ARGS(cookie, where, usage),
 
 	    TP_STRUCT__entry(
-		    __field(struct fscache_cookie *,	cookie		)
-		    __field(struct fscache_cookie *,	parent		)
+		    __field(unsigned int,		cookie		)
+		    __field(unsigned int,		parent		)
 		    __field(enum fscache_cookie_trace,	where		)
 		    __field(int,			usage		)
 		    __field(int,			n_children	)
@@ -177,8 +177,8 @@ TRACE_EVENT(fscache_cookie,
 			     ),
 
 	    TP_fast_assign(
-		    __entry->cookie	= cookie;
-		    __entry->parent	= cookie->parent;
+		    __entry->cookie	= cookie->debug_id;
+		    __entry->parent	= cookie->parent ? cookie->parent->debug_id : 0;
 		    __entry->where	= where;
 		    __entry->usage	= usage;
 		    __entry->n_children	= atomic_read(&cookie->n_children);
@@ -186,7 +186,7 @@ TRACE_EVENT(fscache_cookie,
 		    __entry->flags	= cookie->flags;
 			   ),
 
-	    TP_printk("%s c=%p u=%d p=%p Nc=%d Na=%d f=%02x",
+	    TP_printk("%s c=%08x u=%d p=%08x Nc=%d Na=%d f=%02x",
 		      __print_symbolic(__entry->where, fscache_cookie_traces),
 		      __entry->cookie, __entry->usage,
 		      __entry->parent, __entry->n_children, __entry->n_active,
@@ -199,17 +199,17 @@ TRACE_EVENT(fscache_netfs,
 	    TP_ARGS(netfs),
 
 	    TP_STRUCT__entry(
-		    __field(struct fscache_cookie *,	cookie		)
+		    __field(unsigned int,		cookie		)
 		    __array(char,			name, 8		)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->cookie		= netfs->primary_index;
+		    __entry->cookie		= netfs->primary_index->debug_id;
 		    strncpy(__entry->name, netfs->name, 8);
 		    __entry->name[7]		= 0;
 			   ),
 
-	    TP_printk("c=%p n=%s",
+	    TP_printk("c=%08x n=%s",
 		      __entry->cookie, __entry->name)
 	    );
 
@@ -219,8 +219,8 @@ TRACE_EVENT(fscache_acquire,
 	    TP_ARGS(cookie),
 
 	    TP_STRUCT__entry(
-		    __field(struct fscache_cookie *,	cookie		)
-		    __field(struct fscache_cookie *,	parent		)
+		    __field(unsigned int,		cookie		)
+		    __field(unsigned int,		parent		)
 		    __array(char,			name, 8		)
 		    __field(int,			p_usage		)
 		    __field(int,			p_n_children	)
@@ -228,8 +228,8 @@ TRACE_EVENT(fscache_acquire,
 			     ),
 
 	    TP_fast_assign(
-		    __entry->cookie		= cookie;
-		    __entry->parent		= cookie->parent;
+		    __entry->cookie		= cookie->debug_id;
+		    __entry->parent		= cookie->parent->debug_id;
 		    __entry->p_usage		= atomic_read(&cookie->parent->usage);
 		    __entry->p_n_children	= atomic_read(&cookie->parent->n_children);
 		    __entry->p_flags		= cookie->parent->flags;
@@ -237,7 +237,7 @@ TRACE_EVENT(fscache_acquire,
 		    __entry->name[7]		= 0;
 			   ),
 
-	    TP_printk("c=%p p=%p pu=%d pc=%d pf=%02x n=%s",
+	    TP_printk("c=%08x p=%08x pu=%d pc=%d pf=%02x n=%s",
 		      __entry->cookie, __entry->parent, __entry->p_usage,
 		      __entry->p_n_children, __entry->p_flags, __entry->name)
 	    );
@@ -248,8 +248,8 @@ TRACE_EVENT(fscache_relinquish,
 	    TP_ARGS(cookie, retire),
 
 	    TP_STRUCT__entry(
-		    __field(struct fscache_cookie *,	cookie		)
-		    __field(struct fscache_cookie *,	parent		)
+		    __field(unsigned int,		cookie		)
+		    __field(unsigned int,		parent		)
 		    __field(int,			usage		)
 		    __field(int,			n_children	)
 		    __field(int,			n_active	)
@@ -258,8 +258,8 @@ TRACE_EVENT(fscache_relinquish,
 			     ),
 
 	    TP_fast_assign(
-		    __entry->cookie	= cookie;
-		    __entry->parent	= cookie->parent;
+		    __entry->cookie	= cookie->debug_id;
+		    __entry->parent	= cookie->parent->debug_id;
 		    __entry->usage	= atomic_read(&cookie->usage);
 		    __entry->n_children	= atomic_read(&cookie->n_children);
 		    __entry->n_active	= atomic_read(&cookie->n_active);
@@ -267,7 +267,7 @@ TRACE_EVENT(fscache_relinquish,
 		    __entry->retire	= retire;
 			   ),
 
-	    TP_printk("c=%p u=%d p=%p Nc=%d Na=%d f=%02x r=%u",
+	    TP_printk("c=%08x u=%d p=%08x Nc=%d Na=%d f=%02x r=%u",
 		      __entry->cookie, __entry->usage,
 		      __entry->parent, __entry->n_children, __entry->n_active,
 		      __entry->flags, __entry->retire)
@@ -279,7 +279,7 @@ TRACE_EVENT(fscache_enable,
 	    TP_ARGS(cookie),
 
 	    TP_STRUCT__entry(
-		    __field(struct fscache_cookie *,	cookie		)
+		    __field(unsigned int,		cookie		)
 		    __field(int,			usage		)
 		    __field(int,			n_children	)
 		    __field(int,			n_active	)
@@ -287,14 +287,14 @@ TRACE_EVENT(fscache_enable,
 			     ),
 
 	    TP_fast_assign(
-		    __entry->cookie	= cookie;
+		    __entry->cookie	= cookie->debug_id;
 		    __entry->usage	= atomic_read(&cookie->usage);
 		    __entry->n_children	= atomic_read(&cookie->n_children);
 		    __entry->n_active	= atomic_read(&cookie->n_active);
 		    __entry->flags	= cookie->flags;
 			   ),
 
-	    TP_printk("c=%p u=%d Nc=%d Na=%d f=%02x",
+	    TP_printk("c=%08x u=%d Nc=%d Na=%d f=%02x",
 		      __entry->cookie, __entry->usage,
 		      __entry->n_children, __entry->n_active, __entry->flags)
 	    );
@@ -305,7 +305,7 @@ TRACE_EVENT(fscache_disable,
 	    TP_ARGS(cookie),
 
 	    TP_STRUCT__entry(
-		    __field(struct fscache_cookie *,	cookie		)
+		    __field(unsigned int,		cookie		)
 		    __field(int,			usage		)
 		    __field(int,			n_children	)
 		    __field(int,			n_active	)
@@ -313,14 +313,14 @@ TRACE_EVENT(fscache_disable,
 			     ),
 
 	    TP_fast_assign(
-		    __entry->cookie	= cookie;
+		    __entry->cookie	= cookie->debug_id;
 		    __entry->usage	= atomic_read(&cookie->usage);
 		    __entry->n_children	= atomic_read(&cookie->n_children);
 		    __entry->n_active	= atomic_read(&cookie->n_active);
 		    __entry->flags	= cookie->flags;
 			   ),
 
-	    TP_printk("c=%p u=%d Nc=%d Na=%d f=%02x",
+	    TP_printk("c=%08x u=%d Nc=%d Na=%d f=%02x",
 		      __entry->cookie, __entry->usage,
 		      __entry->n_children, __entry->n_active, __entry->flags)
 	    );
@@ -333,8 +333,8 @@ TRACE_EVENT(fscache_osm,
 	    TP_ARGS(object, state, wait, oob, event_num),
 
 	    TP_STRUCT__entry(
-		    __field(struct fscache_cookie *,	cookie		)
-		    __field(struct fscache_object *,	object		)
+		    __field(unsigned int,		cookie		)
+		    __field(unsigned int,		object		)
 		    __array(char,			state, 8	)
 		    __field(bool,			wait		)
 		    __field(bool,			oob		)
@@ -342,15 +342,15 @@ TRACE_EVENT(fscache_osm,
 			     ),
 
 	    TP_fast_assign(
-		    __entry->cookie		= object->cookie;
-		    __entry->object		= object;
+		    __entry->cookie		= object->cookie->debug_id;
+		    __entry->object		= object->debug_id;
 		    __entry->wait		= wait;
 		    __entry->oob		= oob;
 		    __entry->event_num		= event_num;
 		    memcpy(__entry->state, state->short_name, 8);
 			   ),
 
-	    TP_printk("c=%p o=%p %s %s%sev=%d",
+	    TP_printk("c=%08x o=%08d %s %s%sev=%d",
 		      __entry->cookie,
 		      __entry->object,
 		      __entry->state,
@@ -370,18 +370,18 @@ TRACE_EVENT(fscache_page,
 	    TP_ARGS(cookie, page, why),
 
 	    TP_STRUCT__entry(
-		    __field(struct fscache_cookie *,	cookie		)
+		    __field(unsigned int,		cookie		)
 		    __field(pgoff_t,			page		)
 		    __field(enum fscache_page_trace,	why		)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->cookie		= cookie;
+		    __entry->cookie		= cookie->debug_id;
 		    __entry->page		= page->index;
 		    __entry->why		= why;
 			   ),
 
-	    TP_printk("c=%p %s pg=%lx",
+	    TP_printk("c=%08x %s pg=%lx",
 		      __entry->cookie,
 		      __print_symbolic(__entry->why, fscache_page_traces),
 		      __entry->page)
@@ -394,20 +394,20 @@ TRACE_EVENT(fscache_check_page,
 	    TP_ARGS(cookie, page, val, n),
 
 	    TP_STRUCT__entry(
-		    __field(struct fscache_cookie *,	cookie		)
+		    __field(unsigned int,		cookie		)
 		    __field(void *,			page		)
 		    __field(void *,			val		)
 		    __field(int,			n		)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->cookie		= cookie;
+		    __entry->cookie		= cookie->debug_id;
 		    __entry->page		= page;
 		    __entry->val		= val;
 		    __entry->n			= n;
 			   ),
 
-	    TP_printk("c=%p pg=%p val=%p n=%d",
+	    TP_printk("c=%08x pg=%p val=%p n=%d",
 		      __entry->cookie, __entry->page, __entry->val, __entry->n)
 	    );
 
@@ -417,14 +417,14 @@ TRACE_EVENT(fscache_wake_cookie,
 	    TP_ARGS(cookie),
 
 	    TP_STRUCT__entry(
-		    __field(struct fscache_cookie *,	cookie		)
+		    __field(unsigned int,		cookie		)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->cookie		= cookie;
+		    __entry->cookie		= cookie->debug_id;
 			   ),
 
-	    TP_printk("c=%p", __entry->cookie)
+	    TP_printk("c=%08x", __entry->cookie)
 	    );
 
 TRACE_EVENT(fscache_op,
@@ -434,18 +434,18 @@ TRACE_EVENT(fscache_op,
 	    TP_ARGS(cookie, op, why),
 
 	    TP_STRUCT__entry(
-		    __field(struct fscache_cookie *,	cookie		)
-		    __field(struct fscache_operation *,	op		)
+		    __field(unsigned int,		cookie		)
+		    __field(unsigned int,		op		)
 		    __field(enum fscache_op_trace,	why		)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->cookie		= cookie;
-		    __entry->op			= op;
+		    __entry->cookie		= cookie->debug_id;
+		    __entry->op			= op->debug_id;
 		    __entry->why		= why;
 			   ),
 
-	    TP_printk("c=%p op=%p %s",
+	    TP_printk("c=%08x op=%08x %s",
 		      __entry->cookie, __entry->op,
 		      __print_symbolic(__entry->why, fscache_op_traces))
 	    );
@@ -457,20 +457,20 @@ TRACE_EVENT(fscache_page_op,
 	    TP_ARGS(cookie, page, op, what),
 
 	    TP_STRUCT__entry(
-		    __field(struct fscache_cookie *,	cookie		)
+		    __field(unsigned int,		cookie		)
+		    __field(unsigned int,		op		)
 		    __field(pgoff_t,			page		)
-		    __field(struct fscache_operation *,	op		)
 		    __field(enum fscache_page_op_trace,	what		)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->cookie		= cookie;
+		    __entry->cookie		= cookie->debug_id;
 		    __entry->page		= page ? page->index : 0;
-		    __entry->op			= op;
+		    __entry->op			= op->debug_id;
 		    __entry->what		= what;
 			   ),
 
-	    TP_printk("c=%p %s pg=%lx op=%p",
+	    TP_printk("c=%08x %s pg=%lx op=%08x",
 		      __entry->cookie,
 		      __print_symbolic(__entry->what, fscache_page_op_traces),
 		      __entry->page, __entry->op)
@@ -483,20 +483,20 @@ TRACE_EVENT(fscache_wrote_page,
 	    TP_ARGS(cookie, page, op, ret),
 
 	    TP_STRUCT__entry(
-		    __field(struct fscache_cookie *,	cookie		)
+		    __field(unsigned int,		cookie		)
+		    __field(unsigned int,		op		)
 		    __field(pgoff_t,			page		)
-		    __field(struct fscache_operation *,	op		)
 		    __field(int,			ret		)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->cookie		= cookie;
+		    __entry->cookie		= cookie->debug_id;
 		    __entry->page		= page->index;
-		    __entry->op			= op;
+		    __entry->op			= op->debug_id;
 		    __entry->ret		= ret;
 			   ),
 
-	    TP_printk("c=%p pg=%lx op=%p ret=%d",
+	    TP_printk("c=%08x pg=%lx op=%08x ret=%d",
 		      __entry->cookie, __entry->page, __entry->op, __entry->ret)
 	    );
 
@@ -507,22 +507,22 @@ TRACE_EVENT(fscache_gang_lookup,
 	    TP_ARGS(cookie, op, results, n, store_limit),
 
 	    TP_STRUCT__entry(
-		    __field(struct fscache_cookie *,	cookie		)
-		    __field(struct fscache_operation *,	op		)
+		    __field(unsigned int,		cookie		)
+		    __field(unsigned int,		op		)
 		    __field(pgoff_t,			results0	)
 		    __field(int,			n		)
 		    __field(pgoff_t,			store_limit	)
 			     ),
 
 	    TP_fast_assign(
-		    __entry->cookie		= cookie;
-		    __entry->op			= op;
+		    __entry->cookie		= cookie->debug_id;
+		    __entry->op			= op->debug_id;
 		    __entry->results0		= results[0] ? ((struct page *)results[0])->index : (pgoff_t)-1;
 		    __entry->n			= n;
 		    __entry->store_limit	= store_limit;
 			   ),
 
-	    TP_printk("c=%p op=%p r0=%lx n=%d sl=%lx",
+	    TP_printk("c=%08x op=%08x r0=%lx n=%d sl=%lx",
 		      __entry->cookie, __entry->op, __entry->results0, __entry->n,
 		      __entry->store_limit)
 	    );




* [RFC PATCH 04/76] fscache: Procfile to display cookies
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (2 preceding siblings ...)
  2020-11-20 15:03 ` [RFC PATCH 03/76] fscache: Add a cookie debug ID and use that in traces David Howells
@ 2020-11-20 15:03 ` David Howells
  2020-11-20 15:03 ` [RFC PATCH 05/76] fscache: Remove the old I/O API David Howells
                   ` (70 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:03 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Add /proc/fs/fscache/cookies to display active cookies.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/cookie.c     |  103 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/fscache/internal.h   |    1 
 fs/fscache/proc.c       |    7 +++
 include/linux/fscache.h |    1 
 4 files changed, 112 insertions(+)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index f2be98d2c64d..c7047544972b 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -19,6 +19,8 @@ static atomic_t fscache_object_debug_id = ATOMIC_INIT(0);
 
 #define fscache_cookie_hash_shift 15
 static struct hlist_bl_head fscache_cookie_hash[1 << fscache_cookie_hash_shift];
+static LIST_HEAD(fscache_cookies);
+static DEFINE_RWLOCK(fscache_cookies_lock);
 
 static int fscache_acquire_non_index_cookie(struct fscache_cookie *cookie,
 					    loff_t object_size);
@@ -65,6 +67,9 @@ void fscache_free_cookie(struct fscache_cookie *cookie)
 {
 	if (cookie) {
 		BUG_ON(!hlist_empty(&cookie->backing_objects));
+		write_lock(&fscache_cookies_lock);
+		list_del(&cookie->proc_link);
+		write_unlock(&fscache_cookies_lock);
 		if (cookie->aux_len > sizeof(cookie->inline_aux))
 			kfree(cookie->aux);
 		if (cookie->key_len > sizeof(cookie->inline_key))
@@ -192,6 +197,10 @@ struct fscache_cookie *fscache_alloc_cookie(
 	/* radix tree insertion won't use the preallocation pool unless it's
 	 * told it may not wait */
 	INIT_RADIX_TREE(&cookie->stores, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
+
+	write_lock(&fscache_cookies_lock);
+	list_add_tail(&cookie->proc_link, &fscache_cookies);
+	write_unlock(&fscache_cookies_lock);
 	return cookie;
 
 nomem:
@@ -969,3 +978,97 @@ int __fscache_check_consistency(struct fscache_cookie *cookie,
 	return -ESTALE;
 }
 EXPORT_SYMBOL(__fscache_check_consistency);
+
+/*
+ * Generate a list of extant cookies in /proc/fs/fscache/cookies
+ */
+static int fscache_cookies_seq_show(struct seq_file *m, void *v)
+{
+	struct fscache_cookie *cookie;
+	unsigned int keylen = 0, auxlen = 0;
+	char _type[3], *type;
+	u8 *p;
+
+	if (v == &fscache_cookies) {
+		seq_puts(m,
+			 "COOKIE   PARENT   USAGE CHILD ACT TY FL  DEF              NETFS_DATA\n"
+			 "======== ======== ===== ===== === == === ================ ==========\n"
+			 );
+		return 0;
+	}
+
+	cookie = list_entry(v, struct fscache_cookie, proc_link);
+
+	switch (cookie->type) {
+	case 0:
+		type = "IX";
+		break;
+	case 1:
+		type = "DT";
+		break;
+	default:
+		snprintf(_type, sizeof(_type), "%02u",
+			 cookie->type);
+		type = _type;
+		break;
+	}
+
+	seq_printf(m,
+		   "%08x %08x %5u %5u %3u %s %03lx %-16s %px",
+		   cookie->debug_id,
+		   cookie->parent ? cookie->parent->debug_id : 0,
+		   atomic_read(&cookie->usage),
+		   atomic_read(&cookie->n_children),
+		   atomic_read(&cookie->n_active),
+		   type,
+		   cookie->flags,
+		   cookie->def->name,
+		   cookie->netfs_data);
+
+	keylen = cookie->key_len;
+	auxlen = cookie->aux_len;
+
+	if (keylen > 0 || auxlen > 0) {
+		seq_puts(m, " ");
+		p = keylen <= sizeof(cookie->inline_key) ?
+			cookie->inline_key : cookie->key;
+		for (; keylen > 0; keylen--)
+			seq_printf(m, "%02x", *p++);
+		if (auxlen > 0) {
+			seq_puts(m, ", ");
+			p = auxlen <= sizeof(cookie->inline_aux) ?
+				cookie->inline_aux : cookie->aux;
+			for (; auxlen > 0; auxlen--)
+				seq_printf(m, "%02x", *p++);
+		}
+	}
+
+	seq_puts(m, "\n");
+	return 0;
+}
+
+static void *fscache_cookies_seq_start(struct seq_file *m, loff_t *_pos)
+	__acquires(fscache_cookies_lock)
+{
+	read_lock(&fscache_cookies_lock);
+	return seq_list_start_head(&fscache_cookies, *_pos);
+}
+
+static void *fscache_cookies_seq_next(struct seq_file *m, void *v, loff_t *_pos)
+{
+	return seq_list_next(v, &fscache_cookies, _pos);
+}
+
+static void fscache_cookies_seq_stop(struct seq_file *m, void *v)
+	__releases(fscache_cookies_lock)
+{
+	read_unlock(&fscache_cookies_lock);
+}
+
+
+const struct seq_operations fscache_cookies_seq_ops = {
+	.start  = fscache_cookies_seq_start,
+	.next   = fscache_cookies_seq_next,
+	.stop   = fscache_cookies_seq_stop,
+	.show   = fscache_cookies_seq_show,
+};
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 08e91efbce53..4b535c2dae4a 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -45,6 +45,7 @@ extern struct fscache_cache *fscache_select_cache_for_object(
  * cookie.c
  */
 extern struct kmem_cache *fscache_cookie_jar;
+extern const struct seq_operations fscache_cookies_seq_ops;
 
 extern void fscache_free_cookie(struct fscache_cookie *);
 extern struct fscache_cookie *fscache_alloc_cookie(struct fscache_cookie *,
diff --git a/fs/fscache/proc.c b/fs/fscache/proc.c
index 90a7bc22f7e1..da51fdfc8641 100644
--- a/fs/fscache/proc.c
+++ b/fs/fscache/proc.c
@@ -21,6 +21,10 @@ int __init fscache_proc_init(void)
 	if (!proc_mkdir("fs/fscache", NULL))
 		goto error_dir;
 
+	if (!proc_create_seq("fs/fscache/cookies", S_IFREG | 0444, NULL,
+			     &fscache_cookies_seq_ops))
+		goto error_cookies;
+
 #ifdef CONFIG_FSCACHE_STATS
 	if (!proc_create_single("fs/fscache/stats", S_IFREG | 0444, NULL,
 			fscache_stats_show))
@@ -53,6 +57,8 @@ int __init fscache_proc_init(void)
 	remove_proc_entry("fs/fscache/stats", NULL);
 error_stats:
 #endif
+	remove_proc_entry("fs/fscache/cookies", NULL);
+error_cookies:
 	remove_proc_entry("fs/fscache", NULL);
 error_dir:
 	_leave(" = -ENOMEM");
@@ -73,5 +79,6 @@ void fscache_proc_cleanup(void)
 #ifdef CONFIG_FSCACHE_STATS
 	remove_proc_entry("fs/fscache/stats", NULL);
 #endif
+	remove_proc_entry("fs/fscache/cookies", NULL);
 	remove_proc_entry("fs/fscache", NULL);
 }
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index f31df9a075f6..2606fe7edd29 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -141,6 +141,7 @@ struct fscache_cookie {
 	const struct fscache_cookie_def	*def;		/* definition */
 	struct fscache_cookie		*parent;	/* parent of this entry */
 	struct hlist_bl_node		hash_link;	/* Link in hash table */
+	struct list_head		proc_link;	/* Link in proc list */
 	void				*netfs_data;	/* back pointer to netfs */
 	struct radix_tree_root		stores;		/* pages to be stored on this cookie */
 #define FSCACHE_COOKIE_PENDING_TAG	0		/* pages tag: pending write to cache */




* [RFC PATCH 05/76] fscache: Remove the old I/O API
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (3 preceding siblings ...)
  2020-11-20 15:03 ` [RFC PATCH 04/76] fscache: Procfile to display cookies David Howells
@ 2020-11-20 15:03 ` David Howells
  2020-11-20 15:03 ` [RFC PATCH 06/76] fscache: Remove the netfs data from the cookie David Howells
                   ` (69 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:03 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Remove the old fscache I/O API.  There's no point trying to transform it
incrementally, as the new API will bear no resemblance to the old one.
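
To make the removal concrete, here is a minimal, hypothetical sketch (not
taken from any real netfs; all my_netfs_*() names are invented) of the
sort of ->readpage call site that the deleted fscache_read_or_alloc_page()
path supported:

	/* Hypothetical example only: roughly how a netfs ->readpage drove
	 * the old page-based read API that this patch deletes.
	 */
	static void my_netfs_end_io(struct page *page, void *context, int error)
	{
		if (error == 0)
			SetPageUptodate(page);
		unlock_page(page);
	}

	static int my_netfs_readpage(struct file *file, struct page *page)
	{
		struct fscache_cookie *cookie = my_netfs_get_cookie(page->mapping->host);
		int ret;

		ret = fscache_read_or_alloc_page(cookie, page, my_netfs_end_io,
						 NULL, GFP_KERNEL);
		switch (ret) {
		case 0:		/* Read dispatched; my_netfs_end_io() runs on completion */
			return 0;
		case -ENODATA:	/* Cache block allocated, but no data in it yet */
		case -ENOBUFS:	/* No cache available for this page */
		default:	/* Fall back to fetching from the server */
			return my_netfs_read_from_server(page);
		}
	}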

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/Makefile           |    1 
 fs/cachefiles/interface.c        |   15 -
 fs/cachefiles/internal.h         |   38 -
 fs/cachefiles/main.c             |    1 
 fs/cachefiles/rdwr.c             |  975 -----------------------------------
 fs/fscache/cache.c               |    6 
 fs/fscache/cookie.c              |   10 
 fs/fscache/internal.h            |   23 -
 fs/fscache/object.c              |    2 
 fs/fscache/page.c                | 1068 --------------------------------------
 fs/fscache/stats.c               |    6 
 include/linux/fscache-cache.h    |  132 -----
 include/linux/fscache-obsolete.h |   13 
 include/linux/fscache.h          |  349 ------------
 14 files changed, 15 insertions(+), 2624 deletions(-)
 delete mode 100644 fs/cachefiles/rdwr.c
 create mode 100644 include/linux/fscache-obsolete.h

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index 891dedda5905..3455d3646547 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -10,7 +10,6 @@ cachefiles-y := \
 	key.o \
 	main.o \
 	namei.o \
-	rdwr.o \
 	security.o \
 	xattr.o
 
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 4cea5fbf695e..04d92ad402a4 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -540,14 +540,6 @@ static void cachefiles_invalidate_object(struct fscache_operation *op)
 	_leave("");
 }
 
-/*
- * dissociate a cache from all the pages it was backing
- */
-static void cachefiles_dissociate_pages(struct fscache_cache *cache)
-{
-	_enter("");
-}
-
 const struct fscache_cache_ops cachefiles_cache_ops = {
 	.name			= "cachefiles",
 	.alloc_object		= cachefiles_alloc_object,
@@ -560,12 +552,5 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
 	.put_object		= cachefiles_put_object,
 	.sync_cache		= cachefiles_sync_cache,
 	.attr_changed		= cachefiles_attr_changed,
-	.read_or_alloc_page	= cachefiles_read_or_alloc_page,
-	.read_or_alloc_pages	= cachefiles_read_or_alloc_pages,
-	.allocate_page		= cachefiles_allocate_page,
-	.allocate_pages		= cachefiles_allocate_pages,
-	.write_page		= cachefiles_write_page,
-	.uncache_page		= cachefiles_uncache_page,
-	.dissociate_pages	= cachefiles_dissociate_pages,
 	.check_consistency	= cachefiles_check_consistency,
 };
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index cf9bd6401c2d..aca73c8403ab 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -43,7 +43,6 @@ struct cachefiles_object {
 	atomic_t			usage;		/* object usage count */
 	uint8_t				type;		/* object type */
 	uint8_t				new;		/* T if object new */
-	spinlock_t			work_lock;
 	struct rb_node			active_node;	/* link in active tree (dentry is key) */
 };
 
@@ -89,28 +88,6 @@ struct cachefiles_cache {
 	char				*tag;		/* cache binding tag */
 };
 
-/*
- * backing file read tracking
- */
-struct cachefiles_one_read {
-	wait_queue_entry_t			monitor;	/* link into monitored waitqueue */
-	struct page			*back_page;	/* backing file page we're waiting for */
-	struct page			*netfs_page;	/* netfs page we're going to fill */
-	struct fscache_retrieval	*op;		/* retrieval op covering this */
-	struct list_head		op_link;	/* link in op's todo list */
-};
-
-/*
- * backing file write tracking
- */
-struct cachefiles_one_write {
-	struct page			*netfs_page;	/* netfs page to copy */
-	struct cachefiles_object	*object;
-	struct list_head		obj_link;	/* link in object's lists */
-	fscache_rw_complete_t		end_io_func;
-	void				*context;
-};
-
 /*
  * auxiliary data xattr buffer
  */
@@ -202,21 +179,6 @@ void cachefiles_hist(atomic_t histogram[], unsigned long start_jif)
 #define cachefiles_hist(hist, start_jif) do {} while (0)
 #endif
 
-/*
- * rdwr.c
- */
-extern int cachefiles_read_or_alloc_page(struct fscache_retrieval *,
-					 struct page *, gfp_t);
-extern int cachefiles_read_or_alloc_pages(struct fscache_retrieval *,
-					  struct list_head *, unsigned *,
-					  gfp_t);
-extern int cachefiles_allocate_page(struct fscache_retrieval *, struct page *,
-				    gfp_t);
-extern int cachefiles_allocate_pages(struct fscache_retrieval *,
-				     struct list_head *, unsigned *, gfp_t);
-extern int cachefiles_write_page(struct fscache_storage *, struct page *);
-extern void cachefiles_uncache_page(struct fscache_object *, struct page *);
-
 /*
  * security.c
  */
diff --git a/fs/cachefiles/main.c b/fs/cachefiles/main.c
index ddf0cd58d60c..3f0101a74809 100644
--- a/fs/cachefiles/main.c
+++ b/fs/cachefiles/main.c
@@ -42,7 +42,6 @@ static void cachefiles_object_init_once(void *_object)
 	struct cachefiles_object *object = _object;
 
 	memset(object, 0, sizeof(*object));
-	spin_lock_init(&object->work_lock);
 }
 
 /*
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
deleted file mode 100644
index 8bda092e60c5..000000000000
--- a/fs/cachefiles/rdwr.c
+++ /dev/null
@@ -1,975 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/* Storage object read/write
- *
- * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- */
-
-#include <linux/mount.h>
-#include <linux/slab.h>
-#include <linux/file.h>
-#include <linux/swap.h>
-#include "internal.h"
-
-/*
- * detect wake up events generated by the unlocking of pages in which we're
- * interested
- * - we use this to detect read completion of backing pages
- * - the caller holds the waitqueue lock
- */
-static int cachefiles_read_waiter(wait_queue_entry_t *wait, unsigned mode,
-				  int sync, void *_key)
-{
-	struct cachefiles_one_read *monitor =
-		container_of(wait, struct cachefiles_one_read, monitor);
-	struct cachefiles_object *object;
-	struct fscache_retrieval *op = monitor->op;
-	struct wait_bit_key *key = _key;
-	struct page *page = wait->private;
-
-	ASSERT(key);
-
-	_enter("{%lu},%u,%d,{%p,%u}",
-	       monitor->netfs_page->index, mode, sync,
-	       key->flags, key->bit_nr);
-
-	if (key->flags != &page->flags ||
-	    key->bit_nr != PG_locked)
-		return 0;
-
-	_debug("--- monitor %p %lx ---", page, page->flags);
-
-	if (!PageUptodate(page) && !PageError(page)) {
-		/* unlocked, not uptodate and not erronous? */
-		_debug("page probably truncated");
-	}
-
-	/* remove from the waitqueue */
-	list_del(&wait->entry);
-
-	/* move onto the action list and queue for FS-Cache thread pool */
-	ASSERT(op);
-
-	/* We need to temporarily bump the usage count as we don't own a ref
-	 * here otherwise cachefiles_read_copier() may free the op between the
-	 * monitor being enqueued on the op->to_do list and the op getting
-	 * enqueued on the work queue.
-	 */
-	fscache_get_retrieval(op);
-
-	object = container_of(op->op.object, struct cachefiles_object, fscache);
-	spin_lock(&object->work_lock);
-	list_add_tail(&monitor->op_link, &op->to_do);
-	fscache_enqueue_retrieval(op);
-	spin_unlock(&object->work_lock);
-
-	fscache_put_retrieval(op);
-	return 0;
-}
-
-/*
- * handle a probably truncated page
- * - check to see if the page is still relevant and reissue the read if
- *   possible
- * - return -EIO on error, -ENODATA if the page is gone, -EINPROGRESS if we
- *   must wait again and 0 if successful
- */
-static int cachefiles_read_reissue(struct cachefiles_object *object,
-				   struct cachefiles_one_read *monitor)
-{
-	struct address_space *bmapping = d_backing_inode(object->backer)->i_mapping;
-	struct page *backpage = monitor->back_page, *backpage2;
-	int ret;
-
-	_enter("{ino=%lx},{%lx,%lx}",
-	       d_backing_inode(object->backer)->i_ino,
-	       backpage->index, backpage->flags);
-
-	/* skip if the page was truncated away completely */
-	if (backpage->mapping != bmapping) {
-		_leave(" = -ENODATA [mapping]");
-		return -ENODATA;
-	}
-
-	backpage2 = find_get_page(bmapping, backpage->index);
-	if (!backpage2) {
-		_leave(" = -ENODATA [gone]");
-		return -ENODATA;
-	}
-
-	if (backpage != backpage2) {
-		put_page(backpage2);
-		_leave(" = -ENODATA [different]");
-		return -ENODATA;
-	}
-
-	/* the page is still there and we already have a ref on it, so we don't
-	 * need a second */
-	put_page(backpage2);
-
-	INIT_LIST_HEAD(&monitor->op_link);
-	add_page_wait_queue(backpage, &monitor->monitor);
-
-	if (trylock_page(backpage)) {
-		ret = -EIO;
-		if (PageError(backpage))
-			goto unlock_discard;
-		ret = 0;
-		if (PageUptodate(backpage))
-			goto unlock_discard;
-
-		_debug("reissue read");
-		ret = bmapping->a_ops->readpage(NULL, backpage);
-		if (ret < 0)
-			goto discard;
-	}
-
-	/* but the page may have been read before the monitor was installed, so
-	 * the monitor may miss the event - so we have to ensure that we do get
-	 * one in such a case */
-	if (trylock_page(backpage)) {
-		_debug("jumpstart %p {%lx}", backpage, backpage->flags);
-		unlock_page(backpage);
-	}
-
-	/* it'll reappear on the todo list */
-	_leave(" = -EINPROGRESS");
-	return -EINPROGRESS;
-
-unlock_discard:
-	unlock_page(backpage);
-discard:
-	spin_lock_irq(&object->work_lock);
-	list_del(&monitor->op_link);
-	spin_unlock_irq(&object->work_lock);
-	_leave(" = %d", ret);
-	return ret;
-}
-
-/*
- * copy data from backing pages to netfs pages to complete a read operation
- * - driven by FS-Cache's thread pool
- */
-static void cachefiles_read_copier(struct fscache_operation *_op)
-{
-	struct cachefiles_one_read *monitor;
-	struct cachefiles_object *object;
-	struct fscache_retrieval *op;
-	int error, max;
-
-	op = container_of(_op, struct fscache_retrieval, op);
-	object = container_of(op->op.object,
-			      struct cachefiles_object, fscache);
-
-	_enter("{ino=%lu}", d_backing_inode(object->backer)->i_ino);
-
-	max = 8;
-	spin_lock_irq(&object->work_lock);
-
-	while (!list_empty(&op->to_do)) {
-		monitor = list_entry(op->to_do.next,
-				     struct cachefiles_one_read, op_link);
-		list_del(&monitor->op_link);
-
-		spin_unlock_irq(&object->work_lock);
-
-		_debug("- copy {%lu}", monitor->back_page->index);
-
-	recheck:
-		if (test_bit(FSCACHE_COOKIE_INVALIDATING,
-			     &object->fscache.cookie->flags)) {
-			error = -ESTALE;
-		} else if (PageUptodate(monitor->back_page)) {
-			copy_highpage(monitor->netfs_page, monitor->back_page);
-			fscache_mark_page_cached(monitor->op,
-						 monitor->netfs_page);
-			error = 0;
-		} else if (!PageError(monitor->back_page)) {
-			/* the page has probably been truncated */
-			error = cachefiles_read_reissue(object, monitor);
-			if (error == -EINPROGRESS)
-				goto next;
-			goto recheck;
-		} else {
-			cachefiles_io_error_obj(
-				object,
-				"Readpage failed on backing file %lx",
-				(unsigned long) monitor->back_page->flags);
-			error = -EIO;
-		}
-
-		put_page(monitor->back_page);
-
-		fscache_end_io(op, monitor->netfs_page, error);
-		put_page(monitor->netfs_page);
-		fscache_retrieval_complete(op, 1);
-		fscache_put_retrieval(op);
-		kfree(monitor);
-
-	next:
-		/* let the thread pool have some air occasionally */
-		max--;
-		if (max < 0 || need_resched()) {
-			if (!list_empty(&op->to_do))
-				fscache_enqueue_retrieval(op);
-			_leave(" [maxed out]");
-			return;
-		}
-
-		spin_lock_irq(&object->work_lock);
-	}
-
-	spin_unlock_irq(&object->work_lock);
-	_leave("");
-}
-
-/*
- * read the corresponding page to the given set from the backing file
- * - an uncertain page is simply discarded, to be tried again another time
- */
-static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
-					    struct fscache_retrieval *op,
-					    struct page *netpage)
-{
-	struct cachefiles_one_read *monitor;
-	struct address_space *bmapping;
-	struct page *newpage, *backpage;
-	int ret;
-
-	_enter("");
-
-	_debug("read back %p{%lu,%d}",
-	       netpage, netpage->index, page_count(netpage));
-
-	monitor = kzalloc(sizeof(*monitor), cachefiles_gfp);
-	if (!monitor)
-		goto nomem;
-
-	monitor->netfs_page = netpage;
-	monitor->op = fscache_get_retrieval(op);
-
-	init_waitqueue_func_entry(&monitor->monitor, cachefiles_read_waiter);
-
-	/* attempt to get hold of the backing page */
-	bmapping = d_backing_inode(object->backer)->i_mapping;
-	newpage = NULL;
-
-	for (;;) {
-		backpage = find_get_page(bmapping, netpage->index);
-		if (backpage)
-			goto backing_page_already_present;
-
-		if (!newpage) {
-			newpage = __page_cache_alloc(cachefiles_gfp);
-			if (!newpage)
-				goto nomem_monitor;
-		}
-
-		ret = add_to_page_cache_lru(newpage, bmapping,
-					    netpage->index, cachefiles_gfp);
-		if (ret == 0)
-			goto installed_new_backing_page;
-		if (ret != -EEXIST)
-			goto nomem_page;
-	}
-
-	/* we've installed a new backing page, so now we need to start
-	 * it reading */
-installed_new_backing_page:
-	_debug("- new %p", newpage);
-
-	backpage = newpage;
-	newpage = NULL;
-
-read_backing_page:
-	ret = bmapping->a_ops->readpage(NULL, backpage);
-	if (ret < 0)
-		goto read_error;
-
-	/* set the monitor to transfer the data across */
-monitor_backing_page:
-	_debug("- monitor add");
-
-	/* install the monitor */
-	get_page(monitor->netfs_page);
-	get_page(backpage);
-	monitor->back_page = backpage;
-	monitor->monitor.private = backpage;
-	add_page_wait_queue(backpage, &monitor->monitor);
-	monitor = NULL;
-
-	/* but the page may have been read before the monitor was installed, so
-	 * the monitor may miss the event - so we have to ensure that we do get
-	 * one in such a case */
-	if (trylock_page(backpage)) {
-		_debug("jumpstart %p {%lx}", backpage, backpage->flags);
-		unlock_page(backpage);
-	}
-	goto success;
-
-	/* if the backing page is already present, it can be in one of
-	 * three states: read in progress, read failed or read okay */
-backing_page_already_present:
-	_debug("- present");
-
-	if (newpage) {
-		put_page(newpage);
-		newpage = NULL;
-	}
-
-	if (PageError(backpage))
-		goto io_error;
-
-	if (PageUptodate(backpage))
-		goto backing_page_already_uptodate;
-
-	if (!trylock_page(backpage))
-		goto monitor_backing_page;
-	_debug("read %p {%lx}", backpage, backpage->flags);
-	goto read_backing_page;
-
-	/* the backing page is already up to date, attach the netfs
-	 * page to the pagecache and LRU and copy the data across */
-backing_page_already_uptodate:
-	_debug("- uptodate");
-
-	fscache_mark_page_cached(op, netpage);
-
-	copy_highpage(netpage, backpage);
-	fscache_end_io(op, netpage, 0);
-	fscache_retrieval_complete(op, 1);
-
-success:
-	_debug("success");
-	ret = 0;
-
-out:
-	if (backpage)
-		put_page(backpage);
-	if (monitor) {
-		fscache_put_retrieval(monitor->op);
-		kfree(monitor);
-	}
-	_leave(" = %d", ret);
-	return ret;
-
-read_error:
-	_debug("read error %d", ret);
-	if (ret == -ENOMEM) {
-		fscache_retrieval_complete(op, 1);
-		goto out;
-	}
-io_error:
-	cachefiles_io_error_obj(object, "Page read error on backing file");
-	fscache_retrieval_complete(op, 1);
-	ret = -ENOBUFS;
-	goto out;
-
-nomem_page:
-	put_page(newpage);
-nomem_monitor:
-	fscache_put_retrieval(monitor->op);
-	kfree(monitor);
-nomem:
-	fscache_retrieval_complete(op, 1);
-	_leave(" = -ENOMEM");
-	return -ENOMEM;
-}
-
-/*
- * read a page from the cache or allocate a block in which to store it
- * - cache withdrawal is prevented by the caller
- * - returns -EINTR if interrupted
- * - returns -ENOMEM if ran out of memory
- * - returns -ENOBUFS if no buffers can be made available
- * - returns -ENOBUFS if page is beyond EOF
- * - if the page is backed by a block in the cache:
- *   - a read will be started which will call the callback on completion
- *   - 0 will be returned
- * - else if the page is unbacked:
- *   - the metadata will be retained
- *   - -ENODATA will be returned
- */
-int cachefiles_read_or_alloc_page(struct fscache_retrieval *op,
-				  struct page *page,
-				  gfp_t gfp)
-{
-	struct cachefiles_object *object;
-	struct cachefiles_cache *cache;
-	struct inode *inode;
-	sector_t block;
-	unsigned shift;
-	int ret, ret2;
-
-	object = container_of(op->op.object,
-			      struct cachefiles_object, fscache);
-	cache = container_of(object->fscache.cache,
-			     struct cachefiles_cache, cache);
-
-	_enter("{%p},{%lx},,,", object, page->index);
-
-	if (!object->backer)
-		goto enobufs;
-
-	inode = d_backing_inode(object->backer);
-	ASSERT(S_ISREG(inode->i_mode));
-	ASSERT(inode->i_mapping->a_ops->readpages);
-
-	/* calculate the shift required to use bmap */
-	shift = PAGE_SHIFT - inode->i_sb->s_blocksize_bits;
-
-	op->op.flags &= FSCACHE_OP_KEEP_FLAGS;
-	op->op.flags |= FSCACHE_OP_ASYNC;
-	op->op.processor = cachefiles_read_copier;
-
-	/* we assume the absence or presence of the first block is a good
-	 * enough indication for the page as a whole
-	 * - TODO: don't use bmap() for this as it is _not_ actually good
-	 *   enough for this as it doesn't indicate errors, but it's all we've
-	 *   got for the moment
-	 */
-	block = page->index;
-	block <<= shift;
-
-	ret2 = bmap(inode, &block);
-	ASSERT(ret2 == 0);
-
-	_debug("%llx -> %llx",
-	       (unsigned long long) (page->index << shift),
-	       (unsigned long long) block);
-
-	if (block) {
-		/* submit the apparently valid page to the backing fs to be
-		 * read from disk */
-		ret = cachefiles_read_backing_file_one(object, op, page);
-	} else if (cachefiles_has_space(cache, 0, 1) == 0) {
-		/* there's space in the cache we can use */
-		fscache_mark_page_cached(op, page);
-		fscache_retrieval_complete(op, 1);
-		ret = -ENODATA;
-	} else {
-		goto enobufs;
-	}
-
-	_leave(" = %d", ret);
-	return ret;
-
-enobufs:
-	fscache_retrieval_complete(op, 1);
-	_leave(" = -ENOBUFS");
-	return -ENOBUFS;
-}
-
-/*
- * read the corresponding pages to the given set from the backing file
- * - any uncertain pages are simply discarded, to be tried again another time
- */
-static int cachefiles_read_backing_file(struct cachefiles_object *object,
-					struct fscache_retrieval *op,
-					struct list_head *list)
-{
-	struct cachefiles_one_read *monitor = NULL;
-	struct address_space *bmapping = d_backing_inode(object->backer)->i_mapping;
-	struct page *newpage = NULL, *netpage, *_n, *backpage = NULL;
-	int ret = 0;
-
-	_enter("");
-
-	list_for_each_entry_safe(netpage, _n, list, lru) {
-		list_del(&netpage->lru);
-
-		_debug("read back %p{%lu,%d}",
-		       netpage, netpage->index, page_count(netpage));
-
-		if (!monitor) {
-			monitor = kzalloc(sizeof(*monitor), cachefiles_gfp);
-			if (!monitor)
-				goto nomem;
-
-			monitor->op = fscache_get_retrieval(op);
-			init_waitqueue_func_entry(&monitor->monitor,
-						  cachefiles_read_waiter);
-		}
-
-		for (;;) {
-			backpage = find_get_page(bmapping, netpage->index);
-			if (backpage)
-				goto backing_page_already_present;
-
-			if (!newpage) {
-				newpage = __page_cache_alloc(cachefiles_gfp);
-				if (!newpage)
-					goto nomem;
-			}
-
-			ret = add_to_page_cache_lru(newpage, bmapping,
-						    netpage->index,
-						    cachefiles_gfp);
-			if (ret == 0)
-				goto installed_new_backing_page;
-			if (ret != -EEXIST)
-				goto nomem;
-		}
-
-		/* we've installed a new backing page, so now we need
-		 * to start it reading */
-	installed_new_backing_page:
-		_debug("- new %p", newpage);
-
-		backpage = newpage;
-		newpage = NULL;
-
-	reread_backing_page:
-		ret = bmapping->a_ops->readpage(NULL, backpage);
-		if (ret < 0)
-			goto read_error;
-
-		/* add the netfs page to the pagecache and LRU, and set the
-		 * monitor to transfer the data across */
-	monitor_backing_page:
-		_debug("- monitor add");
-
-		ret = add_to_page_cache_lru(netpage, op->mapping,
-					    netpage->index, cachefiles_gfp);
-		if (ret < 0) {
-			if (ret == -EEXIST) {
-				put_page(backpage);
-				backpage = NULL;
-				put_page(netpage);
-				netpage = NULL;
-				fscache_retrieval_complete(op, 1);
-				continue;
-			}
-			goto nomem;
-		}
-
-		/* install a monitor */
-		get_page(netpage);
-		monitor->netfs_page = netpage;
-
-		get_page(backpage);
-		monitor->back_page = backpage;
-		monitor->monitor.private = backpage;
-		add_page_wait_queue(backpage, &monitor->monitor);
-		monitor = NULL;
-
-		/* but the page may have been read before the monitor was
-		 * installed, so the monitor may miss the event - so we have to
-		 * ensure that we do get one in such a case */
-		if (trylock_page(backpage)) {
-			_debug("2unlock %p {%lx}", backpage, backpage->flags);
-			unlock_page(backpage);
-		}
-
-		put_page(backpage);
-		backpage = NULL;
-
-		put_page(netpage);
-		netpage = NULL;
-		continue;
-
-		/* if the backing page is already present, it can be in one of
-		 * three states: read in progress, read failed or read okay */
-	backing_page_already_present:
-		_debug("- present %p", backpage);
-
-		if (PageError(backpage))
-			goto io_error;
-
-		if (PageUptodate(backpage))
-			goto backing_page_already_uptodate;
-
-		_debug("- not ready %p{%lx}", backpage, backpage->flags);
-
-		if (!trylock_page(backpage))
-			goto monitor_backing_page;
-
-		if (PageError(backpage)) {
-			_debug("error %lx", backpage->flags);
-			unlock_page(backpage);
-			goto io_error;
-		}
-
-		if (PageUptodate(backpage))
-			goto backing_page_already_uptodate_unlock;
-
-		/* we've locked a page that's neither up to date nor erroneous,
-		 * so we need to attempt to read it again */
-		goto reread_backing_page;
-
-		/* the backing page is already up to date, attach the netfs
-		 * page to the pagecache and LRU and copy the data across */
-	backing_page_already_uptodate_unlock:
-		_debug("uptodate %lx", backpage->flags);
-		unlock_page(backpage);
-	backing_page_already_uptodate:
-		_debug("- uptodate");
-
-		ret = add_to_page_cache_lru(netpage, op->mapping,
-					    netpage->index, cachefiles_gfp);
-		if (ret < 0) {
-			if (ret == -EEXIST) {
-				put_page(backpage);
-				backpage = NULL;
-				put_page(netpage);
-				netpage = NULL;
-				fscache_retrieval_complete(op, 1);
-				continue;
-			}
-			goto nomem;
-		}
-
-		copy_highpage(netpage, backpage);
-
-		put_page(backpage);
-		backpage = NULL;
-
-		fscache_mark_page_cached(op, netpage);
-
-		/* the netpage is unlocked and marked up to date here */
-		fscache_end_io(op, netpage, 0);
-		put_page(netpage);
-		netpage = NULL;
-		fscache_retrieval_complete(op, 1);
-		continue;
-	}
-
-	netpage = NULL;
-
-	_debug("out");
-
-out:
-	/* tidy up */
-	if (newpage)
-		put_page(newpage);
-	if (netpage)
-		put_page(netpage);
-	if (backpage)
-		put_page(backpage);
-	if (monitor) {
-		fscache_put_retrieval(op);
-		kfree(monitor);
-	}
-
-	list_for_each_entry_safe(netpage, _n, list, lru) {
-		list_del(&netpage->lru);
-		put_page(netpage);
-		fscache_retrieval_complete(op, 1);
-	}
-
-	_leave(" = %d", ret);
-	return ret;
-
-nomem:
-	_debug("nomem");
-	ret = -ENOMEM;
-	goto record_page_complete;
-
-read_error:
-	_debug("read error %d", ret);
-	if (ret == -ENOMEM)
-		goto record_page_complete;
-io_error:
-	cachefiles_io_error_obj(object, "Page read error on backing file");
-	ret = -ENOBUFS;
-record_page_complete:
-	fscache_retrieval_complete(op, 1);
-	goto out;
-}
-
-/*
- * read a list of pages from the cache or allocate blocks in which to store
- * them
- */
-int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op,
-				   struct list_head *pages,
-				   unsigned *nr_pages,
-				   gfp_t gfp)
-{
-	struct cachefiles_object *object;
-	struct cachefiles_cache *cache;
-	struct list_head backpages;
-	struct pagevec pagevec;
-	struct inode *inode;
-	struct page *page, *_n;
-	unsigned shift, nrbackpages;
-	int ret, ret2, space;
-
-	object = container_of(op->op.object,
-			      struct cachefiles_object, fscache);
-	cache = container_of(object->fscache.cache,
-			     struct cachefiles_cache, cache);
-
-	_enter("{OBJ%x,%d},,%d,,",
-	       object->fscache.debug_id, atomic_read(&op->op.usage),
-	       *nr_pages);
-
-	if (!object->backer)
-		goto all_enobufs;
-
-	space = 1;
-	if (cachefiles_has_space(cache, 0, *nr_pages) < 0)
-		space = 0;
-
-	inode = d_backing_inode(object->backer);
-	ASSERT(S_ISREG(inode->i_mode));
-	ASSERT(inode->i_mapping->a_ops->readpages);
-
-	/* calculate the shift required to use bmap */
-	shift = PAGE_SHIFT - inode->i_sb->s_blocksize_bits;
-
-	pagevec_init(&pagevec);
-
-	op->op.flags &= FSCACHE_OP_KEEP_FLAGS;
-	op->op.flags |= FSCACHE_OP_ASYNC;
-	op->op.processor = cachefiles_read_copier;
-
-	INIT_LIST_HEAD(&backpages);
-	nrbackpages = 0;
-
-	ret = space ? -ENODATA : -ENOBUFS;
-	list_for_each_entry_safe(page, _n, pages, lru) {
-		sector_t block;
-
-		/* we assume the absence or presence of the first block is a
-		 * good enough indication for the page as a whole
-		 * - TODO: don't use bmap() for this as it is _not_ actually
-		 *   good enough for this as it doesn't indicate errors, but
-		 *   it's all we've got for the moment
-		 */
-		block = page->index;
-		block <<= shift;
-
-		ret2 = bmap(inode, &block);
-		ASSERT(ret2 == 0);
-
-		_debug("%llx -> %llx",
-		       (unsigned long long) (page->index << shift),
-		       (unsigned long long) block);
-
-		if (block) {
-			/* we have data - add it to the list to give to the
-			 * backing fs */
-			list_move(&page->lru, &backpages);
-			(*nr_pages)--;
-			nrbackpages++;
-		} else if (space && pagevec_add(&pagevec, page) == 0) {
-			fscache_mark_pages_cached(op, &pagevec);
-			fscache_retrieval_complete(op, 1);
-			ret = -ENODATA;
-		} else {
-			fscache_retrieval_complete(op, 1);
-		}
-	}
-
-	if (pagevec_count(&pagevec) > 0)
-		fscache_mark_pages_cached(op, &pagevec);
-
-	if (list_empty(pages))
-		ret = 0;
-
-	/* submit the apparently valid pages to the backing fs to be read from
-	 * disk */
-	if (nrbackpages > 0) {
-		ret2 = cachefiles_read_backing_file(object, op, &backpages);
-		if (ret2 == -ENOMEM || ret2 == -EINTR)
-			ret = ret2;
-	}
-
-	_leave(" = %d [nr=%u%s]",
-	       ret, *nr_pages, list_empty(pages) ? " empty" : "");
-	return ret;
-
-all_enobufs:
-	fscache_retrieval_complete(op, *nr_pages);
-	return -ENOBUFS;
-}
-
-/*
- * allocate a block in the cache in which to store a page
- * - cache withdrawal is prevented by the caller
- * - returns -EINTR if interrupted
- * - returns -ENOMEM if ran out of memory
- * - returns -ENOBUFS if no buffers can be made available
- * - returns -ENOBUFS if page is beyond EOF
- * - otherwise:
- *   - the metadata will be retained
- *   - 0 will be returned
- */
-int cachefiles_allocate_page(struct fscache_retrieval *op,
-			     struct page *page,
-			     gfp_t gfp)
-{
-	struct cachefiles_object *object;
-	struct cachefiles_cache *cache;
-	int ret;
-
-	object = container_of(op->op.object,
-			      struct cachefiles_object, fscache);
-	cache = container_of(object->fscache.cache,
-			     struct cachefiles_cache, cache);
-
-	_enter("%p,{%lx},", object, page->index);
-
-	ret = cachefiles_has_space(cache, 0, 1);
-	if (ret == 0)
-		fscache_mark_page_cached(op, page);
-	else
-		ret = -ENOBUFS;
-
-	fscache_retrieval_complete(op, 1);
-	_leave(" = %d", ret);
-	return ret;
-}
-
-/*
- * allocate blocks in the cache in which to store a set of pages
- * - cache withdrawal is prevented by the caller
- * - returns -EINTR if interrupted
- * - returns -ENOMEM if ran out of memory
- * - returns -ENOBUFS if some buffers couldn't be made available
- * - returns -ENOBUFS if some pages are beyond EOF
- * - otherwise:
- *   - -ENODATA will be returned
- * - metadata will be retained for any page marked
- */
-int cachefiles_allocate_pages(struct fscache_retrieval *op,
-			      struct list_head *pages,
-			      unsigned *nr_pages,
-			      gfp_t gfp)
-{
-	struct cachefiles_object *object;
-	struct cachefiles_cache *cache;
-	struct pagevec pagevec;
-	struct page *page;
-	int ret;
-
-	object = container_of(op->op.object,
-			      struct cachefiles_object, fscache);
-	cache = container_of(object->fscache.cache,
-			     struct cachefiles_cache, cache);
-
-	_enter("%p,,,%d,", object, *nr_pages);
-
-	ret = cachefiles_has_space(cache, 0, *nr_pages);
-	if (ret == 0) {
-		pagevec_init(&pagevec);
-
-		list_for_each_entry(page, pages, lru) {
-			if (pagevec_add(&pagevec, page) == 0)
-				fscache_mark_pages_cached(op, &pagevec);
-		}
-
-		if (pagevec_count(&pagevec) > 0)
-			fscache_mark_pages_cached(op, &pagevec);
-		ret = -ENODATA;
-	} else {
-		ret = -ENOBUFS;
-	}
-
-	fscache_retrieval_complete(op, *nr_pages);
-	_leave(" = %d", ret);
-	return ret;
-}
-
-/*
- * request a page be stored in the cache
- * - cache withdrawal is prevented by the caller
- * - this request may be ignored if there's no cache block available, in which
- *   case -ENOBUFS will be returned
- * - if the op is in progress, 0 will be returned
- */
-int cachefiles_write_page(struct fscache_storage *op, struct page *page)
-{
-	struct cachefiles_object *object;
-	struct cachefiles_cache *cache;
-	struct file *file;
-	struct path path;
-	loff_t pos, eof;
-	size_t len;
-	void *data;
-	int ret = -ENOBUFS;
-
-	ASSERT(op != NULL);
-	ASSERT(page != NULL);
-
-	object = container_of(op->op.object,
-			      struct cachefiles_object, fscache);
-
-	_enter("%p,%p{%lx},,,", object, page, page->index);
-
-	if (!object->backer) {
-		_leave(" = -ENOBUFS");
-		return -ENOBUFS;
-	}
-
-	ASSERT(d_is_reg(object->backer));
-
-	cache = container_of(object->fscache.cache,
-			     struct cachefiles_cache, cache);
-
-	pos = (loff_t)page->index << PAGE_SHIFT;
-
-	/* We mustn't write more data than we have, so we have to beware of a
-	 * partial page at EOF.
-	 */
-	eof = object->fscache.store_limit_l;
-	if (pos >= eof)
-		goto error;
-
-	/* write the page to the backing filesystem and let it store it in its
-	 * own time */
-	path.mnt = cache->mnt;
-	path.dentry = object->backer;
-	file = dentry_open(&path, O_RDWR | O_LARGEFILE, cache->cache_cred);
-	if (IS_ERR(file)) {
-		ret = PTR_ERR(file);
-		goto error_2;
-	}
-
-	len = PAGE_SIZE;
-	if (eof & ~PAGE_MASK) {
-		if (eof - pos < PAGE_SIZE) {
-			_debug("cut short %llx to %llx",
-			       pos, eof);
-			len = eof - pos;
-			ASSERTCMP(pos + len, ==, eof);
-		}
-	}
-
-	data = kmap(page);
-	ret = kernel_write(file, data, len, &pos);
-	kunmap(page);
-	fput(file);
-	if (ret != len)
-		goto error_eio;
-
-	_leave(" = 0");
-	return 0;
-
-error_eio:
-	ret = -EIO;
-error_2:
-	if (ret == -EIO)
-		cachefiles_io_error_obj(object,
-					"Write page to backing file failed");
-error:
-	_leave(" = -ENOBUFS [%d]", ret);
-	return -ENOBUFS;
-}
-
-/*
- * detach a backing block from a page
- * - cache withdrawal is prevented by the caller
- */
-void cachefiles_uncache_page(struct fscache_object *_object, struct page *page)
-	__releases(&object->fscache.cookie->lock)
-{
-	struct cachefiles_object *object;
-
-	object = container_of(_object, struct cachefiles_object, fscache);
-
-	_enter("%p,{%lu}", object, page->index);
-
-	spin_unlock(&object->fscache.cookie->lock);
-}
diff --git a/fs/fscache/cache.c b/fs/fscache/cache.c
index fcc136361415..7de6d4cd29ee 100644
--- a/fs/fscache/cache.c
+++ b/fs/fscache/cache.c
@@ -382,12 +382,6 @@ void fscache_withdraw_cache(struct fscache_cache *cache)
 	cache->ops->sync_cache(cache);
 	fscache_stat_d(&fscache_n_cop_sync_cache);
 
-	/* dissociate all the netfs pages backed by this cache from the block
-	 * mappings in the cache */
-	fscache_stat(&fscache_n_cop_dissociate_pages);
-	cache->ops->dissociate_pages(cache);
-	fscache_stat_d(&fscache_n_cop_dissociate_pages);
-
 	/* we now have to destroy all the active objects pertaining to this
 	 * cache - which we do by passing them off to thread pool to be
 	 * disposed of */
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index c7047544972b..b35f727cc0d4 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -191,13 +191,8 @@ struct fscache_cookie *fscache_alloc_cookie(
 	cookie->flags		= (1 << FSCACHE_COOKIE_NO_DATA_YET);
 	cookie->type		= def->type;
 	spin_lock_init(&cookie->lock);
-	spin_lock_init(&cookie->stores_lock);
 	INIT_HLIST_HEAD(&cookie->backing_objects);
 
-	/* radix tree insertion won't use the preallocation pool unless it's
-	 * told it may not wait */
-	INIT_RADIX_TREE(&cookie->stores, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
-
 	write_lock(&fscache_cookies_lock);
 	list_add_tail(&cookie->proc_link, &fscache_cookies);
 	write_unlock(&fscache_cookies_lock);
@@ -786,10 +781,6 @@ void __fscache_disable_cookie(struct fscache_cookie *cookie,
 			       !atomic_read(&cookie->n_active));
 	}
 
-	/* Make sure any pending writes are cancelled. */
-	if (cookie->type != FSCACHE_COOKIE_TYPE_INDEX)
-		fscache_invalidate_writes(cookie);
-
 	/* Reset the cookie state if it wasn't relinquished */
 	if (!test_bit(FSCACHE_COOKIE_RELINQUISHED, &cookie->flags)) {
 		atomic_inc(&cookie->n_active);
@@ -838,7 +829,6 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie,
 	/* Clear pointers back to the netfs */
 	cookie->netfs_data	= NULL;
 	cookie->def		= NULL;
-	BUG_ON(!radix_tree_empty(&cookie->stores));
 
 	if (cookie->parent) {
 		ASSERTCMP(atomic_read(&cookie->parent->usage), >, 0);
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 4b535c2dae4a..83dfbe0e3964 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -142,7 +142,6 @@ extern int fscache_wait_for_operation_activation(struct fscache_object *,
 						 struct fscache_operation *,
 						 atomic_t *,
 						 atomic_t *);
-extern void fscache_invalidate_writes(struct fscache_cookie *);
 
 /*
  * proc.c
@@ -272,7 +271,6 @@ extern atomic_t fscache_n_cop_allocate_page;
 extern atomic_t fscache_n_cop_allocate_pages;
 extern atomic_t fscache_n_cop_write_page;
 extern atomic_t fscache_n_cop_uncache_page;
-extern atomic_t fscache_n_cop_dissociate_pages;
 
 extern atomic_t fscache_n_cache_no_space_reject;
 extern atomic_t fscache_n_cache_stale_objects;
@@ -325,27 +323,6 @@ static inline void fscache_cookie_get(struct fscache_cookie *cookie,
 	trace_fscache_cookie(cookie, where, usage);
 }
 
-/*
- * get an extra reference to a netfs retrieval context
- */
-static inline
-void *fscache_get_context(struct fscache_cookie *cookie, void *context)
-{
-	if (cookie->def->get_context)
-		cookie->def->get_context(cookie->netfs_data, context);
-	return context;
-}
-
-/*
- * release a reference to a netfs retrieval context
- */
-static inline
-void fscache_put_context(struct fscache_cookie *cookie, void *context)
-{
-	if (cookie->def->put_context)
-		cookie->def->put_context(cookie->netfs_data, context);
-}
-
 /*
  * Update the auxiliary data on a cookie.
  */
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index cb2146e02cd5..3d7f956a01c6 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -973,14 +973,12 @@ static const struct fscache_state *_fscache_invalidate_object(struct fscache_obj
 	 * retire the object instead.
 	 */
 	if (!fscache_use_cookie(object)) {
-		ASSERT(radix_tree_empty(&object->cookie->stores));
 		set_bit(FSCACHE_OBJECT_RETIRED, &object->flags);
 		_leave(" [no cookie]");
 		return transit_to(KILL_OBJECT);
 	}
 
 	/* Reject any new read/write ops and abort any that are pending. */
-	fscache_invalidate_writes(cookie);
 	clear_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags);
 	fscache_cancel_all_ops(object);
 
diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index 26af6fdf1538..1beffb071205 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -13,174 +13,6 @@
 #include <linux/slab.h>
 #include "internal.h"
 
-/*
- * check to see if a page is being written to the cache
- */
-bool __fscache_check_page_write(struct fscache_cookie *cookie, struct page *page)
-{
-	void *val;
-
-	rcu_read_lock();
-	val = radix_tree_lookup(&cookie->stores, page->index);
-	rcu_read_unlock();
-	trace_fscache_check_page(cookie, page, val, 0);
-
-	return val != NULL;
-}
-EXPORT_SYMBOL(__fscache_check_page_write);
-
-/*
- * wait for a page to finish being written to the cache
- */
-void __fscache_wait_on_page_write(struct fscache_cookie *cookie, struct page *page)
-{
-	wait_queue_head_t *wq = bit_waitqueue(&cookie->flags, 0);
-
-	trace_fscache_page(cookie, page, fscache_page_write_wait);
-
-	wait_event(*wq, !__fscache_check_page_write(cookie, page));
-}
-EXPORT_SYMBOL(__fscache_wait_on_page_write);
-
-/*
- * wait for a page to finish being written to the cache. Put a timeout here
- * since we might be called recursively via parent fs.
- */
-static
-bool release_page_wait_timeout(struct fscache_cookie *cookie, struct page *page)
-{
-	wait_queue_head_t *wq = bit_waitqueue(&cookie->flags, 0);
-
-	return wait_event_timeout(*wq, !__fscache_check_page_write(cookie, page),
-				  HZ);
-}
-
-/*
- * decide whether a page can be released, possibly by cancelling a store to it
- * - we're allowed to sleep if __GFP_DIRECT_RECLAIM is flagged
- */
-bool __fscache_maybe_release_page(struct fscache_cookie *cookie,
-				  struct page *page,
-				  gfp_t gfp)
-{
-	struct page *xpage;
-	void *val;
-
-	_enter("%p,%p,%x", cookie, page, gfp);
-
-	trace_fscache_page(cookie, page, fscache_page_maybe_release);
-
-try_again:
-	rcu_read_lock();
-	val = radix_tree_lookup(&cookie->stores, page->index);
-	if (!val) {
-		rcu_read_unlock();
-		fscache_stat(&fscache_n_store_vmscan_not_storing);
-		__fscache_uncache_page(cookie, page);
-		return true;
-	}
-
-	/* see if the page is actually undergoing storage - if so we can't get
-	 * rid of it till the cache has finished with it */
-	if (radix_tree_tag_get(&cookie->stores, page->index,
-			       FSCACHE_COOKIE_STORING_TAG)) {
-		rcu_read_unlock();
-		goto page_busy;
-	}
-
-	/* the page is pending storage, so we attempt to cancel the store and
-	 * discard the store request so that the page can be reclaimed */
-	spin_lock(&cookie->stores_lock);
-	rcu_read_unlock();
-
-	if (radix_tree_tag_get(&cookie->stores, page->index,
-			       FSCACHE_COOKIE_STORING_TAG)) {
-		/* the page started to undergo storage whilst we were looking,
-		 * so now we can only wait or return */
-		spin_unlock(&cookie->stores_lock);
-		goto page_busy;
-	}
-
-	xpage = radix_tree_delete(&cookie->stores, page->index);
-	trace_fscache_page(cookie, page, fscache_page_radix_delete);
-	spin_unlock(&cookie->stores_lock);
-
-	if (xpage) {
-		fscache_stat(&fscache_n_store_vmscan_cancelled);
-		fscache_stat(&fscache_n_store_radix_deletes);
-		ASSERTCMP(xpage, ==, page);
-	} else {
-		fscache_stat(&fscache_n_store_vmscan_gone);
-	}
-
-	wake_up_bit(&cookie->flags, 0);
-	trace_fscache_wake_cookie(cookie);
-	if (xpage)
-		put_page(xpage);
-	__fscache_uncache_page(cookie, page);
-	return true;
-
-page_busy:
-	/* We will wait here if we're allowed to, but that could deadlock the
-	 * allocator as the work threads writing to the cache may all end up
-	 * sleeping on memory allocation, so we may need to impose a timeout
-	 * too. */
-	if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS)) {
-		fscache_stat(&fscache_n_store_vmscan_busy);
-		return false;
-	}
-
-	fscache_stat(&fscache_n_store_vmscan_wait);
-	if (!release_page_wait_timeout(cookie, page))
-		_debug("fscache writeout timeout page: %p{%lx}",
-			page, page->index);
-
-	gfp &= ~__GFP_DIRECT_RECLAIM;
-	goto try_again;
-}
-EXPORT_SYMBOL(__fscache_maybe_release_page);
-
-/*
- * note that a page has finished being written to the cache
- */
-static void fscache_end_page_write(struct fscache_object *object,
-				   struct page *page)
-{
-	struct fscache_cookie *cookie;
-	struct page *xpage = NULL, *val;
-
-	spin_lock(&object->lock);
-	cookie = object->cookie;
-	if (cookie) {
-		/* delete the page from the tree if it is now no longer
-		 * pending */
-		spin_lock(&cookie->stores_lock);
-		radix_tree_tag_clear(&cookie->stores, page->index,
-				     FSCACHE_COOKIE_STORING_TAG);
-		trace_fscache_page(cookie, page, fscache_page_radix_clear_store);
-		if (!radix_tree_tag_get(&cookie->stores, page->index,
-					FSCACHE_COOKIE_PENDING_TAG)) {
-			fscache_stat(&fscache_n_store_radix_deletes);
-			xpage = radix_tree_delete(&cookie->stores, page->index);
-			trace_fscache_page(cookie, page, fscache_page_radix_delete);
-			trace_fscache_page(cookie, page, fscache_page_write_end);
-
-			val = radix_tree_lookup(&cookie->stores, page->index);
-			trace_fscache_check_page(cookie, page, val, 1);
-		} else {
-			trace_fscache_page(cookie, page, fscache_page_write_end_pend);
-		}
-		spin_unlock(&cookie->stores_lock);
-		wake_up_bit(&cookie->flags, 0);
-		trace_fscache_wake_cookie(cookie);
-	} else {
-		trace_fscache_page(cookie, page, fscache_page_write_end_noc);
-	}
-	spin_unlock(&object->lock);
-	if (xpage)
-		put_page(xpage);
-}
-
 /*
  * actually apply the changed attributes to a cache object
  */
@@ -265,76 +97,6 @@ int __fscache_attr_changed(struct fscache_cookie *cookie)
 }
 EXPORT_SYMBOL(__fscache_attr_changed);
 
-/*
- * Handle cancellation of a pending retrieval op
- */
-static void fscache_do_cancel_retrieval(struct fscache_operation *_op)
-{
-	struct fscache_retrieval *op =
-		container_of(_op, struct fscache_retrieval, op);
-
-	atomic_set(&op->n_pages, 0);
-}
-
-/*
- * release a retrieval op reference
- */
-static void fscache_release_retrieval_op(struct fscache_operation *_op)
-{
-	struct fscache_retrieval *op =
-		container_of(_op, struct fscache_retrieval, op);
-
-	_enter("{OP%x}", op->op.debug_id);
-
-	ASSERTIFCMP(op->op.state != FSCACHE_OP_ST_INITIALISED,
-		    atomic_read(&op->n_pages), ==, 0);
-
-	fscache_hist(fscache_retrieval_histogram, op->start_time);
-	if (op->context)
-		fscache_put_context(op->cookie, op->context);
-
-	_leave("");
-}
-
-/*
- * allocate a retrieval op
- */
-static struct fscache_retrieval *fscache_alloc_retrieval(
-	struct fscache_cookie *cookie,
-	struct address_space *mapping,
-	fscache_rw_complete_t end_io_func,
-	void *context)
-{
-	struct fscache_retrieval *op;
-
-	/* allocate a retrieval operation and attempt to submit it */
-	op = kzalloc(sizeof(*op), GFP_NOIO);
-	if (!op) {
-		fscache_stat(&fscache_n_retrievals_nomem);
-		return NULL;
-	}
-
-	fscache_operation_init(cookie, &op->op, NULL,
-			       fscache_do_cancel_retrieval,
-			       fscache_release_retrieval_op);
-	op->op.flags	= FSCACHE_OP_MYTHREAD |
-		(1UL << FSCACHE_OP_WAITING) |
-		(1UL << FSCACHE_OP_UNUSE_COOKIE);
-	op->cookie	= cookie;
-	op->mapping	= mapping;
-	op->end_io_func	= end_io_func;
-	op->context	= context;
-	op->start_time	= jiffies;
-	INIT_LIST_HEAD(&op->to_do);
-
-	/* Pin the netfs read context in case we need to do the actual netfs
-	 * read because we've encountered a cache read failure.
-	 */
-	if (context)
-		fscache_get_context(op->cookie, context);
-	return op;
-}
-
 /*
  * wait for a deferred lookup to complete
  */
@@ -416,833 +178,3 @@ int fscache_wait_for_operation_activation(struct fscache_object *object,
 	}
 	return 0;
 }
-
-/*
- * read a page from the cache or allocate a block in which to store it
- * - we return:
- *   -ENOMEM	- out of memory, nothing done
- *   -ERESTARTSYS - interrupted
- *   -ENOBUFS	- no backing object available in which to cache the block
- *   -ENODATA	- no data available in the backing object for this block
- *   0		- dispatched a read - it'll call end_io_func() when finished
- */
-int __fscache_read_or_alloc_page(struct fscache_cookie *cookie,
-				 struct page *page,
-				 fscache_rw_complete_t end_io_func,
-				 void *context,
-				 gfp_t gfp)
-{
-	struct fscache_retrieval *op;
-	struct fscache_object *object;
-	bool wake_cookie = false;
-	int ret;
-
-	_enter("%p,%p,,,", cookie, page);
-
-	fscache_stat(&fscache_n_retrievals);
-
-	if (hlist_empty(&cookie->backing_objects))
-		goto nobufs;
-
-	if (test_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags)) {
-		_leave(" = -ENOBUFS [invalidating]");
-		return -ENOBUFS;
-	}
-
-	ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
-	ASSERTCMP(page, !=, NULL);
-
-	if (fscache_wait_for_deferred_lookup(cookie) < 0)
-		return -ERESTARTSYS;
-
-	op = fscache_alloc_retrieval(cookie, page->mapping,
-				     end_io_func, context);
-	if (!op) {
-		_leave(" = -ENOMEM");
-		return -ENOMEM;
-	}
-	atomic_set(&op->n_pages, 1);
-	trace_fscache_page_op(cookie, page, &op->op, fscache_page_op_retr_one);
-
-	spin_lock(&cookie->lock);
-
-	if (!fscache_cookie_enabled(cookie) ||
-	    hlist_empty(&cookie->backing_objects))
-		goto nobufs_unlock;
-	object = hlist_entry(cookie->backing_objects.first,
-			     struct fscache_object, cookie_link);
-
-	ASSERT(test_bit(FSCACHE_OBJECT_IS_LOOKED_UP, &object->flags));
-
-	__fscache_use_cookie(cookie);
-	atomic_inc(&object->n_reads);
-	__set_bit(FSCACHE_OP_DEC_READ_CNT, &op->op.flags);
-
-	if (fscache_submit_op(object, &op->op) < 0)
-		goto nobufs_unlock_dec;
-	spin_unlock(&cookie->lock);
-
-	fscache_stat(&fscache_n_retrieval_ops);
-
-	/* we wait for the operation to become active, and then process it
-	 * *here*, in this thread, and not in the thread pool */
-	ret = fscache_wait_for_operation_activation(
-		object, &op->op,
-		__fscache_stat(&fscache_n_retrieval_op_waits),
-		__fscache_stat(&fscache_n_retrievals_object_dead));
-	if (ret < 0)
-		goto error;
-
-	/* ask the cache to honour the operation */
-	if (test_bit(FSCACHE_COOKIE_NO_DATA_YET, &object->cookie->flags)) {
-		fscache_stat(&fscache_n_cop_allocate_page);
-		ret = object->cache->ops->allocate_page(op, page, gfp);
-		fscache_stat_d(&fscache_n_cop_allocate_page);
-		if (ret == 0)
-			ret = -ENODATA;
-	} else {
-		fscache_stat(&fscache_n_cop_read_or_alloc_page);
-		ret = object->cache->ops->read_or_alloc_page(op, page, gfp);
-		fscache_stat_d(&fscache_n_cop_read_or_alloc_page);
-	}
-
-error:
-	if (ret == -ENOMEM)
-		fscache_stat(&fscache_n_retrievals_nomem);
-	else if (ret == -ERESTARTSYS)
-		fscache_stat(&fscache_n_retrievals_intr);
-	else if (ret == -ENODATA)
-		fscache_stat(&fscache_n_retrievals_nodata);
-	else if (ret < 0)
-		fscache_stat(&fscache_n_retrievals_nobufs);
-	else
-		fscache_stat(&fscache_n_retrievals_ok);
-
-	fscache_put_retrieval(op);
-	_leave(" = %d", ret);
-	return ret;
-
-nobufs_unlock_dec:
-	atomic_dec(&object->n_reads);
-	wake_cookie = __fscache_unuse_cookie(cookie);
-nobufs_unlock:
-	spin_unlock(&cookie->lock);
-	if (wake_cookie)
-		__fscache_wake_unused_cookie(cookie);
-	fscache_put_retrieval(op);
-nobufs:
-	fscache_stat(&fscache_n_retrievals_nobufs);
-	_leave(" = -ENOBUFS");
-	return -ENOBUFS;
-}
-EXPORT_SYMBOL(__fscache_read_or_alloc_page);
-
-/*
- * read a list of page from the cache or allocate a block in which to store
- * them
- * - we return:
- *   -ENOMEM	- out of memory, some pages may be being read
- *   -ERESTARTSYS - interrupted, some pages may be being read
- *   -ENOBUFS	- no backing object or space available in which to cache any
- *                pages not being read
- *   -ENODATA	- no data available in the backing object for some or all of
- *                the pages
- *   0		- dispatched a read on all pages
- *
- * end_io_func() will be called for each page read from the cache as it is
- * finishes being read
- *
- * any pages for which a read is dispatched will be removed from pages and
- * nr_pages
- */
-int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
-				  struct address_space *mapping,
-				  struct list_head *pages,
-				  unsigned *nr_pages,
-				  fscache_rw_complete_t end_io_func,
-				  void *context,
-				  gfp_t gfp)
-{
-	struct fscache_retrieval *op;
-	struct fscache_object *object;
-	bool wake_cookie = false;
-	int ret;
-
-	_enter("%p,,%d,,,", cookie, *nr_pages);
-
-	fscache_stat(&fscache_n_retrievals);
-
-	if (hlist_empty(&cookie->backing_objects))
-		goto nobufs;
-
-	if (test_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags)) {
-		_leave(" = -ENOBUFS [invalidating]");
-		return -ENOBUFS;
-	}
-
-	ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
-	ASSERTCMP(*nr_pages, >, 0);
-	ASSERT(!list_empty(pages));
-
-	if (fscache_wait_for_deferred_lookup(cookie) < 0)
-		return -ERESTARTSYS;
-
-	op = fscache_alloc_retrieval(cookie, mapping, end_io_func, context);
-	if (!op)
-		return -ENOMEM;
-	atomic_set(&op->n_pages, *nr_pages);
-	trace_fscache_page_op(cookie, NULL, &op->op, fscache_page_op_retr_multi);
-
-	spin_lock(&cookie->lock);
-
-	if (!fscache_cookie_enabled(cookie) ||
-	    hlist_empty(&cookie->backing_objects))
-		goto nobufs_unlock;
-	object = hlist_entry(cookie->backing_objects.first,
-			     struct fscache_object, cookie_link);
-
-	__fscache_use_cookie(cookie);
-	atomic_inc(&object->n_reads);
-	__set_bit(FSCACHE_OP_DEC_READ_CNT, &op->op.flags);
-
-	if (fscache_submit_op(object, &op->op) < 0)
-		goto nobufs_unlock_dec;
-	spin_unlock(&cookie->lock);
-
-	fscache_stat(&fscache_n_retrieval_ops);
-
-	/* we wait for the operation to become active, and then process it
-	 * *here*, in this thread, and not in the thread pool */
-	ret = fscache_wait_for_operation_activation(
-		object, &op->op,
-		__fscache_stat(&fscache_n_retrieval_op_waits),
-		__fscache_stat(&fscache_n_retrievals_object_dead));
-	if (ret < 0)
-		goto error;
-
-	/* ask the cache to honour the operation */
-	if (test_bit(FSCACHE_COOKIE_NO_DATA_YET, &object->cookie->flags)) {
-		fscache_stat(&fscache_n_cop_allocate_pages);
-		ret = object->cache->ops->allocate_pages(
-			op, pages, nr_pages, gfp);
-		fscache_stat_d(&fscache_n_cop_allocate_pages);
-	} else {
-		fscache_stat(&fscache_n_cop_read_or_alloc_pages);
-		ret = object->cache->ops->read_or_alloc_pages(
-			op, pages, nr_pages, gfp);
-		fscache_stat_d(&fscache_n_cop_read_or_alloc_pages);
-	}
-
-error:
-	if (ret == -ENOMEM)
-		fscache_stat(&fscache_n_retrievals_nomem);
-	else if (ret == -ERESTARTSYS)
-		fscache_stat(&fscache_n_retrievals_intr);
-	else if (ret == -ENODATA)
-		fscache_stat(&fscache_n_retrievals_nodata);
-	else if (ret < 0)
-		fscache_stat(&fscache_n_retrievals_nobufs);
-	else
-		fscache_stat(&fscache_n_retrievals_ok);
-
-	fscache_put_retrieval(op);
-	_leave(" = %d", ret);
-	return ret;
-
-nobufs_unlock_dec:
-	atomic_dec(&object->n_reads);
-	wake_cookie = __fscache_unuse_cookie(cookie);
-nobufs_unlock:
-	spin_unlock(&cookie->lock);
-	fscache_put_retrieval(op);
-	if (wake_cookie)
-		__fscache_wake_unused_cookie(cookie);
-nobufs:
-	fscache_stat(&fscache_n_retrievals_nobufs);
-	_leave(" = -ENOBUFS");
-	return -ENOBUFS;
-}
-EXPORT_SYMBOL(__fscache_read_or_alloc_pages);
-
-/*
- * allocate a block in the cache on which to store a page
- * - we return:
- *   -ENOMEM	- out of memory, nothing done
- *   -ERESTARTSYS - interrupted
- *   -ENOBUFS	- no backing object available in which to cache the block
- *   0		- block allocated
- */
-int __fscache_alloc_page(struct fscache_cookie *cookie,
-			 struct page *page,
-			 gfp_t gfp)
-{
-	struct fscache_retrieval *op;
-	struct fscache_object *object;
-	bool wake_cookie = false;
-	int ret;
-
-	_enter("%p,%p,,,", cookie, page);
-
-	fscache_stat(&fscache_n_allocs);
-
-	if (hlist_empty(&cookie->backing_objects))
-		goto nobufs;
-
-	ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
-	ASSERTCMP(page, !=, NULL);
-
-	if (test_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags)) {
-		_leave(" = -ENOBUFS [invalidating]");
-		return -ENOBUFS;
-	}
-
-	if (fscache_wait_for_deferred_lookup(cookie) < 0)
-		return -ERESTARTSYS;
-
-	op = fscache_alloc_retrieval(cookie, page->mapping, NULL, NULL);
-	if (!op)
-		return -ENOMEM;
-	atomic_set(&op->n_pages, 1);
-	trace_fscache_page_op(cookie, page, &op->op, fscache_page_op_alloc_one);
-
-	spin_lock(&cookie->lock);
-
-	if (!fscache_cookie_enabled(cookie) ||
-	    hlist_empty(&cookie->backing_objects))
-		goto nobufs_unlock;
-	object = hlist_entry(cookie->backing_objects.first,
-			     struct fscache_object, cookie_link);
-
-	__fscache_use_cookie(cookie);
-	if (fscache_submit_op(object, &op->op) < 0)
-		goto nobufs_unlock_dec;
-	spin_unlock(&cookie->lock);
-
-	fscache_stat(&fscache_n_alloc_ops);
-
-	ret = fscache_wait_for_operation_activation(
-		object, &op->op,
-		__fscache_stat(&fscache_n_alloc_op_waits),
-		__fscache_stat(&fscache_n_allocs_object_dead));
-	if (ret < 0)
-		goto error;
-
-	/* ask the cache to honour the operation */
-	fscache_stat(&fscache_n_cop_allocate_page);
-	ret = object->cache->ops->allocate_page(op, page, gfp);
-	fscache_stat_d(&fscache_n_cop_allocate_page);
-
-error:
-	if (ret == -ERESTARTSYS)
-		fscache_stat(&fscache_n_allocs_intr);
-	else if (ret < 0)
-		fscache_stat(&fscache_n_allocs_nobufs);
-	else
-		fscache_stat(&fscache_n_allocs_ok);
-
-	fscache_put_retrieval(op);
-	_leave(" = %d", ret);
-	return ret;
-
-nobufs_unlock_dec:
-	wake_cookie = __fscache_unuse_cookie(cookie);
-nobufs_unlock:
-	spin_unlock(&cookie->lock);
-	fscache_put_retrieval(op);
-	if (wake_cookie)
-		__fscache_wake_unused_cookie(cookie);
-nobufs:
-	fscache_stat(&fscache_n_allocs_nobufs);
-	_leave(" = -ENOBUFS");
-	return -ENOBUFS;
-}
-EXPORT_SYMBOL(__fscache_alloc_page);
-
-/*
- * Unmark pages allocated in the readahead code path (via
- * fscache_read_or_alloc_pages()) after delegating to the base filesystem
- */
-void __fscache_readpages_cancel(struct fscache_cookie *cookie,
-				struct list_head *pages)
-{
-	struct page *page;
-
-	list_for_each_entry(page, pages, lru) {
-		if (PageFsCache(page))
-			__fscache_uncache_page(cookie, page);
-	}
-}
-EXPORT_SYMBOL(__fscache_readpages_cancel);
-
-/*
- * release a write op reference
- */
-static void fscache_release_write_op(struct fscache_operation *_op)
-{
-	_enter("{OP%x}", _op->debug_id);
-}
-
-/*
- * perform the background storage of a page into the cache
- */
-static void fscache_write_op(struct fscache_operation *_op)
-{
-	struct fscache_storage *op =
-		container_of(_op, struct fscache_storage, op);
-	struct fscache_object *object = op->op.object;
-	struct fscache_cookie *cookie;
-	struct page *page;
-	unsigned n;
-	void *results[1];
-	int ret;
-
-	_enter("{OP%x,%d}", op->op.debug_id, atomic_read(&op->op.usage));
-
-again:
-	spin_lock(&object->lock);
-	cookie = object->cookie;
-
-	if (!fscache_object_is_active(object)) {
-		/* If we get here, then the on-disk cache object likely no
-		 * longer exists, so we should just cancel this write
-		 * operation.
-		 */
-		spin_unlock(&object->lock);
-		fscache_op_complete(&op->op, true);
-		_leave(" [inactive]");
-		return;
-	}
-
-	if (!cookie) {
-		/* If we get here, then the cookie belonging to the object was
-		 * detached, probably by the cookie being withdrawn due to
-		 * memory pressure, which means that the netfs pages we might
-		 * have written to the cache no longer exist - therefore, we
-		 * can just cancel this write operation.
-		 */
-		spin_unlock(&object->lock);
-		fscache_op_complete(&op->op, true);
-		_leave(" [cancel] op{f=%lx s=%u} obj{s=%s f=%lx}",
-		       _op->flags, _op->state, object->state->short_name,
-		       object->flags);
-		return;
-	}
-
-	spin_lock(&cookie->stores_lock);
-
-	fscache_stat(&fscache_n_store_calls);
-
-	/* find a page to store */
-	results[0] = NULL;
-	page = NULL;
-	n = radix_tree_gang_lookup_tag(&cookie->stores, results, 0, 1,
-				       FSCACHE_COOKIE_PENDING_TAG);
-	trace_fscache_gang_lookup(cookie, &op->op, results, n, op->store_limit);
-	if (n != 1)
-		goto superseded;
-	page = results[0];
-	_debug("gang %d [%lx]", n, page->index);
-
-	radix_tree_tag_set(&cookie->stores, page->index,
-			   FSCACHE_COOKIE_STORING_TAG);
-	radix_tree_tag_clear(&cookie->stores, page->index,
-			     FSCACHE_COOKIE_PENDING_TAG);
-	trace_fscache_page(cookie, page, fscache_page_radix_pend2store);
-
-	spin_unlock(&cookie->stores_lock);
-	spin_unlock(&object->lock);
-
-	if (page->index >= op->store_limit)
-		goto discard_page;
-
-	fscache_stat(&fscache_n_store_pages);
-	fscache_stat(&fscache_n_cop_write_page);
-	ret = object->cache->ops->write_page(op, page);
-	fscache_stat_d(&fscache_n_cop_write_page);
-	trace_fscache_wrote_page(cookie, page, &op->op, ret);
-	fscache_end_page_write(object, page);
-	if (ret < 0) {
-		fscache_abort_object(object);
-		fscache_op_complete(&op->op, true);
-	} else {
-		fscache_enqueue_operation(&op->op);
-	}
-
-	_leave("");
-	return;
-
-discard_page:
-	fscache_stat(&fscache_n_store_pages_over_limit);
-	trace_fscache_wrote_page(cookie, page, &op->op, -ENOBUFS);
-	fscache_end_page_write(object, page);
-	goto again;
-
-superseded:
-	/* this writer is going away and there aren't any more things to
-	 * write */
-	_debug("cease");
-	spin_unlock(&cookie->stores_lock);
-	clear_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags);
-	spin_unlock(&object->lock);
-	fscache_op_complete(&op->op, false);
-	_leave("");
-}
-
-/*
- * Clear the pages pending writing for invalidation
- */
-void fscache_invalidate_writes(struct fscache_cookie *cookie)
-{
-	struct page *page;
-	void *results[16];
-	int n, i;
-
-	_enter("");
-
-	for (;;) {
-		spin_lock(&cookie->stores_lock);
-		n = radix_tree_gang_lookup_tag(&cookie->stores, results, 0,
-					       ARRAY_SIZE(results),
-					       FSCACHE_COOKIE_PENDING_TAG);
-		if (n == 0) {
-			spin_unlock(&cookie->stores_lock);
-			break;
-		}
-
-		for (i = n - 1; i >= 0; i--) {
-			page = results[i];
-			radix_tree_delete(&cookie->stores, page->index);
-			trace_fscache_page(cookie, page, fscache_page_radix_delete);
-			trace_fscache_page(cookie, page, fscache_page_inval);
-		}
-
-		spin_unlock(&cookie->stores_lock);
-
-		for (i = n - 1; i >= 0; i--)
-			put_page(results[i]);
-	}
-
-	wake_up_bit(&cookie->flags, 0);
-	trace_fscache_wake_cookie(cookie);
-
-	_leave("");
-}
-
-/*
- * request a page be stored in the cache
- * - returns:
- *   -ENOMEM	- out of memory, nothing done
- *   -ENOBUFS	- no backing object available in which to cache the page
- *   0		- dispatched a write - it'll call end_io_func() when finished
- *
- * if the cookie still has a backing object at this point, that object can be
- * in one of a few states with respect to storage processing:
- *
- *  (1) negative lookup, object not yet created (FSCACHE_COOKIE_CREATING is
- *      set)
- *
- *	(a) no writes yet
- *
- *	(b) writes deferred till post-creation (mark page for writing and
- *	    return immediately)
- *
- *  (2) negative lookup, object created, initial fill being made from netfs
- *
- *	(a) fill point not yet reached this page (mark page for writing and
- *          return)
- *
- *	(b) fill point passed this page (queue op to store this page)
- *
- *  (3) object extant (queue op to store this page)
- *
- * any other state is invalid
- */
-int __fscache_write_page(struct fscache_cookie *cookie,
-			 struct page *page,
-			 loff_t object_size,
-			 gfp_t gfp)
-{
-	struct fscache_storage *op;
-	struct fscache_object *object;
-	bool wake_cookie = false;
-	int ret;
-
-	_enter("%p,%x,", cookie, (u32) page->flags);
-
-	ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
-	ASSERT(PageFsCache(page));
-
-	fscache_stat(&fscache_n_stores);
-
-	if (test_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags)) {
-		_leave(" = -ENOBUFS [invalidating]");
-		return -ENOBUFS;
-	}
-
-	op = kzalloc(sizeof(*op), GFP_NOIO | __GFP_NOMEMALLOC | __GFP_NORETRY);
-	if (!op)
-		goto nomem;
-
-	fscache_operation_init(cookie, &op->op, fscache_write_op, NULL,
-			       fscache_release_write_op);
-	op->op.flags = FSCACHE_OP_ASYNC |
-		(1 << FSCACHE_OP_WAITING) |
-		(1 << FSCACHE_OP_UNUSE_COOKIE);
-
-	ret = radix_tree_maybe_preload(gfp & ~__GFP_HIGHMEM);
-	if (ret < 0)
-		goto nomem_free;
-
-	trace_fscache_page_op(cookie, page, &op->op, fscache_page_op_write_one);
-
-	ret = -ENOBUFS;
-	spin_lock(&cookie->lock);
-
-	if (!fscache_cookie_enabled(cookie) ||
-	    hlist_empty(&cookie->backing_objects))
-		goto nobufs;
-	object = hlist_entry(cookie->backing_objects.first,
-			     struct fscache_object, cookie_link);
-	if (test_bit(FSCACHE_IOERROR, &object->cache->flags))
-		goto nobufs;
-
-	trace_fscache_page(cookie, page, fscache_page_write);
-
-	/* add the page to the pending-storage radix tree on the backing
-	 * object */
-	spin_lock(&object->lock);
-
-	if (object->store_limit_l != object_size)
-		fscache_set_store_limit(object, object_size);
-
-	spin_lock(&cookie->stores_lock);
-
-	_debug("store limit %llx", (unsigned long long) object->store_limit);
-
-	ret = radix_tree_insert(&cookie->stores, page->index, page);
-	if (ret < 0) {
-		if (ret == -EEXIST)
-			goto already_queued;
-		_debug("insert failed %d", ret);
-		goto nobufs_unlock_obj;
-	}
-
-	trace_fscache_page(cookie, page, fscache_page_radix_insert);
-	radix_tree_tag_set(&cookie->stores, page->index,
-			   FSCACHE_COOKIE_PENDING_TAG);
-	trace_fscache_page(cookie, page, fscache_page_radix_set_pend);
-	get_page(page);
-
-	/* we only want one writer at a time, but we do need to queue new
-	 * writers after exclusive ops */
-	if (test_and_set_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags))
-		goto already_pending;
-
-	spin_unlock(&cookie->stores_lock);
-	spin_unlock(&object->lock);
-
-	op->op.debug_id	= atomic_inc_return(&fscache_op_debug_id);
-	op->store_limit = object->store_limit;
-
-	__fscache_use_cookie(cookie);
-	if (fscache_submit_op(object, &op->op) < 0)
-		goto submit_failed;
-
-	spin_unlock(&cookie->lock);
-	radix_tree_preload_end();
-	fscache_stat(&fscache_n_store_ops);
-	fscache_stat(&fscache_n_stores_ok);
-
-	/* the work queue now carries its own ref on the object */
-	fscache_put_operation(&op->op);
-	_leave(" = 0");
-	return 0;
-
-already_queued:
-	fscache_stat(&fscache_n_stores_again);
-already_pending:
-	spin_unlock(&cookie->stores_lock);
-	spin_unlock(&object->lock);
-	spin_unlock(&cookie->lock);
-	radix_tree_preload_end();
-	fscache_put_operation(&op->op);
-	fscache_stat(&fscache_n_stores_ok);
-	_leave(" = 0");
-	return 0;
-
-submit_failed:
-	spin_lock(&cookie->stores_lock);
-	radix_tree_delete(&cookie->stores, page->index);
-	trace_fscache_page(cookie, page, fscache_page_radix_delete);
-	spin_unlock(&cookie->stores_lock);
-	wake_cookie = __fscache_unuse_cookie(cookie);
-	put_page(page);
-	ret = -ENOBUFS;
-	goto nobufs;
-
-nobufs_unlock_obj:
-	spin_unlock(&cookie->stores_lock);
-	spin_unlock(&object->lock);
-nobufs:
-	spin_unlock(&cookie->lock);
-	radix_tree_preload_end();
-	fscache_put_operation(&op->op);
-	if (wake_cookie)
-		__fscache_wake_unused_cookie(cookie);
-	fscache_stat(&fscache_n_stores_nobufs);
-	_leave(" = -ENOBUFS");
-	return -ENOBUFS;
-
-nomem_free:
-	fscache_put_operation(&op->op);
-nomem:
-	fscache_stat(&fscache_n_stores_oom);
-	_leave(" = -ENOMEM");
-	return -ENOMEM;
-}
-EXPORT_SYMBOL(__fscache_write_page);
-
-/*
- * remove a page from the cache
- */
-void __fscache_uncache_page(struct fscache_cookie *cookie, struct page *page)
-{
-	struct fscache_object *object;
-
-	_enter(",%p", page);
-
-	ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
-	ASSERTCMP(page, !=, NULL);
-
-	fscache_stat(&fscache_n_uncaches);
-
-	/* cache withdrawal may beat us to it */
-	if (!PageFsCache(page))
-		goto done;
-
-	trace_fscache_page(cookie, page, fscache_page_uncache);
-
-	/* get the object */
-	spin_lock(&cookie->lock);
-
-	if (hlist_empty(&cookie->backing_objects)) {
-		ClearPageFsCache(page);
-		goto done_unlock;
-	}
-
-	object = hlist_entry(cookie->backing_objects.first,
-			     struct fscache_object, cookie_link);
-
-	/* there might now be stuff on disk we could read */
-	clear_bit(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags);
-
-	/* only invoke the cache backend if we managed to mark the page
-	 * uncached here; this deals with synchronisation vs withdrawal */
-	if (TestClearPageFsCache(page) &&
-	    object->cache->ops->uncache_page) {
-		/* the cache backend releases the cookie lock */
-		fscache_stat(&fscache_n_cop_uncache_page);
-		object->cache->ops->uncache_page(object, page);
-		fscache_stat_d(&fscache_n_cop_uncache_page);
-		goto done;
-	}
-
-done_unlock:
-	spin_unlock(&cookie->lock);
-done:
-	_leave("");
-}
-EXPORT_SYMBOL(__fscache_uncache_page);
-
-/**
- * fscache_mark_page_cached - Mark a page as being cached
- * @op: The retrieval op the page is being marked for
- * @page: The page to be marked
- *
- * Mark a netfs page as being cached.  After this is called, the netfs
- * must call fscache_uncache_page() to remove the mark.
- */
-void fscache_mark_page_cached(struct fscache_retrieval *op, struct page *page)
-{
-	struct fscache_cookie *cookie = op->op.object->cookie;
-
-#ifdef CONFIG_FSCACHE_STATS
-	atomic_inc(&fscache_n_marks);
-#endif
-
-	trace_fscache_page(cookie, page, fscache_page_cached);
-
-	_debug("- mark %p{%lx}", page, page->index);
-	if (TestSetPageFsCache(page)) {
-		static bool once_only;
-		if (!once_only) {
-			once_only = true;
-			pr_warn("Cookie type %s marked page %lx multiple times\n",
-				cookie->def->name, page->index);
-		}
-	}
-
-	if (cookie->def->mark_page_cached)
-		cookie->def->mark_page_cached(cookie->netfs_data,
-					      op->mapping, page);
-}
-EXPORT_SYMBOL(fscache_mark_page_cached);
-
-/**
- * fscache_mark_pages_cached - Mark pages as being cached
- * @op: The retrieval op pages are being marked for
- * @pagevec: The pages to be marked
- *
- * Mark a bunch of netfs pages as being cached.  After this is called,
- * the netfs must call fscache_uncache_page() to remove the mark.
- */
-void fscache_mark_pages_cached(struct fscache_retrieval *op,
-			       struct pagevec *pagevec)
-{
-	unsigned long loop;
-
-	for (loop = 0; loop < pagevec->nr; loop++)
-		fscache_mark_page_cached(op, pagevec->pages[loop]);
-
-	pagevec_reinit(pagevec);
-}
-EXPORT_SYMBOL(fscache_mark_pages_cached);
-
-/*
- * Uncache all the pages in an inode that are marked PG_fscache, assuming them
- * to be associated with the given cookie.
- */
-void __fscache_uncache_all_inode_pages(struct fscache_cookie *cookie,
-				       struct inode *inode)
-{
-	struct address_space *mapping = inode->i_mapping;
-	struct pagevec pvec;
-	pgoff_t next;
-	int i;
-
-	_enter("%p,%p", cookie, inode);
-
-	if (!mapping || mapping->nrpages == 0) {
-		_leave(" [no pages]");
-		return;
-	}
-
-	pagevec_init(&pvec);
-	next = 0;
-	do {
-		if (!pagevec_lookup(&pvec, mapping, &next))
-			break;
-		for (i = 0; i < pagevec_count(&pvec); i++) {
-			struct page *page = pvec.pages[i];
-			if (PageFsCache(page)) {
-				__fscache_wait_on_page_write(cookie, page);
-				__fscache_uncache_page(cookie, page);
-			}
-		}
-		pagevec_release(&pvec);
-		cond_resched();
-	} while (next);
-
-	_leave("");
-}
-EXPORT_SYMBOL(__fscache_uncache_all_inode_pages);
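
For reference, the netfs-facing store pattern that the code above implemented
was roughly the following.  This is a minimal illustrative sketch only;
my_netfs_write_to_cache() is a hypothetical helper, not code from any
filesystem:

	static void my_netfs_write_to_cache(struct fscache_cookie *cookie,
					    struct page *page, loff_t i_size)
	{
		int ret;

		/* Queue a background store; PG_fscache_write is set on
		 * dispatch and cleared on completion or failure.
		 */
		ret = fscache_write_page(cookie, page, i_size, GFP_KERNEL);
		if (ret != 0) {
			/* No cache block (-ENOBUFS) or no memory: stop
			 * tracking this page in the cache.
			 */
			fscache_uncache_page(cookie, page);
		}
	}
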
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index a5aa93ece8c5..281022871e70 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -124,7 +124,6 @@ atomic_t fscache_n_cop_allocate_page;
 atomic_t fscache_n_cop_allocate_pages;
 atomic_t fscache_n_cop_write_page;
 atomic_t fscache_n_cop_uncache_page;
-atomic_t fscache_n_cop_dissociate_pages;
 
 atomic_t fscache_n_cache_no_space_reject;
 atomic_t fscache_n_cache_stale_objects;
@@ -265,14 +264,13 @@ int fscache_stats_show(struct seq_file *m, void *v)
 		   atomic_read(&fscache_n_cop_put_object),
 		   atomic_read(&fscache_n_cop_attr_changed),
 		   atomic_read(&fscache_n_cop_sync_cache));
-	seq_printf(m, "CacheOp: rap=%d ras=%d alp=%d als=%d wrp=%d ucp=%d dsp=%d\n",
+	seq_printf(m, "CacheOp: rap=%d ras=%d alp=%d als=%d wrp=%d ucp=%d\n",
 		   atomic_read(&fscache_n_cop_read_or_alloc_page),
 		   atomic_read(&fscache_n_cop_read_or_alloc_pages),
 		   atomic_read(&fscache_n_cop_allocate_page),
 		   atomic_read(&fscache_n_cop_allocate_pages),
 		   atomic_read(&fscache_n_cop_write_page),
-		   atomic_read(&fscache_n_cop_uncache_page),
-		   atomic_read(&fscache_n_cop_dissociate_pages));
+		   atomic_read(&fscache_n_cop_uncache_page));
 	seq_printf(m, "CacheEv: nsp=%d stl=%d rtr=%d cul=%d\n",
 		   atomic_read(&fscache_n_cache_no_space_reject),
 		   atomic_read(&fscache_n_cache_stale_objects),
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 3f0b19dcfae7..f01fe979b323 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -137,88 +137,6 @@ extern void fscache_operation_init(struct fscache_cookie *,
 				   fscache_operation_cancel_t,
 				   fscache_operation_release_t);
 
-/*
- * data read operation
- */
-struct fscache_retrieval {
-	struct fscache_operation op;
-	struct fscache_cookie	*cookie;	/* The netfs cookie */
-	struct address_space	*mapping;	/* netfs pages */
-	fscache_rw_complete_t	end_io_func;	/* function to call on I/O completion */
-	void			*context;	/* netfs read context (pinned) */
-	struct list_head	to_do;		/* list of things to be done by the backend */
-	unsigned long		start_time;	/* time at which retrieval started */
-	atomic_t		n_pages;	/* number of pages to be retrieved */
-};
-
-typedef int (*fscache_page_retrieval_func_t)(struct fscache_retrieval *op,
-					     struct page *page,
-					     gfp_t gfp);
-
-typedef int (*fscache_pages_retrieval_func_t)(struct fscache_retrieval *op,
-					      struct list_head *pages,
-					      unsigned *nr_pages,
-					      gfp_t gfp);
-
-/**
- * fscache_get_retrieval - Get an extra reference on a retrieval operation
- * @op: The retrieval operation to get a reference on
- *
- * Get an extra reference on a retrieval operation.
- */
-static inline
-struct fscache_retrieval *fscache_get_retrieval(struct fscache_retrieval *op)
-{
-	atomic_inc(&op->op.usage);
-	return op;
-}
-
-/**
- * fscache_enqueue_retrieval - Enqueue a retrieval operation for processing
- * @op: The retrieval operation affected
- *
- * Enqueue a retrieval operation for processing by the FS-Cache thread pool.
- */
-static inline void fscache_enqueue_retrieval(struct fscache_retrieval *op)
-{
-	fscache_enqueue_operation(&op->op);
-}
-
-/**
- * fscache_retrieval_complete - Record (partial) completion of a retrieval
- * @op: The retrieval operation affected
- * @n_pages: The number of pages to account for
- */
-static inline void fscache_retrieval_complete(struct fscache_retrieval *op,
-					      int n_pages)
-{
-	if (atomic_sub_return_relaxed(n_pages, &op->n_pages) <= 0)
-		fscache_op_complete(&op->op, false);
-}
-
-/**
- * fscache_put_retrieval - Drop a reference to a retrieval operation
- * @op: The retrieval operation affected
- *
- * Drop a reference to a retrieval operation.
- */
-static inline void fscache_put_retrieval(struct fscache_retrieval *op)
-{
-	fscache_put_operation(&op->op);
-}
-
-/*
- * cached page storage work item
- * - used to do three things:
- *   - batch writes to the cache
- *   - do cache writes asynchronously
- *   - defer writes until cache object lookup completion
- */
-struct fscache_storage {
-	struct fscache_operation op;
-	pgoff_t			store_limit;	/* don't write more than this */
-};
-
 /*
  * cache operations
  */
@@ -275,35 +193,6 @@ struct fscache_cache_ops {
 
 	/* reserve space for an object's data and associated metadata */
 	int (*reserve_space)(struct fscache_object *object, loff_t i_size);
-
-	/* request a backing block for a page be read or allocated in the
-	 * cache */
-	fscache_page_retrieval_func_t read_or_alloc_page;
-
-	/* request backing blocks for a list of pages be read or allocated in
-	 * the cache */
-	fscache_pages_retrieval_func_t read_or_alloc_pages;
-
-	/* request a backing block for a page be allocated in the cache so that
-	 * it can be written directly */
-	fscache_page_retrieval_func_t allocate_page;
-
-	/* request backing blocks for pages be allocated in the cache so that
-	 * they can be written directly */
-	fscache_pages_retrieval_func_t allocate_pages;
-
-	/* write a page to its backing block in the cache */
-	int (*write_page)(struct fscache_storage *op, struct page *page);
-
-	/* detach backing block from a page (optional)
-	 * - must release the cookie lock before returning
-	 * - may sleep
-	 */
-	void (*uncache_page)(struct fscache_object *object,
-			     struct page *page);
-
-	/* dissociate a cache from all the pages it was backing */
-	void (*dissociate_pages)(struct fscache_cache *cache);
 };
 
 extern struct fscache_cookie fscache_fsdef_index;
@@ -466,21 +355,6 @@ void fscache_set_store_limit(struct fscache_object *object, loff_t i_size)
 		object->store_limit++;
 }
 
-/**
- * fscache_end_io - End a retrieval operation on a page
- * @op: The FS-Cache operation covering the retrieval
- * @page: The page that was to be fetched
- * @error: The error code (0 if successful)
- *
- * Note the end of an operation to retrieve a page, as covered by a particular
- * operation record.
- */
-static inline void fscache_end_io(struct fscache_retrieval *op,
-				  struct page *page, int error)
-{
-	op->end_io_func(page, op->context, error);
-}
-
 static inline void __fscache_use_cookie(struct fscache_cookie *cookie)
 {
 	atomic_inc(&cookie->n_active);
@@ -538,12 +412,6 @@ extern void fscache_withdraw_cache(struct fscache_cache *cache);
 
 extern void fscache_io_error(struct fscache_cache *cache);
 
-extern void fscache_mark_page_cached(struct fscache_retrieval *op,
-				     struct page *page);
-
-extern void fscache_mark_pages_cached(struct fscache_retrieval *op,
-				      struct pagevec *pagevec);
-
 extern bool fscache_object_sleep_till_congested(signed long *timeoutp);
 
 extern enum fscache_checkaux fscache_check_aux(struct fscache_object *object,
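
Note that the retrieval helpers deleted here were the cache backend's
completion interface for reads it had dispatched.  As a purely illustrative
sketch (my_backend_read_done() is a hypothetical completion routine, not
taken from cachefiles):

	static void my_backend_read_done(struct fscache_retrieval *op,
					 struct page *page, int error)
	{
		/* Hand the page back via the netfs's end_io_func */
		fscache_end_io(op, page, error);
		/* Account for one fewer page outstanding on this retrieval */
		fscache_retrieval_complete(op, 1);
		/* Drop the reference taken for this read */
		fscache_put_retrieval(op);
	}
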
diff --git a/include/linux/fscache-obsolete.h b/include/linux/fscache-obsolete.h
new file mode 100644
index 000000000000..8d6d3a3b0d0a
--- /dev/null
+++ b/include/linux/fscache-obsolete.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Obsolete fscache bits
+ *
+ * Copyright (C) 2004-2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#ifndef _LINUX_FSCACHE_OBSOLETE_H
+#define _LINUX_FSCACHE_OBSOLETE_H
+
+
+
+#endif /* _LINUX_FSCACHE_OBSOLETE_H */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 2606fe7edd29..143d48281117 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -47,10 +47,6 @@ struct fscache_cache_tag;
 struct fscache_cookie;
 struct fscache_netfs;
 
-typedef void (*fscache_rw_complete_t)(struct page *page,
-				      void *context,
-				      int error);
-
 /* result of index entry consultation */
 enum fscache_checkaux {
 	FSCACHE_CHECKAUX_OKAY,		/* entry okay as is */
@@ -89,27 +85,6 @@ struct fscache_cookie_def {
 					   const void *data,
 					   uint16_t datalen,
 					   loff_t object_size);
-
-	/* get an extra reference on a read context
-	 * - this function can be absent if the completion function doesn't
-	 *   require a context
-	 */
-	void (*get_context)(void *cookie_netfs_data, void *context);
-
-	/* release an extra reference on a read context
-	 * - this function can be absent if the completion function doesn't
-	 *   require a context
-	 */
-	void (*put_context)(void *cookie_netfs_data, void *context);
-
-	/* indicate page that now have cache metadata retained
-	 * - this function should mark the specified page as now being cached
-	 * - the page will have been marked with PG_fscache before this is
-	 *   called, so this is optional
-	 */
-	void (*mark_page_cached)(void *cookie_netfs_data,
-				 struct address_space *mapping,
-				 struct page *page);
 };
 
 /*
@@ -136,16 +111,12 @@ struct fscache_cookie {
 	atomic_t			n_active;	/* number of active users of netfs ptrs */
 	unsigned int			debug_id;
 	spinlock_t			lock;
-	spinlock_t			stores_lock;	/* lock on page store tree */
 	struct hlist_head		backing_objects; /* object(s) backing this file/index */
 	const struct fscache_cookie_def	*def;		/* definition */
 	struct fscache_cookie		*parent;	/* parent of this entry */
 	struct hlist_bl_node		hash_link;	/* Link in hash table */
 	struct list_head		proc_link;	/* Link in proc list */
 	void				*netfs_data;	/* back pointer to netfs */
-	struct radix_tree_root		stores;		/* pages to be stored on this cookie */
-#define FSCACHE_COOKIE_PENDING_TAG	0		/* pages tag: pending write to cache */
-#define FSCACHE_COOKIE_STORING_TAG	1		/* pages tag: writing to cache */
 
 	unsigned long			flags;
 #define FSCACHE_COOKIE_LOOKING_UP	0	/* T if non-index cookie being looked up still */
@@ -202,29 +173,6 @@ extern void __fscache_update_cookie(struct fscache_cookie *, const void *);
 extern int __fscache_attr_changed(struct fscache_cookie *);
 extern void __fscache_invalidate(struct fscache_cookie *);
 extern void __fscache_wait_on_invalidate(struct fscache_cookie *);
-extern int __fscache_read_or_alloc_page(struct fscache_cookie *,
-					struct page *,
-					fscache_rw_complete_t,
-					void *,
-					gfp_t);
-extern int __fscache_read_or_alloc_pages(struct fscache_cookie *,
-					 struct address_space *,
-					 struct list_head *,
-					 unsigned *,
-					 fscache_rw_complete_t,
-					 void *,
-					 gfp_t);
-extern int __fscache_alloc_page(struct fscache_cookie *, struct page *, gfp_t);
-extern int __fscache_write_page(struct fscache_cookie *, struct page *, loff_t, gfp_t);
-extern void __fscache_uncache_page(struct fscache_cookie *, struct page *);
-extern bool __fscache_check_page_write(struct fscache_cookie *, struct page *);
-extern void __fscache_wait_on_page_write(struct fscache_cookie *, struct page *);
-extern bool __fscache_maybe_release_page(struct fscache_cookie *, struct page *,
-					 gfp_t);
-extern void __fscache_uncache_all_inode_pages(struct fscache_cookie *,
-					      struct inode *);
-extern void __fscache_readpages_cancel(struct fscache_cookie *cookie,
-				       struct list_head *pages);
 extern void __fscache_disable_cookie(struct fscache_cookie *, const void *, bool);
 extern void __fscache_enable_cookie(struct fscache_cookie *, const void *, loff_t,
 				    bool (*)(void *), void *);
@@ -491,303 +439,6 @@ void fscache_wait_on_invalidate(struct fscache_cookie *cookie)
 		__fscache_wait_on_invalidate(cookie);
 }
 
-/**
- * fscache_reserve_space - Reserve data space for a cached object
- * @cookie: The cookie representing the cache object
- * @size: The amount of space to be reserved
- *
- * Reserve an amount of space in the cache for the cache object attached to a
- * cookie so that a write to that object within the space can always be
- * honoured.
- *
- * See Documentation/filesystems/caching/netfs-api.rst for a complete
- * description.
- */
-static inline
-int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size)
-{
-	return -ENOBUFS;
-}
-
-/**
- * fscache_read_or_alloc_page - Read a page from the cache or allocate a block
- * in which to store it
- * @cookie: The cookie representing the cache object
- * @page: The netfs page to fill if possible
- * @end_io_func: The callback to invoke when and if the page is filled
- * @context: An arbitrary piece of data to pass on to end_io_func()
- * @gfp: The conditions under which memory allocation should be made
- *
- * Read a page from the cache, or if that's not possible make a potential
- * one-block reservation in the cache into which the page may be stored once
- * fetched from the server.
- *
- * If the page is not backed by the cache object, or if there's some reason
- * it can't be, -ENOBUFS will be returned and nothing more will be done for
- * that page.
- *
- * Else, if that page is backed by the cache, a read will be initiated directly
- * to the netfs's page and 0 will be returned by this function.  The
- * end_io_func() callback will be invoked when the operation terminates on a
- * completion or failure.  Note that the callback may be invoked before the
- * return.
- *
- * Else, if the page is unbacked, -ENODATA is returned and a block may have
- * been allocated in the cache.
- *
- * See Documentation/filesystems/caching/netfs-api.rst for a complete
- * description.
- */
-static inline
-int fscache_read_or_alloc_page(struct fscache_cookie *cookie,
-			       struct page *page,
-			       fscache_rw_complete_t end_io_func,
-			       void *context,
-			       gfp_t gfp)
-{
-	if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie))
-		return __fscache_read_or_alloc_page(cookie, page, end_io_func,
-						    context, gfp);
-	else
-		return -ENOBUFS;
-}
-
-/**
- * fscache_read_or_alloc_pages - Read pages from the cache and/or allocate
- * blocks in which to store them
- * @cookie: The cookie representing the cache object
- * @mapping: The netfs inode mapping to which the pages will be attached
- * @pages: A list of potential netfs pages to be filled
- * @nr_pages: Number of pages to be read and/or allocated
- * @end_io_func: The callback to invoke when and if each page is filled
- * @context: An arbitrary piece of data to pass on to end_io_func()
- * @gfp: The conditions under which memory allocation should be made
- *
- * Read a set of pages from the cache, or if that's not possible, attempt to
- * make a potential one-block reservation for each page in the cache into which
- * that page may be stored once fetched from the server.
- *
- * If some pages are not backed by the cache object, or if there's some
- * reason they can't be, -ENOBUFS will be returned and nothing more will be
- * done for those pages.
- *
- * Else, if some of the pages are backed by the cache, a read will be initiated
- * directly to the netfs's pages and 0 will be returned by this function.  The
- * end_io_func() callback will be invoked when the operation terminates on a
- * completion or failure.  Note that the callback may be invoked before the
- * return.
- *
- * Else, if a page is unbacked, -ENODATA is returned and a block may have
- * been allocated in the cache.
- *
- * Because the function may want to return all of -ENOBUFS, -ENODATA and 0 in
- * regard to different pages, the return values are prioritised in that order.
- * Any pages submitted for reading are removed from the pages list.
- *
- * See Documentation/filesystems/caching/netfs-api.rst for a complete
- * description.
- */
-static inline
-int fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
-				struct address_space *mapping,
-				struct list_head *pages,
-				unsigned *nr_pages,
-				fscache_rw_complete_t end_io_func,
-				void *context,
-				gfp_t gfp)
-{
-	if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie))
-		return __fscache_read_or_alloc_pages(cookie, mapping, pages,
-						     nr_pages, end_io_func,
-						     context, gfp);
-	else
-		return -ENOBUFS;
-}
-
-/**
- * fscache_alloc_page - Allocate a block in which to store a page
- * @cookie: The cookie representing the cache object
- * @page: The netfs page to allocate a page for
- * @gfp: The conditions under which memory allocation should be made
- *
- * Request allocation of a block in the cache in which to store a netfs page
- * without retrieving any contents from the cache.
- *
- * If the page is not backed by a file then -ENOBUFS will be returned and
- * nothing more will be done, and no reservation will be made.
- *
- * Else, a block will be allocated if one wasn't already, and 0 will be
- * returned.
- *
- * See Documentation/filesystems/caching/netfs-api.rst for a complete
- * description.
- */
-static inline
-int fscache_alloc_page(struct fscache_cookie *cookie,
-		       struct page *page,
-		       gfp_t gfp)
-{
-	if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie))
-		return __fscache_alloc_page(cookie, page, gfp);
-	else
-		return -ENOBUFS;
-}
-
-/**
- * fscache_readpages_cancel - Cancel read/alloc on pages
- * @cookie: The cookie representing the inode's cache object.
- * @pages: The netfs pages on which the read/alloc was cancelled in readpages()
- *
- * Uncache/unreserve the pages reserved earlier in readpages() via
- * fscache_read_or_alloc_pages() and similar.  When readpages() succeeds,
- * this usually does nothing; when the underlying netfs's readahead fails,
- * we need to clean up the page list (unmark and uncache).
- *
- * This function may sleep as it may have to clean up disk state.
- */
-static inline
-void fscache_readpages_cancel(struct fscache_cookie *cookie,
-			      struct list_head *pages)
-{
-	if (fscache_cookie_valid(cookie))
-		__fscache_readpages_cancel(cookie, pages);
-}
-
-/**
- * fscache_write_page - Request storage of a page in the cache
- * @cookie: The cookie representing the cache object
- * @page: The netfs page to store
- * @object_size: Updated size of object
- * @gfp: The conditions under which memory allocation should be made
- *
- * Request the contents of the netfs page be written into the cache.  This
- * request may be ignored if no cache block is currently allocated, in which
- * case it will return -ENOBUFS.
- *
- * If a cache block was already allocated, a write will be initiated and 0 will
- * be returned.  The PG_fscache_write page bit is set immediately and will then
- * be cleared at the completion of the write to indicate the success or failure
- * of the operation.  Note that the completion may happen before the return.
- *
- * See Documentation/filesystems/caching/netfs-api.rst for a complete
- * description.
- */
-static inline
-int fscache_write_page(struct fscache_cookie *cookie,
-		       struct page *page,
-		       loff_t object_size,
-		       gfp_t gfp)
-{
-	if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie))
-		return __fscache_write_page(cookie, page, object_size, gfp);
-	else
-		return -ENOBUFS;
-}
-
-/**
- * fscache_uncache_page - Indicate that caching is no longer required on a page
- * @cookie: The cookie representing the cache object
- * @page: The netfs page that was being cached.
- *
- * Tell the cache that we no longer want a page to be cached and that it should
- * remove any knowledge of the netfs page it may have.
- *
- * Note that this cannot cancel any outstanding I/O operations between this
- * page and the cache.
- *
- * See Documentation/filesystems/caching/netfs-api.rst for a complete
- * description.
- */
-static inline
-void fscache_uncache_page(struct fscache_cookie *cookie,
-			  struct page *page)
-{
-	if (fscache_cookie_valid(cookie))
-		__fscache_uncache_page(cookie, page);
-}
-
-/**
- * fscache_check_page_write - Ask if a page is being written to the cache
- * @cookie: The cookie representing the cache object
- * @page: The netfs page that is being cached.
- *
- * Ask the cache if a page is being written to the cache.
- *
- * See Documentation/filesystems/caching/netfs-api.rst for a complete
- * description.
- */
-static inline
-bool fscache_check_page_write(struct fscache_cookie *cookie,
-			      struct page *page)
-{
-	if (fscache_cookie_valid(cookie))
-		return __fscache_check_page_write(cookie, page);
-	return false;
-}
-
-/**
- * fscache_wait_on_page_write - Wait for a page to complete writing to the cache
- * @cookie: The cookie representing the cache object
- * @page: The netfs page that is being cached.
- *
- * Ask the cache to wake us up when a page is no longer being written to the
- * cache.
- *
- * See Documentation/filesystems/caching/netfs-api.rst for a complete
- * description.
- */
-static inline
-void fscache_wait_on_page_write(struct fscache_cookie *cookie,
-				struct page *page)
-{
-	if (fscache_cookie_valid(cookie))
-		__fscache_wait_on_page_write(cookie, page);
-}
-
-/**
- * fscache_maybe_release_page - Consider releasing a page, cancelling a store
- * @cookie: The cookie representing the cache object
- * @page: The netfs page that is being cached.
- * @gfp: The gfp flags passed to releasepage()
- *
- * Consider releasing a page for the vmscan algorithm, on behalf of the netfs's
- * releasepage() call.  A storage request on the page may be cancelled if it is
- * not currently being processed.
- *
- * The function returns true if the page no longer has a storage request on it,
- * and false if a storage request is left in place.  If true is returned, the
- * page will have been passed to fscache_uncache_page().  If false is returned
- * the page cannot be freed yet.
- */
-static inline
-bool fscache_maybe_release_page(struct fscache_cookie *cookie,
-				struct page *page,
-				gfp_t gfp)
-{
-	if (fscache_cookie_valid(cookie) && PageFsCache(page))
-		return __fscache_maybe_release_page(cookie, page, gfp);
-	return true;
-}
-
-/**
- * fscache_uncache_all_inode_pages - Uncache all an inode's pages
- * @cookie: The cookie representing the inode's cache object.
- * @inode: The inode to uncache pages from.
- *
- * Uncache all the pages in an inode that are marked PG_fscache, assuming them
- * to be associated with the given cookie.
- *
- * This function may sleep.  It will wait for pages that are being written out
- * and will wait whilst the PG_fscache mark is removed by the cache.
- */
-static inline
-void fscache_uncache_all_inode_pages(struct fscache_cookie *cookie,
-				     struct inode *inode)
-{
-	if (fscache_cookie_valid(cookie))
-		__fscache_uncache_all_inode_pages(cookie, inode);
-}
-
 /**
  * fscache_disable_cookie - Disable a cookie
  * @cookie: The cookie representing the cache object
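
To put the deletions above together: the read-side pattern these wrappers
supported looked roughly like the sketch below.  This is illustrative only;
my_read_done() and my_readpage_from_cache() are hypothetical names, not
taken from any netfs:

	static void my_read_done(struct page *page, void *context, int error)
	{
		if (!error)
			SetPageUptodate(page);
		unlock_page(page);
	}

	static int my_readpage_from_cache(struct fscache_cookie *cookie,
					  struct page *page)
	{
		int ret;

		ret = fscache_read_or_alloc_page(cookie, page, my_read_done,
						 NULL, GFP_KERNEL);
		switch (ret) {
		case 0:		/* read dispatched; my_read_done() is
				 * called on completion */
			return 0;
		case -ENODATA:	/* block allocated, no data: fetch from
				 * the server */
		case -ENOBUFS:	/* not cached and can't be: fetch from
				 * the server */
		default:
			return ret;
		}
	}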




* [RFC PATCH 06/76] fscache: Remove the netfs data from the cookie
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (4 preceding siblings ...)
  2020-11-20 15:03 ` [RFC PATCH 05/76] fscache: Remove the old I/O API David Howells
@ 2020-11-20 15:03 ` David Howells
  2020-11-20 15:04 ` [RFC PATCH 07/76] fscache: Remove struct fscache_cookie_def David Howells
                   ` (68 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:03 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Remove the netfs data pointer from the cookie so that we don't refer back
to the netfs state and don't need accessors to manage this.  Keep the
information we do need (of which there's not actually a lot) in the cookie
itself, which we can keep hold of even if the netfs state goes away.

Signed-off-by: David Howells <dhowells@redhat.com>
---
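The visible change for netfs callers is the loss of the netfs_data argument
(with a NULL placeholder now following the cookie definition), as the afs
hunks below show.  Schematically, for the vnode case:

	/* before */
	vnode->cache = fscache_acquire_cookie(volume->cache,
					      &afs_vnode_cache_index_def,
					      &key, sizeof(key),
					      &aux, sizeof(aux),
					      vnode, vnode->status.size, true);

	/* after */
	vnode->cache = fscache_acquire_cookie(volume->cache,
					      &afs_vnode_cache_index_def,
					      NULL,
					      &key, sizeof(key),
					      &aux, sizeof(aux),
					      vnode->status.size, true);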

 fs/afs/cache.c                |   39 --------
 fs/afs/cell.c                 |    3 -
 fs/afs/inode.c                |    3 -
 fs/afs/volume.c               |    4 -
 fs/cachefiles/interface.c     |  104 +++------------------
 fs/cachefiles/internal.h      |   20 +---
 fs/cachefiles/namei.c         |   10 +-
 fs/cachefiles/xattr.c         |  204 +++++++----------------------------------
 fs/fscache/cache.c            |   20 +---
 fs/fscache/cookie.c           |   37 ++++---
 fs/fscache/fsdef.c            |   37 -------
 fs/fscache/internal.h         |   29 ++++--
 fs/fscache/netfs.c            |    3 -
 fs/fscache/object-list.c      |    9 +-
 fs/fscache/object.c           |   48 ----------
 fs/nfs/fscache-index.c        |    4 -
 include/linux/fscache-cache.h |   32 +++++-
 include/linux/fscache.h       |   39 +-------
 18 files changed, 144 insertions(+), 501 deletions(-)

diff --git a/fs/afs/cache.c b/fs/afs/cache.c
index 037af93e3aba..9b2de3014dbf 100644
--- a/fs/afs/cache.c
+++ b/fs/afs/cache.c
@@ -8,11 +8,6 @@
 #include <linux/sched.h>
 #include "internal.h"
 
-static enum fscache_checkaux afs_vnode_cache_check_aux(void *cookie_netfs_data,
-						       const void *buffer,
-						       uint16_t buflen,
-						       loff_t object_size);
-
 struct fscache_netfs afs_cache_netfs = {
 	.name			= "afs",
 	.version		= 2,
@@ -31,38 +26,4 @@ struct fscache_cookie_def afs_volume_cache_index_def = {
 struct fscache_cookie_def afs_vnode_cache_index_def = {
 	.name		= "AFS.vnode",
 	.type		= FSCACHE_COOKIE_TYPE_DATAFILE,
-	.check_aux	= afs_vnode_cache_check_aux,
 };
-
-/*
- * check that the auxiliary data indicates that the entry is still valid
- */
-static enum fscache_checkaux afs_vnode_cache_check_aux(void *cookie_netfs_data,
-						       const void *buffer,
-						       uint16_t buflen,
-						       loff_t object_size)
-{
-	struct afs_vnode *vnode = cookie_netfs_data;
-	struct afs_vnode_cache_aux aux;
-
-	_enter("{%llx,%x,%llx},%p,%u",
-	       vnode->fid.vnode, vnode->fid.unique, vnode->status.data_version,
-	       buffer, buflen);
-
-	memcpy(&aux, buffer, sizeof(aux));
-
-	/* check the size of the data is what we're expecting */
-	if (buflen != sizeof(aux)) {
-		_leave(" = OBSOLETE [len %hx != %zx]", buflen, sizeof(aux));
-		return FSCACHE_CHECKAUX_OBSOLETE;
-	}
-
-	if (vnode->status.data_version != aux.data_version) {
-		_leave(" = OBSOLETE [vers %llx != %llx]",
-		       aux.data_version, vnode->status.data_version);
-		return FSCACHE_CHECKAUX_OBSOLETE;
-	}
-
-	_leave(" = SUCCESS");
-	return FSCACHE_CHECKAUX_OKAY;
-}
diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index 887b673f6223..d4cc13ce847b 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -681,9 +681,10 @@ static int afs_activate_cell(struct afs_net *net, struct afs_cell *cell)
 #ifdef CONFIG_AFS_FSCACHE
 	cell->cache = fscache_acquire_cookie(afs_cache_netfs.primary_index,
 					     &afs_cell_cache_index_def,
+					     NULL,
 					     cell->name, strlen(cell->name),
 					     NULL, 0,
-					     cell, 0, true);
+					     0, true);
 #endif
 	ret = afs_proc_cell_setup(cell);
 	if (ret < 0)
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 79c8dc06c0b8..883fa9f67520 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -433,9 +433,10 @@ static void afs_get_inode_cache(struct afs_vnode *vnode)
 
 	vnode->cache = fscache_acquire_cookie(vnode->volume->cache,
 					      &afs_vnode_cache_index_def,
+					      NULL,
 					      &key, sizeof(key),
 					      &aux, sizeof(aux),
-					      vnode, vnode->status.size, true);
+					      vnode->status.size, true);
 #endif
 }
 
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index f84194b791d3..9ca246ab1a86 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -273,9 +273,9 @@ void afs_activate_volume(struct afs_volume *volume)
 #ifdef CONFIG_AFS_FSCACHE
 	volume->cache = fscache_acquire_cookie(volume->cell->cache,
 					       &afs_volume_cache_index_def,
+					       NULL,
 					       &volume->vid, sizeof(volume->vid),
-					       NULL, 0,
-					       volume, 0, true);
+					       NULL, 0, 0, true);
 #endif
 }
 
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 04d92ad402a4..530ed11bb213 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -7,13 +7,9 @@
 
 #include <linux/slab.h>
 #include <linux/mount.h>
+#include <linux/xattr.h>
 #include "internal.h"
 
-struct cachefiles_lookup_data {
-	struct cachefiles_xattr	*auxdata;	/* auxiliary data */
-	char			*key;		/* key path */
-};
-
 static int cachefiles_attr_changed(struct fscache_object *_object);
 
 /*
@@ -23,11 +19,9 @@ static struct fscache_object *cachefiles_alloc_object(
 	struct fscache_cache *_cache,
 	struct fscache_cookie *cookie)
 {
-	struct cachefiles_lookup_data *lookup_data;
 	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
-	struct cachefiles_xattr *auxdata;
-	unsigned keylen, auxlen;
+	unsigned keylen;
 	void *buffer, *p;
 	char *key;
 
@@ -35,10 +29,6 @@ static struct fscache_object *cachefiles_alloc_object(
 
 	_enter("{%s},%p,", cache->cache.identifier, cookie);
 
-	lookup_data = kmalloc(sizeof(*lookup_data), cachefiles_gfp);
-	if (!lookup_data)
-		goto nomem_lookup_data;
-
 	/* create a new object record and a temporary leaf image */
 	object = kmem_cache_alloc(cachefiles_object_jar, cachefiles_gfp);
 	if (!object)
@@ -62,10 +52,7 @@ static struct fscache_object *cachefiles_alloc_object(
 		goto nomem_buffer;
 
 	keylen = cookie->key_len;
-	if (keylen <= sizeof(cookie->inline_key))
-		p = cookie->inline_key;
-	else
-		p = cookie->key;
+	p = fscache_get_key(cookie);
 	memcpy(buffer + 2, p, keylen);
 
 	*(uint16_t *)buffer = keylen;
@@ -75,28 +62,13 @@ static struct fscache_object *cachefiles_alloc_object(
 
 	/* turn the raw key into something we can work with as a filename */
 	key = cachefiles_cook_key(buffer, keylen + 2, object->type);
+	kfree(buffer);
 	if (!key)
 		goto nomem_key;
 
-	/* get hold of the auxiliary data and prepend the object type */
-	auxdata = buffer;
-	auxlen = cookie->aux_len;
-	if (auxlen) {
-		if (auxlen <= sizeof(cookie->inline_aux))
-			p = cookie->inline_aux;
-		else
-			p = cookie->aux;
-		memcpy(auxdata->data, p, auxlen);
-	}
-
-	auxdata->len = auxlen + 1;
-	auxdata->type = cookie->type;
-
-	lookup_data->auxdata = auxdata;
-	lookup_data->key = key;
-	object->lookup_data = lookup_data;
+	object->lookup_key = key;
 
-	_leave(" = %p [%p]", &object->fscache, lookup_data);
+	_leave(" = %p [%s]", &object->fscache, key);
 	return &object->fscache;
 
 nomem_key:
@@ -106,8 +78,6 @@ static struct fscache_object *cachefiles_alloc_object(
 	kmem_cache_free(cachefiles_object_jar, object);
 	fscache_object_destroyed(&cache->cache);
 nomem_object:
-	kfree(lookup_data);
-nomem_lookup_data:
 	_leave(" = -ENOMEM");
 	return ERR_PTR(-ENOMEM);
 }
@@ -118,7 +88,6 @@ static struct fscache_object *cachefiles_alloc_object(
  */
 static int cachefiles_lookup_object(struct fscache_object *_object)
 {
-	struct cachefiles_lookup_data *lookup_data;
 	struct cachefiles_object *parent, *object;
 	struct cachefiles_cache *cache;
 	const struct cred *saved_cred;
@@ -130,15 +99,12 @@ static int cachefiles_lookup_object(struct fscache_object *_object)
 	parent = container_of(_object->parent,
 			      struct cachefiles_object, fscache);
 	object = container_of(_object, struct cachefiles_object, fscache);
-	lookup_data = object->lookup_data;
 
-	ASSERTCMP(lookup_data, !=, NULL);
+	ASSERTCMP(object->lookup_key, !=, NULL);
 
 	/* look up the key, creating any missing bits */
 	cachefiles_begin_secure(cache, &saved_cred);
-	ret = cachefiles_walk_to_object(parent, object,
-					lookup_data->key,
-					lookup_data->auxdata);
+	ret = cachefiles_walk_to_object(parent, object, object->lookup_key);
 	cachefiles_end_secure(cache, saved_cred);
 
 	/* polish off by setting the attributes of non-index files */
@@ -165,14 +131,10 @@ static void cachefiles_lookup_complete(struct fscache_object *_object)
 
 	object = container_of(_object, struct cachefiles_object, fscache);
 
-	_enter("{OBJ%x,%p}", object->fscache.debug_id, object->lookup_data);
+	_enter("{OBJ%x}", object->fscache.debug_id);
 
-	if (object->lookup_data) {
-		kfree(object->lookup_data->key);
-		kfree(object->lookup_data->auxdata);
-		kfree(object->lookup_data);
-		object->lookup_data = NULL;
-	}
+	kfree(object->lookup_key);
+	object->lookup_key = NULL;
 }
 
 /*
@@ -204,12 +166,8 @@ struct fscache_object *cachefiles_grab_object(struct fscache_object *_object,
 static void cachefiles_update_object(struct fscache_object *_object)
 {
 	struct cachefiles_object *object;
-	struct cachefiles_xattr *auxdata;
 	struct cachefiles_cache *cache;
-	struct fscache_cookie *cookie;
 	const struct cred *saved_cred;
-	const void *aux;
-	unsigned auxlen;
 
 	_enter("{OBJ%x}", _object->debug_id);
 
@@ -217,40 +175,9 @@ static void cachefiles_update_object(struct fscache_object *_object)
 	cache = container_of(object->fscache.cache, struct cachefiles_cache,
 			     cache);
 
-	if (!fscache_use_cookie(_object)) {
-		_leave(" [relinq]");
-		return;
-	}
-
-	cookie = object->fscache.cookie;
-	auxlen = cookie->aux_len;
-
-	if (!auxlen) {
-		fscache_unuse_cookie(_object);
-		_leave(" [no aux]");
-		return;
-	}
-
-	auxdata = kmalloc(2 + auxlen + 3, cachefiles_gfp);
-	if (!auxdata) {
-		fscache_unuse_cookie(_object);
-		_leave(" [nomem]");
-		return;
-	}
-
-	aux = (auxlen <= sizeof(cookie->inline_aux)) ?
-		cookie->inline_aux : cookie->aux;
-
-	memcpy(auxdata->data, aux, auxlen);
-	fscache_unuse_cookie(_object);
-
-	auxdata->len = auxlen + 1;
-	auxdata->type = cookie->type;
-
 	cachefiles_begin_secure(cache, &saved_cred);
-	cachefiles_update_object_xattr(object, auxdata);
+	cachefiles_set_object_xattr(object, XATTR_REPLACE);
 	cachefiles_end_secure(cache, saved_cred);
-	kfree(auxdata);
 	_leave("");
 }
 
@@ -354,12 +281,7 @@ static void cachefiles_put_object(struct fscache_object *_object,
 		ASSERTCMP(object->fscache.n_ops, ==, 0);
 		ASSERTCMP(object->fscache.n_children, ==, 0);
 
-		if (object->lookup_data) {
-			kfree(object->lookup_data->key);
-			kfree(object->lookup_data->auxdata);
-			kfree(object->lookup_data);
-			object->lookup_data = NULL;
-		}
+		kfree(object->lookup_key);
 
 		cache = object->fscache.cache;
 		fscache_object_destroy(&object->fscache);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index aca73c8403ab..a5d48f271ce1 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -34,7 +34,7 @@ extern unsigned cachefiles_debug;
  */
 struct cachefiles_object {
 	struct fscache_object		fscache;	/* fscache handle */
-	struct cachefiles_lookup_data	*lookup_data;	/* cached lookup data */
+	char				*lookup_key;	/* key to look up */
 	struct dentry			*dentry;	/* the file/dir representing this object */
 	struct dentry			*backer;	/* backing file */
 	loff_t				i_size;		/* object size */
@@ -88,15 +88,6 @@ struct cachefiles_cache {
 	char				*tag;		/* cache binding tag */
 };
 
-/*
- * auxiliary data xattr buffer
- */
-struct cachefiles_xattr {
-	uint16_t			len;
-	uint8_t				type;
-	uint8_t				data[];
-};
-
 #include <trace/events/cachefiles.h>
 
 /*
@@ -142,8 +133,7 @@ extern int cachefiles_delete_object(struct cachefiles_cache *cache,
 				    struct cachefiles_object *object);
 extern int cachefiles_walk_to_object(struct cachefiles_object *parent,
 				     struct cachefiles_object *object,
-				     const char *key,
-				     struct cachefiles_xattr *auxdata);
+				     const char *key);
 extern struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
 					       struct dentry *dir,
 					       const char *name);
@@ -204,12 +194,8 @@ static inline void cachefiles_end_secure(struct cachefiles_cache *cache,
  */
 extern int cachefiles_check_object_type(struct cachefiles_object *object);
 extern int cachefiles_set_object_xattr(struct cachefiles_object *object,
-				       struct cachefiles_xattr *auxdata);
-extern int cachefiles_update_object_xattr(struct cachefiles_object *object,
-					  struct cachefiles_xattr *auxdata);
+				       unsigned int xattr_flags);
 extern int cachefiles_check_auxdata(struct cachefiles_object *object);
-extern int cachefiles_check_object_xattr(struct cachefiles_object *object,
-					 struct cachefiles_xattr *auxdata);
 extern int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
 					  struct dentry *dentry);
 
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index ecc8ecbbfa5a..432002080b83 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -45,11 +45,10 @@ void __cachefiles_printk_object(struct cachefiles_object *object,
 	spin_lock(&object->fscache.lock);
 	cookie = object->fscache.cookie;
 	if (cookie) {
-		pr_err("%scookie=%p [pr=%p nd=%p fl=%lx]\n",
+		pr_err("%scookie=%p [pr=%p fl=%lx]\n",
 		       prefix,
 		       object->fscache.cookie,
 		       object->fscache.cookie->parent,
-		       object->fscache.cookie->netfs_data,
 		       object->fscache.cookie->flags);
 		pr_err("%skey=[%u] '", prefix, cookie->key_len);
 		k = (cookie->key_len <= sizeof(cookie->inline_key)) ?
@@ -481,8 +480,7 @@ int cachefiles_delete_object(struct cachefiles_cache *cache,
  */
 int cachefiles_walk_to_object(struct cachefiles_object *parent,
 			      struct cachefiles_object *object,
-			      const char *key,
-			      struct cachefiles_xattr *auxdata)
+			      const char *key)
 {
 	struct cachefiles_cache *cache;
 	struct dentry *dir, *next = NULL;
@@ -636,7 +634,7 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	if (!object->new) {
 		_debug("validate '%pd'", next);
 
-		ret = cachefiles_check_object_xattr(object, auxdata);
+		ret = cachefiles_check_auxdata(object);
 		if (ret == -ESTALE) {
 			/* delete the object (the deleter drops the directory
 			 * mutex) */
@@ -671,7 +669,7 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 
 	if (object->new) {
 		/* attach data to a newly constructed terminal object */
-		ret = cachefiles_set_object_xattr(object, auxdata);
+		ret = cachefiles_set_object_xattr(object, XATTR_CREATE);
 		if (ret < 0)
 			goto check_error;
 	} else {
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index 72e42438f3d7..5ab8643ca97a 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -15,6 +15,11 @@
 #include <linux/slab.h>
 #include "internal.h"
 
+struct cachefiles_xattr {
+	uint8_t				type;
+	uint8_t				data[];
+} __packed;
+
 static const char cachefiles_xattr_cache[] =
 	XATTR_USER_PREFIX "CacheFiles.cache";
 
@@ -97,56 +102,35 @@ int cachefiles_check_object_type(struct cachefiles_object *object)
  * set the state xattr on a cache file
  */
 int cachefiles_set_object_xattr(struct cachefiles_object *object,
-				struct cachefiles_xattr *auxdata)
-{
-	struct dentry *dentry = object->dentry;
-	int ret;
-
-	ASSERT(dentry);
-
-	_enter("%p,#%d", object, auxdata->len);
-
-	/* attempt to install the cache metadata directly */
-	_debug("SET #%u", auxdata->len);
-
-	clear_bit(FSCACHE_COOKIE_AUX_UPDATED, &object->fscache.cookie->flags);
-	ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
-			   &auxdata->type, auxdata->len,
-			   XATTR_CREATE);
-	if (ret < 0 && ret != -ENOMEM)
-		cachefiles_io_error_obj(
-			object,
-			"Failed to set xattr with error %d", ret);
-
-	_leave(" = %d", ret);
-	return ret;
-}
-
-/*
- * update the state xattr on a cache file
- */
-int cachefiles_update_object_xattr(struct cachefiles_object *object,
-				   struct cachefiles_xattr *auxdata)
+				unsigned int xattr_flags)
 {
+	struct cachefiles_xattr *buf;
 	struct dentry *dentry = object->dentry;
+	unsigned int len = object->fscache.cookie->aux_len;
 	int ret;
 
 	if (!dentry)
 		return -ESTALE;
 
-	_enter("%p,#%d", object, auxdata->len);
+	_enter("%p,#%d", object, len);
 
-	/* attempt to install the cache metadata directly */
-	_debug("SET #%u", auxdata->len);
+	buf = kmalloc(sizeof(struct cachefiles_xattr) + len, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	buf->type = object->fscache.cookie->def->type;
+	if (len > 0)
+		memcpy(buf->data, fscache_get_aux(object->fscache.cookie), len);
 
 	clear_bit(FSCACHE_COOKIE_AUX_UPDATED, &object->fscache.cookie->flags);
 	ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
-			   &auxdata->type, auxdata->len,
-			   XATTR_REPLACE);
+			   buf, sizeof(struct cachefiles_xattr) + len,
+			   xattr_flags);
+	kfree(buf);
 	if (ret < 0 && ret != -ENOMEM)
 		cachefiles_io_error_obj(
 			object,
-			"Failed to update xattr with error %d", ret);
+			"Failed to set xattr with error %d", ret);
 
 	_leave(" = %d", ret);
 	return ret;
@@ -157,148 +141,30 @@ int cachefiles_update_object_xattr(struct cachefiles_object *object,
  */
 int cachefiles_check_auxdata(struct cachefiles_object *object)
 {
-	struct cachefiles_xattr *auxbuf;
-	enum fscache_checkaux validity;
+	struct cachefiles_xattr *buf;
 	struct dentry *dentry = object->dentry;
-	ssize_t xlen;
-	int ret;
+	unsigned int len = object->fscache.cookie->aux_len, tlen;
+	const void *p = fscache_get_aux(object->fscache.cookie);
+	ssize_t ret;
 
 	ASSERT(dentry);
 	ASSERT(d_backing_inode(dentry));
-	ASSERT(object->fscache.cookie->def->check_aux);
 
-	auxbuf = kmalloc(sizeof(struct cachefiles_xattr) + 512, GFP_KERNEL);
-	if (!auxbuf)
+	tlen = sizeof(struct cachefiles_xattr) + len;
+	buf = kmalloc(tlen, GFP_KERNEL);
+	if (!buf)
 		return -ENOMEM;
 
-	xlen = vfs_getxattr(dentry, cachefiles_xattr_cache,
-			    &auxbuf->type, 512 + 1);
-	ret = -ESTALE;
-	if (xlen < 1 ||
-	    auxbuf->type != object->fscache.cookie->def->type)
-		goto error;
-
-	xlen--;
-	validity = fscache_check_aux(&object->fscache, &auxbuf->data, xlen,
-				     i_size_read(d_backing_inode(dentry)));
-	if (validity != FSCACHE_CHECKAUX_OKAY)
-		goto error;
-
-	ret = 0;
-error:
-	kfree(auxbuf);
-	return ret;
-}
-
-/*
- * check the state xattr on a cache file
- * - return -ESTALE if the object should be deleted
- */
-int cachefiles_check_object_xattr(struct cachefiles_object *object,
-				  struct cachefiles_xattr *auxdata)
-{
-	struct cachefiles_xattr *auxbuf;
-	struct dentry *dentry = object->dentry;
-	int ret;
-
-	_enter("%p,#%d", object, auxdata->len);
-
-	ASSERT(dentry);
-	ASSERT(d_backing_inode(dentry));
-
-	auxbuf = kmalloc(sizeof(struct cachefiles_xattr) + 512, cachefiles_gfp);
-	if (!auxbuf) {
-		_leave(" = -ENOMEM");
-		return -ENOMEM;
-	}
-
-	/* read the current type label */
-	ret = vfs_getxattr(dentry, cachefiles_xattr_cache,
-			   &auxbuf->type, 512 + 1);
-	if (ret < 0) {
-		if (ret == -ENODATA)
-			goto stale; /* no attribute - power went off
-				     * mid-cull? */
-
-		if (ret == -ERANGE)
-			goto bad_type_length;
-
-		cachefiles_io_error_obj(object,
-					"Can't read xattr on %lu (err %d)",
-					d_backing_inode(dentry)->i_ino, -ret);
-		goto error;
-	}
-
-	/* check the on-disk object */
-	if (ret < 1)
-		goto bad_type_length;
-
-	if (auxbuf->type != auxdata->type)
-		goto stale;
-
-	auxbuf->len = ret;
-
-	/* consult the netfs */
-	if (object->fscache.cookie->def->check_aux) {
-		enum fscache_checkaux result;
-		unsigned int dlen;
-
-		dlen = auxbuf->len - 1;
-
-		_debug("checkaux %s #%u",
-		       object->fscache.cookie->def->name, dlen);
-
-		result = fscache_check_aux(&object->fscache,
-					   &auxbuf->data, dlen,
-					   i_size_read(d_backing_inode(dentry)));
-
-		switch (result) {
-			/* entry okay as is */
-		case FSCACHE_CHECKAUX_OKAY:
-			goto okay;
-
-			/* entry requires update */
-		case FSCACHE_CHECKAUX_NEEDS_UPDATE:
-			break;
-
-			/* entry requires deletion */
-		case FSCACHE_CHECKAUX_OBSOLETE:
-			goto stale;
-
-		default:
-			BUG();
-		}
-
-		/* update the current label */
-		ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
-				   &auxdata->type, auxdata->len,
-				   XATTR_REPLACE);
-		if (ret < 0) {
-			cachefiles_io_error_obj(object,
-						"Can't update xattr on %lu"
-						" (error %d)",
-						d_backing_inode(dentry)->i_ino, -ret);
-			goto error;
-		}
-	}
-
-okay:
-	ret = 0;
+	ret = vfs_getxattr(dentry, cachefiles_xattr_cache, buf, tlen);
+	if (ret == tlen &&
+	    buf->type == object->fscache.cookie->def->type &&
+	    memcmp(buf->data, p, len) == 0)
+		ret = 0;
+	else
+		ret = -ESTALE;
 
-error:
-	kfree(auxbuf);
-	_leave(" = %d", ret);
+	kfree(buf);
 	return ret;
-
-bad_type_length:
-	pr_err("Cache object %lu xattr length incorrect\n",
-	       d_backing_inode(dentry)->i_ino);
-	ret = -EIO;
-	goto error;
-
-stale:
-	ret = -ESTALE;
-	goto error;
 }
 
 /*
diff --git a/fs/fscache/cache.c b/fs/fscache/cache.c
index 7de6d4cd29ee..2231691845b8 100644
--- a/fs/fscache/cache.c
+++ b/fs/fscache/cache.c
@@ -30,6 +30,7 @@ struct fscache_cache_tag *__fscache_lookup_cache_tag(const char *name)
 	list_for_each_entry(tag, &fscache_cache_tag_list, link) {
 		if (strcmp(tag->name, name) == 0) {
 			atomic_inc(&tag->usage);
+			refcount_inc(&tag->ref);
 			up_read(&fscache_addremove_sem);
 			return tag;
 		}
@@ -44,6 +45,7 @@ struct fscache_cache_tag *__fscache_lookup_cache_tag(const char *name)
 		return ERR_PTR(-ENOMEM);
 
 	atomic_set(&xtag->usage, 1);
+	refcount_set(&xtag->ref, 1);
 	strcpy(xtag->name, name);
 
 	/* write lock, search again and add if still not present */
@@ -52,6 +54,7 @@ struct fscache_cache_tag *__fscache_lookup_cache_tag(const char *name)
 	list_for_each_entry(tag, &fscache_cache_tag_list, link) {
 		if (strcmp(tag->name, name) == 0) {
 			atomic_inc(&tag->usage);
+			refcount_inc(&tag->ref);
 			up_write(&fscache_addremove_sem);
 			kfree(xtag);
 			return tag;
@@ -64,7 +67,7 @@ struct fscache_cache_tag *__fscache_lookup_cache_tag(const char *name)
 }
 
 /*
- * release a reference to a cache tag
+ * Unuse a cache tag
  */
 void __fscache_release_cache_tag(struct fscache_cache_tag *tag)
 {
@@ -77,8 +80,7 @@ void __fscache_release_cache_tag(struct fscache_cache_tag *tag)
 			tag = NULL;
 
 		up_write(&fscache_addremove_sem);
-
-		kfree(tag);
+		fscache_put_cache_tag(tag);
 	}
 }
 
@@ -130,20 +132,10 @@ struct fscache_cache *fscache_select_cache_for_object(
 
 	spin_unlock(&cookie->lock);
 
-	if (!cookie->def->select_cache)
-		goto no_preference;
-
-	/* ask the netfs for its preference */
-	tag = cookie->def->select_cache(cookie->parent->netfs_data,
-					cookie->netfs_data);
+	tag = cookie->preferred_cache;
 	if (!tag)
 		goto no_preference;
 
-	if (tag == ERR_PTR(-ENOMEM)) {
-		_leave(" = NULL [nomem tag]");
-		return NULL;
-	}
-
 	if (!tag->cache) {
 		_leave(" = NULL [unbacked tag]");
 		return NULL;
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index b35f727cc0d4..e38f02589077 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -43,11 +43,10 @@ static void fscache_print_cookie(struct fscache_cookie *cookie, char prefix)
 	       cookie->flags,
 	       atomic_read(&cookie->n_children),
 	       atomic_read(&cookie->n_active));
-	pr_err("%c-cookie d=%p{%s} n=%p\n",
+	pr_err("%c-cookie d=%p{%s}\n",
 	       prefix,
 	       cookie->def,
-	       cookie->def ? cookie->def->name : "?",
-	       cookie->netfs_data);
+	       cookie->def ? cookie->def->name : "?");
 
 	o = READ_ONCE(cookie->backing_objects.first);
 	if (o) {
@@ -74,6 +73,7 @@ void fscache_free_cookie(struct fscache_cookie *cookie)
 			kfree(cookie->aux);
 		if (cookie->key_len > sizeof(cookie->inline_key))
 			kfree(cookie->key);
+		fscache_put_cache_tag(cookie->preferred_cache);
 		kmem_cache_free(fscache_cookie_jar, cookie);
 	}
 }
@@ -150,9 +150,9 @@ static atomic_t fscache_cookie_debug_id = ATOMIC_INIT(1);
 struct fscache_cookie *fscache_alloc_cookie(
 	struct fscache_cookie *parent,
 	const struct fscache_cookie_def *def,
+	struct fscache_cache_tag *preferred_cache,
 	const void *index_key, size_t index_key_len,
 	const void *aux_data, size_t aux_data_len,
-	void *netfs_data,
 	loff_t object_size)
 {
 	struct fscache_cookie *cookie;
@@ -187,7 +187,9 @@ struct fscache_cookie *fscache_alloc_cookie(
 
 	cookie->def		= def;
 	cookie->parent		= parent;
-	cookie->netfs_data	= netfs_data;
+
+	cookie->preferred_cache	= fscache_get_cache_tag(preferred_cache);
+	
 	cookie->flags		= (1 << FSCACHE_COOKIE_NO_DATA_YET);
 	cookie->type		= def->type;
 	spin_lock_init(&cookie->lock);
@@ -252,7 +254,6 @@ struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *candidate)
  *   - the top level index cookie for each netfs is stored in the fscache_netfs
  *     struct upon registration
  * - def points to the definition
- * - the netfs_data will be passed to the functions pointed to in *def
  * - all attached caches will be searched to see if they contain this object
  * - index objects aren't stored on disk until there's a dependent file that
  *   needs storing
@@ -264,9 +265,9 @@ struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *candidate)
 struct fscache_cookie *__fscache_acquire_cookie(
 	struct fscache_cookie *parent,
 	const struct fscache_cookie_def *def,
+	struct fscache_cache_tag *preferred_cache,
 	const void *index_key, size_t index_key_len,
 	const void *aux_data, size_t aux_data_len,
-	void *netfs_data,
 	loff_t object_size,
 	bool enable)
 {
@@ -274,9 +275,9 @@ struct fscache_cookie *__fscache_acquire_cookie(
 
 	BUG_ON(!def);
 
-	_enter("{%s},{%s},%p,%u",
+	_enter("{%s},{%s},%u",
 	       parent ? (char *) parent->def->name : "<no-parent>",
-	       def->name, netfs_data, enable);
+	       def->name, enable);
 
 	if (!index_key || !index_key_len || index_key_len > 255 || aux_data_len > 255)
 		return NULL;
@@ -300,10 +301,10 @@ struct fscache_cookie *__fscache_acquire_cookie(
 	BUG_ON(def->type == FSCACHE_COOKIE_TYPE_INDEX &&
 	       parent->type != FSCACHE_COOKIE_TYPE_INDEX);
 
-	candidate = fscache_alloc_cookie(parent, def,
+	candidate = fscache_alloc_cookie(parent, def, preferred_cache,
 					 index_key, index_key_len,
 					 aux_data, aux_data_len,
-					 netfs_data, object_size);
+					 object_size);
 	if (!candidate) {
 		fscache_stat(&fscache_n_acquires_oom);
 		_leave(" [ENOMEM]");
@@ -814,8 +815,8 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie,
 		return;
 	}
 
-	_enter("%p{%s,%p,%d},%d",
-	       cookie, cookie->def->name, cookie->netfs_data,
+	_enter("%p{%s,%d},%d",
+	       cookie, cookie->def->name,
 	       atomic_read(&cookie->n_active), retire);
 
 	trace_fscache_relinquish(cookie, retire);
@@ -827,7 +828,6 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie,
 	__fscache_disable_cookie(cookie, aux_data, retire);
 
 	/* Clear pointers back to the netfs */
-	cookie->netfs_data	= NULL;
 	cookie->def		= NULL;
 
 	if (cookie->parent) {
@@ -981,8 +981,8 @@ static int fscache_cookies_seq_show(struct seq_file *m, void *v)
 
 	if (v == &fscache_cookies) {
 		seq_puts(m,
-			 "COOKIE   PARENT   USAGE CHILD ACT TY FL  DEF              NETFS_DATA\n"
-			 "======== ======== ===== ===== === == === ================ ==========\n"
+			 "COOKIE   PARENT   USAGE CHILD ACT TY FL  DEF             \n"
+			 "======== ======== ===== ===== === == === ================\n"
 			 );
 		return 0;
 	}
@@ -1004,7 +1004,7 @@ static int fscache_cookies_seq_show(struct seq_file *m, void *v)
 	}
 
 	seq_printf(m,
-		   "%08x %08x %5u %5u %3u %s %03lx %-16s %px",
+		   "%08x %08x %5u %5u %3u %s %03lx %-16s",
 		   cookie->debug_id,
 		   cookie->parent ? cookie->parent->debug_id : 0,
 		   atomic_read(&cookie->usage),
@@ -1012,8 +1012,7 @@ static int fscache_cookies_seq_show(struct seq_file *m, void *v)
 		   atomic_read(&cookie->n_active),
 		   type,
 		   cookie->flags,
-		   cookie->def->name,
-		   cookie->netfs_data);
+		   cookie->def->name);
 
 	keylen = cookie->key_len;
 	auxlen = cookie->aux_len;
diff --git a/fs/fscache/fsdef.c b/fs/fscache/fsdef.c
index 5f8f6fe243fe..0e2e242121fe 100644
--- a/fs/fscache/fsdef.c
+++ b/fs/fscache/fsdef.c
@@ -9,12 +9,6 @@
 #include <linux/module.h>
 #include "internal.h"
 
-static
-enum fscache_checkaux fscache_fsdef_netfs_check_aux(void *cookie_netfs_data,
-						    const void *data,
-						    uint16_t datalen,
-						    loff_t object_size);
-
 /*
  * The root index is owned by FS-Cache itself.
  *
@@ -64,35 +58,4 @@ EXPORT_SYMBOL(fscache_fsdef_index);
 struct fscache_cookie_def fscache_fsdef_netfs_def = {
 	.name		= "FSDEF.netfs",
 	.type		= FSCACHE_COOKIE_TYPE_INDEX,
-	.check_aux	= fscache_fsdef_netfs_check_aux,
 };
-
-/*
- * check that the index structure version number stored in the auxiliary data
- * matches the one the netfs gave us
- */
-static enum fscache_checkaux fscache_fsdef_netfs_check_aux(
-	void *cookie_netfs_data,
-	const void *data,
-	uint16_t datalen,
-	loff_t object_size)
-{
-	struct fscache_netfs *netfs = cookie_netfs_data;
-	uint32_t version;
-
-	_enter("{%s},,%hu", netfs->name, datalen);
-
-	if (datalen != sizeof(version)) {
-		_leave(" = OBSOLETE [dl=%d v=%zu]", datalen, sizeof(version));
-		return FSCACHE_CHECKAUX_OBSOLETE;
-	}
-
-	memcpy(&version, data, sizeof(version));
-	if (version != netfs->version) {
-		_leave(" = OBSOLETE [ver=%x net=%x]", version, netfs->version);
-		return FSCACHE_CHECKAUX_OBSOLETE;
-	}
-
-	_leave(" = OKAY");
-	return FSCACHE_CHECKAUX_OKAY;
-}
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 83dfbe0e3964..7dc447f7f3b0 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -24,6 +24,7 @@
 
 #define pr_fmt(fmt) "FS-Cache: " fmt
 
+#include <linux/slab.h>
 #include <linux/fscache-cache.h>
 #include <trace/events/fscache.h>
 #include <linux/sched.h>
@@ -41,6 +42,20 @@ extern struct rw_semaphore fscache_addremove_sem;
 extern struct fscache_cache *fscache_select_cache_for_object(
 	struct fscache_cookie *);
 
+static inline
+struct fscache_cache_tag *fscache_get_cache_tag(struct fscache_cache_tag *tag)
+{
+	if (tag)
+		refcount_inc(&tag->ref);
+	return tag;
+}
+
+static inline void fscache_put_cache_tag(struct fscache_cache_tag *tag)
+{
+	if (tag && refcount_dec_and_test(&tag->ref))
+		kfree(tag);
+}
+
 /*
  * cookie.c
  */
@@ -50,9 +65,10 @@ extern const struct seq_operations fscache_cookies_seq_ops;
 extern void fscache_free_cookie(struct fscache_cookie *);
 extern struct fscache_cookie *fscache_alloc_cookie(struct fscache_cookie *,
 						   const struct fscache_cookie_def *,
+						   struct fscache_cache_tag *,
 						   const void *, size_t,
 						   const void *, size_t,
-						   void *, loff_t);
+						   loff_t);
 extern struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *);
 extern void fscache_cookie_put(struct fscache_cookie *,
 			       enum fscache_cookie_trace);
@@ -329,16 +345,9 @@ static inline void fscache_cookie_get(struct fscache_cookie *cookie,
 static inline
 void fscache_update_aux(struct fscache_cookie *cookie, const void *aux_data)
 {
-	void *p;
-
-	if (!aux_data)
-		return;
-	if (cookie->aux_len <= sizeof(cookie->inline_aux))
-		p = cookie->inline_aux;
-	else
-		p = cookie->aux;
+	void *p = fscache_get_aux(cookie);
 
-	if (memcmp(p, aux_data, cookie->aux_len) != 0) {
+	if (p && memcmp(p, aux_data, cookie->aux_len) != 0) {
 		memcpy(p, aux_data, cookie->aux_len);
 		set_bit(FSCACHE_COOKIE_AUX_UPDATED, &cookie->flags);
 	}
diff --git a/fs/fscache/netfs.c b/fs/fscache/netfs.c
index cce92216fa28..8ce4c7fa11a0 100644
--- a/fs/fscache/netfs.c
+++ b/fs/fscache/netfs.c
@@ -22,9 +22,10 @@ int __fscache_register_netfs(struct fscache_netfs *netfs)
 	/* allocate a cookie for the primary index */
 	candidate = fscache_alloc_cookie(&fscache_fsdef_index,
 					 &fscache_fsdef_netfs_def,
+					 NULL,
 					 netfs->name, strlen(netfs->name),
 					 &netfs->version, sizeof(netfs->version),
-					 netfs, 0);
+					 0);
 	if (!candidate) {
 		_leave(" = -ENOMEM");
 		return -ENOMEM;
diff --git a/fs/fscache/object-list.c b/fs/fscache/object-list.c
index 1a0dc32c0a33..240e71d3faf2 100644
--- a/fs/fscache/object-list.c
+++ b/fs/fscache/object-list.c
@@ -170,7 +170,7 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 	if ((unsigned long) v == 1) {
 		seq_puts(m, "OBJECT   PARENT   STAT CHLDN OPS OOP IPR EX READS"
 			 " EM EV FL S"
-			 " | COOKIE   NETFS_COOKIE_DEF TY FL NETFS_DATA");
+			 " | COOKIE   NETFS_COOKIE_DEF TY FL");
 		if (config & (FSCACHE_OBJLIST_CONFIG_KEY |
 			      FSCACHE_OBJLIST_CONFIG_AUX))
 			seq_puts(m, "       ");
@@ -189,7 +189,7 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 	if ((unsigned long) v == 2) {
 		seq_puts(m, "======== ======== ==== ===== === === === == ====="
 			 " == == == ="
-			 " | ======== ================ == === ================");
+			 " | ======== ================ == ===");
 		if (config & (FSCACHE_OBJLIST_CONFIG_KEY |
 			      FSCACHE_OBJLIST_CONFIG_AUX))
 			seq_puts(m, " ================");
@@ -263,12 +263,11 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 			break;
 		}
 
-		seq_printf(m, "%08x %-16s %s %3lx %16p",
+		seq_printf(m, "%08x %-16s %s %3lx",
 			   cookie->debug_id,
 			   cookie->def->name,
 			   type,
-			   cookie->flags,
-			   cookie->netfs_data);
+			   cookie->flags);
 
 		if (config & FSCACHE_OBJLIST_CONFIG_KEY)
 			keylen = cookie->key_len;
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index 3d7f956a01c6..4ee9b5473c43 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -910,54 +910,6 @@ static void fscache_dequeue_object(struct fscache_object *object)
 	_leave("");
 }
 
-/**
- * fscache_check_aux - Ask the netfs whether an object on disk is still valid
- * @object: The object to ask about
- * @data: The auxiliary data for the object
- * @datalen: The size of the auxiliary data
- *
- * This function consults the netfs about the coherency state of an object.
- * The caller must be holding a ref on cookie->n_active (held by
- * fscache_look_up_object() on behalf of the cache backend during object lookup
- * and creation).
- */
-enum fscache_checkaux fscache_check_aux(struct fscache_object *object,
-					const void *data, uint16_t datalen,
-					loff_t object_size)
-{
-	enum fscache_checkaux result;
-
-	if (!object->cookie->def->check_aux) {
-		fscache_stat(&fscache_n_checkaux_none);
-		return FSCACHE_CHECKAUX_OKAY;
-	}
-
-	result = object->cookie->def->check_aux(object->cookie->netfs_data,
-						data, datalen, object_size);
-	switch (result) {
-		/* entry okay as is */
-	case FSCACHE_CHECKAUX_OKAY:
-		fscache_stat(&fscache_n_checkaux_okay);
-		break;
-
-		/* entry requires update */
-	case FSCACHE_CHECKAUX_NEEDS_UPDATE:
-		fscache_stat(&fscache_n_checkaux_update);
-		break;
-
-		/* entry requires deletion */
-	case FSCACHE_CHECKAUX_OBSOLETE:
-		fscache_stat(&fscache_n_checkaux_obsolete);
-		break;
-
-	default:
-		BUG();
-	}
-
-	return result;
-}
-EXPORT_SYMBOL(fscache_check_aux);
-
 /*
  * Asynchronously invalidate an object.
  */
diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
index 573b1da9342c..b1049815729e 100644
--- a/fs/nfs/fscache-index.c
+++ b/fs/nfs/fscache-index.c
@@ -106,7 +106,7 @@ enum fscache_checkaux nfs_fscache_inode_check_aux(void *cookie_netfs_data,
  *   cache fails with EIO - in which case the server must be contacted to
  *   retrieve the data, which requires the read context for security.
  */
-static void nfs_fh_get_context(void *cookie_netfs_data, void *context)
+static void nfs_fh_get_context(void *context)
 {
 	get_nfs_open_context(context);
 }
@@ -116,7 +116,7 @@ static void nfs_fh_get_context(void *cookie_netfs_data, void *context)
  * - This function can be absent if the completion function doesn't require a
  *   context.
  */
-static void nfs_fh_put_context(void *cookie_netfs_data, void *context)
+static void nfs_fh_put_context(void *context)
 {
 	if (context)
 		put_nfs_open_context(context);
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index f01fe979b323..0b395bcb4b16 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -45,8 +45,9 @@ struct fscache_cache_tag {
 	struct fscache_cache	*cache;		/* cache referred to by this tag */
 	unsigned long		flags;
 #define FSCACHE_TAG_RESERVED	0		/* T if tag is reserved for a cache */
-	atomic_t		usage;
-	char			name[];	/* tag name */
+	atomic_t		usage;		/* Number of netfs users */
+	refcount_t		ref;		/* Reference count on structure */
+	char			name[];		/* tag name */
 };
 
 /*
@@ -414,11 +415,6 @@ extern void fscache_io_error(struct fscache_cache *cache);
 
 extern bool fscache_object_sleep_till_congested(signed long *timeoutp);
 
-extern enum fscache_checkaux fscache_check_aux(struct fscache_object *object,
-					       const void *data,
-					       uint16_t datalen,
-					       loff_t object_size);
-
 extern void fscache_object_retrying_stale(struct fscache_object *object);
 
 enum fscache_why_object_killed {
@@ -430,4 +426,26 @@ enum fscache_why_object_killed {
 extern void fscache_object_mark_killed(struct fscache_object *object,
 				       enum fscache_why_object_killed why);
 
+/*
+ * Find the key on a cookie.
+ */
+static inline void *fscache_get_key(struct fscache_cookie *cookie)
+{
+	if (cookie->key_len <= sizeof(cookie->inline_key))
+		return cookie->inline_key;
+	else
+		return cookie->key;
+}
+
+/*
+ * Find the auxiliary data on a cookie.
+ */
+static inline void *fscache_get_aux(struct fscache_cookie *cookie)
+{
+	if (cookie->aux_len <= sizeof(cookie->inline_aux))
+		return cookie->inline_aux;
+	else
+		return cookie->aux;
+}
+
 #endif /* _LINUX_FSCACHE_CACHE_H */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 143d48281117..a9a11dff6eae 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -47,13 +47,6 @@ struct fscache_cache_tag;
 struct fscache_cookie;
 struct fscache_netfs;
 
-/* result of index entry consultation */
-enum fscache_checkaux {
-	FSCACHE_CHECKAUX_OKAY,		/* entry okay as is */
-	FSCACHE_CHECKAUX_NEEDS_UPDATE,	/* entry requires update */
-	FSCACHE_CHECKAUX_OBSOLETE,	/* entry requires deletion */
-};
-
 /*
  * fscache cookie definition
  */
@@ -65,26 +58,6 @@ struct fscache_cookie_def {
 	uint8_t type;
 #define FSCACHE_COOKIE_TYPE_INDEX	0
 #define FSCACHE_COOKIE_TYPE_DATAFILE	1
-
-	/* select the cache into which to insert an entry in this index
-	 * - optional
-	 * - should return a cache identifier or NULL to cause the cache to be
-	 *   inherited from the parent if possible or the first cache picked
-	 *   for a non-index file if not
-	 */
-	struct fscache_cache_tag *(*select_cache)(
-		const void *parent_netfs_data,
-		const void *cookie_netfs_data);
-
-	/* consult the netfs about the state of an object
-	 * - this function can be absent if the index carries no state data
-	 * - the netfs data from the cookie being used as the target is
-	 *   presented, as is the auxiliary data and the object size
-	 */
-	enum fscache_checkaux (*check_aux)(void *cookie_netfs_data,
-					   const void *data,
-					   uint16_t datalen,
-					   loff_t object_size);
 };
 
 /*
@@ -114,9 +87,9 @@ struct fscache_cookie {
 	struct hlist_head		backing_objects; /* object(s) backing this file/index */
 	const struct fscache_cookie_def	*def;		/* definition */
 	struct fscache_cookie		*parent;	/* parent of this entry */
+	struct fscache_cache_tag	*preferred_cache; /* The preferred cache or NULL */
 	struct hlist_bl_node		hash_link;	/* Link in hash table */
 	struct list_head		proc_link;	/* Link in proc list */
-	void				*netfs_data;	/* back pointer to netfs */
 
 	unsigned long			flags;
 #define FSCACHE_COOKIE_LOOKING_UP	0	/* T if non-index cookie being looked up still */
@@ -164,9 +137,10 @@ extern void __fscache_release_cache_tag(struct fscache_cache_tag *);
 extern struct fscache_cookie *__fscache_acquire_cookie(
 	struct fscache_cookie *,
 	const struct fscache_cookie_def *,
+	struct fscache_cache_tag *,
 	const void *, size_t,
 	const void *, size_t,
-	void *, loff_t, bool);
+	loff_t, bool);
 extern void __fscache_relinquish_cookie(struct fscache_cookie *, const void *, bool);
 extern int __fscache_check_consistency(struct fscache_cookie *, const void *);
 extern void __fscache_update_cookie(struct fscache_cookie *, const void *);
@@ -252,6 +226,7 @@ void fscache_release_cache_tag(struct fscache_cache_tag *tag)
  * fscache_acquire_cookie - Acquire a cookie to represent a cache object
  * @parent: The cookie that's to be the parent of this one
  * @def: A description of the cache object, including callback operations
+ * @preferred_cache: The cache to use (or NULL)
  * @index_key: The index key for this cookie
  * @index_key_len: Size of the index key
  * @aux_data: The auxiliary data for the cookie (may be NULL)
@@ -272,19 +247,19 @@ static inline
 struct fscache_cookie *fscache_acquire_cookie(
 	struct fscache_cookie *parent,
 	const struct fscache_cookie_def *def,
+	struct fscache_cache_tag *preferred_cache,
 	const void *index_key,
 	size_t index_key_len,
 	const void *aux_data,
 	size_t aux_data_len,
-	void *netfs_data,
 	loff_t object_size,
 	bool enable)
 {
 	if (fscache_cookie_valid(parent) && fscache_cookie_enabled(parent))
-		return __fscache_acquire_cookie(parent, def,
+		return __fscache_acquire_cookie(parent, def, preferred_cache,
 						index_key, index_key_len,
 						aux_data, aux_data_len,
-						netfs_data, object_size, enable);
+						object_size, enable);
 	else
 		return NULL;
 }



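To summarise the xattr rework in this patch: the coherency data for an
object is now written as a single blob - a one-byte type followed by the
cookie's auxiliary data - and validated by comparing the whole blob at
once rather than consulting the netfs.  A condensed sketch of the check
(the function name is illustrative; the real logic is in
cachefiles_check_auxdata() above):

	struct cachefiles_xattr {
		uint8_t	type;
		uint8_t	data[];
	} __packed;

	/* Read the stored blob back and compare it wholesale against the
	 * type and auxiliary data held in the cookie; any mismatch marks
	 * the object stale. */
	static int coherency_check(struct dentry *dentry,
				   struct fscache_cookie *cookie)
	{
		unsigned int len = cookie->aux_len;
		unsigned int tlen = sizeof(struct cachefiles_xattr) + len;
		struct cachefiles_xattr *buf;
		ssize_t ret;

		buf = kmalloc(tlen, GFP_KERNEL);
		if (!buf)
			return -ENOMEM;

		ret = vfs_getxattr(dentry, "user.CacheFiles.cache", buf, tlen);
		if (ret == tlen &&
		    buf->type == cookie->def->type &&
		    memcmp(buf->data, fscache_get_aux(cookie), len) == 0)
			ret = 0;	/* coherent */
		else
			ret = -ESTALE;	/* stale - cull the object */
		kfree(buf);
		return ret;
	}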

* [RFC PATCH 07/76] fscache: Remove struct fscache_cookie_def
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (5 preceding siblings ...)
  2020-11-20 15:03 ` [RFC PATCH 06/76] fscache: Remove the netfs data from the cookie David Howells
@ 2020-11-20 15:04 ` David Howells
  2020-11-20 15:04 ` [RFC PATCH 08/76] fscache: Remove store_limit* from struct fscache_object David Howells
                   ` (67 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:04 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Remove the cookie definition structure so that there are no pointers from
it back into the network filesystem.  All of the method pointers that were
in it have already been removed.  Any remaining information is stashed in
the cookie itself.
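
To illustrate the shape of the conversion at a call site (condensed from
the AFS hunks below; the variable names are illustrative):

	/* Before: a static definition that the cache could call back through */
	static struct fscache_cookie_def afs_vnode_cache_index_def = {
		.name	= "AFS.vnode",
		.type	= FSCACHE_COOKIE_TYPE_DATAFILE,
	};

	cookie = fscache_acquire_cookie(volume_cookie,
					&afs_vnode_cache_index_def,
					NULL,		/* preferred cache */
					&key, sizeof(key),
					&aux, sizeof(aux),
					i_size, true);

	/* After: the type, name and advice are passed in directly and
	 * stashed in the cookie; nothing points back into the netfs */
	cookie = fscache_acquire_cookie(volume_cookie,
					FSCACHE_COOKIE_TYPE_DATAFILE,
					"AFS.vnode",
					0,		/* advice, e.g. FSCACHE_ADV_SINGLE_CHUNK */
					NULL,		/* preferred cache */
					&key, sizeof(key),
					&aux, sizeof(aux),
					i_size, true);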

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/cache.c                 |   15 ------------
 fs/afs/cell.c                  |    4 ++-
 fs/afs/inode.c                 |   15 +++++++-----
 fs/afs/internal.h              |    7 -----
 fs/afs/volume.c                |    5 ++--
 fs/cachefiles/interface.c      |    4 ++-
 fs/cachefiles/xattr.c          |    6 ++---
 fs/fscache/cookie.c            |   51 ++++++++++++++++++----------------------
 fs/fscache/fsdef.c             |   17 +------------
 fs/fscache/internal.h          |    5 ++--
 fs/fscache/netfs.c             |    5 ++--
 fs/fscache/object-list.c       |   16 +++----------
 fs/fscache/object.c            |    2 +-
 fs/fscache/page.c              |    2 +-
 include/linux/fscache.h        |   37 +++++++++++++++--------------
 include/trace/events/fscache.h |    2 +-
 16 files changed, 76 insertions(+), 117 deletions(-)

diff --git a/fs/afs/cache.c b/fs/afs/cache.c
index 9b2de3014dbf..0ee9ede6fc67 100644
--- a/fs/afs/cache.c
+++ b/fs/afs/cache.c
@@ -12,18 +12,3 @@ struct fscache_netfs afs_cache_netfs = {
 	.name			= "afs",
 	.version		= 2,
 };
-
-struct fscache_cookie_def afs_cell_cache_index_def = {
-	.name		= "AFS.cell",
-	.type		= FSCACHE_COOKIE_TYPE_INDEX,
-};
-
-struct fscache_cookie_def afs_volume_cache_index_def = {
-	.name		= "AFS.volume",
-	.type		= FSCACHE_COOKIE_TYPE_INDEX,
-};
-
-struct fscache_cookie_def afs_vnode_cache_index_def = {
-	.name		= "AFS.vnode",
-	.type		= FSCACHE_COOKIE_TYPE_DATAFILE,
-};
diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index d4cc13ce847b..585fd10da5be 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -680,7 +680,9 @@ static int afs_activate_cell(struct afs_net *net, struct afs_cell *cell)
 
 #ifdef CONFIG_AFS_FSCACHE
 	cell->cache = fscache_acquire_cookie(afs_cache_netfs.primary_index,
-					     &afs_cell_cache_index_def,
+					     FSCACHE_COOKIE_TYPE_INDEX,
+					     "AFS.cell",
+					     0,
 					     NULL,
 					     cell->name, strlen(cell->name),
 					     NULL, 0,
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 883fa9f67520..9f9fd0f85940 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -431,12 +431,15 @@ static void afs_get_inode_cache(struct afs_vnode *vnode)
 	key.vnode_id_ext[1]	= vnode->fid.vnode_hi;
 	aux.data_version	= vnode->status.data_version;
 
-	vnode->cache = fscache_acquire_cookie(vnode->volume->cache,
-					      &afs_vnode_cache_index_def,
-					      NULL,
-					      &key, sizeof(key),
-					      &aux, sizeof(aux),
-					      vnode->status.size, true);
+	vnode->cache = fscache_acquire_cookie(
+		vnode->volume->cache,
+		FSCACHE_COOKIE_TYPE_DATAFILE,
+		"AFS.vnode",
+		vnode->status.type == AFS_FTYPE_FILE ? 0 : FSCACHE_ADV_SINGLE_CHUNK,
+		NULL,
+		&key, sizeof(key),
+		&aux, sizeof(aux),
+		vnode->status.size, true);
 #endif
 }
 
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 14d5d75f4b6e..12bb08eaeeff 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -945,13 +945,6 @@ extern void afs_merge_fs_addr6(struct afs_addr_list *, __be32 *, u16);
  */
 #ifdef CONFIG_AFS_FSCACHE
 extern struct fscache_netfs afs_cache_netfs;
-extern struct fscache_cookie_def afs_cell_cache_index_def;
-extern struct fscache_cookie_def afs_volume_cache_index_def;
-extern struct fscache_cookie_def afs_vnode_cache_index_def;
-#else
-#define afs_cell_cache_index_def	(*(struct fscache_cookie_def *) NULL)
-#define afs_volume_cache_index_def	(*(struct fscache_cookie_def *) NULL)
-#define afs_vnode_cache_index_def	(*(struct fscache_cookie_def *) NULL)
 #endif
 
 /*
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index 9ca246ab1a86..5eaaa762fbd9 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -272,8 +272,9 @@ void afs_activate_volume(struct afs_volume *volume)
 {
 #ifdef CONFIG_AFS_FSCACHE
 	volume->cache = fscache_acquire_cookie(volume->cell->cache,
-					       &afs_volume_cache_index_def,
-					       NULL,
+					       FSCACHE_COOKIE_TYPE_INDEX,
+					       "AFS.vol",
+					       0, NULL,
 					       &volume->vid, sizeof(volume->vid),
 					       NULL, 0, 0, true);
 #endif
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 530ed11bb213..b7aa5c733cb7 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -41,7 +41,7 @@ static struct fscache_object *cachefiles_alloc_object(
 
 	fscache_object_init(&object->fscache, cookie, &cache->cache);
 
-	object->type = cookie->def->type;
+	object->type = cookie->type;
 
 	/* get hold of the raw key
 	 * - stick the length on the front and leave space on the back for the
@@ -109,7 +109,7 @@ static int cachefiles_lookup_object(struct fscache_object *_object)
 
 	/* polish off by setting the attributes of non-index files */
 	if (ret == 0 &&
-	    object->fscache.cookie->def->type != FSCACHE_COOKIE_TYPE_INDEX)
+	    object->fscache.cookie->type != FSCACHE_COOKIE_TYPE_INDEX)
 		cachefiles_attr_changed(&object->fscache);
 
 	if (ret < 0 && ret != -ETIMEDOUT) {
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index 5ab8643ca97a..a4f1eddebe6f 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -39,7 +39,7 @@ int cachefiles_check_object_type(struct cachefiles_object *object)
 	if (!object->fscache.cookie)
 		strcpy(type, "C3");
 	else
-		snprintf(type, 3, "%02x", object->fscache.cookie->def->type);
+		snprintf(type, 3, "%02x", object->fscache.cookie->type);
 
 	_enter("%p{%s}", object, type);
 
@@ -118,7 +118,7 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 	if (!buf)
 		return -ENOMEM;
 
-	buf->type = object->fscache.cookie->def->type;
+	buf->type = object->fscache.cookie->type;
 	if (len > 0)
 		memcpy(buf->data, fscache_get_aux(object->fscache.cookie), len);
 
@@ -157,7 +157,7 @@ int cachefiles_check_auxdata(struct cachefiles_object *object)
 
 	ret = vfs_getxattr(dentry, cachefiles_xattr_cache, buf, tlen);
 	if (ret == tlen &&
-	    buf->type == object->fscache.cookie->def->type &&
+	    buf->type == object->fscache.cookie->type &&
 	    memcmp(buf->data, p, len) == 0)
 		ret = 0;
 	else
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index e38f02589077..32c070c929b9 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -43,10 +43,9 @@ static void fscache_print_cookie(struct fscache_cookie *cookie, char prefix)
 	       cookie->flags,
 	       atomic_read(&cookie->n_children),
 	       atomic_read(&cookie->n_active));
-	pr_err("%c-cookie d=%p{%s}\n",
+	pr_err("%c-cookie d=%s\n",
 	       prefix,
-	       cookie->def,
-	       cookie->def ? cookie->def->name : "?");
+	       cookie->type_name);
 
 	o = READ_ONCE(cookie->backing_objects.first);
 	if (o) {
@@ -149,7 +148,9 @@ static atomic_t fscache_cookie_debug_id = ATOMIC_INIT(1);
  */
 struct fscache_cookie *fscache_alloc_cookie(
 	struct fscache_cookie *parent,
-	const struct fscache_cookie_def *def,
+	enum fscache_cookie_type type,
+	const char *type_name,
+	u8 advice,
 	struct fscache_cache_tag *preferred_cache,
 	const void *index_key, size_t index_key_len,
 	const void *aux_data, size_t aux_data_len,
@@ -162,8 +163,11 @@ struct fscache_cookie *fscache_alloc_cookie(
 	if (!cookie)
 		return NULL;
 
+	cookie->type = type;
+	cookie->advice = advice;
 	cookie->key_len = index_key_len;
 	cookie->aux_len = aux_data_len;
+	strlcpy(cookie->type_name, type_name, sizeof(cookie->type_name));
 
 	if (fscache_set_key(cookie, index_key, index_key_len) < 0)
 		goto nomem;
@@ -185,13 +189,10 @@ struct fscache_cookie *fscache_alloc_cookie(
 	 */
 	atomic_set(&cookie->n_active, 1);
 
-	cookie->def		= def;
 	cookie->parent		= parent;
-
 	cookie->preferred_cache	= fscache_get_cache_tag(preferred_cache);
 	
 	cookie->flags		= (1 << FSCACHE_COOKIE_NO_DATA_YET);
-	cookie->type		= def->type;
 	spin_lock_init(&cookie->lock);
 	INIT_HLIST_HEAD(&cookie->backing_objects);
 
@@ -253,7 +254,6 @@ struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *candidate)
  * - parent specifies the parent object
  *   - the top level index cookie for each netfs is stored in the fscache_netfs
  *     struct upon registration
- * - def points to the definition
  * - all attached caches will be searched to see if they contain this object
  * - index objects aren't stored on disk until there's a dependent file that
  *   needs storing
@@ -264,7 +264,9 @@ struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *candidate)
  */
 struct fscache_cookie *__fscache_acquire_cookie(
 	struct fscache_cookie *parent,
-	const struct fscache_cookie_def *def,
+	enum fscache_cookie_type type,
+	const char *type_name,
+	u8 advice,
 	struct fscache_cache_tag *preferred_cache,
 	const void *index_key, size_t index_key_len,
 	const void *aux_data, size_t aux_data_len,
@@ -273,11 +275,8 @@ struct fscache_cookie *__fscache_acquire_cookie(
 {
 	struct fscache_cookie *candidate, *cookie;
 
-	BUG_ON(!def);
-
 	_enter("{%s},{%s},%u",
-	       parent ? (char *) parent->def->name : "<no-parent>",
-	       def->name, enable);
+	       parent ? parent->type_name : "<no-parent>", type_name, enable);
 
 	if (!index_key || !index_key_len || index_key_len > 255 || aux_data_len > 255)
 		return NULL;
@@ -296,12 +295,11 @@ struct fscache_cookie *__fscache_acquire_cookie(
 	}
 
 	/* validate the definition */
-	BUG_ON(!def->name[0]);
-
-	BUG_ON(def->type == FSCACHE_COOKIE_TYPE_INDEX &&
+	BUG_ON(type == FSCACHE_COOKIE_TYPE_INDEX &&
 	       parent->type != FSCACHE_COOKIE_TYPE_INDEX);
 
-	candidate = fscache_alloc_cookie(parent, def, preferred_cache,
+	candidate = fscache_alloc_cookie(parent, type, type_name, advice,
+					 preferred_cache,
 					 index_key, index_key_len,
 					 aux_data, aux_data_len,
 					 object_size);
@@ -496,7 +494,7 @@ static int fscache_alloc_object(struct fscache_cache *cache,
 	struct fscache_object *object;
 	int ret;
 
-	_enter("%p,%p{%s}", cache, cookie, cookie->def->name);
+	_enter("%p,%p{%s}", cache, cookie, cookie->type_name);
 
 	spin_lock(&cookie->lock);
 	hlist_for_each_entry(object, &cookie->backing_objects,
@@ -523,7 +521,7 @@ static int fscache_alloc_object(struct fscache_cache *cache,
 	object->debug_id = atomic_inc_return(&fscache_object_debug_id);
 
 	_debug("ALLOC OBJ%x: %s {%lx}",
-	       object->debug_id, cookie->def->name, object->events);
+	       object->debug_id, cookie->type_name, object->events);
 
 	ret = fscache_alloc_object(cache, cookie->parent);
 	if (ret < 0)
@@ -571,7 +569,7 @@ static int fscache_attach_object(struct fscache_cookie *cookie,
 	struct fscache_cache *cache = object->cache;
 	int ret;
 
-	_enter("{%s},{OBJ%x}", cookie->def->name, object->debug_id);
+	_enter("{%s},{OBJ%x}", cookie->type_name, object->debug_id);
 
 	ASSERTCMP(object->cookie, ==, cookie);
 
@@ -633,7 +631,7 @@ void __fscache_invalidate(struct fscache_cookie *cookie)
 {
 	struct fscache_object *object;
 
-	_enter("{%s}", cookie->def->name);
+	_enter("{%s}", cookie->type_name);
 
 	fscache_stat(&fscache_n_invalidates);
 
@@ -698,7 +696,7 @@ void __fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data
 		return;
 	}
 
-	_enter("{%s}", cookie->def->name);
+	_enter("{%s}", cookie->type_name);
 
 	spin_lock(&cookie->lock);
 
@@ -737,7 +735,7 @@ void __fscache_disable_cookie(struct fscache_cookie *cookie,
 
 	if (atomic_read(&cookie->n_children) != 0) {
 		pr_err("Cookie '%s' still has children\n",
-		       cookie->def->name);
+		       cookie->type_name);
 		BUG();
 	}
 
@@ -816,7 +814,7 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie,
 	}
 
 	_enter("%p{%s,%d},%d",
-	       cookie, cookie->def->name,
+	       cookie, cookie->type_name,
 	       atomic_read(&cookie->n_active), retire);
 
 	trace_fscache_relinquish(cookie, retire);
@@ -827,9 +825,6 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie,
 
 	__fscache_disable_cookie(cookie, aux_data, retire);
 
-	/* Clear pointers back to the netfs */
-	cookie->def		= NULL;
-
 	if (cookie->parent) {
 		ASSERTCMP(atomic_read(&cookie->parent->usage), >, 0);
 		ASSERTCMP(atomic_read(&cookie->parent->n_children), >, 0);
@@ -1012,7 +1007,7 @@ static int fscache_cookies_seq_show(struct seq_file *m, void *v)
 		   atomic_read(&cookie->n_active),
 		   type,
 		   cookie->flags,
-		   cookie->def->name);
+		   cookie->type_name);
 
 	keylen = cookie->key_len;
 	auxlen = cookie->aux_len;
diff --git a/fs/fscache/fsdef.c b/fs/fscache/fsdef.c
index 0e2e242121fe..b802cbddb578 100644
--- a/fs/fscache/fsdef.c
+++ b/fs/fscache/fsdef.c
@@ -33,29 +33,14 @@
  * cache.  It can create whatever objects it likes in that index, including
  * further indices.
  */
-static struct fscache_cookie_def fscache_fsdef_index_def = {
-	.name		= ".FS-Cache",
-	.type		= FSCACHE_COOKIE_TYPE_INDEX,
-};
-
 struct fscache_cookie fscache_fsdef_index = {
 	.debug_id	= 1,
 	.usage		= ATOMIC_INIT(1),
 	.n_active	= ATOMIC_INIT(1),
 	.lock		= __SPIN_LOCK_UNLOCKED(fscache_fsdef_index.lock),
 	.backing_objects = HLIST_HEAD_INIT,
-	.def		= &fscache_fsdef_index_def,
+	.type_name	= ".fscach",
 	.flags		= 1 << FSCACHE_COOKIE_ENABLED,
 	.type		= FSCACHE_COOKIE_TYPE_INDEX,
 };
 EXPORT_SYMBOL(fscache_fsdef_index);
-
-/*
- * Definition of an entry in the root index.  Each entry is an index, keyed to
- * a specific netfs and only applicable to a particular version of the index
- * structure used by that netfs.
- */
-struct fscache_cookie_def fscache_fsdef_netfs_def = {
-	.name		= "FSDEF.netfs",
-	.type		= FSCACHE_COOKIE_TYPE_INDEX,
-};
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 7dc447f7f3b0..bc66bf7182ed 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -64,7 +64,9 @@ extern const struct seq_operations fscache_cookies_seq_ops;
 
 extern void fscache_free_cookie(struct fscache_cookie *);
 extern struct fscache_cookie *fscache_alloc_cookie(struct fscache_cookie *,
-						   const struct fscache_cookie_def *,
+						   enum fscache_cookie_type,
+						   const char *,
+						   u8,
 						   struct fscache_cache_tag *,
 						   const void *, size_t,
 						   const void *, size_t,
@@ -77,7 +79,6 @@ extern void fscache_cookie_put(struct fscache_cookie *,
  * fsdef.c
  */
 extern struct fscache_cookie fscache_fsdef_index;
-extern struct fscache_cookie_def fscache_fsdef_netfs_def;
 
 /*
  * histogram.c
diff --git a/fs/fscache/netfs.c b/fs/fscache/netfs.c
index 8ce4c7fa11a0..f8a816f844f6 100644
--- a/fs/fscache/netfs.c
+++ b/fs/fscache/netfs.c
@@ -21,8 +21,9 @@ int __fscache_register_netfs(struct fscache_netfs *netfs)
 
 	/* allocate a cookie for the primary index */
 	candidate = fscache_alloc_cookie(&fscache_fsdef_index,
-					 &fscache_fsdef_netfs_def,
-					 NULL,
+					 FSCACHE_COOKIE_TYPE_INDEX,
+					 ".netfs",
+					 0, NULL,
 					 netfs->name, strlen(netfs->name),
 					 &netfs->version, sizeof(netfs->version),
 					 0);
diff --git a/fs/fscache/object-list.c b/fs/fscache/object-list.c
index 240e71d3faf2..fc28de4738ec 100644
--- a/fs/fscache/object-list.c
+++ b/fs/fscache/object-list.c
@@ -21,8 +21,6 @@ struct fscache_objlist_data {
 	unsigned long	config;		/* display configuration */
 #define FSCACHE_OBJLIST_CONFIG_KEY	0x00000001	/* show object keys */
 #define FSCACHE_OBJLIST_CONFIG_AUX	0x00000002	/* show object auxdata */
-#define FSCACHE_OBJLIST_CONFIG_COOKIE	0x00000004	/* show objects with cookies */
-#define FSCACHE_OBJLIST_CONFIG_NOCOOKIE	0x00000008	/* show objects without cookies */
 #define FSCACHE_OBJLIST_CONFIG_BUSY	0x00000010	/* show busy objects */
 #define FSCACHE_OBJLIST_CONFIG_IDLE	0x00000020	/* show idle objects */
 #define FSCACHE_OBJLIST_CONFIG_PENDWR	0x00000040	/* show objects with pending writes */
@@ -170,7 +168,7 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 	if ((unsigned long) v == 1) {
 		seq_puts(m, "OBJECT   PARENT   STAT CHLDN OPS OOP IPR EX READS"
 			 " EM EV FL S"
-			 " | COOKIE   NETFS_COOKIE_DEF TY FL");
+			 " | COOKIE   TYPE    TY FL");
 		if (config & (FSCACHE_OBJLIST_CONFIG_KEY |
 			      FSCACHE_OBJLIST_CONFIG_AUX))
 			seq_puts(m, "       ");
@@ -189,7 +187,7 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 	if ((unsigned long) v == 2) {
 		seq_puts(m, "======== ======== ==== ===== === === === == ====="
 			 " == == == ="
-			 " | ======== ================ == ===");
+			 " | ======== ======= == ===");
 		if (config & (FSCACHE_OBJLIST_CONFIG_KEY |
 			      FSCACHE_OBJLIST_CONFIG_AUX))
 			seq_puts(m, " ================");
@@ -213,8 +211,6 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 
 	cookie = obj->cookie;
 	if (~config) {
-		FILTER(cookie->def,
-		       COOKIE, NOCOOKIE);
 		FILTER(fscache_object_is_active(obj) ||
 		       obj->n_ops != 0 ||
 		       obj->n_obj_ops != 0 ||
@@ -263,9 +259,9 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 			break;
 		}
 
-		seq_printf(m, "%08x %-16s %s %3lx",
+		seq_printf(m, "%08x %-7s %s %3lx",
 			   cookie->debug_id,
-			   cookie->def->name,
+			   cookie->type_name,
 			   type,
 			   cookie->flags);
 
@@ -338,8 +334,6 @@ static void fscache_objlist_config(struct fscache_objlist_data *data)
 		switch (buf[len]) {
 		case 'K': config |= FSCACHE_OBJLIST_CONFIG_KEY;		break;
 		case 'A': config |= FSCACHE_OBJLIST_CONFIG_AUX;		break;
-		case 'C': config |= FSCACHE_OBJLIST_CONFIG_COOKIE;	break;
-		case 'c': config |= FSCACHE_OBJLIST_CONFIG_NOCOOKIE;	break;
 		case 'B': config |= FSCACHE_OBJLIST_CONFIG_BUSY;	break;
 		case 'b': config |= FSCACHE_OBJLIST_CONFIG_IDLE;	break;
 		case 'W': config |= FSCACHE_OBJLIST_CONFIG_PENDWR;	break;
@@ -354,8 +348,6 @@ static void fscache_objlist_config(struct fscache_objlist_data *data)
 	rcu_read_unlock();
 	key_put(key);
 
-	if (!(config & (FSCACHE_OBJLIST_CONFIG_COOKIE | FSCACHE_OBJLIST_CONFIG_NOCOOKIE)))
-	    config   |= FSCACHE_OBJLIST_CONFIG_COOKIE | FSCACHE_OBJLIST_CONFIG_NOCOOKIE;
 	if (!(config & (FSCACHE_OBJLIST_CONFIG_BUSY | FSCACHE_OBJLIST_CONFIG_IDLE)))
 	    config   |= FSCACHE_OBJLIST_CONFIG_BUSY | FSCACHE_OBJLIST_CONFIG_IDLE;
 	if (!(config & (FSCACHE_OBJLIST_CONFIG_PENDWR | FSCACHE_OBJLIST_CONFIG_NOPENDWR)))
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index 4ee9b5473c43..bb551e29454d 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -474,7 +474,7 @@ static const struct fscache_state *fscache_look_up_object(struct fscache_object
 	}
 
 	_debug("LOOKUP \"%s\" in \"%s\"",
-	       cookie->def->name, object->cache->tag->name);
+	       cookie->type_name, object->cache->tag->name);
 
 	fscache_stat(&fscache_n_object_lookups);
 	fscache_stat(&fscache_n_cop_lookup_object);
diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index 1beffb071205..fd9cc16abc18 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -50,7 +50,7 @@ int __fscache_attr_changed(struct fscache_cookie *cookie)
 
 	_enter("%p", cookie);
 
-	ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
+	ASSERTCMP(cookie->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
 
 	fscache_stat(&fscache_n_attr_changed);
 
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index a9a11dff6eae..a930532a39d0 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -42,24 +42,17 @@
 /* pattern used to fill dead space in an index entry */
 #define FSCACHE_INDEX_DEADFILL_PATTERN 0x79
 
-struct pagevec;
 struct fscache_cache_tag;
 struct fscache_cookie;
 struct fscache_netfs;
 
-/*
- * fscache cookie definition
- */
-struct fscache_cookie_def {
-	/* name of cookie type */
-	char name[16];
-
-	/* cookie type */
-	uint8_t type;
-#define FSCACHE_COOKIE_TYPE_INDEX	0
-#define FSCACHE_COOKIE_TYPE_DATAFILE	1
+enum fscache_cookie_type {
+	FSCACHE_COOKIE_TYPE_INDEX,
+	FSCACHE_COOKIE_TYPE_DATAFILE,
 };
 
+#define FSCACHE_ADV_SINGLE_CHUNK	0x01 /* The object is a single chunk of data */
+
 /*
  * fscache cached network filesystem type
  * - name, version and ops must be filled in before registration
@@ -85,11 +78,11 @@ struct fscache_cookie {
 	unsigned int			debug_id;
 	spinlock_t			lock;
 	struct hlist_head		backing_objects; /* object(s) backing this file/index */
-	const struct fscache_cookie_def	*def;		/* definition */
 	struct fscache_cookie		*parent;	/* parent of this entry */
 	struct fscache_cache_tag	*preferred_cache; /* The preferred cache or NULL */
 	struct hlist_bl_node		hash_link;	/* Link in hash table */
 	struct list_head		proc_link;	/* Link in proc list */
+	char				type_name[8];	/* Cookie type name */
 
 	unsigned long			flags;
 #define FSCACHE_COOKIE_LOOKING_UP	0	/* T if non-index cookie being looked up still */
@@ -103,7 +96,8 @@ struct fscache_cookie {
 #define FSCACHE_COOKIE_ACQUIRED		9	/* T if cookie is in use */
 #define FSCACHE_COOKIE_RELINQUISHING	10	/* T if cookie is being relinquished */
 
-	u8				type;		/* Type of object */
+	enum fscache_cookie_type	type:8;
+	u8				advice;		/* FSCACHE_ADV_* */
 	u8				key_len;	/* Length of index key */
 	u8				aux_len;	/* Length of auxiliary data */
 	u32				key_hash;	/* Hash of parent, type, key, len */
@@ -136,7 +130,9 @@ extern void __fscache_release_cache_tag(struct fscache_cache_tag *);
 
 extern struct fscache_cookie *__fscache_acquire_cookie(
 	struct fscache_cookie *,
-	const struct fscache_cookie_def *,
+	enum fscache_cookie_type,
+	const char *,
+	u8,
 	struct fscache_cache_tag *,
 	const void *, size_t,
 	const void *, size_t,
@@ -225,7 +221,9 @@ void fscache_release_cache_tag(struct fscache_cache_tag *tag)
 /**
  * fscache_acquire_cookie - Acquire a cookie to represent a cache object
  * @parent: The cookie that's to be the parent of this one
- * @def: A description of the cache object, including callback operations
+ * @type: Type of the cookie
+ * @type_name: Name of cookie type (max 7 chars)
+ * @advice: Advice flags (FSCACHE_ADV_*)
  * @preferred_cache: The cache to use (or NULL)
  * @index_key: The index key for this cookie
  * @index_key_len: Size of the index key
@@ -246,7 +244,9 @@ void fscache_release_cache_tag(struct fscache_cache_tag *tag)
 static inline
 struct fscache_cookie *fscache_acquire_cookie(
 	struct fscache_cookie *parent,
-	const struct fscache_cookie_def *def,
+	enum fscache_cookie_type type,
+	const char *type_name,
+	u8 advice,
 	struct fscache_cache_tag *preferred_cache,
 	const void *index_key,
 	size_t index_key_len,
@@ -256,7 +256,8 @@ struct fscache_cookie *fscache_acquire_cookie(
 	bool enable)
 {
 	if (fscache_cookie_valid(parent) && fscache_cookie_enabled(parent))
-		return __fscache_acquire_cookie(parent, def, preferred_cache,
+		return __fscache_acquire_cookie(parent, type, type_name, advice,
+						preferred_cache,
 						index_key, index_key_len,
 						aux_data, aux_data_len,
 						object_size, enable);
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 0b9e058aba4d..953e15fad063 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -233,7 +233,7 @@ TRACE_EVENT(fscache_acquire,
 		    __entry->p_usage		= atomic_read(&cookie->parent->usage);
 		    __entry->p_n_children	= atomic_read(&cookie->parent->n_children);
 		    __entry->p_flags		= cookie->parent->flags;
-		    memcpy(__entry->name, cookie->def->name, 8);
+		    memcpy(__entry->name, cookie->type_name, 8);
 		    __entry->name[7]		= 0;
 			   ),
 




* [RFC PATCH 08/76] fscache: Remove store_limit* from struct fscache_object
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (6 preceding siblings ...)
  2020-11-20 15:04 ` [RFC PATCH 07/76] fscache: Remove struct fscache_cookie_def David Howells
@ 2020-11-20 15:04 ` David Howells
  2020-11-20 15:04 ` [RFC PATCH 09/76] fscache: Remove fscache_check_consistency() David Howells
                   ` (66 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:04 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Remove the store_limit values from struct fscache_object and store the
object size in the cookie instead.  The netfs can update the size at will,
and we don't want to have to call back into the netfs to fetch it.
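
The effect at a backend call site, condensed from the cachefiles hunks
below:

	/* Before: the backend kept a shadow copy, set through a helper... */
	fscache_set_store_limit(&object->fscache, ni_size);
	/* ...and read it back when resizing or invalidating: */
	ni_size = _object->store_limit_l;

	/* After: the size lives in one place, on the cookie, which the
	 * netfs can update directly */
	ni_size = _object->cookie->object_size;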

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c     |   10 ++--------
 fs/fscache/cookie.c           |   14 ++++++--------
 fs/fscache/object.c           |    2 --
 include/linux/fscache-cache.h |   22 ----------------------
 include/linux/fscache.h       |    1 +
 5 files changed, 9 insertions(+), 40 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index b7aa5c733cb7..a5d348581bcc 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -359,7 +359,7 @@ static int cachefiles_attr_changed(struct fscache_object *_object)
 	loff_t oi_size;
 	int ret;
 
-	ni_size = _object->store_limit_l;
+	ni_size = _object->cookie->object_size;
 
 	_enter("{OBJ%x},[%llu]",
 	       _object->debug_id, (unsigned long long) ni_size);
@@ -376,8 +376,6 @@ static int cachefiles_attr_changed(struct fscache_object *_object)
 
 	ASSERT(d_is_reg(object->backer));
 
-	fscache_set_store_limit(&object->fscache, ni_size);
-
 	oi_size = i_size_read(d_backing_inode(object->backer));
 	if (oi_size == ni_size)
 		return 0;
@@ -406,7 +404,6 @@ static int cachefiles_attr_changed(struct fscache_object *_object)
 	cachefiles_end_secure(cache, saved_cred);
 
 	if (ret == -EIO) {
-		fscache_set_store_limit(&object->fscache, 0);
 		cachefiles_io_error_obj(object, "Size set failed");
 		ret = -ENOBUFS;
 	}
@@ -431,7 +428,7 @@ static void cachefiles_invalidate_object(struct fscache_operation *op)
 	cache = container_of(object->fscache.cache,
 			     struct cachefiles_cache, cache);
 
-	ni_size = op->object->store_limit_l;
+	ni_size = op->object->cookie->object_size;
 
 	_enter("{OBJ%x},[%llu]",
 	       op->object->debug_id, (unsigned long long)ni_size);
@@ -439,8 +436,6 @@ static void cachefiles_invalidate_object(struct fscache_operation *op)
 	if (object->backer) {
 		ASSERT(d_is_reg(object->backer));
 
-		fscache_set_store_limit(&object->fscache, ni_size);
-
 		path.dentry = object->backer;
 		path.mnt = cache->mnt;
 
@@ -451,7 +446,6 @@ static void cachefiles_invalidate_object(struct fscache_operation *op)
 		cachefiles_end_secure(cache, saved_cred);
 
 		if (ret != 0) {
-			fscache_set_store_limit(&object->fscache, 0);
 			if (ret == -EIO)
 				cachefiles_io_error_obj(object,
 							"Invalidate failed");
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 32c070c929b9..7811fe935b47 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -22,8 +22,7 @@ static struct hlist_bl_head fscache_cookie_hash[1 << fscache_cookie_hash_shift];
 static LIST_HEAD(fscache_cookies);
 static DEFINE_RWLOCK(fscache_cookies_lock);
 
-static int fscache_acquire_non_index_cookie(struct fscache_cookie *cookie,
-					    loff_t object_size);
+static int fscache_acquire_non_index_cookie(struct fscache_cookie *cookie);
 static int fscache_alloc_object(struct fscache_cache *cache,
 				struct fscache_cookie *cookie);
 static int fscache_attach_object(struct fscache_cookie *cookie,
@@ -167,6 +166,7 @@ struct fscache_cookie *fscache_alloc_cookie(
 	cookie->advice = advice;
 	cookie->key_len = index_key_len;
 	cookie->aux_len = aux_data_len;
+	cookie->object_size = object_size;
 	strlcpy(cookie->type_name, type_name, sizeof(cookie->type_name));
 
 	if (fscache_set_key(cookie, index_key, index_key_len) < 0)
@@ -337,7 +337,7 @@ struct fscache_cookie *__fscache_acquire_cookie(
 		 * - we create indices on disk when we need them as an index
 		 * may exist in multiple caches */
 		if (cookie->type != FSCACHE_COOKIE_TYPE_INDEX) {
-			if (fscache_acquire_non_index_cookie(cookie, object_size) == 0) {
+			if (fscache_acquire_non_index_cookie(cookie) == 0) {
 				set_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags);
 			} else {
 				atomic_dec(&parent->n_children);
@@ -376,6 +376,7 @@ void __fscache_enable_cookie(struct fscache_cookie *cookie,
 	wait_on_bit_lock(&cookie->flags, FSCACHE_COOKIE_ENABLEMENT_LOCK,
 			 TASK_UNINTERRUPTIBLE);
 
+	cookie->object_size = object_size;
 	fscache_update_aux(cookie, aux_data);
 
 	if (test_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags))
@@ -387,7 +388,7 @@ void __fscache_enable_cookie(struct fscache_cookie *cookie,
 		/* Wait for outstanding disablement to complete */
 		__fscache_wait_on_invalidate(cookie);
 
-		if (fscache_acquire_non_index_cookie(cookie, object_size) == 0)
+		if (fscache_acquire_non_index_cookie(cookie) == 0)
 			set_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags);
 	} else {
 		set_bit(FSCACHE_COOKIE_ENABLED, &cookie->flags);
@@ -404,8 +405,7 @@ EXPORT_SYMBOL(__fscache_enable_cookie);
  * - this must make sure the index chain is instantiated and instantiate the
  *   object representation too
  */
-static int fscache_acquire_non_index_cookie(struct fscache_cookie *cookie,
-					    loff_t object_size)
+static int fscache_acquire_non_index_cookie(struct fscache_cookie *cookie)
 {
 	struct fscache_object *object;
 	struct fscache_cache *cache;
@@ -456,8 +456,6 @@ static int fscache_acquire_non_index_cookie(struct fscache_cookie *cookie,
 	object = hlist_entry(cookie->backing_objects.first,
 			     struct fscache_object, cookie_link);
 
-	fscache_set_store_limit(object, object_size);
-
 	/* initiate the process of looking up all the objects in the chain
 	 * (done by fscache_initialise_object()) */
 	fscache_raise_event(object, FSCACHE_OBJECT_EV_NEW_CHILD);
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index bb551e29454d..e2e04ac2c0a1 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -319,8 +319,6 @@ void fscache_object_init(struct fscache_object *object,
 	object->n_children = 0;
 	object->n_ops = object->n_in_progress = object->n_exclusive = 0;
 	object->events = 0;
-	object->store_limit = 0;
-	object->store_limit_l = 0;
 	object->cache = cache;
 	object->cookie = cookie;
 	fscache_cookie_get(cookie, fscache_cookie_get_attach_object);
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 0b395bcb4b16..fd07843e3d69 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -274,8 +274,6 @@ struct fscache_object {
 #ifdef CONFIG_FSCACHE_OBJECT_LIST
 	struct rb_node		objlist_link;	/* link in global object list */
 #endif
-	pgoff_t			store_limit;	/* current storage limit */
-	loff_t			store_limit_l;	/* current storage limit */
 };
 
 extern void fscache_object_init(struct fscache_object *, struct fscache_cookie *,
@@ -336,26 +334,6 @@ static inline void fscache_object_lookup_error(struct fscache_object *object)
 	set_bit(FSCACHE_OBJECT_EV_ERROR, &object->events);
 }
 
-/**
- * fscache_set_store_limit - Set the maximum size to be stored in an object
- * @object: The object to set the maximum on
- * @i_size: The limit to set in bytes
- *
- * Set the maximum size an object is permitted to reach, implying the highest
- * byte that may be written.  Intended to be called by the attr_changed() op.
- *
- * See Documentation/filesystems/caching/backend-api.rst for a complete
- * description.
- */
-static inline
-void fscache_set_store_limit(struct fscache_object *object, loff_t i_size)
-{
-	object->store_limit_l = i_size;
-	object->store_limit = i_size >> PAGE_SHIFT;
-	if (i_size & ~PAGE_MASK)
-		object->store_limit++;
-}
-
 static inline void __fscache_use_cookie(struct fscache_cookie *cookie)
 {
 	atomic_inc(&cookie->n_active);
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index a930532a39d0..f855f6d04667 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -83,6 +83,7 @@ struct fscache_cookie {
 	struct hlist_bl_node		hash_link;	/* Link in hash table */
 	struct list_head		proc_link;	/* Link in proc list */
 	char				type_name[8];	/* Cookie type name */
+	loff_t				object_size;	/* Size of the netfs object */
 
 	unsigned long			flags;
 #define FSCACHE_COOKIE_LOOKING_UP	0	/* T if non-index cookie being looked up still */




* [RFC PATCH 09/76] fscache: Remove fscache_check_consistency()
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (7 preceding siblings ...)
  2020-11-20 15:04 ` [RFC PATCH 08/76] fscache: Remove store_limit* from struct fscache_object David Howells
@ 2020-11-20 15:04 ` David Howells
  2020-11-20 15:04 ` [RFC PATCH 10/76] fscache: Remove fscache_attr_changed() David Howells
                   ` (65 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:04 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Remove fscache_check_consistency() as it allows the netfs to pry into the
inner workings of the cache - and what's in the cookie should be taken as
consistent with the disk (possibly lazily).
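
For reference, the netfs-side pattern this removes looked roughly like the
following (hypothetical caller; "ei" is an invented inode wrapper):

    /* Old pattern, now gone: ask the cache to revalidate the auxdata. */
    int ret = fscache_check_consistency(ei->fscache_cookie, &ei->aux_data);
    if (ret == -ESTALE)
            fscache_invalidate(ei->fscache_cookie);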

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c     |   26 -------------
 fs/fscache/cookie.c           |   79 ----------------------------------------
 fs/fscache/internal.h         |    9 -----
 fs/fscache/page.c             |   82 -----------------------------------------
 include/linux/fscache-cache.h |    4 --
 include/linux/fscache.h       |   23 ------------
 6 files changed, 223 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index a5d348581bcc..a3837ed090a8 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -320,31 +320,6 @@ static void cachefiles_sync_cache(struct fscache_cache *_cache)
 				    ret);
 }
 
-/*
- * check if the backing cache is updated to FS-Cache
- * - called by FS-Cache when evaluates if need to invalidate the cache
- */
-static int cachefiles_check_consistency(struct fscache_operation *op)
-{
-	struct cachefiles_object *object;
-	struct cachefiles_cache *cache;
-	const struct cred *saved_cred;
-	int ret;
-
-	_enter("{OBJ%x}", op->object->debug_id);
-
-	object = container_of(op->object, struct cachefiles_object, fscache);
-	cache = container_of(object->fscache.cache,
-			     struct cachefiles_cache, cache);
-
-	cachefiles_begin_secure(cache, &saved_cred);
-	ret = cachefiles_check_auxdata(object);
-	cachefiles_end_secure(cache, saved_cred);
-
-	_leave(" = %d", ret);
-	return ret;
-}
-
 /*
  * notification the attributes on an object have changed
  * - called with reads/writes excluded by FS-Cache
@@ -468,5 +443,4 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
 	.put_object		= cachefiles_put_object,
 	.sync_cache		= cachefiles_sync_cache,
 	.attr_changed		= cachefiles_attr_changed,
-	.check_consistency	= cachefiles_check_consistency,
 };
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 7811fe935b47..a50fd9e384a6 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -883,85 +883,6 @@ void fscache_cookie_put(struct fscache_cookie *cookie,
 	_leave("");
 }
 
-/*
- * check the consistency between the netfs inode and the backing cache
- *
- * NOTE: it only serves no-index type
- */
-int __fscache_check_consistency(struct fscache_cookie *cookie,
-				const void *aux_data)
-{
-	struct fscache_operation *op;
-	struct fscache_object *object;
-	bool wake_cookie = false;
-	int ret;
-
-	_enter("%p,", cookie);
-
-	ASSERTCMP(cookie->type, ==, FSCACHE_COOKIE_TYPE_DATAFILE);
-
-	if (fscache_wait_for_deferred_lookup(cookie) < 0)
-		return -ERESTARTSYS;
-
-	if (hlist_empty(&cookie->backing_objects))
-		return 0;
-
-	op = kzalloc(sizeof(*op), GFP_NOIO | __GFP_NOMEMALLOC | __GFP_NORETRY);
-	if (!op)
-		return -ENOMEM;
-
-	fscache_operation_init(cookie, op, NULL, NULL, NULL);
-	op->flags = FSCACHE_OP_MYTHREAD |
-		(1 << FSCACHE_OP_WAITING) |
-		(1 << FSCACHE_OP_UNUSE_COOKIE);
-	trace_fscache_page_op(cookie, NULL, op, fscache_page_op_check_consistency);
-
-	spin_lock(&cookie->lock);
-
-	fscache_update_aux(cookie, aux_data);
-
-	if (!fscache_cookie_enabled(cookie) ||
-	    hlist_empty(&cookie->backing_objects))
-		goto inconsistent;
-	object = hlist_entry(cookie->backing_objects.first,
-			     struct fscache_object, cookie_link);
-	if (test_bit(FSCACHE_IOERROR, &object->cache->flags))
-		goto inconsistent;
-
-	op->debug_id = atomic_inc_return(&fscache_op_debug_id);
-
-	__fscache_use_cookie(cookie);
-	if (fscache_submit_op(object, op) < 0)
-		goto submit_failed;
-
-	/* the work queue now carries its own ref on the object */
-	spin_unlock(&cookie->lock);
-
-	ret = fscache_wait_for_operation_activation(object, op, NULL, NULL);
-	if (ret == 0) {
-		/* ask the cache to honour the operation */
-		ret = object->cache->ops->check_consistency(op);
-		fscache_op_complete(op, false);
-	} else if (ret == -ENOBUFS) {
-		ret = 0;
-	}
-
-	fscache_put_operation(op);
-	_leave(" = %d", ret);
-	return ret;
-
-submit_failed:
-	wake_cookie = __fscache_unuse_cookie(cookie);
-inconsistent:
-	spin_unlock(&cookie->lock);
-	if (wake_cookie)
-		__fscache_wake_unused_cookie(cookie);
-	kfree(op);
-	_leave(" = -ESTALE");
-	return -ESTALE;
-}
-EXPORT_SYMBOL(__fscache_check_consistency);
-
 /*
  * Generate a list of extant cookies in /proc/fs/fscache/cookies
  */
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index bc66bf7182ed..20cbd1288b5a 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -151,15 +151,6 @@ extern void fscache_abort_object(struct fscache_object *);
 extern void fscache_start_operations(struct fscache_object *);
 extern void fscache_operation_gc(struct work_struct *);
 
-/*
- * page.c
- */
-extern int fscache_wait_for_deferred_lookup(struct fscache_cookie *);
-extern int fscache_wait_for_operation_activation(struct fscache_object *,
-						 struct fscache_operation *,
-						 atomic_t *,
-						 atomic_t *);
-
 /*
  * proc.c
  */
diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index fd9cc16abc18..73636e9d652d 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -96,85 +96,3 @@ int __fscache_attr_changed(struct fscache_cookie *cookie)
 	return -ENOBUFS;
 }
 EXPORT_SYMBOL(__fscache_attr_changed);
-
-/*
- * wait for a deferred lookup to complete
- */
-int fscache_wait_for_deferred_lookup(struct fscache_cookie *cookie)
-{
-	unsigned long jif;
-
-	_enter("");
-
-	if (!test_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags)) {
-		_leave(" = 0 [imm]");
-		return 0;
-	}
-
-	fscache_stat(&fscache_n_retrievals_wait);
-
-	jif = jiffies;
-	if (wait_on_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP,
-			TASK_INTERRUPTIBLE) != 0) {
-		fscache_stat(&fscache_n_retrievals_intr);
-		_leave(" = -ERESTARTSYS");
-		return -ERESTARTSYS;
-	}
-
-	ASSERT(!test_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags));
-
-	smp_rmb();
-	fscache_hist(fscache_retrieval_delay_histogram, jif);
-	_leave(" = 0 [dly]");
-	return 0;
-}
-
-/*
- * wait for an object to become active (or dead)
- */
-int fscache_wait_for_operation_activation(struct fscache_object *object,
-					  struct fscache_operation *op,
-					  atomic_t *stat_op_waits,
-					  atomic_t *stat_object_dead)
-{
-	int ret;
-
-	if (!test_bit(FSCACHE_OP_WAITING, &op->flags))
-		goto check_if_dead;
-
-	_debug(">>> WT");
-	if (stat_op_waits)
-		fscache_stat(stat_op_waits);
-	if (wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
-			TASK_INTERRUPTIBLE) != 0) {
-		trace_fscache_op(object->cookie, op, fscache_op_signal);
-		ret = fscache_cancel_op(op, false);
-		if (ret == 0)
-			return -ERESTARTSYS;
-
-		/* it's been removed from the pending queue by another party,
-		 * so we should get to run shortly */
-		wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
-			    TASK_UNINTERRUPTIBLE);
-	}
-	_debug("<<< GO");
-
-check_if_dead:
-	if (op->state == FSCACHE_OP_ST_CANCELLED) {
-		if (stat_object_dead)
-			fscache_stat(stat_object_dead);
-		_leave(" = -ENOBUFS [cancelled]");
-		return -ENOBUFS;
-	}
-	if (unlikely(fscache_object_is_dying(object) ||
-		     fscache_cache_is_broken(object))) {
-		enum fscache_operation_state state = op->state;
-		trace_fscache_op(object->cookie, op, fscache_op_signal);
-		fscache_cancel_op(op, true);
-		if (stat_object_dead)
-			fscache_stat(stat_object_dead);
-		_leave(" = -ENOBUFS [obj dead %d]", state);
-		return -ENOBUFS;
-	}
-	return 0;
-}
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index fd07843e3d69..bd9411cd466f 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -167,10 +167,6 @@ struct fscache_cache_ops {
 	/* unpin an object in the cache */
 	void (*unpin_object)(struct fscache_object *object);
 
-	/* check the consistency between the backing cache and the FS-Cache
-	 * cookie */
-	int (*check_consistency)(struct fscache_operation *op);
-
 	/* store the updated auxiliary data on an object */
 	void (*update_object)(struct fscache_object *object);
 
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index f855f6d04667..76cfec6868de 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -139,7 +139,6 @@ extern struct fscache_cookie *__fscache_acquire_cookie(
 	const void *, size_t,
 	loff_t, bool);
 extern void __fscache_relinquish_cookie(struct fscache_cookie *, const void *, bool);
-extern int __fscache_check_consistency(struct fscache_cookie *, const void *);
 extern void __fscache_update_cookie(struct fscache_cookie *, const void *);
 extern int __fscache_attr_changed(struct fscache_cookie *);
 extern void __fscache_invalidate(struct fscache_cookie *);
@@ -290,28 +289,6 @@ void fscache_relinquish_cookie(struct fscache_cookie *cookie,
 		__fscache_relinquish_cookie(cookie, aux_data, retire);
 }
 
-/**
- * fscache_check_consistency - Request validation of a cache's auxiliary data
- * @cookie: The cookie representing the cache object
- * @aux_data: The updated auxiliary data for the cookie (may be NULL)
- *
- * Request an consistency check from fscache, which passes the request to the
- * backing cache.  The auxiliary data on the cookie will be updated first if
- * @aux_data is set.
- *
- * Returns 0 if consistent and -ESTALE if inconsistent.  May also
- * return -ENOMEM and -ERESTARTSYS.
- */
-static inline
-int fscache_check_consistency(struct fscache_cookie *cookie,
-			      const void *aux_data)
-{
-	if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie))
-		return __fscache_check_consistency(cookie, aux_data);
-	else
-		return 0;
-}
-
 /**
  * fscache_update_cookie - Request that a cache object be updated
  * @cookie: The cookie representing the cache object




* [RFC PATCH 10/76] fscache: Remove fscache_attr_changed()
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (8 preceding siblings ...)
  2020-11-20 15:04 ` [RFC PATCH 09/76] fscache: Remove fscache_check_consistency() David Howells
@ 2020-11-20 15:04 ` David Howells
  2020-11-20 15:04 ` [RFC PATCH 11/76] fscache: Remove obsolete stats David Howells
                   ` (64 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:04 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Remove fscache_attr_changed() as it's now unused; cachefiles still adjusts
the backing file's attributes itself when it looks up an object.
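
The removed netfs-facing call looked roughly like this (hypothetical caller,
as used on truncate or setattr; error handling elided):

    /* Old pattern: tell the cache that i_size has changed. */
    int ret = fscache_attr_changed(cookie);
    if (ret == -ENOBUFS)
            ret = 0;        /* no backing object; nothing to resize */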

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c     |   13 ++---
 fs/fscache/Makefile           |    3 -
 fs/fscache/page.c             |   98 -----------------------------------------
 include/linux/fscache-cache.h |    4 --
 include/linux/fscache.h       |   21 ---------
 5 files changed, 6 insertions(+), 133 deletions(-)
 delete mode 100644 fs/fscache/page.c

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index a3837ed090a8..81322e3acadd 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -10,7 +10,7 @@
 #include <linux/xattr.h>
 #include "internal.h"
 
-static int cachefiles_attr_changed(struct fscache_object *_object);
+static int cachefiles_attr_changed(struct cachefiles_object *object);
 
 /*
  * allocate an object record for a cookie lookup and prepare the lookup data
@@ -110,7 +110,7 @@ static int cachefiles_lookup_object(struct fscache_object *_object)
 	/* polish off by setting the attributes of non-index files */
 	if (ret == 0 &&
 	    object->fscache.cookie->type != FSCACHE_COOKIE_TYPE_INDEX)
-		cachefiles_attr_changed(&object->fscache);
+		cachefiles_attr_changed(object);
 
 	if (ret < 0 && ret != -ETIMEDOUT) {
 		if (ret != -ENOBUFS)
@@ -324,9 +324,8 @@ static void cachefiles_sync_cache(struct fscache_cache *_cache)
  * notification the attributes on an object have changed
  * - called with reads/writes excluded by FS-Cache
  */
-static int cachefiles_attr_changed(struct fscache_object *_object)
+static int cachefiles_attr_changed(struct cachefiles_object *object)
 {
-	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
 	const struct cred *saved_cred;
 	struct iattr newattrs;
@@ -334,12 +333,11 @@ static int cachefiles_attr_changed(struct fscache_object *_object)
 	loff_t oi_size;
 	int ret;
 
-	ni_size = _object->cookie->object_size;
+	ni_size = object->fscache.cookie->object_size;
 
 	_enter("{OBJ%x},[%llu]",
-	       _object->debug_id, (unsigned long long) ni_size);
+	       object->fscache.debug_id, (unsigned long long) ni_size);
 
-	object = container_of(_object, struct cachefiles_object, fscache);
 	cache = container_of(object->fscache.cache,
 			     struct cachefiles_cache, cache);
 
@@ -442,5 +440,4 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
 	.drop_object		= cachefiles_drop_object,
 	.put_object		= cachefiles_put_object,
 	.sync_cache		= cachefiles_sync_cache,
-	.attr_changed		= cachefiles_attr_changed,
 };
diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
index 79e08e05ef84..565a3441d31d 100644
--- a/fs/fscache/Makefile
+++ b/fs/fscache/Makefile
@@ -10,8 +10,7 @@ fscache-y := \
 	main.o \
 	netfs.o \
 	object.o \
-	operation.o \
-	page.o
+	operation.o
 
 fscache-$(CONFIG_PROC_FS) += proc.o
 fscache-$(CONFIG_FSCACHE_STATS) += stats.o
diff --git a/fs/fscache/page.c b/fs/fscache/page.c
deleted file mode 100644
index 73636e9d652d..000000000000
--- a/fs/fscache/page.c
+++ /dev/null
@@ -1,98 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/* Cache page management and data I/O routines
- *
- * Copyright (C) 2004-2008 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- */
-
-#define FSCACHE_DEBUG_LEVEL PAGE
-#include <linux/module.h>
-#include <linux/fscache-cache.h>
-#include <linux/buffer_head.h>
-#include <linux/pagevec.h>
-#include <linux/slab.h>
-#include "internal.h"
-
-/*
- * actually apply the changed attributes to a cache object
- */
-static void fscache_attr_changed_op(struct fscache_operation *op)
-{
-	struct fscache_object *object = op->object;
-	int ret;
-
-	_enter("{OBJ%x OP%x}", object->debug_id, op->debug_id);
-
-	fscache_stat(&fscache_n_attr_changed_calls);
-
-	if (fscache_object_is_active(object)) {
-		fscache_stat(&fscache_n_cop_attr_changed);
-		ret = object->cache->ops->attr_changed(object);
-		fscache_stat_d(&fscache_n_cop_attr_changed);
-		if (ret < 0)
-			fscache_abort_object(object);
-		fscache_op_complete(op, ret < 0);
-	} else {
-		fscache_op_complete(op, true);
-	}
-
-	_leave("");
-}
-
-/*
- * notification that the attributes on an object have changed
- */
-int __fscache_attr_changed(struct fscache_cookie *cookie)
-{
-	struct fscache_operation *op;
-	struct fscache_object *object;
-	bool wake_cookie = false;
-
-	_enter("%p", cookie);
-
-	ASSERTCMP(cookie->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
-
-	fscache_stat(&fscache_n_attr_changed);
-
-	op = kzalloc(sizeof(*op), GFP_KERNEL);
-	if (!op) {
-		fscache_stat(&fscache_n_attr_changed_nomem);
-		_leave(" = -ENOMEM");
-		return -ENOMEM;
-	}
-
-	fscache_operation_init(cookie, op, fscache_attr_changed_op, NULL, NULL);
-	trace_fscache_page_op(cookie, NULL, op, fscache_page_op_attr_changed);
-	op->flags = FSCACHE_OP_ASYNC |
-		(1 << FSCACHE_OP_EXCLUSIVE) |
-		(1 << FSCACHE_OP_UNUSE_COOKIE);
-
-	spin_lock(&cookie->lock);
-
-	if (!fscache_cookie_enabled(cookie) ||
-	    hlist_empty(&cookie->backing_objects))
-		goto nobufs;
-	object = hlist_entry(cookie->backing_objects.first,
-			     struct fscache_object, cookie_link);
-
-	__fscache_use_cookie(cookie);
-	if (fscache_submit_exclusive_op(object, op) < 0)
-		goto nobufs_dec;
-	spin_unlock(&cookie->lock);
-	fscache_stat(&fscache_n_attr_changed_ok);
-	fscache_put_operation(op);
-	_leave(" = 0");
-	return 0;
-
-nobufs_dec:
-	wake_cookie = __fscache_unuse_cookie(cookie);
-nobufs:
-	spin_unlock(&cookie->lock);
-	fscache_put_operation(op);
-	if (wake_cookie)
-		__fscache_wake_unused_cookie(cookie);
-	fscache_stat(&fscache_n_attr_changed_nobufs);
-	_leave(" = %d", -ENOBUFS);
-	return -ENOBUFS;
-}
-EXPORT_SYMBOL(__fscache_attr_changed);
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index bd9411cd466f..60b2f8288668 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -184,10 +184,6 @@ struct fscache_cache_ops {
 	/* sync a cache */
 	void (*sync_cache)(struct fscache_cache *cache);
 
-	/* notification that the attributes of a non-index object (such as
-	 * i_size) have changed */
-	int (*attr_changed)(struct fscache_object *object);
-
 	/* reserve space for an object's data and associated metadata */
 	int (*reserve_space)(struct fscache_object *object, loff_t i_size);
 };
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 76cfec6868de..c62b62938c4d 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -140,7 +140,6 @@ extern struct fscache_cookie *__fscache_acquire_cookie(
 	loff_t, bool);
 extern void __fscache_relinquish_cookie(struct fscache_cookie *, const void *, bool);
 extern void __fscache_update_cookie(struct fscache_cookie *, const void *);
-extern int __fscache_attr_changed(struct fscache_cookie *);
 extern void __fscache_invalidate(struct fscache_cookie *);
 extern void __fscache_wait_on_invalidate(struct fscache_cookie *);
 extern void __fscache_disable_cookie(struct fscache_cookie *, const void *, bool);
@@ -337,26 +336,6 @@ void fscache_unpin_cookie(struct fscache_cookie *cookie)
 {
 }
 
-/**
- * fscache_attr_changed - Notify cache that an object's attributes changed
- * @cookie: The cookie representing the cache object
- *
- * Send a notification to the cache indicating that an object's attributes have
- * changed.  This includes the data size.  These attributes will be obtained
- * through the get_attr() cookie definition op.
- *
- * See Documentation/filesystems/caching/netfs-api.rst for a complete
- * description.
- */
-static inline
-int fscache_attr_changed(struct fscache_cookie *cookie)
-{
-	if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie))
-		return __fscache_attr_changed(cookie);
-	else
-		return -ENOBUFS;
-}
-
 /**
  * fscache_invalidate - Notify cache that an object needs invalidation
  * @cookie: The cookie representing the cache object




* [RFC PATCH 11/76] fscache: Remove obsolete stats
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (9 preceding siblings ...)
  2020-11-20 15:04 ` [RFC PATCH 10/76] fscache: Remove fscache_attr_changed() David Howells
@ 2020-11-20 15:04 ` David Howells
  2020-11-20 15:05 ` [RFC PATCH 12/76] fscache: Remove old I/O tracepoints David Howells
                   ` (63 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:04 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Remove a bunch of now-unused fscache stats counters that were obsoleted by
the removal of the old I/O routines.
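
After this change the Relinqs line in /proc/fs/fscache/stats drops its wcr=
field, e.g. (counts illustrative):

    Relinqs: n=37 nul=0 rtr=2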

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/internal.h |   61 -----------------------
 fs/fscache/stats.c    |  129 -------------------------------------------------
 2 files changed, 1 insertion(+), 189 deletions(-)

diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 20cbd1288b5a..360137fd19a7 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -166,9 +166,6 @@ extern void fscache_proc_cleanup(void);
  * stats.c
  */
 #ifdef CONFIG_FSCACHE_STATS
-extern atomic_t fscache_n_ops_processed[FSCACHE_MAX_THREADS];
-extern atomic_t fscache_n_objs_processed[FSCACHE_MAX_THREADS];
-
 extern atomic_t fscache_n_op_pend;
 extern atomic_t fscache_n_op_run;
 extern atomic_t fscache_n_op_enqueue;
@@ -179,52 +176,6 @@ extern atomic_t fscache_n_op_gc;
 extern atomic_t fscache_n_op_cancelled;
 extern atomic_t fscache_n_op_rejected;
 
-extern atomic_t fscache_n_attr_changed;
-extern atomic_t fscache_n_attr_changed_ok;
-extern atomic_t fscache_n_attr_changed_nobufs;
-extern atomic_t fscache_n_attr_changed_nomem;
-extern atomic_t fscache_n_attr_changed_calls;
-
-extern atomic_t fscache_n_allocs;
-extern atomic_t fscache_n_allocs_ok;
-extern atomic_t fscache_n_allocs_wait;
-extern atomic_t fscache_n_allocs_nobufs;
-extern atomic_t fscache_n_allocs_intr;
-extern atomic_t fscache_n_allocs_object_dead;
-extern atomic_t fscache_n_alloc_ops;
-extern atomic_t fscache_n_alloc_op_waits;
-
-extern atomic_t fscache_n_retrievals;
-extern atomic_t fscache_n_retrievals_ok;
-extern atomic_t fscache_n_retrievals_wait;
-extern atomic_t fscache_n_retrievals_nodata;
-extern atomic_t fscache_n_retrievals_nobufs;
-extern atomic_t fscache_n_retrievals_intr;
-extern atomic_t fscache_n_retrievals_nomem;
-extern atomic_t fscache_n_retrievals_object_dead;
-extern atomic_t fscache_n_retrieval_ops;
-extern atomic_t fscache_n_retrieval_op_waits;
-
-extern atomic_t fscache_n_stores;
-extern atomic_t fscache_n_stores_ok;
-extern atomic_t fscache_n_stores_again;
-extern atomic_t fscache_n_stores_nobufs;
-extern atomic_t fscache_n_stores_oom;
-extern atomic_t fscache_n_store_ops;
-extern atomic_t fscache_n_store_calls;
-extern atomic_t fscache_n_store_pages;
-extern atomic_t fscache_n_store_radix_deletes;
-extern atomic_t fscache_n_store_pages_over_limit;
-
-extern atomic_t fscache_n_store_vmscan_not_storing;
-extern atomic_t fscache_n_store_vmscan_gone;
-extern atomic_t fscache_n_store_vmscan_busy;
-extern atomic_t fscache_n_store_vmscan_cancelled;
-extern atomic_t fscache_n_store_vmscan_wait;
-
-extern atomic_t fscache_n_marks;
-extern atomic_t fscache_n_uncaches;
-
 extern atomic_t fscache_n_acquires;
 extern atomic_t fscache_n_acquires_null;
 extern atomic_t fscache_n_acquires_no_cache;
@@ -241,7 +192,6 @@ extern atomic_t fscache_n_updates_run;
 
 extern atomic_t fscache_n_relinquishes;
 extern atomic_t fscache_n_relinquishes_null;
-extern atomic_t fscache_n_relinquishes_waitcrt;
 extern atomic_t fscache_n_relinquishes_retire;
 
 extern atomic_t fscache_n_cookie_index;
@@ -258,11 +208,6 @@ extern atomic_t fscache_n_object_created;
 extern atomic_t fscache_n_object_avail;
 extern atomic_t fscache_n_object_dead;
 
-extern atomic_t fscache_n_checkaux_none;
-extern atomic_t fscache_n_checkaux_okay;
-extern atomic_t fscache_n_checkaux_update;
-extern atomic_t fscache_n_checkaux_obsolete;
-
 extern atomic_t fscache_n_cop_alloc_object;
 extern atomic_t fscache_n_cop_lookup_object;
 extern atomic_t fscache_n_cop_lookup_complete;
@@ -273,12 +218,6 @@ extern atomic_t fscache_n_cop_drop_object;
 extern atomic_t fscache_n_cop_put_object;
 extern atomic_t fscache_n_cop_sync_cache;
 extern atomic_t fscache_n_cop_attr_changed;
-extern atomic_t fscache_n_cop_read_or_alloc_page;
-extern atomic_t fscache_n_cop_read_or_alloc_pages;
-extern atomic_t fscache_n_cop_allocate_page;
-extern atomic_t fscache_n_cop_allocate_pages;
-extern atomic_t fscache_n_cop_write_page;
-extern atomic_t fscache_n_cop_uncache_page;
 
 extern atomic_t fscache_n_cache_no_space_reject;
 extern atomic_t fscache_n_cache_stale_objects;
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 281022871e70..5b1cec456199 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -24,52 +24,6 @@ atomic_t fscache_n_op_gc;
 atomic_t fscache_n_op_cancelled;
 atomic_t fscache_n_op_rejected;
 
-atomic_t fscache_n_attr_changed;
-atomic_t fscache_n_attr_changed_ok;
-atomic_t fscache_n_attr_changed_nobufs;
-atomic_t fscache_n_attr_changed_nomem;
-atomic_t fscache_n_attr_changed_calls;
-
-atomic_t fscache_n_allocs;
-atomic_t fscache_n_allocs_ok;
-atomic_t fscache_n_allocs_wait;
-atomic_t fscache_n_allocs_nobufs;
-atomic_t fscache_n_allocs_intr;
-atomic_t fscache_n_allocs_object_dead;
-atomic_t fscache_n_alloc_ops;
-atomic_t fscache_n_alloc_op_waits;
-
-atomic_t fscache_n_retrievals;
-atomic_t fscache_n_retrievals_ok;
-atomic_t fscache_n_retrievals_wait;
-atomic_t fscache_n_retrievals_nodata;
-atomic_t fscache_n_retrievals_nobufs;
-atomic_t fscache_n_retrievals_intr;
-atomic_t fscache_n_retrievals_nomem;
-atomic_t fscache_n_retrievals_object_dead;
-atomic_t fscache_n_retrieval_ops;
-atomic_t fscache_n_retrieval_op_waits;
-
-atomic_t fscache_n_stores;
-atomic_t fscache_n_stores_ok;
-atomic_t fscache_n_stores_again;
-atomic_t fscache_n_stores_nobufs;
-atomic_t fscache_n_stores_oom;
-atomic_t fscache_n_store_ops;
-atomic_t fscache_n_store_calls;
-atomic_t fscache_n_store_pages;
-atomic_t fscache_n_store_radix_deletes;
-atomic_t fscache_n_store_pages_over_limit;
-
-atomic_t fscache_n_store_vmscan_not_storing;
-atomic_t fscache_n_store_vmscan_gone;
-atomic_t fscache_n_store_vmscan_busy;
-atomic_t fscache_n_store_vmscan_cancelled;
-atomic_t fscache_n_store_vmscan_wait;
-
-atomic_t fscache_n_marks;
-atomic_t fscache_n_uncaches;
-
 atomic_t fscache_n_acquires;
 atomic_t fscache_n_acquires_null;
 atomic_t fscache_n_acquires_no_cache;
@@ -86,7 +40,6 @@ atomic_t fscache_n_updates_run;
 
 atomic_t fscache_n_relinquishes;
 atomic_t fscache_n_relinquishes_null;
-atomic_t fscache_n_relinquishes_waitcrt;
 atomic_t fscache_n_relinquishes_retire;
 
 atomic_t fscache_n_cookie_index;
@@ -103,11 +56,6 @@ atomic_t fscache_n_object_created;
 atomic_t fscache_n_object_avail;
 atomic_t fscache_n_object_dead;
 
-atomic_t fscache_n_checkaux_none;
-atomic_t fscache_n_checkaux_okay;
-atomic_t fscache_n_checkaux_update;
-atomic_t fscache_n_checkaux_obsolete;
-
 atomic_t fscache_n_cop_alloc_object;
 atomic_t fscache_n_cop_lookup_object;
 atomic_t fscache_n_cop_lookup_complete;
@@ -118,12 +66,6 @@ atomic_t fscache_n_cop_drop_object;
 atomic_t fscache_n_cop_put_object;
 atomic_t fscache_n_cop_sync_cache;
 atomic_t fscache_n_cop_attr_changed;
-atomic_t fscache_n_cop_read_or_alloc_page;
-atomic_t fscache_n_cop_read_or_alloc_pages;
-atomic_t fscache_n_cop_allocate_page;
-atomic_t fscache_n_cop_allocate_pages;
-atomic_t fscache_n_cop_write_page;
-atomic_t fscache_n_cop_uncache_page;
 
 atomic_t fscache_n_cache_no_space_reject;
 atomic_t fscache_n_cache_stale_objects;
@@ -147,15 +89,6 @@ int fscache_stats_show(struct seq_file *m, void *v)
 		   atomic_read(&fscache_n_object_no_alloc),
 		   atomic_read(&fscache_n_object_avail),
 		   atomic_read(&fscache_n_object_dead));
-	seq_printf(m, "ChkAux : non=%u ok=%u upd=%u obs=%u\n",
-		   atomic_read(&fscache_n_checkaux_none),
-		   atomic_read(&fscache_n_checkaux_okay),
-		   atomic_read(&fscache_n_checkaux_update),
-		   atomic_read(&fscache_n_checkaux_obsolete));
-
-	seq_printf(m, "Pages  : mrk=%u unc=%u\n",
-		   atomic_read(&fscache_n_marks),
-		   atomic_read(&fscache_n_uncaches));
 
 	seq_printf(m, "Acquire: n=%u nul=%u noc=%u ok=%u nbf=%u"
 		   " oom=%u\n",
@@ -182,64 +115,11 @@ int fscache_stats_show(struct seq_file *m, void *v)
 		   atomic_read(&fscache_n_updates_null),
 		   atomic_read(&fscache_n_updates_run));
 
-	seq_printf(m, "Relinqs: n=%u nul=%u wcr=%u rtr=%u\n",
+	seq_printf(m, "Relinqs: n=%u nul=%u rtr=%u\n",
 		   atomic_read(&fscache_n_relinquishes),
 		   atomic_read(&fscache_n_relinquishes_null),
-		   atomic_read(&fscache_n_relinquishes_waitcrt),
 		   atomic_read(&fscache_n_relinquishes_retire));
 
-	seq_printf(m, "AttrChg: n=%u ok=%u nbf=%u oom=%u run=%u\n",
-		   atomic_read(&fscache_n_attr_changed),
-		   atomic_read(&fscache_n_attr_changed_ok),
-		   atomic_read(&fscache_n_attr_changed_nobufs),
-		   atomic_read(&fscache_n_attr_changed_nomem),
-		   atomic_read(&fscache_n_attr_changed_calls));
-
-	seq_printf(m, "Allocs : n=%u ok=%u wt=%u nbf=%u int=%u\n",
-		   atomic_read(&fscache_n_allocs),
-		   atomic_read(&fscache_n_allocs_ok),
-		   atomic_read(&fscache_n_allocs_wait),
-		   atomic_read(&fscache_n_allocs_nobufs),
-		   atomic_read(&fscache_n_allocs_intr));
-	seq_printf(m, "Allocs : ops=%u owt=%u abt=%u\n",
-		   atomic_read(&fscache_n_alloc_ops),
-		   atomic_read(&fscache_n_alloc_op_waits),
-		   atomic_read(&fscache_n_allocs_object_dead));
-
-	seq_printf(m, "Retrvls: n=%u ok=%u wt=%u nod=%u nbf=%u"
-		   " int=%u oom=%u\n",
-		   atomic_read(&fscache_n_retrievals),
-		   atomic_read(&fscache_n_retrievals_ok),
-		   atomic_read(&fscache_n_retrievals_wait),
-		   atomic_read(&fscache_n_retrievals_nodata),
-		   atomic_read(&fscache_n_retrievals_nobufs),
-		   atomic_read(&fscache_n_retrievals_intr),
-		   atomic_read(&fscache_n_retrievals_nomem));
-	seq_printf(m, "Retrvls: ops=%u owt=%u abt=%u\n",
-		   atomic_read(&fscache_n_retrieval_ops),
-		   atomic_read(&fscache_n_retrieval_op_waits),
-		   atomic_read(&fscache_n_retrievals_object_dead));
-
-	seq_printf(m, "Stores : n=%u ok=%u agn=%u nbf=%u oom=%u\n",
-		   atomic_read(&fscache_n_stores),
-		   atomic_read(&fscache_n_stores_ok),
-		   atomic_read(&fscache_n_stores_again),
-		   atomic_read(&fscache_n_stores_nobufs),
-		   atomic_read(&fscache_n_stores_oom));
-	seq_printf(m, "Stores : ops=%u run=%u pgs=%u rxd=%u olm=%u\n",
-		   atomic_read(&fscache_n_store_ops),
-		   atomic_read(&fscache_n_store_calls),
-		   atomic_read(&fscache_n_store_pages),
-		   atomic_read(&fscache_n_store_radix_deletes),
-		   atomic_read(&fscache_n_store_pages_over_limit));
-
-	seq_printf(m, "VmScan : nos=%u gon=%u bsy=%u can=%u wt=%u\n",
-		   atomic_read(&fscache_n_store_vmscan_not_storing),
-		   atomic_read(&fscache_n_store_vmscan_gone),
-		   atomic_read(&fscache_n_store_vmscan_busy),
-		   atomic_read(&fscache_n_store_vmscan_cancelled),
-		   atomic_read(&fscache_n_store_vmscan_wait));
-
 	seq_printf(m, "Ops    : pend=%u run=%u enq=%u can=%u rej=%u\n",
 		   atomic_read(&fscache_n_op_pend),
 		   atomic_read(&fscache_n_op_run),
@@ -264,13 +144,6 @@ int fscache_stats_show(struct seq_file *m, void *v)
 		   atomic_read(&fscache_n_cop_put_object),
 		   atomic_read(&fscache_n_cop_attr_changed),
 		   atomic_read(&fscache_n_cop_sync_cache));
-	seq_printf(m, "CacheOp: rap=%d ras=%d alp=%d als=%d wrp=%d ucp=%d\n",
-		   atomic_read(&fscache_n_cop_read_or_alloc_page),
-		   atomic_read(&fscache_n_cop_read_or_alloc_pages),
-		   atomic_read(&fscache_n_cop_allocate_page),
-		   atomic_read(&fscache_n_cop_allocate_pages),
-		   atomic_read(&fscache_n_cop_write_page),
-		   atomic_read(&fscache_n_cop_uncache_page));
 	seq_printf(m, "CacheEv: nsp=%d stl=%d rtr=%d cul=%d\n",
 		   atomic_read(&fscache_n_cache_no_space_reject),
 		   atomic_read(&fscache_n_cache_stale_objects),




* [RFC PATCH 12/76] fscache: Remove old I/O tracepoints
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (10 preceding siblings ...)
  2020-11-20 15:04 ` [RFC PATCH 11/76] fscache: Remove obsolete stats David Howells
@ 2020-11-20 15:05 ` David Howells
  2020-11-20 15:05 ` [RFC PATCH 13/76] fscache: Temporarily disable fscache_invalidate() David Howells
                   ` (62 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:05 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Remove now-unused fscache tracepoints that have been obsoleted by the
removal of the old I/O code.
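
Only the page-granular tracepoints go away; the generic op tracepoint is
still present and usable, e.g.:

    /* Still emitted after this patch: */
    trace_fscache_op(cookie, op, fscache_op_cancel);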

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/object.c            |    1 
 include/trace/events/fscache.h |  195 ----------------------------------------
 2 files changed, 196 deletions(-)

diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index e2e04ac2c0a1..e31164f0446b 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -942,7 +942,6 @@ static const struct fscache_state *_fscache_invalidate_object(struct fscache_obj
 	op->flags = FSCACHE_OP_ASYNC |
 		(1 << FSCACHE_OP_EXCLUSIVE) |
 		(1 << FSCACHE_OP_UNUSE_COOKIE);
-	trace_fscache_page_op(cookie, NULL, op, fscache_page_op_invalidate);
 
 	spin_lock(&cookie->lock);
 	if (fscache_submit_exclusive_op(object, op) < 0)
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 953e15fad063..2ebfd688a7c2 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -33,24 +33,6 @@ enum fscache_cookie_trace {
 	fscache_cookie_put_parent,
 };
 
-enum fscache_page_trace {
-	fscache_page_cached,
-	fscache_page_inval,
-	fscache_page_maybe_release,
-	fscache_page_radix_clear_store,
-	fscache_page_radix_delete,
-	fscache_page_radix_insert,
-	fscache_page_radix_pend2store,
-	fscache_page_radix_set_pend,
-	fscache_page_uncache,
-	fscache_page_write,
-	fscache_page_write_end,
-	fscache_page_write_end_pend,
-	fscache_page_write_end_noc,
-	fscache_page_write_wait,
-	fscache_page_trace__nr
-};
-
 enum fscache_op_trace {
 	fscache_op_cancel,
 	fscache_op_cancel_all,
@@ -69,17 +51,6 @@ enum fscache_op_trace {
 	fscache_op_trace__nr
 };
 
-enum fscache_page_op_trace {
-	fscache_page_op_alloc_one,
-	fscache_page_op_attr_changed,
-	fscache_page_op_check_consistency,
-	fscache_page_op_invalidate,
-	fscache_page_op_retr_multi,
-	fscache_page_op_retr_one,
-	fscache_page_op_write_one,
-	fscache_page_op_trace__nr
-};
-
 #endif
 
 /*
@@ -98,22 +69,6 @@ enum fscache_page_op_trace {
 	EM(fscache_cookie_put_object,		"PUT obj")		\
 	E_(fscache_cookie_put_parent,		"PUT prn")
 
-#define fscache_page_traces						\
-	EM(fscache_page_cached,			"Cached ")		\
-	EM(fscache_page_inval,			"InvalPg")		\
-	EM(fscache_page_maybe_release,		"MayRels")		\
-	EM(fscache_page_uncache,		"Uncache")		\
-	EM(fscache_page_radix_clear_store,	"RxCStr ")		\
-	EM(fscache_page_radix_delete,		"RxDel  ")		\
-	EM(fscache_page_radix_insert,		"RxIns  ")		\
-	EM(fscache_page_radix_pend2store,	"RxP2S  ")		\
-	EM(fscache_page_radix_set_pend,		"RxSPend ")		\
-	EM(fscache_page_write,			"WritePg")		\
-	EM(fscache_page_write_end,		"EndPgWr")		\
-	EM(fscache_page_write_end_pend,		"EndPgWP")		\
-	EM(fscache_page_write_end_noc,		"EndPgNC")		\
-	E_(fscache_page_write_wait,		"WtOnWrt")
-
 #define fscache_op_traces						\
 	EM(fscache_op_cancel,			"Cancel1")		\
 	EM(fscache_op_cancel_all,		"CancelA")		\
@@ -130,15 +85,6 @@ enum fscache_page_op_trace {
 	EM(fscache_op_submit_ex,		"SubmitX")		\
 	E_(fscache_op_work,			"Work   ")
 
-#define fscache_page_op_traces						\
-	EM(fscache_page_op_alloc_one,		"Alloc1 ")		\
-	EM(fscache_page_op_attr_changed,	"AttrChg")		\
-	EM(fscache_page_op_check_consistency,	"CheckCn")		\
-	EM(fscache_page_op_invalidate,		"Inval  ")		\
-	EM(fscache_page_op_retr_multi,		"RetrMul")		\
-	EM(fscache_page_op_retr_one,		"Retr1  ")		\
-	E_(fscache_page_op_write_one,		"Write1 ")
-
 /*
  * Export enum symbols via userspace.
  */
@@ -363,70 +309,6 @@ TRACE_EVENT(fscache_osm,
 		      __entry->event_num)
 	    );
 
-TRACE_EVENT(fscache_page,
-	    TP_PROTO(struct fscache_cookie *cookie, struct page *page,
-		     enum fscache_page_trace why),
-
-	    TP_ARGS(cookie, page, why),
-
-	    TP_STRUCT__entry(
-		    __field(unsigned int,		cookie		)
-		    __field(pgoff_t,			page		)
-		    __field(enum fscache_page_trace,	why		)
-			     ),
-
-	    TP_fast_assign(
-		    __entry->cookie		= cookie->debug_id;
-		    __entry->page		= page->index;
-		    __entry->why		= why;
-			   ),
-
-	    TP_printk("c=%08x %s pg=%lx",
-		      __entry->cookie,
-		      __print_symbolic(__entry->why, fscache_page_traces),
-		      __entry->page)
-	    );
-
-TRACE_EVENT(fscache_check_page,
-	    TP_PROTO(struct fscache_cookie *cookie, struct page *page,
-		     void *val, int n),
-
-	    TP_ARGS(cookie, page, val, n),
-
-	    TP_STRUCT__entry(
-		    __field(unsigned int,		cookie		)
-		    __field(void *,			page		)
-		    __field(void *,			val		)
-		    __field(int,			n		)
-			     ),
-
-	    TP_fast_assign(
-		    __entry->cookie		= cookie->debug_id;
-		    __entry->page		= page;
-		    __entry->val		= val;
-		    __entry->n			= n;
-			   ),
-
-	    TP_printk("c=%08x pg=%p val=%p n=%d",
-		      __entry->cookie, __entry->page, __entry->val, __entry->n)
-	    );
-
-TRACE_EVENT(fscache_wake_cookie,
-	    TP_PROTO(struct fscache_cookie *cookie),
-
-	    TP_ARGS(cookie),
-
-	    TP_STRUCT__entry(
-		    __field(unsigned int,		cookie		)
-			     ),
-
-	    TP_fast_assign(
-		    __entry->cookie		= cookie->debug_id;
-			   ),
-
-	    TP_printk("c=%08x", __entry->cookie)
-	    );
-
 TRACE_EVENT(fscache_op,
 	    TP_PROTO(struct fscache_cookie *cookie, struct fscache_operation *op,
 		     enum fscache_op_trace why),
@@ -450,83 +332,6 @@ TRACE_EVENT(fscache_op,
 		      __print_symbolic(__entry->why, fscache_op_traces))
 	    );
 
-TRACE_EVENT(fscache_page_op,
-	    TP_PROTO(struct fscache_cookie *cookie, struct page *page,
-		     struct fscache_operation *op, enum fscache_page_op_trace what),
-
-	    TP_ARGS(cookie, page, op, what),
-
-	    TP_STRUCT__entry(
-		    __field(unsigned int,		cookie		)
-		    __field(unsigned int,		op		)
-		    __field(pgoff_t,			page		)
-		    __field(enum fscache_page_op_trace,	what		)
-			     ),
-
-	    TP_fast_assign(
-		    __entry->cookie		= cookie->debug_id;
-		    __entry->page		= page ? page->index : 0;
-		    __entry->op			= op->debug_id;
-		    __entry->what		= what;
-			   ),
-
-	    TP_printk("c=%08x %s pg=%lx op=%08x",
-		      __entry->cookie,
-		      __print_symbolic(__entry->what, fscache_page_op_traces),
-		      __entry->page, __entry->op)
-	    );
-
-TRACE_EVENT(fscache_wrote_page,
-	    TP_PROTO(struct fscache_cookie *cookie, struct page *page,
-		     struct fscache_operation *op, int ret),
-
-	    TP_ARGS(cookie, page, op, ret),
-
-	    TP_STRUCT__entry(
-		    __field(unsigned int,		cookie		)
-		    __field(unsigned int,		op		)
-		    __field(pgoff_t,			page		)
-		    __field(int,			ret		)
-			     ),
-
-	    TP_fast_assign(
-		    __entry->cookie		= cookie->debug_id;
-		    __entry->page		= page->index;
-		    __entry->op			= op->debug_id;
-		    __entry->ret		= ret;
-			   ),
-
-	    TP_printk("c=%08x pg=%lx op=%08x ret=%d",
-		      __entry->cookie, __entry->page, __entry->op, __entry->ret)
-	    );
-
-TRACE_EVENT(fscache_gang_lookup,
-	    TP_PROTO(struct fscache_cookie *cookie, struct fscache_operation *op,
-		     void **results, int n, pgoff_t store_limit),
-
-	    TP_ARGS(cookie, op, results, n, store_limit),
-
-	    TP_STRUCT__entry(
-		    __field(unsigned int,		cookie		)
-		    __field(unsigned int,		op		)
-		    __field(pgoff_t,			results0	)
-		    __field(int,			n		)
-		    __field(pgoff_t,			store_limit	)
-			     ),
-
-	    TP_fast_assign(
-		    __entry->cookie		= cookie->debug_id;
-		    __entry->op			= op->debug_id;
-		    __entry->results0		= results[0] ? ((struct page *)results[0])->index : (pgoff_t)-1;
-		    __entry->n			= n;
-		    __entry->store_limit	= store_limit;
-			   ),
-
-	    TP_printk("c=%08x op=%08x r0=%lx n=%d sl=%lx",
-		      __entry->cookie, __entry->op, __entry->results0, __entry->n,
-		      __entry->store_limit)
-	    );
-
 #endif /* _TRACE_FSCACHE_H */
 
 /* This part must be outside protection */




* [RFC PATCH 13/76] fscache: Temporarily disable fscache_invalidate()
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (11 preceding siblings ...)
  2020-11-20 15:05 ` [RFC PATCH 12/76] fscache: Remove old I/O tracepoints David Howells
@ 2020-11-20 15:05 ` David Howells
  2020-11-20 15:05 ` [RFC PATCH 14/76] fscache: Remove the I/O operation manager David Howells
                   ` (61 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:05 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Temporarily disable the fscache side of fscache_invalidate() so that the
operation management code can be removed.
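
From the netfs side the entry point remains callable, but the backing object
is no longer told to invalidate (sketch; see the TODO in the diff below):

    /* Cookie-side state is still updated, but no backend event fires. */
    fscache_invalidate(cookie);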

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c     |    9 ++---
 fs/fscache/cookie.c           |    4 +-
 fs/fscache/object.c           |   78 +----------------------------------------
 include/linux/fscache-cache.h |    2 +
 4 files changed, 7 insertions(+), 86 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 81322e3acadd..99f42d216ef7 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -388,7 +388,7 @@ static int cachefiles_attr_changed(struct cachefiles_object *object)
 /*
  * Invalidate an object
  */
-static void cachefiles_invalidate_object(struct fscache_operation *op)
+static void cachefiles_invalidate_object(struct fscache_object *_object)
 {
 	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
@@ -397,14 +397,14 @@ static void cachefiles_invalidate_object(struct fscache_operation *op)
 	uint64_t ni_size;
 	int ret;
 
-	object = container_of(op->object, struct cachefiles_object, fscache);
+	object = container_of(_object, struct cachefiles_object, fscache);
 	cache = container_of(object->fscache.cache,
 			     struct cachefiles_cache, cache);
 
-	ni_size = op->object->cookie->object_size;
+	ni_size = object->fscache.cookie->object_size;
 
 	_enter("{OBJ%x},[%llu]",
-	       op->object->debug_id, (unsigned long long)ni_size);
+	       object->fscache.debug_id, (unsigned long long)ni_size);
 
 	if (object->backer) {
 		ASSERT(d_is_reg(object->backer));
@@ -425,7 +425,6 @@ static void cachefiles_invalidate_object(struct fscache_operation *op)
 		}
 	}
 
-	fscache_op_complete(op, true);
 	_leave("");
 }
 
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index a50fd9e384a6..30394b32a91c 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -653,9 +653,7 @@ void __fscache_invalidate(struct fscache_cookie *cookie)
 			object = hlist_entry(cookie->backing_objects.first,
 					     struct fscache_object,
 					     cookie_link);
-			if (fscache_object_is_live(object))
-				fscache_raise_event(
-					object, FSCACHE_OBJECT_EV_INVALIDATE);
+			/* TODO: Do invalidation */
 		}
 
 		spin_unlock(&cookie->lock);
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index e31164f0446b..5eda1cd265ef 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -908,86 +908,10 @@ static void fscache_dequeue_object(struct fscache_object *object)
 	_leave("");
 }
 
-/*
- * Asynchronously invalidate an object.
- */
-static const struct fscache_state *_fscache_invalidate_object(struct fscache_object *object,
-							      int event)
-{
-	struct fscache_operation *op;
-	struct fscache_cookie *cookie = object->cookie;
-
-	_enter("{OBJ%x},%d", object->debug_id, event);
-
-	/* We're going to need the cookie.  If the cookie is not available then
-	 * retire the object instead.
-	 */
-	if (!fscache_use_cookie(object)) {
-		set_bit(FSCACHE_OBJECT_RETIRED, &object->flags);
-		_leave(" [no cookie]");
-		return transit_to(KILL_OBJECT);
-	}
-
-	/* Reject any new read/write ops and abort any that are pending. */
-	clear_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags);
-	fscache_cancel_all_ops(object);
-
-	/* Now we have to wait for in-progress reads and writes */
-	op = kzalloc(sizeof(*op), GFP_KERNEL);
-	if (!op)
-		goto nomem;
-
-	fscache_operation_init(cookie, op, object->cache->ops->invalidate_object,
-			       NULL, NULL);
-	op->flags = FSCACHE_OP_ASYNC |
-		(1 << FSCACHE_OP_EXCLUSIVE) |
-		(1 << FSCACHE_OP_UNUSE_COOKIE);
-
-	spin_lock(&cookie->lock);
-	if (fscache_submit_exclusive_op(object, op) < 0)
-		goto submit_op_failed;
-	spin_unlock(&cookie->lock);
-	fscache_put_operation(op);
-
-	/* Once we've completed the invalidation, we know there will be no data
-	 * stored in the cache and thus we can reinstate the data-check-skip
-	 * optimisation.
-	 */
-	set_bit(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags);
-
-	/* We can allow read and write requests to come in once again.  They'll
-	 * queue up behind our exclusive invalidation operation.
-	 */
-	if (test_and_clear_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags))
-		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING);
-	_leave(" [ok]");
-	return transit_to(UPDATE_OBJECT);
-
-nomem:
-	fscache_mark_object_dead(object);
-	fscache_unuse_cookie(object);
-	_leave(" [ENOMEM]");
-	return transit_to(KILL_OBJECT);
-
-submit_op_failed:
-	fscache_mark_object_dead(object);
-	spin_unlock(&cookie->lock);
-	fscache_unuse_cookie(object);
-	kfree(op);
-	_leave(" [EIO]");
-	return transit_to(KILL_OBJECT);
-}
-
 static const struct fscache_state *fscache_invalidate_object(struct fscache_object *object,
 							     int event)
 {
-	const struct fscache_state *s;
-
-	fscache_stat(&fscache_n_invalidates_run);
-	fscache_stat(&fscache_n_cop_invalidate_object);
-	s = _fscache_invalidate_object(object, event);
-	fscache_stat_d(&fscache_n_cop_invalidate_object);
-	return s;
+	return transit_to(UPDATE_OBJECT);
 }
 
 /*
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 60b2f8288668..0fbe25b1271b 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -171,7 +171,7 @@ struct fscache_cache_ops {
 	void (*update_object)(struct fscache_object *object);
 
 	/* Invalidate an object */
-	void (*invalidate_object)(struct fscache_operation *op);
+	void (*invalidate_object)(struct fscache_object *object);
 
 	/* discard the resources pinned by an object and effect retirement if
 	 * necessary */




* [RFC PATCH 14/76] fscache: Remove the I/O operation manager
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (12 preceding siblings ...)
  2020-11-20 15:05 ` [RFC PATCH 13/76] fscache: Temporarily disable fscache_invalidate() David Howells
@ 2020-11-20 15:05 ` David Howells
  2020-11-20 15:05 ` [RFC PATCH 15/76] fscache: Change %p in format strings to something else David Howells
                   ` (60 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:05 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Remove the fscache I/O operation manager.  Getting operation-operation
interactions and object-operation interactions correct has proven very
difficult; furthermore, the operations are being replaced with kiocb-driven
I/O on the cache side.
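
For reference, the core dispatch pattern being deleted, condensed from the
removed fs/fscache/operation.c:

    /* Async ops took a ref and were punted to the fscache workqueue. */
    atomic_inc(&op->usage);
    if (!queue_work(fscache_op_wq, &op->work))
            fscache_put_operation(op);      /* already queued; drop the ref */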

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/namei.c          |    5 
 fs/fscache/Makefile            |    3 
 fs/fscache/cache.c             |    3 
 fs/fscache/cookie.c            |    1 
 fs/fscache/internal.h          |   13 -
 fs/fscache/object-list.c       |   25 --
 fs/fscache/object.c            |   24 --
 fs/fscache/operation.c         |  633 ----------------------------------------
 include/linux/fscache-cache.h  |   71 ----
 include/trace/events/fscache.h |   57 ----
 10 files changed, 7 insertions(+), 828 deletions(-)
 delete mode 100644 fs/fscache/operation.c

diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 432002080b83..924042e8cced 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -36,9 +36,8 @@ void __cachefiles_printk_object(struct cachefiles_object *object,
 	       prefix, object->fscache.state->name,
 	       object->fscache.flags, work_busy(&object->fscache.work),
 	       object->fscache.events, object->fscache.event_mask);
-	pr_err("%sops=%u inp=%u exc=%u\n",
-	       prefix, object->fscache.n_ops, object->fscache.n_in_progress,
-	       object->fscache.n_exclusive);
+	pr_err("%sops=%u\n",
+	       prefix, object->fscache.n_ops);
 	pr_err("%sparent=%p\n",
 	       prefix, object->fscache.parent);
 
diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
index 565a3441d31d..ac3fcd909fff 100644
--- a/fs/fscache/Makefile
+++ b/fs/fscache/Makefile
@@ -9,8 +9,7 @@ fscache-y := \
 	fsdef.o \
 	main.o \
 	netfs.o \
-	object.o \
-	operation.o
+	object.o
 
 fscache-$(CONFIG_PROC_FS) += proc.o
 fscache-$(CONFIG_FSCACHE_STATS) += stats.o
diff --git a/fs/fscache/cache.c b/fs/fscache/cache.c
index 2231691845b8..92f747b72ec3 100644
--- a/fs/fscache/cache.c
+++ b/fs/fscache/cache.c
@@ -182,12 +182,9 @@ void fscache_init_cache(struct fscache_cache *cache,
 	vsnprintf(cache->identifier, sizeof(cache->identifier), idfmt, va);
 	va_end(va);
 
-	INIT_WORK(&cache->op_gc, fscache_operation_gc);
 	INIT_LIST_HEAD(&cache->link);
 	INIT_LIST_HEAD(&cache->object_list);
-	INIT_LIST_HEAD(&cache->op_gc_list);
 	spin_lock_init(&cache->object_list_lock);
-	spin_lock_init(&cache->op_gc_list_lock);
 }
 EXPORT_SYMBOL(fscache_init_cache);
 
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 30394b32a91c..aa33d16711e1 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -756,7 +756,6 @@ void __fscache_disable_cookie(struct fscache_cookie *cookie,
 		hlist_for_each_entry(object, &cookie->backing_objects, cookie_link) {
 			if (invalidate)
 				set_bit(FSCACHE_OBJECT_RETIRED, &object->flags);
-			clear_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags);
 			fscache_raise_event(object, FSCACHE_OBJECT_EV_KILL);
 		}
 	} else {
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 360137fd19a7..bc5539d2157b 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -138,19 +138,6 @@ extern void fscache_objlist_remove(struct fscache_object *);
 #define fscache_objlist_remove(object) do {} while(0)
 #endif
 
-/*
- * operation.c
- */
-extern int fscache_submit_exclusive_op(struct fscache_object *,
-				       struct fscache_operation *);
-extern int fscache_submit_op(struct fscache_object *,
-			     struct fscache_operation *);
-extern int fscache_cancel_op(struct fscache_operation *, bool);
-extern void fscache_cancel_all_ops(struct fscache_object *);
-extern void fscache_abort_object(struct fscache_object *);
-extern void fscache_start_operations(struct fscache_object *);
-extern void fscache_operation_gc(struct work_struct *);
-
 /*
  * proc.c
  */
diff --git a/fs/fscache/object-list.c b/fs/fscache/object-list.c
index fc28de4738ec..147c556ee01b 100644
--- a/fs/fscache/object-list.c
+++ b/fs/fscache/object-list.c
@@ -23,10 +23,6 @@ struct fscache_objlist_data {
 #define FSCACHE_OBJLIST_CONFIG_AUX	0x00000002	/* show object auxdata */
 #define FSCACHE_OBJLIST_CONFIG_BUSY	0x00000010	/* show busy objects */
 #define FSCACHE_OBJLIST_CONFIG_IDLE	0x00000020	/* show idle objects */
-#define FSCACHE_OBJLIST_CONFIG_PENDWR	0x00000040	/* show objects with pending writes */
-#define FSCACHE_OBJLIST_CONFIG_NOPENDWR	0x00000080	/* show objects without pending writes */
-#define FSCACHE_OBJLIST_CONFIG_READS	0x00000100	/* show objects with active reads */
-#define FSCACHE_OBJLIST_CONFIG_NOREADS	0x00000200	/* show objects without active reads */
 #define FSCACHE_OBJLIST_CONFIG_EVENTS	0x00000400	/* show objects with events */
 #define FSCACHE_OBJLIST_CONFIG_NOEVENTS	0x00000800	/* show objects without no events */
 #define FSCACHE_OBJLIST_CONFIG_WORK	0x00001000	/* show objects with work */
@@ -166,7 +162,7 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 	u8 *p;
 
 	if ((unsigned long) v == 1) {
-		seq_puts(m, "OBJECT   PARENT   STAT CHLDN OPS OOP IPR EX READS"
+		seq_puts(m, "OBJECT   PARENT   STAT CHLDN OPS OOP"
 			 " EM EV FL S"
 			 " | COOKIE   TYPE    TY FL");
 		if (config & (FSCACHE_OBJLIST_CONFIG_KEY |
@@ -185,7 +181,7 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 	}
 
 	if ((unsigned long) v == 2) {
-		seq_puts(m, "======== ======== ==== ===== === === === == ====="
+		seq_puts(m, "======== ======== ==== ===== === ==="
 			 " == == == ="
 			 " | ======== ======= == ===");
 		if (config & (FSCACHE_OBJLIST_CONFIG_KEY |
@@ -217,26 +213,19 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 		       obj->flags ||
 		       !list_empty(&obj->dependents),
 		       BUSY, IDLE);
-		FILTER(test_bit(FSCACHE_OBJECT_PENDING_WRITE, &obj->flags),
-		       PENDWR, NOPENDWR);
-		FILTER(atomic_read(&obj->n_reads),
-		       READS, NOREADS);
 		FILTER(obj->events & obj->event_mask,
 		       EVENTS, NOEVENTS);
 		FILTER(work_busy(&obj->work), WORK, NOWORK);
 	}
 
 	seq_printf(m,
-		   "%08x %08x %s %5u %3u %3u %3u %2u %5u %2lx %2lx %2lx %1x | ",
+		   "%08x %08x %s %5u %3u %3u %2lx %2lx %2lx %1x | ",
 		   obj->debug_id,
 		   obj->parent ? obj->parent->debug_id : UINT_MAX,
 		   obj->state->short_name,
 		   obj->n_children,
 		   obj->n_ops,
 		   obj->n_obj_ops,
-		   obj->n_in_progress,
-		   obj->n_exclusive,
-		   atomic_read(&obj->n_reads),
 		   obj->event_mask,
 		   obj->events,
 		   obj->flags,
@@ -336,10 +325,6 @@ static void fscache_objlist_config(struct fscache_objlist_data *data)
 		case 'A': config |= FSCACHE_OBJLIST_CONFIG_AUX;		break;
 		case 'B': config |= FSCACHE_OBJLIST_CONFIG_BUSY;	break;
 		case 'b': config |= FSCACHE_OBJLIST_CONFIG_IDLE;	break;
-		case 'W': config |= FSCACHE_OBJLIST_CONFIG_PENDWR;	break;
-		case 'w': config |= FSCACHE_OBJLIST_CONFIG_NOPENDWR;	break;
-		case 'R': config |= FSCACHE_OBJLIST_CONFIG_READS;	break;
-		case 'r': config |= FSCACHE_OBJLIST_CONFIG_NOREADS;	break;
 		case 'S': config |= FSCACHE_OBJLIST_CONFIG_WORK;	break;
 		case 's': config |= FSCACHE_OBJLIST_CONFIG_NOWORK;	break;
 		}
@@ -350,10 +335,6 @@ static void fscache_objlist_config(struct fscache_objlist_data *data)
 
 	if (!(config & (FSCACHE_OBJLIST_CONFIG_BUSY | FSCACHE_OBJLIST_CONFIG_IDLE)))
 	    config   |= FSCACHE_OBJLIST_CONFIG_BUSY | FSCACHE_OBJLIST_CONFIG_IDLE;
-	if (!(config & (FSCACHE_OBJLIST_CONFIG_PENDWR | FSCACHE_OBJLIST_CONFIG_NOPENDWR)))
-	    config   |= FSCACHE_OBJLIST_CONFIG_PENDWR | FSCACHE_OBJLIST_CONFIG_NOPENDWR;
-	if (!(config & (FSCACHE_OBJLIST_CONFIG_READS | FSCACHE_OBJLIST_CONFIG_NOREADS)))
-	    config   |= FSCACHE_OBJLIST_CONFIG_READS | FSCACHE_OBJLIST_CONFIG_NOREADS;
 	if (!(config & (FSCACHE_OBJLIST_CONFIG_EVENTS | FSCACHE_OBJLIST_CONFIG_NOEVENTS)))
 	    config   |= FSCACHE_OBJLIST_CONFIG_EVENTS | FSCACHE_OBJLIST_CONFIG_NOEVENTS;
 	if (!(config & (FSCACHE_OBJLIST_CONFIG_WORK | FSCACHE_OBJLIST_CONFIG_NOWORK)))
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index 5eda1cd265ef..5d50976bf379 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -315,9 +315,8 @@ void fscache_object_init(struct fscache_object *object,
 	INIT_WORK(&object->work, fscache_object_work_func);
 	INIT_LIST_HEAD(&object->dependents);
 	INIT_LIST_HEAD(&object->dep_link);
-	INIT_LIST_HEAD(&object->pending_ops);
 	object->n_children = 0;
-	object->n_ops = object->n_in_progress = object->n_exclusive = 0;
+	object->n_ops = 0;
 	object->events = 0;
 	object->cache = cache;
 	object->cookie = cookie;
@@ -580,14 +579,6 @@ static const struct fscache_state *fscache_object_available(struct fscache_objec
 	spin_lock(&object->lock);
 
 	fscache_done_parent_op(object);
-	if (object->n_in_progress == 0) {
-		if (object->n_ops > 0) {
-			ASSERTCMP(object->n_ops, >=, object->n_obj_ops);
-			fscache_start_operations(object);
-		} else {
-			ASSERT(list_empty(&object->pending_ops));
-		}
-	}
 	spin_unlock(&object->lock);
 
 	fscache_stat(&fscache_n_cop_lookup_complete);
@@ -654,24 +645,11 @@ static const struct fscache_state *fscache_kill_object(struct fscache_object *ob
 	fscache_mark_object_dead(object);
 	object->oob_event_mask = 0;
 
-	if (test_bit(FSCACHE_OBJECT_RETIRED, &object->flags)) {
-		/* Reject any new read/write ops and abort any that are pending. */
-		clear_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags);
-		fscache_cancel_all_ops(object);
-	}
-
 	if (list_empty(&object->dependents) &&
 	    object->n_ops == 0 &&
 	    object->n_children == 0)
 		return transit_to(DROP_OBJECT);
 
-	if (object->n_in_progress == 0) {
-		spin_lock(&object->lock);
-		if (object->n_ops > 0 && object->n_in_progress == 0)
-			fscache_start_operations(object);
-		spin_unlock(&object->lock);
-	}
-
 	if (!list_empty(&object->dependents))
 		return transit_to(KILL_DEPENDENTS);
 
diff --git a/fs/fscache/operation.c b/fs/fscache/operation.c
deleted file mode 100644
index 4a5651d4904e..000000000000
--- a/fs/fscache/operation.c
+++ /dev/null
@@ -1,633 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/* FS-Cache worker operation management routines
- *
- * Copyright (C) 2008 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- *
- * See Documentation/filesystems/caching/operations.rst
- */
-
-#define FSCACHE_DEBUG_LEVEL OPERATION
-#include <linux/module.h>
-#include <linux/seq_file.h>
-#include <linux/slab.h>
-#include "internal.h"
-
-atomic_t fscache_op_debug_id;
-EXPORT_SYMBOL(fscache_op_debug_id);
-
-static void fscache_operation_dummy_cancel(struct fscache_operation *op)
-{
-}
-
-/**
- * fscache_operation_init - Do basic initialisation of an operation
- * @op: The operation to initialise
- * @release: The release function to assign
- *
- * Do basic initialisation of an operation.  The caller must still set flags,
- * object and processor if needed.
- */
-void fscache_operation_init(struct fscache_cookie *cookie,
-			    struct fscache_operation *op,
-			    fscache_operation_processor_t processor,
-			    fscache_operation_cancel_t cancel,
-			    fscache_operation_release_t release)
-{
-	INIT_WORK(&op->work, fscache_op_work_func);
-	atomic_set(&op->usage, 1);
-	op->state = FSCACHE_OP_ST_INITIALISED;
-	op->debug_id = atomic_inc_return(&fscache_op_debug_id);
-	op->processor = processor;
-	op->cancel = cancel ?: fscache_operation_dummy_cancel;
-	op->release = release;
-	INIT_LIST_HEAD(&op->pend_link);
-	fscache_stat(&fscache_n_op_initialised);
-	trace_fscache_op(cookie, op, fscache_op_init);
-}
-EXPORT_SYMBOL(fscache_operation_init);
-
-/**
- * fscache_enqueue_operation - Enqueue an operation for processing
- * @op: The operation to enqueue
- *
- * Enqueue an operation for processing by the FS-Cache thread pool.
- *
- * This will get its own ref on the object.
- */
-void fscache_enqueue_operation(struct fscache_operation *op)
-{
-	struct fscache_cookie *cookie = op->object->cookie;
-	
-	_enter("{OBJ%x OP%x,%u}",
-	       op->object->debug_id, op->debug_id, atomic_read(&op->usage));
-
-	ASSERT(list_empty(&op->pend_link));
-	ASSERT(op->processor != NULL);
-	ASSERT(fscache_object_is_available(op->object));
-	ASSERTCMP(atomic_read(&op->usage), >, 0);
-	ASSERTIFCMP(op->state != FSCACHE_OP_ST_IN_PROGRESS,
-		    op->state, ==,  FSCACHE_OP_ST_CANCELLED);
-
-	fscache_stat(&fscache_n_op_enqueue);
-	switch (op->flags & FSCACHE_OP_TYPE) {
-	case FSCACHE_OP_ASYNC:
-		trace_fscache_op(cookie, op, fscache_op_enqueue_async);
-		_debug("queue async");
-		atomic_inc(&op->usage);
-		if (!queue_work(fscache_op_wq, &op->work))
-			fscache_put_operation(op);
-		break;
-	case FSCACHE_OP_MYTHREAD:
-		trace_fscache_op(cookie, op, fscache_op_enqueue_mythread);
-		_debug("queue for caller's attention");
-		break;
-	default:
-		pr_err("Unexpected op type %lx", op->flags);
-		BUG();
-		break;
-	}
-}
-EXPORT_SYMBOL(fscache_enqueue_operation);
-
-/*
- * start an op running
- */
-static void fscache_run_op(struct fscache_object *object,
-			   struct fscache_operation *op)
-{
-	ASSERTCMP(op->state, ==, FSCACHE_OP_ST_PENDING);
-
-	op->state = FSCACHE_OP_ST_IN_PROGRESS;
-	object->n_in_progress++;
-	if (test_and_clear_bit(FSCACHE_OP_WAITING, &op->flags))
-		wake_up_bit(&op->flags, FSCACHE_OP_WAITING);
-	if (op->processor)
-		fscache_enqueue_operation(op);
-	else
-		trace_fscache_op(object->cookie, op, fscache_op_run);
-	fscache_stat(&fscache_n_op_run);
-}
-
-/*
- * report an unexpected submission
- */
-static void fscache_report_unexpected_submission(struct fscache_object *object,
-						 struct fscache_operation *op,
-						 const struct fscache_state *ostate)
-{
-	static bool once_only;
-	struct fscache_operation *p;
-	unsigned n;
-
-	if (once_only)
-		return;
-	once_only = true;
-
-	kdebug("unexpected submission OP%x [OBJ%x %s]",
-	       op->debug_id, object->debug_id, object->state->name);
-	kdebug("objstate=%s [%s]", object->state->name, ostate->name);
-	kdebug("objflags=%lx", object->flags);
-	kdebug("objevent=%lx [%lx]", object->events, object->event_mask);
-	kdebug("ops=%u inp=%u exc=%u",
-	       object->n_ops, object->n_in_progress, object->n_exclusive);
-
-	if (!list_empty(&object->pending_ops)) {
-		n = 0;
-		list_for_each_entry(p, &object->pending_ops, pend_link) {
-			ASSERTCMP(p->object, ==, object);
-			kdebug("%p %p", op->processor, op->release);
-			n++;
-		}
-
-		kdebug("n=%u", n);
-	}
-
-	dump_stack();
-}
-
-/*
- * submit an exclusive operation for an object
- * - other ops are excluded from running simultaneously with this one
- * - this gets any extra refs it needs on an op
- */
-int fscache_submit_exclusive_op(struct fscache_object *object,
-				struct fscache_operation *op)
-{
-	const struct fscache_state *ostate;
-	unsigned long flags;
-	int ret;
-
-	_enter("{OBJ%x OP%x},", object->debug_id, op->debug_id);
-
-	trace_fscache_op(object->cookie, op, fscache_op_submit_ex);
-
-	ASSERTCMP(op->state, ==, FSCACHE_OP_ST_INITIALISED);
-	ASSERTCMP(atomic_read(&op->usage), >, 0);
-
-	spin_lock(&object->lock);
-	ASSERTCMP(object->n_ops, >=, object->n_in_progress);
-	ASSERTCMP(object->n_ops, >=, object->n_exclusive);
-	ASSERT(list_empty(&op->pend_link));
-
-	ostate = object->state;
-	smp_rmb();
-
-	op->state = FSCACHE_OP_ST_PENDING;
-	flags = READ_ONCE(object->flags);
-	if (unlikely(!(flags & BIT(FSCACHE_OBJECT_IS_LIVE)))) {
-		fscache_stat(&fscache_n_op_rejected);
-		op->cancel(op);
-		op->state = FSCACHE_OP_ST_CANCELLED;
-		ret = -ENOBUFS;
-	} else if (unlikely(fscache_cache_is_broken(object))) {
-		op->cancel(op);
-		op->state = FSCACHE_OP_ST_CANCELLED;
-		ret = -EIO;
-	} else if (flags & BIT(FSCACHE_OBJECT_IS_AVAILABLE)) {
-		op->object = object;
-		object->n_ops++;
-		object->n_exclusive++;	/* reads and writes must wait */
-
-		if (object->n_in_progress > 0) {
-			atomic_inc(&op->usage);
-			list_add_tail(&op->pend_link, &object->pending_ops);
-			fscache_stat(&fscache_n_op_pend);
-		} else if (!list_empty(&object->pending_ops)) {
-			atomic_inc(&op->usage);
-			list_add_tail(&op->pend_link, &object->pending_ops);
-			fscache_stat(&fscache_n_op_pend);
-			fscache_start_operations(object);
-		} else {
-			ASSERTCMP(object->n_in_progress, ==, 0);
-			fscache_run_op(object, op);
-		}
-
-		/* need to issue a new write op after this */
-		clear_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags);
-		ret = 0;
-	} else if (flags & BIT(FSCACHE_OBJECT_IS_LOOKED_UP)) {
-		op->object = object;
-		object->n_ops++;
-		object->n_exclusive++;	/* reads and writes must wait */
-		atomic_inc(&op->usage);
-		list_add_tail(&op->pend_link, &object->pending_ops);
-		fscache_stat(&fscache_n_op_pend);
-		ret = 0;
-	} else if (flags & BIT(FSCACHE_OBJECT_KILLED_BY_CACHE)) {
-		op->cancel(op);
-		op->state = FSCACHE_OP_ST_CANCELLED;
-		ret = -ENOBUFS;
-	} else {
-		fscache_report_unexpected_submission(object, op, ostate);
-		op->cancel(op);
-		op->state = FSCACHE_OP_ST_CANCELLED;
-		ret = -ENOBUFS;
-	}
-
-	spin_unlock(&object->lock);
-	return ret;
-}
-
-/*
- * submit an operation for an object
- * - objects may be submitted only in the following states:
- *   - during object creation (write ops may be submitted)
- *   - whilst the object is active
- *   - after an I/O error incurred in one of the two above states (op rejected)
- * - this gets any extra refs it needs on an op
- */
-int fscache_submit_op(struct fscache_object *object,
-		      struct fscache_operation *op)
-{
-	const struct fscache_state *ostate;
-	unsigned long flags;
-	int ret;
-
-	_enter("{OBJ%x OP%x},{%u}",
-	       object->debug_id, op->debug_id, atomic_read(&op->usage));
-
-	trace_fscache_op(object->cookie, op, fscache_op_submit);
-
-	ASSERTCMP(op->state, ==, FSCACHE_OP_ST_INITIALISED);
-	ASSERTCMP(atomic_read(&op->usage), >, 0);
-
-	spin_lock(&object->lock);
-	ASSERTCMP(object->n_ops, >=, object->n_in_progress);
-	ASSERTCMP(object->n_ops, >=, object->n_exclusive);
-	ASSERT(list_empty(&op->pend_link));
-
-	ostate = object->state;
-	smp_rmb();
-
-	op->state = FSCACHE_OP_ST_PENDING;
-	flags = READ_ONCE(object->flags);
-	if (unlikely(!(flags & BIT(FSCACHE_OBJECT_IS_LIVE)))) {
-		fscache_stat(&fscache_n_op_rejected);
-		op->cancel(op);
-		op->state = FSCACHE_OP_ST_CANCELLED;
-		ret = -ENOBUFS;
-	} else if (unlikely(fscache_cache_is_broken(object))) {
-		op->cancel(op);
-		op->state = FSCACHE_OP_ST_CANCELLED;
-		ret = -EIO;
-	} else if (flags & BIT(FSCACHE_OBJECT_IS_AVAILABLE)) {
-		op->object = object;
-		object->n_ops++;
-
-		if (object->n_exclusive > 0) {
-			atomic_inc(&op->usage);
-			list_add_tail(&op->pend_link, &object->pending_ops);
-			fscache_stat(&fscache_n_op_pend);
-		} else if (!list_empty(&object->pending_ops)) {
-			atomic_inc(&op->usage);
-			list_add_tail(&op->pend_link, &object->pending_ops);
-			fscache_stat(&fscache_n_op_pend);
-			fscache_start_operations(object);
-		} else {
-			ASSERTCMP(object->n_exclusive, ==, 0);
-			fscache_run_op(object, op);
-		}
-		ret = 0;
-	} else if (flags & BIT(FSCACHE_OBJECT_IS_LOOKED_UP)) {
-		op->object = object;
-		object->n_ops++;
-		atomic_inc(&op->usage);
-		list_add_tail(&op->pend_link, &object->pending_ops);
-		fscache_stat(&fscache_n_op_pend);
-		ret = 0;
-	} else if (flags & BIT(FSCACHE_OBJECT_KILLED_BY_CACHE)) {
-		op->cancel(op);
-		op->state = FSCACHE_OP_ST_CANCELLED;
-		ret = -ENOBUFS;
-	} else {
-		fscache_report_unexpected_submission(object, op, ostate);
-		ASSERT(!fscache_object_is_active(object));
-		op->cancel(op);
-		op->state = FSCACHE_OP_ST_CANCELLED;
-		ret = -ENOBUFS;
-	}
-
-	spin_unlock(&object->lock);
-	return ret;
-}
-
-/*
- * queue an object for withdrawal on error, aborting all following asynchronous
- * operations
- */
-void fscache_abort_object(struct fscache_object *object)
-{
-	_enter("{OBJ%x}", object->debug_id);
-
-	fscache_raise_event(object, FSCACHE_OBJECT_EV_ERROR);
-}
-
-/*
- * Jump start the operation processing on an object.  The caller must hold
- * object->lock.
- */
-void fscache_start_operations(struct fscache_object *object)
-{
-	struct fscache_operation *op;
-	bool stop = false;
-
-	while (!list_empty(&object->pending_ops) && !stop) {
-		op = list_entry(object->pending_ops.next,
-				struct fscache_operation, pend_link);
-
-		if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags)) {
-			if (object->n_in_progress > 0)
-				break;
-			stop = true;
-		}
-		list_del_init(&op->pend_link);
-		fscache_run_op(object, op);
-
-		/* the pending queue was holding a ref on the object */
-		fscache_put_operation(op);
-	}
-
-	ASSERTCMP(object->n_in_progress, <=, object->n_ops);
-
-	_debug("woke %d ops on OBJ%x",
-	       object->n_in_progress, object->debug_id);
-}
-
-/*
- * cancel an operation that's pending on an object
- */
-int fscache_cancel_op(struct fscache_operation *op,
-		      bool cancel_in_progress_op)
-{
-	struct fscache_object *object = op->object;
-	bool put = false;
-	int ret;
-
-	_enter("OBJ%x OP%x}", op->object->debug_id, op->debug_id);
-
-	trace_fscache_op(object->cookie, op, fscache_op_cancel);
-
-	ASSERTCMP(op->state, >=, FSCACHE_OP_ST_PENDING);
-	ASSERTCMP(op->state, !=, FSCACHE_OP_ST_CANCELLED);
-	ASSERTCMP(atomic_read(&op->usage), >, 0);
-
-	spin_lock(&object->lock);
-
-	ret = -EBUSY;
-	if (op->state == FSCACHE_OP_ST_PENDING) {
-		ASSERT(!list_empty(&op->pend_link));
-		list_del_init(&op->pend_link);
-		put = true;
-
-		fscache_stat(&fscache_n_op_cancelled);
-		op->cancel(op);
-		op->state = FSCACHE_OP_ST_CANCELLED;
-		if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags))
-			object->n_exclusive--;
-		if (test_and_clear_bit(FSCACHE_OP_WAITING, &op->flags))
-			wake_up_bit(&op->flags, FSCACHE_OP_WAITING);
-		ret = 0;
-	} else if (op->state == FSCACHE_OP_ST_IN_PROGRESS && cancel_in_progress_op) {
-		ASSERTCMP(object->n_in_progress, >, 0);
-		if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags))
-			object->n_exclusive--;
-		object->n_in_progress--;
-		if (object->n_in_progress == 0)
-			fscache_start_operations(object);
-
-		fscache_stat(&fscache_n_op_cancelled);
-		op->cancel(op);
-		op->state = FSCACHE_OP_ST_CANCELLED;
-		if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags))
-			object->n_exclusive--;
-		if (test_and_clear_bit(FSCACHE_OP_WAITING, &op->flags))
-			wake_up_bit(&op->flags, FSCACHE_OP_WAITING);
-		ret = 0;
-	}
-
-	if (put)
-		fscache_put_operation(op);
-	spin_unlock(&object->lock);
-	_leave(" = %d", ret);
-	return ret;
-}
-
-/*
- * Cancel all pending operations on an object
- */
-void fscache_cancel_all_ops(struct fscache_object *object)
-{
-	struct fscache_operation *op;
-
-	_enter("OBJ%x", object->debug_id);
-
-	spin_lock(&object->lock);
-
-	while (!list_empty(&object->pending_ops)) {
-		op = list_entry(object->pending_ops.next,
-				struct fscache_operation, pend_link);
-		fscache_stat(&fscache_n_op_cancelled);
-		list_del_init(&op->pend_link);
-
-		trace_fscache_op(object->cookie, op, fscache_op_cancel_all);
-
-		ASSERTCMP(op->state, ==, FSCACHE_OP_ST_PENDING);
-		op->cancel(op);
-		op->state = FSCACHE_OP_ST_CANCELLED;
-
-		if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags))
-			object->n_exclusive--;
-		if (test_and_clear_bit(FSCACHE_OP_WAITING, &op->flags))
-			wake_up_bit(&op->flags, FSCACHE_OP_WAITING);
-		fscache_put_operation(op);
-		cond_resched_lock(&object->lock);
-	}
-
-	spin_unlock(&object->lock);
-	_leave("");
-}
-
-/*
- * Record the completion or cancellation of an in-progress operation.
- */
-void fscache_op_complete(struct fscache_operation *op, bool cancelled)
-{
-	struct fscache_object *object = op->object;
-
-	_enter("OBJ%x", object->debug_id);
-
-	ASSERTCMP(op->state, ==, FSCACHE_OP_ST_IN_PROGRESS);
-	ASSERTCMP(object->n_in_progress, >, 0);
-	ASSERTIFCMP(test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags),
-		    object->n_exclusive, >, 0);
-	ASSERTIFCMP(test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags),
-		    object->n_in_progress, ==, 1);
-
-	spin_lock(&object->lock);
-
-	if (!cancelled) {
-		trace_fscache_op(object->cookie, op, fscache_op_completed);
-		op->state = FSCACHE_OP_ST_COMPLETE;
-	} else {
-		op->cancel(op);
-		trace_fscache_op(object->cookie, op, fscache_op_cancelled);
-		op->state = FSCACHE_OP_ST_CANCELLED;
-	}
-
-	if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags))
-		object->n_exclusive--;
-	object->n_in_progress--;
-	if (object->n_in_progress == 0)
-		fscache_start_operations(object);
-
-	spin_unlock(&object->lock);
-	_leave("");
-}
-EXPORT_SYMBOL(fscache_op_complete);
-
-/*
- * release an operation
- * - queues pending ops if this is the last in-progress op
- */
-void fscache_put_operation(struct fscache_operation *op)
-{
-	struct fscache_object *object;
-	struct fscache_cache *cache;
-
-	_enter("{OBJ%x OP%x,%d}",
-	       op->object ? op->object->debug_id : 0,
-	       op->debug_id, atomic_read(&op->usage));
-
-	ASSERTCMP(atomic_read(&op->usage), >, 0);
-
-	if (!atomic_dec_and_test(&op->usage))
-		return;
-
-	trace_fscache_op(op->object ? op->object->cookie : NULL, op, fscache_op_put);
-
-	_debug("PUT OP");
-	ASSERTIFCMP(op->state != FSCACHE_OP_ST_INITIALISED &&
-		    op->state != FSCACHE_OP_ST_COMPLETE,
-		    op->state, ==, FSCACHE_OP_ST_CANCELLED);
-
-	fscache_stat(&fscache_n_op_release);
-
-	if (op->release) {
-		op->release(op);
-		op->release = NULL;
-	}
-	op->state = FSCACHE_OP_ST_DEAD;
-
-	object = op->object;
-	if (likely(object)) {
-		if (test_bit(FSCACHE_OP_DEC_READ_CNT, &op->flags))
-			atomic_dec(&object->n_reads);
-		if (test_bit(FSCACHE_OP_UNUSE_COOKIE, &op->flags))
-			fscache_unuse_cookie(object);
-
-		/* now... we may get called with the object spinlock held, so we
-		 * complete the cleanup here only if we can immediately acquire the
-		 * lock, and defer it otherwise */
-		if (!spin_trylock(&object->lock)) {
-			_debug("defer put");
-			fscache_stat(&fscache_n_op_deferred_release);
-
-			cache = object->cache;
-			spin_lock(&cache->op_gc_list_lock);
-			list_add_tail(&op->pend_link, &cache->op_gc_list);
-			spin_unlock(&cache->op_gc_list_lock);
-			schedule_work(&cache->op_gc);
-			_leave(" [defer]");
-			return;
-		}
-
-		ASSERTCMP(object->n_ops, >, 0);
-		object->n_ops--;
-		if (object->n_ops == 0)
-			fscache_raise_event(object, FSCACHE_OBJECT_EV_CLEARED);
-
-		spin_unlock(&object->lock);
-	}
-
-	kfree(op);
-	_leave(" [done]");
-}
-EXPORT_SYMBOL(fscache_put_operation);
-
-/*
- * garbage collect operations that have had their release deferred
- */
-void fscache_operation_gc(struct work_struct *work)
-{
-	struct fscache_operation *op;
-	struct fscache_object *object;
-	struct fscache_cache *cache =
-		container_of(work, struct fscache_cache, op_gc);
-	int count = 0;
-
-	_enter("");
-
-	do {
-		spin_lock(&cache->op_gc_list_lock);
-		if (list_empty(&cache->op_gc_list)) {
-			spin_unlock(&cache->op_gc_list_lock);
-			break;
-		}
-
-		op = list_entry(cache->op_gc_list.next,
-				struct fscache_operation, pend_link);
-		list_del(&op->pend_link);
-		spin_unlock(&cache->op_gc_list_lock);
-
-		object = op->object;
-		trace_fscache_op(object->cookie, op, fscache_op_gc);
-
-		spin_lock(&object->lock);
-
-		_debug("GC DEFERRED REL OBJ%x OP%x",
-		       object->debug_id, op->debug_id);
-		fscache_stat(&fscache_n_op_gc);
-
-		ASSERTCMP(atomic_read(&op->usage), ==, 0);
-		ASSERTCMP(op->state, ==, FSCACHE_OP_ST_DEAD);
-
-		ASSERTCMP(object->n_ops, >, 0);
-		object->n_ops--;
-		if (object->n_ops == 0)
-			fscache_raise_event(object, FSCACHE_OBJECT_EV_CLEARED);
-
-		spin_unlock(&object->lock);
-		kfree(op);
-
-	} while (count++ < 20);
-
-	if (!list_empty(&cache->op_gc_list))
-		schedule_work(&cache->op_gc);
-
-	_leave("");
-}
-
-/*
- * execute an operation using fs_op_wq to provide processing context -
- * the caller holds a ref to this object, so we don't need to hold one
- */
-void fscache_op_work_func(struct work_struct *work)
-{
-	struct fscache_operation *op =
-		container_of(work, struct fscache_operation, work);
-	unsigned long start;
-
-	_enter("{OBJ%x OP%x,%d}",
-	       op->object->debug_id, op->debug_id, atomic_read(&op->usage));
-
-	trace_fscache_op(op->object->cookie, op, fscache_op_work);
-
-	ASSERT(op->processor != NULL);
-	start = jiffies;
-	op->processor(op);
-	fscache_hist(fscache_ops_histogram, start);
-	fscache_put_operation(op);
-
-	_leave("");
-}
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 0fbe25b1271b..8b6aa11dee54 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -23,7 +23,6 @@
 struct fscache_cache;
 struct fscache_cache_ops;
 struct fscache_object;
-struct fscache_operation;
 
 enum fscache_obj_ref_trace {
 	fscache_obj_get_add_to_deps,
@@ -62,11 +61,8 @@ struct fscache_cache {
 	char			identifier[36];	/* cache label */
 
 	/* node management */
-	struct work_struct	op_gc;		/* operation garbage collector */
 	struct list_head	object_list;	/* list of data/index objects */
-	struct list_head	op_gc_list;	/* list of ops to be deleted */
 	spinlock_t		object_list_lock;
-	spinlock_t		op_gc_list_lock;
 	atomic_t		object_count;	/* no. of live objects in this cache */
 	struct fscache_object	*fsdef;		/* object for the fsdef index */
 	unsigned long		flags;
@@ -76,68 +72,6 @@ struct fscache_cache {
 
 extern wait_queue_head_t fscache_cache_cleared_wq;
 
-/*
- * operation to be applied to a cache object
- * - retrieval initiation operations are done in the context of the process
- *   that issued them, and not in an async thread pool
- */
-typedef void (*fscache_operation_release_t)(struct fscache_operation *op);
-typedef void (*fscache_operation_processor_t)(struct fscache_operation *op);
-typedef void (*fscache_operation_cancel_t)(struct fscache_operation *op);
-
-enum fscache_operation_state {
-	FSCACHE_OP_ST_BLANK,		/* Op is not yet submitted */
-	FSCACHE_OP_ST_INITIALISED,	/* Op is initialised */
-	FSCACHE_OP_ST_PENDING,		/* Op is blocked from running */
-	FSCACHE_OP_ST_IN_PROGRESS,	/* Op is in progress */
-	FSCACHE_OP_ST_COMPLETE,		/* Op is complete */
-	FSCACHE_OP_ST_CANCELLED,	/* Op has been cancelled */
-	FSCACHE_OP_ST_DEAD		/* Op is now dead */
-};
-
-struct fscache_operation {
-	struct work_struct	work;		/* record for async ops */
-	struct list_head	pend_link;	/* link in object->pending_ops */
-	struct fscache_object	*object;	/* object to be operated upon */
-
-	unsigned long		flags;
-#define FSCACHE_OP_TYPE		0x000f	/* operation type */
-#define FSCACHE_OP_ASYNC	0x0001	/* - async op, processor may sleep for disk */
-#define FSCACHE_OP_MYTHREAD	0x0002	/* - processing is done be issuing thread, not pool */
-#define FSCACHE_OP_WAITING	4	/* cleared when op is woken */
-#define FSCACHE_OP_EXCLUSIVE	5	/* exclusive op, other ops must wait */
-#define FSCACHE_OP_DEC_READ_CNT	6	/* decrement object->n_reads on destruction */
-#define FSCACHE_OP_UNUSE_COOKIE	7	/* call fscache_unuse_cookie() on completion */
-#define FSCACHE_OP_KEEP_FLAGS	0x00f0	/* flags to keep when repurposing an op */
-
-	enum fscache_operation_state state;
-	atomic_t		usage;
-	unsigned		debug_id;	/* debugging ID */
-
-	/* operation processor callback
-	 * - can be NULL if FSCACHE_OP_WAITING is going to be used to perform
-	 *   the op in a non-pool thread */
-	fscache_operation_processor_t processor;
-
-	/* Operation cancellation cleanup (optional) */
-	fscache_operation_cancel_t cancel;
-
-	/* operation releaser */
-	fscache_operation_release_t release;
-};
-
-extern atomic_t fscache_op_debug_id;
-extern void fscache_op_work_func(struct work_struct *work);
-
-extern void fscache_enqueue_operation(struct fscache_operation *);
-extern void fscache_op_complete(struct fscache_operation *, bool);
-extern void fscache_put_operation(struct fscache_operation *);
-extern void fscache_operation_init(struct fscache_cookie *,
-				   struct fscache_operation *,
-				   fscache_operation_processor_t,
-				   fscache_operation_cancel_t,
-				   fscache_operation_release_t);
-
 /*
  * cache operations
  */
@@ -232,9 +166,6 @@ struct fscache_object {
 	int			n_children;	/* number of child objects */
 	int			n_ops;		/* number of extant ops on object */
 	int			n_obj_ops;	/* number of object ops outstanding on object */
-	int			n_in_progress;	/* number of ops in progress */
-	int			n_exclusive;	/* number of exclusive ops queued or in progress */
-	atomic_t		n_reads;	/* number of read ops in progress */
 	spinlock_t		lock;		/* state and operations lock */
 
 	unsigned long		lookup_jif;	/* time at which lookup started */
@@ -245,7 +176,6 @@ struct fscache_object {
 
 	unsigned long		flags;
 #define FSCACHE_OBJECT_LOCK		0	/* T if object is busy being processed */
-#define FSCACHE_OBJECT_PENDING_WRITE	1	/* T if object has pending write */
 #define FSCACHE_OBJECT_WAITING		2	/* T if object is waiting on its parent */
 #define FSCACHE_OBJECT_IS_LIVE		3	/* T if object is not withdrawn or relinquished */
 #define FSCACHE_OBJECT_IS_LOOKED_UP	4	/* T if object has been looked up */
@@ -262,7 +192,6 @@ struct fscache_object {
 	struct work_struct	work;		/* attention scheduling record */
 	struct list_head	dependents;	/* FIFO of dependent objects */
 	struct list_head	dep_link;	/* link in parent's dependents list */
-	struct list_head	pending_ops;	/* unstarted operations on this object */
 #ifdef CONFIG_FSCACHE_OBJECT_LIST
 	struct rb_node		objlist_link;	/* link in global object list */
 #endif
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 2ebfd688a7c2..08d7de72409d 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -33,24 +33,6 @@ enum fscache_cookie_trace {
 	fscache_cookie_put_parent,
 };
 
-enum fscache_op_trace {
-	fscache_op_cancel,
-	fscache_op_cancel_all,
-	fscache_op_cancelled,
-	fscache_op_completed,
-	fscache_op_enqueue_async,
-	fscache_op_enqueue_mythread,
-	fscache_op_gc,
-	fscache_op_init,
-	fscache_op_put,
-	fscache_op_run,
-	fscache_op_signal,
-	fscache_op_submit,
-	fscache_op_submit_ex,
-	fscache_op_work,
-	fscache_op_trace__nr
-};
-
 #endif
 
 /*
@@ -69,22 +51,6 @@ enum fscache_op_trace {
 	EM(fscache_cookie_put_object,		"PUT obj")		\
 	E_(fscache_cookie_put_parent,		"PUT prn")
 
-#define fscache_op_traces						\
-	EM(fscache_op_cancel,			"Cancel1")		\
-	EM(fscache_op_cancel_all,		"CancelA")		\
-	EM(fscache_op_cancelled,		"Canclld")		\
-	EM(fscache_op_completed,		"Complet")		\
-	EM(fscache_op_enqueue_async,		"EnqAsyn")		\
-	EM(fscache_op_enqueue_mythread,		"EnqMyTh")		\
-	EM(fscache_op_gc,			"GC     ")		\
-	EM(fscache_op_init,			"Init   ")		\
-	EM(fscache_op_put,			"Put    ")		\
-	EM(fscache_op_run,			"Run    ")		\
-	EM(fscache_op_signal,			"Signal ")		\
-	EM(fscache_op_submit,			"Submit ")		\
-	EM(fscache_op_submit_ex,		"SubmitX")		\
-	E_(fscache_op_work,			"Work   ")
-
 /*
  * Export enum symbols via userspace.
  */
@@ -309,29 +275,6 @@ TRACE_EVENT(fscache_osm,
 		      __entry->event_num)
 	    );
 
-TRACE_EVENT(fscache_op,
-	    TP_PROTO(struct fscache_cookie *cookie, struct fscache_operation *op,
-		     enum fscache_op_trace why),
-
-	    TP_ARGS(cookie, op, why),
-
-	    TP_STRUCT__entry(
-		    __field(unsigned int,		cookie		)
-		    __field(unsigned int,		op		)
-		    __field(enum fscache_op_trace,	why		)
-			     ),
-
-	    TP_fast_assign(
-		    __entry->cookie		= cookie->debug_id;
-		    __entry->op			= op->debug_id;
-		    __entry->why		= why;
-			   ),
-
-	    TP_printk("c=%08x op=%08x %s",
-		      __entry->cookie, __entry->op,
-		      __print_symbolic(__entry->why, fscache_op_traces))
-	    );
-
 #endif /* _TRACE_FSCACHE_H */
 
 /* This part must be outside protection */




* [RFC PATCH 15/76] fscache: Change %p in format strings to something else
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (13 preceding siblings ...)
  2020-11-20 15:05 ` [RFC PATCH 14/76] fscache: Remove the I/O operation manager David Howells
@ 2020-11-20 15:05 ` David Howells
  2020-11-20 15:06 ` [RFC PATCH 16/76] cachefiles: " David Howells
                   ` (59 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:05 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Change plain %p in format strings in fscache code to something more useful,
since %p is now hashed before printing and thus no longer matches the
contents of an oops register dump.
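
For illustration, the before/after pattern is sketched below using lines
taken from this patch: the debug macros stop printing a hashed pointer and
print the cookie's stable debug ID instead, which can be correlated with
tracepoint output:

	/* Before: %p is hashed by vsprintf, so the logged value no
	 * longer matches anything in a register dump.
	 */
	_enter("%p", cookie);

	/* After: the cookie's debug ID is stable and searchable. */
	_enter("%x", cookie->debug_id);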

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/cache.c  |    8 ++++----
 fs/fscache/cookie.c |   16 +++++++---------
 fs/fscache/object.c |    1 -
 3 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/fs/fscache/cache.c b/fs/fscache/cache.c
index 92f747b72ec3..60fa68c7da3a 100644
--- a/fs/fscache/cache.c
+++ b/fs/fscache/cache.c
@@ -118,7 +118,7 @@ struct fscache_cache *fscache_select_cache_for_object(
 			cache = NULL;
 
 		spin_unlock(&cookie->lock);
-		_leave(" = %p [parent]", cache);
+		_leave(" = %s [parent]", cache ? cache->tag->name : "NULL");
 		return cache;
 	}
 
@@ -144,14 +144,14 @@ struct fscache_cache *fscache_select_cache_for_object(
 	if (test_bit(FSCACHE_IOERROR, &tag->cache->flags))
 		return NULL;
 
-	_leave(" = %p [specific]", tag->cache);
+	_leave(" = %s [specific]", tag->name);
 	return tag->cache;
 
 no_preference:
 	/* netfs has no preference - just select first cache */
 	cache = list_entry(fscache_cache_list.next,
 			   struct fscache_cache, link);
-	_leave(" = %p [first]", cache);
+	_leave(" = %s [first]", cache->tag->name);
 	return cache;
 }
 
@@ -324,7 +324,7 @@ static void fscache_withdraw_all_objects(struct fscache_cache *cache,
 					    struct fscache_object, cache_link);
 			list_move_tail(&object->cache_link, dying_objects);
 
-			_debug("withdraw %p", object->cookie);
+			_debug("withdraw %x", object->cookie->debug_id);
 
 			/* This must be done under object_list_lock to prevent
 			 * a race with fscache_drop_object().
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index aa33d16711e1..0346bc7f5818 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -369,7 +369,7 @@ void __fscache_enable_cookie(struct fscache_cookie *cookie,
 			     bool (*can_enable)(void *data),
 			     void *data)
 {
-	_enter("%p", cookie);
+	_enter("%x", cookie->debug_id);
 
 	trace_fscache_enable(cookie);
 
@@ -464,10 +464,8 @@ static int fscache_acquire_non_index_cookie(struct fscache_cookie *cookie)
 
 	/* we may be required to wait for lookup to complete at this point */
 	if (!fscache_defer_lookup) {
-		_debug("non-deferred lookup %p", &cookie->flags);
 		wait_on_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP,
 			    TASK_UNINTERRUPTIBLE);
-		_debug("complete");
 		if (test_bit(FSCACHE_COOKIE_UNAVAILABLE, &cookie->flags))
 			goto unavailable;
 	}
@@ -492,7 +490,7 @@ static int fscache_alloc_object(struct fscache_cache *cache,
 	struct fscache_object *object;
 	int ret;
 
-	_enter("%p,%p{%s}", cache, cookie, cookie->type_name);
+	_enter("%s,%x{%s}", cache->tag->name, cookie->debug_id, cookie->type_name);
 
 	spin_lock(&cookie->lock);
 	hlist_for_each_entry(object, &cookie->backing_objects,
@@ -668,7 +666,7 @@ EXPORT_SYMBOL(__fscache_invalidate);
  */
 void __fscache_wait_on_invalidate(struct fscache_cookie *cookie)
 {
-	_enter("%p", cookie);
+	_enter("%x", cookie->debug_id);
 
 	wait_on_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING,
 		    TASK_UNINTERRUPTIBLE);
@@ -723,7 +721,7 @@ void __fscache_disable_cookie(struct fscache_cookie *cookie,
 	struct fscache_object *object;
 	bool awaken = false;
 
-	_enter("%p,%u", cookie, invalidate);
+	_enter("%x,%u", cookie->debug_id, invalidate);
 
 	trace_fscache_disable(cookie);
 
@@ -808,8 +806,8 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie,
 		return;
 	}
 
-	_enter("%p{%s,%d},%d",
-	       cookie, cookie->type_name,
+	_enter("%x{%s,%d},%d",
+	       cookie->debug_id, cookie->type_name,
 	       atomic_read(&cookie->n_active), retire);
 
 	trace_fscache_relinquish(cookie, retire);
@@ -859,7 +857,7 @@ void fscache_cookie_put(struct fscache_cookie *cookie,
 	struct fscache_cookie *parent;
 	int usage;
 
-	_enter("%p", cookie);
+	_enter("%x", cookie->debug_id);
 
 	do {
 		usage = atomic_dec_return(&cookie->usage);
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index 5d50976bf379..efdd5309d88e 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -519,7 +519,6 @@ void fscache_object_lookup_negative(struct fscache_object *object)
 		set_bit(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags);
 		clear_bit(FSCACHE_COOKIE_UNAVAILABLE, &cookie->flags);
 
-		_debug("wake up lookup %p", &cookie->flags);
 		clear_bit_unlock(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags);
 		wake_up_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP);
 	}




* [RFC PATCH 16/76] cachefiles: Change %p in format strings to something else
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (14 preceding siblings ...)
  2020-11-20 15:05 ` [RFC PATCH 15/76] fscache: Change %p in format strings to something else David Howells
@ 2020-11-20 15:06 ` David Howells
  2020-11-20 15:06 ` [RFC PATCH 17/76] iov_iter: Add ITER_XARRAY David Howells
                   ` (58 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:06 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Change plain %p in format strings in cachefiles code to something more
useful, since %p is now hashed before printing and thus no longer matches
the contents of an oops register dump.
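
For the dentry conversions, here is a sketch of one hunk from this patch:
rather than a hashed pointer, the %pd extension prints the dentry's name,
which is actually meaningful in a log:

	/* Before: a hashed pointer that identifies nothing. */
	_debug("next -> %p %s", next, inode ? "positive" : "negative");

	/* After: %pd prints the dentry's name. */
	_debug("next -> %pd %s", next, inode ? "positive" : "negative");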

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/bind.c      |    2 --
 fs/cachefiles/interface.c |    6 +++---
 fs/cachefiles/key.c       |    2 +-
 fs/cachefiles/namei.c     |   46 +++++++++++++++++++++------------------------
 fs/cachefiles/xattr.c     |    4 ++--
 5 files changed, 27 insertions(+), 33 deletions(-)

diff --git a/fs/cachefiles/bind.c b/fs/cachefiles/bind.c
index dfb14dbddf51..2e9d01a9d53f 100644
--- a/fs/cachefiles/bind.c
+++ b/fs/cachefiles/bind.c
@@ -108,8 +108,6 @@ static int cachefiles_daemon_add_cache(struct cachefiles_cache *cache)
 	atomic_set(&fsdef->usage, 1);
 	fsdef->type = FSCACHE_COOKIE_TYPE_INDEX;
 
-	_debug("- fsdef %p", fsdef);
-
 	/* look up the directory at the root of the cache */
 	ret = kern_path(cache->rootdirname, LOOKUP_DIRECTORY, &path);
 	if (ret < 0)
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 99f42d216ef7..7e10793352fd 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -27,7 +27,7 @@ static struct fscache_object *cachefiles_alloc_object(
 
 	cache = container_of(_cache, struct cachefiles_cache, cache);
 
-	_enter("{%s},%p,", cache->cache.identifier, cookie);
+	_enter("{%s},%x,", cache->cache.identifier, cookie->debug_id);
 
 	/* create a new object record and a temporary leaf image */
 	object = kmem_cache_alloc(cachefiles_object_jar, cachefiles_gfp);
@@ -68,7 +68,7 @@ static struct fscache_object *cachefiles_alloc_object(
 
 	object->lookup_key = key;
 
-	_leave(" = %p [%s]", &object->fscache, key);
+	_leave(" = %x [%s]", object->fscache.debug_id, key);
 	return &object->fscache;
 
 nomem_key:
@@ -301,7 +301,7 @@ static void cachefiles_sync_cache(struct fscache_cache *_cache)
 	const struct cred *saved_cred;
 	int ret;
 
-	_enter("%p", _cache);
+	_enter("%s", _cache->tag->name);
 
 	cache = container_of(_cache, struct cachefiles_cache, cache);
 
diff --git a/fs/cachefiles/key.c b/fs/cachefiles/key.c
index be96f5fc5cac..7f94efc97e23 100644
--- a/fs/cachefiles/key.c
+++ b/fs/cachefiles/key.c
@@ -150,6 +150,6 @@ char *cachefiles_cook_key(const u8 *raw, int keylen, uint8_t type)
 	key[len++] = 0;
 	key[len] = 0;
 
-	_leave(" = %p %d", key, len);
+	_leave(" = %s %d", key, len);
 	return key;
 }
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 924042e8cced..2f37451691b0 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -38,17 +38,17 @@ void __cachefiles_printk_object(struct cachefiles_object *object,
 	       object->fscache.events, object->fscache.event_mask);
 	pr_err("%sops=%u\n",
 	       prefix, object->fscache.n_ops);
-	pr_err("%sparent=%p\n",
-	       prefix, object->fscache.parent);
+	pr_err("%sparent=%x\n",
+	       prefix, object->fscache.parent ? object->fscache.parent->debug_id : 0);
 
 	spin_lock(&object->fscache.lock);
 	cookie = object->fscache.cookie;
 	if (cookie) {
-		pr_err("%scookie=%p [pr=%p fl=%lx]\n",
+		pr_err("%scookie=%x [pr=%x fl=%lx]\n",
 		       prefix,
-		       object->fscache.cookie,
-		       object->fscache.cookie->parent,
-		       object->fscache.cookie->flags);
+		       cookie->debug_id,
+		       cookie->parent ? cookie->parent->debug_id : 0,
+		       cookie->flags);
 		pr_err("%skey=[%u] '", prefix, cookie->key_len);
 		k = (cookie->key_len <= sizeof(cookie->inline_key)) ?
 			cookie->inline_key : cookie->key;
@@ -108,7 +108,7 @@ static void cachefiles_mark_object_buried(struct cachefiles_cache *cache,
 
 	/* found the dentry for  */
 found_dentry:
-	kdebug("preemptive burial: OBJ%x [%s] %p",
+	kdebug("preemptive burial: OBJ%x [%s] %pd",
 	       object->fscache.debug_id,
 	       object->fscache.state->name,
 	       dentry);
@@ -138,7 +138,7 @@ static int cachefiles_mark_object_active(struct cachefiles_cache *cache,
 	struct rb_node **_p, *_parent = NULL;
 	struct dentry *dentry;
 
-	_enter(",%p", object);
+	_enter(",%x", object->fscache.debug_id);
 
 try_again:
 	write_lock(&cache->active_lock);
@@ -296,8 +296,6 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
 
 	_enter(",'%pd','%pd'", dir, rep);
 
-	_debug("remove %p from %p", rep, dir);
-
 	/* non-directories can just be unlinked */
 	if (!d_is_dir(rep)) {
 		_debug("unlink stale object");
@@ -436,7 +434,7 @@ int cachefiles_delete_object(struct cachefiles_cache *cache,
 	struct dentry *dir;
 	int ret;
 
-	_enter(",OBJ%x{%p}", object->fscache.debug_id, object->dentry);
+	_enter(",OBJ%x{%pd}", object->fscache.debug_id, object->dentry);
 
 	ASSERT(object->dentry);
 	ASSERT(d_backing_inode(object->dentry));
@@ -489,7 +487,7 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	const char *name;
 	int ret, nlen;
 
-	_enter("OBJ%x{%p},OBJ%x,%s,",
+	_enter("OBJ%x{%pd},OBJ%x,%s,",
 	       parent->fscache.debug_id, parent->dentry,
 	       object->fscache.debug_id, key);
 
@@ -534,7 +532,7 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 
 	inode = d_backing_inode(next);
 	trace_cachefiles_lookup(object, next, inode);
-	_debug("next -> %p %s", next, inode ? "positive" : "negative");
+	_debug("next -> %pd %s", next, inode ? "positive" : "negative");
 
 	if (!key)
 		object->new = !inode;
@@ -572,8 +570,8 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 			}
 			ASSERT(d_backing_inode(next));
 
-			_debug("mkdir -> %p{%p{ino=%lu}}",
-			       next, d_backing_inode(next), d_backing_inode(next)->i_ino);
+			_debug("mkdir -> %pd{ino=%lu}",
+			       next, d_backing_inode(next)->i_ino);
 
 		} else if (!d_can_lookup(next)) {
 			pr_err("inode %lu is not a directory\n",
@@ -602,8 +600,8 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 
 			ASSERT(d_backing_inode(next));
 
-			_debug("create -> %p{%p{ino=%lu}}",
-			       next, d_backing_inode(next), d_backing_inode(next)->i_ino);
+			_debug("create -> %pd{ino=%lu}",
+			       next, d_backing_inode(next)->i_ino);
 
 		} else if (!d_can_lookup(next) &&
 			   !d_is_reg(next)
@@ -772,7 +770,7 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
 		goto lookup_error;
 	}
 
-	_debug("subdir -> %p %s",
+	_debug("subdir -> %pd %s",
 	       subdir, d_backing_inode(subdir) ? "positive" : "negative");
 
 	/* we need to create the subdir if it doesn't exist yet */
@@ -798,10 +796,8 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
 		}
 		ASSERT(d_backing_inode(subdir));
 
-		_debug("mkdir -> %p{%p{ino=%lu}}",
-		       subdir,
-		       d_backing_inode(subdir),
-		       d_backing_inode(subdir)->i_ino);
+		_debug("mkdir -> %pd{ino=%lu}",
+		       subdir, d_backing_inode(subdir)->i_ino);
 	}
 
 	inode_unlock(d_inode(dir));
@@ -879,7 +875,7 @@ static struct dentry *cachefiles_check_active(struct cachefiles_cache *cache,
 	if (IS_ERR(victim))
 		goto lookup_error;
 
-	//_debug("victim -> %p %s",
+	//_debug("victim -> %pd %s",
 	//       victim, d_backing_inode(victim) ? "positive" : "negative");
 
 	/* if the object is no longer there then we probably retired the object
@@ -910,7 +906,7 @@ static struct dentry *cachefiles_check_active(struct cachefiles_cache *cache,
 
 	read_unlock(&cache->active_lock);
 
-	//_leave(" = %p", victim);
+	//_leave(" = %pd", victim);
 	return victim;
 
 object_in_use:
@@ -956,7 +952,7 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
 	if (IS_ERR(victim))
 		return PTR_ERR(victim);
 
-	_debug("victim -> %p %s",
+	_debug("victim -> %pd %s",
 	       victim, d_backing_inode(victim) ? "positive" : "negative");
 
 	/* okay... the victim is not being used so we can cull it
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index a4f1eddebe6f..46913d4157dd 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -41,7 +41,7 @@ int cachefiles_check_object_type(struct cachefiles_object *object)
 	else
 		snprintf(type, 3, "%02x", object->fscache.cookie->type);
 
-	_enter("%p{%s}", object, type);
+	_enter("%x{%s}", object->fscache.debug_id, type);
 
 	/* attempt to install a type label directly */
 	ret = vfs_setxattr(dentry, cachefiles_xattr_cache, type, 2,
@@ -112,7 +112,7 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 	if (!dentry)
 		return -ESTALE;
 
-	_enter("%p,#%d", object, len);
+	_enter("%x,#%d", object->fscache.debug_id, len);
 
 	buf = kmalloc(sizeof(struct cachefiles_xattr) + len, GFP_KERNEL);
 	if (!buf)




* [RFC PATCH 17/76] iov_iter: Add ITER_XARRAY
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (15 preceding siblings ...)
  2020-11-20 15:06 ` [RFC PATCH 16/76] cachefiles: " David Howells
@ 2020-11-20 15:06 ` David Howells
  2020-11-20 15:06 ` [RFC PATCH 18/76] vm: Add wait/unlock functions for PG_fscache David Howells
                   ` (57 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:06 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: Matthew Wilcox, dhowells, Jeff Layton, Matthew Wilcox,
	Alexander Viro, linux-cachefs, linux-afs, linux-nfs, linux-cifs,
	ceph-devel, v9fs-developer, linux-fsdevel, linux-kernel

Add an iterator, ITER_XARRAY, that walks through a set of pages attached to
an xarray, starting at a given page and offset and covering the specified
number of bytes.  The iterator supports transparent huge pages.

The caller must guarantee that the pages are all present and must keep them
locked with PG_locked, PG_writeback or PG_fscache to prevent them from going
away or being migrated whilst they're being accessed.

This is useful for copying data from socket buffers to inodes in network
filesystems and for transferring data between those inodes and the cache
using direct I/O.

Whilst it is true that ITER_BVEC could be used instead, that would require
a bio_vec array to be allocated to refer to all the pages - which should be
redundant if inode->i_pages also points to all these pages.
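
As a usage sketch (not part of this patch; the variables are illustrative
assumptions), a network filesystem might describe a span of its pagecache
to the cache like this, having pinned the pages as described above:

	struct address_space *mapping = inode->i_mapping;
	struct iov_iter iter;
	loff_t start = 0;		/* assumed: file offset of the span */
	size_t len = 4 * PAGE_SIZE;	/* assumed: length of the span */

	/* Build an iterator over the inode's pagecache.  Every page in
	 * [start, start + len) must be present and held by PG_locked,
	 * PG_writeback or PG_fscache for the duration of the I/O.
	 */
	iov_iter_xarray(&iter, READ, &mapping->i_pages, start, len);

	/* 'iter' can now be passed to the cache's kiocb-based read
	 * path in place of an ITER_BVEC, with no bio_vec array to
	 * allocate.
	 */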

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
---

 include/linux/uio.h |   11 ++
 lib/iov_iter.c      |  312 +++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 300 insertions(+), 23 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 72d88566694e..08b186df54ac 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -10,6 +10,7 @@
 #include <uapi/linux/uio.h>
 
 struct page;
+struct address_space;
 struct pipe_inode_info;
 
 struct kvec {
@@ -24,6 +25,7 @@ enum iter_type {
 	ITER_BVEC = 16,
 	ITER_PIPE = 32,
 	ITER_DISCARD = 64,
+	ITER_XARRAY = 128,
 };
 
 struct iov_iter {
@@ -39,6 +41,7 @@ struct iov_iter {
 		const struct iovec *iov;
 		const struct kvec *kvec;
 		const struct bio_vec *bvec;
+		struct xarray *xarray;
 		struct pipe_inode_info *pipe;
 	};
 	union {
@@ -47,6 +50,7 @@ struct iov_iter {
 			unsigned int head;
 			unsigned int start_head;
 		};
+		loff_t xarray_start;
 	};
 };
 
@@ -80,6 +84,11 @@ static inline bool iov_iter_is_discard(const struct iov_iter *i)
 	return iov_iter_type(i) == ITER_DISCARD;
 }
 
+static inline bool iov_iter_is_xarray(const struct iov_iter *i)
+{
+	return iov_iter_type(i) == ITER_XARRAY;
+}
+
 static inline unsigned char iov_iter_rw(const struct iov_iter *i)
 {
 	return i->type & (READ | WRITE);
@@ -221,6 +230,8 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_
 void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode_info *pipe,
 			size_t count);
 void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
+void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray,
+		     loff_t start, size_t count);
 ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
 			size_t maxsize, unsigned maxpages, size_t *start);
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 1635111c5bd2..6edf0f290159 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -78,7 +78,44 @@
 	}						\
 }
 
-#define iterate_all_kinds(i, n, v, I, B, K) {			\
+#define iterate_xarray(i, n, __v, skip, STEP) {		\
+	struct page *head = NULL;				\
+	size_t wanted = n, seg, offset;				\
+	loff_t start = i->xarray_start + skip;			\
+	pgoff_t index = start >> PAGE_SHIFT;			\
+	int j;							\
+								\
+	XA_STATE(xas, i->xarray, index);			\
+								\
+	rcu_read_lock();						\
+	xas_for_each(&xas, head, ULONG_MAX) {				\
+		if (xas_retry(&xas, head))				\
+			continue;					\
+		if (WARN_ON(xa_is_value(head)))				\
+			break;						\
+		if (WARN_ON(PageHuge(head)))				\
+			break;						\
+		for (j = (head->index < index) ? index - head->index : 0; \
+		     j < thp_nr_pages(head); j++) {			\
+			__v.bv_page = head + j;				\
+			offset = (i->xarray_start + skip) & ~PAGE_MASK;	\
+			seg = PAGE_SIZE - offset;			\
+			__v.bv_offset = offset;				\
+			__v.bv_len = min(n, seg);			\
+			(void)(STEP);					\
+			n -= __v.bv_len;				\
+			skip += __v.bv_len;				\
+			if (n == 0)					\
+				break;					\
+		}							\
+		if (n == 0)						\
+			break;						\
+	}							\
+	rcu_read_unlock();					\
+	n = wanted - n;						\
+}
+
+#define iterate_all_kinds(i, n, v, I, B, K, X) {		\
 	if (likely(n)) {					\
 		size_t skip = i->iov_offset;			\
 		if (unlikely(i->type & ITER_BVEC)) {		\
@@ -90,6 +127,9 @@
 			struct kvec v;				\
 			iterate_kvec(i, n, v, kvec, skip, (K))	\
 		} else if (unlikely(i->type & ITER_DISCARD)) {	\
+		} else if (unlikely(i->type & ITER_XARRAY)) {	\
+			struct bio_vec v;			\
+			iterate_xarray(i, n, v, skip, (X));	\
 		} else {					\
 			const struct iovec *iov;		\
 			struct iovec v;				\
@@ -98,7 +138,7 @@
 	}							\
 }
 
-#define iterate_and_advance(i, n, v, I, B, K) {			\
+#define iterate_and_advance(i, n, v, I, B, K, X) {		\
 	if (unlikely(i->count < n))				\
 		n = i->count;					\
 	if (i->count) {						\
@@ -123,6 +163,9 @@
 			i->kvec = kvec;				\
 		} else if (unlikely(i->type & ITER_DISCARD)) {	\
 			skip += n;				\
+		} else if (unlikely(i->type & ITER_XARRAY)) {	\
+			struct bio_vec v;			\
+			iterate_xarray(i, n, v, skip, (X))	\
 		} else {					\
 			const struct iovec *iov;		\
 			struct iovec v;				\
@@ -636,7 +679,9 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 		copyout(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len),
 		memcpy_to_page(v.bv_page, v.bv_offset,
 			       (from += v.bv_len) - v.bv_len, v.bv_len),
-		memcpy(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len)
+		memcpy(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len),
+		memcpy_to_page(v.bv_page, v.bv_offset,
+			       (from += v.bv_len) - v.bv_len, v.bv_len)
 	)
 
 	return bytes;
@@ -752,6 +797,15 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 			bytes = curr_addr - s_addr - rem;
 			return bytes;
 		}
+		}),
+		({
+		rem = copy_mc_to_page(v.bv_page, v.bv_offset,
+				      (from += v.bv_len) - v.bv_len, v.bv_len);
+		if (rem) {
+			curr_addr = (unsigned long) from;
+			bytes = curr_addr - s_addr - rem;
+			return bytes;
+		}
 		})
 	)
 
@@ -773,7 +827,9 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 		copyin((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
 		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
-		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
+		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
+		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 
 	return bytes;
@@ -799,7 +855,9 @@ bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
 		0;}),
 		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
-		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
+		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
+		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 
 	iov_iter_advance(i, bytes);
@@ -819,7 +877,9 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
 					 v.iov_base, v.iov_len),
 		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
-		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
+		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
+		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 
 	return bytes;
@@ -854,7 +914,9 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
 		memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
 		memcpy_flushcache((to += v.iov_len) - v.iov_len, v.iov_base,
-			v.iov_len)
+			v.iov_len),
+		memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 
 	return bytes;
@@ -878,7 +940,9 @@ bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
 		0;}),
 		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
-		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
+		memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
+		memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 
 	iov_iter_advance(i, bytes);
@@ -915,7 +979,7 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
 {
 	if (unlikely(!page_copy_sane(page, offset, bytes)))
 		return 0;
-	if (i->type & (ITER_BVEC|ITER_KVEC)) {
+	if (i->type & (ITER_BVEC | ITER_KVEC | ITER_XARRAY)) {
 		void *kaddr = kmap_atomic(page);
 		size_t wanted = copy_to_iter(kaddr + offset, bytes, i);
 		kunmap_atomic(kaddr);
@@ -938,7 +1002,7 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
 		WARN_ON(1);
 		return 0;
 	}
-	if (i->type & (ITER_BVEC|ITER_KVEC)) {
+	if (i->type & (ITER_BVEC | ITER_KVEC | ITER_XARRAY)) {
 		void *kaddr = kmap_atomic(page);
 		size_t wanted = _copy_from_iter(kaddr + offset, bytes, i);
 		kunmap_atomic(kaddr);
@@ -982,7 +1046,8 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
 	iterate_and_advance(i, bytes, v,
 		clear_user(v.iov_base, v.iov_len),
 		memzero_page(v.bv_page, v.bv_offset, v.bv_len),
-		memset(v.iov_base, 0, v.iov_len)
+		memset(v.iov_base, 0, v.iov_len),
+		memzero_page(v.bv_page, v.bv_offset, v.bv_len)
 	)
 
 	return bytes;
@@ -1006,7 +1071,9 @@ size_t iov_iter_copy_from_user_atomic(struct page *page,
 		copyin((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
 		memcpy_from_page((p += v.bv_len) - v.bv_len, v.bv_page,
 				 v.bv_offset, v.bv_len),
-		memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
+		memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
+		memcpy_from_page((p += v.bv_len) - v.bv_len, v.bv_page,
+				 v.bv_offset, v.bv_len)
 	)
 	kunmap_atomic(kaddr);
 	return bytes;
@@ -1077,7 +1144,12 @@ void iov_iter_advance(struct iov_iter *i, size_t size)
 		i->count -= size;
 		return;
 	}
-	iterate_and_advance(i, size, v, 0, 0, 0)
+	if (unlikely(iov_iter_is_xarray(i))) {
+		i->iov_offset += size;
+		i->count -= size;
+		return;
+	}
+	iterate_and_advance(i, size, v, 0, 0, 0, 0)
 }
 EXPORT_SYMBOL(iov_iter_advance);
 
@@ -1121,7 +1193,12 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll)
 		return;
 	}
 	unroll -= i->iov_offset;
-	if (iov_iter_is_bvec(i)) {
+	if (iov_iter_is_xarray(i)) {
+		BUG(); /* We should never go beyond the start of the specified
+			* range since we might then be straying into pages that
+			* aren't pinned.
+			*/
+	} else if (iov_iter_is_bvec(i)) {
 		const struct bio_vec *bvec = i->bvec;
 		while (1) {
 			size_t n = (--bvec)->bv_len;
@@ -1158,9 +1235,9 @@ size_t iov_iter_single_seg_count(const struct iov_iter *i)
 		return i->count;	// it is a silly place, anyway
 	if (i->nr_segs == 1)
 		return i->count;
-	if (unlikely(iov_iter_is_discard(i)))
+	if (unlikely(iov_iter_is_discard(i) || iov_iter_is_xarray(i)))
 		return i->count;
-	else if (iov_iter_is_bvec(i))
+	if (iov_iter_is_bvec(i))
 		return min(i->count, i->bvec->bv_len - i->iov_offset);
 	else
 		return min(i->count, i->iov->iov_len - i->iov_offset);
@@ -1208,6 +1285,31 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction,
 }
 EXPORT_SYMBOL(iov_iter_pipe);
 
+/**
+ * iov_iter_xarray - Initialise an I/O iterator to use the pages in an xarray
+ * @i: The iterator to initialise.
+ * @direction: The direction of the transfer.
+ * @xarray: The xarray to access.
+ * @start: The start file position.
+ * @count: The size of the I/O buffer in bytes.
+ *
+ * Set up an I/O iterator to either draw data out of the pages attached to an
+ * inode or to inject data into those pages.  The pages *must* be prevented
+ * from evaporation, either by taking a ref on them or locking them by the
+ * caller.
+ */
+void iov_iter_xarray(struct iov_iter *i, unsigned int direction,
+		     struct xarray *xarray, loff_t start, size_t count)
+{
+	BUG_ON(direction & ~1);
+	i->type = ITER_XARRAY | (direction & (READ | WRITE));
+	i->xarray = xarray;
+	i->xarray_start = start;
+	i->count = count;
+	i->iov_offset = 0;
+}
+EXPORT_SYMBOL(iov_iter_xarray);
+
 /**
  * iov_iter_discard - Initialise an I/O iterator that discards data
  * @i: The iterator to initialise.
@@ -1241,7 +1343,8 @@ unsigned long iov_iter_alignment(const struct iov_iter *i)
 	iterate_all_kinds(i, size, v,
 		(res |= (unsigned long)v.iov_base | v.iov_len, 0),
 		res |= v.bv_offset | v.bv_len,
-		res |= (unsigned long)v.iov_base | v.iov_len
+		res |= (unsigned long)v.iov_base | v.iov_len,
+		res |= v.bv_offset | v.bv_len
 	)
 	return res;
 }
@@ -1263,7 +1366,9 @@ unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
 		(res |= (!res ? 0 : (unsigned long)v.bv_offset) |
 			(size != v.bv_len ? size : 0)),
 		(res |= (!res ? 0 : (unsigned long)v.iov_base) |
-			(size != v.iov_len ? size : 0))
+			(size != v.iov_len ? size : 0)),
+		(res |= (!res ? 0 : (unsigned long)v.bv_offset) |
+			(size != v.bv_len ? size : 0))
 		);
 	return res;
 }
@@ -1313,6 +1418,75 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
 	return __pipe_get_pages(i, min(maxsize, capacity), pages, iter_head, start);
 }
 
+static ssize_t iter_xarray_copy_pages(struct page **pages, struct xarray *xa,
+				       pgoff_t index, unsigned int nr_pages)
+{
+	XA_STATE(xas, xa, index);
+	struct page *page;
+	unsigned int ret = 0;
+
+	rcu_read_lock();
+	for (page = xas_load(&xas); page; page = xas_next(&xas)) {
+		if (xas_retry(&xas, page))
+			continue;
+
+		/* Has the page moved or been split? */
+		if (unlikely(page != xas_reload(&xas))) {
+			xas_reset(&xas);
+			continue;
+		}
+
+		pages[ret] = find_subpage(page, xas.xa_index);
+		get_page(pages[ret]);
+		if (++ret == nr_pages)
+			break;
+	}
+	rcu_read_unlock();
+	return ret;
+}
+
+static ssize_t iter_xarray_get_pages(struct iov_iter *i,
+				     struct page **pages, size_t maxsize,
+				     unsigned maxpages, size_t *_start_offset)
+{
+	unsigned nr, offset;
+	pgoff_t index, count;
+	size_t size = maxsize, actual;
+	loff_t pos;
+
+	if (!size || !maxpages)
+		return 0;
+
+	pos = i->xarray_start + i->iov_offset;
+	index = pos >> PAGE_SHIFT;
+	offset = pos & ~PAGE_MASK;
+	*_start_offset = offset;
+
+	count = 1;
+	if (size > PAGE_SIZE - offset) {
+		size -= PAGE_SIZE - offset;
+		count += size >> PAGE_SHIFT;
+		size &= ~PAGE_MASK;
+		if (size)
+			count++;
+	}
+
+	if (count > maxpages)
+		count = maxpages;
+
+	nr = iter_xarray_copy_pages(pages, i->xarray, index, count);
+	if (nr == 0)
+		return 0;
+
+	actual = PAGE_SIZE * nr;
+	actual -= offset;
+	if (nr == count && size > 0) {
+		unsigned last_offset = (nr > 1) ? 0 : offset;
+		actual -= PAGE_SIZE - (last_offset + size);
+	}
+	return actual;
+}
+
 ssize_t iov_iter_get_pages(struct iov_iter *i,
 		   struct page **pages, size_t maxsize, unsigned maxpages,
 		   size_t *start)
@@ -1322,6 +1496,8 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 
 	if (unlikely(iov_iter_is_pipe(i)))
 		return pipe_get_pages(i, pages, maxsize, maxpages, start);
+	if (unlikely(iov_iter_is_xarray(i)))
+		return iter_xarray_get_pages(i, pages, maxsize, maxpages, start);
 	if (unlikely(iov_iter_is_discard(i)))
 		return -EFAULT;
 
@@ -1348,7 +1524,8 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 		return v.bv_len;
 	}),({
 		return -EFAULT;
-	})
+	}),
+	0
 	)
 	return 0;
 }
@@ -1392,6 +1569,51 @@ static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
 	return n;
 }
 
+static ssize_t iter_xarray_get_pages_alloc(struct iov_iter *i,
+					   struct page ***pages, size_t maxsize,
+					   size_t *_start_offset)
+{
+	struct page **p;
+	unsigned nr, offset;
+	pgoff_t index, count;
+	size_t size = maxsize, actual;
+	loff_t pos;
+
+	if (!size)
+		return 0;
+
+	pos = i->xarray_start + i->iov_offset;
+	index = pos >> PAGE_SHIFT;
+	offset = pos & ~PAGE_MASK;
+	*_start_offset = offset;
+
+	count = 1;
+	if (size > PAGE_SIZE - offset) {
+		size -= PAGE_SIZE - offset;
+		count += size >> PAGE_SHIFT;
+		size &= ~PAGE_MASK;
+		if (size)
+			count++;
+	}
+
+	p = get_pages_array(count);
+	if (!p)
+		return -ENOMEM;
+	*pages = p;
+
+	nr = iter_xarray_copy_pages(p, i->xarray, index, count);
+	if (nr == 0)
+		return 0;
+
+	actual = PAGE_SIZE * nr;
+	actual -= offset;
+	if (nr == count && size > 0) {
+		unsigned last_offset = (nr > 1) ? 0 : offset;
+		actual -= PAGE_SIZE - (last_offset + size);
+	}
+	return actual;
+}
+
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
 		   size_t *start)
@@ -1403,6 +1625,8 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 
 	if (unlikely(iov_iter_is_pipe(i)))
 		return pipe_get_pages_alloc(i, pages, maxsize, start);
+	if (unlikely(iov_iter_is_xarray(i)))
+		return iter_xarray_get_pages_alloc(i, pages, maxsize, start);
 	if (unlikely(iov_iter_is_discard(i)))
 		return -EFAULT;
 
@@ -1435,7 +1659,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		return v.bv_len;
 	}),({
 		return -EFAULT;
-	})
+	}), 0
 	)
 	return 0;
 }
@@ -1473,6 +1697,13 @@ size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
 				      v.iov_base, v.iov_len,
 				      sum, off);
 		off += v.iov_len;
+	}), ({
+		char *p = kmap_atomic(v.bv_page);
+		sum = csum_and_memcpy((to += v.bv_len) - v.bv_len,
+				      p + v.bv_offset, v.bv_len,
+				      sum, off);
+		kunmap_atomic(p);
+		off += v.bv_len;
 	})
 	)
 	*csum = sum;
@@ -1514,6 +1745,13 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum,
 				      v.iov_base, v.iov_len,
 				      sum, off);
 		off += v.iov_len;
+	}), ({
+		char *p = kmap_atomic(v.bv_page);
+		sum = csum_and_memcpy((to += v.bv_len) - v.bv_len,
+				      p + v.bv_offset, v.bv_len,
+				      sum, off);
+		kunmap_atomic(p);
+		off += v.bv_len;
 	})
 	)
 	*csum = sum;
@@ -1559,6 +1797,13 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump,
 				     (from += v.iov_len) - v.iov_len,
 				     v.iov_len, sum, off);
 		off += v.iov_len;
+	}), ({
+		char *p = kmap_atomic(v.bv_page);
+		sum = csum_and_memcpy(p + v.bv_offset,
+				      (from += v.bv_len) - v.bv_len,
+				      v.bv_len, sum, off);
+		kunmap_atomic(p);
+		off += v.bv_len;
 	})
 	)
 	*csum = sum;
@@ -1608,6 +1853,21 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages)
 		npages = pipe_space_for_user(iter_head, pipe->tail, pipe);
 		if (npages >= maxpages)
 			return maxpages;
+	} else if (unlikely(iov_iter_is_xarray(i))) {
+		unsigned offset;
+
+		offset = (i->xarray_start + i->iov_offset) & ~PAGE_MASK;
+
+		npages = 1;
+		if (size > PAGE_SIZE - offset) {
+			size -= PAGE_SIZE - offset;
+			npages += size >> PAGE_SHIFT;
+			size &= ~PAGE_MASK;
+			if (size)
+				npages++;
+		}
+		if (npages >= maxpages)
+			return maxpages;
 	} else iterate_all_kinds(i, size, v, ({
 		unsigned long p = (unsigned long)v.iov_base;
 		npages += DIV_ROUND_UP(p + v.iov_len, PAGE_SIZE)
@@ -1624,7 +1884,8 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages)
 			- p / PAGE_SIZE;
 		if (npages >= maxpages)
 			return maxpages;
-	})
+	}),
+	0
 	)
 	return npages;
 }
@@ -1637,7 +1898,7 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
 		WARN_ON(1);
 		return NULL;
 	}
-	if (unlikely(iov_iter_is_discard(new)))
+	if (unlikely(iov_iter_is_discard(new) || iov_iter_is_xarray(new)))
 		return NULL;
 	if (iov_iter_is_bvec(new))
 		return new->bvec = kmemdup(new->bvec,
@@ -1842,7 +2103,12 @@ int iov_iter_for_each_range(struct iov_iter *i, size_t bytes,
 		kunmap(v.bv_page);
 		err;}), ({
 		w = v;
-		err = f(&w, context);})
+		err = f(&w, context);}), ({
+		w.iov_base = kmap(v.bv_page) + v.bv_offset;
+		w.iov_len = v.bv_len;
+		err = f(&w, context);
+		kunmap(v.bv_page);
+		err;})
 	)
 	return err;
 }




* [RFC PATCH 18/76] vm: Add wait/unlock functions for PG_fscache
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (16 preceding siblings ...)
  2020-11-20 15:06 ` [RFC PATCH 17/76] iov_iter: Add ITER_XARRAY David Howells
@ 2020-11-20 15:06 ` David Howells
  2020-11-20 15:06 ` [RFC PATCH 19/76] mm: Implement readahead_control pageset expansion David Howells
                   ` (56 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:06 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Add functions to unlock, and to wait for the unlocking of, PG_fscache,
analogous to those for PG_locked.
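
For illustration, a netfs might use these as follows (a sketch only; the
surrounding context is assumed and is not part of this patch):

	/* Before discarding or overwriting a page that may be in the
	 * process of being written to the cache, wait for the write to
	 * finish (PG_fscache is an alias of PG_private_2):
	 */
	wait_on_page_fscache(page);

	/* ...and in the cache's write-completion handler: */
	unlock_page_fscache(page);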

Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/linux/pagemap.h |   14 ++++++++++++++
 mm/filemap.c            |   18 ++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e1e19c1f9ec9..67fe07096103 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -591,6 +591,7 @@ extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
 extern void unlock_page(struct page *page);
+extern void unlock_page_fscache(struct page *page);
 
 /*
  * Return true if the page was successfully locked
@@ -681,6 +682,19 @@ static inline int wait_on_page_locked_killable(struct page *page)
 	return wait_on_page_bit_killable(compound_head(page), PG_locked);
 }
 
+/**
+ * wait_on_page_fscache - Wait for PG_fscache to be cleared on a page
+ * @page: The page
+ *
+ * Wait for the fscache mark to be removed from a page, usually signifying the
+ * completion of a write from that page to the cache.
+ */
+static inline void wait_on_page_fscache(struct page *page)
+{
+	if (PagePrivate2(page))
+		wait_on_page_bit(compound_head(page), PG_fscache);
+}
+
 extern void put_and_wait_on_page_locked(struct page *page);
 
 void wait_on_page_writeback(struct page *page);
diff --git a/mm/filemap.c b/mm/filemap.c
index d5e7c2029d16..cfb753955e36 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1466,6 +1466,24 @@ void unlock_page(struct page *page)
 }
 EXPORT_SYMBOL(unlock_page);
 
+/**
+ * unlock_page_fscache - Unlock a page pinned with PG_fscache
+ * @page: The page
+ *
+ * Unlocks the page and wakes up sleepers in wait_on_page_fscache().  Also
+ * wakes those waiting for the lock and writeback bits because the wakeup
+ * mechanism is shared.  But that's OK - those sleepers will just go back to
+ * sleep.
+ */
+void unlock_page_fscache(struct page *page)
+{
+	page = compound_head(page);
+	VM_BUG_ON_PAGE(!PagePrivate2(page), page);
+	clear_bit_unlock(PG_fscache, &page->flags);
+	wake_up_page_bit(page, PG_fscache);
+}
+EXPORT_SYMBOL(unlock_page_fscache);
+
 /**
  * end_page_writeback - end writeback against a page
  * @page: the page




* [RFC PATCH 19/76] mm: Implement readahead_control pageset expansion
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (17 preceding siblings ...)
  2020-11-20 15:06 ` [RFC PATCH 18/76] vm: Add wait/unlock functions for PG_fscache David Howells
@ 2020-11-20 15:06 ` David Howells
  2020-11-20 15:06 ` [RFC PATCH 20/76] mm: Stop generic_file_buffered_read() from grabbing a superfluous page David Howells
                   ` (55 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:06 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: Matthew Wilcox (Oracle),
	dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Provide a function, readahead_expand(), that expands the set of pages
specified by a readahead_control object to encompass a revised area with a
proposed start position and length.

The proposed area must include all of the old area and may be expanded
still further by this function so that the edges align on (transparent
huge) page boundaries as allocated.

The expansion will be cut short if a page already exists in either of the
areas being expanded into.  Note that any expansion made in such a case is
not rolled back.

This will be used by fscache so that reads can be expanded to cache granule
boundaries, thereby allowing whole granules to be stored in the cache, but
there are other potential users also.
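
As a sketch of the intended use, a netfs ->readahead() might round the
window out to cache granule boundaries like this (CACHE_GRANULE_SIZE is a
made-up constant standing in for whatever granularity the cache reports;
ractl is the readahead_control passed to the netfs):

	#define CACHE_GRANULE_SIZE	(256 * 1024)	/* hypothetical */

	loff_t start = round_down(readahead_pos(ractl), CACHE_GRANULE_SIZE);
	size_t len = round_up(readahead_pos(ractl) + readahead_length(ractl),
			      CACHE_GRANULE_SIZE) - start;

	readahead_expand(ractl, start, len);

	/* The expansion may have been cut short, so the netfs must reread
	 * the window from ractl rather than assuming it got what it asked
	 * for.
	 */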

Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/linux/pagemap.h |    2 +
 mm/readahead.c          |   70 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 67fe07096103..852e3dd59aa0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -774,6 +774,8 @@ extern void __delete_from_page_cache(struct page *page, void *shadow);
 int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
 void delete_from_page_cache_batch(struct address_space *mapping,
 				  struct pagevec *pvec);
+void readahead_expand(struct readahead_control *ractl,
+		      loff_t new_start, size_t new_len);
 
 /*
  * Like add_to_page_cache_locked, but used to add newly allocated pages:
diff --git a/mm/readahead.c b/mm/readahead.c
index c5b0457415be..4446dada0bc2 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -638,3 +638,73 @@ SYSCALL_DEFINE3(readahead, int, fd, loff_t, offset, size_t, count)
 {
 	return ksys_readahead(fd, offset, count);
 }
+
+/**
+ * readahead_expand - Expand a readahead request
+ * @ractl: The request to be expanded
+ * @new_start: The revised start
+ * @new_len: The revised size of the request
+ *
+ * Attempt to expand a readahead request outwards from the current size to the
+ * specified size by inserting locked pages before and after the current window
+ * to widen the window to the new size.  This may involve the insertion of
+ * THPs, in which case the window may get expanded even beyond what was
+ * requested.
+ *
+ * The algorithm will stop if it encounters a conflicting page already in the
+ * pagecache and leave a smaller expansion than requested.
+ *
+ * The caller must check for this by examining the revised @ractl object for a
+ * different expansion than was requested.
+ */
+void readahead_expand(struct readahead_control *ractl,
+		      loff_t new_start, size_t new_len)
+{
+	struct address_space *mapping = ractl->mapping;
+	pgoff_t new_index, new_nr_pages;
+	gfp_t gfp_mask = readahead_gfp_mask(mapping);
+
+	new_index = new_start / PAGE_SIZE;
+
+	/* Expand the leading edge downwards */
+	while (ractl->_index > new_index) {
+		unsigned long index = ractl->_index - 1;
+		struct page *page = xa_load(&mapping->i_pages, index);
+
+		if (page && !xa_is_value(page))
+			return; /* Page apparently present */
+
+		page = __page_cache_alloc(gfp_mask);
+		if (!page)
+			return;
+		if (add_to_page_cache_lru(page, mapping, index, gfp_mask) < 0) {
+			put_page(page);
+			return;
+		}
+
+		ractl->_nr_pages++;
+		ractl->_index = page->index;
+	}
+
+	new_len += new_start - readahead_pos(ractl);
+	new_nr_pages = DIV_ROUND_UP(new_len, PAGE_SIZE);
+
+	/* Expand the trailing edge upwards */
+	while (ractl->_nr_pages < new_nr_pages) {
+		unsigned long index = ractl->_index + ractl->_nr_pages;
+		struct page *page = xa_load(&mapping->i_pages, index);
+
+		if (page && !xa_is_value(page))
+			return; /* Page apparently present */
+
+		page = __page_cache_alloc(gfp_mask);
+		if (!page)
+			return;
+		if (add_to_page_cache_lru(page, mapping, index, gfp_mask) < 0) {
+			put_page(page);
+			return;
+		}
+		ractl->_nr_pages++;
+	}
+}
+EXPORT_SYMBOL(readahead_expand);




* [RFC PATCH 20/76] mm: Stop generic_file_buffered_read() from grabbing a superfluous page
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (18 preceding siblings ...)
  2020-11-20 15:06 ` [RFC PATCH 19/76] mm: Implement readahead_control pageset expansion David Howells
@ 2020-11-20 15:06 ` David Howells
  2020-11-20 15:07 ` [RFC PATCH 21/76] vfs: Export rw_verify_area() for use by cachefiles David Howells
                   ` (54 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:06 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Under some circumstances, generic_file_buffered_read() will allocate
sufficient pages to read to the end of the file, call readahead/readpages
on them and copy the data over - and then it will allocate another page at
the EOF, call readpage on that and ignore the result.  This is unnecessary
and wastes time and resources.

Catch the overallocation in the "no_cached_page:" part and prevent it from
happening.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 mm/filemap.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index cfb753955e36..5a63aa1dd71e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2447,6 +2447,8 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		 * Ok, it wasn't cached, so we need to create a new
 		 * page..
 		 */
+		if ((index << PAGE_SHIFT) >= i_size_read(inode))
+			goto out;
 		page = page_cache_alloc(mapping);
 		if (!page) {
 			error = -ENOMEM;




* [RFC PATCH 21/76] vfs: Export rw_verify_area() for use by cachefiles
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (19 preceding siblings ...)
  2020-11-20 15:06 ` [RFC PATCH 20/76] mm: Stop generic_file_buffered_read() from grabbing a superfluous page David Howells
@ 2020-11-20 15:07 ` David Howells
  2020-11-20 15:07 ` [RFC PATCH 22/76] vfs: Provide S_CACHE_FILE inode flag David Howells
                   ` (53 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:07 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Export rw_verify_area() so that cachefiles can use it before issuing
call_read_iter() and call_write_iter() to effect async DIO operations
against the cache.
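
A sketch of the intended use (the file, position, length, kiocb and
iterator are cachefiles-side context assumed here, not provided by this
patch):

	ret = rw_verify_area(READ, file, &pos, len);
	if (ret < 0)
		return ret;

	init_sync_kiocb(&kiocb, file);
	kiocb.ki_pos = pos;
	ret = call_read_iter(file, &kiocb, &iter);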

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/internal.h      |    5 -----
 fs/read_write.c    |    1 +
 include/linux/fs.h |    1 +
 3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index a7cd0f64faa4..0a536f21f498 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -165,11 +165,6 @@ extern char *simple_dname(struct dentry *, char *, int);
 extern void dput_to_list(struct dentry *, struct list_head *);
 extern void shrink_dentry_list(struct list_head *);
 
-/*
- * read_write.c
- */
-extern int rw_verify_area(int, struct file *, const loff_t *, size_t);
-
 /*
  * pipe.c
  */
diff --git a/fs/read_write.c b/fs/read_write.c
index 75f764b43418..fe84e11245bd 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -400,6 +400,7 @@ int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t
 	return security_file_permission(file,
 				read_write == READ ? MAY_READ : MAY_WRITE);
 }
+EXPORT_SYMBOL(rw_verify_area);
 
 static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8667d0cdc71e..52fb357579c8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2762,6 +2762,7 @@ extern int notify_change(struct dentry *, struct iattr *, struct inode **);
 extern int inode_permission(struct inode *, int);
 extern int generic_permission(struct inode *, int);
 extern int __check_sticky(struct inode *dir, struct inode *inode);
+extern int rw_verify_area(int, struct file *, const loff_t *, size_t);
 
 static inline bool execute_ok(struct inode *inode)
 {




* [RFC PATCH 22/76] vfs: Provide S_CACHE_FILE inode flag
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (20 preceding siblings ...)
  2020-11-20 15:07 ` [RFC PATCH 21/76] vfs: Export rw_verify_area() for use by cachefiles David Howells
@ 2020-11-20 15:07 ` David Howells
  2020-11-20 15:07 ` [RFC PATCH 23/76] cachefiles: Remove tree of active files and use " David Howells
                   ` (52 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:07 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Provide an S_CACHE_FILE inode flag that cachefiles can set to ward off
other kernel services and drivers (including itself) from using its cache
files.
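
The intended claim/release pattern, as used by the next patch, is simply
(sketched):

	inode_lock(inode);
	if (inode->i_flags & S_CACHE_FILE) {
		ret = -EBUSY;	/* already claimed by another user */
	} else {
		inode->i_flags |= S_CACHE_FILE;
		ret = 0;
	}
	inode_unlock(inode);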

Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/linux/fs.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 52fb357579c8..f6dad577ea07 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1991,6 +1991,7 @@ struct super_operations {
 #define S_ENCRYPTED	(1 << 14) /* Encrypted file (using fs/crypto/) */
 #define S_CASEFOLD	(1 << 15) /* Casefolded file */
 #define S_VERITY	(1 << 16) /* Verity file (using fs/verity/) */
+#define S_CACHE_FILE	(1 << 17) /* File is in use as cache file (eg. fs/cachefiles) */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system




* [RFC PATCH 23/76] cachefiles: Remove tree of active files and use S_CACHE_FILE inode flag
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (21 preceding siblings ...)
  2020-11-20 15:07 ` [RFC PATCH 22/76] vfs: Provide S_CACHE_FILE inode flag David Howells
@ 2020-11-20 15:07 ` David Howells
  2020-11-20 15:07 ` [RFC PATCH 24/76] fscache: Provide a simple thread pool for running ops asynchronously David Howells
                   ` (51 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:07 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Remove the tree of active dentries from the cachefiles_cache struct and
instead set a flag, S_CACHE_FILE, on the backing inode to indicate that
this file is in use by the kernel so as to ward off other kernel users.

This simplifies the code a lot and also prevents two overlapping caches
from fighting with each other.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/daemon.c            |    4 
 fs/cachefiles/interface.c         |   20 --
 fs/cachefiles/internal.h          |   10 -
 fs/cachefiles/namei.c             |  375 +++++++------------------------------
 include/trace/events/cachefiles.h |   29 ---
 5 files changed, 78 insertions(+), 360 deletions(-)

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 752c1e43416f..8a937d6d5e22 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -102,8 +102,6 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)
 	}
 
 	mutex_init(&cache->daemon_mutex);
-	cache->active_nodes = RB_ROOT;
-	rwlock_init(&cache->active_lock);
 	init_waitqueue_head(&cache->daemon_pollwq);
 
 	/* set default caching limits
@@ -138,8 +136,6 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)
 
 	cachefiles_daemon_unbind(cache);
 
-	ASSERT(!cache->active_nodes.rb_node);
-
 	/* clean up the control file interface */
 	cache->cachefilesd = NULL;
 	file->private_data = NULL;
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 7e10793352fd..dc2cfbed4250 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -36,7 +36,6 @@ static struct fscache_object *cachefiles_alloc_object(
 
 	ASSERTCMP(object->backer, ==, NULL);
 
-	BUG_ON(test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags));
 	atomic_set(&object->usage, 1);
 
 	fscache_object_init(&object->fscache, cookie, &cache->cache);
@@ -74,7 +73,6 @@ static struct fscache_object *cachefiles_alloc_object(
 nomem_key:
 	kfree(buffer);
 nomem_buffer:
-	BUG_ON(test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags));
 	kmem_cache_free(cachefiles_object_jar, object);
 	fscache_object_destroyed(&cache->cache);
 nomem_object:
@@ -190,8 +188,6 @@ static void cachefiles_drop_object(struct fscache_object *_object)
 	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
 	const struct cred *saved_cred;
-	struct inode *inode;
-	blkcnt_t i_blocks = 0;
 
 	ASSERT(_object);
 
@@ -218,10 +214,6 @@ static void cachefiles_drop_object(struct fscache_object *_object)
 		    _object != cache->cache.fsdef
 		    ) {
 			_debug("- retire object OBJ%x", object->fscache.debug_id);
-			inode = d_backing_inode(object->dentry);
-			if (inode)
-				i_blocks = inode->i_blocks;
-
 			cachefiles_begin_secure(cache, &saved_cred);
 			cachefiles_delete_object(cache, object);
 			cachefiles_end_secure(cache, saved_cred);
@@ -231,14 +223,11 @@ static void cachefiles_drop_object(struct fscache_object *_object)
 		if (object->backer != object->dentry)
 			dput(object->backer);
 		object->backer = NULL;
-	}
 
-	/* note that the object is now inactive */
-	if (test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags))
-		cachefiles_mark_object_inactive(cache, object, i_blocks);
-
-	dput(object->dentry);
-	object->dentry = NULL;
+		cachefiles_unmark_inode_in_use(object, object->dentry);
+		dput(object->dentry);
+		object->dentry = NULL;
+	}
 
 	_leave("");
 }
@@ -274,7 +263,6 @@ static void cachefiles_put_object(struct fscache_object *_object,
 	if (u == 0) {
 		_debug("- kill object OBJ%x", object->fscache.debug_id);
 
-		ASSERT(!test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags));
 		ASSERTCMP(object->fscache.parent, ==, NULL);
 		ASSERTCMP(object->backer, ==, NULL);
 		ASSERTCMP(object->dentry, ==, NULL);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index a5d48f271ce1..f8f308ce7385 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -38,12 +38,9 @@ struct cachefiles_object {
 	struct dentry			*dentry;	/* the file/dir representing this object */
 	struct dentry			*backer;	/* backing file */
 	loff_t				i_size;		/* object size */
-	unsigned long			flags;
-#define CACHEFILES_OBJECT_ACTIVE	0		/* T if marked active */
 	atomic_t			usage;		/* object usage count */
 	uint8_t				type;		/* object type */
 	uint8_t				new;		/* T if object new */
-	struct rb_node			active_node;	/* link in active tree (dentry is key) */
 };
 
 extern struct kmem_cache *cachefiles_object_jar;
@@ -59,8 +56,6 @@ struct cachefiles_cache {
 	const struct cred		*cache_cred;	/* security override for accessing cache */
 	struct mutex			daemon_mutex;	/* command serialisation mutex */
 	wait_queue_head_t		daemon_pollwq;	/* poll waitqueue for daemon */
-	struct rb_root			active_nodes;	/* active nodes (can't be culled) */
-	rwlock_t			active_lock;	/* lock for active_nodes */
 	atomic_t			gravecounter;	/* graveyard uniquifier */
 	atomic_t			f_released;	/* number of objects released lately */
 	atomic_long_t			b_released;	/* number of blocks released lately */
@@ -126,9 +121,8 @@ extern char *cachefiles_cook_key(const u8 *raw, int keylen, uint8_t type);
 /*
  * namei.c
  */
-extern void cachefiles_mark_object_inactive(struct cachefiles_cache *cache,
-					    struct cachefiles_object *object,
-					    blkcnt_t i_blocks);
+extern void cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
+				    struct dentry *dentry);
 extern int cachefiles_delete_object(struct cachefiles_cache *cache,
 				    struct cachefiles_object *object);
 extern int cachefiles_walk_to_object(struct cachefiles_object *parent,
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 2f37451691b0..1fb011474765 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -21,251 +21,51 @@
 #define CACHEFILES_KEYBUF_SIZE 512
 
 /*
- * dump debugging info about an object
+ * Mark the backing file as being a cache file if it's not already in use.
  */
-static noinline
-void __cachefiles_printk_object(struct cachefiles_object *object,
-				const char *prefix)
+static bool cachefiles_mark_inode_in_use(struct cachefiles_object *object,
+					 struct dentry *dentry)
 {
-	struct fscache_cookie *cookie;
-	const u8 *k;
-	unsigned loop;
-
-	pr_err("%sobject: OBJ%x\n", prefix, object->fscache.debug_id);
-	pr_err("%sobjstate=%s fl=%lx wbusy=%x ev=%lx[%lx]\n",
-	       prefix, object->fscache.state->name,
-	       object->fscache.flags, work_busy(&object->fscache.work),
-	       object->fscache.events, object->fscache.event_mask);
-	pr_err("%sops=%u\n",
-	       prefix, object->fscache.n_ops);
-	pr_err("%sparent=%x\n",
-	       prefix, object->fscache.parent ? object->fscache.parent->debug_id : 0);
-
-	spin_lock(&object->fscache.lock);
-	cookie = object->fscache.cookie;
-	if (cookie) {
-		pr_err("%scookie=%x [pr=%x fl=%lx]\n",
-		       prefix,
-		       cookie->debug_id,
-		       cookie->parent ? cookie->parent->debug_id : 0,
-		       cookie->flags);
-		pr_err("%skey=[%u] '", prefix, cookie->key_len);
-		k = (cookie->key_len <= sizeof(cookie->inline_key)) ?
-			cookie->inline_key : cookie->key;
-		for (loop = 0; loop < cookie->key_len; loop++)
-			pr_cont("%02x", k[loop]);
-		pr_cont("'\n");
-	} else {
-		pr_err("%scookie=NULL\n", prefix);
-	}
-	spin_unlock(&object->fscache.lock);
-}
-
-/*
- * dump debugging info about a pair of objects
- */
-static noinline void cachefiles_printk_object(struct cachefiles_object *object,
-					      struct cachefiles_object *xobject)
-{
-	if (object)
-		__cachefiles_printk_object(object, "");
-	if (xobject)
-		__cachefiles_printk_object(xobject, "x");
-}
-
-/*
- * mark the owner of a dentry, if there is one, to indicate that that dentry
- * has been preemptively deleted
- * - the caller must hold the i_mutex on the dentry's parent as required to
- *   call vfs_unlink(), vfs_rmdir() or vfs_rename()
- */
-static void cachefiles_mark_object_buried(struct cachefiles_cache *cache,
-					  struct dentry *dentry,
-					  enum fscache_why_object_killed why)
-{
-	struct cachefiles_object *object;
-	struct rb_node *p;
-
-	_enter(",'%pd'", dentry);
-
-	write_lock(&cache->active_lock);
-
-	p = cache->active_nodes.rb_node;
-	while (p) {
-		object = rb_entry(p, struct cachefiles_object, active_node);
-		if (object->dentry > dentry)
-			p = p->rb_left;
-		else if (object->dentry < dentry)
-			p = p->rb_right;
-		else
-			goto found_dentry;
-	}
-
-	write_unlock(&cache->active_lock);
-	trace_cachefiles_mark_buried(NULL, dentry, why);
-	_leave(" [no owner]");
-	return;
+	struct inode *inode = d_backing_inode(dentry);
+	bool can_use = false;
 
-	/* found the dentry for  */
-found_dentry:
-	kdebug("preemptive burial: OBJ%x [%s] %pd",
-	       object->fscache.debug_id,
-	       object->fscache.state->name,
-	       dentry);
+	_enter(",%x", object->fscache.debug_id);
 
-	trace_cachefiles_mark_buried(object, dentry, why);
+	inode_lock(inode);
 
-	if (fscache_object_is_live(&object->fscache)) {
-		pr_err("\n");
-		pr_err("Error: Can't preemptively bury live object\n");
-		cachefiles_printk_object(object, NULL);
+	if (!(inode->i_flags & S_CACHE_FILE)) {
+		inode->i_flags |= S_CACHE_FILE;
+		trace_cachefiles_mark_active(object, dentry);
+		can_use = true;
 	} else {
-		if (why != FSCACHE_OBJECT_IS_STALE)
-			fscache_object_mark_killed(&object->fscache, why);
+		pr_notice("cachefiles: Inode already in use: %pd\n", dentry);
 	}
 
-	write_unlock(&cache->active_lock);
-	_leave(" [owner marked]");
+	inode_unlock(inode);
+	return can_use;
 }
 
 /*
- * record the fact that an object is now active
+ * Unmark a backing inode.
  */
-static int cachefiles_mark_object_active(struct cachefiles_cache *cache,
-					 struct cachefiles_object *object)
+void cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
+				    struct dentry *dentry)
 {
-	struct cachefiles_object *xobject;
-	struct rb_node **_p, *_parent = NULL;
-	struct dentry *dentry;
-
-	_enter(",%x", object->fscache.debug_id);
-
-try_again:
-	write_lock(&cache->active_lock);
-
-	dentry = object->dentry;
-	trace_cachefiles_mark_active(object, dentry);
-
-	if (test_and_set_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags)) {
-		pr_err("Error: Object already active\n");
-		cachefiles_printk_object(object, NULL);
-		BUG();
-	}
-
-	_p = &cache->active_nodes.rb_node;
-	while (*_p) {
-		_parent = *_p;
-		xobject = rb_entry(_parent,
-				   struct cachefiles_object, active_node);
-
-		ASSERT(xobject != object);
-
-		if (xobject->dentry > dentry)
-			_p = &(*_p)->rb_left;
-		else if (xobject->dentry < dentry)
-			_p = &(*_p)->rb_right;
-		else
-			goto wait_for_old_object;
-	}
-
-	rb_link_node(&object->active_node, _parent, _p);
-	rb_insert_color(&object->active_node, &cache->active_nodes);
-
-	write_unlock(&cache->active_lock);
-	_leave(" = 0");
-	return 0;
-
-	/* an old object from a previous incarnation is hogging the slot - we
-	 * need to wait for it to be destroyed */
-wait_for_old_object:
-	trace_cachefiles_wait_active(object, dentry, xobject);
-	clear_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags);
-
-	if (fscache_object_is_live(&xobject->fscache)) {
-		pr_err("\n");
-		pr_err("Error: Unexpected object collision\n");
-		cachefiles_printk_object(object, xobject);
-	}
-	atomic_inc(&xobject->usage);
-	write_unlock(&cache->active_lock);
-
-	if (test_bit(CACHEFILES_OBJECT_ACTIVE, &xobject->flags)) {
-		wait_queue_head_t *wq;
-
-		signed long timeout = 60 * HZ;
-		wait_queue_entry_t wait;
-		bool requeue;
-
-		/* if the object we're waiting for is queued for processing,
-		 * then just put ourselves on the queue behind it */
-		if (work_pending(&xobject->fscache.work)) {
-			_debug("queue OBJ%x behind OBJ%x immediately",
-			       object->fscache.debug_id,
-			       xobject->fscache.debug_id);
-			goto requeue;
-		}
-
-		/* otherwise we sleep until either the object we're waiting for
-		 * is done, or the fscache_object is congested */
-		wq = bit_waitqueue(&xobject->flags, CACHEFILES_OBJECT_ACTIVE);
-		init_wait(&wait);
-		requeue = false;
-		do {
-			prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
-			if (!test_bit(CACHEFILES_OBJECT_ACTIVE, &xobject->flags))
-				break;
-
-			requeue = fscache_object_sleep_till_congested(&timeout);
-		} while (timeout > 0 && !requeue);
-		finish_wait(wq, &wait);
-
-		if (requeue &&
-		    test_bit(CACHEFILES_OBJECT_ACTIVE, &xobject->flags)) {
-			_debug("queue OBJ%x behind OBJ%x after wait",
-			       object->fscache.debug_id,
-			       xobject->fscache.debug_id);
-			goto requeue;
-		}
-
-		if (timeout <= 0) {
-			pr_err("\n");
-			pr_err("Error: Overlong wait for old active object to go away\n");
-			cachefiles_printk_object(object, xobject);
-			goto requeue;
-		}
-	}
-
-	ASSERT(!test_bit(CACHEFILES_OBJECT_ACTIVE, &xobject->flags));
-
-	cache->cache.ops->put_object(&xobject->fscache,
-		(enum fscache_obj_ref_trace)cachefiles_obj_put_wait_retry);
-	goto try_again;
+	struct inode *inode = d_backing_inode(dentry);
 
-requeue:
-	cache->cache.ops->put_object(&xobject->fscache,
-		(enum fscache_obj_ref_trace)cachefiles_obj_put_wait_timeo);
-	_leave(" = -ETIMEDOUT");
-	return -ETIMEDOUT;
+	inode_lock(inode);
+	inode->i_flags &= ~S_CACHE_FILE;
+	inode_unlock(inode);
+	trace_cachefiles_mark_inactive(object, dentry, inode);
 }
 
 /*
  * Mark an object as being inactive.
  */
-void cachefiles_mark_object_inactive(struct cachefiles_cache *cache,
-				     struct cachefiles_object *object,
-				     blkcnt_t i_blocks)
+static void cachefiles_mark_object_inactive(struct cachefiles_cache *cache,
+					    struct cachefiles_object *object)
 {
-	struct dentry *dentry = object->dentry;
-	struct inode *inode = d_backing_inode(dentry);
-
-	trace_cachefiles_mark_inactive(object, dentry, inode);
-
-	write_lock(&cache->active_lock);
-	rb_erase(&object->active_node, &cache->active_nodes);
-	clear_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags);
-	write_unlock(&cache->active_lock);
-
-	wake_up_bit(&object->flags, CACHEFILES_OBJECT_ACTIVE);
+	blkcnt_t i_blocks = d_backing_inode(object->dentry)->i_blocks;
 
 	/* This object can now be culled, so we need to let the daemon know
 	 * that there is something it can remove if it needs to.
@@ -286,7 +86,6 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
 				  struct cachefiles_object *object,
 				  struct dentry *dir,
 				  struct dentry *rep,
-				  bool preemptive,
 				  enum fscache_why_object_killed why)
 {
 	struct dentry *grave, *trap;
@@ -308,9 +107,6 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
 		} else {
 			trace_cachefiles_unlink(object, rep, why);
 			ret = vfs_unlink(d_inode(dir), rep, NULL);
-
-			if (preemptive)
-				cachefiles_mark_object_buried(cache, rep, why);
 		}
 
 		inode_unlock(d_inode(dir));
@@ -371,8 +167,7 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
 			return -ENOMEM;
 		}
 
-		cachefiles_io_error(cache, "Lookup error %ld",
-				    PTR_ERR(grave));
+		cachefiles_io_error(cache, "Lookup error %ld", PTR_ERR(grave));
 		return -EIO;
 	}
 
@@ -414,9 +209,6 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
 		if (ret != 0 && ret != -ENOMEM)
 			cachefiles_io_error(cache,
 					    "Rename failed with error %d", ret);
-
-		if (preemptive)
-			cachefiles_mark_object_buried(cache, rep, why);
 	}
 
 	unlock_rename(cache->graveyard, dir);
@@ -444,26 +236,18 @@ int cachefiles_delete_object(struct cachefiles_cache *cache,
 
 	inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
 
-	if (test_bit(FSCACHE_OBJECT_KILLED_BY_CACHE, &object->fscache.flags)) {
-		/* object allocation for the same key preemptively deleted this
-		 * object's file so that it could create its own file */
-		_debug("object preemptively buried");
+	/* We need to check that our parent is _still_ our parent - it may have
+	 * been renamed.
+	 */
+	if (dir == object->dentry->d_parent) {
+		ret = cachefiles_bury_object(cache, object, dir, object->dentry,
+					     FSCACHE_OBJECT_WAS_RETIRED);
+	} else {
+		/* It got moved, presumably by cachefilesd culling it, so it's
+		 * no longer in the key path and we can ignore it.
+		 */
 		inode_unlock(d_inode(dir));
 		ret = 0;
-	} else {
-		/* we need to check that our parent is _still_ our parent - it
-		 * may have been renamed */
-		if (dir == object->dentry->d_parent) {
-			ret = cachefiles_bury_object(cache, object, dir,
-						     object->dentry, false,
-						     FSCACHE_OBJECT_WAS_RETIRED);
-		} else {
-			/* it got moved, presumably by cachefilesd culling it,
-			 * so it's no longer in the key path and we can ignore
-			 * it */
-			inode_unlock(d_inode(dir));
-			ret = 0;
-		}
 	}
 
 	dput(dir);
@@ -485,6 +269,7 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	struct path path;
 	unsigned long start;
 	const char *name;
+	bool marked = false;
 	int ret, nlen;
 
 	_enter("OBJ%x{%pd},OBJ%x,%s,",
@@ -527,6 +312,7 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	cachefiles_hist(cachefiles_lookup_histogram, start);
 	if (IS_ERR(next)) {
 		trace_cachefiles_lookup(object, next, NULL);
+		ret = PTR_ERR(next);
 		goto lookup_error;
 	}
 
@@ -626,6 +412,13 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 	/* we've found the object we were looking for */
 	object->dentry = next;
 
+	/* note that we're now using this object */
+	if (!cachefiles_mark_inode_in_use(object, object->dentry)) {
+		ret = -EBUSY;
+		goto check_error_unlock;
+	}
+	marked = true;
+
 	/* if we've found that the terminal object exists, then we need to
 	 * check its attributes and delete it if it's out of date */
 	if (!object->new) {
@@ -638,13 +431,12 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 			object->dentry = NULL;
 
 			ret = cachefiles_bury_object(cache, object, dir, next,
-						     true,
 						     FSCACHE_OBJECT_IS_STALE);
 			dput(next);
 			next = NULL;
 
 			if (ret < 0)
-				goto delete_error;
+				goto error_out2;
 
 			_debug("redo lookup");
 			fscache_object_retrying_stale(&object->fscache);
@@ -652,16 +444,10 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 		}
 	}
 
-	/* note that we're now using this object */
-	ret = cachefiles_mark_object_active(cache, object);
-
 	inode_unlock(d_inode(dir));
 	dput(dir);
 	dir = NULL;
 
-	if (ret == -ETIMEDOUT)
-		goto mark_active_timed_out;
-
 	_debug("=== OBTAINED_OBJECT ===");
 
 	if (object->new) {
@@ -710,26 +496,19 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 		cachefiles_io_error(cache, "Create/mkdir failed");
 	goto error;
 
-mark_active_timed_out:
-	_debug("mark active timed out");
-	goto release_dentry;
-
+check_error_unlock:
+	inode_unlock(d_inode(dir));
+	dput(dir);
 check_error:
-	_debug("check error %d", ret);
-	cachefiles_mark_object_inactive(
-		cache, object, d_backing_inode(object->dentry)->i_blocks);
-release_dentry:
+	if (marked)
+		cachefiles_unmark_inode_in_use(object, object->dentry);
+	cachefiles_mark_object_inactive(cache, object);
 	dput(object->dentry);
 	object->dentry = NULL;
 	goto error_out;
 
-delete_error:
-	_debug("delete error %d", ret);
-	goto error_out2;
-
 lookup_error:
-	_debug("lookup error %ld", PTR_ERR(next));
-	ret = PTR_ERR(next);
+	_debug("lookup error %d", ret);
 	if (ret == -EIO)
 		cachefiles_io_error(cache, "Lookup failed");
 	next = NULL;
@@ -857,8 +636,6 @@ static struct dentry *cachefiles_check_active(struct cachefiles_cache *cache,
 					      struct dentry *dir,
 					      char *filename)
 {
-	struct cachefiles_object *object;
-	struct rb_node *_n;
 	struct dentry *victim;
 	unsigned long start;
 	int ret;
@@ -888,34 +665,9 @@ static struct dentry *cachefiles_check_active(struct cachefiles_cache *cache,
 		return ERR_PTR(-ENOENT);
 	}
 
-	/* check to see if we're using this object */
-	read_lock(&cache->active_lock);
-
-	_n = cache->active_nodes.rb_node;
-
-	while (_n) {
-		object = rb_entry(_n, struct cachefiles_object, active_node);
-
-		if (object->dentry > victim)
-			_n = _n->rb_left;
-		else if (object->dentry < victim)
-			_n = _n->rb_right;
-		else
-			goto object_in_use;
-	}
-
-	read_unlock(&cache->active_lock);
-
 	//_leave(" = %pd", victim);
 	return victim;
 
-object_in_use:
-	read_unlock(&cache->active_lock);
-	inode_unlock(d_inode(dir));
-	dput(victim);
-	//_leave(" = -EBUSY [in use]");
-	return ERR_PTR(-EBUSY);
-
 lookup_error:
 	inode_unlock(d_inode(dir));
 	ret = PTR_ERR(victim);
@@ -944,6 +696,7 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
 		    char *filename)
 {
 	struct dentry *victim;
+	struct inode *inode;
 	int ret;
 
 	_enter(",%pd/,%s", dir, filename);
@@ -952,6 +705,19 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
 	if (IS_ERR(victim))
 		return PTR_ERR(victim);
 
+	/* check to see if someone is using this object */
+	inode = d_inode(victim);
+	inode_lock(inode);
+	if (inode->i_flags & S_CACHE_FILE) {
+		ret = -EBUSY;
+	} else {
+		inode->i_flags |= S_CACHE_FILE;
+		ret = 0;
+	}
+	inode_unlock(inode);
+	if (ret < 0)
+		goto error_unlock;
+
 	_debug("victim -> %pd %s",
 	       victim, d_backing_inode(victim) ? "positive" : "negative");
 
@@ -967,7 +733,7 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
 	/*  actually remove the victim (drops the dir mutex) */
 	_debug("bury");
 
-	ret = cachefiles_bury_object(cache, NULL, dir, victim, false,
+	ret = cachefiles_bury_object(cache, NULL, dir, victim,
 				     FSCACHE_OBJECT_WAS_CULLED);
 	if (ret < 0)
 		goto error;
@@ -1004,6 +770,7 @@ int cachefiles_check_in_use(struct cachefiles_cache *cache, struct dentry *dir,
 			    char *filename)
 {
 	struct dentry *victim;
+	int ret = 0;
 
 	//_enter(",%pd/,%s",
 	//       dir, filename);
@@ -1013,7 +780,9 @@ int cachefiles_check_in_use(struct cachefiles_cache *cache, struct dentry *dir,
 		return PTR_ERR(victim);
 
 	inode_unlock(d_inode(dir));
+	if (d_inode(victim)->i_flags & S_CACHE_FILE)
+		ret = -EBUSY;
 	dput(victim);
 	//_leave(" = 0");
-	return 0;
+	return ret;
 }
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 9a448fe9355d..c877035c2946 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -237,35 +237,6 @@ TRACE_EVENT(cachefiles_mark_active,
 		      __entry->obj, __entry->de)
 	    );
 
-TRACE_EVENT(cachefiles_wait_active,
-	    TP_PROTO(struct cachefiles_object *obj,
-		     struct dentry *de,
-		     struct cachefiles_object *xobj),
-
-	    TP_ARGS(obj, de, xobj),
-
-	    /* Note that obj may be NULL */
-	    TP_STRUCT__entry(
-		    __field(unsigned int,		obj		)
-		    __field(unsigned int,		xobj		)
-		    __field(struct dentry *,		de		)
-		    __field(u16,			flags		)
-		    __field(u16,			fsc_flags	)
-			     ),
-
-	    TP_fast_assign(
-		    __entry->obj	= obj->fscache.debug_id;
-		    __entry->de		= de;
-		    __entry->xobj	= xobj->fscache.debug_id;
-		    __entry->flags	= xobj->flags;
-		    __entry->fsc_flags	= xobj->fscache.flags;
-			   ),
-
-	    TP_printk("o=%08x d=%p wo=%08x wf=%x wff=%x",
-		      __entry->obj, __entry->de, __entry->xobj,
-		      __entry->flags, __entry->fsc_flags)
-	    );
-
 TRACE_EVENT(cachefiles_mark_inactive,
 	    TP_PROTO(struct cachefiles_object *obj,
 		     struct dentry *de,




* [RFC PATCH 24/76] fscache: Provide a simple thread pool for running ops asynchronously
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (22 preceding siblings ...)
  2020-11-20 15:07 ` [RFC PATCH 23/76] cachefiles: Remove tree of active files and use " David Howells
@ 2020-11-20 15:07 ` David Howells
  2020-11-20 15:08 ` [RFC PATCH 26/76] fscache: Rewrite the I/O API based on iov_iter David Howells
                   ` (50 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:07 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Provide a simple thread pool that can be used to run cookie management
operations in the background, plus a dispatcher that punts an operation to
the pool if a thread is available and otherwise just runs the operation in
the calling thread.

A future patch will replace the object state machine with whole routines
that each do all the work for an object in one go, rather than
interleaving bits of work from various objects.
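
For example, a cookie operation might be punted to the pool like this
(fscache_do_lookup_object is a hypothetical op function, not part of this
patch):

	static void fscache_do_lookup_object(struct fscache_cookie *cookie,
					     struct fscache_object *object,
					     int param)
	{
		/* Do all the work for this object in one go. */
	}

	/* Runs in a pool thread if one is idle, else in this thread: */
	fscache_dispatch(cookie, object, 0, fscache_do_lookup_object);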

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/Makefile            |    1 
 fs/fscache/dispatcher.c        |  145 ++++++++++++++++++++++++++++++++++++++++
 fs/fscache/internal.h          |    8 ++
 fs/fscache/main.c              |    7 ++
 include/trace/events/fscache.h |    6 +-
 5 files changed, 166 insertions(+), 1 deletion(-)
 create mode 100644 fs/fscache/dispatcher.c

diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
index ac3fcd909fff..7b10c6aad157 100644
--- a/fs/fscache/Makefile
+++ b/fs/fscache/Makefile
@@ -6,6 +6,7 @@
 fscache-y := \
 	cache.o \
 	cookie.o \
+	dispatcher.o \
 	fsdef.o \
 	main.o \
 	netfs.o \
diff --git a/fs/fscache/dispatcher.c b/fs/fscache/dispatcher.c
new file mode 100644
index 000000000000..3d957e499da3
--- /dev/null
+++ b/fs/fscache/dispatcher.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Object dispatcher
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#define FSCACHE_DEBUG_LEVEL OPERATION
+#include <linux/kthread.h>
+#include <linux/slab.h>
+#include <linux/wait.h>
+#include <linux/freezer.h>
+#include "internal.h"
+
+#define FSCACHE_DISPATCHER_POOL_SIZE 8
+
+static LIST_HEAD(fscache_pending_work);
+static DEFINE_SPINLOCK(fscache_work_lock);
+static DECLARE_WAIT_QUEUE_HEAD(fscache_dispatcher_pool);
+static struct completion fscache_dispatcher_pool_done[FSCACHE_DISPATCHER_POOL_SIZE];
+static bool fscache_dispatcher_stop;
+
+struct fscache_work {
+	struct list_head	link;
+	struct fscache_cookie	*cookie;
+	struct fscache_object	*object;
+	int			param;
+	void (*func)(struct fscache_cookie *, struct fscache_object *, int);
+};
+
+/*
+ * Attempt to queue some work to do.  If there's too much asynchronous work
+ * already queued, we'll do it here in this thread instead.
+ */
+void fscache_dispatch(struct fscache_cookie *cookie,
+		      struct fscache_object *object,
+		      int param,
+		      void (*func)(struct fscache_cookie *,
+				   struct fscache_object *, int))
+{
+	struct fscache_work *work;
+	bool queued = false;
+
+	work = kzalloc(sizeof(struct fscache_work), GFP_KERNEL);
+	if (work) {
+		work->cookie = cookie;
+		work->object = object;
+		work->param = param;
+		work->func = func;
+
+		spin_lock(&fscache_work_lock);
+		if (waitqueue_active(&fscache_dispatcher_pool) ||
+		    list_empty(&fscache_pending_work)) {
+			fscache_cookie_get(cookie, fscache_cookie_get_work);
+			list_add_tail(&work->link, &fscache_pending_work);
+			wake_up(&fscache_dispatcher_pool);
+			queued = true;
+		}
+		spin_unlock(&fscache_work_lock);
+	}
+
+	if (!queued) {
+		kfree(work);
+		func(cookie, object, param);
+	}
+}
+
+/*
+ * A dispatcher thread.
+ */
+static int fscache_dispatcher(void *data)
+{
+	struct completion *done = data;
+
+	for (;;) {
+		if (!list_empty(&fscache_pending_work)) {
+			struct fscache_work *work = NULL;
+
+			spin_lock(&fscache_work_lock);
+			if (!list_empty(&fscache_pending_work)) {
+				work = list_entry(fscache_pending_work.next,
+						  struct fscache_work, link);
+				list_del_init(&work->link);
+			}
+			spin_unlock(&fscache_work_lock);
+
+			if (work) {
+				work->func(work->cookie, work->object, work->param);
+				fscache_cookie_put(work->cookie, fscache_cookie_put_work);
+				kfree(work);
+			}
+			continue;
+		} else if (fscache_dispatcher_stop) {
+			break;
+		}
+
+		wait_event_freezable(fscache_dispatcher_pool,
+				     (fscache_dispatcher_stop ||
+				      !list_empty(&fscache_pending_work)));
+	}
+
+	complete_and_exit(done, 0);
+}
+
+/*
+ * Start up the dispatcher threads.
+ */
+int fscache_init_dispatchers(void)
+{
+	struct task_struct *t;
+	int i;
+
+	for (i = 0; i < FSCACHE_DISPATCHER_POOL_SIZE; i++) {
+		init_completion(&fscache_dispatcher_pool_done[i]);
+		t = kthread_create(fscache_dispatcher,
+				   &fscache_dispatcher_pool_done[i],
+				   "kfsc/%d", i);
+		if (IS_ERR(t))
+			goto failed;
+		wake_up_process(t);
+	}
+
+	return 0;
+
+failed:
+	fscache_dispatcher_stop = true;
+	wake_up_all(&fscache_dispatcher_pool);
+	for (i--; i >= 0; i--)
+		wait_for_completion(&fscache_dispatcher_pool_done[i]);
+	return PTR_ERR(t);
+}
+
+/*
+ * Kill off the dispatcher threads.
+ */
+void fscache_kill_dispatchers(void)
+{
+	int i;
+
+	fscache_dispatcher_stop = true;
+	wake_up_all(&fscache_dispatcher_pool);
+
+	for (i = 0; i < FSCACHE_DISPATCHER_POOL_SIZE; i++)
+		wait_for_completion(&fscache_dispatcher_pool_done[i]);
+}
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index bc5539d2157b..2100e2222884 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -75,6 +75,14 @@ extern struct fscache_cookie *fscache_hash_cookie(struct fscache_cookie *);
 extern void fscache_cookie_put(struct fscache_cookie *,
 			       enum fscache_cookie_trace);
 
+/*
+ * dispatcher.c
+ */
+extern void fscache_dispatch(struct fscache_cookie *, struct fscache_object *, int,
+			     void (*func)(struct fscache_cookie *, struct fscache_object *, int));
+extern int fscache_init_dispatchers(void);
+extern void fscache_kill_dispatchers(void);
+
 /*
  * fsdef.c
  */
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index c1e6cc9091aa..c8f1beafa8e1 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -125,6 +125,10 @@ static int __init fscache_init(void)
 	for_each_possible_cpu(cpu)
 		init_waitqueue_head(&per_cpu(fscache_object_cong_wait, cpu));
 
+	ret = fscache_init_dispatchers();
+	if (ret < 0)
+		goto error_dispatchers;
+
 	ret = fscache_proc_init();
 	if (ret < 0)
 		goto error_proc;
@@ -159,6 +163,8 @@ static int __init fscache_init(void)
 	unregister_sysctl_table(fscache_sysctl_header);
 error_sysctl:
 #endif
+	fscache_kill_dispatchers();
+error_dispatchers:
 	fscache_proc_cleanup();
 error_proc:
 	destroy_workqueue(fscache_op_wq);
@@ -183,6 +189,7 @@ static void __exit fscache_exit(void)
 	unregister_sysctl_table(fscache_sysctl_header);
 #endif
 	fscache_proc_cleanup();
+	fscache_kill_dispatchers();
 	destroy_workqueue(fscache_op_wq);
 	destroy_workqueue(fscache_object_wq);
 	pr_notice("Unloaded\n");
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 08d7de72409d..fb3fdf2921ee 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -26,11 +26,13 @@ enum fscache_cookie_trace {
 	fscache_cookie_get_attach_object,
 	fscache_cookie_get_reacquire,
 	fscache_cookie_get_register_netfs,
+	fscache_cookie_get_work,
 	fscache_cookie_put_acquire_nobufs,
 	fscache_cookie_put_dup_netfs,
 	fscache_cookie_put_relinquish,
 	fscache_cookie_put_object,
 	fscache_cookie_put_parent,
+	fscache_cookie_put_work,
 };
 
 #endif
@@ -45,11 +47,13 @@ enum fscache_cookie_trace {
 	EM(fscache_cookie_get_attach_object,	"GET obj")		\
 	EM(fscache_cookie_get_reacquire,	"GET raq")		\
 	EM(fscache_cookie_get_register_netfs,	"GET net")		\
+	EM(fscache_cookie_get_work,		"GET wrk")		\
 	EM(fscache_cookie_put_acquire_nobufs,	"PUT nbf")		\
 	EM(fscache_cookie_put_dup_netfs,	"PUT dnt")		\
 	EM(fscache_cookie_put_relinquish,	"PUT rlq")		\
 	EM(fscache_cookie_put_object,		"PUT obj")		\
-	E_(fscache_cookie_put_parent,		"PUT prn")
+	EM(fscache_cookie_put_parent,		"PUT prn")		\
+	E_(fscache_cookie_put_work,		"PUT wrk")
 
 /*
  * Export enum symbols via userspace.




* [RFC PATCH 26/76] fscache: Rewrite the I/O API based on iov_iter
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (23 preceding siblings ...)
  2020-11-20 15:07 ` [RFC PATCH 24/76] fscache: Provide a simple thread pool for running ops asynchronously David Howells
@ 2020-11-20 15:08 ` David Howells
  2020-11-20 15:08 ` [RFC PATCH 27/76] fscache: Keep track of size of a file last set independently on the server David Howells
                   ` (49 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:08 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Rewrite the fscache I/O API by introducing a number of new routines based
on a number of principles:

 (1) The cache provides *only* write-to-cache and read-from-cache calls for
     transferring data to/from the cache.

 (2) The buffers for I/O to/from the cache are supplied with an iov_iter.
     There is no requirement that the iov_iters involved have anything to
     do with an inode's pagecache, though if they do, an ITER_XARRAY
     iterator is available.

 (3) I/O to/from any particular cache object is done in one of a number of
     modes, set for the cache object at cookie acquisition time:

	(A) Single blob.  The blob must be written in its entirety in one
	    go.

	(B) Granular.  Writes to the cache should be done in granule
	    sized-blocks, where, for the moment, a granule will be 256KiB,
	    but could be variable.  This allows the metadata indicating
	    which granules are present to be smaller at the cost of using
	    more disk space.

     In both cases, reads from the cache may be done in smaller chunks and
     small update writes may be done inside a block that exists.

 (4) I/O to/from the cache must be aligned to the DIO block size of the
     backing filesystem.  The cache tells the caller what it should
     consider the DIO block size to be.  This will never be larger than
     page size.

 (5) Completion of the I/O results in a callback - after which the cache
     no longer knows about it.

 (6) The cache doesn't retain any pointers back into the netfs, either the
     code, its state or its pagecache.


To do granular I/O, the netfs has to take the read or write request it got
from the VFS/VM and 'shape' it to fit the caching parameters.  It does this
by filling in a form to indicate the extent of the operation it might like
to make:

	struct fscache_request_shape {
		/* Parameters */
		loff_t		i_size;
		pgoff_t		proposed_start;
		unsigned int	proposed_nr_pages;
		unsigned int	max_io_pages;
		bool		for_write;

		/* Result */
		unsigned int	to_be_done;
		unsigned int	granularity;
		unsigned int	dio_block_size;
		unsigned int	actual_nr_pages;
		pgoff_t		actual_start;
	};

and then it calls:

	void fscache_shape_request(struct fscache_cookie *cookie,
				   struct fscache_request_shape *shape);

to shape it.

The netfs should set 'proposed_start' to be the first page to read,
'proposed_nr_pages' to indicate the size of the request and 'i_size' to
indicate the size that the file should be considered to be.  'max_io_pages'
should be set to the maximum size of a transaction, up to UINT_MAX, and
'for_write' should be set to true if this is for a write to the cache.

The cache will then shape the proposed read to fit a blocking factor
appropriate for the cache object and region of the file.  It may extend
start forward and may shrink or extend the request to fit the granularity
of the cache.  This will be trimmed to the end of file as specified by the
proposed file size.

Upon return, 'to_be_done' will be set to one of FSCACHE_READ_FROM_SERVER,
FSCACHE_READ_FROM_CACHE, FSCACHE_FILL_WITH_ZERO, and may have
FSCACHE_WRITE_TO_CACHE bitwise-OR'd onto it.

'actual_start' and 'actual_nr_pages' will be set to indicate
the cache's proposal for the desired size and position of the operation.
'granularity' will be set to hold the cache block granularity (in pages);
the transaction can be shortened to a multiple of this.  Note that the
shaped request will always include the proposed_start page.

'dio_block_size' will be set to whatever I/O size the cache must
communicate with its storage in.  This is necessary to set up the iov_iter
to be passed to the cache for reading and writing so that it can do direct
I/O.
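
Putting that together, a netfs read path might shape a request like this
(a sketch only; index, nr_pages, inode and cookie stand for the netfs's
own state):

	struct fscache_request_shape shape = {
		.proposed_start		= index,
		.proposed_nr_pages	= nr_pages,
		.max_io_pages		= UINT_MAX,
		.i_size			= i_size_read(inode),
		.for_write		= false,
	};

	fscache_shape_request(cookie, &shape);

	if (shape.to_be_done & FSCACHE_READ_FROM_CACHE) {
		/* Read shape.actual_nr_pages pages starting at page
		 * shape.actual_start from the cache, with the iov_iter
		 * aligned to shape.dio_block_size.
		 */
	}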

Once the netfs has set up its request, if FSCACHE_READ_FROM_CACHE was set,
it should then call:

	void fscache_read(struct fscache_io_request *req)

to read data from the cache.  To do this, it needs to fill out a request
descriptor:

	struct fscache_io_request {
		const struct fscache_io_request_ops *ops;
		struct fscache_cookie	*cookie;
		loff_t			pos;
		loff_t			len;
		int			error;
		bool (*is_still_valid)(struct fscache_io_request *);
		void (*done)(struct fscache_io_request *);
		...
	};

The ops pointer, cookie, position and length should be set to describe the
I/O operation to be performed.  An 'is_still_valid' function may be
provided to check whether the operation should still go ahead after a
wait, in case the object was invalidated by the server in the meantime.

A 'done' function may be provided that will be called to finalise the
operation.  If provided, the 'done' function will always be called, even
when the operation doesn't take place because there's no cache.  If no
done function is provided, the operation will be synchronous.

Note that the pages must be pinned - typically by locking them.
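
Putting that together, a read might be issued something like this (a
sketch only: the my_* names are invented, and a real netfs would likely
allocate the request and release it from the 'done' handler rather than
putting it on the stack):

	static void my_read_done(struct fscache_io_request *req)
	{
		/* req->error is 0 or a negative error code; unlock or
		 * release the pinned pages here. */
	}

	static bool my_read_still_valid(struct fscache_io_request *req)
	{
		return true;	/* e.g. compare data-version numbers */
	}

	/* ... and then in the read path: */
	struct fscache_io_request req = {
		.ops		= &my_io_req_ops,   /* netfs-defined table */
		.cookie		= cookie,
		.pos		= start,   /* aligned per the shaping step */
		.len		= len,
		.is_still_valid	= my_read_still_valid,
		.done		= my_read_done,
	};

	fscache_read(&req);	/* asynchronous, since 'done' is set */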

If FSCACHE_WRITE_TO_CACHE was set, then once the data is read from the
server, the netfs should write it to the cache by calling:

	void fscache_write(struct fscache_io_request *req)

The request descriptor is set as for fscache_read().  Note that the pages
must be pinned.  In this case, PG_fscache can be set on the page and the
pages can be unlocked; the bit can then be cleared by the done handler.
The releasepage, invalidatepage, launderpage and page_mkwrite functions
should be used to suspend progress until the bit is cleared.  The following
functions are made available in an earlier patch for this:

	void unlock_page_fscache(struct page *page);
	void wait_on_page_fscache(struct page *page);
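
For instance, a releasepage implementation might use the latter like so (a
sketch; the gfp policy shown is just one plausible choice):

	static int my_releasepage(struct page *page, gfp_t gfp)
	{
		/* The page can't be recycled while a write to the cache
		 * is in progress. */
		if (PageFsCache(page)) {
			if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS))
				return 0;	/* Can't wait in this context */
			wait_on_page_fscache(page);
		}
		return 1;	/* Plus any other private-state cleanup */
	}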

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/Makefile               |    1 
 fs/fscache/cookie.c               |    4 -
 fs/fscache/internal.h             |    1 
 fs/fscache/io.c                   |  144 +++++++++++++++++++++++++++++
 fs/fscache/obj.c                  |    2 
 include/linux/fscache-cache.h     |   20 ++++
 include/linux/fscache.h           |  186 +++++++++++++++++++++++++++++++++++++
 include/trace/events/cachefiles.h |    2 
 include/trace/events/fscache.h    |    2 
 9 files changed, 357 insertions(+), 5 deletions(-)
 create mode 100644 fs/fscache/io.c

diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
index 396e1b5fdc28..3caf66810e7b 100644
--- a/fs/fscache/Makefile
+++ b/fs/fscache/Makefile
@@ -8,6 +8,7 @@ fscache-y := \
 	cookie.o \
 	dispatcher.o \
 	fsdef.o \
+	io.o \
 	main.o \
 	netfs.o \
 	obj.o \
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 3e79177a7b7b..e6b87596b1d0 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -20,7 +20,7 @@ static struct hlist_bl_head fscache_cookie_hash[1 << fscache_cookie_hash_shift];
 static LIST_HEAD(fscache_cookies);
 static DEFINE_RWLOCK(fscache_cookies_lock);
 
-static void fscache_print_cookie(struct fscache_cookie *cookie, char prefix)
+void fscache_print_cookie(struct fscache_cookie *cookie, char prefix)
 {
 	struct fscache_object *object;
 	struct hlist_node *o;
@@ -569,7 +569,7 @@ void __fscache_invalidate(struct fscache_cookie *cookie)
 	case FSCACHE_COOKIE_STAGE_ACTIVE:
 		cookie->stage = FSCACHE_COOKIE_STAGE_INVALIDATING;
 
-		atomic_inc(&cookie->n_ops);
+		fscache_count_io_operation(cookie);
 		object->cache->ops->grab_object(object, fscache_obj_get_inval);
 		spin_unlock(&cookie->lock);
 		wake_up_cookie_stage(cookie);
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 67ca437c1f73..6c2a6ebe4f02 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -59,6 +59,7 @@ static inline void fscache_put_cache_tag(struct fscache_cache_tag *tag)
 /*
  * cookie.c
  */
+extern void fscache_print_cookie(struct fscache_cookie *cookie, char prefix);
 extern struct kmem_cache *fscache_cookie_jar;
 extern const struct seq_operations fscache_cookies_seq_ops;
 
diff --git a/fs/fscache/io.c b/fs/fscache/io.c
new file mode 100644
index 000000000000..7ad13900b281
--- /dev/null
+++ b/fs/fscache/io.c
@@ -0,0 +1,144 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Data I/O routines
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#define FSCACHE_DEBUG_LEVEL OPERATION
+#include <linux/module.h>
+#include <linux/fscache-cache.h>
+#include <linux/slab.h>
+#include "internal.h"
+
+/*
+ * Wait for a cookie to reach the specified stage.
+ */
+void __fscache_wait_for_operation(struct fscache_op_resources *opr,
+				  enum fscache_want_stage want_stage)
+{
+	struct fscache_cookie *cookie = opr->object->cookie;
+	enum fscache_cookie_stage stage;
+
+again:
+	stage = READ_ONCE(cookie->stage);
+	_enter("c=%08x{%u},%x", cookie->debug_id, stage, want_stage);
+
+	if (fscache_cache_is_broken(opr->object)) {
+		_leave(" [broken]");
+		return;
+	}
+
+	switch (stage) {
+	case FSCACHE_COOKIE_STAGE_INITIALISING:
+	case FSCACHE_COOKIE_STAGE_LOOKING_UP:
+	case FSCACHE_COOKIE_STAGE_INVALIDATING:
+		wait_var_event(&cookie->stage, READ_ONCE(cookie->stage) != stage);
+		goto again;
+
+	case FSCACHE_COOKIE_STAGE_NO_DATA_YET:
+	case FSCACHE_COOKIE_STAGE_ACTIVE:
+		return;
+	case FSCACHE_COOKIE_STAGE_INDEX:
+	case FSCACHE_COOKIE_STAGE_DROPPED:
+	case FSCACHE_COOKIE_STAGE_RELINQUISHING:
+	default:
+		_leave(" [not live]");
+		return;
+	}
+}
+EXPORT_SYMBOL(__fscache_wait_for_operation);
+
+/*
+ * Release the resources needed by an operation.
+ */
+void __fscache_end_operation(struct fscache_op_resources *opr)
+{
+	struct fscache_object *object = opr->object;
+
+	fscache_uncount_io_operation(object->cookie);
+	object->cache->ops->put_object(object, fscache_obj_put_ioreq);
+}
+EXPORT_SYMBOL(__fscache_end_operation);
+
+/*
+ * Begin an I/O operation on the cache, waiting till we reach the right state.
+ *
+ * Attaches the resources required to the operation resources record.
+ */
+int __fscache_begin_operation(struct fscache_cookie *cookie,
+			      struct fscache_op_resources *opr,
+			      enum fscache_want_stage want_stage)
+{
+	struct fscache_object *object;
+	enum fscache_cookie_stage stage;
+	long timeo;
+	bool once_only = false;
+
+again:
+	spin_lock(&cookie->lock);
+
+	stage = cookie->stage;
+	_enter("c=%08x{%u},%x", cookie->debug_id, stage, want_stage);
+
+	switch (stage) {
+	case FSCACHE_COOKIE_STAGE_INITIALISING:
+	case FSCACHE_COOKIE_STAGE_LOOKING_UP:
+	case FSCACHE_COOKIE_STAGE_INVALIDATING:
+		goto wait_and_validate;
+
+	case FSCACHE_COOKIE_STAGE_NO_DATA_YET:
+		if (want_stage == FSCACHE_WANT_READ)
+			goto no_data_yet;
+		fallthrough;
+	case FSCACHE_COOKIE_STAGE_ACTIVE:
+		goto ready;
+	case FSCACHE_COOKIE_STAGE_INDEX:
+	case FSCACHE_COOKIE_STAGE_DROPPED:
+	case FSCACHE_COOKIE_STAGE_RELINQUISHING:
+		WARN(1, "Can't use cookie in stage %u\n", cookie->stage);
+		goto not_live;
+	default:
+		goto not_live;
+	}
+
+ready:
+	object = hlist_entry(cookie->backing_objects.first,
+			     struct fscache_object, cookie_link);
+
+	if (fscache_cache_is_broken(object))
+		goto not_live;
+
+	opr->object = object;
+	object->cache->ops->grab_object(object, fscache_obj_get_ioreq);
+	object->cache->ops->begin_operation(opr);
+
+	fscache_count_io_operation(cookie);
+	spin_unlock(&cookie->lock);
+	return 0;
+
+wait_and_validate:
+	spin_unlock(&cookie->lock);
+	timeo = wait_var_event_timeout(&cookie->stage,
+				       READ_ONCE(cookie->stage) != stage, 20 * HZ);
+	if (timeo <= 1 && !once_only) {
+		pr_warn("%s: cookie stage change wait timed out: cookie->stage=%u stage=%u",
+			__func__, READ_ONCE(cookie->stage), stage);
+		fscache_print_cookie(cookie, 'O');
+		once_only = true;
+	}
+	goto again;
+
+no_data_yet:
+	spin_unlock(&cookie->lock);
+	opr->object = NULL;
+	_leave(" = -ENODATA");
+	return -ENODATA;
+
+not_live:
+	spin_unlock(&cookie->lock);
+	opr->object = NULL;
+	_leave(" = -ENOBUFS");
+	return -ENOBUFS;
+}
+EXPORT_SYMBOL(__fscache_begin_operation);
diff --git a/fs/fscache/obj.c b/fs/fscache/obj.c
index d4a3b6fac791..7286d3e2eb31 100644
--- a/fs/fscache/obj.c
+++ b/fs/fscache/obj.c
@@ -274,7 +274,7 @@ void fscache_invalidate_object(struct fscache_cookie *cookie,
 		fscache_set_cookie_stage(cookie, FSCACHE_COOKIE_STAGE_NO_DATA_YET);
 	else
 		fscache_set_cookie_stage(cookie, FSCACHE_COOKIE_STAGE_FAILED);
-	fscache_end_io_operation(cookie);
+	fscache_uncount_io_operation(cookie);
 }
 
 /*
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 19770160dbb9..97415d19aa82 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -27,6 +27,7 @@ enum fscache_obj_ref_trace {
 	fscache_obj_get_attach,
 	fscache_obj_get_exists,
 	fscache_obj_get_inval,
+	fscache_obj_get_ioreq,
 	fscache_obj_get_wait,
 	fscache_obj_get_withdraw,
 	fscache_obj_new,
@@ -37,6 +38,7 @@ enum fscache_obj_ref_trace {
 	fscache_obj_put_drop_child,
 	fscache_obj_put_drop_obj,
 	fscache_obj_put_inval,
+	fscache_obj_put_ioreq,
 	fscache_obj_put_lookup_fail,
 	fscache_obj_put_withdraw,
 	fscache_obj_ref__nr_traces
@@ -134,6 +136,9 @@ struct fscache_cache_ops {
 
 	/* reserve space for an object's data and associated metadata */
 	int (*reserve_space)(struct fscache_object *object, loff_t i_size);
+
+	/* Begin an operation on a cache object */
+	void (*begin_operation)(struct fscache_op_resources *opr);
 };
 
 extern struct fscache_cookie fscache_fsdef_index;
@@ -231,12 +236,23 @@ static inline void *fscache_get_aux(struct fscache_cookie *cookie)
 }
 
 /*
- * Complete an I/O operation
+ * Count the start of an I/O operation
  */
-static inline void fscache_end_io_operation(struct fscache_cookie *cookie)
+static inline void fscache_count_io_operation(struct fscache_cookie *cookie)
+{
+	atomic_inc(&cookie->n_ops);
+}
+
+/*
+ * Count the end of an I/O operation
+ */
+static inline void fscache_uncount_io_operation(struct fscache_cookie *cookie)
 {
 	if (atomic_dec_and_test(&cookie->n_ops))
 		wake_up_var(&cookie->n_ops);
 }
 
+extern void __fscache_wait_for_operation(struct fscache_op_resources *, enum fscache_want_stage);
+extern void __fscache_end_operation(struct fscache_op_resources *);
+
 #endif /* _LINUX_FSCACHE_CACHE_H */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 9d5e0d7ba860..3cd18ddc3903 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -21,11 +21,15 @@
 #include <linux/list_bl.h>
 
 #if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+#define __fscache_available (1)
 #define fscache_available() (1)
 #define fscache_cookie_valid(cookie) (cookie)
+#define fscache_object_valid(object) (object)
 #else
+#define __fscache_available (0)
 #define fscache_available() (0)
 #define fscache_cookie_valid(cookie) (0)
+#define fscache_object_valid(object) (NULL)
 #endif
 
 
@@ -45,6 +49,10 @@
 struct fscache_cache_tag;
 struct fscache_cookie;
 struct fscache_netfs;
+struct fscache_op_resources;
+struct fscache_op_ops;
+
+typedef void (*fscache_io_terminated_t)(void *priv, ssize_t transferred_or_error);
 
 enum fscache_cookie_type {
 	FSCACHE_COOKIE_TYPE_INDEX,
@@ -55,6 +63,12 @@ enum fscache_cookie_type {
 #define FSCACHE_ADV_WRITE_CACHE		0x00 /* Do cache if written to locally */
 #define FSCACHE_ADV_WRITE_NOCACHE	0x02 /* Don't cache if written to locally */
 
+enum fscache_want_stage {
+	FSCACHE_WANT_PARAMS,
+	FSCACHE_WANT_WRITE,
+	FSCACHE_WANT_READ,
+};
+
 /*
  * fscache cached network filesystem type
  * - name, version and ops must be filled in before registration
@@ -124,6 +138,42 @@ struct fscache_cookie {
 	};
 };
 
+/*
+ * Resources required to do operations.
+ */
+struct fscache_op_resources {
+#if __fscache_available
+	const struct fscache_op_ops	*ops;
+	struct fscache_object		*object;
+#endif
+};
+
+/*
+ * Table of operations for doing things to a non-index cookie.
+ */
+struct fscache_op_ops {
+	/* Wait for an operation to complete */
+	void (*wait_for_operation)(struct fscache_op_resources *opr,
+				   enum fscache_want_stage want_stage);
+	/* End an operation */
+	void (*end_operation)(struct fscache_op_resources *opr);
+
+	/* Read data from the cache */
+	int (*read)(struct fscache_op_resources *opr,
+		    loff_t start_pos,
+		    struct iov_iter *iter,
+		    bool seek_data,
+		    fscache_io_terminated_t term_func,
+		    void *term_func_priv);
+
+	/* Write data to the cache */
+	int (*write)(struct fscache_op_resources *opr,
+		     loff_t start_pos,
+		     struct iov_iter *iter,
+		     fscache_io_terminated_t term_func,
+		     void *term_func_priv);
+};
+
 /*
  * slow-path functions for when there is actually caching available, and the
  * netfs does actually have a valid token
@@ -147,6 +197,8 @@ extern struct fscache_cookie *__fscache_acquire_cookie(
 	loff_t);
 extern void __fscache_use_cookie(struct fscache_cookie *, bool);
 extern void __fscache_unuse_cookie(struct fscache_cookie *, const void *, const loff_t *);
+extern int __fscache_begin_operation(struct fscache_cookie *, struct fscache_op_resources *,
+				     enum fscache_want_stage);
 extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
 extern void __fscache_update_cookie(struct fscache_cookie *, const void *, const loff_t *);
 extern void __fscache_invalidate(struct fscache_cookie *);
@@ -392,4 +444,138 @@ void fscache_invalidate(struct fscache_cookie *cookie)
 		__fscache_invalidate(cookie);
 }
 
+/**
+ * fscache_begin_operation - Begin an fscache I/O operation
+ * @cookie: The cookie representing the cache object
+ * @opr: Where to stash the resources
+ * @want_stage: The minimum stage the object must be at
+ *
+ * Prepare to do an operation against a cache object, represented by @cookie.
+ * Any resources pinned to make the operation possible will be cached in *@opr.
+ * The stage that the object must be at before the operation can take place
+ * should be specified in @want_stage.  The function will wait until this is
+ * the case, or the object fails.
+ *
+ * Returns 0 on success, -ENODATA if reading is desired, but there's no data
+ * available yet and -ENOBUFS if the cache object is unavailable.
+ */
+static inline
+int fscache_begin_operation(struct fscache_cookie *cookie,
+			    struct fscache_op_resources *opr,
+			    enum fscache_want_stage want_stage)
+{
+	if (fscache_cookie_valid(cookie))
+		return __fscache_begin_operation(cookie, opr, want_stage);
+	return -ENOBUFS;
+}
+
+/**
+ * fscache_operation_valid - Return true if operations resources are usable
+ * @opr: The resources to check.
+ *
+ * Returns a pointer to the operations table if usable or NULL if not.
+ */
+static inline
+const struct fscache_op_ops *fscache_operation_valid(const struct fscache_op_resources *opr)
+{
+#if __fscache_available
+	return fscache_object_valid(opr->object) ? opr->ops : NULL;
+#else
+	return NULL;
+#endif
+}
+
+/**
+ * fscache_wait_for_operation - Wait for an object become accessible
+ * @cookie: The cookie representing the cache object
+ * @want_stage: The minimum stage the object must be at
+ *
+ * See if the target cache object is at the specified minimum stage of
+ * accessibility yet, and if not, wait for it.
+ */
+static inline
+void fscache_wait_for_operation(struct fscache_op_resources *opr,
+				enum fscache_want_stage want_stage)
+{
+	const struct fscache_op_ops *ops = fscache_operation_valid(opr);
+	if (ops)
+		ops->wait_for_operation(opr, want_stage);
+}
+
+/**
+ * fscache_end_operation - End an fscache I/O operation.
+ * @opr: The resources to dispose of.
+ */
+static inline
+void fscache_end_operation(struct fscache_op_resources *opr)
+{
+	const struct fscache_op_ops *ops = fscache_operation_valid(opr);
+	if (ops)
+		ops->end_operation(opr);
+}
+
+/**
+ * fscache_read - Start a read from the cache.
+ * @opr: The cache resources to use
+ * @start_pos: The beginning file offset in the cache file
+ * @iter: The buffer to fill - and also the length
+ * @seek_data: True to seek for the data
+ * @term_func: The function to call upon completion
+ * @term_func_priv: The private data for @term_func
+ *
+ * Start a read from the cache.  @opr indicates the cache object to read from
+ * and must be obtained by a call to fscache_begin_operation() beforehand.
+ *
+ * The data is read into the iterator, @iter, and that also indicates the size
+ * of the operation.  @start_pos is the start position in the file, though if
+ * @seek_data is set, the cache will use SEEK_DATA to find the next piece of
+ * data, writing zeros for the hole into the iterator.
+ *
+ * Upon termination of the operation, @term_func will be called and supplied
+ * with @term_func_priv plus the amount of data written, if successful, or the
+ * error code otherwise.
+ */
+static inline
+int fscache_read(struct fscache_op_resources *opr,
+		 loff_t start_pos,
+		 struct iov_iter *iter,
+		 bool seek_data,
+		 fscache_io_terminated_t term_func,
+		 void *term_func_priv)
+{
+	const struct fscache_op_ops *ops = fscache_operation_valid(opr);
+	return ops->read(opr, start_pos, iter, seek_data,
+			 term_func, term_func_priv);
+}
+
+/**
+ * fscache_write - Start a write to the cache.
+ * @opr: The cache resources to use
+ * @start_pos: The beginning file offset in the cache file
+ * @iter: The data to write - and also the length
+ * @term_func: The function to call upon completion
+ * @term_func_priv: The private data for @term_func
+ *
+ * Start a write to the cache.  @opr indicates the cache object to write to and
+ * must be obtained by a call to fscache_begin_operation() beforehand.
+ *
+ * The data to be written is obtained from the iterator, @iter, and that also
+ * indicates the size of the operation.  @start_pos is the start position in
+ * the file.
+ *
+ * Upon termination of the operation, @term_func will be called and supplied
+ * with @term_func_priv plus the amount of data written, if successful, or the
+ * error code otherwise.
+ */
+static inline
+int fscache_write(struct fscache_op_resources *opr,
+		 loff_t start_pos,
+		 struct iov_iter *iter,
+		 fscache_io_terminated_t term_func,
+		 void *term_func_priv)
+{
+	const struct fscache_op_ops *ops = fscache_operation_valid(opr);
+	return ops->write(opr, start_pos, iter, term_func, term_func_priv);
+}
+
 #endif /* _LINUX_FSCACHE_H */
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 4fedc2e9c428..0aa3f3126f6e 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -39,6 +39,7 @@ enum cachefiles_obj_ref_trace {
 	EM(fscache_obj_get_attach,		"GET attach")		\
 	EM(fscache_obj_get_exists,		"GET exists")		\
 	EM(fscache_obj_get_inval,		"GET inval")		\
+	EM(fscache_obj_get_ioreq,		"GET ioreq")		\
 	EM(fscache_obj_get_wait,		"GET wait")		\
 	EM(fscache_obj_get_withdraw,		"GET withdraw")		\
 	EM(fscache_obj_new,			"NEW obj")		\
@@ -49,6 +50,7 @@ enum cachefiles_obj_ref_trace {
 	EM(fscache_obj_put_drop_child,		"PUT drop_child")	\
 	EM(fscache_obj_put_drop_obj,		"PUT drop_obj")		\
 	EM(fscache_obj_put_inval,		"PUT inval")		\
+	EM(fscache_obj_put_ioreq,		"PUT ioreq")		\
 	EM(fscache_obj_put_withdraw,		"PUT withdraw")		\
 	EM(fscache_obj_put_lookup_fail,		"PUT lookup_fail")	\
 	EM(cachefiles_obj_put_wait_retry,	"PUT wait_retry")	\
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 0794900b7ab9..2edf74c40e83 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -51,12 +51,14 @@ enum fscache_cookie_trace {
 	EM(fscache_cookie_get_acquire_parent,	"GET paren")		\
 	EM(fscache_cookie_get_attach_object,	"GET attch")		\
 	EM(fscache_cookie_get_hash_collision,	"GET hcoll")		\
+	EM(fscache_cookie_get_ioreq,		"GET ioreq")		\
 	EM(fscache_cookie_get_register_netfs,	"GET rgstr")		\
 	EM(fscache_cookie_get_work,		"GET work ")		\
 	EM(fscache_cookie_new_acquire,		"NEW acq  ")		\
 	EM(fscache_cookie_new_netfs,		"NEW netfs")		\
 	EM(fscache_cookie_put_dup_netfs,	"PUT dupnf")		\
 	EM(fscache_cookie_put_hash_collision,	"PUT hcoll")		\
+	EM(fscache_cookie_put_ioreq,		"PUT ioreq")		\
 	EM(fscache_cookie_put_object,		"PUT obj  ")		\
 	EM(fscache_cookie_put_parent,		"PUT paren")		\
 	EM(fscache_cookie_put_relinquish,	"PUT relnq")		\



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 27/76] fscache: Keep track of size of a file last set independently on the server
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (24 preceding siblings ...)
  2020-11-20 15:08 ` [RFC PATCH 26/76] fscache: Rewrite the I/O API based on iov_iter David Howells
@ 2020-11-20 15:08 ` David Howells
  2020-11-20 15:08 ` [RFC PATCH 28/76] fscache, cachefiles: Fix disabled histogram warnings David Howells
                   ` (48 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:08 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Keep track of the size of a file that we're caching as it was last set
independently on the server by another client.  As long as this does not
change, we can assume that anything beyond that boundary which is not
represented in the local cache will not be represented on the server
either, and can simply be cleared rather than read, thereby saving a trip
to the server.

This only works if we make space in the cache by zapping whole files and
not just punching bits out of them: if we write to the server but don't
keep a copy in the cache, the assumption above no longer holds true.

We also need to update this size when invalidation occurs.
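
As a sketch of the idea (the surrounding read path and the helper name are
illustrative only):

	/* Anything at or beyond the zero point cannot exist on the
	 * server, so clear the buffer locally instead of issuing a
	 * read. */
	if (start >= cookie->zero_point) {
		iov_iter_zero(len, iter);
		ret = 0;
	} else {
		ret = my_read_from_cache_or_server(start, len, iter);
	}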

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/inode.c          |    2 +-
 fs/fscache/cookie.c     |    6 +++++-
 include/linux/fscache.h |    8 +++++---
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index df999d703e2b..0ba2815f8152 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -569,7 +569,7 @@ static void afs_zap_data(struct afs_vnode *vnode)
 	_enter("{%llx:%llu}", vnode->fid.vid, vnode->fid.vnode);
 
 #ifdef CONFIG_AFS_FSCACHE
-	fscache_invalidate(vnode->cache);
+	fscache_invalidate(vnode->cache, i_size_read(&vnode->vfs_inode));
 #endif
 
 	/* nuke all the non-dirty pages that aren't locked, mapped or being
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index e6b87596b1d0..230025a1495c 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -159,6 +159,7 @@ struct fscache_cookie *fscache_alloc_cookie(
 	cookie->key_len = index_key_len;
 	cookie->aux_len = aux_data_len;
 	cookie->object_size = object_size;
+	cookie->zero_point = object_size;
 	strlcpy(cookie->type_name, type_name, sizeof(cookie->type_name));
 
 	if (fscache_set_key(cookie, index_key, index_key_len) < 0)
@@ -532,7 +533,7 @@ void fscache_set_cookie_stage(struct fscache_cookie *cookie,
 /*
  * Invalidate an object.  Callable with spinlocks held.
  */
-void __fscache_invalidate(struct fscache_cookie *cookie)
+void __fscache_invalidate(struct fscache_cookie *cookie, loff_t new_size)
 {
 	struct fscache_object *object = NULL;
 
@@ -548,6 +549,9 @@ void __fscache_invalidate(struct fscache_cookie *cookie)
 	ASSERTCMP(cookie->type, ==, FSCACHE_COOKIE_TYPE_DATAFILE);
 
 	spin_lock(&cookie->lock);
+	cookie->object_size = new_size;
+	cookie->zero_point = new_size;
+
 	if (!hlist_empty(&cookie->backing_objects))
 		object = hlist_entry(cookie->backing_objects.first,
 				     struct fscache_object, cookie_link);
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 3cd18ddc3903..8d47c7a08456 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -118,6 +118,7 @@ struct fscache_cookie {
 	struct list_head		proc_link;	/* Link in proc list */
 	char				type_name[8];	/* Cookie type name */
 	loff_t				object_size;	/* Size of the netfs object */
+	loff_t				zero_point;	/* Size after which no data on server */
 
 	unsigned long			flags;
 #define FSCACHE_COOKIE_RELINQUISHED	6		/* T if cookie has been relinquished */
@@ -201,7 +202,7 @@ extern int __fscache_begin_operation(struct fscache_cookie *, struct fscache_op_
 				     enum fscache_want_stage);
 extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
 extern void __fscache_update_cookie(struct fscache_cookie *, const void *, const loff_t *);
-extern void __fscache_invalidate(struct fscache_cookie *);
+extern void __fscache_invalidate(struct fscache_cookie *, loff_t);
 
 /**
  * fscache_register_netfs - Register a filesystem as desiring caching services
@@ -427,6 +428,7 @@ void fscache_unpin_cookie(struct fscache_cookie *cookie)
 /**
  * fscache_invalidate - Notify cache that an object needs invalidation
  * @cookie: The cookie representing the cache object
+ * @size: The revised size of the object.
  *
 * Notify the cache that an object needs to be invalidated and that it
  * should abort any retrievals or stores it is doing on the cache.  The object
@@ -438,10 +440,10 @@ void fscache_unpin_cookie(struct fscache_cookie *cookie)
  * description.
  */
 static inline
-void fscache_invalidate(struct fscache_cookie *cookie)
+void fscache_invalidate(struct fscache_cookie *cookie, loff_t size)
 {
 	if (fscache_cookie_valid(cookie))
-		__fscache_invalidate(cookie);
+		__fscache_invalidate(cookie, size);
 }
 
 /**



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 28/76] fscache, cachefiles: Fix disabled histogram warnings
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (25 preceding siblings ...)
  2020-11-20 15:08 ` [RFC PATCH 27/76] fscache: Keep track of size of a file last set independently on the server David Howells
@ 2020-11-20 15:08 ` David Howells
  2020-11-20 15:08 ` [RFC PATCH 29/76] fscache: Recast assertion in terms of cookie not being an index David Howells
                   ` (47 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:08 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Fix unused-variable warnings arising from the histogram code being
disabled.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/internal.h |    7 +++++--
 fs/fscache/internal.h    |    6 ++++--
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index b89f76a03546..16d15291a629 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -143,11 +143,11 @@ extern int cachefiles_check_in_use(struct cachefiles_cache *cache,
 /*
  * proc.c
  */
-#ifdef CONFIG_CACHEFILES_HISTOGRAM
 extern atomic_t cachefiles_lookup_histogram[HZ];
 extern atomic_t cachefiles_mkdir_histogram[HZ];
 extern atomic_t cachefiles_create_histogram[HZ];
 
+#ifdef CONFIG_CACHEFILES_HISTOGRAM
 extern int __init cachefiles_proc_init(void);
 extern void cachefiles_proc_cleanup(void);
 static inline
@@ -162,7 +162,10 @@ void cachefiles_hist(atomic_t histogram[], unsigned long start_jif)
 #else
 #define cachefiles_proc_init()		(0)
 #define cachefiles_proc_cleanup()	do {} while (0)
-#define cachefiles_hist(hist, start_jif) do {} while (0)
+static inline
+void cachefiles_hist(atomic_t histogram[], unsigned long start_jif)
+{
+}
 #endif
 
 /*
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 6c2a6ebe4f02..2c1f4151c092 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -109,13 +109,13 @@ extern struct fscache_cookie fscache_fsdef_index;
 /*
  * histogram.c
  */
-#ifdef CONFIG_FSCACHE_HISTOGRAM
 extern atomic_t fscache_obj_instantiate_histogram[HZ];
 extern atomic_t fscache_objs_histogram[HZ];
 extern atomic_t fscache_ops_histogram[HZ];
 extern atomic_t fscache_retrieval_delay_histogram[HZ];
 extern atomic_t fscache_retrieval_histogram[HZ];
 
+#ifdef CONFIG_FSCACHE_HISTOGRAM
 static inline void fscache_hist(atomic_t histogram[], unsigned long start_jif)
 {
 	unsigned long jif = jiffies - start_jif;
@@ -127,7 +127,9 @@ static inline void fscache_hist(atomic_t histogram[], unsigned long start_jif)
 extern const struct seq_operations fscache_histogram_ops;
 
 #else
-#define fscache_hist(hist, start_jif) do {} while (0)
+static inline void fscache_hist(atomic_t histogram[], unsigned long start_jif)
+{
+}
 #endif
 
 /*



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 29/76] fscache: Recast assertion in terms of cookie not being an index
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (26 preceding siblings ...)
  2020-11-20 15:08 ` [RFC PATCH 28/76] fscache, cachefiles: Fix disabled histogram warnings David Howells
@ 2020-11-20 15:08 ` David Howells
  2020-11-20 15:08 ` [RFC PATCH 30/76] vfs, fscache: Force ->write_inode() to occur if cookie pinned for writeback David Howells
                   ` (46 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:08 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Recast the assertion in terms of the cookie not being an index rather than
it being a datafile.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/cookie.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 230025a1495c..8d8aba14912a 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -546,7 +546,7 @@ void __fscache_invalidate(struct fscache_cookie *cookie, loff_t new_size)
 	 * there, and if it's doing that, it may as well just retire the
 	 * cookie.
 	 */
-	ASSERTCMP(cookie->type, ==, FSCACHE_COOKIE_TYPE_DATAFILE);
+	ASSERTCMP(cookie->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
 
 	spin_lock(&cookie->lock);
 	cookie->object_size = new_size;



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 30/76] vfs, fscache: Force ->write_inode() to occur if cookie pinned for writeback
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (27 preceding siblings ...)
  2020-11-20 15:08 ` [RFC PATCH 29/76] fscache: Recast assertion in terms of cookie not being an index David Howells
@ 2020-11-20 15:08 ` David Howells
  2020-11-20 15:09 ` [RFC PATCH 31/76] fscache: Allow ->put_super() to be used to wait for cache operations David Howells
                   ` (45 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:08 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Use an inode flag, I_PINNING_FSCACHE_WB, to indicate that a cookie is
pinned in use by that inode for the purposes of writeback.

Pinning is necessary because the in-use pin from the open file is released
before the writeback takes place, but if the resources aren't pinned, the
dirty data can't be written to the cache.
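
A netfs would then hook up the two ends roughly as follows (a sketch:
my_inode_cookie() stands in for however the filesystem gets at its
cookie):

	static int my_set_page_dirty(struct page *page)
	{
		/* Mark the page dirty and pin the cache object in use */
		return fscache_set_page_dirty(page,
				my_inode_cookie(page->mapping->host));
	}

	static int my_write_inode(struct inode *inode,
				  struct writeback_control *wbc)
	{
		/* Drop the pin once the last dirty page has been cleaned */
		fscache_unpin_writeback(wbc, my_inode_cookie(inode));
		return 0;
	}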

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fs-writeback.c         |    8 ++++++++
 fs/fscache/io.c           |   38 ++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h        |    3 +++
 include/linux/fscache.h   |   22 ++++++++++++++++++++++
 include/linux/writeback.h |    1 +
 5 files changed, 72 insertions(+)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e6005c78bfa9..f7fb7fb37337 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1506,6 +1506,13 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 
 	if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
 		inode->i_state |= I_DIRTY_PAGES;
+	else if (unlikely(inode->i_state & I_PINNING_FSCACHE_WB)) {
+		if (!(inode->i_state & I_DIRTY_PAGES)) {
+			inode->i_state &= ~I_PINNING_FSCACHE_WB;
+			wbc->unpinned_fscache_wb = true;
+			dirty |= I_PINNING_FSCACHE_WB; /* Cause write_inode */
+		}
+	}
 
 	spin_unlock(&inode->i_lock);
 
@@ -1517,6 +1524,7 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 		if (ret == 0)
 			ret = err;
 	}
+	wbc->unpinned_fscache_wb = false;
 	trace_writeback_single_inode(inode, wbc, nr_to_write);
 	return ret;
 }
diff --git a/fs/fscache/io.c b/fs/fscache/io.c
index 7ad13900b281..87ffe84c9f27 100644
--- a/fs/fscache/io.c
+++ b/fs/fscache/io.c
@@ -142,3 +142,41 @@ int __fscache_begin_operation(struct fscache_cookie *cookie,
 	return -ENOBUFS;
 }
 EXPORT_SYMBOL(__fscache_begin_operation);
+
+/**
+ * fscache_set_page_dirty - Mark page dirty and pin a cache object for writeback
+ * @page: The page being dirtied
+ * @cookie: The cookie referring to the cache object
+ *
+ * Set the dirty flag on a page and pin an in-use cache object in memory when
+ * dirtying a page so that writeback can later write to it.  This is intended
+ * to be called from the filesystem's ->set_page_dirty() method.
+ *
+ *  Returns 1 if PG_dirty was set on the page, 0 otherwise.
+ */
+int fscache_set_page_dirty(struct page *page, struct fscache_cookie *cookie)
+{
+	struct inode *inode = page->mapping->host;
+	bool need_use = false;
+
+	_enter("");
+
+	if (!__set_page_dirty_nobuffers(page))
+		return 0;
+	if (!fscache_cookie_valid(cookie))
+		return 1;
+
+	if (!(inode->i_state & I_PINNING_FSCACHE_WB)) {
+		spin_lock(&inode->i_lock);
+		if (!(inode->i_state & I_PINNING_FSCACHE_WB)) {
+			inode->i_state |= I_PINNING_FSCACHE_WB;
+			need_use = true;
+		}
+		spin_unlock(&inode->i_lock);
+
+		if (need_use)
+			fscache_use_cookie(cookie, true);
+	}
+	return 1;
+}
+EXPORT_SYMBOL(fscache_set_page_dirty);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f6dad577ea07..fdbac1252381 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2151,6 +2151,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
  *			Used to detect that mark_inode_dirty() should not move
  * 			inode between dirty lists.
  *
+ * I_PINNING_FSCACHE_WB	Inode is pinning an fscache object for writeback.
+ *
  * Q: What is the difference between I_WILL_FREE and I_FREEING?
  */
 #define I_DIRTY_SYNC		(1 << 0)
@@ -2173,6 +2175,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
 #define I_CREATING		(1 << 15)
 #define I_DONTCACHE		(1 << 16)
 #define I_SYNC_QUEUED		(1 << 17)
+#define I_PINNING_FSCACHE_WB	(1 << 18)
 
 #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
 #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 8d47c7a08456..d2fc98a5755a 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -19,6 +19,7 @@
 #include <linux/pagemap.h>
 #include <linux/pagevec.h>
 #include <linux/list_bl.h>
+#include <linux/writeback.h>
 
 #if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
 #define __fscache_available (1)
@@ -580,4 +581,25 @@ int fscache_write(struct fscache_op_resources *opr,
 	return ops->write(opr, start_pos, iter, term_func, term_func_priv);
 }
 
+#if __fscache_available
+extern int fscache_set_page_dirty(struct page *page, struct fscache_cookie *cookie);
+#else
+#define fscache_set_page_dirty(PAGE, COOKIE) (__set_page_dirty_nobuffers((PAGE)))
+#endif
+
+/**
+ * fscache_unpin_writeback - Unpin writeback resources
+ * @wbc: The writeback control
+ * @cookie: The cookie referring to the cache object
+ *
+ * Unpin the writeback resources pinned by fscache_set_page_dirty().  This is
+ * intended to be called by the netfs's ->write_inode() method.
+ */
+static inline void fscache_unpin_writeback(struct writeback_control *wbc,
+					   struct fscache_cookie *cookie)
+{
+	if (wbc->unpinned_fscache_wb)
+		fscache_unuse_cookie(cookie, NULL, NULL);
+}
+
 #endif /* _LINUX_FSCACHE_H */
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 8e5c5bb16e2d..5e7d1d38f261 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -69,6 +69,7 @@ struct writeback_control {
 	unsigned for_reclaim:1;		/* Invoked from the page allocator */
 	unsigned range_cyclic:1;	/* range_start is cyclic */
 	unsigned for_sync:1;		/* sync(2) WB_SYNC_ALL writeback */
+	unsigned unpinned_fscache_wb:1;	/* Cleared I_PINNING_FSCACHE_WB */
 
 	/*
 	 * When writeback IOs are bounced through async layers, only the



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 31/76] fscache: Allow ->put_super() to be used to wait for cache operations
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (28 preceding siblings ...)
  2020-11-20 15:08 ` [RFC PATCH 30/76] vfs, fscache: Force ->write_inode() to occur if cookie pinned for writeback David Howells
@ 2020-11-20 15:09 ` David Howells
  2020-11-20 15:09 ` [RFC PATCH 32/76] netfs: Make a netfs helper module David Howells
                   ` (44 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:09 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Provide a helper to allow ->put_super() to be used to wait for outstanding
cache operations that are pinning inodes.  The helper loops, finding the
first inode that has a non-zero usage count and a cookie, and waiting for
that cookie's outstanding operations to complete; it then calls
evict_inodes() to reduce the list and goes round again until it finds no
more candidate inodes.

Without this, evict_inodes() won't get rid of such operations, and the
"VFS: Busy inodes ..." message will be displayed and the inode abandoned.

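A filesystem might use it like this (sketch; my_inode_cookie() is again a
stand-in for the netfs's own accessor):

	static struct fscache_cookie *my_get_cookie(struct inode *inode)
	{
		return my_inode_cookie(inode);
	}

	static void my_put_super(struct super_block *sb)
	{
		/* Wait for cache I/O pinning our inodes to complete
		 * before the VFS runs its busy-inodes check. */
		fscache_put_super(sb, my_get_cookie);
	}
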
Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/io.c         |   50 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fscache.h |    2 ++
 2 files changed, 52 insertions(+)

diff --git a/fs/fscache/io.c b/fs/fscache/io.c
index 87ffe84c9f27..de9ffc16eb4f 100644
--- a/fs/fscache/io.c
+++ b/fs/fscache/io.c
@@ -180,3 +180,53 @@ int fscache_set_page_dirty(struct page *page, struct fscache_cookie *cookie)
 	return 1;
 }
 EXPORT_SYMBOL(fscache_set_page_dirty);
+
+/**
+ * fscache_put_super - Wait for outstanding ops to complete
+ * @sb: The superblock to wait on
+ * @get_cookie: Function to get the cookie on an inode
+ *
+ * Wait for outstanding cache operations on the inodes of a superblock to
+ * complete as they might be pinning an inode.  This is designed to be called
+ * from ->put_super(), right before the "VFS: Busy inodes" check.
+ */
+void fscache_put_super(struct super_block *sb,
+		       struct fscache_cookie *(*get_cookie)(struct inode *inode))
+{
+	struct fscache_cookie *cookie;
+	struct inode *inode, *p;
+
+	while (!list_empty(&sb->s_inodes)) {
+		/* Find the first inode that we need to wait on */
+		inode = NULL;
+		cookie = NULL;
+		spin_lock(&sb->s_inode_list_lock);
+		list_for_each_entry(p, &sb->s_inodes, i_sb_list) {
+			if (atomic_inc_not_zero(&p->i_count)) {
+				inode = p;
+				cookie = get_cookie(inode);
+				if (!cookie) {
+					iput(inode);
+					inode = NULL;
+					cookie = NULL;
+					continue;
+				}
+				break;
+			}
+		}
+		spin_unlock(&sb->s_inode_list_lock);
+
+		if (inode) {
+			/* n_ops is kept artificially raised to stop wakeups */
+			atomic_dec(&cookie->n_ops);
+			wait_var_event(&cookie->n_ops, atomic_read(&cookie->n_ops) == 0);
+			atomic_inc(&cookie->n_ops);
+			iput(inode);
+		}
+
+		evict_inodes(sb);
+		if (!inode)
+			break;
+	}
+}
+EXPORT_SYMBOL(fscache_put_super);
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index d2fc98a5755a..38a252b06b54 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -204,6 +204,8 @@ extern int __fscache_begin_operation(struct fscache_cookie *, struct fscache_op_
 extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
 extern void __fscache_update_cookie(struct fscache_cookie *, const void *, const loff_t *);
 extern void __fscache_invalidate(struct fscache_cookie *, loff_t);
+extern void fscache_put_super(struct super_block *,
+			      struct fscache_cookie *(*get_cookie)(struct inode *));
 
 /**
  * fscache_register_netfs - Register a filesystem as desiring caching services



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 32/76] netfs: Make a netfs helper module
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (29 preceding siblings ...)
  2020-11-20 15:09 ` [RFC PATCH 31/76] fscache: Allow ->put_super() to be used to wait for cache operations David Howells
@ 2020-11-20 15:09 ` David Howells
  2020-11-20 15:09 ` [RFC PATCH 33/76] netfs: Provide readahead and readpage netfs helpers David Howells
                   ` (43 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:09 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Make a netfs helper module to manage read request segmentation, caching
support and transparent huge page support on behalf of a network
filesystem.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/Kconfig |    1 +
 fs/netfs/Kconfig   |    8 ++++++++
 2 files changed, 9 insertions(+)
 create mode 100644 fs/netfs/Kconfig

diff --git a/fs/fscache/Kconfig b/fs/fscache/Kconfig
index ce6f731065d0..b04f07160e5e 100644
--- a/fs/fscache/Kconfig
+++ b/fs/fscache/Kconfig
@@ -2,6 +2,7 @@
 
 config FSCACHE
 	tristate "General filesystem local caching manager"
+	depends on NETFS_SUPPORT
 	help
 	  This option enables a generic filesystem caching manager that can be
 	  used by various network and other filesystems to cache data locally.
diff --git a/fs/netfs/Kconfig b/fs/netfs/Kconfig
new file mode 100644
index 000000000000..2ebf90e6ca95
--- /dev/null
+++ b/fs/netfs/Kconfig
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config NETFS_SUPPORT
+	tristate "Support for network filesystem high-level I/O"
+	help
+	  This option enables support for network filesystems, including
+	  helpers for high-level buffered I/O, abstracting out read
+	  segmentation, local caching and transparent huge page support.



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 33/76] netfs: Provide readahead and readpage netfs helpers
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (30 preceding siblings ...)
  2020-11-20 15:09 ` [RFC PATCH 32/76] netfs: Make a netfs helper module David Howells
@ 2020-11-20 15:09 ` David Howells
  2020-11-20 15:09 ` [RFC PATCH 34/76] netfs: Use the cache David Howells
                   ` (42 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:09 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Add a pair of helper functions:

 (*) netfs_readahead()
 (*) netfs_readpage()

to do the work of handling a readahead or a readpage, where the page(s)
that form part of the request may be split between the local cache and
the server, or may just require clearing, and may be single pages or
transparent huge pages.  This is all handled within the helper.

Note that while both will read from the cache if there is data present,
only netfs_readahead() will expand the request beyond what it was asked to
do, and only netfs_readahead() will write back to the cache.

netfs_readpage(), on the other hand, is synchronous and only fetches the
page (which might be a THP) it is asked for.

The netfs gives the helper parameters from the VM, the cache cookie it
wants to use (or NULL) and a table of operations (only one of which is
mandatory):

 (*) expand_readahead() [optional]

     Called to allow the netfs to request an expansion of a readahead
     request to meet its own alignment requirements.  This is done by
     changing rreq->start and rreq->len.

 (*) clamp_length() [optional]

     Called to allow the netfs to cut down a subrequest to meet its own
     boundary requirements.  If it does this, the helper will generate
     additional subrequests until the full request is satisfied.

 (*) is_still_valid() [optional]

     Called to find out if the data just read from the cache has been
     invalidated and must be reread from the server.

 (*) issue_op() [required]

     Called to ask the netfs to issue a read to the server.  The subrequest
     describes the read.  The read request holds information about the file
     being accessed.

     The netfs can cache information in rreq->netfs_priv.

     Upon completion, the netfs should set the error and the amount
     transferred, may also set NETFS_SREQ_CLEAR_TAIL, and should then call
     netfs_subreq_terminated().

 (*) done() [optional]

     Called after the pages have been unlocked.  The read request is still
     pinning the file and mapping and may still be pinning pages with
     PG_fscache.  rreq->error indicates any error that has been
     accumulated.

 (*) cleanup() [optional]

     Called when the helper is disposing of a finished read request.  This
     allows the netfs to clear rreq->netfs_priv.

Netfs support is enabled with CONFIG_NETFS_SUPPORT=y.  It will be built
even if CONFIG_FSCACHE=n and in this case much of it should be optimised
away, allowing the filesystem to use it even when caching is disabled.
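
Glued together, a netfs might end up with something like the following
sketch.  The my_* implementations are assumed, the exact netfs_readpage()
calling convention is abbreviated, and note that the helper also calls an
init_rreq() op to let the netfs initialise each read request:

	static const struct netfs_read_request_ops my_req_ops = {
		.init_rreq		= my_init_rreq,
		.expand_readahead	= my_expand_readahead,	/* optional */
		.clamp_length		= my_clamp_length,	/* optional */
		.is_still_valid		= my_is_still_valid,	/* optional */
		.issue_op		= my_issue_op,		/* required */
		.done			= my_done,		/* optional */
		.cleanup		= my_cleanup,		/* optional */
	};

	static int my_readpage(struct file *file, struct page *page)
	{
		return netfs_readpage(file, page, &my_req_ops, NULL);
	}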

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/Kconfig             |    1 
 fs/Makefile            |    1 
 fs/afs/Kconfig         |    1 
 fs/netfs/Makefile      |    6 
 fs/netfs/internal.h    |   61 ++++
 fs/netfs/read_helper.c |  697 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/netfs.h  |   95 +++++++
 7 files changed, 862 insertions(+)
 create mode 100644 fs/netfs/Makefile
 create mode 100644 fs/netfs/internal.h
 create mode 100644 fs/netfs/read_helper.c
 create mode 100644 include/linux/netfs.h

diff --git a/fs/Kconfig b/fs/Kconfig
index aa4c12282301..e3dfbbea502f 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -307,6 +307,7 @@ menuconfig NETWORK_FILESYSTEMS
 
 if NETWORK_FILESYSTEMS
 
+source "fs/netfs/Kconfig"
 source "fs/nfs/Kconfig"
 source "fs/nfsd/Kconfig"
 
diff --git a/fs/Makefile b/fs/Makefile
index 999d1a23f036..e922c3bc0a43 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -68,6 +68,7 @@ obj-$(CONFIG_PROFILING)		+= dcookies.o
 obj-$(CONFIG_DLM)		+= dlm/
  
 # Do not add any filesystems before this line
+obj-$(CONFIG_NETFS_SUPPORT)	+= netfs/
 obj-$(CONFIG_FSCACHE)		+= fscache/
 obj-$(CONFIG_REISERFS_FS)	+= reiserfs/
 obj-$(CONFIG_EXT4_FS)		+= ext4/
diff --git a/fs/afs/Kconfig b/fs/afs/Kconfig
index 1ad211d72b3b..fc8ba9142f2f 100644
--- a/fs/afs/Kconfig
+++ b/fs/afs/Kconfig
@@ -4,6 +4,7 @@ config AFS_FS
 	depends on INET
 	select AF_RXRPC
 	select DNS_RESOLVER
+	select NETFS_SUPPORT
 	help
 	  If you say Y here, you will get an experimental Andrew File System
 	  driver. It currently only supports unsecured read-only AFS access.
diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
new file mode 100644
index 000000000000..4b4eff2ba369
--- /dev/null
+++ b/fs/netfs/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+
+netfs-y := \
+	read_helper.o
+
+obj-$(CONFIG_NETFS_SUPPORT) := netfs.o
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
new file mode 100644
index 000000000000..b07f43f9130e
--- /dev/null
+++ b/fs/netfs/internal.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Internal definitions for network filesystem support
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) "netfs: " fmt
+
+/*
+ * read_helper.c
+ */
+extern unsigned int netfs_debug;
+
+#define netfs_stat(x) do {} while(0)
+#define netfs_stat_d(x) do {} while(0)
+
+/*****************************************************************************/
+/*
+ * debug tracing
+ */
+#define dbgprintk(FMT, ...) \
+	printk("[%-6.6s] "FMT"\n", current->comm, ##__VA_ARGS__)
+
+#define kenter(FMT, ...) dbgprintk("==> %s("FMT")", __func__, ##__VA_ARGS__)
+#define kleave(FMT, ...) dbgprintk("<== %s()"FMT"", __func__, ##__VA_ARGS__)
+#define kdebug(FMT, ...) dbgprintk(FMT, ##__VA_ARGS__)
+
+#ifdef __KDEBUG
+#define _enter(FMT, ...) kenter(FMT, ##__VA_ARGS__)
+#define _leave(FMT, ...) kleave(FMT, ##__VA_ARGS__)
+#define _debug(FMT, ...) kdebug(FMT, ##__VA_ARGS__)
+
+#elif defined(CONFIG_NETFS_DEBUG)
+#define _enter(FMT, ...)			\
+do {						\
+	if (netfs_debug)			\
+		kenter(FMT, ##__VA_ARGS__);	\
+} while (0)
+
+#define _leave(FMT, ...)			\
+do {						\
+	if (netfs_debug)			\
+		kleave(FMT, ##__VA_ARGS__);	\
+} while (0)
+
+#define _debug(FMT, ...)			\
+do {						\
+	if (netfs_debug)			\
+		kdebug(FMT, ##__VA_ARGS__);	\
+} while (0)
+
+#else
+#define _enter(FMT, ...) no_printk("==> %s("FMT")", __func__, ##__VA_ARGS__)
+#define _leave(FMT, ...) no_printk("<== %s()"FMT"", __func__, ##__VA_ARGS__)
+#define _debug(FMT, ...) no_printk(FMT, ##__VA_ARGS__)
+#endif
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
new file mode 100644
index 000000000000..d0e3d5e0ced4
--- /dev/null
+++ b/fs/netfs/read_helper.c
@@ -0,0 +1,697 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Network filesystem high-level read support.
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/export.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/file.h>
+#include <linux/slab.h>
+#include <linux/uio.h>
+#include <linux/sched/mm.h>
+#include <linux/task_io_accounting_ops.h>
+#include <linux/netfs.h>
+#include <linux/fscache.h>
+#include "internal.h"
+
+MODULE_DESCRIPTION("Network fs support");
+MODULE_AUTHOR("Red Hat, Inc.");
+MODULE_LICENSE("GPL");
+
+unsigned netfs_debug;
+module_param_named(debug, netfs_debug, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(netfs_debug, "Netfs support debugging mask");
+
+static void netfs_rreq_work(struct work_struct *);
+static void __netfs_put_subrequest(struct netfs_read_subrequest *);
+
+static void netfs_put_subrequest(struct netfs_read_subrequest *subreq)
+{
+	if (refcount_dec_and_test(&subreq->usage))
+		__netfs_put_subrequest(subreq);
+}
+
+static struct netfs_read_request *netfs_alloc_read_request(
+	const struct netfs_read_request_ops *ops, void *netfs_priv,
+	struct file *file)
+{
+	struct netfs_read_request *rreq;
+
+	rreq = kzalloc(sizeof(struct netfs_read_request), GFP_KERNEL);
+	if (rreq) {
+		rreq->netfs_ops	= ops;
+		rreq->netfs_priv = netfs_priv;
+		rreq->inode	= file_inode(file);
+		rreq->i_size	= i_size_read(rreq->inode);
+		INIT_LIST_HEAD(&rreq->subrequests);
+		INIT_WORK(&rreq->work, netfs_rreq_work);
+		refcount_set(&rreq->usage, 1);
+		__set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
+		ops->init_rreq(rreq, file);
+	}
+
+	return rreq;
+}
+
+static void netfs_get_read_request(struct netfs_read_request *rreq)
+{
+	refcount_inc(&rreq->usage);
+}
+
+static void netfs_rreq_clear_subreqs(struct netfs_read_request *rreq)
+{
+	struct netfs_read_subrequest *subreq;
+
+	while (!list_empty(&rreq->subrequests)) {
+		subreq = list_first_entry(&rreq->subrequests,
+					  struct netfs_read_subrequest, rreq_link);
+		list_del(&subreq->rreq_link);
+		netfs_put_subrequest(subreq);
+	}
+}
+
+static void netfs_put_read_request(struct netfs_read_request *rreq)
+{
+	if (refcount_dec_and_test(&rreq->usage)) {
+		netfs_rreq_clear_subreqs(rreq);
+		if (rreq->netfs_priv)
+			rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv);
+		kfree(rreq);
+	}
+}
+
+/*
+ * Allocate and partially initialise an I/O request structure.
+ */
+static struct netfs_read_subrequest *netfs_alloc_subrequest(
+	struct netfs_read_request *rreq)
+{
+	struct netfs_read_subrequest *subreq;
+
+	subreq = kzalloc(sizeof(struct netfs_read_subrequest), GFP_KERNEL);
+	if (subreq) {
+		INIT_LIST_HEAD(&subreq->rreq_link);
+		refcount_set(&subreq->usage, 2);
+		subreq->rreq = rreq;
+	}
+
+	return subreq;
+}
+
+static void netfs_get_read_subrequest(struct netfs_read_subrequest *subreq)
+{
+	refcount_inc(&subreq->usage);
+}
+
+static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq)
+{
+	netfs_put_read_request(subreq->rreq);
+	kfree(subreq);
+}
+
+/*
+ * Clear the unread part of an I/O request.
+ */
+static void netfs_clear_unread(struct netfs_read_subrequest *subreq)
+{
+	struct iov_iter iter;
+
+	iov_iter_xarray(&iter, WRITE, &subreq->rreq->mapping->i_pages,
+			subreq->start + subreq->transferred,
+			subreq->len   - subreq->transferred);
+	iov_iter_zero(iov_iter_count(&iter), &iter);
+}
+
+/*
+ * Fill a subrequest region with zeroes.
+ */
+static void netfs_fill_with_zeroes(struct netfs_read_request *rreq,
+				   struct netfs_read_subrequest *subreq)
+{
+	__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+	netfs_subreq_terminated(subreq, 0);
+}
+
+/*
+ * Ask the netfs to issue a read request to the server for us.
+ *
+ * The netfs is expected to read from subreq->pos + subreq->transferred to
+ * subreq->pos + subreq->len - 1.  It may not backtrack and write data into the
+ * buffer prior to the transferred point as it might clobber dirty data
+ * obtained from the cache.
+ *
+ * Alternatively, the netfs is allowed to indicate one of two things:
+ *
+ * - NETFS_SREQ_SHORT_READ: A short read - it will get called again to try and
+ *   make progress.
+ *
+ * - NETFS_SREQ_CLEAR_TAIL: A short read - the rest of the buffer will be
+ *   cleared.
+ */
+static void netfs_read_from_server(struct netfs_read_request *rreq,
+				   struct netfs_read_subrequest *subreq)
+{
+	rreq->netfs_ops->issue_op(subreq);
+}
+
+/*
+ * Release those waiting.
+ */
+static void netfs_rreq_completed(struct netfs_read_request *rreq)
+{
+	netfs_rreq_clear_subreqs(rreq);
+	netfs_put_read_request(rreq);
+}
+
+/*
+ * Unlock the pages in a read operation.  We need to set PG_fscache on any
+ * pages we're going to write back before we unlock them.
+ */
+static void netfs_rreq_unlock(struct netfs_read_request *rreq)
+{
+	struct netfs_read_subrequest *subreq;
+	struct page *page;
+	unsigned int iopos, account = 0;
+	pgoff_t start_page = rreq->start / PAGE_SIZE;
+	pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1;
+	bool subreq_failed = false;
+	int i;
+
+	XA_STATE(xas, &rreq->mapping->i_pages, start_page);
+
+	if (test_bit(NETFS_RREQ_FAILED, &rreq->flags)) {
+		__clear_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags);
+		list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
+			__clear_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags);
+		}
+	}
+
+	/* Walk through the pagecache and the I/O request lists simultaneously.
+	 * We may have a mixture of cached and uncached sections and we only
+	 * really want to write out the uncached sections.  This is slightly
+	 * complicated by the possibility that we might have huge pages with a
+	 * mixture inside.
+	 */
+	subreq = list_first_entry(&rreq->subrequests,
+				  struct netfs_read_subrequest, rreq_link);
+	iopos = 0;
+	subreq_failed = (subreq->error < 0);
+
+	rcu_read_lock();
+	xas_for_each(&xas, page, last_page) {
+		unsigned int pgpos = (page->index - start_page) * PAGE_SIZE;
+		unsigned int pgend = pgpos + thp_size(page);
+		bool pg_failed = false;
+
+		for (;;) {
+			if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags))
+				SetPageFsCache(page);
+			pg_failed |= subreq_failed;
+			if (pgend < iopos + subreq->len)
+				break;
+
+			account += subreq->transferred;
+			iopos += subreq->len;
+			if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) {
+				subreq = list_next_entry(subreq, rreq_link);
+				subreq_failed = (subreq->error < 0);
+			} else {
+				subreq = NULL;
+				subreq_failed = false;
+			}
+			if (pgend == iopos)
+				break;
+		}
+
+		if (!subreq_failed) {
+			for (i = 0; i < thp_nr_pages(page); i++)
+				flush_dcache_page(page);
+			SetPageUptodate(page);
+		}
+
+		if (!test_bit(NETFS_RREQ_DONT_UNLOCK_PAGES, &rreq->flags)) {
+			if (page->index == rreq->no_unlock_page &&
+			    test_bit(NETFS_RREQ_NO_UNLOCK_PAGE, &rreq->flags))
+				_debug("no unlock");
+			else
+				unlock_page(page);
+		}
+	}
+	rcu_read_unlock();
+
+	task_io_account_read(account);
+	if (rreq->netfs_ops->done)
+		rreq->netfs_ops->done(rreq);
+}
+
+/*
+ * Handle a short read.
+ */
+static void netfs_rreq_short_read(struct netfs_read_request *rreq,
+				  struct netfs_read_subrequest *subreq)
+{
+	__clear_bit(NETFS_SREQ_SHORT_READ, &subreq->flags);
+	__set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags);
+
+	netfs_get_read_subrequest(subreq);
+	atomic_inc(&rreq->nr_rd_ops);
+	netfs_read_from_server(rreq, subreq);
+}
+
+/*
+ * Resubmit any short or failed operations.  Returns true if we got the rreq
+ * ref back.
+ */
+static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq)
+{
+	struct netfs_read_subrequest *subreq;
+
+	WARN_ON(in_softirq());
+
+	/* We don't want terminating submissions trying to wake us up whilst
+	 * we're still going through the list.
+	 */
+	atomic_inc(&rreq->nr_rd_ops);
+
+	__clear_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags);
+	list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
+		if (subreq->error) {
+			if (subreq->source != NETFS_READ_FROM_CACHE)
+				break;
+			subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
+			subreq->error = 0;
+			netfs_get_read_subrequest(subreq);
+			atomic_inc(&rreq->nr_rd_ops);
+			netfs_read_from_server(rreq, subreq);
+		} else if (test_bit(NETFS_SREQ_SHORT_READ, &subreq->flags)) {
+			netfs_rreq_short_read(rreq, subreq);
+		}
+	}
+
+	/* If we decrement nr_rd_ops to 0, the usage ref belongs to us. */
+	if (atomic_dec_and_test(&rreq->nr_rd_ops))
+		return true;
+
+	wake_up_var(&rreq->nr_rd_ops);
+	return false;
+}
+
+/*
+ * Assess the state of a read request and decide what to do next.
+ *
+ * Note that we could be in an ordinary kernel thread, on a workqueue or in
+ * softirq context at this point.  We inherit a ref from the caller.
+ */
+static void netfs_rreq_assess(struct netfs_read_request *rreq)
+{
+again:
+	if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) &&
+	    test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags)) {
+		if (netfs_rreq_perform_resubmissions(rreq))
+			goto again;
+		return;
+	}
+
+	netfs_rreq_unlock(rreq);
+
+	clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
+	wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS);
+
+	netfs_rreq_completed(rreq);
+}
+
+static void netfs_rreq_work(struct work_struct *work)
+{
+	struct netfs_read_request *rreq =
+		container_of(work, struct netfs_read_request, work);
+	netfs_rreq_assess(rreq);
+}
+
+/*
+ * Handle the completion of all outstanding I/O operations on a read request.
+ * We inherit a ref from the caller.
+ */
+static void netfs_rreq_terminated(struct netfs_read_request *rreq)
+{
+	if (test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags) &&
+	    in_softirq()) {
+		if (!queue_work(system_unbound_wq, &rreq->work))
+			BUG();
+	} else {
+		netfs_rreq_assess(rreq);
+	}
+}
+
+/**
+ * netfs_subreq_terminated - Note the termination of an I/O operation.
+ * @subreq: The I/O request that has terminated.
+ *
+ * This tells the read helper that a contributory I/O operation has terminated,
+ * one way or another, and that it should integrate the results.
+ *
+ * The caller must have set @subreq->transferred, @subreq->error and
+ * @subreq->flags as appropriate for the outcome of the operation.  The helper
+ * will look after reissuing I/O operations as appropriate and writing
+ * downloaded data to the cache.
+ *
+ * This may be called from a softirq handler, so we want to avoid taking the
+ * spinlock if we can.
+ */
+void netfs_subreq_terminated(struct netfs_read_subrequest *subreq,
+			     ssize_t transferred_or_error)
+{
+	struct netfs_read_request *rreq = subreq->rreq;
+	int u;
+
+	_enter("[%u]{%llx,%lx},%zd",
+	       subreq->debug_index, subreq->start, subreq->flags,
+	       transferred_or_error);
+
+	if (IS_ERR_VALUE(transferred_or_error)) {
+		subreq->error = transferred_or_error;
+		goto failed;
+	}
+
+	if (WARN_ON(transferred_or_error > subreq->len - subreq->transferred))
+		transferred_or_error = subreq->len - subreq->transferred;
+
+	subreq->error = 0;
+	subreq->transferred += transferred_or_error;
+	if (subreq->transferred < subreq->len)
+		goto incomplete;
+
+complete:
+	__clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
+	if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags))
+		set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags);
+
+out:
+	/* If we decrement nr_rd_ops to 0, the ref belongs to us. */
+	u = atomic_dec_return(&rreq->nr_rd_ops);
+	if (u == 0)
+		netfs_rreq_terminated(rreq);
+	else if (u == 1)
+		wake_up_var(&rreq->nr_rd_ops);
+
+	netfs_put_subrequest(subreq);
+	return;
+
+incomplete:
+	if (test_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags)) {
+		netfs_clear_unread(subreq);
+		subreq->transferred = subreq->len;
+		goto complete;
+	}
+
+	if (transferred_or_error == 0) {
+		if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) {
+			subreq->error = -ENODATA;
+			goto failed;
+		}
+	} else {
+		__clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags);
+	}
+
+	__set_bit(NETFS_SREQ_SHORT_READ, &subreq->flags);
+	set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags);
+	goto out;
+
+failed:
+	if (subreq->source == NETFS_READ_FROM_CACHE) {
+		set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags);
+	} else {
+		set_bit(NETFS_RREQ_FAILED, &rreq->flags);
+		rreq->error = subreq->error;
+	}
+	goto out;
+}
+EXPORT_SYMBOL(netfs_subreq_terminated);
+
+static enum netfs_read_source netfs_cache_prepare_read(struct netfs_read_subrequest *subreq,
+						       loff_t i_size)
+{
+	struct netfs_read_request *rreq = subreq->rreq;
+
+	if (subreq->start >= rreq->i_size)
+		return NETFS_FILL_WITH_ZEROES;
+	return NETFS_DOWNLOAD_FROM_SERVER;
+}
+
+/*
+ * Work out what sort of subrequest the next one will be.
+ */
+static enum netfs_read_source
+netfs_rreq_prepare_read(struct netfs_read_request *rreq,
+			struct netfs_read_subrequest *subreq)
+{
+	enum netfs_read_source source;
+
+	_enter("%llx-%llx,%llx", subreq->start, subreq->start + subreq->len, rreq->i_size);
+
+	source = netfs_cache_prepare_read(subreq, rreq->i_size);
+	if (source == NETFS_INVALID_READ)
+		goto out;
+
+	if (source == NETFS_DOWNLOAD_FROM_SERVER) {
+		/* Call out to the netfs to let it shrink the request to fit
+		 * its own I/O sizes and boundaries.  If it shrinks it here, it
+		 * will be called again to make simultaneous calls; if it wants
+		 * to make serial calls, it can indicate a short read and then
+		 * we will call it again.
+		 */
+		if (subreq->len > rreq->i_size - subreq->start)
+			subreq->len = rreq->i_size - subreq->start;
+
+		if (rreq->netfs_ops->clamp_length &&
+		    !rreq->netfs_ops->clamp_length(subreq)) {
+			source = NETFS_INVALID_READ;
+			goto out;
+		}
+	}
+
+	if (WARN_ON(subreq->len == 0))
+		source = NETFS_INVALID_READ;
+
+out:
+	subreq->source = source;
+	return source;
+}
+
+/*
+ * Slice off a piece of a read request and submit an I/O request for it.
+ */
+static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq,
+				    unsigned int *_debug_index)
+{
+	struct netfs_read_subrequest *subreq;
+	enum netfs_read_source source;
+
+	subreq = netfs_alloc_subrequest(rreq);
+	if (!subreq)
+		return false;
+
+	subreq->debug_index	= (*_debug_index)++;
+	subreq->start		= rreq->start + rreq->submitted;
+	subreq->len		= rreq->len   - rreq->submitted;
+
+	_debug("slice %llx,%zx,%zx", subreq->start, subreq->len, rreq->submitted);
+
+	/* Call out to the cache to find out what it can do with the remaining
+	 * subset.  It tells us in subreq->flags what it decided should be done
+	 * and adjusts subreq->len down if the subset crosses a cache boundary.
+	 *
+	 * Then, when we hand it the subrequest, it can choose to take a
+	 * subset of that (the starts must coincide), in which case we go
+	 * around the loop again and ask it to download the next piece.
+	 */
+	source = netfs_rreq_prepare_read(rreq, subreq);
+	if (source == NETFS_INVALID_READ)
+		goto subreq_failed;
+
+	netfs_get_read_request(rreq);
+	atomic_inc(&rreq->nr_rd_ops);
+	list_add_tail(&subreq->rreq_link, &rreq->subrequests);
+
+	rreq->submitted += subreq->len;
+
+	switch (source) {
+	case NETFS_FILL_WITH_ZEROES:
+		netfs_fill_with_zeroes(rreq, subreq);
+		break;
+	case NETFS_DOWNLOAD_FROM_SERVER:
+		netfs_read_from_server(rreq, subreq);
+		break;
+	default:
+		BUG();
+	}
+
+	return true;
+
+subreq_failed:
+	/* Both refs on the freshly allocated subrequest are still ours as it
+	 * hasn't been submitted or added to the list, so drop the pair.
+	 */
+	netfs_put_subrequest(subreq);
+	netfs_put_subrequest(subreq);
+	return false;
+}
+
+static void netfs_rreq_expand(struct netfs_read_request *rreq,
+			      struct readahead_control *ractl)
+{
+	/* Give the netfs a chance to change the request parameters.  The
+	 * resultant request must contain the original region.
+	 */
+	if (rreq->netfs_ops->expand_readahead)
+		rreq->netfs_ops->expand_readahead(rreq);
+
+	/* Expand the request if the cache wants it to start earlier.  Note
+	 * that the expansion may get further extended if the VM wishes to
+	 * insert THPs and the preferred start and/or end wind up in the middle
+	 * of THPs.
+	 *
+	 * If this is the case, however, the THP size should be an integer
+	 * multiple of the cache granule size, so we get a whole number of
+	 * granules to deal with.
+	 */
+	if (rreq->start  != readahead_pos(ractl) ||
+	    rreq->len != readahead_length(ractl)) {
+		readahead_expand(ractl, rreq->start, rreq->len);
+		rreq->start  = readahead_pos(ractl);
+		rreq->len = readahead_length(ractl);
+	}
+}
+
+/**
+ * netfs_readahead - Helper to manage a read request
+ * @ractl: The description of the readahead request
+ * @ops: The network filesystem's operations for the helper to use
+ * @netfs_priv: Private netfs data to be retained in the request
+ *
+ * Fulfil a readahead request by drawing data from the cache if possible, or
+ * the netfs if not.  Space beyond the EOF is zero-filled.  Multiple I/O
+ * requests from different sources will get munged together.  If necessary, the
+ * readahead window can be expanded in either direction to a more convenient
+ * alignment for RPC efficiency or to make storage in the cache feasible.
+ *
+ * The calling netfs must provide a table of operations, only one of which,
+ * issue_op, is mandatory.  It may also be passed a private token, which will
+ * be retained in rreq->netfs_priv and will be cleaned up by ops->cleanup().
+ *
+ * This is usable whether or not caching is enabled.
+ */
+void netfs_readahead(struct readahead_control *ractl,
+		     const struct netfs_read_request_ops *ops,
+		     void *netfs_priv)
+{
+	struct netfs_read_request *rreq;
+	struct page *page;
+	unsigned int debug_index = 0;
+
+	_enter("%lx,%x", readahead_index(ractl), readahead_count(ractl));
+
+	if (readahead_count(ractl) == 0)
+		goto cleanup;
+
+	rreq = netfs_alloc_read_request(ops, netfs_priv, ractl->file);
+	if (!rreq)
+		goto cleanup;
+	rreq->mapping	= ractl->mapping;
+	rreq->start	= readahead_pos(ractl);
+	rreq->len	= readahead_length(ractl);
+
+	netfs_rreq_expand(rreq, ractl);
+
+	atomic_set(&rreq->nr_rd_ops, 1);
+	do {
+		if (!netfs_rreq_submit_slice(rreq, &debug_index))
+			break;
+
+	} while (rreq->submitted < rreq->len);
+
+	if (rreq->submitted == 0) {
+		netfs_put_read_request(rreq);
+		return;
+	}
+
+	// TODO: If we didn't submit enough readage, we need to try punting to
+	// a work queue.
+
+	while ((page = readahead_page(ractl)))
+		put_page(page);
+
+	/* If we decrement nr_rd_ops to 0, the ref belongs to us. */
+	if (atomic_dec_and_test(&rreq->nr_rd_ops))
+		netfs_rreq_assess(rreq);
+	return;
+
+cleanup:
+	if (netfs_priv)
+		ops->cleanup(ractl->mapping, netfs_priv);
+	return;
+}
+EXPORT_SYMBOL(netfs_readahead);
+
+/**
+ * netfs_readpage - Helper to manage a readpage request
+ * @file: The file to read from
+ * @page: The page to read
+ * @ops: The network filesystem's operations for the helper to use
+ * @netfs_priv: Private netfs data to be retained in the request
+ *
+ * Fulfil a readpage request by drawing data from the cache if possible, or the
+ * netfs if not.  Space beyond the EOF is zero-filled.  Multiple I/O requests
+ * from different sources will get munged together.
+ *
+ * The calling netfs must provide a table of operations, only one of which,
+ * issue_op, is mandatory.  It may also be passed a private token, which will
+ * be retained in rreq->netfs_priv and will be cleaned up by ops->cleanup().
+ *
+ * This is usable whether or not caching is enabled.
+ */
+int netfs_readpage(struct file *file,
+		   struct page *page,
+		   const struct netfs_read_request_ops *ops,
+		   void *netfs_priv)
+{
+	struct netfs_read_request *rreq;
+	unsigned int debug_index = 0;
+	int ret;
+
+	_enter("%lx", page->index);
+
+	rreq = netfs_alloc_read_request(ops, netfs_priv, file);
+	if (!rreq) {
+		if (netfs_priv)
+			ops->cleanup(page->mapping, netfs_priv);
+		return -ENOMEM;
+	}
+	rreq->mapping		= page->mapping;
+	rreq->start		= page->index * PAGE_SIZE;
+	rreq->len		= thp_size(page);
+
+	netfs_get_read_request(rreq);
+
+	atomic_set(&rreq->nr_rd_ops, 1);
+	do {
+		if (!netfs_rreq_submit_slice(rreq, &debug_index))
+			break;
+
+	} while (rreq->submitted < rreq->len);
+
+	// TODO: If we didn't submit enough readage, we need to clean up
+
+	/* Keep nr_rd_ops incremented so that the ref always belongs to us, and
+	 * the service code isn't punted off to a random thread pool to
+	 * process.
+	 */
+	do {
+		wait_var_event(&rreq->nr_rd_ops, atomic_read(&rreq->nr_rd_ops) == 1);
+		netfs_rreq_assess(rreq);
+	} while (test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags));
+
+	ret = rreq->error;
+	netfs_put_read_request(rreq);
+	return ret;
+}
+EXPORT_SYMBOL(netfs_readpage);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
new file mode 100644
index 000000000000..722b8131a1ea
--- /dev/null
+++ b/include/linux/netfs.h
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Network filesystem support services.
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * See: Documentation/filesystems/caching/netfs-support.rst
+ *
+ * for a description of the network filesystem interface declared here.
+ */
+
+#ifndef _LINUX_NETFS_H
+#define _LINUX_NETFS_H
+
+enum netfs_read_source {
+	NETFS_FILL_WITH_ZEROES,
+	NETFS_DOWNLOAD_FROM_SERVER,
+	NETFS_READ_FROM_CACHE,
+	NETFS_INVALID_READ,
+} __mode(byte);
+
+/*
+ * Descriptor for a single component subrequest.
+ */
+struct netfs_read_subrequest {
+	struct netfs_read_request *rreq;	/* Supervising read request */
+	struct list_head	rreq_link;	/* Link in rreq->subrequests */
+	loff_t			start;		/* Where to start the I/O */
+	size_t			len;		/* Size of the I/O */
+	size_t			transferred;	/* Amount of data transferred */
+	refcount_t		usage;
+	short			error;		/* 0 or error that occurred */
+	unsigned short		debug_index;	/* Index in list (for debugging output) */
+	enum netfs_read_source	source;		/* Where to read from */
+	unsigned long		flags;
+#define NETFS_SREQ_WRITE_TO_CACHE	0	/* Set if should write to cache */
+#define NETFS_SREQ_CLEAR_TAIL		1	/* Set if the rest of the read should be cleared */
+#define NETFS_SREQ_SHORT_READ		2	/* Set if there was a short read from the cache */
+#define NETFS_SREQ_SEEK_DATA_READ	3	/* Set if ->read() should SEEK_DATA first */
+#define NETFS_SREQ_NO_PROGRESS		4	/* Set if we didn't manage to read any data */
+};
+
+/*
+ * Descriptor for a read helper request.  This is used to make multiple I/O
+ * requests on a variety of sources and then stitch the result together.
+ */
+struct netfs_read_request {
+	struct inode		*inode;		/* The file being accessed */
+	struct address_space	*mapping;	/* The mapping being accessed */
+	struct list_head	subrequests;	/* Requests to fetch I/O from disk or net */
+	struct work_struct	work;
+	void			*netfs_priv;	/* Private data for the netfs */
+	atomic_t		nr_rd_ops;	/* Number of read ops in progress */
+	size_t			submitted;	/* Amount submitted for I/O so far */
+	size_t			len;		/* Length of the request */
+	short			error;		/* 0 or error that occurred */
+	loff_t			i_size;		/* Size of the file */
+	loff_t			start;		/* Start position */
+	pgoff_t			no_unlock_page;	/* Don't unlock this page after read */
+	refcount_t		usage;
+	unsigned long		flags;
+#define NETFS_RREQ_INCOMPLETE_IO	0	/* Some ioreqs terminated short or with error */
+#define NETFS_RREQ_WRITE_TO_CACHE	1	/* Need to write to the cache */
+#define NETFS_RREQ_NO_UNLOCK_PAGE	2	/* Don't unlock no_unlock_page on completion */
+#define NETFS_RREQ_DONT_UNLOCK_PAGES	3	/* Don't unlock the pages on completion */
+#define NETFS_RREQ_FAILED		4	/* The request failed */
+#define NETFS_RREQ_IN_PROGRESS		5	/* Unlocked when the request completes */
+	const struct netfs_read_request_ops *netfs_ops;
+};
+
+/*
+ * Operations the network filesystem can/must provide to the helpers.
+ */
+struct netfs_read_request_ops {
+	void (*init_rreq)(struct netfs_read_request *rreq, struct file *file);
+	void (*expand_readahead)(struct netfs_read_request *rreq);
+	bool (*clamp_length)(struct netfs_read_subrequest *subreq);
+	void (*issue_op)(struct netfs_read_subrequest *subreq);
+	bool (*is_still_valid)(struct netfs_read_request *rreq);
+	void (*done)(struct netfs_read_request *rreq);
+	void (*cleanup)(struct address_space *mapping, void *netfs_priv);
+};
+
+struct readahead_control;
+extern void netfs_readahead(struct readahead_control *,
+			    const struct netfs_read_request_ops *,
+			    void *);
+extern int netfs_readpage(struct file *,
+			  struct page *,
+			  const struct netfs_read_request_ops *,
+			  void *);
+
+extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t);
+
+#endif /* _LINUX_NETFS_H */
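
For illustration, here's a minimal sketch of how a netfs might wire these
helpers up.  The myfs_* names and myfs_fetch_from_server() are invented
for the example - only issue_op is mandatory:

	/* Invented for the sketch: send an RPC to the server and copy the
	 * reply into the iterator, returning the number of bytes read or a
	 * negative error.
	 */
	static ssize_t myfs_fetch_from_server(struct inode *inode, loff_t pos,
					      struct iov_iter *iter);

	static void myfs_issue_op(struct netfs_read_subrequest *subreq)
	{
		struct netfs_read_request *rreq = subreq->rreq;
		struct iov_iter iter;
		ssize_t ret;

		/* Aim the iterator at the pagecache region covered by the
		 * subrequest, resuming from the point already transferred.
		 */
		iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages,
				subreq->start + subreq->transferred,
				subreq->len   - subreq->transferred);

		ret = myfs_fetch_from_server(rreq->inode,
					     subreq->start + subreq->transferred,
					     &iter);

		/* Hand the outcome back; the helper takes care of retries,
		 * clearing the tail and unlocking the pages.
		 */
		netfs_subreq_terminated(subreq, ret);
	}

	static void myfs_cleanup(struct address_space *mapping, void *netfs_priv)
	{
		kfree(netfs_priv);
	}

	static const struct netfs_read_request_ops myfs_req_ops = {
		.issue_op	= myfs_issue_op,
		.cleanup	= myfs_cleanup,
	};

	static int myfs_readpage(struct file *file, struct page *page)
	{
		return netfs_readpage(file, page, &myfs_req_ops, NULL);
	}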




* [RFC PATCH 34/76] netfs: Use the cache
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (31 preceding siblings ...)
  2020-11-20 15:09 ` [RFC PATCH 33/76] netfs: Provide readahead and readpage netfs helpers David Howells
@ 2020-11-20 15:09 ` David Howells
  2020-11-20 15:10 ` [RFC PATCH 35/76] fscache: read-helper: Add tracepoints David Howells
                   ` (41 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:09 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Add support to the netfs read helpers for reading data from the cache
instead of downloading it from the server, and for writing data just
downloaded or cleared to the cache.
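
The helper now also calls out to the netfs through a new
begin_cache_operation op before slicing the request up, giving the netfs
the chance to attach its cache cookie's I/O routines to
rreq->cache_resources.  A hypothetical implementation might look like the
sketch below; the myfs_* names are invented and the exact
fscache_begin_operation() signature is assumed for illustration rather
than taken from this patch:

	static int myfs_begin_cache_operation(struct netfs_read_request *rreq)
	{
		/* Assumed signature: attach the cookie's cache I/O resources
		 * to the request so that the read helper can read from and
		 * write to the cache.
		 */
		return fscache_begin_operation(&rreq->cache_resources,
					       myfs_i(rreq->inode)->cache);
	}

This would then be hooked into the filesystem's netfs_read_request_ops
table through the new .begin_cache_operation pointer.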

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/io.c         |    1 
 fs/netfs/read_helper.c  |  197 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fscache.h |   11 +++
 include/linux/netfs.h   |    5 +
 4 files changed, 213 insertions(+), 1 deletion(-)

diff --git a/fs/fscache/io.c b/fs/fscache/io.c
index de9ffc16eb4f..90c056e85cea 100644
--- a/fs/fscache/io.c
+++ b/fs/fscache/io.c
@@ -9,6 +9,7 @@
 #include <linux/module.h>
 #include <linux/fscache-cache.h>
 #include <linux/slab.h>
+#include <linux/netfs.h>
 #include "internal.h"
 
 /*
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index d0e3d5e0ced4..f5185402b504 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -79,6 +79,7 @@ static void netfs_put_read_request(struct netfs_read_request *rreq)
 		netfs_rreq_clear_subreqs(rreq);
 		if (rreq->netfs_priv)
 			rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv);
+		fscache_end_operation(&rreq->cache_resources);
 		kfree(rreq);
 	}
 }
@@ -125,6 +126,31 @@ static void netfs_clear_unread(struct netfs_read_subrequest *subreq)
 	iov_iter_zero(iov_iter_count(&iter), &iter);
 }
 
+static void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error)
+{
+	struct netfs_read_subrequest *subreq = priv;
+
+	netfs_subreq_terminated(subreq, transferred_or_error);
+}
+
+/*
+ * Issue a read against the cache.
+ * - Eats the caller's ref on subreq.
+ */
+static void netfs_read_from_cache(struct netfs_read_request *rreq,
+				  struct netfs_read_subrequest *subreq,
+				  bool seek_data)
+{
+	struct iov_iter iter;
+
+	iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages,
+			subreq->start + subreq->transferred,
+			subreq->len   - subreq->transferred);
+
+	fscache_read(&rreq->cache_resources, subreq->start, &iter, seek_data,
+		     netfs_cache_read_terminated, subreq);
+}
+
 /*
  * Fill a subrequest region with zeroes.
  */
@@ -166,6 +192,122 @@ static void netfs_rreq_completed(struct netfs_read_request *rreq)
 	netfs_put_read_request(rreq);
 }
 
+/*
+ * Deal with the completion of writing the data to the cache.  We have to clear
+ * the PG_fscache bits on the pages involved and release the caller's ref.
+ *
+ * May be called in softirq mode and we inherit a ref from the caller.
+ */
+static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq)
+{
+	struct netfs_read_subrequest *subreq;
+	struct page *page;
+	pgoff_t unlocked = 0;
+	bool have_unlocked = false;
+
+	rcu_read_lock();
+
+	list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
+		XA_STATE(xas, &rreq->mapping->i_pages, subreq->start / PAGE_SIZE);
+
+		xas_for_each(&xas, page, (subreq->start + subreq->len - 1) / PAGE_SIZE) {
+			/* We might have multiple writes from the same huge
+			 * page, but we mustn't unlock a page more than once.
+			 */
+			if (have_unlocked && page->index <= unlocked)
+				continue;
+			unlocked = page->index;
+			unlock_page_fscache(page);
+			have_unlocked = true;
+		}
+	}
+
+	rcu_read_unlock();
+	netfs_rreq_completed(rreq);
+}
+
+static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error)
+{
+	struct netfs_read_subrequest *subreq = priv;
+	struct netfs_read_request *rreq = subreq->rreq;
+
+	if (IS_ERR_VALUE(transferred_or_error))
+		subreq->error = transferred_or_error;
+	else
+		subreq->error = 0;
+
+	/* If we decrement nr_wr_ops to 0, the ref belongs to us. */
+	if (atomic_dec_and_test(&rreq->nr_wr_ops))
+		netfs_rreq_unmark_after_write(rreq);
+
+	netfs_put_subrequest(subreq);
+}
+
+/*
+ * Perform any outstanding writes to the cache.  We inherit a ref from the
+ * caller.
+ */
+static void netfs_rreq_do_write_to_cache(struct netfs_read_request *rreq)
+{
+	struct netfs_read_subrequest *subreq, *next, *p;
+	struct iov_iter iter;
+
+	/* We don't want terminating writes trying to wake us up whilst we're
+	 * still going through the list.
+	 */
+	atomic_inc(&rreq->nr_wr_ops);
+
+	list_for_each_entry_safe(subreq, p, &rreq->subrequests, rreq_link) {
+		if (!test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) {
+			list_del_init(&subreq->rreq_link);
+			netfs_put_subrequest(subreq);
+		}
+	}
+
+	list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
+		/* Amalgamate adjacent writes */
+		while (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) {
+			next = list_next_entry(subreq, rreq_link);
+			if (next->start > subreq->start + subreq->len)
+				break;
+			subreq->len += next->len;
+			list_del_init(&next->rreq_link);
+			netfs_put_subrequest(next);
+		}
+
+		iov_iter_xarray(&iter, WRITE, &rreq->mapping->i_pages,
+				subreq->start, subreq->len);
+
+		atomic_inc(&rreq->nr_wr_ops);
+		netfs_get_read_subrequest(subreq);
+		fscache_write(&rreq->cache_resources, subreq->start, &iter,
+			      netfs_rreq_copy_terminated, subreq);
+	}
+
+	/* If we decrement nr_wr_ops to 0, the usage ref belongs to us. */
+	if (atomic_dec_and_test(&rreq->nr_wr_ops))
+		netfs_rreq_unmark_after_write(rreq);
+}
+
+static void netfs_rreq_write_to_cache_work(struct work_struct *work)
+{
+	struct netfs_read_request *rreq =
+		container_of(work, struct netfs_read_request, work);
+
+	netfs_rreq_do_write_to_cache(rreq);
+}
+
+static void netfs_rreq_write_to_cache(struct netfs_read_request *rreq)
+{
+	if (in_softirq()) {
+		rreq->work.func = netfs_rreq_write_to_cache_work;
+		if (!queue_work(system_unbound_wq, &rreq->work))
+			BUG();
+	} else {
+		netfs_rreq_do_write_to_cache(rreq);
+	}
+}
+
 /*
  * Unlock the pages in a read operation.  We need to set PG_fscache on any
  * pages we're going to write back before we unlock them.
@@ -258,7 +400,10 @@ static void netfs_rreq_short_read(struct netfs_read_request *rreq,
 
 	netfs_get_read_subrequest(subreq);
 	atomic_inc(&rreq->nr_rd_ops);
-	netfs_read_from_server(rreq, subreq);
+	if (subreq->source == NETFS_READ_FROM_CACHE)
+		netfs_read_from_cache(rreq, subreq, true);
+	else
+		netfs_read_from_server(rreq, subreq);
 }
 
 /*
@@ -299,6 +444,25 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq)
 	return false;
 }
 
+/*
+ * Check to see if the data read is still valid.
+ */
+static void netfs_rreq_is_still_valid(struct netfs_read_request *rreq)
+{
+	struct netfs_read_subrequest *subreq;
+
+	if (!rreq->netfs_ops->is_still_valid ||
+	    rreq->netfs_ops->is_still_valid(rreq))
+		return;
+
+	list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
+		if (subreq->source == NETFS_READ_FROM_CACHE) {
+			subreq->error = -ESTALE;
+			__set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags);
+		}
+	}
+}
+
 /*
  * Assess the state of a read request and decide what to do next.
  *
@@ -308,6 +472,8 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq)
 static void netfs_rreq_assess(struct netfs_read_request *rreq)
 {
 again:
+	netfs_rreq_is_still_valid(rreq);
+
 	if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) &&
 	    test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags)) {
 		if (netfs_rreq_perform_resubmissions(rreq))
@@ -320,6 +486,9 @@ static void netfs_rreq_assess(struct netfs_read_request *rreq)
 	clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
 	wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS);
 
+	if (test_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags))
+		return netfs_rreq_write_to_cache(rreq);
+
 	netfs_rreq_completed(rreq);
 }
 
@@ -434,7 +603,11 @@ static enum netfs_read_source netfs_cache_prepare_read(struct netfs_read_subrequ
 						       loff_t i_size)
 {
 	struct netfs_read_request *rreq = subreq->rreq;
+	struct fscache_op_resources *opr = &rreq->cache_resources;
+	const struct fscache_op_ops *ops = fscache_operation_valid(opr);
 
+	if (ops)
+		return ops->prepare_read(subreq, i_size);
 	if (subreq->start >= rreq->i_size)
 		return NETFS_FILL_WITH_ZEROES;
 	return NETFS_DOWNLOAD_FROM_SERVER;
@@ -524,6 +697,9 @@ static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq,
 	case NETFS_DOWNLOAD_FROM_SERVER:
 		netfs_read_from_server(rreq, subreq);
 		break;
+	case NETFS_READ_FROM_CACHE:
+		netfs_read_from_cache(rreq, subreq, false);
+		break;
 	default:
 		BUG();
 	}
@@ -536,9 +712,24 @@ static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq,
 	return false;
 }
 
+static void netfs_cache_expand_readahead(struct netfs_read_request *rreq,
+					 loff_t *_start, size_t *_len, loff_t i_size)
+{
+	struct fscache_op_resources *opr = &rreq->cache_resources;
+	const struct fscache_op_ops *ops = fscache_operation_valid(opr);
+
+	if (ops)
+		ops->expand_readahead(opr, _start, _len, i_size);
+}
+
 static void netfs_rreq_expand(struct netfs_read_request *rreq,
 			      struct readahead_control *ractl)
 {
+	/* Give the cache a chance to change the request parameters.  The
+	 * resultant request must contain the original region.
+	 */
+	netfs_cache_expand_readahead(rreq, &rreq->start, &rreq->len, rreq->i_size);
+
 	/* Give the netfs a chance to change the request parameters.  The
 	 * resultant request must contain the original region.
 	 */
@@ -600,6 +791,8 @@ void netfs_readahead(struct readahead_control *ractl,
 	rreq->start	= readahead_pos(ractl);
 	rreq->len	= readahead_length(ractl);
 
+	if (ops->begin_cache_operation)
+		ops->begin_cache_operation(rreq);
 	netfs_rreq_expand(rreq, ractl);
 
 	atomic_set(&rreq->nr_rd_ops, 1);
@@ -670,6 +863,8 @@ int netfs_readpage(struct file *file,
 	rreq->start		= page->index * PAGE_SIZE;
 	rreq->len		= thp_size(page);
 
+	if (ops->begin_cache_operation)
+		ops->begin_cache_operation(rreq);
 	netfs_get_read_request(rreq);
 
 	atomic_set(&rreq->nr_rd_ops, 1);
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 38a252b06b54..3c53386e4f6e 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -52,6 +52,7 @@ struct fscache_cookie;
 struct fscache_netfs;
 struct fscache_op_resources;
 struct fscache_op_ops;
+struct netfs_read_subrequest;
 
 typedef void (*fscache_io_terminated_t)(void *priv, ssize_t transferred_or_error);
 
@@ -174,6 +175,16 @@ struct fscache_op_ops {
 		     struct iov_iter *iter,
 		     fscache_io_terminated_t term_func,
 		     void *term_func_priv);
+
+	/* Expand readahead request */
+	void (*expand_readahead)(struct fscache_op_resources *opr,
+				 loff_t *_start, size_t *_len, loff_t i_size);
+
+	/* Prepare a read operation, shortening it to a cached/uncached
+	 * boundary as appropriate.
+	 */
+	enum netfs_read_source (*prepare_read)(struct netfs_read_subrequest *subreq,
+					       loff_t i_size);
 };
 
 /*
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 722b8131a1ea..f2ffea11ea9b 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -12,6 +12,8 @@
 #ifndef _LINUX_NETFS_H
 #define _LINUX_NETFS_H
 
+#include <linux/fscache.h>
+
 enum netfs_read_source {
 	NETFS_FILL_WITH_ZEROES,
 	NETFS_DOWNLOAD_FROM_SERVER,
@@ -47,10 +49,12 @@ struct netfs_read_subrequest {
 struct netfs_read_request {
 	struct inode		*inode;		/* The file being accessed */
 	struct address_space	*mapping;	/* The mapping being accessed */
+	struct fscache_op_resources cache_resources;
 	struct list_head	subrequests;	/* Requests to fetch I/O from disk or net */
 	struct work_struct	work;
 	void			*netfs_priv;	/* Private data for the netfs */
 	atomic_t		nr_rd_ops;	/* Number of read ops in progress */
+	atomic_t		nr_wr_ops;	/* Number of write ops in progress */
 	size_t			submitted;	/* Amount submitted for I/O so far */
 	size_t			len;		/* Length of the request */
 	short			error;		/* 0 or error that occurred */
@@ -73,6 +77,7 @@ struct netfs_read_request {
  */
 struct netfs_read_request_ops {
 	void (*init_rreq)(struct netfs_read_request *rreq, struct file *file);
+	int (*begin_cache_operation)(struct netfs_read_request *rreq);
 	void (*expand_readahead)(struct netfs_read_request *rreq);
 	bool (*clamp_length)(struct netfs_read_subrequest *subreq);
 	void (*issue_op)(struct netfs_read_subrequest *subreq);




* [RFC PATCH 35/76] fscache: read-helper: Add tracepoints
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (32 preceding siblings ...)
  2020-11-20 15:09 ` [RFC PATCH 34/76] netfs: Use the cache David Howells
@ 2020-11-20 15:10 ` David Howells
  2020-11-20 15:10 ` [RFC PATCH 36/76] cachefiles: Remove some redundant checks on unsigned values David Howells
                   ` (40 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:10 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Add three tracepoints to track the activity of the read helpers:

 (1) netfs/netfs_read

     This logs entry to the read helpers and also expansion of the range in
     a readahead request.

 (2) netfs/netfs_rreq

     This logs the progress of netfs_read_request objects, which track
     read requests.  A read request may be a compound of multiple
     subrequests.

 (3) netfs/netfs_sreq

     This logs the progress of netfs_read_subrequest objects, which track
     the contributions from various sources to a read request.
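
With all three enabled, a subrequest termination, for example, renders
along these lines (the values are illustrative):

	netfs_sreq: R=0000002f[1] TERM  DOWN f=02 s=0 1000/1000 e=0

showing the request debug ID, the subrequest index, the tracepoint, the
source of the data, the flags, the start position, the progress against
the length and the error code.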

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/netfs/read_helper.c       |   37 ++++++++
 include/linux/netfs.h        |    2 
 include/trace/events/netfs.h |  200 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 239 insertions(+)
 create mode 100644 include/trace/events/netfs.h

diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index f5185402b504..21d3578c6979 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -16,6 +16,8 @@
 #include <linux/netfs.h>
 #include <linux/fscache.h>
 #include "internal.h"
+#define CREATE_TRACE_POINTS
+#include <trace/events/netfs.h>
 
 MODULE_DESCRIPTION("Network fs support");
 MODULE_AUTHOR("Red Hat, Inc.");
@@ -38,6 +40,7 @@ static struct netfs_read_request *netfs_alloc_read_request(
 	const struct netfs_read_request_ops *ops, void *netfs_priv,
 	struct file *file)
 {
+	static atomic_t debug_ids;
 	struct netfs_read_request *rreq;
 
 	rreq = kzalloc(sizeof(struct netfs_read_request), GFP_KERNEL);
@@ -46,6 +49,7 @@ static struct netfs_read_request *netfs_alloc_read_request(
 		rreq->netfs_priv = netfs_priv;
 		rreq->inode	= file_inode(file);
 		rreq->i_size	= i_size_read(rreq->inode);
+		rreq->debug_id	= atomic_inc_return(&debug_ids);
 		INIT_LIST_HEAD(&rreq->subrequests);
 		INIT_WORK(&rreq->work, netfs_rreq_work);
 		refcount_set(&rreq->usage, 1);
@@ -80,6 +84,7 @@ static void netfs_put_read_request(struct netfs_read_request *rreq)
 		if (rreq->netfs_priv)
 			rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv);
 		fscache_end_operation(&rreq->cache_resources);
+		trace_netfs_rreq(rreq, netfs_rreq_trace_free);
 		kfree(rreq);
 	}
 }
@@ -109,6 +114,7 @@ static void netfs_get_read_subrequest(struct netfs_read_subrequest *subreq)
 
 static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq)
 {
+	trace_netfs_sreq(subreq, netfs_sreq_trace_free);
 	netfs_put_read_request(subreq->rreq);
 	kfree(subreq);
 }
@@ -188,6 +194,7 @@ static void netfs_read_from_server(struct netfs_read_request *rreq,
  */
 static void netfs_rreq_completed(struct netfs_read_request *rreq)
 {
+	trace_netfs_rreq(rreq, netfs_rreq_trace_done);
 	netfs_rreq_clear_subreqs(rreq);
 	netfs_put_read_request(rreq);
 }
@@ -205,6 +212,8 @@ static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq)
 	pgoff_t unlocked = 0;
 	bool have_unlocked = false;
 
+	trace_netfs_rreq(rreq, netfs_rreq_trace_unmark);
+
 	rcu_read_lock();
 
 	list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
@@ -236,6 +245,8 @@ static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error)
 	else
 		subreq->error = 0;
 
+	trace_netfs_sreq(subreq, netfs_sreq_trace_write_term);
+
 	/* If we decrement nr_wr_ops to 0, the ref belongs to us. */
 	if (atomic_dec_and_test(&rreq->nr_wr_ops))
 		netfs_rreq_unmark_after_write(rreq);
@@ -252,6 +263,8 @@ static void netfs_rreq_do_write_to_cache(struct netfs_read_request *rreq)
 	struct netfs_read_subrequest *subreq, *next, *p;
 	struct iov_iter iter;
 
+	trace_netfs_rreq(rreq, netfs_rreq_trace_write);
+
 	/* We don't want terminating writes trying to wake us up whilst we're
 	 * still going through the list.
 	 */
@@ -275,6 +288,8 @@ static void netfs_rreq_do_write_to_cache(struct netfs_read_request *rreq)
 			netfs_put_subrequest(next);
 		}
 
+		trace_netfs_sreq(subreq, netfs_sreq_trace_write);
+
 		iov_iter_xarray(&iter, WRITE, &rreq->mapping->i_pages,
 				subreq->start, subreq->len);
 
@@ -342,6 +357,8 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq)
 	iopos = 0;
 	subreq_failed = (subreq->error < 0);
 
+	trace_netfs_rreq(rreq, netfs_rreq_trace_unlock);
+
 	rcu_read_lock();
 	xas_for_each(&xas, page, last_page) {
 		unsigned int pgpos = (page->index - start_page) * PAGE_SIZE;
@@ -398,6 +415,8 @@ static void netfs_rreq_short_read(struct netfs_read_request *rreq,
 	__clear_bit(NETFS_SREQ_SHORT_READ, &subreq->flags);
 	__set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags);
 
+	trace_netfs_sreq(subreq, netfs_sreq_trace_resubmit_short);
+
 	netfs_get_read_subrequest(subreq);
 	atomic_inc(&rreq->nr_rd_ops);
 	if (subreq->source == NETFS_READ_FROM_CACHE)
@@ -416,6 +435,8 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq)
 
 	WARN_ON(in_softirq());
 
+	trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit);
+
 	/* We don't want terminating submissions trying to wake us up whilst
 	 * we're still going through the list.
 	 */
@@ -428,6 +449,7 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq)
 				break;
 			subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
 			subreq->error = 0;
+			trace_netfs_sreq(subreq, netfs_sreq_trace_download_instead);
 			netfs_get_read_subrequest(subreq);
 			atomic_inc(&rreq->nr_rd_ops);
 			netfs_read_from_server(rreq, subreq);
@@ -471,6 +493,8 @@ static void netfs_rreq_is_still_valid(struct netfs_read_request *rreq)
  */
 static void netfs_rreq_assess(struct netfs_read_request *rreq)
 {
+	trace_netfs_rreq(rreq, netfs_rreq_trace_assess);
+
 again:
 	netfs_rreq_is_still_valid(rreq);
 
@@ -558,6 +582,8 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq,
 		set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags);
 
 out:
+	trace_netfs_sreq(subreq, netfs_sreq_trace_terminated);
+
 	/* If we decrement nr_rd_ops to 0, the ref belongs to us. */
 	u = atomic_dec_return(&rreq->nr_rd_ops);
 	if (u == 0)
@@ -650,6 +676,7 @@ netfs_rreq_prepare_read(struct netfs_read_request *rreq,
 
 out:
 	subreq->source = source;
+	trace_netfs_sreq(subreq, netfs_sreq_trace_prepare);
 	return source;
 }
 
@@ -690,6 +717,7 @@ static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq,
 
 	rreq->submitted += subreq->len;
 
+	trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
 	switch (source) {
 	case NETFS_FILL_WITH_ZEROES:
 		netfs_fill_with_zeroes(rreq, subreq);
@@ -750,6 +778,9 @@ static void netfs_rreq_expand(struct netfs_read_request *rreq,
 		readahead_expand(ractl, rreq->start, rreq->len);
 		rreq->start  = readahead_pos(ractl);
 		rreq->len = readahead_length(ractl);
+
+		trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl),
+				 netfs_read_trace_expanded);
 	}
 }
 
@@ -791,6 +822,9 @@ void netfs_readahead(struct readahead_control *ractl,
 	rreq->start	= readahead_pos(ractl);
 	rreq->len	= readahead_length(ractl);
 
+	trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl),
+			 netfs_read_trace_readahead);
+
 	if (ops->begin_cache_operation)
 		ops->begin_cache_operation(rreq);
 	netfs_rreq_expand(rreq, ractl);
@@ -863,6 +897,9 @@ int netfs_readpage(struct file *file,
 	rreq->start		= page->index * PAGE_SIZE;
 	rreq->len		= thp_size(page);
 
+	trace_netfs_read(rreq, page->index * PAGE_SIZE, thp_size(page),
+			 netfs_read_trace_readpage);
+
 	if (ops->begin_cache_operation)
 		ops->begin_cache_operation(rreq);
 	netfs_get_read_request(rreq);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index f2ffea11ea9b..c4276eb9914d 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -53,6 +53,8 @@ struct netfs_read_request {
 	struct list_head	subrequests;	/* Requests to fetch I/O from disk or net */
 	struct work_struct	work;
 	void			*netfs_priv;	/* Private data for the netfs */
+	unsigned int		debug_id;
+	unsigned int		cookie_debug_id;
 	atomic_t		nr_rd_ops;	/* Number of read ops in progress */
 	atomic_t		nr_wr_ops;	/* Number of write ops in progress */
 	size_t			submitted;	/* Amount submitted for I/O so far */
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
new file mode 100644
index 000000000000..56a734a3fb0d
--- /dev/null
+++ b/include/trace/events/netfs.h
@@ -0,0 +1,200 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Network filesystem support module tracepoints
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM netfs
+
+#if !defined(_TRACE_NETFS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_NETFS_H
+
+#include <linux/fscache.h>
+#include <linux/tracepoint.h>
+
+/*
+ * Define enums for tracing information.
+ */
+#ifndef __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY
+#define __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY
+
+enum netfs_read_trace {
+	netfs_read_trace_expanded,
+	netfs_read_trace_readahead,
+	netfs_read_trace_readpage,
+};
+
+enum netfs_rreq_trace {
+	netfs_rreq_trace_assess,
+	netfs_rreq_trace_done,
+	netfs_rreq_trace_free,
+	netfs_rreq_trace_resubmit,
+	netfs_rreq_trace_unlock,
+	netfs_rreq_trace_unmark,
+	netfs_rreq_trace_write,
+};
+
+enum netfs_sreq_trace {
+	netfs_sreq_trace_download_instead,
+	netfs_sreq_trace_free,
+	netfs_sreq_trace_prepare,
+	netfs_sreq_trace_resubmit_short,
+	netfs_sreq_trace_submit,
+	netfs_sreq_trace_terminated,
+	netfs_sreq_trace_write,
+	netfs_sreq_trace_write_term,
+};
+
+#endif
+
+#define netfs_read_traces					\
+	EM(netfs_read_trace_expanded,		"EXPANDED ")	\
+	EM(netfs_read_trace_readahead,		"READAHEAD")	\
+	E_(netfs_read_trace_readpage,		"READPAGE ")
+
+#define netfs_rreq_traces					\
+	EM(netfs_rreq_trace_assess,		"ASSESS")	\
+	EM(netfs_rreq_trace_done,		"DONE  ")	\
+	EM(netfs_rreq_trace_free,		"FREE  ")	\
+	EM(netfs_rreq_trace_resubmit,		"RESUBM")	\
+	EM(netfs_rreq_trace_unlock,		"UNLOCK")	\
+	EM(netfs_rreq_trace_unmark,		"UNMARK")	\
+	E_(netfs_rreq_trace_write,		"WRITE ")
+
+#define netfs_sreq_sources					\
+	EM(NETFS_FILL_WITH_ZEROES,		"ZERO")		\
+	EM(NETFS_DOWNLOAD_FROM_SERVER,		"DOWN")		\
+	EM(NETFS_READ_FROM_CACHE,		"READ")		\
+	E_(NETFS_INVALID_READ,			"INVL")
+
+#define netfs_sreq_traces					\
+	EM(netfs_sreq_trace_download_instead,	"RDOWN")	\
+	EM(netfs_sreq_trace_free,		"FREE ")	\
+	EM(netfs_sreq_trace_prepare,		"PREP ")	\
+	EM(netfs_sreq_trace_resubmit_short,	"SHORT")	\
+	EM(netfs_sreq_trace_submit,		"SUBMT")	\
+	EM(netfs_sreq_trace_terminated,		"TERM ")	\
+	EM(netfs_sreq_trace_write,		"WRITE")	\
+	E_(netfs_sreq_trace_write_term,		"WTERM")
+
+
+/*
+ * Export enum symbols via userspace.
+ */
+#undef EM
+#undef E_
+#define EM(a, b) TRACE_DEFINE_ENUM(a);
+#define E_(a, b) TRACE_DEFINE_ENUM(a);
+
+netfs_read_traces;
+netfs_rreq_traces;
+netfs_sreq_sources;
+netfs_sreq_traces;
+
+/*
+ * Now redefine the EM() and E_() macros to map the enums to the strings that
+ * will be printed in the output.
+ */
+#undef EM
+#undef E_
+#define EM(a, b)	{ a, b },
+#define E_(a, b)	{ a, b }
+
+TRACE_EVENT(netfs_read,
+	    TP_PROTO(struct netfs_read_request *rreq,
+		     loff_t start, size_t len,
+		     enum netfs_read_trace what),
+
+	    TP_ARGS(rreq, start, len, what),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,		rreq		)
+		    __field(unsigned int,		cookie		)
+		    __field(loff_t,			start		)
+		    __field(size_t,			len		)
+		    __field(enum netfs_read_trace,	what		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->rreq	= rreq->debug_id;
+		    __entry->cookie	= rreq->cookie_debug_id;
+		    __entry->start	= start;
+		    __entry->len	= len;
+		    __entry->what	= what;
+			   ),
+
+	    TP_printk("R=%08x %s c=%08x s=%llx %zx",
+		      __entry->rreq,
+		      __print_symbolic(__entry->what, netfs_read_traces),
+		      __entry->cookie,
+		      __entry->start, __entry->len)
+	    );
+
+TRACE_EVENT(netfs_rreq,
+	    TP_PROTO(struct netfs_read_request *rreq,
+		     enum netfs_rreq_trace what),
+
+	    TP_ARGS(rreq, what),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,		rreq		)
+		    __field(unsigned short,		flags		)
+		    __field(enum netfs_rreq_trace,	what		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->rreq	= rreq->debug_id;
+		    __entry->flags	= rreq->flags;
+		    __entry->what	= what;
+			   ),
+
+	    TP_printk("R=%08x %s f=%02x",
+		      __entry->rreq,
+		      __print_symbolic(__entry->what, netfs_rreq_traces),
+		      __entry->flags)
+	    );
+
+TRACE_EVENT(netfs_sreq,
+	    TP_PROTO(struct netfs_read_subrequest *sreq,
+		     enum netfs_sreq_trace what),
+
+	    TP_ARGS(sreq, what),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,		rreq		)
+		    __field(unsigned short,		index		)
+		    __field(short,			error		)
+		    __field(unsigned short,		flags		)
+		    __field(enum netfs_read_source,	source		)
+		    __field(enum netfs_sreq_trace,	what		)
+		    __field(size_t,			len		)
+		    __field(size_t,			transferred	)
+		    __field(loff_t,			start		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->rreq	= sreq->rreq->debug_id;
+		    __entry->index	= sreq->debug_index;
+		    __entry->error	= sreq->error;
+		    __entry->flags	= sreq->flags;
+		    __entry->source	= sreq->source;
+		    __entry->what	= what;
+		    __entry->len	= sreq->len;
+		    __entry->transferred = sreq->transferred;
+		    __entry->start	= sreq->start;
+			   ),
+
+	    TP_printk("R=%08x[%u] %s %s f=%02x s=%llx %zx/%zx e=%d",
+		      __entry->rreq, __entry->index,
+		      __print_symbolic(__entry->what, netfs_sreq_traces),
+		      __print_symbolic(__entry->source, netfs_sreq_sources),
+		      __entry->flags,
+		      __entry->start, __entry->transferred, __entry->len,
+		      __entry->error)
+	    );
+
+#endif /* _TRACE_NETFS_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>




* [RFC PATCH 36/76] cachefiles: Remove some redundant checks on unsigned values
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (33 preceding siblings ...)
  2020-11-20 15:10 ` [RFC PATCH 35/76] fscache: read-helper: Add tracepoints David Howells
@ 2020-11-20 15:10 ` David Howells
  2020-11-20 15:10 ` [RFC PATCH 37/76] cachefiles: trace: Log coherency checks David Howells
                   ` (39 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:10 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Remove some redundant checks for unsigned values being >= 0.
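
Since the values in question are unsigned, a >= 0 comparison is always
true and serves only to trip compiler warnings (gcc points these out with
-Wtype-limits); only the upper-bound checks have any effect.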

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/bind.c   |    6 ++----
 fs/cachefiles/daemon.c |    6 +++---
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/cachefiles/bind.c b/fs/cachefiles/bind.c
index a39e65d5103b..c89a422fd1d1 100644
--- a/fs/cachefiles/bind.c
+++ b/fs/cachefiles/bind.c
@@ -36,13 +36,11 @@ int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
 	       args);
 
 	/* start by checking things over */
-	ASSERT(cache->fstop_percent >= 0 &&
-	       cache->fstop_percent < cache->fcull_percent &&
+	ASSERT(cache->fstop_percent < cache->fcull_percent &&
 	       cache->fcull_percent < cache->frun_percent &&
 	       cache->frun_percent  < 100);
 
-	ASSERT(cache->bstop_percent >= 0 &&
-	       cache->bstop_percent < cache->bcull_percent &&
+	ASSERT(cache->bstop_percent < cache->bcull_percent &&
 	       cache->bcull_percent < cache->brun_percent &&
 	       cache->brun_percent  < 100);
 
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 8a937d6d5e22..e8ab3ab57147 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -221,7 +221,7 @@ static ssize_t cachefiles_daemon_write(struct file *file,
 	if (test_bit(CACHEFILES_DEAD, &cache->flags))
 		return -EIO;
 
-	if (datalen < 0 || datalen > PAGE_SIZE - 1)
+	if (datalen > PAGE_SIZE - 1)
 		return -EOPNOTSUPP;
 
 	/* drag the command string into the kernel so we can parse it */
@@ -378,7 +378,7 @@ static int cachefiles_daemon_fstop(struct cachefiles_cache *cache, char *args)
 	if (args[0] != '%' || args[1] != '\0')
 		return -EINVAL;
 
-	if (fstop < 0 || fstop >= cache->fcull_percent)
+	if (fstop >= cache->fcull_percent)
 		return cachefiles_daemon_range_error(cache, args);
 
 	cache->fstop_percent = fstop;
@@ -450,7 +450,7 @@ static int cachefiles_daemon_bstop(struct cachefiles_cache *cache, char *args)
 	if (args[0] != '%' || args[1] != '\0')
 		return -EINVAL;
 
-	if (bstop < 0 || bstop >= cache->bcull_percent)
+	if (bstop >= cache->bcull_percent)
 		return cachefiles_daemon_range_error(cache, args);
 
 	cache->bstop_percent = bstop;




* [RFC PATCH 37/76] cachefiles: trace: Log coherency checks
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (34 preceding siblings ...)
  2020-11-20 15:10 ` [RFC PATCH 36/76] cachefiles: Remove some redundant checks on unsigned values David Howells
@ 2020-11-20 15:10 ` David Howells
  2020-11-20 15:10 ` [RFC PATCH 38/76] cachefiles: Split cachefiles_drop_object() up a bit David Howells
                   ` (38 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:10 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Add a cachefiles tracepoint that logs the result of coherency management
when the coherency data on a file in the cache is checked or committed.
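
A passing check, for example, renders along these lines (the values are
illustrative):

	cachefiles_coherency: o=0000003a OK       i=2f c=0

giving the object debug ID, the result, the backing inode number and the
content state.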

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/xattr.c             |   45 ++++++++++++++++++++++--------
 include/trace/events/cachefiles.h |   56 +++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+), 12 deletions(-)

diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index 75a2a6b70e4a..19db19b7d7db 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -125,12 +125,21 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 	ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
 			   buf, sizeof(struct cachefiles_xattr) + len,
 			   xattr_flags);
-	kfree(buf);
-	if (ret < 0 && ret != -ENOMEM)
-		cachefiles_io_error_obj(
-			object,
-			"Failed to set xattr with error %d", ret);
+	if (ret < 0) {
+		trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
+					   0,
+					   cachefiles_coherency_set_fail);
+		if (ret != -ENOMEM)
+			cachefiles_io_error_obj(
+				object,
+				"Failed to set xattr with error %d", ret);
+	} else {
+		trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
+					   0,
+					   cachefiles_coherency_set_ok);
+	}
 
+	kfree(buf);
 	_leave(" = %d", ret);
 	return ret;
 }
@@ -144,7 +153,9 @@ int cachefiles_check_auxdata(struct cachefiles_object *object)
 	struct dentry *dentry = object->dentry;
 	unsigned int len = object->fscache.cookie->aux_len, tlen;
 	const void *p = fscache_get_aux(object->fscache.cookie);
-	ssize_t ret;
+	enum cachefiles_coherency_trace why;
+	ssize_t xlen;
+	int ret = -ESTALE;
 
 	ASSERT(dentry);
 	ASSERT(d_backing_inode(dentry));
@@ -154,14 +165,24 @@ int cachefiles_check_auxdata(struct cachefiles_object *object)
 	if (!buf)
 		return -ENOMEM;
 
-	ret = vfs_getxattr(dentry, cachefiles_xattr_cache, buf, tlen);
-	if (ret == tlen &&
-	    buf->type == object->fscache.cookie->type &&
-	    memcmp(buf->data, p, len) == 0)
+	xlen = vfs_getxattr(dentry, cachefiles_xattr_cache, buf, tlen);
+	if (xlen != tlen) {
+		if (xlen == -EIO)
+			cachefiles_io_error_obj(
+				object,
+				"Failed to read aux with error %zd", xlen);
+		why = cachefiles_coherency_check_xattr;
+	} else if (buf->type != object->fscache.cookie->type) {
+		why = cachefiles_coherency_check_type;
+	} else if (memcmp(buf->data, p, len) != 0) {
+		why = cachefiles_coherency_check_aux;
+	} else {
+		why = cachefiles_coherency_check_ok;
 		ret = 0;
-	else
-		ret = -ESTALE;
+	}
 
+	trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
+				   0, why);
 	kfree(buf);
 	return ret;
 }
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 0aa3f3126f6e..bf588c3f4a07 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -24,6 +24,19 @@ enum cachefiles_obj_ref_trace {
 	cachefiles_obj_ref__nr_traces
 };
 
+enum cachefiles_coherency_trace {
+	cachefiles_coherency_check_aux,
+	cachefiles_coherency_check_content,
+	cachefiles_coherency_check_dirty,
+	cachefiles_coherency_check_len,
+	cachefiles_coherency_check_objsize,
+	cachefiles_coherency_check_ok,
+	cachefiles_coherency_check_type,
+	cachefiles_coherency_check_xattr,
+	cachefiles_coherency_set_fail,
+	cachefiles_coherency_set_ok,
+};
+
 #endif
 
 /*
@@ -56,6 +69,18 @@ enum cachefiles_obj_ref_trace {
 	EM(cachefiles_obj_put_wait_retry,	"PUT wait_retry")	\
 	E_(cachefiles_obj_put_wait_timeo,	"PUT wait_timeo")
 
+#define cachefiles_coherency_traces					\
+	EM(cachefiles_coherency_check_aux,	"BAD aux ")		\
+	EM(cachefiles_coherency_check_content,	"BAD cont")		\
+	EM(cachefiles_coherency_check_dirty,	"BAD dirt")		\
+	EM(cachefiles_coherency_check_len,	"BAD len ")		\
+	EM(cachefiles_coherency_check_objsize,	"BAD osiz")		\
+	EM(cachefiles_coherency_check_ok,	"OK      ")		\
+	EM(cachefiles_coherency_check_type,	"BAD type")		\
+	EM(cachefiles_coherency_check_xattr,	"BAD xatt")		\
+	EM(cachefiles_coherency_set_fail,	"SET fail")		\
+	E_(cachefiles_coherency_set_ok,		"SET ok  ")
+
 /*
  * Export enum symbols via userspace.
  */
@@ -66,6 +91,7 @@ enum cachefiles_obj_ref_trace {
 
 cachefiles_obj_kill_traces;
 cachefiles_obj_ref_traces;
+cachefiles_coherency_traces;
 
 /*
  * Now redefine the EM() and E_() macros to map the enums to the strings that
@@ -295,6 +321,36 @@ TRACE_EVENT(cachefiles_mark_buried,
 		      __print_symbolic(__entry->why, cachefiles_obj_kill_traces))
 	    );
 
+TRACE_EVENT(cachefiles_coherency,
+	    TP_PROTO(struct cachefiles_object *obj,
+		     ino_t ino,
+		     int content,
+		     enum cachefiles_coherency_trace why),
+
+	    TP_ARGS(obj, ino, content, why),
+
+	    /* Note that obj may be NULL */
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(enum cachefiles_coherency_trace,	why	)
+		    __field(int,				content	)
+		    __field(u64,				ino	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj ? obj->fscache.debug_id : 0;
+		    __entry->why	= why;
+		    __entry->content	= content;
+		    __entry->ino	= ino;
+			   ),
+
+	    TP_printk("o=%08x %s i=%llx c=%u",
+		      __entry->obj,
+		      __print_symbolic(__entry->why, cachefiles_coherency_traces),
+		      __entry->ino,
+		      __entry->content)
+	    );
+
 #endif /* _TRACE_CACHEFILES_H */
 
 /* This part must be outside protection */




* [RFC PATCH 38/76] cachefiles: Split cachefiles_drop_object() up a bit
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (35 preceding siblings ...)
  2020-11-20 15:10 ` [RFC PATCH 37/76] cachefiles: trace: Log coherency checks David Howells
@ 2020-11-20 15:10 ` David Howells
  2020-11-20 15:11 ` [RFC PATCH 39/76] cachefiles: Implement new fscache I/O backend API David Howells
                   ` (37 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:10 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Split cachefiles_drop_object() up a bit to make it easier to modify later.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c |   58 ++++++++++++++++++++++++++++++---------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index e40118e65e25..3f6d2be690bc 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -192,6 +192,42 @@ static void cachefiles_update_object(struct fscache_object *_object)
 	_leave("");
 }
 
+/*
+ * Commit changes to the object as we drop it.
+ */
+static void cachefiles_commit_object(struct cachefiles_object *object,
+				     struct cachefiles_cache *cache)
+{
+}
+
+/*
+ * Finalise an object and close the VFS structs that we have.
+ */
+static void cachefiles_clean_up_object(struct cachefiles_object *object,
+				       struct cachefiles_cache *cache,
+				       bool invalidate)
+{
+	if (invalidate && &object->fscache != cache->cache.fsdef) {
+		_debug("- inval object OBJ%x", object->fscache.debug_id);
+		cachefiles_delete_object(cache, object);
+	} else {
+		cachefiles_commit_object(object, cache);
+	}
+
+	/* close the filesystem stuff attached to the object */
+	if (object->backing_file)
+		fput(object->backing_file);
+	object->backing_file = NULL;
+
+	if (object->backer != object->dentry)
+		dput(object->backer);
+	object->backer = NULL;
+
+	cachefiles_unmark_inode_in_use(object, object->dentry);
+	dput(object->dentry);
+	object->dentry = NULL;
+}
+
 /*
  * discard the resources pinned by an object and effect retirement if
  * requested
@@ -223,25 +259,9 @@ static void cachefiles_drop_object(struct fscache_object *_object,
 	 * before we set it up.
 	 */
 	if (object->dentry) {
-		if (invalidate && _object != cache->cache.fsdef) {
-			_debug("- inval object OBJ%x", object->fscache.debug_id);
-			cachefiles_begin_secure(cache, &saved_cred);
-			cachefiles_delete_object(cache, object);
-			cachefiles_end_secure(cache, saved_cred);
-		}
-
-		/* close the filesystem stuff attached to the object */
-		if (object->backing_file)
-			fput(object->backing_file);
-		object->backing_file = NULL;
-
-		if (object->backer != object->dentry)
-			dput(object->backer);
-		object->backer = NULL;
-
-		cachefiles_unmark_inode_in_use(object, object->dentry);
-		dput(object->dentry);
-		object->dentry = NULL;
+		cachefiles_begin_secure(cache, &saved_cred);
+		cachefiles_clean_up_object(object, cache, invalidate);
+		cachefiles_end_secure(cache, saved_cred);
 	}
 
 	_leave("");




* [RFC PATCH 39/76] cachefiles: Implement new fscache I/O backend API
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (36 preceding siblings ...)
  2020-11-20 15:10 ` [RFC PATCH 38/76] cachefiles: Split cachefiles_drop_object() up a bit David Howells
@ 2020-11-20 15:11 ` David Howells
  2020-11-20 15:11 ` [RFC PATCH 40/76] cachefiles: Merge object->backer into object->dentry David Howells
                   ` (36 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:11 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Implement the new fscache I/O backend API in cachefiles.  The
cachefiles_object struct now carries a non-accounted file pointer to the
backing file (opened with open_with_fake_path() so that it doesn't count
towards ENFILE).
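
As a minimal sketch of how the installed ops table is reached once
->begin_operation() has run (a hypothetical caller for illustration only;
ex_read_through_cache() is a made-up name and error handling is elided):

    static int ex_read_through_cache(struct fscache_op_resources *opr,
                                     loff_t pos, struct iov_iter *iter)
    {
            /* opr->ops was set by cachefiles_begin_operation() */
            return opr->ops->read(opr, pos, iter, false /* seek_data */,
                                  NULL /* term_func */, NULL);
    }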

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/Makefile    |    1 +
 fs/cachefiles/interface.c |   13 +++++++
 fs/cachefiles/internal.h  |   19 ++++++++++
 fs/cachefiles/io.c        |   85 +++++++++++++++++++++++++++++++++++++++++++++
 fs/cachefiles/namei.c     |    3 ++
 5 files changed, 120 insertions(+), 1 deletion(-)
 create mode 100644 fs/cachefiles/io.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index 3455d3646547..d894d317d6e7 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -7,6 +7,7 @@ cachefiles-y := \
 	bind.o \
 	daemon.o \
 	interface.o \
+	io.o \
 	key.o \
 	main.o \
 	namei.o \
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 3f6d2be690bc..dc8c875223bb 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -453,6 +453,18 @@ static unsigned int cachefiles_get_object_usage(const struct fscache_object *_ob
 	return atomic_read(&object->usage);
 }
 
+static const struct fscache_op_ops cachefiles_io_ops = {
+	.wait_for_operation	= __fscache_wait_for_operation,
+	.end_operation		= __fscache_end_operation,
+	.read			= cachefiles_read,
+	.write			= cachefiles_write,
+};
+
+static void cachefiles_begin_operation(struct fscache_op_resources *opr)
+{
+	opr->ops = &cachefiles_io_ops;
+}
+
 const struct fscache_cache_ops cachefiles_cache_ops = {
 	.name			= "cachefiles",
 	.alloc_object		= cachefiles_alloc_object,
@@ -466,4 +478,5 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
 	.put_object		= cachefiles_put_object,
 	.get_object_usage	= cachefiles_get_object_usage,
 	.sync_cache		= cachefiles_sync_cache,
+	.begin_operation	= cachefiles_begin_operation,
 };
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 16d15291a629..caa6dfbaf333 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -115,6 +115,22 @@ extern const struct fscache_cache_ops cachefiles_cache_ops;
 extern struct fscache_object *cachefiles_grab_object(struct fscache_object *_object,
 						     enum fscache_obj_ref_trace why);
 
+/*
+ * io.c
+ */
+extern int cachefiles_read(struct fscache_op_resources *opr,
+			   loff_t start_pos,
+			   struct iov_iter *iter,
+			   bool seek_data,
+			   fscache_io_terminated_t term_func,
+			   void *term_func_priv);
+extern int cachefiles_write(struct fscache_op_resources *opr,
+			    loff_t start_pos,
+			    struct iov_iter *iter,
+			    fscache_io_terminated_t term_func,
+			    void *term_func_priv);
+extern bool cachefiles_open_object(struct cachefiles_object *obj);
+
 /*
  * key.c
  */
@@ -216,7 +232,8 @@ do {									\
 									\
 	___cache = container_of((object)->fscache.cache,		\
 				struct cachefiles_cache, cache);	\
-	cachefiles_io_error(___cache, FMT, ##__VA_ARGS__);		\
+	cachefiles_io_error(___cache, FMT " [o=%08x]", ##__VA_ARGS__,	\
+			    object->fscache.debug_id);			\
 } while (0)
 
 
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
new file mode 100644
index 000000000000..f1d5976aa28c
--- /dev/null
+++ b/fs/cachefiles/io.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Data I/O routines
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/mount.h>
+#include <linux/slab.h>
+#include <linux/file.h>
+#include <linux/uio.h>
+#include <linux/xattr.h>
+#include "internal.h"
+#include <trace/events/fscache.h>
+
+/*
+ * Initiate a read from the cache.
+ */
+int cachefiles_read(struct fscache_op_resources *opr,
+		    loff_t start_pos,
+		    struct iov_iter *iter,
+		    bool seek_data,
+		    fscache_io_terminated_t term_func,
+		    void *term_func_priv)
+{
+	fscache_wait_for_operation(opr, FSCACHE_WANT_READ);
+	fscache_count_io_operation(opr->object->cookie);
+	if (term_func)
+		term_func(term_func_priv, -ENODATA);
+	return -ENODATA;
+}
+
+/*
+ * Initiate a write to the cache.
+ */
+int cachefiles_write(struct fscache_op_resources *opr,
+		     loff_t start_pos,
+		     struct iov_iter *iter,
+		     fscache_io_terminated_t term_func,
+		     void *term_func_priv)
+{
+	fscache_wait_for_operation(opr, FSCACHE_WANT_WRITE);
+	fscache_count_io_operation(opr->object->cookie);
+	if (term_func)
+		term_func(term_func_priv, -ENOBUFS);
+	return -ENOBUFS;
+}
+
+/*
+ * Open a cache object.
+ */
+bool cachefiles_open_object(struct cachefiles_object *object)
+{
+	struct cachefiles_cache *cache =
+		container_of(object->fscache.cache, struct cachefiles_cache, cache);
+	struct file *file;
+	struct path path;
+
+	path.mnt = cache->mnt;
+	path.dentry = object->backer;
+
+	file = open_with_fake_path(&path,
+				   O_RDWR | O_LARGEFILE | O_DIRECT,
+				   d_backing_inode(object->backer),
+				   cache->cache_cred);
+	if (IS_ERR(file))
+		goto error;
+
+	if (!S_ISREG(file_inode(file)->i_mode))
+		goto error_file;
+
+	if (unlikely(!file->f_op->read_iter) ||
+	    unlikely(!file->f_op->write_iter)) {
+		pr_notice("Cache does not support read_iter and write_iter\n");
+		goto error_file;
+	}
+
+	object->backing_file = file;
+	return true;
+
+error_file:
+	fput(file);
+error:
+	return false;
+}
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 0a7e2031efa2..4b515054d92e 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -490,6 +490,9 @@ bool cachefiles_walk_to_object(struct cachefiles_object *parent,
 		} else {
 			BUG(); // TODO: open file in data-class subdir
 		}
+
+		if (!cachefiles_open_object(object))
+			goto check_error;
 	}
 
 	if (object->new)




* [RFC PATCH 40/76] cachefiles: Merge object->backer into object->dentry
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (37 preceding siblings ...)
  2020-11-20 15:11 ` [RFC PATCH 39/76] cachefiles: Implement new fscache I/O backend API David Howells
@ 2020-11-20 15:11 ` David Howells
  2020-11-20 15:11 ` [RFC PATCH 41/76] cachefiles: Implement a content-present indicator and bitmap David Howells
                   ` (35 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:11 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Merge the object->backer pointer into the object->dentry pointer and assume
that data objects are always going to be just regular files.

object->dentry can then more easily be overridden later by invalidation
without having two different things to update the xattrs on.

object->old maintains a pointer to the old file so that we can unlink it
later.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c |   35 +++++++++++++++++------------------
 fs/cachefiles/internal.h  |    2 +-
 fs/cachefiles/io.c        |    4 ++--
 fs/cachefiles/namei.c     |    4 +++-
 4 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index dc8c875223bb..7e8940a7fb55 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -171,16 +171,16 @@ static void cachefiles_update_object(struct fscache_object *_object)
 	cachefiles_begin_secure(cache, &saved_cred);
 
 	object_size = object->fscache.cookie->object_size;
-	if (i_size_read(d_inode(object->backer)) > object_size) {
+	if (i_size_read(d_inode(object->dentry)) > object_size) {
 		struct path path = {
 			.mnt	= cache->mnt,
-			.dentry	= object->backer
+			.dentry	= object->dentry
 		};
-		_debug("trunc %llx -> %llx", i_size_read(d_inode(object->backer)), object_size);
+		_debug("trunc %llx -> %llx", i_size_read(d_inode(object->dentry)), object_size);
 		ret = vfs_truncate(&path, object_size);
 		if (ret < 0) {
 			cachefiles_io_error_obj(object, "Trunc-to-size failed");
-			cachefiles_remove_object_xattr(cache, object->backer);
+			cachefiles_remove_object_xattr(cache, object->dentry);
 			goto out;
 		}
 	}
@@ -219,9 +219,8 @@ static void cachefiles_clean_up_object(struct cachefiles_object *object,
 		fput(object->backing_file);
 	object->backing_file = NULL;
 
-	if (object->backer != object->dentry)
-		dput(object->backer);
-	object->backer = NULL;
+	dput(object->old);
+	object->old = NULL;
 
 	cachefiles_unmark_inode_in_use(object, object->dentry);
 	dput(object->dentry);
@@ -295,7 +294,7 @@ static void cachefiles_put_object(struct fscache_object *_object,
 	if (u == 0) {
 		_debug("- kill object OBJ%x", object->fscache.debug_id);
 
-		ASSERTCMP(object->backer, ==, NULL);
+		ASSERTCMP(object->old, ==, NULL);
 		ASSERTCMP(object->dentry, ==, NULL);
 		ASSERTCMP(object->fscache.n_children, ==, 0);
 
@@ -360,17 +359,17 @@ static int cachefiles_attr_changed(struct cachefiles_object *object)
 	if (ni_size == object->i_size)
 		return 0;
 
-	if (!object->backer)
+	if (!object->dentry)
 		return -ENOBUFS;
 
-	ASSERT(d_is_reg(object->backer));
+	ASSERT(d_is_reg(object->dentry));
 
-	oi_size = i_size_read(d_backing_inode(object->backer));
+	oi_size = i_size_read(d_backing_inode(object->dentry));
 	if (oi_size == ni_size)
 		return 0;
 
 	cachefiles_begin_secure(cache, &saved_cred);
-	inode_lock(d_inode(object->backer));
+	inode_lock(d_inode(object->dentry));
 
 	/* if there's an extension to a partial page at the end of the backing
 	 * file, we need to discard the partial page so that we pick up new
@@ -379,17 +378,17 @@ static int cachefiles_attr_changed(struct cachefiles_object *object)
 		_debug("discard tail %llx", oi_size);
 		newattrs.ia_valid = ATTR_SIZE;
 		newattrs.ia_size = oi_size & PAGE_MASK;
-		ret = notify_change(object->backer, &newattrs, NULL);
+		ret = notify_change(object->dentry, &newattrs, NULL);
 		if (ret < 0)
 			goto truncate_failed;
 	}
 
 	newattrs.ia_valid = ATTR_SIZE;
 	newattrs.ia_size = ni_size;
-	ret = notify_change(object->backer, &newattrs, NULL);
+	ret = notify_change(object->dentry, &newattrs, NULL);
 
 truncate_failed:
-	inode_unlock(d_inode(object->backer));
+	inode_unlock(d_inode(object->dentry));
 	cachefiles_end_secure(cache, saved_cred);
 
 	if (ret == -EIO) {
@@ -422,10 +421,10 @@ static bool cachefiles_invalidate_object(struct fscache_object *_object)
 	_enter("{OBJ%x},[%llu]",
 	       object->fscache.debug_id, (unsigned long long)ni_size);
 
-	if (object->backer) {
-		ASSERT(d_is_reg(object->backer));
+	if (object->dentry) {
+		ASSERT(d_is_reg(object->dentry));
 
-		path.dentry = object->backer;
+		path.dentry = object->dentry;
 		path.mnt = cache->mnt;
 
 		cachefiles_begin_secure(cache, &saved_cred);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index caa6dfbaf333..2dab3ee386ad 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -35,7 +35,7 @@ extern unsigned cachefiles_debug;
 struct cachefiles_object {
 	struct fscache_object		fscache;	/* fscache handle */
 	struct dentry			*dentry;	/* the file/dir representing this object */
-	struct dentry			*backer;	/* backing file */
+	struct dentry			*old;		/* backing file */
 	struct file			*backing_file;	/* File open on backing storage */
 	loff_t				i_size;		/* object size */
 	atomic_t			usage;		/* object usage count */
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index f1d5976aa28c..2d406d681597 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -57,11 +57,11 @@ bool cachefiles_open_object(struct cachefiles_object *object)
 	struct path path;
 
 	path.mnt = cache->mnt;
-	path.dentry = object->backer;
+	path.dentry = object->dentry;
 
 	file = open_with_fake_path(&path,
 				   O_RDWR | O_LARGEFILE | O_DIRECT,
-				   d_backing_inode(object->backer),
+				   d_backing_inode(object->dentry),
 				   cache->cache_cred);
 	if (IS_ERR(file))
 		goto error;
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 4b515054d92e..9f6c91cff55d 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -486,7 +486,7 @@ bool cachefiles_walk_to_object(struct cachefiles_object *parent,
 				goto check_error;
 			}
 
-			object->backer = object->dentry;
+			object->old = dget(object->dentry);
 		} else {
 			BUG(); // TODO: open file in data-class subdir
 		}
@@ -521,7 +521,9 @@ bool cachefiles_walk_to_object(struct cachefiles_object *parent,
 		cachefiles_unmark_inode_in_use(object, object->dentry);
 	cachefiles_mark_object_inactive(cache, object);
 	dput(object->dentry);
+	dput(object->old);
 	object->dentry = NULL;
+	object->old = NULL;
 	goto error_out;
 
 lookup_error:




* [RFC PATCH 41/76] cachefiles: Implement a content-present indicator and bitmap
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (38 preceding siblings ...)
  2020-11-20 15:11 ` [RFC PATCH 40/76] cachefiles: Merge object->backer into object->dentry David Howells
@ 2020-11-20 15:11 ` David Howells
  2020-11-20 15:11 ` [RFC PATCH 42/76] cachefiles: Shape requests from the fscache read helper David Howells
                   ` (34 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:11 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Implement a content indicator that records the presence or absence of
content and a bitmap that records which granules of content are present in
a granular file.  The indicator is added to the xattr that stores the netfs
coherency data, along with the file size and the file zero point (the point
after which it can be assumed that the server doesn't have any data).

In the content bitmap, if present, each bit indicates whether the
corresponding 256KiB granule of the cache file is present.  The bitmap is
stored in a separate xattr, which is loaded when the first I/O handle is
created on that cache object and saved when the object is discarded from
memory.

Non-index objects in the cache can be monolithic or granular.  The content
map isn't used for monolithic objects (FSCACHE_ADV_SINGLE_CHUNK) as
they are expected to be all-or-nothing, so the content indicator alone
suffices.  Examples of this would be AFS directory or symlink content.
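
As a rough sketch of the arithmetic (illustration only, not code from this
patch; the ex_ names are made up, mirroring the granule lookup that the
series adds):

    #define EX_GRAN_SIZE (256 * 1024)   /* CACHEFILES_GRAN_SIZE */

    /* One bit per 256KiB granule; map_size is in bytes. */
    static bool ex_granule_present(const u8 *map, size_t map_size,
                                   loff_t pos)
    {
            pgoff_t granule = pos / EX_GRAN_SIZE;

            if (granule / 8 >= map_size)    /* beyond the mapped range */
                    return false;
            return test_bit_le(granule, map);
    }

At one bit per granule, a 64-bit word of map covers 16MiB of file space and
a 512-byte map covers 1GiB.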

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/Makefile            |    1 
 fs/cachefiles/bind.c              |    1 
 fs/cachefiles/content-map.c       |  246 +++++++++++++++++++++++++++++++++++++
 fs/cachefiles/interface.c         |    5 +
 fs/cachefiles/internal.h          |   32 +++++
 fs/cachefiles/io.c                |    4 +
 fs/cachefiles/xattr.c             |   24 +++-
 include/trace/events/cachefiles.h |    4 -
 8 files changed, 309 insertions(+), 8 deletions(-)
 create mode 100644 fs/cachefiles/content-map.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index d894d317d6e7..84615aca866a 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -5,6 +5,7 @@
 
 cachefiles-y := \
 	bind.o \
+	content-map.o \
 	daemon.o \
 	interface.o \
 	io.o \
diff --git a/fs/cachefiles/bind.c b/fs/cachefiles/bind.c
index c89a422fd1d1..88c7cb4fb0f7 100644
--- a/fs/cachefiles/bind.c
+++ b/fs/cachefiles/bind.c
@@ -102,6 +102,7 @@ static int cachefiles_daemon_add_cache(struct cachefiles_cache *cache)
 		goto error_root_object;
 
 	atomic_set(&fsdef->usage, 1);
+	rwlock_init(&fsdef->content_map_lock);
 	fsdef->type = FSCACHE_COOKIE_TYPE_INDEX;
 
 	/* look up the directory at the root of the cache */
diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
new file mode 100644
index 000000000000..68fcab313361
--- /dev/null
+++ b/fs/cachefiles/content-map.c
@@ -0,0 +1,246 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Datafile content management
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/mount.h>
+#include <linux/slab.h>
+#include <linux/file.h>
+#include <linux/swap.h>
+#include <linux/xattr.h>
+#include "internal.h"
+
+static const char cachefiles_xattr_content_map[] =
+	XATTR_USER_PREFIX "CacheFiles.content";
+
+/*
+ * Determine the map size for a granulated object.
+ *
+ * There's one bit per granule.  We size the map in terms of 8-byte chunks,
+ * where one 64-bit span of 256KiB granules covers 16MiB of file space.  At
+ * that rate, 512 bytes of map covers 1GiB of file.
+ */
+static size_t cachefiles_map_size(loff_t i_size)
+{
+	loff_t size;
+	size_t granules, bits, bytes, map_size;
+
+	if (i_size <= CACHEFILES_GRAN_SIZE * 64)
+		return 8;
+
+	size = min_t(loff_t, i_size + CACHEFILES_GRAN_SIZE - 1, CACHEFILES_SIZE_LIMIT);
+	granules = size / CACHEFILES_GRAN_SIZE;
+	bits = granules + (64 - 1);
+	bits &= ~(64 - 1);
+	bytes = bits / 8;
+	map_size = roundup_pow_of_two(bytes);
+	_leave(" = %zx [i=%llx g=%zu b=%zu]", map_size, i_size, granules, bits);
+	return map_size;
+}
+
+/*
+ * Mark the content map to indicate stored granule.
+ */
+void cachefiles_mark_content_map(struct cachefiles_object *object,
+				 loff_t start, loff_t len)
+{
+	_enter("%llx", start);
+
+	read_lock_bh(&object->content_map_lock);
+
+	if (object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK) {
+		if (start == 0) {
+			object->content_info = CACHEFILES_CONTENT_SINGLE;
+			set_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->fscache.flags);
+		}
+	} else {
+		pgoff_t granule;
+		loff_t end = start + len;
+
+		start = round_down(start, CACHEFILES_GRAN_SIZE);
+		do {
+			granule = start / CACHEFILES_GRAN_SIZE;
+			if (granule / 8 >= object->content_map_size)
+				break;
+
+			set_bit_le(granule, object->content_map);
+			object->content_map_changed = true;
+			start += CACHEFILES_GRAN_SIZE;
+
+		} while (start < end);
+
+		if (object->content_info != CACHEFILES_CONTENT_MAP) {
+			object->content_info = CACHEFILES_CONTENT_MAP;
+			set_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->fscache.flags);
+		}
+	}
+
+	read_unlock_bh(&object->content_map_lock);
+}
+
+/*
+ * Expand the content map to a larger file size.
+ */
+void cachefiles_expand_content_map(struct cachefiles_object *object, loff_t i_size)
+{
+	size_t size;
+	u8 *map, *zap;
+
+	size = cachefiles_map_size(i_size);
+
+	_enter("%llx,%zx,%x", i_size, size, object->content_map_size);
+
+	if (size <= object->content_map_size)
+		return;
+
+	map = kzalloc(size, GFP_KERNEL);
+	if (!map)
+		return;
+
+	write_lock_bh(&object->content_map_lock);
+	if (size > object->content_map_size) {
+		zap = object->content_map;
+		memcpy(map, zap, object->content_map_size);
+		object->content_map = map;
+		object->content_map_size = size;
+	} else {
+		zap = map;
+	}
+	write_unlock_bh(&object->content_map_lock);
+
+	kfree(zap);
+}
+
+/*
+ * Adjust the content map when we shorten a backing object.
+ *
+ * We need to unmark any granules that are going to be discarded.
+ */
+void cachefiles_shorten_content_map(struct cachefiles_object *object,
+				    loff_t new_size)
+{
+	struct fscache_cookie *cookie = object->fscache.cookie;
+	ssize_t granules_needed, bits_needed, bytes_needed;
+
+	if (object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK)
+		return;
+
+	write_lock_bh(&object->content_map_lock);
+
+	if (object->content_info == CACHEFILES_CONTENT_MAP) {
+		if (cookie->zero_point > new_size)
+			cookie->zero_point = new_size;
+
+		granules_needed = new_size;
+		granules_needed += CACHEFILES_GRAN_SIZE - 1;
+		granules_needed /= CACHEFILES_GRAN_SIZE;
+		bits_needed = round_up(granules_needed, 8);
+		bytes_needed = bits_needed / 8;
+
+		if (bytes_needed < object->content_map_size)
+			memset(object->content_map + bytes_needed, 0,
+			       object->content_map_size - bytes_needed);
+
+		if (bits_needed > granules_needed) {
+			size_t byte = (granules_needed - 1) / 8;
+			unsigned int shift = granules_needed % 8;
+			unsigned int mask = (1 << shift) - 1;
+			object->content_map[byte] &= mask;
+		}
+	}
+
+	write_unlock_bh(&object->content_map_lock);
+}
+
+/*
+ * Load the content map.
+ */
+bool cachefiles_load_content_map(struct cachefiles_object *object)
+{
+	struct cachefiles_cache *cache = container_of(object->fscache.cache,
+						      struct cachefiles_cache, cache);
+	const struct cred *saved_cred;
+	ssize_t got;
+	size_t size;
+	u8 *map = NULL;
+
+	_enter("c=%08x,%llx",
+	       object->fscache.cookie->debug_id,
+	       object->fscache.cookie->object_size);
+
+	object->content_info = CACHEFILES_CONTENT_NO_DATA;
+	if (object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK) {
+		/* Single-chunk object.  The presence or absence of the content
+		 * map xattr is sufficient indication.
+		 */
+		size = 0;
+	} else {
+		/* Granulated object.  There's one bit per granule.  We size
+		 * the map in terms of 8-byte chunks, where one 64-bit span of
+		 * 256KiB granules covers 16MiB of file space.  At that rate,
+		 * 512 bytes of map covers 1GiB of file.
+		 */
+		size = cachefiles_map_size(object->fscache.cookie->object_size);
+		map = kzalloc(size, GFP_KERNEL);
+		if (!map)
+			return false;
+	}
+
+	cachefiles_begin_secure(cache, &saved_cred);
+	got = vfs_getxattr(object->dentry, cachefiles_xattr_content_map,
+			   map, size);
+	cachefiles_end_secure(cache, saved_cred);
+	if (got < 0 && got != -ENODATA) {
+		kfree(map);
+		_leave(" = f [%zd]", got);
+		return false;
+	}
+
+	if (size == 0) {
+		if (got != -ENODATA)
+			object->content_info = CACHEFILES_CONTENT_SINGLE;
+		_leave(" = t [%zd]", got);
+	} else {
+		object->content_map = map;
+		object->content_map_size = size;
+		object->content_info = CACHEFILES_CONTENT_MAP;
+		_leave(" = t [%zd/%zu %*phN]", got, size, (int)size, map);
+	}
+
+	return true;
+}
+
+/*
+ * Save the content map.
+ */
+void cachefiles_save_content_map(struct cachefiles_object *object)
+{
+	ssize_t ret;
+	size_t size;
+	u8 *map;
+
+	_enter("c=%08x", object->fscache.cookie->debug_id);
+
+	if (object->content_info != CACHEFILES_CONTENT_MAP)
+		return;
+
+	size = object->content_map_size;
+	map = object->content_map;
+
+	/* Don't save trailing zeros, but do save at least one byte */
+	for (; size > 0; size--)
+		if (map[size - 1])
+			break;
+
+	ret = vfs_setxattr(object->dentry, cachefiles_xattr_content_map,
+			   map, size, 0);
+	if (ret < 0) {
+		cachefiles_io_error_obj(object, "Unable to set xattr e=%zd s=%zu",
+					ret, size);
+		return;
+	}
+
+	_leave(" = %zd", ret);
+}
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 7e8940a7fb55..1d06cf72eeb0 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -37,6 +37,7 @@ struct fscache_object *cachefiles_alloc_object(struct fscache_cookie *cookie,
 		return NULL;
 	}
 
+	rwlock_init(&object->content_map_lock);
 	fscache_object_init(&object->fscache, cookie, &cache->cache);
 	object->fscache.parent = parent;
 	object->fscache.stage = FSCACHE_OBJECT_STAGE_LOOKING_UP;
@@ -198,6 +199,8 @@ static void cachefiles_update_object(struct fscache_object *_object)
 static void cachefiles_commit_object(struct cachefiles_object *object,
 				     struct cachefiles_cache *cache)
 {
+	if (object->content_map_changed)
+		cachefiles_save_content_map(object);
 }
 
 /*
@@ -298,6 +301,8 @@ static void cachefiles_put_object(struct fscache_object *_object,
 		ASSERTCMP(object->dentry, ==, NULL);
 		ASSERTCMP(object->fscache.n_children, ==, 0);
 
+		kfree(object->content_map);
+
 		cache = object->fscache.cache;
 		fscache_object_destroy(&object->fscache);
 		kmem_cache_free(cachefiles_object_jar, object);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 2dab3ee386ad..e065550a4bc0 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -19,6 +19,11 @@
 #include <linux/workqueue.h>
 #include <linux/security.h>
 
+/* Cachefile granularity */
+#define CACHEFILES_GRAN_SIZE	(256 * 1024)
+#define CACHEFILES_DIO_BLOCK_SIZE 4096
+#define CACHEFILES_SIZE_LIMIT	(512 * 8 * CACHEFILES_GRAN_SIZE)
+
 struct cachefiles_cache;
 struct cachefiles_object;
 
@@ -29,6 +34,16 @@ extern unsigned cachefiles_debug;
 
 #define cachefiles_gfp (__GFP_RECLAIM | __GFP_NORETRY | __GFP_NOMEMALLOC)
 
+enum cachefiles_content {
+	/* These values are saved on disk */
+	CACHEFILES_CONTENT_NO_DATA	= 0, /* No content stored */
+	CACHEFILES_CONTENT_SINGLE	= 1, /* Content is monolithic, all is present */
+	CACHEFILES_CONTENT_ALL		= 2, /* Content is all present, no map */
+	CACHEFILES_CONTENT_MAP		= 3, /* Content is piecemeal, map in use */
+	CACHEFILES_CONTENT_DIRTY	= 4, /* Content is dirty (only seen on disk) */
+	nr__cachefiles_content
+};
+
 /*
  * node records
  */
@@ -41,6 +56,13 @@ struct cachefiles_object {
 	atomic_t			usage;		/* object usage count */
 	uint8_t				type;		/* object type */
 	bool				new;		/* T if object new */
+
+	/* Map of the content blocks in the object */
+	enum cachefiles_content		content_info:8;	/* Info about content presence */
+	bool				content_map_changed;
+	u8				*content_map;		/* Content present bitmap */
+	unsigned int			content_map_size;	/* Size of buffer */
+	rwlock_t			content_map_lock;
 };
 
 extern struct kmem_cache *cachefiles_object_jar;
@@ -100,6 +122,16 @@ static inline void cachefiles_state_changed(struct cachefiles_cache *cache)
 extern int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args);
 extern void cachefiles_daemon_unbind(struct cachefiles_cache *cache);
 
+/*
+ * content-map.c
+ */
+extern void cachefiles_mark_content_map(struct cachefiles_object *object,
+					loff_t start, loff_t len);
+extern void cachefiles_expand_content_map(struct cachefiles_object *object, loff_t size);
+extern void cachefiles_shorten_content_map(struct cachefiles_object *object, loff_t new_size);
+extern bool cachefiles_load_content_map(struct cachefiles_object *object);
+extern void cachefiles_save_content_map(struct cachefiles_object *object);
+
 /*
  * daemon.c
  */
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 2d406d681597..bf1930699636 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -59,6 +59,10 @@ bool cachefiles_open_object(struct cachefiles_object *object)
 	path.mnt = cache->mnt;
 	path.dentry = object->dentry;
 
+	if (object->content_info == CACHEFILES_CONTENT_MAP &&
+	    !cachefiles_load_content_map(object))
+		goto error;
+
 	file = open_with_fake_path(&path,
 				   O_RDWR | O_LARGEFILE | O_DIRECT,
 				   d_backing_inode(object->dentry),
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index 19db19b7d7db..cbd43855dc0d 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -16,8 +16,11 @@
 #include "internal.h"
 
 struct cachefiles_xattr {
-	uint8_t				type;
-	uint8_t				data[];
+	__be64	object_size;	/* Actual size of the object */
+	__be64	zero_point;	/* Size after which server has no data not written by us */
+	__u8	type;		/* Type of object */
+	__u8	content;	/* Content presence (enum cachefiles_content) */
+	__u8	data[];		/* netfs coherency data */
 } __packed;
 
 static const char cachefiles_xattr_cache[] =
@@ -118,7 +121,10 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 	if (!buf)
 		return -ENOMEM;
 
-	buf->type = object->fscache.cookie->type;
+	buf->object_size	= cpu_to_be64(object->fscache.cookie->object_size);
+	buf->zero_point		= cpu_to_be64(object->fscache.cookie->zero_point);
+	buf->type		= object->fscache.cookie->type;
+	buf->content		= object->content_info;
 	if (len > 0)
 		memcpy(buf->data, fscache_get_aux(object->fscache.cookie), len);
 
@@ -127,7 +133,7 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 			   xattr_flags);
 	if (ret < 0) {
 		trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
-					   0,
+					   buf->content,
 					   cachefiles_coherency_set_fail);
 		if (ret != -ENOMEM)
 			cachefiles_io_error_obj(
@@ -135,7 +141,7 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 				"Failed to set xattr with error %d", ret);
 	} else {
 		trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
-					   0,
+					   buf->content,
 					   cachefiles_coherency_set_ok);
 	}
 
@@ -174,15 +180,21 @@ int cachefiles_check_auxdata(struct cachefiles_object *object)
 		why = cachefiles_coherency_check_xattr;
 	} else if (buf->type != object->fscache.cookie->type) {
 		why = cachefiles_coherency_check_type;
+	} else if (buf->content >= nr__cachefiles_content) {
+		why = cachefiles_coherency_check_content;
 	} else if (memcmp(buf->data, p, len) != 0) {
 		why = cachefiles_coherency_check_aux;
+	} else if (be64_to_cpu(buf->object_size) != object->fscache.cookie->object_size) {
+		why = cachefiles_coherency_check_objsize;
 	} else {
+		object->fscache.cookie->zero_point = be64_to_cpu(buf->zero_point);
+		object->content_info = buf->content;
 		why = cachefiles_coherency_check_ok;
 		ret = 0;
 	}
 
 	trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
-				   0, why);
+				   buf->content, why);
 	kfree(buf);
 	return ret;
 }
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index bf588c3f4a07..e7af1d683009 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -324,7 +324,7 @@ TRACE_EVENT(cachefiles_mark_buried,
 TRACE_EVENT(cachefiles_coherency,
 	    TP_PROTO(struct cachefiles_object *obj,
 		     ino_t ino,
-		     int content,
+		     enum cachefiles_content content,
 		     enum cachefiles_coherency_trace why),
 
 	    TP_ARGS(obj, ino, content, why),
@@ -333,7 +333,7 @@ TRACE_EVENT(cachefiles_coherency,
 	    TP_STRUCT__entry(
 		    __field(unsigned int,			obj	)
 		    __field(enum cachefiles_coherency_trace,	why	)
-		    __field(int,				content	)
+		    __field(enum cachefiles_content,		content	)
 		    __field(u64,				ino	)
 			     ),
 




* [RFC PATCH 42/76] cachefiles: Shape requests from the fscache read helper
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (39 preceding siblings ...)
  2020-11-20 15:11 ` [RFC PATCH 41/76] cachefiles: Implement a content-present indicator and bitmap David Howells
@ 2020-11-20 15:11 ` David Howells
  2020-11-20 15:11 ` [RFC PATCH 43/76] cachefiles: Round the cachefile size up to DIO block size David Howells
                   ` (33 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:11 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Implement the function that shapes read requests to map onto the granules
in a cache file.

When preparing to fetch data from the server to be cached, the request will
be expanded to align with granule size and cut down so that it doesn't
cross the boundary between a non-present granule and a present granule.

When preparing to read data from the cache, the extent will be cut down so
that it doesn't cross the boundary between a present granule and a
non-present granule.

If no caching is taking place, the request is passed through unchanged.
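
A worked example (numbers hypothetical): with 256KiB (0x40000-byte)
granules, suppose granule 0 is present in the cache, granule 1 is not, and
the request starts below the cookie's zero_point.  A 1MiB read starting at
0x30000 is then sliced as:

    - first subrequest: start 0x30000, length clipped to 0x10000 so that
      it ends on the 0x40000 granule boundary; sourced from the cache
      (NETFS_READ_FROM_CACHE);

    - next subrequest: starts at 0x40000 in the hole and is sent to the
      server (NETFS_DOWNLOAD_FROM_SERVER) with NETFS_SREQ_WRITE_TO_CACHE
      set so that the fetched data also gets written into the cache.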

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/content-map.c |  164 +++++++++++++++++++++++++++++++++++++++++++
 fs/cachefiles/interface.c   |    2 +
 fs/cachefiles/internal.h    |    4 +
 3 files changed, 170 insertions(+)

diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index 68fcab313361..4ff8c645817c 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -10,6 +10,7 @@
 #include <linux/file.h>
 #include <linux/swap.h>
 #include <linux/xattr.h>
+#include <linux/netfs.h>
 #include "internal.h"
 
 static const char cachefiles_xattr_content_map[] =
@@ -40,6 +41,169 @@ static size_t cachefiles_map_size(loff_t i_size)
 	return map_size;
 }
 
+static bool cachefiles_granule_is_present(struct cachefiles_object *object,
+					  long granule)
+{
+	bool res;
+
+	if (granule / 8 >= object->content_map_size)
+		return false;
+	read_lock_bh(&object->content_map_lock);
+	res = test_bit_le(granule, object->content_map);
+	read_unlock_bh(&object->content_map_lock);
+	return res;
+}
+
+static long cachefiles_find_next_granule(struct cachefiles_object *object,
+					 long start_from, long *_limit)
+{
+	long result, limit;
+
+	read_lock_bh(&object->content_map_lock);
+	*_limit = limit = object->content_map_size * 8;
+	result = find_next_bit_le(object->content_map, limit, start_from);
+	read_unlock_bh(&object->content_map_lock);
+	return result;
+}
+
+static long cachefiles_find_next_hole(struct cachefiles_object *object,
+				      long start_from, long *_limit)
+{
+	long result, limit;
+
+	read_lock_bh(&object->content_map_lock);
+	*_limit = limit = object->content_map_size * 8;
+	result = find_next_zero_bit_le(object->content_map, limit, start_from);
+	read_unlock_bh(&object->content_map_lock);
+	return result;
+}
+
+/*
+ * Expand a readahead proposal from the VM to align with cache limits
+ * and granularity.
+ */
+void cachefiles_expand_readahead(struct fscache_op_resources *opr,
+				 loff_t *_start, size_t *_len, loff_t i_size)
+{
+	loff_t start = *_start, delta;
+	size_t len = *_len;
+
+	if (start >= CACHEFILES_SIZE_LIMIT)
+		return;
+
+	if (len > CACHEFILES_SIZE_LIMIT - start)
+		len = *_len = CACHEFILES_SIZE_LIMIT - start;
+
+	delta = start & (CACHEFILES_GRAN_SIZE - 1);
+	if (start - delta < i_size) {
+		start -= delta;
+		len = round_up(len + delta, CACHEFILES_GRAN_SIZE);
+		if (len > i_size - start) {
+			_debug("overshot eof");
+			len = i_size - start;
+		}
+	}
+
+	*_start = start;
+	*_len = len;
+}
+
+/*
+ * Prepare an I/O subrequest of a read request.  We're asked to retrieve all the
+ * remaining data in the read request, but we are allowed to shrink that and we
+ * set flags to indicate where we want it read from.
+ */
+enum netfs_read_source cachefiles_prepare_read(struct netfs_read_subrequest *subreq,
+					       loff_t i_size)
+{
+	struct cachefiles_object *object =
+		container_of(subreq->rreq->cache_resources.object,
+			     struct cachefiles_object, fscache);
+	loff_t start = subreq->start, len = subreq->len, boundary;
+	long granule, next, limit;
+
+	_enter("%llx,%llx", start, len);
+
+	if (start >= CACHEFILES_SIZE_LIMIT) {
+		if (start >= i_size)
+			goto zero_pages_nocache;
+		goto on_server_nocache;
+	}
+	if (len > CACHEFILES_SIZE_LIMIT - start)
+		len = CACHEFILES_SIZE_LIMIT - start;
+
+	granule = start / CACHEFILES_GRAN_SIZE;
+	if (granule / 8 >= object->content_map_size) {
+		cachefiles_expand_content_map(object, i_size);
+		if (granule / 8 >= object->content_map_size)
+			goto maybe_on_server_nocache;
+	}
+
+	if (start >= i_size)
+		goto zero_pages;
+
+	if (cachefiles_granule_is_present(object, granule)) {
+		/* The start of the request is present in the cache - restrict
+		 * the length to what's available.
+		 */
+		if (start & (CACHEFILES_DIO_BLOCK_SIZE - 1)) {
+			/* We should never see DIO-unaligned requests here. */
+			WARN_ON_ONCE(1);
+			len &= CACHEFILES_DIO_BLOCK_SIZE - 1;
+			goto maybe_on_server;
+		}
+
+		next = cachefiles_find_next_hole(object, granule + 1, &limit);
+		_debug("present %lx %lx", granule, limit);
+		if (granule >= limit)
+			goto maybe_on_server;
+		boundary = next * CACHEFILES_GRAN_SIZE;
+		if (len > boundary - start)
+			len = boundary - start;
+		goto in_cache;
+	} else {
+		/* The start of the request is not present in the cache -
+		 * restrict the length to the size of the hole.
+		 */
+		next = cachefiles_find_next_granule(object, granule + 1, &limit);
+		_debug("hole %lx %lx", granule, limit);
+		if (granule >= limit)
+			goto maybe_on_server;
+		boundary = next * CACHEFILES_GRAN_SIZE;
+		if (len > boundary - start)
+			len = boundary - start;
+		goto maybe_on_server;
+	}
+
+maybe_on_server:
+	/* If the start of the request is beyond the original EOF of the file
+	 * on the server then it's not going to be found on the server.
+	 */
+	if (start >= object->fscache.cookie->zero_point)
+		goto zero_pages;
+	goto on_server;
+maybe_on_server_nocache:
+	if (start >= object->fscache.cookie->zero_point)
+		goto zero_pages_nocache;
+	goto on_server_nocache;
+on_server:
+	__set_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags);
+on_server_nocache:
+	subreq->len = len;
+	_leave(" = down %llx", len);
+	return NETFS_DOWNLOAD_FROM_SERVER;
+zero_pages:
+	__set_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags);
+zero_pages_nocache:
+	subreq->len = len;
+	_leave(" = zero %llx", len);
+	return NETFS_FILL_WITH_ZEROES;
+in_cache:
+	subreq->len = len;
+	_leave(" = read %llx", len);
+	return NETFS_READ_FROM_CACHE;
+}
+
 /*
  * Mark the content map to indicate stored granule.
  */
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 1d06cf72eeb0..a20917cb4667 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -462,6 +462,8 @@ static const struct fscache_op_ops cachefiles_io_ops = {
 	.end_operation		= __fscache_end_operation,
 	.read			= cachefiles_read,
 	.write			= cachefiles_write,
+	.expand_readahead	= cachefiles_expand_readahead,
+	.prepare_read		= cachefiles_prepare_read,
 };
 
 static void cachefiles_begin_operation(struct fscache_op_resources *opr)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index e065550a4bc0..e9f45d5053b1 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -125,6 +125,10 @@ extern void cachefiles_daemon_unbind(struct cachefiles_cache *cache);
 /*
  * content-map.c
  */
+extern void cachefiles_expand_readahead(struct fscache_op_resources *opr,
+					loff_t *_start, size_t *_len, loff_t i_size);
+extern enum netfs_read_source cachefiles_prepare_read(struct netfs_read_subrequest *subreq,
+						      loff_t i_size);
 extern void cachefiles_mark_content_map(struct cachefiles_object *object,
 					loff_t start, loff_t len);
 extern void cachefiles_expand_content_map(struct cachefiles_object *object, loff_t size);




* [RFC PATCH 43/76] cachefiles: Round the cachefile size up to DIO block size
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (40 preceding siblings ...)
  2020-11-20 15:11 ` [RFC PATCH 42/76] cachefiles: Shape requests from the fscache read helper David Howells
@ 2020-11-20 15:11 ` David Howells
  2020-11-20 15:12 ` [RFC PATCH 44/76] cachefiles: Implement read and write parts of new I/O API David Howells
                   ` (32 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:11 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Round the size of a cachefile up to DIO block size so that we can always
read back the last partial page of a file using direct I/O.
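
For example, with a DIO block size of 4096 bytes, a 10000-byte cachefile
would be truncated up to round_up(10000, 4096) = 12288 bytes, so the 1808
bytes beyond the last full page can still be read back with O_DIRECT.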

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index a20917cb4667..0e3d5b5ffc55 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -184,6 +184,17 @@ static void cachefiles_update_object(struct fscache_object *_object)
 			cachefiles_remove_object_xattr(cache, object->dentry);
 			goto out;
 		}
+
+		object_size = round_up(object_size, CACHEFILES_DIO_BLOCK_SIZE);
+		_debug("trunc %llx -> %llx", i_size_read(d_inode(object->dentry)), object_size);
+		if (i_size_read(d_inode(object->dentry)) < object_size) {
+			ret = vfs_truncate(&path, object_size);
+			if (ret < 0) {
+				cachefiles_io_error_obj(object, "Trunc-to-dio-size failed");
+				cachefiles_remove_object_xattr(cache, object->dentry);
+				goto out;
+			}
+		}
 	}
 
 	cachefiles_set_object_xattr(object, XATTR_REPLACE);
@@ -354,6 +365,7 @@ static int cachefiles_attr_changed(struct cachefiles_object *object)
 	int ret;
 
 	ni_size = object->fscache.cookie->object_size;
+	ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
 
 	_enter("{OBJ%x},[%llu]",
 	       object->fscache.debug_id, (unsigned long long) ni_size);
@@ -422,6 +434,7 @@ static bool cachefiles_invalidate_object(struct fscache_object *_object)
 			     struct cachefiles_cache, cache);
 
 	ni_size = object->fscache.cookie->object_size;
+	ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
 
 	_enter("{OBJ%x},[%llu]",
 	       object->fscache.debug_id, (unsigned long long)ni_size);




* [RFC PATCH 44/76] cachefiles: Implement read and write parts of new I/O API
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (41 preceding siblings ...)
  2020-11-20 15:11 ` [RFC PATCH 43/76] cachefiles: Round the cachefile size up to DIO block size David Howells
@ 2020-11-20 15:12 ` David Howells
  2020-11-20 15:12 ` [RFC PATCH 45/76] cachefiles: Add I/O tracepoints David Howells
                   ` (31 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:12 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Implement writing into the cache and reading back from the cache inside
cachefiles using asynchronous direct I/O to/from the specified iterator.  The
size and position of the request should be aligned to the reported
dio_block_size.

Errors and completion are reported by callback.
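
A minimal sketch of a synchronous caller (hypothetical; the ex_ names are
made up).  The termination function is invoked with the result whether the
DIO completes inline or gets queued (-EIOCBQUEUED), so a caller that wants
to block can simply wait for it:

    static void ex_read_done(void *priv, ssize_t transferred_or_error)
    {
            complete(priv);
    }

    static void ex_read_sync(struct fscache_op_resources *opr,
                             loff_t pos, struct iov_iter *iter)
    {
            DECLARE_COMPLETION_ONSTACK(done);

            cachefiles_read(opr, pos, iter, true /* seek_data */,
                            ex_read_done, &done);
            wait_for_completion(&done);
    }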

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c |   20 ++-
 fs/cachefiles/internal.h  |    2 
 fs/cachefiles/io.c        |  270 ++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 277 insertions(+), 15 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 0e3d5b5ffc55..c14e2b4f5b24 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -12,8 +12,6 @@
 #include "internal.h"
 
 static int cachefiles_attr_changed(struct cachefiles_object *object);
-static void cachefiles_put_object(struct fscache_object *_object,
-				  enum fscache_obj_ref_trace why);
 
 /*
  * Allocate an object record for a cookie lookup and prepare the lookup data.
@@ -160,7 +158,8 @@ static void cachefiles_update_object(struct fscache_object *_object)
 	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
 	const struct cred *saved_cred;
-	loff_t object_size;
+	struct inode *inode;
+	loff_t object_size, i_size;
 	int ret;
 
 	_enter("{OBJ%x}", _object->debug_id);
@@ -172,12 +171,14 @@ static void cachefiles_update_object(struct fscache_object *_object)
 	cachefiles_begin_secure(cache, &saved_cred);
 
 	object_size = object->fscache.cookie->object_size;
-	if (i_size_read(d_inode(object->dentry)) > object_size) {
+	inode = d_inode(object->dentry);
+	i_size = i_size_read(inode);
+	if (i_size > object_size) {
 		struct path path = {
 			.mnt	= cache->mnt,
 			.dentry	= object->dentry
 		};
-		_debug("trunc %llx -> %llx", i_size_read(d_inode(object->dentry)), object_size);
+		_debug("trunc %llx -> %llx", i_size, object_size);
 		ret = vfs_truncate(&path, object_size);
 		if (ret < 0) {
 			cachefiles_io_error_obj(object, "Trunc-to-size failed");
@@ -186,8 +187,9 @@ static void cachefiles_update_object(struct fscache_object *_object)
 		}
 
 		object_size = round_up(object_size, CACHEFILES_DIO_BLOCK_SIZE);
-		_debug("trunc %llx -> %llx", i_size_read(d_inode(object->dentry)), object_size);
-		if (i_size_read(d_inode(object->dentry)) < object_size) {
+		i_size = i_size_read(inode);
+		_debug("trunc %llx -> %llx", i_size, object_size);
+		if (i_size < object_size) {
 			ret = vfs_truncate(&path, object_size);
 			if (ret < 0) {
 				cachefiles_io_error_obj(object, "Trunc-to-dio-size failed");
@@ -283,8 +285,8 @@ static void cachefiles_drop_object(struct fscache_object *_object,
 /*
  * dispose of a reference to an object
  */
-static void cachefiles_put_object(struct fscache_object *_object,
-				  enum fscache_obj_ref_trace why)
+void cachefiles_put_object(struct fscache_object *_object,
+			   enum fscache_obj_ref_trace why)
 {
 	struct cachefiles_object *object;
 	struct fscache_cache *cache;
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index e9f45d5053b1..24a8aed2600d 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -150,6 +150,8 @@ extern int cachefiles_has_space(struct cachefiles_cache *cache,
 extern const struct fscache_cache_ops cachefiles_cache_ops;
 extern struct fscache_object *cachefiles_grab_object(struct fscache_object *_object,
 						     enum fscache_obj_ref_trace why);
+extern void cachefiles_put_object(struct fscache_object *_object,
+				  enum fscache_obj_ref_trace why);
 
 /*
  * io.c
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index bf1930699636..4c66b9183dd6 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -10,9 +10,52 @@
 #include <linux/file.h>
 #include <linux/uio.h>
 #include <linux/xattr.h>
+#include <linux/sched/mm.h>
 #include "internal.h"
 #include <trace/events/fscache.h>
 
+struct cachefiles_kiocb {
+	struct kiocb		iocb;
+	refcount_t		ki_refcnt;
+	loff_t			start;
+	union {
+		size_t		skipped;
+		size_t		len;
+	};
+	struct cachefiles_object *object;
+	fscache_io_terminated_t	term_func;
+	void			*term_func_priv;
+};
+
+static inline void cachefiles_put_kiocb(struct cachefiles_kiocb *ki)
+{
+	if (refcount_dec_and_test(&ki->ki_refcnt)) {
+		cachefiles_put_object(&ki->object->fscache, fscache_obj_put_ioreq);
+		fput(ki->iocb.ki_filp);
+		kfree(ki);
+	}
+}
+
+/*
+ * Handle completion of a read from the cache.
+ */
+static void cachefiles_read_complete(struct kiocb *iocb, long ret, long ret2)
+{
+	struct cachefiles_kiocb *ki = container_of(iocb, struct cachefiles_kiocb, iocb);
+
+	_enter("%ld,%ld", ret, ret2);
+
+	if (ki->term_func) {
+		if (ret < 0)
+			ki->term_func(ki->term_func_priv, ret);
+		else
+			ki->term_func(ki->term_func_priv, ki->skipped + ret);
+	}
+
+	fscache_uncount_io_operation(ki->object->fscache.cookie);
+	cachefiles_put_kiocb(ki);
+}
+
 /*
  * Initiate a read from the cache.
  */
@@ -23,11 +66,142 @@ int cachefiles_read(struct fscache_op_resources *opr,
 		    fscache_io_terminated_t term_func,
 		    void *term_func_priv)
 {
-	fscache_wait_for_operation(opr, FSCACHE_WANT_READ);
+	struct cachefiles_object *object =
+		container_of(opr->object, struct cachefiles_object, fscache);
+	struct cachefiles_kiocb *ki;
+	struct file *file;
+	unsigned int old_nofs;
+	ssize_t ret = -ENOBUFS;
+	size_t len = iov_iter_count(iter), skipped = 0;
+
+	spin_lock(&object->fscache.lock);
+	file = get_file(object->backing_file);
+	spin_unlock(&object->fscache.lock);
+
+	_enter("%pD,%li,%llx,%zx/%llx",
+	       file, file_inode(file)->i_ino, start_pos, len,
+	       i_size_read(file->f_inode));
+
+	__fscache_wait_for_operation(opr, FSCACHE_WANT_READ);
 	fscache_count_io_operation(opr->object->cookie);
+
+	/* If the caller asked us to seek for data before doing the read, then
+	 * we should do that now.  If we find a gap, we fill it with zeros.
+	 */
+	if (seek_data) {
+		loff_t off = start_pos, off2;
+
+		off2 = vfs_llseek(file, off, SEEK_DATA);
+		if (off2 < 0 && off2 >= (loff_t)-MAX_ERRNO && off2 != -ENXIO) {
+			skipped = 0;
+			ret = off2;
+			goto presubmission_error;
+		}
+
+		if (off2 == -ENXIO || off2 >= start_pos + len) {
+			/* The region is beyond the EOF or there's no more data
+			 * in the region, so clear the rest of the buffer and
+			 * return success.
+			 */
+			iov_iter_zero(len, iter);
+			skipped = len;
+			fscache_uncount_io_operation(object->fscache.cookie);
+			ret = 0;
+			goto presubmission_error;
+		}
+
+		skipped = off2 - off;
+		iov_iter_zero(skipped, iter);
+	}
+
+	ret = -ENOBUFS;
+	ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL);
+	if (!ki)
+		goto presubmission_error;
+
+	refcount_set(&ki->ki_refcnt, 2);
+	ki->iocb.ki_filp	= file;
+	ki->iocb.ki_pos		= start_pos + skipped;
+	ki->iocb.ki_flags	= IOCB_DIRECT;
+	ki->iocb.ki_hint	= ki_hint_validate(file_write_hint(file));
+	ki->iocb.ki_ioprio	= get_current_ioprio();
+	ki->skipped		= skipped;
+	ki->object		= object;
+	ki->term_func		= term_func;
+	ki->term_func_priv	= term_func_priv;
+
+	if (ki->term_func)
+		ki->iocb.ki_complete = cachefiles_read_complete;
+
+	ret = rw_verify_area(READ, file, &ki->iocb.ki_pos, len - skipped);
+	if (ret < 0)
+		goto presubmission_error_free;
+
+	cachefiles_grab_object(&object->fscache, fscache_obj_get_ioreq);
+
+	old_nofs = memalloc_nofs_save();
+	ret = call_read_iter(file, &ki->iocb, iter);
+	memalloc_nofs_restore(old_nofs);
+	switch (ret) {
+	case -EIOCBQUEUED:
+		goto in_progress;
+
+	case -ERESTARTSYS:
+	case -ERESTARTNOINTR:
+	case -ERESTARTNOHAND:
+	case -ERESTART_RESTARTBLOCK:
+		/* There's no easy way to restart the syscall since other AIOs
+		 * may already be running.  Just fail this I/O with EINTR.
+		 */
+		ret = -EINTR;
+		fallthrough;
+	default:
+		cachefiles_read_complete(&ki->iocb, ret, 0);
+		if (ret > 0)
+			ret = 0;
+		break;
+	}
+
+in_progress:
+	cachefiles_put_kiocb(ki);
+	_leave(" = %zd", ret);
+	return ret;
+
+presubmission_error_free:
+	kfree(ki);
+presubmission_error:
+	fput(file);
 	if (term_func)
-		term_func(term_func_priv, -ENODATA);
-	return -ENODATA;
+		term_func(term_func_priv, ret < 0 ? ret : skipped);
+	return ret;
+}
+
+/*
+ * Handle completion of a write to the cache.
+ */
+static void cachefiles_write_complete(struct kiocb *iocb, long ret, long ret2)
+{
+	struct cachefiles_kiocb *ki = container_of(iocb, struct cachefiles_kiocb, iocb);
+	struct inode *inode = file_inode(ki->iocb.ki_filp);
+
+	_enter("%ld,%ld", ret, ret2);
+
+	/* Tell lockdep we inherited freeze protection from submission thread */
+	__sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
+	__sb_end_write(inode->i_sb, SB_FREEZE_WRITE);
+
+	if (ret < 0) {
+		if (ki->term_func)
+			ki->term_func(ki->term_func_priv, ret);
+	} else {
+		if (ret == ki->len)
+			cachefiles_mark_content_map(ki->object, ki->start, ki->len);
+		if (ki->term_func)
+			ki->term_func(ki->term_func_priv, ret);
+	}
+
+	fscache_uncount_io_operation(ki->object->fscache.cookie);
+	cachefiles_put_kiocb(ki);
 }
 
 /*
@@ -39,11 +213,95 @@ int cachefiles_write(struct fscache_op_resources *opr,
 		     fscache_io_terminated_t term_func,
 		     void *term_func_priv)
 {
-	fscache_wait_for_operation(opr, FSCACHE_WANT_WRITE);
+	struct cachefiles_object *object =
+		container_of(opr->object, struct cachefiles_object, fscache);
+	struct cachefiles_kiocb *ki;
+	struct inode *inode;
+	struct file *file;
+	unsigned int old_nofs;
+	ssize_t ret = -ENOBUFS;
+	size_t len = iov_iter_count(iter);
+
+	spin_lock(&object->fscache.lock);
+	file = get_file(object->backing_file);
+	spin_unlock(&object->fscache.lock);
+
+	_enter("%pD,%li,%llx,%zx/%llx",
+	       file, file_inode(file)->i_ino, start_pos, len,
+	       i_size_read(file->f_inode));
+
+	__fscache_wait_for_operation(opr, FSCACHE_WANT_WRITE);
 	fscache_count_io_operation(opr->object->cookie);
+
+	ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL);
+	if (!ki)
+		goto presubmission_error;
+
+	refcount_set(&ki->ki_refcnt, 2);
+	ki->iocb.ki_filp	= file;
+	ki->iocb.ki_pos		= start_pos;
+	ki->iocb.ki_flags	= IOCB_DIRECT | IOCB_WRITE;
+	ki->iocb.ki_hint	= ki_hint_validate(file_write_hint(file));
+	ki->iocb.ki_ioprio	= get_current_ioprio();
+	ki->start		= start_pos;
+	ki->len			= len;
+	ki->object		= object;
+	ki->term_func		= term_func;
+	ki->term_func_priv	= term_func_priv;
+
+	if (ki->term_func)
+		ki->iocb.ki_complete = cachefiles_write_complete;
+
+	ret = rw_verify_area(WRITE, file, &ki->iocb.ki_pos, iov_iter_count(iter));
+	if (ret < 0)
+		goto presubmission_error_free;
+
+	/* Open-code file_start_write here to grab freeze protection, which
+	 * will be released by another thread in aio_complete_rw().  Fool
+	 * lockdep by telling it the lock got released so that it doesn't
+	 * complain about the held lock when we return to userspace.
+	 */
+	inode = file_inode(file);
+	__sb_start_write(inode->i_sb, SB_FREEZE_WRITE);
+	__sb_writers_release(inode->i_sb, SB_FREEZE_WRITE);
+
+	cachefiles_grab_object(&object->fscache, fscache_obj_get_ioreq);
+
+	old_nofs = memalloc_nofs_save();
+	ret = call_write_iter(file, &ki->iocb, iter);
+	memalloc_nofs_restore(old_nofs);
+	switch (ret) {
+	case -EIOCBQUEUED:
+		goto in_progress;
+
+	case -ERESTARTSYS:
+	case -ERESTARTNOINTR:
+	case -ERESTARTNOHAND:
+	case -ERESTART_RESTARTBLOCK:
+		/* There's no easy way to restart the syscall since other AIOs
+		 * may already be running.  Just fail this I/O with EINTR.
+		 */
+		ret = -EINTR;
+		fallthrough;
+	default:
+		cachefiles_write_complete(&ki->iocb, ret, 0);
+		if (ret > 0)
+			ret = 0;
+		break;
+	}
+
+in_progress:
+	cachefiles_put_kiocb(ki);
+	_leave(" = %zd", ret);
+	return ret;
+
+presubmission_error_free:
+	kfree(ki);
+presubmission_error:
+	fput(file);
 	if (term_func)
-		term_func(term_func_priv, -ENOBUFS);
-	return -ENOBUFS;
+		term_func(term_func_priv, ret);
+	return ret;
 }
 
 /*




* [RFC PATCH 45/76] cachefiles: Add I/O tracepoints
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (42 preceding siblings ...)
  2020-11-20 15:12 ` [RFC PATCH 44/76] cachefiles: Implement read and write parts of new I/O API David Howells
@ 2020-11-20 15:12 ` David Howells
  2020-11-20 15:12 ` [RFC PATCH 46/76] fscache: Display cache-specific data in /proc/fs/fscache/objects David Howells
                   ` (30 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:12 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Add some cache I/O tracepoints.
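
With these enabled, reads and writes appear in the trace buffer along the
lines of (values hypothetical):

    cachefiles_read: o=00000003 b=00004e2d s=30000 l=10000

i.e. the object debug ID, the backing inode number, and the start and
length in hex, matching the TP_printk() format strings below.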

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c         |    2 +
 fs/cachefiles/io.c                |    2 +
 include/trace/events/cachefiles.h |  125 +++++++++++++++++++++++++++++++++++++
 3 files changed, 129 insertions(+)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index c14e2b4f5b24..654c6efb5a0a 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -179,6 +179,7 @@ static void cachefiles_update_object(struct fscache_object *_object)
 			.dentry	= object->dentry
 		};
 		_debug("trunc %llx -> %llx", i_size, object_size);
+		trace_cachefiles_trunc(object, inode, i_size, object_size);
 		ret = vfs_truncate(&path, object_size);
 		if (ret < 0) {
 			cachefiles_io_error_obj(object, "Trunc-to-size failed");
@@ -190,6 +191,7 @@ static void cachefiles_update_object(struct fscache_object *_object)
 		i_size = i_size_read(inode);
 		_debug("trunc %llx -> %llx", i_size, object_size);
 		if (i_size < object_size) {
+			trace_cachefiles_trunc(object, inode, i_size, object_size);
 			ret = vfs_truncate(&path, object_size);
 			if (ret < 0) {
 				cachefiles_io_error_obj(object, "Trunc-to-dio-size failed");
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 4c66b9183dd6..c9866bd5e010 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -139,6 +139,7 @@ int cachefiles_read(struct fscache_op_resources *opr,
 
 	cachefiles_grab_object(&object->fscache, fscache_obj_get_ioreq);
 
+	trace_cachefiles_read(object, file_inode(file), ki->iocb.ki_pos, len - skipped);
 	old_nofs = memalloc_nofs_save();
 	ret = call_read_iter(file, &ki->iocb, iter);
 	memalloc_nofs_restore(old_nofs);
@@ -267,6 +268,7 @@ int cachefiles_write(struct fscache_op_resources *opr,
 
 	cachefiles_grab_object(&object->fscache, fscache_obj_get_ioreq);
 
+	trace_cachefiles_write(object, inode, ki->iocb.ki_pos, len);
 	old_nofs = memalloc_nofs_save();
 	ret = call_write_iter(file, &ki->iocb, iter);
 	memalloc_nofs_restore(old_nofs);
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index e7af1d683009..f9f48c0f5080 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -351,6 +351,131 @@ TRACE_EVENT(cachefiles_coherency,
 		      __entry->content)
 	    );
 
+TRACE_EVENT(cachefiles_read,
+	    TP_PROTO(struct cachefiles_object *obj,
+		     struct inode *backer,
+		     loff_t start,
+		     size_t len),
+
+	    TP_ARGS(obj, backer, start, len),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(unsigned int,			backer	)
+		    __field(size_t,				len	)
+		    __field(loff_t,				start	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->backer	= backer->i_ino;
+		    __entry->start	= start;
+		    __entry->len	= len;
+			   ),
+
+	    TP_printk("o=%08x b=%08x s=%llx l=%zx",
+		      __entry->obj,
+		      __entry->backer,
+		      __entry->start,
+		      __entry->len)
+	    );
+
+TRACE_EVENT(cachefiles_write,
+	    TP_PROTO(struct cachefiles_object *obj,
+		     struct inode *backer,
+		     loff_t start,
+		     size_t len),
+
+	    TP_ARGS(obj, backer, start, len),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(unsigned int,			backer	)
+		    __field(size_t,				len	)
+		    __field(loff_t,				start	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->backer	= backer->i_ino;
+		    __entry->start	= start;
+		    __entry->len	= len;
+			   ),
+
+	    TP_printk("o=%08x b=%08x s=%llx l=%zx",
+		      __entry->obj,
+		      __entry->backer,
+		      __entry->start,
+		      __entry->len)
+	    );
+
+TRACE_EVENT(cachefiles_trunc,
+	    TP_PROTO(struct cachefiles_object *obj, struct inode *backer,
+		     loff_t from, loff_t to),
+
+	    TP_ARGS(obj, backer, from, to),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(unsigned int,			backer	)
+		    __field(loff_t,				from	)
+		    __field(loff_t,				to	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->backer	= backer->i_ino;
+		    __entry->from	= from;
+		    __entry->to		= to;
+			   ),
+
+	    TP_printk("o=%08x b=%08x l=%llx->%llx",
+		      __entry->obj,
+		      __entry->backer,
+		      __entry->from,
+		      __entry->to)
+	    );
+
+TRACE_EVENT(cachefiles_tmpfile,
+	    TP_PROTO(struct cachefiles_object *obj, struct inode *backer),
+
+	    TP_ARGS(obj, backer),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(unsigned int,			backer	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->backer	= backer->i_ino;
+			   ),
+
+	    TP_printk("o=%08x b=%08x",
+		      __entry->obj,
+		      __entry->backer)
+	    );
+
+TRACE_EVENT(cachefiles_link,
+	    TP_PROTO(struct cachefiles_object *obj, struct inode *backer),
+
+	    TP_ARGS(obj, backer),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,			obj	)
+		    __field(unsigned int,			backer	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->obj	= obj->fscache.debug_id;
+		    __entry->backer	= backer->i_ino;
+			   ),
+
+	    TP_printk("o=%08x b=%08x",
+		      __entry->obj,
+		      __entry->backer)
+	    );
+
 #endif /* _TRACE_CACHEFILES_H */
 
 /* This part must be outside protection */




* [RFC PATCH 46/76] fscache: Display cache-specific data in /proc/fs/fscache/objects
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (43 preceding siblings ...)
  2020-11-20 15:12 ` [RFC PATCH 45/76] cachefiles: Add I/O tracepoints David Howells
@ 2020-11-20 15:12 ` David Howells
  2020-11-20 15:12 ` [RFC PATCH 47/76] fscache: Remove more obsolete stats David Howells
                   ` (29 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:12 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Allow the cache backend to display cache-specific information in
/proc/fs/fscache/objects in place of the cookie key and aux data, which
can instead be viewed in the /proc/fs/fscache/cookies file.
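
For example (illustrative output, not from a real run): with cachefiles
providing ->display_object(), a data-storage object's line might now end
with a content-state marker:

	... DT 4 60 <a>

where <n>, <s> and <a> indicate no data, a single chunk and all data
present respectively; a granular object displays its content map as a
hex string instead.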

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/content-map.c   |   41 +++++++++++++++++++++++++++++++++++++++++
 fs/cachefiles/interface.c     |    1 +
 fs/cachefiles/internal.h      |    1 +
 fs/fscache/object-list.c      |   33 +++------------------------------
 include/linux/fscache-cache.h |    4 ++++
 5 files changed, 50 insertions(+), 30 deletions(-)

diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index 4ff8c645817c..4c8dc11d2d4c 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -408,3 +408,44 @@ void cachefiles_save_content_map(struct cachefiles_object *object)
 
 	_leave(" = %zd", ret);
 }
+
+/*
+ * Display object information in proc.
+ */
+int cachefiles_display_object(struct seq_file *m, struct fscache_object *_object)
+{
+	struct cachefiles_object *object =
+		container_of(_object, struct cachefiles_object, fscache);
+
+	if (object->fscache.cookie->type == FSCACHE_COOKIE_TYPE_INDEX) {
+		if (object->content_info != CACHEFILES_CONTENT_NO_DATA)
+			seq_printf(m, " ???%u???", object->content_info);
+	} else {
+		switch (object->content_info) {
+		case CACHEFILES_CONTENT_NO_DATA:
+			seq_puts(m, " <n>");
+			break;
+		case CACHEFILES_CONTENT_SINGLE:
+			seq_puts(m, " <s>");
+			break;
+		case CACHEFILES_CONTENT_ALL:
+			seq_puts(m, " <a>");
+			break;
+		case CACHEFILES_CONTENT_MAP:
+			read_lock_bh(&object->content_map_lock);
+			if (object->content_map) {
+				seq_printf(m, " %*phN",
+					   object->content_map_size,
+					   object->content_map);
+			}
+			read_unlock_bh(&object->content_map_lock);
+			break;
+		default:
+			seq_printf(m, " <%u>", object->content_info);
+			break;
+		}
+	}
+
+	seq_putc(m, '\n');
+	return 0;
+}
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 654c6efb5a0a..c083d9833b87 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -502,4 +502,5 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
 	.get_object_usage	= cachefiles_get_object_usage,
 	.sync_cache		= cachefiles_sync_cache,
 	.begin_operation	= cachefiles_begin_operation,
+	.display_object		= cachefiles_display_object,
 };
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 24a8aed2600d..b58177f65135 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -135,6 +135,7 @@ extern void cachefiles_expand_content_map(struct cachefiles_object *object, loff
 extern void cachefiles_shorten_content_map(struct cachefiles_object *object, loff_t new_size);
 extern bool cachefiles_load_content_map(struct cachefiles_object *object);
 extern void cachefiles_save_content_map(struct cachefiles_object *object);
+extern int cachefiles_display_object(struct seq_file *m, struct fscache_object *object);
 
 /*
  * daemon.c
diff --git a/fs/fscache/object-list.c b/fs/fscache/object-list.c
index 5777f909d31a..361610e124bd 100644
--- a/fs/fscache/object-list.c
+++ b/fs/fscache/object-list.c
@@ -155,7 +155,6 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 	struct fscache_cookie *cookie;
 	unsigned long config = data->config;
 	char _type[3], *type;
-	u8 *p;
 
 	if ((unsigned long) v == 1) {
 		seq_puts(m, "OBJECT   PARENT   USE CHLDN OPS FL  S"
@@ -201,8 +200,6 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 		   obj->stage);
 
 	if (obj->cookie) {
-		uint16_t keylen = 0, auxlen = 0;
-
 		switch (cookie->type) {
 		case 0:
 			type = "IX";
@@ -211,8 +208,7 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 			type = "DT";
 			break;
 		default:
-			snprintf(_type, sizeof(_type), "%02u",
-				 cookie->type);
+			snprintf(_type, sizeof(_type), "%02x", cookie->type);
 			type = _type;
 			break;
 		}
@@ -223,34 +219,11 @@ static int fscache_objlist_show(struct seq_file *m, void *v)
 			   type,
 			   cookie->stage,
 			   cookie->flags);
-
-		if (config & FSCACHE_OBJLIST_CONFIG_KEY)
-			keylen = cookie->key_len;
-
-		if (config & FSCACHE_OBJLIST_CONFIG_AUX)
-			auxlen = cookie->aux_len;
-
-		if (keylen > 0 || auxlen > 0) {
-			seq_puts(m, " ");
-			p = keylen <= sizeof(cookie->inline_key) ?
-				cookie->inline_key : cookie->key;
-			for (; keylen > 0; keylen--)
-				seq_printf(m, "%02x", *p++);
-			if (auxlen > 0) {
-				if (config & FSCACHE_OBJLIST_CONFIG_KEY)
-					seq_puts(m, ", ");
-				p = auxlen <= sizeof(cookie->inline_aux) ?
-					cookie->inline_aux : cookie->aux;
-				for (; auxlen > 0; auxlen--)
-					seq_printf(m, "%02x", *p++);
-			}
-		}
-
-		seq_puts(m, "\n");
 	} else {
 		seq_puts(m, "<no_netfs>\n");
 	}
-	return 0;
+
+	return obj->cache->ops->display_object(m, obj);
 }
 
 static const struct seq_operations fscache_objlist_ops = {
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 97415d19aa82..54625464a109 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -19,6 +19,7 @@
 
 #define NR_MAXCACHES BITS_PER_LONG
 
+struct seq_file;
 struct fscache_cache;
 struct fscache_cache_ops;
 struct fscache_object;
@@ -139,6 +140,9 @@ struct fscache_cache_ops {
 
 	/* Begin an operation on a cache object */
 	void (*begin_operation)(struct fscache_op_resources *opr);
+
+	/* Display object info in /proc/fs/fscache/objects */
+	int (*display_object)(struct seq_file *m, struct fscache_object *object);
 };
 
 extern struct fscache_cookie fscache_fsdef_index;




* [RFC PATCH 47/76] fscache: Remove more obsolete stats
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (44 preceding siblings ...)
  2020-11-20 15:12 ` [RFC PATCH 46/76] fscache: Display cache-specific data in /proc/fs/fscache/objects David Howells
@ 2020-11-20 15:12 ` David Howells
  2020-11-20 15:12 ` [RFC PATCH 48/76] fscache: Always create /proc/fs/fscache/stats if configured David Howells
                   ` (28 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:12 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Remove some more stats that have become obsolete.
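
After this change the affected lines in /proc/fs/fscache/stats come out
along these lines (illustrative values):

	Lookups: n=37 neg=27 pos=10 crt=27
	Invals : n=1
	Updates: n=4 run=4
	Relinqs: n=35 rtr=0
	CacheOp: alo=0 luo=0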

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/internal.h |   20 ++---------------
 fs/fscache/obj.c      |    6 +++--
 fs/fscache/stats.c    |   58 +++++++++----------------------------------------
 3 files changed, 16 insertions(+), 68 deletions(-)

diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 2c1f4151c092..ae05f636faac 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -175,32 +175,18 @@ extern void fscache_proc_cleanup(void);
  * stats.c
  */
 #ifdef CONFIG_FSCACHE_STATS
-extern atomic_t fscache_n_op_pend;
-extern atomic_t fscache_n_op_run;
-extern atomic_t fscache_n_op_enqueue;
-extern atomic_t fscache_n_op_deferred_release;
-extern atomic_t fscache_n_op_initialised;
-extern atomic_t fscache_n_op_release;
-extern atomic_t fscache_n_op_gc;
-extern atomic_t fscache_n_op_cancelled;
-extern atomic_t fscache_n_op_rejected;
-
 extern atomic_t fscache_n_acquires;
 extern atomic_t fscache_n_acquires_null;
 extern atomic_t fscache_n_acquires_no_cache;
 extern atomic_t fscache_n_acquires_ok;
-extern atomic_t fscache_n_acquires_nobufs;
 extern atomic_t fscache_n_acquires_oom;
 
 extern atomic_t fscache_n_invalidates;
-extern atomic_t fscache_n_invalidates_run;
 
 extern atomic_t fscache_n_updates;
-extern atomic_t fscache_n_updates_null;
 extern atomic_t fscache_n_updates_run;
 
 extern atomic_t fscache_n_relinquishes;
-extern atomic_t fscache_n_relinquishes_null;
 extern atomic_t fscache_n_relinquishes_retire;
 
 extern atomic_t fscache_n_cookie_index;
@@ -212,15 +198,13 @@ extern atomic_t fscache_n_object_no_alloc;
 extern atomic_t fscache_n_object_lookups;
 extern atomic_t fscache_n_object_lookups_negative;
 extern atomic_t fscache_n_object_lookups_positive;
-extern atomic_t fscache_n_object_lookups_timed_out;
-extern atomic_t fscache_n_object_created;
+extern atomic_t fscache_n_object_creates;
 extern atomic_t fscache_n_object_avail;
 extern atomic_t fscache_n_object_dead;
 
 extern atomic_t fscache_n_cop_alloc_object;
 extern atomic_t fscache_n_cop_lookup_object;
-extern atomic_t fscache_n_cop_lookup_complete;
-extern atomic_t fscache_n_cop_grab_object;
+extern atomic_t fscache_n_cop_create_object;
 extern atomic_t fscache_n_cop_invalidate_object;
 extern atomic_t fscache_n_cop_update_object;
 extern atomic_t fscache_n_cop_drop_object;
diff --git a/fs/fscache/obj.c b/fs/fscache/obj.c
index 7286d3e2eb31..23598bec639e 100644
--- a/fs/fscache/obj.c
+++ b/fs/fscache/obj.c
@@ -47,10 +47,10 @@ static int fscache_do_lookup_object(struct fscache_object *object, void *data)
 static int fscache_do_create_object(struct fscache_object *object, void *data)
 {
 	int ret;
-	fscache_stat(&fscache_n_object_lookups);
-	fscache_stat(&fscache_n_cop_lookup_object);
+	fscache_stat(&fscache_n_object_creates);
+	fscache_stat(&fscache_n_cop_create_object);
 	ret = object->cache->ops->create_object(object, data);
-	fscache_stat_d(&fscache_n_cop_lookup_object);
+	fscache_stat_d(&fscache_n_cop_create_object);
 	return ret;
 }
 
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 583817f4f113..dffe6925aadb 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -14,32 +14,18 @@
 /*
  * operation counters
  */
-atomic_t fscache_n_op_pend;
-atomic_t fscache_n_op_run;
-atomic_t fscache_n_op_enqueue;
-atomic_t fscache_n_op_deferred_release;
-atomic_t fscache_n_op_initialised;
-atomic_t fscache_n_op_release;
-atomic_t fscache_n_op_gc;
-atomic_t fscache_n_op_cancelled;
-atomic_t fscache_n_op_rejected;
-
 atomic_t fscache_n_acquires;
 atomic_t fscache_n_acquires_null;
 atomic_t fscache_n_acquires_no_cache;
 atomic_t fscache_n_acquires_ok;
-atomic_t fscache_n_acquires_nobufs;
 atomic_t fscache_n_acquires_oom;
 
 atomic_t fscache_n_invalidates;
-atomic_t fscache_n_invalidates_run;
 
 atomic_t fscache_n_updates;
-atomic_t fscache_n_updates_null;
 atomic_t fscache_n_updates_run;
 
 atomic_t fscache_n_relinquishes;
-atomic_t fscache_n_relinquishes_null;
 atomic_t fscache_n_relinquishes_retire;
 
 atomic_t fscache_n_cookie_index;
@@ -51,15 +37,13 @@ atomic_t fscache_n_object_no_alloc;
 atomic_t fscache_n_object_lookups;
 atomic_t fscache_n_object_lookups_negative;
 atomic_t fscache_n_object_lookups_positive;
-atomic_t fscache_n_object_lookups_timed_out;
-atomic_t fscache_n_object_created;
+atomic_t fscache_n_object_creates;
 atomic_t fscache_n_object_avail;
 atomic_t fscache_n_object_dead;
 
 atomic_t fscache_n_cop_alloc_object;
 atomic_t fscache_n_cop_lookup_object;
-atomic_t fscache_n_cop_lookup_complete;
-atomic_t fscache_n_cop_grab_object;
+atomic_t fscache_n_cop_create_object;
 atomic_t fscache_n_cop_invalidate_object;
 atomic_t fscache_n_cop_update_object;
 atomic_t fscache_n_cop_drop_object;
@@ -90,53 +74,33 @@ int fscache_stats_show(struct seq_file *m, void *v)
 		   atomic_read(&fscache_n_object_avail),
 		   atomic_read(&fscache_n_object_dead));
 
-	seq_printf(m, "Acquire: n=%u nul=%u noc=%u ok=%u nbf=%u"
-		   " oom=%u\n",
+	seq_printf(m, "Acquire: n=%u nul=%u noc=%u ok=%u oom=%u\n",
 		   atomic_read(&fscache_n_acquires),
 		   atomic_read(&fscache_n_acquires_null),
 		   atomic_read(&fscache_n_acquires_no_cache),
 		   atomic_read(&fscache_n_acquires_ok),
-		   atomic_read(&fscache_n_acquires_nobufs),
 		   atomic_read(&fscache_n_acquires_oom));
 
-	seq_printf(m, "Lookups: n=%u neg=%u pos=%u crt=%u tmo=%u\n",
+	seq_printf(m, "Lookups: n=%u neg=%u pos=%u crt=%u\n",
 		   atomic_read(&fscache_n_object_lookups),
 		   atomic_read(&fscache_n_object_lookups_negative),
 		   atomic_read(&fscache_n_object_lookups_positive),
-		   atomic_read(&fscache_n_object_created),
-		   atomic_read(&fscache_n_object_lookups_timed_out));
+		   atomic_read(&fscache_n_object_creates));
 
-	seq_printf(m, "Invals : n=%u run=%u\n",
-		   atomic_read(&fscache_n_invalidates),
-		   atomic_read(&fscache_n_invalidates_run));
+	seq_printf(m, "Invals : n=%u\n",
+		   atomic_read(&fscache_n_invalidates));
 
-	seq_printf(m, "Updates: n=%u nul=%u run=%u\n",
+	seq_printf(m, "Updates: n=%u run=%u\n",
 		   atomic_read(&fscache_n_updates),
-		   atomic_read(&fscache_n_updates_null),
 		   atomic_read(&fscache_n_updates_run));
 
-	seq_printf(m, "Relinqs: n=%u nul=%u rtr=%u\n",
+	seq_printf(m, "Relinqs: n=%u rtr=%u\n",
 		   atomic_read(&fscache_n_relinquishes),
-		   atomic_read(&fscache_n_relinquishes_null),
 		   atomic_read(&fscache_n_relinquishes_retire));
 
-	seq_printf(m, "Ops    : pend=%u run=%u enq=%u can=%u rej=%u\n",
-		   atomic_read(&fscache_n_op_pend),
-		   atomic_read(&fscache_n_op_run),
-		   atomic_read(&fscache_n_op_enqueue),
-		   atomic_read(&fscache_n_op_cancelled),
-		   atomic_read(&fscache_n_op_rejected));
-	seq_printf(m, "Ops    : ini=%u dfr=%u rel=%u gc=%u\n",
-		   atomic_read(&fscache_n_op_initialised),
-		   atomic_read(&fscache_n_op_deferred_release),
-		   atomic_read(&fscache_n_op_release),
-		   atomic_read(&fscache_n_op_gc));
-
-	seq_printf(m, "CacheOp: alo=%d luo=%d luc=%d gro=%d\n",
+	seq_printf(m, "CacheOp: alo=%d luo=%d\n",
 		   atomic_read(&fscache_n_cop_alloc_object),
-		   atomic_read(&fscache_n_cop_lookup_object),
-		   atomic_read(&fscache_n_cop_lookup_complete),
-		   atomic_read(&fscache_n_cop_grab_object));
+		   atomic_read(&fscache_n_cop_lookup_object));
 	seq_printf(m, "CacheOp: inv=%d upo=%d dro=%d pto=%d atc=%d syn=%d\n",
 		   atomic_read(&fscache_n_cop_invalidate_object),
 		   atomic_read(&fscache_n_cop_update_object),




* [RFC PATCH 48/76] fscache: Always create /proc/fs/fscache/stats if configured
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (45 preceding siblings ...)
  2020-11-20 15:12 ` [RFC PATCH 47/76] fscache: Remove more obsolete stats David Howells
@ 2020-11-20 15:12 ` David Howells
  2020-11-20 15:13 ` [RFC PATCH 49/76] netfs: Stats David Howells
                   ` (27 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:12 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Split the fscache initialisation so that /proc/fs/fscache/stats is always
created if CONFIG_FSCACHE_STATS=y, thereby allowing the stats file to be
used when only the read helper is enabled and the rest of the caching
infrastructure is not.
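
With this split, the stats file should be available as soon as the
module has initialised, even before any cache is brought online, e.g.:

	# cat /proc/fs/fscache/stats
	FS-Cache statistics
	...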

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/Makefile     |    3 +
 fs/fscache/cache_init.c |  139 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/fscache/internal.h   |   12 +++-
 fs/fscache/main.c       |  115 ++++-----------------------------------
 fs/fscache/proc.c       |   52 +-----------------
 fs/fscache/stats.c      |    8 +++
 6 files changed, 174 insertions(+), 155 deletions(-)
 create mode 100644 fs/fscache/cache_init.c

diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
index 3caf66810e7b..5d3284f6fe2d 100644
--- a/fs/fscache/Makefile
+++ b/fs/fscache/Makefile
@@ -5,6 +5,7 @@
 
 fscache-y := \
 	cache.o \
+	cache_init.o \
 	cookie.o \
 	dispatcher.o \
 	fsdef.o \
@@ -15,8 +16,8 @@ fscache-y := \
 	object_bits.o
 
 fscache-$(CONFIG_PROC_FS) += proc.o
-fscache-$(CONFIG_FSCACHE_STATS) += stats.o
 fscache-$(CONFIG_FSCACHE_HISTOGRAM) += histogram.o
 fscache-$(CONFIG_FSCACHE_OBJECT_LIST) += object-list.o
+fscache-$(CONFIG_FSCACHE_STATS) += stats.o
 
 obj-$(CONFIG_FSCACHE) := fscache.o
diff --git a/fs/fscache/cache_init.c b/fs/fscache/cache_init.c
new file mode 100644
index 000000000000..8cade2e00050
--- /dev/null
+++ b/fs/fscache/cache_init.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* General filesystem local caching manager
+ *
+ * Copyright (C) 2004-2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#define FSCACHE_DEBUG_LEVEL CACHE
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/completion.h>
+#include <linux/slab.h>
+#include <linux/seq_file.h>
+#define CREATE_TRACE_POINTS
+#include "internal.h"
+
+struct kobject *fscache_root;
+struct workqueue_struct *fscache_op_wq;
+
+/* these values serve as lower bounds, will be adjusted in fscache_init_caching() */
+static unsigned fscache_object_max_active = 4;
+static unsigned fscache_op_max_active = 2;
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table_header *fscache_sysctl_header;
+
+static int fscache_max_active_sysctl(struct ctl_table *table, int write,
+				     void *buffer, size_t *lenp, loff_t *ppos)
+{
+	struct workqueue_struct **wqp = table->extra1;
+	unsigned int *datap = table->data;
+	int ret;
+
+	ret = proc_dointvec(table, write, buffer, lenp, ppos);
+	if (ret == 0)
+		workqueue_set_max_active(*wqp, *datap);
+	return ret;
+}
+
+static struct ctl_table fscache_sysctls[] = {
+	{
+		.procname	= "operation_max_active",
+		.data		= &fscache_op_max_active,
+		.maxlen		= sizeof(unsigned),
+		.mode		= 0644,
+		.proc_handler	= fscache_max_active_sysctl,
+		.extra1		= &fscache_op_wq,
+	},
+	{}
+};
+
+static struct ctl_table fscache_sysctls_root[] = {
+	{
+		.procname	= "fscache",
+		.mode		= 0555,
+		.child		= fscache_sysctls,
+	},
+	{}
+};
+#endif
+
+/*
+ * Initialise the caching code.
+ */
+int __init fscache_init_caching(void)
+{
+	int ret;
+
+	fscache_op_max_active =
+		clamp_val(fscache_object_max_active / 2,
+			  fscache_op_max_active, WQ_UNBOUND_MAX_ACTIVE);
+
+	ret = -ENOMEM;
+	fscache_op_wq = alloc_workqueue("fscache_operation", WQ_UNBOUND,
+					fscache_op_max_active);
+	if (!fscache_op_wq)
+		goto error_op_wq;
+
+	ret = fscache_init_dispatchers();
+	if (ret < 0)
+		goto error_dispatchers;
+
+	ret = fscache_proc_caching_init();
+	if (ret < 0)
+		goto error_proc;
+
+#ifdef CONFIG_SYSCTL
+	ret = -ENOMEM;
+	fscache_sysctl_header = register_sysctl_table(fscache_sysctls_root);
+	if (!fscache_sysctl_header)
+		goto error_sysctl;
+#endif
+
+	fscache_cookie_jar = kmem_cache_create("fscache_cookie_jar",
+					       sizeof(struct fscache_cookie),
+					       0, 0, NULL);
+	if (!fscache_cookie_jar) {
+		pr_notice("Failed to allocate a cookie jar\n");
+		ret = -ENOMEM;
+		goto error_cookie_jar;
+	}
+
+	fscache_root = kobject_create_and_add("fscache", kernel_kobj);
+	if (!fscache_root) {
+		ret = -ENOMEM;
+		goto error_kobj;
+	}
+
+	return 0;
+
+error_kobj:
+	kmem_cache_destroy(fscache_cookie_jar);
+error_cookie_jar:
+#ifdef CONFIG_SYSCTL
+	unregister_sysctl_table(fscache_sysctl_header);
+error_sysctl:
+#endif
+error_proc:
+	fscache_kill_dispatchers();
+error_dispatchers:
+	destroy_workqueue(fscache_op_wq);
+error_op_wq:
+	return ret;
+}
+
+/*
+ * clean up on module removal
+ */
+void __exit fscache_exit_caching(void)
+{
+	_enter("");
+
+	kobject_put(fscache_root);
+	kmem_cache_destroy(fscache_cookie_jar);
+#ifdef CONFIG_SYSCTL
+	unregister_sysctl_table(fscache_sysctl_header);
+#endif
+	fscache_kill_dispatchers();
+	destroy_workqueue(fscache_op_wq);
+}
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index ae05f636faac..1721823b8cac 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -56,6 +56,12 @@ static inline void fscache_put_cache_tag(struct fscache_cache_tag *tag)
 		kfree(tag);
 }
 
+/*
+ * cache_init.c
+ */
+extern int __init fscache_init_caching(void);
+extern void __exit fscache_exit_caching(void);
+
 /*
  * cookie.c
  */
@@ -164,11 +170,9 @@ extern void fscache_objlist_remove(struct fscache_object *);
  * proc.c
  */
 #ifdef CONFIG_PROC_FS
-extern int __init fscache_proc_init(void);
-extern void fscache_proc_cleanup(void);
+extern int __init fscache_proc_caching_init(void);
 #else
-#define fscache_proc_init()	(0)
-#define fscache_proc_cleanup()	do {} while (0)
+#define fscache_proc_caching_init()	(0)
 #endif
 
 /*
@@ -230,11 +234,13 @@ static inline void fscache_stat_d(atomic_t *stat)
 #define __fscache_stat(stat) (stat)
 
 int fscache_stats_show(struct seq_file *m, void *v);
+extern int __init fscache_proc_stats_init(void);
 #else
 
 #define __fscache_stat(stat) (NULL)
 #define fscache_stat(stat) do {} while (0)
 #define fscache_stat_d(stat) do {} while (0)
+#define fscache_proc_stats_init() 0
 #endif
 
 static inline
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index 003e53d17245..9bdcd9557aa6 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -11,8 +11,7 @@
 #include <linux/sched.h>
 #include <linux/completion.h>
 #include <linux/slab.h>
-#include <linux/seq_file.h>
-#define CREATE_TRACE_POINTS
+#include <linux/proc_fs.h>
 #include "internal.h"
 
 MODULE_DESCRIPTION("FS Cache Manager");
@@ -25,115 +24,31 @@ module_param_named(debug, fscache_debug, uint,
 MODULE_PARM_DESC(fscache_debug,
 		 "FS-Cache debugging mask");
 
-struct kobject *fscache_root;
-struct workqueue_struct *fscache_op_wq;
-
-/* these values serve as lower bounds, will be adjusted in fscache_init() */
-static unsigned fscache_object_max_active = 4;
-static unsigned fscache_op_max_active = 2;
-
-#ifdef CONFIG_SYSCTL
-static struct ctl_table_header *fscache_sysctl_header;
-
-static int fscache_max_active_sysctl(struct ctl_table *table, int write,
-				     void *buffer, size_t *lenp, loff_t *ppos)
-{
-	struct workqueue_struct **wqp = table->extra1;
-	unsigned int *datap = table->data;
-	int ret;
-
-	ret = proc_dointvec(table, write, buffer, lenp, ppos);
-	if (ret == 0)
-		workqueue_set_max_active(*wqp, *datap);
-	return ret;
-}
-
-static struct ctl_table fscache_sysctls[] = {
-	{
-		.procname	= "operation_max_active",
-		.data		= &fscache_op_max_active,
-		.maxlen		= sizeof(unsigned),
-		.mode		= 0644,
-		.proc_handler	= fscache_max_active_sysctl,
-		.extra1		= &fscache_op_wq,
-	},
-	{}
-};
-
-static struct ctl_table fscache_sysctls_root[] = {
-	{
-		.procname	= "fscache",
-		.mode		= 0555,
-		.child		= fscache_sysctls,
-	},
-	{}
-};
-#endif
-
 /*
- * initialise the fs caching module
+ * Initialise the module
  */
 static int __init fscache_init(void)
 {
 	int ret;
 
-	fscache_op_max_active =
-		clamp_val(fscache_object_max_active / 2,
-			  fscache_op_max_active, WQ_UNBOUND_MAX_ACTIVE);
-
-	ret = -ENOMEM;
-	fscache_op_wq = alloc_workqueue("fscache_operation", WQ_UNBOUND,
-					fscache_op_max_active);
-	if (!fscache_op_wq)
-		goto error_op_wq;
+	if (!proc_mkdir("fs/fscache", NULL))
+		return -ENOMEM;
 
-	ret = fscache_init_dispatchers();
+	ret = fscache_proc_stats_init();
 	if (ret < 0)
-		goto error_dispatchers;
+		goto error;
 
-	ret = fscache_proc_init();
+	ret = fscache_init_caching();
 	if (ret < 0)
-		goto error_proc;
-
-#ifdef CONFIG_SYSCTL
-	ret = -ENOMEM;
-	fscache_sysctl_header = register_sysctl_table(fscache_sysctls_root);
-	if (!fscache_sysctl_header)
-		goto error_sysctl;
-#endif
-
-	fscache_cookie_jar = kmem_cache_create("fscache_cookie_jar",
-					       sizeof(struct fscache_cookie),
-					       0, 0, NULL);
-	if (!fscache_cookie_jar) {
-		pr_notice("Failed to allocate a cookie jar\n");
-		ret = -ENOMEM;
-		goto error_cookie_jar;
-	}
-
-	fscache_root = kobject_create_and_add("fscache", kernel_kobj);
-	if (!fscache_root)
-		goto error_kobj;
+		goto error;
 
 	pr_notice("Loaded\n");
 	return 0;
 
-error_kobj:
-	kmem_cache_destroy(fscache_cookie_jar);
-error_cookie_jar:
-#ifdef CONFIG_SYSCTL
-	unregister_sysctl_table(fscache_sysctl_header);
-error_sysctl:
-#endif
-	fscache_kill_dispatchers();
-error_dispatchers:
-	fscache_proc_cleanup();
-error_proc:
-	destroy_workqueue(fscache_op_wq);
-error_op_wq:
+error:
+	remove_proc_subtree("fs/fscache", NULL);
 	return ret;
 }
-
 fs_initcall(fscache_init);
 
 /*
@@ -143,14 +58,8 @@ static void __exit fscache_exit(void)
 {
 	_enter("");
 
-	kobject_put(fscache_root);
-	kmem_cache_destroy(fscache_cookie_jar);
-#ifdef CONFIG_SYSCTL
-	unregister_sysctl_table(fscache_sysctl_header);
-#endif
-	fscache_proc_cleanup();
-	fscache_kill_dispatchers();
-	destroy_workqueue(fscache_op_wq);
+	remove_proc_subtree("fs/fscache", NULL);
+	fscache_exit_caching();
 	pr_notice("Unloaded\n");
 }
 
diff --git a/fs/fscache/proc.c b/fs/fscache/proc.c
index 729d083f1e91..7e156b21bb1d 100644
--- a/fs/fscache/proc.c
+++ b/fs/fscache/proc.c
@@ -14,67 +14,23 @@
 /*
  * initialise the /proc/fs/fscache/ directory
  */
-int __init fscache_proc_init(void)
+int __init fscache_proc_caching_init(void)
 {
-	if (!proc_mkdir("fs/fscache", NULL))
-		goto error_dir;
-
 	if (!proc_create_seq("fs/fscache/cookies", S_IFREG | 0444, NULL,
 			     &fscache_cookies_seq_ops))
-		goto error_cookies;
-
-#ifdef CONFIG_FSCACHE_STATS
-	if (!proc_create_single("fs/fscache/stats", S_IFREG | 0444, NULL,
-			fscache_stats_show))
-		goto error_stats;
-#endif
+		return -ENOMEM;
 
 #ifdef CONFIG_FSCACHE_HISTOGRAM
 	if (!proc_create_seq("fs/fscache/histogram", S_IFREG | 0444, NULL,
 			 &fscache_histogram_ops))
-		goto error_histogram;
+		return -ENOMEM;
 #endif
 
 #ifdef CONFIG_FSCACHE_OBJECT_LIST
 	if (!proc_create("fs/fscache/objects", S_IFREG | 0444, NULL,
 			 &fscache_objlist_proc_ops))
-		goto error_objects;
+		return -ENOMEM;
 #endif
 
 	return 0;
-
-#ifdef CONFIG_FSCACHE_OBJECT_LIST
-error_objects:
-#endif
-#ifdef CONFIG_FSCACHE_HISTOGRAM
-	remove_proc_entry("fs/fscache/histogram", NULL);
-error_histogram:
-#endif
-#ifdef CONFIG_FSCACHE_STATS
-	remove_proc_entry("fs/fscache/stats", NULL);
-error_stats:
-#endif
-	remove_proc_entry("fs/fscache/cookies", NULL);
-error_cookies:
-	remove_proc_entry("fs/fscache", NULL);
-error_dir:
-	return -ENOMEM;
-}
-
-/*
- * clean up the /proc/fs/fscache/ directory
- */
-void fscache_proc_cleanup(void)
-{
-#ifdef CONFIG_FSCACHE_OBJECT_LIST
-	remove_proc_entry("fs/fscache/objects", NULL);
-#endif
-#ifdef CONFIG_FSCACHE_HISTOGRAM
-	remove_proc_entry("fs/fscache/histogram", NULL);
-#endif
-#ifdef CONFIG_FSCACHE_STATS
-	remove_proc_entry("fs/fscache/stats", NULL);
-#endif
-	remove_proc_entry("fs/fscache/cookies", NULL);
-	remove_proc_entry("fs/fscache", NULL);
 }
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index dffe6925aadb..bf2935571de5 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -115,3 +115,11 @@ int fscache_stats_show(struct seq_file *m, void *v)
 		   atomic_read(&fscache_n_cache_culled_objects));
 	return 0;
 }
+
+int __init fscache_proc_stats_init(void)
+{
+	if (!proc_create_single("fs/fscache/stats", S_IFREG | 0444, NULL,
+			fscache_stats_show))
+		return -ENOMEM;
+	return 0;
+}




* [RFC PATCH 49/76] netfs: Stats
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (46 preceding siblings ...)
  2020-11-20 15:12 ` [RFC PATCH 48/76] fscache: Always create /proc/fs/fscache/stats if configured David Howells
@ 2020-11-20 15:13 ` David Howells
  2020-11-20 15:13 ` [RFC PATCH 50/76] fscache: New stats David Howells
                   ` (26 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:13 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Add statistics gathering to the netfs read helper library and a
netfs_stats_show() function by which a user of the library can have them
displayed.
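
Once a displayer calls netfs_stats_show() (fscache wires this into
/proc/fs/fscache/stats in a later patch), the counters come out along
these lines (illustrative values):

	RdHelp : RA=1728 RP=2 rr=1730 sr=1811
	RdHelp : ZR=0 sh=0
	RdHelp : DL=579 ds=579 df=0 di=0
	RdHelp : RD=1232 rs=1232 rf=0
	RdHelp : WR=1232 ws=1232 wf=0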
---

 fs/netfs/Kconfig       |   15 ++++++++++++++
 fs/netfs/Makefile      |    3 +--
 fs/netfs/internal.h    |   34 +++++++++++++++++++++++++++++++
 fs/netfs/read_helper.c |   32 +++++++++++++++++++++++++++--
 fs/netfs/stats.c       |   53 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/netfs.h  |    1 +
 6 files changed, 134 insertions(+), 4 deletions(-)
 create mode 100644 fs/netfs/stats.c

diff --git a/fs/netfs/Kconfig b/fs/netfs/Kconfig
index 2ebf90e6ca95..578112713703 100644
--- a/fs/netfs/Kconfig
+++ b/fs/netfs/Kconfig
@@ -6,3 +6,18 @@ config NETFS_SUPPORT
 	  This option enables support for network filesystems, including
 	  helpers for high-level buffered I/O, abstracting out read
 	  segmentation, local caching and transparent huge page support.
+
+config NETFS_STATS
+	bool "Gather statistical information on local caching"
+	depends on NETFS_SUPPORT && PROC_FS
+	help
+	  This option causes statistical information to be gathered on local
+	  caching and exported through file:
+
+		/proc/fs/fscache/stats
+
+	  The gathering of statistics adds a certain amount of overhead to
+	  execution as there are quite a few stats gathered, and on a
+	  multi-CPU system these may be on cachelines that keep bouncing
+	  between CPUs.  On the other hand, the stats are very useful for
+	  debugging purposes.  Saying 'Y' here is recommended.
diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index 4b4eff2ba369..c15bfc966d96 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -1,6 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 
-netfs-y := \
-	read_helper.o
+netfs-y := read_helper.o stats.o
 
 obj-$(CONFIG_NETFS_SUPPORT) := netfs.o
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index b07f43f9130e..d83317b1eb9d 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -16,8 +16,42 @@
  */
 extern unsigned int netfs_debug;
 
+/*
+ * stats.c
+ */
+#ifdef CONFIG_NETFS_STATS
+extern atomic_t netfs_n_rh_readahead;
+extern atomic_t netfs_n_rh_readpage;
+extern atomic_t netfs_n_rh_rreq;
+extern atomic_t netfs_n_rh_sreq;
+extern atomic_t netfs_n_rh_download;
+extern atomic_t netfs_n_rh_download_done;
+extern atomic_t netfs_n_rh_download_failed;
+extern atomic_t netfs_n_rh_download_instead;
+extern atomic_t netfs_n_rh_read;
+extern atomic_t netfs_n_rh_read_done;
+extern atomic_t netfs_n_rh_read_failed;
+extern atomic_t netfs_n_rh_zero;
+extern atomic_t netfs_n_rh_short_read;
+extern atomic_t netfs_n_rh_write;
+extern atomic_t netfs_n_rh_write_done;
+extern atomic_t netfs_n_rh_write_failed;
+
+static inline void netfs_stat(atomic_t *stat)
+{
+	atomic_inc(stat);
+}
+
+static inline void netfs_stat_d(atomic_t *stat)
+{
+	atomic_dec(stat);
+}
+
+#else
 #define netfs_stat(x) do {} while(0)
 #define netfs_stat_d(x) do {} while(0)
+#endif
 
 /*****************************************************************************/
 /*
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 21d3578c6979..e1ce197d8d18 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -55,6 +55,7 @@ static struct netfs_read_request *netfs_alloc_read_request(
 		refcount_set(&rreq->usage, 1);
 		__set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
 		ops->init_rreq(rreq, file);
+		netfs_stat(&netfs_n_rh_rreq);
 	}
 
 	return rreq;
@@ -86,6 +87,7 @@ static void netfs_put_read_request(struct netfs_read_request *rreq)
 		fscache_end_operation(&rreq->cache_resources);
 		trace_netfs_rreq(rreq, netfs_rreq_trace_free);
 		kfree(rreq);
+		netfs_stat_d(&netfs_n_rh_rreq);
 	}
 }
 
@@ -102,6 +104,7 @@ static struct netfs_read_subrequest *netfs_alloc_subrequest(
 		INIT_LIST_HEAD(&subreq->rreq_link);
 		refcount_set(&subreq->usage, 2);
 		subreq->rreq = rreq;
+		netfs_stat(&netfs_n_rh_sreq);
 	}
 
 	return subreq;
@@ -117,6 +120,7 @@ static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq)
 	trace_netfs_sreq(subreq, netfs_sreq_trace_free);
 	netfs_put_read_request(subreq->rreq);
 	kfree(subreq);
+	netfs_stat_d(&netfs_n_rh_sreq);
 }
 
 /*
@@ -153,6 +157,7 @@ static void netfs_read_from_cache(struct netfs_read_request *rreq,
 			subreq->start + subreq->transferred,
 			subreq->len   - subreq->transferred);
 
+	netfs_stat(&netfs_n_rh_read);
 	fscache_read(&rreq->cache_resources, subreq->start, &iter, seek_data,
 		     netfs_cache_read_terminated, subreq);
 }
@@ -163,6 +168,7 @@ static void netfs_read_from_cache(struct netfs_read_request *rreq,
 static void netfs_fill_with_zeroes(struct netfs_read_request *rreq,
 				   struct netfs_read_subrequest *subreq)
 {
+	netfs_stat(&netfs_n_rh_zero);
 	__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
 	netfs_subreq_terminated(subreq, 0);
 }
@@ -186,6 +192,7 @@ static void netfs_fill_with_zeroes(struct netfs_read_request *rreq,
 static void netfs_read_from_server(struct netfs_read_request *rreq,
 				   struct netfs_read_subrequest *subreq)
 {
+	netfs_stat(&netfs_n_rh_download);
 	rreq->netfs_ops->issue_op(subreq);
 }
 
@@ -240,10 +247,13 @@ static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error)
 	struct netfs_read_subrequest *subreq = priv;
 	struct netfs_read_request *rreq = subreq->rreq;
 
-	if (IS_ERR_VALUE(transferred_or_error))
+	if (IS_ERR_VALUE(transferred_or_error)) {
 		subreq->error = transferred_or_error;
-	else
+		netfs_stat(&netfs_n_rh_write_failed);
+	} else {
 		subreq->error = 0;
+		netfs_stat(&netfs_n_rh_write_done);
+	}
 
 	trace_netfs_sreq(subreq, netfs_sreq_trace_write_term);
 
@@ -288,6 +298,7 @@ static void netfs_rreq_do_write_to_cache(struct netfs_read_request *rreq)
 			netfs_put_subrequest(next);
 		}
 
+		netfs_stat(&netfs_n_rh_write);
 		trace_netfs_sreq(subreq, netfs_sreq_trace_write);
 
 		iov_iter_xarray(&iter, WRITE, &rreq->mapping->i_pages,
@@ -415,6 +426,7 @@ static void netfs_rreq_short_read(struct netfs_read_request *rreq,
 	__clear_bit(NETFS_SREQ_SHORT_READ, &subreq->flags);
 	__set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags);
 
+	netfs_stat(&netfs_n_rh_short_read);
 	trace_netfs_sreq(subreq, netfs_sreq_trace_resubmit_short);
 
 	netfs_get_read_subrequest(subreq);
@@ -449,6 +461,7 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq)
 				break;
 			subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
 			subreq->error = 0;
+			netfs_stat(&netfs_n_rh_download_instead);
 			trace_netfs_sreq(subreq, netfs_sreq_trace_download_instead);
 			netfs_get_read_subrequest(subreq);
 			atomic_inc(&rreq->nr_rd_ops);
@@ -563,6 +576,17 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq,
 	       subreq->debug_index, subreq->start, subreq->flags,
 	       transferred_or_error);
 
+	switch (subreq->source) {
+	case NETFS_READ_FROM_CACHE:
+		netfs_stat(&netfs_n_rh_read_done);
+		break;
+	case NETFS_DOWNLOAD_FROM_SERVER:
+		netfs_stat(&netfs_n_rh_download_done);
+		break;
+	default:
+		break;
+	}
+
 	if (IS_ERR_VALUE(transferred_or_error)) {
 		subreq->error = transferred_or_error;
 		goto failed;
@@ -616,8 +640,10 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq,
 
 failed:
 	if (subreq->source == NETFS_READ_FROM_CACHE) {
+		netfs_stat(&netfs_n_rh_read_failed);
 		set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags);
 	} else {
+		netfs_stat(&netfs_n_rh_download_failed);
 		set_bit(NETFS_RREQ_FAILED, &rreq->flags);
 		rreq->error = subreq->error;
 	}
@@ -822,6 +848,7 @@ void netfs_readahead(struct readahead_control *ractl,
 	rreq->start	= readahead_pos(ractl);
 	rreq->len	= readahead_length(ractl);
 
+	netfs_stat(&netfs_n_rh_readahead);
 	trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl),
 			 netfs_read_trace_readahead);
 
@@ -897,6 +924,7 @@ int netfs_readpage(struct file *file,
 	rreq->start		= page->index * PAGE_SIZE;
 	rreq->len		= thp_size(page);
 
+	netfs_stat(&netfs_n_rh_readpage);
 	trace_netfs_read(rreq, page->index * PAGE_SIZE, thp_size(page),
 			 netfs_read_trace_readpage);
 
diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c
new file mode 100644
index 000000000000..3a7a3c10e1cd
--- /dev/null
+++ b/fs/netfs/stats.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Netfs support statistics
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/seq_file.h>
+#include <linux/netfs.h>
+#include "internal.h"
+
+atomic_t netfs_n_rh_readahead;
+atomic_t netfs_n_rh_readpage;
+atomic_t netfs_n_rh_rreq;
+atomic_t netfs_n_rh_sreq;
+atomic_t netfs_n_rh_download;
+atomic_t netfs_n_rh_download_done;
+atomic_t netfs_n_rh_download_failed;
+atomic_t netfs_n_rh_download_instead;
+atomic_t netfs_n_rh_read;
+atomic_t netfs_n_rh_read_done;
+atomic_t netfs_n_rh_read_failed;
+atomic_t netfs_n_rh_zero;
+atomic_t netfs_n_rh_short_read;
+atomic_t netfs_n_rh_write;
+atomic_t netfs_n_rh_write_done;
+atomic_t netfs_n_rh_write_failed;
+
+void netfs_stats_show(struct seq_file *m)
+{
+	seq_printf(m, "RdHelp : RA=%u RP=%u rr=%u sr=%u\n",
+		   atomic_read(&netfs_n_rh_readahead),
+		   atomic_read(&netfs_n_rh_readpage),
+		   atomic_read(&netfs_n_rh_rreq),
+		   atomic_read(&netfs_n_rh_sreq));
+	seq_printf(m, "RdHelp : ZR=%u sh=%u\n",
+		   atomic_read(&netfs_n_rh_zero),
+		   atomic_read(&netfs_n_rh_short_read));
+	seq_printf(m, "RdHelp : DL=%u ds=%u df=%u di=%u\n",
+		   atomic_read(&netfs_n_rh_download),
+		   atomic_read(&netfs_n_rh_download_done),
+		   atomic_read(&netfs_n_rh_download_failed),
+		   atomic_read(&netfs_n_rh_download_instead));
+	seq_printf(m, "RdHelp : RD=%u rs=%u rf=%u\n",
+		   atomic_read(&netfs_n_rh_read),
+		   atomic_read(&netfs_n_rh_read_done),
+		   atomic_read(&netfs_n_rh_read_failed));
+	seq_printf(m, "RdHelp : WR=%u ws=%u wf=%u\n",
+		   atomic_read(&netfs_n_rh_write),
+		   atomic_read(&netfs_n_rh_write_done),
+		   atomic_read(&netfs_n_rh_write_failed));
+}
+EXPORT_SYMBOL(netfs_stats_show);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index c4276eb9914d..d9cd9a1d1f6d 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -98,5 +98,6 @@ extern int netfs_readpage(struct file *,
 			  void *);
 
 extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t);
+extern void netfs_stats_show(struct seq_file *);
 
 #endif /* _LINUX_NETFS_H */




* [RFC PATCH 50/76] fscache: New stats
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (47 preceding siblings ...)
  2020-11-20 15:13 ` [RFC PATCH 49/76] netfs: Stats David Howells
@ 2020-11-20 15:13 ` David Howells
  2020-11-20 15:13 ` [RFC PATCH 51/76] fscache, cachefiles: Rewrite invalidation David Howells
                   ` (25 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:13 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Create some new stat counters appropriate to the reworked code.
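
These appear in /proc/fs/fscache/stats along these lines (illustrative
values):

	Disp   : n=441 il=60 df=381 pl=381
	IO     : rd=1232 wr=1232

where the Disp line counts dispatched works (total, inline, deferred and
pool-executed) and the IO line counts cache reads and writes.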

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/io.c            |    2 ++
 fs/fscache/Kconfig            |    1 +
 fs/fscache/dispatcher.c       |    6 ++++++
 fs/fscache/internal.h         |    8 ++++++--
 fs/fscache/stats.c            |   27 ++++++++++++++++++++++++---
 include/linux/fscache-cache.h |   10 ++++++++++
 6 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index c9866bd5e010..1060c1c57008 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -84,6 +84,7 @@ int cachefiles_read(struct fscache_op_resources *opr,
 
 	__fscache_wait_for_operation(opr, FSCACHE_WANT_READ);
 	fscache_count_io_operation(opr->object->cookie);
+	fscache_count_read();
 
 	/* If the caller asked us to seek for data before doing the read, then
 	 * we should do that now.  If we find a gap, we fill it with zeros.
@@ -233,6 +234,7 @@ int cachefiles_write(struct fscache_op_resources *opr,
 
 	__fscache_wait_for_operation(opr, FSCACHE_WANT_WRITE);
 	fscache_count_io_operation(opr->object->cookie);
+	fscache_count_write();
 
 	ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL);
 	if (!ki)
diff --git a/fs/fscache/Kconfig b/fs/fscache/Kconfig
index b04f07160e5e..fe7f856a4037 100644
--- a/fs/fscache/Kconfig
+++ b/fs/fscache/Kconfig
@@ -14,6 +14,7 @@ config FSCACHE
 config FSCACHE_STATS
 	bool "Gather statistical information on local caching"
 	depends on FSCACHE && PROC_FS
+	select NETFS_STATS
 	help
 	  This option causes statistical information to be gathered on local
 	  caching and exported through file:
diff --git a/fs/fscache/dispatcher.c b/fs/fscache/dispatcher.c
index 3d957e499da3..2663bd4b36e8 100644
--- a/fs/fscache/dispatcher.c
+++ b/fs/fscache/dispatcher.c
@@ -41,6 +41,8 @@ void fscache_dispatch(struct fscache_cookie *cookie,
 	struct fscache_work *work;
 	bool queued = false;
 
+	fscache_stat(&fscache_n_dispatch_count);
+
 	work = kzalloc(sizeof(struct fscache_work), GFP_KERNEL);
 	if (work) {
 		work->cookie = cookie;
@@ -57,10 +59,13 @@ void fscache_dispatch(struct fscache_cookie *cookie,
 			queued = true;
 		}
 		spin_unlock(&fscache_work_lock);
+		if (queued)
+			fscache_stat(&fscache_n_dispatch_deferred);
 	}
 
 	if (!queued) {
 		kfree(work);
+		fscache_stat(&fscache_n_dispatch_inline);
 		func(cookie, object, param);
 	}
 }
@@ -86,6 +91,7 @@ static int fscache_dispatcher(void *data)
 
 			if (work) {
 				work->func(work->cookie, work->object, work->param);
+				fscache_stat(&fscache_n_dispatch_in_pool);
 				fscache_cookie_put(work->cookie, fscache_cookie_put_work);
 				kfree(work);
 			}
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 1721823b8cac..73568e84fe3d 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -221,6 +221,11 @@ extern atomic_t fscache_n_cache_stale_objects;
 extern atomic_t fscache_n_cache_retired_objects;
 extern atomic_t fscache_n_cache_culled_objects;
 
+extern atomic_t fscache_n_dispatch_count;
+extern atomic_t fscache_n_dispatch_deferred;
+extern atomic_t fscache_n_dispatch_inline;
+extern atomic_t fscache_n_dispatch_in_pool;
+
 static inline void fscache_stat(atomic_t *stat)
 {
 	atomic_inc(stat);
@@ -233,10 +238,9 @@ static inline void fscache_stat_d(atomic_t *stat)
 
 #define __fscache_stat(stat) (stat)
 
-int fscache_stats_show(struct seq_file *m, void *v);
 extern int __init fscache_proc_stats_init(void);
-#else
 
+#else
 #define __fscache_stat(stat) (NULL)
 #define fscache_stat(stat) do {} while (0)
 #define fscache_stat_d(stat) do {} while (0)
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index bf2935571de5..952214305853 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -6,9 +6,9 @@
  */
 
 #define FSCACHE_DEBUG_LEVEL CACHE
-#include <linux/module.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
+#include <linux/netfs.h>
 #include "internal.h"
 
 /*
@@ -56,13 +56,22 @@ atomic_t fscache_n_cache_stale_objects;
 atomic_t fscache_n_cache_retired_objects;
 atomic_t fscache_n_cache_culled_objects;
 
+atomic_t fscache_n_dispatch_count;
+atomic_t fscache_n_dispatch_deferred;
+atomic_t fscache_n_dispatch_inline;
+atomic_t fscache_n_dispatch_in_pool;
+
+atomic_t fscache_n_read;
+EXPORT_SYMBOL(fscache_n_read);
+atomic_t fscache_n_write;
+EXPORT_SYMBOL(fscache_n_write);
+
 /*
  * display the general statistics
  */
-int fscache_stats_show(struct seq_file *m, void *v)
+static int fscache_stats_show(struct seq_file *m, void *v)
 {
 	seq_puts(m, "FS-Cache statistics\n");
-
 	seq_printf(m, "Cookies: idx=%u dat=%u spc=%u\n",
 		   atomic_read(&fscache_n_cookie_index),
 		   atomic_read(&fscache_n_cookie_data),
@@ -113,6 +122,18 @@ int fscache_stats_show(struct seq_file *m, void *v)
 		   atomic_read(&fscache_n_cache_stale_objects),
 		   atomic_read(&fscache_n_cache_retired_objects),
 		   atomic_read(&fscache_n_cache_culled_objects));
+
+	seq_printf(m, "Disp   : n=%u il=%u df=%u pl=%u\n",
+		   atomic_read(&fscache_n_dispatch_count),
+		   atomic_read(&fscache_n_dispatch_inline),
+		   atomic_read(&fscache_n_dispatch_deferred),
+		   atomic_read(&fscache_n_dispatch_in_pool));
+
+	seq_printf(m, "IO     : rd=%u wr=%u\n",
+		   atomic_read(&fscache_n_read),
+		   atomic_read(&fscache_n_write));
+
+	netfs_stats_show(m);
 	return 0;
 }
 
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 54625464a109..dacfda1d3c20 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -259,4 +259,14 @@ static inline void fscache_uncount_io_operation(struct fscache_cookie *cookie)
 extern void __fscache_wait_for_operation(struct fscache_op_resources *, enum fscache_want_stage);
 extern void __fscache_end_operation(struct fscache_op_resources *);
 
+#ifdef CONFIG_FSCACHE_STATS
+extern atomic_t fscache_n_read;
+extern atomic_t fscache_n_write;
+#define fscache_count_read() atomic_inc(&fscache_n_read)
+#define fscache_count_write() atomic_inc(&fscache_n_write)
+#else
+#define fscache_count_read() do {} while(0)
+#define fscache_count_write() do {} while(0)
+#endif
+
 #endif /* _LINUX_FSCACHE_CACHE_H */




* [RFC PATCH 51/76] fscache, cachefiles: Rewrite invalidation
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (48 preceding siblings ...)
  2020-11-20 15:13 ` [RFC PATCH 50/76] fscache: New stats David Howells
@ 2020-11-20 15:13 ` David Howells
  2020-11-20 15:13 ` [RFC PATCH 52/76] fscache: disable cookie when doing an invalidation for DIO write David Howells
                   ` (24 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:13 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Rewrite the cache object invalidation code in fscache and cachefiles.  The
following changes are made to fscache:

 (1) Invalidation is now ignored or allowed to proceed depending on the
     'stage' a non-index cookie is in with respect to the backing object.

 (2) If invalidation proceeds, it pins the object and holds an operation
     count for the duration.

 (3) The fscache_object struct is given an invalidation counter that is
     incremented any time fscache_invalidate() is called, even if the
     cookie is at a stage in which the invalidation cannot be applied.
     The counter, however, can be noted and applied retroactively later.

 (4) The invalidation counter is noted in the operation struct when a cache
     operation is begun and can be checked on operation completion to find
     out if any consequent metadata changes should be dropped.

 (5) New operations aren't allowed to proceed if the object is being
     invalidated.

and to cachefiles:

 (1) If an open object is invalidated, the open backing file is replaced
     with a tmpfile (as if opened O_TMPFILE).  This is held unlinked until
     the object is released from memory, at which point the file is simply
     abandoned if it was retired, or the old file is unlinked and the new
     one linked into its place.

     Note: This would be easier if linkat() could be given a flag to
     indicate the destination should be overwritten or if RENAME_EXCHANGE
     could be applied to tmpfiles, effectively unlinking the destination.

 (2) Upon invalidation, the content map is replaced with a blank one.
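
As an illustration of the changed netfs-facing call, here is a minimal
sketch following the afs_zap_data() conversion in the diff below (the
aux structure is whatever coherency data the netfs attaches to the
cookie):

	struct afs_vnode_cache_aux aux = {
		.data_version = vnode->status.data_version,
	};

	fscache_invalidate(afs_vnode_cache(vnode), &aux,
			   i_size_read(&vnode->vfs_inode), 0);

The final argument is a flags word, e.g. FSCACHE_INVAL_LIGHT as tested
in cachefiles_invalidate_object() below.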

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/inode.c                 |    8 ++
 fs/cachefiles/content-map.c    |   38 ++++++++++-
 fs/cachefiles/interface.c      |  133 +++++++++++++++++++++++++++++++++-------
 fs/cachefiles/internal.h       |   12 +++-
 fs/cachefiles/io.c             |   17 ++++-
 fs/cachefiles/namei.c          |   69 +++++++++++++++++++--
 fs/cachefiles/xattr.c          |    6 +-
 fs/fscache/cookie.c            |   22 +++++--
 fs/fscache/io.c                |    1 
 fs/fscache/obj.c               |   25 ++------
 include/linux/fscache-cache.h  |    5 +-
 include/linux/fscache.h        |   20 +++++-
 include/trace/events/fscache.h |   19 ++++++
 13 files changed, 299 insertions(+), 76 deletions(-)

diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 0ba2815f8152..3930f051f39a 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -569,7 +569,13 @@ static void afs_zap_data(struct afs_vnode *vnode)
 	_enter("{%llx:%llu}", vnode->fid.vid, vnode->fid.vnode);
 
 #ifdef CONFIG_AFS_FSCACHE
-	fscache_invalidate(vnode->cache, i_size_read(&vnode->vfs_inode));
+	{
+		struct afs_vnode_cache_aux aux = {
+			.data_version = vnode->status.data_version,
+		};
+		fscache_invalidate(afs_vnode_cache(vnode), &aux,
+				   i_size_read(&vnode->vfs_inode), 0);
+	}
 #endif
 
 	/* nuke all the non-dirty pages that aren't locked, mapped or being
diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index 4c8dc11d2d4c..da0a81e3f751 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -204,21 +204,47 @@ enum netfs_read_source cachefiles_prepare_read(struct netfs_read_subrequest *sub
 	return NETFS_READ_FROM_CACHE;
 }
 
+/*
+ * Allocate a new content map.
+ */
+u8 *cachefiles_new_content_map(struct cachefiles_object *object,
+			       unsigned int *_size)
+{
+	size_t size;
+	u8 *map = NULL;
+
+	_enter("");
+
+	if (object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK) {
+		/* Single-chunk object.  The presence or absence of the content
+		 * map xattr is sufficient indication.
+		 */
+		*_size = 0;
+		return NULL;
+	}
+
+	/* Granular object. */
+	size = cachefiles_map_size(object->fscache.cookie->object_size);
+	map = kzalloc(size, GFP_KERNEL);
+	if (!map)
+		return ERR_PTR(-ENOMEM);
+	*_size = size;
+	return map;
+}
+
 /*
  * Mark the content map to indicate stored granule.
  */
 void cachefiles_mark_content_map(struct cachefiles_object *object,
-				 loff_t start, loff_t len)
+				 loff_t start, loff_t len,
+				 unsigned int inval_counter)
 {
 	_enter("%llx", start);
 
 	read_lock_bh(&object->content_map_lock);
 
-	if (object->fscache.cookie->advice & FSCACHE_ADV_SINGLE_CHUNK) {
-		if (start == 0) {
-			object->content_info = CACHEFILES_CONTENT_SINGLE;
-			set_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->fscache.flags);
-		}
+	if (object->fscache.inval_counter != inval_counter) {
+		_debug("inval mark");
 	} else {
 		pgoff_t granule;
 		loff_t end = start + len;
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index c083d9833b87..4da10640611c 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -201,7 +201,7 @@ static void cachefiles_update_object(struct fscache_object *_object)
 		}
 	}
 
-	cachefiles_set_object_xattr(object, XATTR_REPLACE);
+	cachefiles_set_object_xattr(object);
 
 out:
 	cachefiles_end_secure(cache, saved_cred);
@@ -211,11 +211,15 @@ static void cachefiles_update_object(struct fscache_object *_object)
 /*
  * Commit changes to the object as we drop it.
  */
-static void cachefiles_commit_object(struct cachefiles_object *object,
+static bool cachefiles_commit_object(struct cachefiles_object *object,
 				     struct cachefiles_cache *cache)
 {
 	if (object->content_map_changed)
 		cachefiles_save_content_map(object);
+
+	if (test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags))
+		return cachefiles_commit_tmpfile(cache, object);
+	return true;
 }
 
 /*
@@ -422,48 +426,133 @@ static int cachefiles_attr_changed(struct cachefiles_object *object)
 }
 
 /*
- * Invalidate an object
+ * Create a temporary file and leave it unattached and un-xattr'd until the
+ * time comes to discard the object from memory.
  */
-static bool cachefiles_invalidate_object(struct fscache_object *_object)
+static struct file *cachefiles_create_tmpfile(struct cachefiles_object *object)
 {
-	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
 	const struct cred *saved_cred;
+	struct file *file;
 	struct path path;
 	uint64_t ni_size;
-	int ret;
+	long ret;
 
-	object = container_of(_object, struct cachefiles_object, fscache);
 	cache = container_of(object->fscache.cache,
 			     struct cachefiles_cache, cache);
 
 	ni_size = object->fscache.cookie->object_size;
 	ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
 
+	cachefiles_begin_secure(cache, &saved_cred);
+
+	path.mnt = cache->mnt;
+	path.dentry = vfs_tmpfile(cache->graveyard, S_IFREG, O_RDWR);
+	if (IS_ERR(path.dentry)) {
+		if (PTR_ERR(path.dentry) == -EIO)
+			cachefiles_io_error_obj(object, "Failed to create tmpfile");
+		file = ERR_CAST(path.dentry);
+		goto out;
+	}
+
+	trace_cachefiles_tmpfile(object, d_inode(path.dentry));
+
+	if (ni_size > 0) {
+		trace_cachefiles_trunc(object, d_inode(path.dentry), 0, ni_size);
+		ret = vfs_truncate(&path, ni_size);
+		if (ret < 0) {
+			file = ERR_PTR(ret);
+			goto out_dput;
+		}
+	}
+
+	file = open_with_fake_path(&path,
+				   O_RDWR | O_LARGEFILE | O_DIRECT,
+				   d_backing_inode(path.dentry),
+				   cache->cache_cred);
+out_dput:
+	dput(path.dentry);
+out:
+	cachefiles_end_secure(cache, saved_cred);
+	return file;
+}
+
+/*
+ * Invalidate an object
+ */
+static bool cachefiles_invalidate_object(struct fscache_object *_object,
+					 unsigned int flags)
+{
+	struct cachefiles_object *object;
+	struct file *file, *old_file;
+	struct dentry *old_dentry;
+	u8 *map, *old_map;
+	unsigned int map_size;
+
+	object = container_of(_object, struct cachefiles_object, fscache);
+
 	_enter("{OBJ%x},[%llu]",
-	       object->fscache.debug_id, (unsigned long long)ni_size);
+	       object->fscache.debug_id, _object->cookie->object_size);
+
+	if ((flags & FSCACHE_INVAL_LIGHT) &&
+	    test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags)) {
+		_leave(" = t [light]");
+		return true;
+	}
 
 	if (object->dentry) {
 		ASSERT(d_is_reg(object->dentry));
 
-		path.dentry = object->dentry;
-		path.mnt = cache->mnt;
-
-		cachefiles_begin_secure(cache, &saved_cred);
-		ret = vfs_truncate(&path, 0);
-		if (ret == 0)
-			ret = vfs_truncate(&path, ni_size);
-		cachefiles_end_secure(cache, saved_cred);
-
-		if (ret != 0) {
-			if (ret == -EIO)
-				cachefiles_io_error_obj(object,
-							"Invalidate failed");
-			return false;
+		file = cachefiles_create_tmpfile(object);
+		if (IS_ERR(file))
+			goto failed;
+
+		map = cachefiles_new_content_map(object, &map_size);
+		if (IS_ERR(map))
+			goto failed_fput;
+
+		/* Substitute the VFS target */
+		_debug("sub");
+		dget(file->f_path.dentry); /* Do outside of content_map_lock */
+		spin_lock(&object->fscache.lock);
+		write_lock_bh(&object->content_map_lock);
+
+		if (!object->old) {
+			/* Save the dentry carrying the path information */
+			object->old = object->dentry;
+			old_dentry = NULL;
+		} else {
+			old_dentry = object->dentry;
 		}
+
+		old_file = object->backing_file;
+		old_map = object->content_map;
+		object->backing_file = file;
+		object->dentry = file->f_path.dentry;
+		object->content_info = CACHEFILES_CONTENT_NO_DATA;
+		object->content_map = map;
+		object->content_map_size = map_size;
+		object->content_map_changed = true;
+		set_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags);
+		set_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->fscache.flags);
+
+		write_unlock_bh(&object->content_map_lock);
+		spin_unlock(&object->fscache.lock);
+		_debug("subbed");
+
+		kfree(old_map);
+		fput(old_file);
+		dput(old_dentry);
 	}
 
+	_leave(" = t [tmpfile]");
 	return true;
+
+failed_fput:
+	fput(file);
+failed:
+	_leave(" = f");
+	return false;
 }
 
 static unsigned int cachefiles_get_object_usage(const struct fscache_object *_object)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index b58177f65135..af68564598d5 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -54,6 +54,8 @@ struct cachefiles_object {
 	struct file			*backing_file;	/* File open on backing storage */
 	loff_t				i_size;		/* object size */
 	atomic_t			usage;		/* object usage count */
+	unsigned long			flags;
+#define CACHEFILES_OBJECT_USING_TMPFILE	0		/* Object has a tmpfile that needs linking */
 	uint8_t				type;		/* object type */
 	bool				new;		/* T if object new */
 
@@ -129,8 +131,10 @@ extern void cachefiles_expand_readahead(struct fscache_op_resources *opr,
 					loff_t *_start, size_t *_len, loff_t i_size);
 extern enum netfs_read_source cachefiles_prepare_read(struct netfs_read_subrequest *subreq,
 						      loff_t i_size);
+extern u8 *cachefiles_new_content_map(struct cachefiles_object *object,
+				      unsigned int *_size);
 extern void cachefiles_mark_content_map(struct cachefiles_object *object,
-					loff_t start, loff_t len);
+					loff_t start, loff_t len, unsigned int inval_counter);
 extern void cachefiles_expand_content_map(struct cachefiles_object *object, loff_t size);
 extern void cachefiles_shorten_content_map(struct cachefiles_object *object, loff_t new_size);
 extern bool cachefiles_load_content_map(struct cachefiles_object *object);
@@ -195,6 +199,9 @@ extern int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
 extern int cachefiles_check_in_use(struct cachefiles_cache *cache,
 				   struct dentry *dir, char *filename);
 
+extern bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
+				      struct cachefiles_object *object);
+
 /*
  * proc.c
  */
@@ -247,8 +254,7 @@ static inline void cachefiles_end_secure(struct cachefiles_cache *cache,
  * xattr.c
  */
 extern int cachefiles_check_object_type(struct cachefiles_object *object);
-extern int cachefiles_set_object_xattr(struct cachefiles_object *object,
-				       unsigned int xattr_flags);
+extern int cachefiles_set_object_xattr(struct cachefiles_object *object);
 extern int cachefiles_check_auxdata(struct cachefiles_object *object);
 extern int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
 					  struct dentry *dentry);
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 1060c1c57008..3ad62a39133e 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -17,6 +17,7 @@
 struct cachefiles_kiocb {
 	struct kiocb		iocb;
 	refcount_t		ki_refcnt;
+	unsigned int		inval_counter;
 	loff_t			start;
 	union {
 		size_t		skipped;
@@ -46,10 +47,15 @@ static void cachefiles_read_complete(struct kiocb *iocb, long ret, long ret2)
 	_enter("%ld,%ld", ret, ret2);
 
 	if (ki->term_func) {
-		if (ret < 0)
+		if (ret < 0) {
 			ki->term_func(ki->term_func_priv, ret);
-		else
-			ki->term_func(ki->term_func_priv, ki->skipped + ret);
+		} else {
+			if (ki->object->fscache.inval_counter == ki->inval_counter)
+				ki->skipped += ret;
+			else
+				ret = -ESTALE;
+			ki->term_func(ki->term_func_priv, ret);
+		}
 	}
 
 	fscache_uncount_io_operation(ki->object->fscache.cookie);
@@ -126,6 +132,7 @@ int cachefiles_read(struct fscache_op_resources *opr,
 	ki->iocb.ki_flags	= IOCB_DIRECT;
 	ki->iocb.ki_hint	= ki_hint_validate(file_write_hint(file));
 	ki->iocb.ki_ioprio	= get_current_ioprio();
+	ki->inval_counter	= opr->inval_counter;
 	ki->skipped		= skipped;
 	ki->object		= object;
 	ki->term_func		= term_func;
@@ -197,7 +204,8 @@ static void cachefiles_write_complete(struct kiocb *iocb, long ret, long ret2)
 			ki->term_func(ki->term_func_priv, ret);
 	} else {
 		if (ret == ki->len)
-			cachefiles_mark_content_map(ki->object, ki->start, ki->len);
+			cachefiles_mark_content_map(ki->object, ki->start, ki->len,
+						    ki->inval_counter);
 		if (ki->term_func)
 			ki->term_func(ki->term_func_priv, ret);
 	}
@@ -246,6 +254,7 @@ int cachefiles_write(struct fscache_op_resources *opr,
 	ki->iocb.ki_flags	= IOCB_DIRECT | IOCB_WRITE;
 	ki->iocb.ki_hint	= ki_hint_validate(file_write_hint(file));
 	ki->iocb.ki_ioprio	= get_current_ioprio();
+	ki->inval_counter	= opr->inval_counter;
 	ki->start		= start_pos;
 	ki->len			= len;
 	ki->object		= object;
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 9f6c91cff55d..69a7ce9a62c3 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -466,7 +466,7 @@ bool cachefiles_walk_to_object(struct cachefiles_object *parent,
 
 	if (object->new) {
 		/* attach data to a newly constructed terminal object */
-		ret = cachefiles_set_object_xattr(object, XATTR_CREATE);
+		ret = cachefiles_set_object_xattr(object);
 		if (ret < 0)
 			goto check_error;
 	} else {
@@ -485,8 +485,6 @@ bool cachefiles_walk_to_object(struct cachefiles_object *parent,
 				pr_warn("cachefiles: Block size too large\n");
 				goto check_error;
 			}
-
-			object->old = dget(object->dentry);
 		} else {
 			BUG(); // TODO: open file in data-class subdir
 		}
@@ -521,9 +519,7 @@ bool cachefiles_walk_to_object(struct cachefiles_object *parent,
 		cachefiles_unmark_inode_in_use(object, object->dentry);
 	cachefiles_mark_object_inactive(cache, object);
 	dput(object->dentry);
-	dput(object->old);
 	object->dentry = NULL;
-	object->old = NULL;
 	goto error_out;
 
 lookup_error:
@@ -807,3 +803,66 @@ int cachefiles_check_in_use(struct cachefiles_cache *cache, struct dentry *dir,
 	//_leave(" = 0");
 	return ret;
 }
+
+/*
+ * Attempt to link a temporary file into its rightful place in the cache.
+ */
+bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
+			       struct cachefiles_object *object)
+{
+	struct dentry *dir, *dentry, *old;
+	char *name;
+	unsigned int namelen;
+	bool success = false;
+	int ret;
+
+	_enter(",%pd", object->old);
+
+	namelen = object->old->d_name.len;
+	name = kmemdup_nul(object->old->d_name.name, namelen, GFP_KERNEL);
+	if (!name)
+		goto out;
+
+	dir = dget_parent(object->old);
+
+	inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
+	ret = cachefiles_bury_object(cache, object, dir, object->old,
+				     FSCACHE_OBJECT_IS_STALE);
+	dput(object->old);
+	object->old = NULL;
+	if (ret < 0 && ret != -ENOENT) {
+		_debug("bury fail %d", ret);
+		goto out_name;
+	}
+
+	inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
+	dentry = lookup_one_len(name, dir, namelen);
+	if (IS_ERR(dentry)) {
+		_debug("lookup fail %ld", PTR_ERR(dentry));
+		goto out_unlock;
+	}
+
+	ret = vfs_link(object->dentry, d_inode(dir), dentry, NULL);
+	if (ret < 0) {
+		_debug("link fail %d", ret);
+		dput(dentry);
+	} else {
+		trace_cachefiles_link(object, d_inode(object->dentry));
+		spin_lock(&object->fscache.lock);
+		old = object->dentry;
+		object->dentry = dentry;
+		success = true;
+		clear_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags);
+		spin_unlock(&object->fscache.lock);
+		dput(old);
+	}
+
+out_unlock:
+	inode_unlock(d_inode(dir));
+out_name:
+	kfree(name);
+	dput(dir);
+out:
+	_leave(" = %u", success);
+	return success;
+}
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index cbd43855dc0d..991ecda1f140 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -104,8 +104,7 @@ int cachefiles_check_object_type(struct cachefiles_object *object)
 /*
  * set the state xattr on a cache file
  */
-int cachefiles_set_object_xattr(struct cachefiles_object *object,
-				unsigned int xattr_flags)
+int cachefiles_set_object_xattr(struct cachefiles_object *object)
 {
 	struct cachefiles_xattr *buf;
 	struct dentry *dentry = object->dentry;
@@ -129,8 +128,7 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 		memcpy(buf->data, fscache_get_aux(object->fscache.cookie), len);
 
 	ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
-			   buf, sizeof(struct cachefiles_xattr) + len,
-			   xattr_flags);
+			   buf, sizeof(struct cachefiles_xattr) + len, 0);
 	if (ret < 0) {
 		trace_cachefiles_coherency(object, d_inode(dentry)->i_ino,
 					   buf->content,
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 8d8aba14912a..1b30b28f7cf6 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -531,9 +531,11 @@ void fscache_set_cookie_stage(struct fscache_cookie *cookie,
 }
 
 /*
- * Invalidate an object.  Callable with spinlocks held.
+ * Invalidate an object.
  */
-void __fscache_invalidate(struct fscache_cookie *cookie, loff_t new_size)
+void __fscache_invalidate(struct fscache_cookie *cookie,
+			  const void *aux_data, loff_t new_size,
+			  unsigned int flags)
 {
 	struct fscache_object *object = NULL;
 
@@ -541,6 +543,10 @@ void __fscache_invalidate(struct fscache_cookie *cookie, loff_t new_size)
 
 	fscache_stat(&fscache_n_invalidates);
 
+	if (WARN(test_bit(FSCACHE_COOKIE_RELINQUISHED, &cookie->flags),
+		 "Trying to invalidate relinquished cookie\n"))
+		return;
+
 	/* Only permit invalidation of data files.  Invalidating an index will
 	 * require the caller to release all its attachments to the tree rooted
 	 * there, and if it's doing that, it may as well just retire the
@@ -549,12 +555,16 @@ void __fscache_invalidate(struct fscache_cookie *cookie, loff_t new_size)
 	ASSERTCMP(cookie->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
 
 	spin_lock(&cookie->lock);
-	cookie->object_size = new_size;
+	fscache_update_aux(cookie, aux_data, &new_size);
 	cookie->zero_point = new_size;
 
-	if (!hlist_empty(&cookie->backing_objects))
+	trace_fscache_invalidate(cookie, new_size);
+
+	if (!hlist_empty(&cookie->backing_objects)) {
 		object = hlist_entry(cookie->backing_objects.first,
 				     struct fscache_object, cookie_link);
+		object->inval_counter++;
+	}
 
 	switch (cookie->stage) {
 	case FSCACHE_COOKIE_STAGE_INITIALISING: /* Assume later checks will catch it */
@@ -566,7 +576,7 @@ void __fscache_invalidate(struct fscache_cookie *cookie, loff_t new_size)
 
 	case FSCACHE_COOKIE_STAGE_LOOKING_UP:
 		spin_unlock(&cookie->lock);
-		_leave(" [look]");
+		_leave(" [look %x]", object->inval_counter);
 		return;
 
 	case FSCACHE_COOKIE_STAGE_NO_DATA_YET:
@@ -578,7 +588,7 @@ void __fscache_invalidate(struct fscache_cookie *cookie, loff_t new_size)
 		spin_unlock(&cookie->lock);
 		wake_up_cookie_stage(cookie);
 
-		fscache_dispatch(cookie, object, 0, fscache_invalidate_object);
+		fscache_dispatch(cookie, object, flags, fscache_invalidate_object);
 		_leave(" [inv]");
 		return;
 	}
diff --git a/fs/fscache/io.c b/fs/fscache/io.c
index 90c056e85cea..f13a7729bad3 100644
--- a/fs/fscache/io.c
+++ b/fs/fscache/io.c
@@ -111,6 +111,7 @@ int __fscache_begin_operation(struct fscache_cookie *cookie,
 		goto not_live;
 
 	opr->object = object;
+	opr->inval_counter = object->inval_counter;
 	object->cache->ops->grab_object(object, fscache_obj_get_ioreq);
 	object->cache->ops->begin_operation(opr);
 
diff --git a/fs/fscache/obj.c b/fs/fscache/obj.c
index 23598bec639e..60c70c0e474d 100644
--- a/fs/fscache/obj.c
+++ b/fs/fscache/obj.c
@@ -245,30 +245,15 @@ void fscache_lookup_object(struct fscache_cookie *cookie,
 }
 
 /*
- * Invalidate an object
+ * Invalidate an object.  param passes the invalidation flags.
  */
 void fscache_invalidate_object(struct fscache_cookie *cookie,
-			       struct fscache_object *unused, int param)
+			       struct fscache_object *object, int flags)
 {
-	struct fscache_object *object = NULL;
-	bool success = true;
+	bool success;
 
-	spin_lock(&cookie->lock);
-
-	if (!hlist_empty(&cookie->backing_objects)) {
-		object = hlist_entry(cookie->backing_objects.first,
-				     struct fscache_object,
-				     cookie_link);
-		object = object->cache->ops->grab_object(object,
-							 fscache_obj_get_inval);
-	}
-
-	spin_unlock(&cookie->lock);
-
-	if (object) {
-		success = object->cache->ops->invalidate_object(object);
-		fscache_do_put_object(object, fscache_obj_put_inval);
-	}
+	success = object->cache->ops->invalidate_object(object, flags);
+	fscache_do_put_object(object, fscache_obj_put_inval);
 
 	if (success)
 		fscache_set_cookie_stage(cookie, FSCACHE_COOKIE_STAGE_NO_DATA_YET);
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index dacfda1d3c20..a0bb526735b5 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -119,7 +119,8 @@ struct fscache_cache_ops {
 	void (*update_object)(struct fscache_object *object);
 
 	/* Invalidate an object */
-	bool (*invalidate_object)(struct fscache_object *object);
+	bool (*invalidate_object)(struct fscache_object *object,
+				  unsigned int flags);
 
 	/* discard the resources pinned by an object and effect retirement if
 	 * necessary */
@@ -164,10 +165,12 @@ enum fscache_object_stage {
 struct fscache_object {
 	int			debug_id;	/* debugging ID */
 	int			n_children;	/* number of child objects */
+	unsigned int		inval_counter;	/* Number of invalidations applied */
 	enum fscache_object_stage stage;	/* Stage of object's lifecycle */
 	spinlock_t		lock;		/* state and operations lock */
 
 	unsigned long		flags;
+#define FSCACHE_OBJECT_NEEDS_INVAL	8	/* T if object needs invalidation */
 #define FSCACHE_OBJECT_NEEDS_UPDATE	9	/* T if object attrs need writing to disk */
 
 	struct list_head	cache_link;	/* link in cache->object_list */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 3c53386e4f6e..1d141d17f63b 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -71,6 +71,9 @@ enum fscache_want_stage {
 	FSCACHE_WANT_READ,
 };
 
+#define FSCACHE_INVAL_LIGHT		0x01 /* Don't re-invalidate if temp object */
+#define FSCACHE_INVAL_DIO_WRITE		0x02 /* Invalidate due to DIO write */
+
 /*
  * fscache cached network filesystem type
  * - name, version and ops must be filled in before registration
@@ -148,6 +151,7 @@ struct fscache_op_resources {
 #if __fscache_available
 	const struct fscache_op_ops	*ops;
 	struct fscache_object		*object;
+	unsigned int			inval_counter;	/* object->inval_counter at begin_op */
 #endif
 };
 
@@ -214,7 +218,7 @@ extern int __fscache_begin_operation(struct fscache_cookie *, struct fscache_op_
 				     enum fscache_want_stage);
 extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
 extern void __fscache_update_cookie(struct fscache_cookie *, const void *, const loff_t *);
-extern void __fscache_invalidate(struct fscache_cookie *, loff_t);
+extern void __fscache_invalidate(struct fscache_cookie *, const void *, loff_t, unsigned int);
 extern void fscache_put_super(struct super_block *,
 			      struct fscache_cookie *(*get_cookie)(struct inode *));
 
@@ -442,22 +446,30 @@ void fscache_unpin_cookie(struct fscache_cookie *cookie)
 /**
  * fscache_invalidate - Notify cache that an object needs invalidation
  * @cookie: The cookie representing the cache object
+ * @aux_data: The updated auxiliary data for the cookie (may be NULL)
  * @size: The revised size of the object.
+ * @flags: Invalidation flags (FSCACHE_INVAL_*)
  *
 * Notify the cache that an object needs to be invalidated and that it
  * should abort any retrievals or stores it is doing on the cache.  The object
  * is then marked non-caching until such time as the invalidation is complete.
  *
- * This can be called with spinlocks held.
+ * FSCACHE_INVAL_LIGHT indicates that if the object has been invalidated and
+ * replaced by a temporary object, the temporary object need not be replaced
+ * again.  This is primarily intended for use with FSCACHE_ADV_SINGLE_CHUNK.
+ *
+ * FSCACHE_INVAL_DIO_WRITE indicates that this is due to a direct I/O write and
+ * may cause caching to be suspended on this cookie.
  *
  * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
-void fscache_invalidate(struct fscache_cookie *cookie, loff_t size)
+void fscache_invalidate(struct fscache_cookie *cookie,
+			const void *aux_data, loff_t size, unsigned int flags)
 {
 	if (fscache_cookie_valid(cookie))
-		__fscache_invalidate(cookie, size);
+		__fscache_invalidate(cookie, aux_data, size, flags);
 }
 
 /**
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 2edf74c40e83..adb5618ce0c1 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -200,6 +200,25 @@ TRACE_EVENT(fscache_relinquish,
 		      __entry->flags, __entry->retire)
 	    );
 
+TRACE_EVENT(fscache_invalidate,
+	    TP_PROTO(struct fscache_cookie *cookie, loff_t new_size),
+
+	    TP_ARGS(cookie, new_size),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,		cookie		)
+		    __field(loff_t,			new_size	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->cookie	= cookie->debug_id;
+		    __entry->new_size	= new_size;
+			   ),
+
+	    TP_printk("c=%08x sz=%llx",
+		      __entry->cookie, __entry->new_size)
+	    );
+
 #endif /* _TRACE_FSCACHE_H */
 
 /* This part must be outside protection */



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 52/76] fscache: disable cookie when doing an invalidation for DIO write
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (49 preceding siblings ...)
  2020-11-20 15:13 ` [RFC PATCH 51/76] fscache, cachefiles: Rewrite invalidation David Howells
@ 2020-11-20 15:13 ` David Howells
  2020-11-20 15:13 ` [RFC PATCH 53/76] fscache: Implement "will_modify" parameter on fscache_use_cookie() David Howells
                   ` (23 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:13 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: Jeff Layton, dhowells, Jeff Layton, Matthew Wilcox,
	Alexander Viro, linux-cachefs, linux-afs, linux-nfs, linux-cifs,
	ceph-devel, v9fs-developer, linux-fsdevel, linux-kernel

From: Jeff Layton <jlayton@kernel.org>

O_DIRECT I/O is probably a good indicator that we don't need to be
caching this file at the moment. Disable the cookie by treating it
as we would a NULL cookie after the invalidation completes, and
re-enable it when the last unuse is done.
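
As a rough sketch of how a netfs might trigger this path (the inode,
cookie and aux variables here stand for netfs-side state and aren't
part of this patch):

	/* Hypothetical ->write_iter() fragment: an O_DIRECT write
	 * invalidates the cache object and, with this patch, leaves
	 * the cookie disabled until the last unuse.
	 */
	if (iocb->ki_flags & IOCB_DIRECT)
		fscache_invalidate(cookie, &aux, i_size_read(inode),
				   FSCACHE_INVAL_DIO_WRITE);

Whilst the cookie is disabled, fscache_begin_operation() on it returns
-ENOBUFS, the same as for a NULL cookie.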

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fscache/cookie.c     |    9 ++++++++-
 include/linux/fscache.h |    9 ++++++---
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 1b30b28f7cf6..d20028535a86 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -427,8 +427,11 @@ void __fscache_unuse_cookie(struct fscache_cookie *cookie,
 {
 	if (aux_data || object_size)
 		__fscache_update_cookie(cookie, aux_data, object_size);
-	if (atomic_dec_and_test(&cookie->n_active))
+	if (atomic_dec_and_test(&cookie->n_active)) {
+		clear_bit(FSCACHE_COOKIE_DISABLED, &cookie->flags);
+		smp_mb__after_atomic();
 		wake_up_var(&cookie->n_active);
+	}
 }
 EXPORT_SYMBOL(__fscache_unuse_cookie);
 
@@ -554,6 +557,10 @@ void __fscache_invalidate(struct fscache_cookie *cookie,
 	 */
 	ASSERTCMP(cookie->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
 
+	if ((flags & FSCACHE_INVAL_DIO_WRITE) &&
+	    test_and_set_bit(FSCACHE_COOKIE_DISABLED, &cookie->flags))
+		return;
+
 	spin_lock(&cookie->lock);
 	fscache_update_aux(cookie, aux_data, &new_size);
 	cookie->zero_point = new_size;
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 1d141d17f63b..3c173fb660a6 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -26,11 +26,13 @@
 #define fscache_available() (1)
 #define fscache_cookie_valid(cookie) (cookie)
 #define fscache_object_valid(object) (object)
+#define fscache_cookie_enabled(cookie) (cookie && !test_bit(FSCACHE_COOKIE_DISABLED, &cookie->flags))
 #else
 #define __fscache_available (0)
 #define fscache_available() (0)
 #define fscache_cookie_valid(cookie) (0)
 #define fscache_object_valid(object) (NULL)
+#define fscache_cookie_enabled(cookie) (0)
 #endif
 
 
@@ -127,6 +129,7 @@ struct fscache_cookie {
 
 	unsigned long			flags;
 #define FSCACHE_COOKIE_RELINQUISHED	6		/* T if cookie has been relinquished */
+#define FSCACHE_COOKIE_DISABLED		7		/* T if cookie has been disabled */
 
 	enum fscache_cookie_stage	stage;
 	enum fscache_cookie_type	type:8;
@@ -410,7 +413,7 @@ static inline
 void fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data,
 			   const loff_t *object_size)
 {
-	if (fscache_cookie_valid(cookie))
+	if (fscache_cookie_enabled(cookie))
 		__fscache_update_cookie(cookie, aux_data, object_size);
 }
 
@@ -468,7 +471,7 @@ static inline
 void fscache_invalidate(struct fscache_cookie *cookie,
 			const void *aux_data, loff_t size, unsigned int flags)
 {
-	if (fscache_cookie_valid(cookie))
+	if (fscache_cookie_enabled(cookie))
 		__fscache_invalidate(cookie, aux_data, size, flags);
 }
 
@@ -492,7 +495,7 @@ int fscache_begin_operation(struct fscache_cookie *cookie,
 			    struct fscache_op_resources *opr,
 			    enum fscache_want_stage want_stage)
 {
-	if (fscache_cookie_valid(cookie))
+	if (fscache_cookie_enabled(cookie))
 		return __fscache_begin_operation(cookie, opr, want_stage);
 	return -ENOBUFS;
 }



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 53/76] fscache: Implement "will_modify" parameter on fscache_use_cookie()
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (50 preceding siblings ...)
  2020-11-20 15:13 ` [RFC PATCH 52/76] fscache: disable cookie when doing an invalidation for DIO write David Howells
@ 2020-11-20 15:13 ` David Howells
  2020-11-20 15:14 ` [RFC PATCH 54/76] fscache: Provide resize operation David Howells
                   ` (22 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:13 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Implement the "will_modify" parameter passed to fscache_use_cookie().

Setting this to true will henceforth cause the affected object to be marked
as dirty on disk, subject to conflict resolution in the event that power
failure or a crash occurs or the filesystem operates in disconnected mode.

The dirty flag is removed when the fscache_object is discarded from memory.

A cache hook is provided to prepare for writing - and this can be used to
mark the object on disk.
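
A minimal sketch of the intended use from a netfs, assuming the cookie
and file here are netfs-side state:

	/* On open: if the file may be written to, have the cache mark
	 * the backing object dirty on disk.
	 */
	fscache_use_cookie(cookie, file->f_mode & FMODE_WRITE);

	/* On final close; the dirty mark is cleared when the
	 * fscache_object is eventually committed and discarded.
	 */
	fscache_unuse_cookie(cookie, NULL, NULL);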

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c     |   68 +++++++++++++++++++++++++++++++++++++++++
 fs/cachefiles/internal.h      |    2 +
 fs/cachefiles/xattr.c         |   27 ++++++++++++++++
 fs/fscache/cookie.c           |   15 ++++++++-
 fs/fscache/internal.h         |    1 +
 fs/fscache/obj.c              |   28 ++++++++++++++---
 include/linux/fscache-cache.h |    4 ++
 7 files changed, 137 insertions(+), 8 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 4da10640611c..56ae8d956174 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -208,14 +208,81 @@ static void cachefiles_update_object(struct fscache_object *_object)
 	_leave("");
 }
 
+/*
+ * Shorten the backing object to discard any dirty data and free up
+ * any unused granules.
+ */
+static bool cachefiles_shorten_object(struct cachefiles_object *object, loff_t new_size)
+{
+	struct cachefiles_cache *cache;
+	struct inode *inode;
+	struct path path;
+	loff_t i_size;
+	int ret;
+
+	cache = container_of(object->fscache.cache,
+			     struct cachefiles_cache, cache);
+	path.mnt = cache->mnt;
+	path.dentry = object->dentry;
+
+	inode = d_inode(object->dentry);
+	trace_cachefiles_trunc(object, inode, i_size_read(inode), new_size);
+	ret = vfs_truncate(&path, new_size);
+	if (ret < 0) {
+		cachefiles_io_error_obj(object, "Trunc-to-size failed %d", ret);
+		cachefiles_remove_object_xattr(cache, object->dentry);
+		return false;
+	}
+
+	new_size = round_up(new_size, CACHEFILES_DIO_BLOCK_SIZE);
+	i_size = i_size_read(inode);
+	if (i_size < new_size) {
+		trace_cachefiles_trunc(object, inode, i_size, new_size);
+		ret = vfs_truncate(&path, new_size);
+		if (ret < 0) {
+			cachefiles_io_error_obj(object, "Trunc-to-dio-size failed %d", ret);
+			cachefiles_remove_object_xattr(cache, object->dentry);
+			return false;
+		}
+	}
+
+	return true;
+}
+
+/*
+ * Trim excess stored data off of an object.
+ */
+static bool cachefiles_trim_object(struct cachefiles_object *object)
+{
+	loff_t object_size;
+
+	_enter("{OBJ%x}", object->fscache.debug_id);
+
+	object_size = object->fscache.cookie->object_size;
+	if (i_size_read(d_inode(object->dentry)) <= object_size)
+		return true;
+
+	return cachefiles_shorten_object(object, object_size);
+}
+
 /*
  * Commit changes to the object as we drop it.
  */
 static bool cachefiles_commit_object(struct cachefiles_object *object,
 				     struct cachefiles_cache *cache)
 {
+	bool update = false;
+
 	if (object->content_map_changed)
 		cachefiles_save_content_map(object);
+	if (test_and_clear_bit(FSCACHE_OBJECT_LOCAL_WRITE, &object->fscache.flags))
+		update = true;
+	if (test_and_clear_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->fscache.flags))
+		update = true;
+	if (update) {
+		if (cachefiles_trim_object(object))
+			cachefiles_set_object_xattr(object);
+	}
 
 	if (test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags))
 		return cachefiles_commit_tmpfile(cache, object);
@@ -591,5 +658,6 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
 	.get_object_usage	= cachefiles_get_object_usage,
 	.sync_cache		= cachefiles_sync_cache,
 	.begin_operation	= cachefiles_begin_operation,
+	.prepare_to_write	= cachefiles_prepare_to_write,
 	.display_object		= cachefiles_display_object,
 };
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index af68564598d5..e2d06c0860a2 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -258,7 +258,7 @@ extern int cachefiles_set_object_xattr(struct cachefiles_object *object);
 extern int cachefiles_check_auxdata(struct cachefiles_object *object);
 extern int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
 					  struct dentry *dentry);
-
+extern int cachefiles_prepare_to_write(struct fscache_object *object);
 
 /*
  * error handling
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index 991ecda1f140..2f8623a7b864 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -124,6 +124,8 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object)
 	buf->zero_point		= cpu_to_be64(object->fscache.cookie->zero_point);
 	buf->type		= object->fscache.cookie->type;
 	buf->content		= object->content_info;
+	if (test_bit(FSCACHE_OBJECT_LOCAL_WRITE, &object->fscache.flags))
+		buf->content	= CACHEFILES_CONTENT_DIRTY;
 	if (len > 0)
 		memcpy(buf->data, fscache_get_aux(object->fscache.cookie), len);
 
@@ -184,6 +186,10 @@ int cachefiles_check_auxdata(struct cachefiles_object *object)
 		why = cachefiles_coherency_check_aux;
 	} else if (be64_to_cpu(buf->object_size) != object->fscache.cookie->object_size) {
 		why = cachefiles_coherency_check_objsize;
+	} else if (buf->content == CACHEFILES_CONTENT_DIRTY) {
+		// TODO: Begin conflict resolution
+		pr_warn("Dirty object in cache\n");
+		why = cachefiles_coherency_check_dirty;
 	} else {
 		object->fscache.cookie->zero_point = be64_to_cpu(buf->zero_point);
 		object->content_info = buf->content;
@@ -219,3 +225,24 @@ int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
 	_leave(" = %d", ret);
 	return ret;
 }
+
+/*
+ * Stick a marker on the cache object to indicate that it's dirty.
+ */
+int cachefiles_prepare_to_write(struct fscache_object *_object)
+{
+	int ret;
+	const struct cred *saved_cred;
+	struct cachefiles_object *object =
+		container_of(_object, struct cachefiles_object, fscache);
+	struct cachefiles_cache *cache =
+		container_of(_object->cache, struct cachefiles_cache, cache);
+
+	_enter("c=%08x", object->fscache.cookie->debug_id);
+
+	cachefiles_begin_secure(cache, &saved_cred);
+	ret = cachefiles_set_object_xattr(object);
+	cachefiles_end_secure(cache, saved_cred);
+
+	return ret;
+}
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index d20028535a86..bfea035af06d 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -363,6 +363,8 @@ EXPORT_SYMBOL(__fscache_acquire_cookie);
 void __fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify)
 {
 	enum fscache_cookie_stage stage;
+	struct fscache_object *object;
+	bool write_set;
 
 	_enter("c=%08x", cookie->debug_id);
 
@@ -398,8 +400,17 @@ void __fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify)
 	case FSCACHE_COOKIE_STAGE_NO_DATA_YET:
 	case FSCACHE_COOKIE_STAGE_ACTIVE:
 	case FSCACHE_COOKIE_STAGE_INVALIDATING:
-		// TODO: Handle will_modify
-		spin_unlock(&cookie->lock);
+		if (will_modify) {
+			object = hlist_entry(cookie->backing_objects.first,
+					     struct fscache_object, cookie_link);
+			write_set = test_and_set_bit(FSCACHE_OBJECT_LOCAL_WRITE,
+						     &object->flags);
+			spin_unlock(&cookie->lock);
+			if (!write_set)
+				fscache_dispatch(cookie, object, 0, fscache_prepare_to_write);
+		} else {
+			spin_unlock(&cookie->lock);
+		}
 		break;
 
 	case FSCACHE_COOKIE_STAGE_FAILED:
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 73568e84fe3d..3c408da2c837 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -152,6 +152,7 @@ extern void fscache_lookup_object(struct fscache_cookie *, struct fscache_object
 extern void fscache_invalidate_object(struct fscache_cookie *, struct fscache_object *, int);
 extern void fscache_drop_object(struct fscache_cookie *, struct fscache_object *, bool);
 extern void fscache_relinquish_objects(struct fscache_cookie *, struct fscache_object *, int);
+extern void fscache_prepare_to_write(struct fscache_cookie *, struct fscache_object *, int);
 
 /*
  * object-list.c
diff --git a/fs/fscache/obj.c b/fs/fscache/obj.c
index 60c70c0e474d..a36de6af2182 100644
--- a/fs/fscache/obj.c
+++ b/fs/fscache/obj.c
@@ -117,7 +117,8 @@ static bool fscache_wrangle_object(struct fscache_cookie *cookie,
  * Create an object chain, making sure that the index chain is fully created.
  */
 static struct fscache_object *fscache_lookup_object_chain(struct fscache_cookie *cookie,
-							  struct fscache_cache *cache)
+							  struct fscache_cache *cache,
+							  bool will_modify)
 {
 	struct fscache_object *object = NULL, *parent, *xobject;
 
@@ -131,7 +132,7 @@ static struct fscache_object *fscache_lookup_object_chain(struct fscache_cookie
 	spin_unlock(&cookie->lock);
 
 	/* Recurse to look up/create the parent index. */
-	parent = fscache_lookup_object_chain(cookie->parent, cache);
+	parent = fscache_lookup_object_chain(cookie->parent, cache, false);
 	if (!parent)
 		goto error;
 
@@ -146,6 +147,9 @@ static struct fscache_object *fscache_lookup_object_chain(struct fscache_cookie
 	if (!object)
 		goto error;
 
+	if (will_modify)
+		__set_bit(FSCACHE_OBJECT_LOCAL_WRITE, &object->flags);
+
 	xobject = fscache_attach_object(cookie, object);
 	if (xobject != object) {
 		fscache_do_put_object(object, fscache_obj_put_alloc_dup);
@@ -207,7 +211,8 @@ static struct fscache_object *fscache_lookup_object_chain(struct fscache_cookie
  * - this must make sure the index chain is instantiated and instantiate the
  *   object representation too
  */
-static void fscache_lookup_object_locked(struct fscache_cookie *cookie)
+static void fscache_lookup_object_locked(struct fscache_cookie *cookie,
+					 bool will_modify)
 {
 	struct fscache_object *object;
 	struct fscache_cache *cache;
@@ -225,12 +230,16 @@ static void fscache_lookup_object_locked(struct fscache_cookie *cookie)
 
 	_debug("cache %s", cache->tag->name);
 
-	object = fscache_lookup_object_chain(cookie, cache);
+	object = fscache_lookup_object_chain(cookie, cache, will_modify);
 	if (!object) {
 		_leave(" [fail]");
 		return;
 	}
 
+	if (will_modify &&
+	    test_and_set_bit(FSCACHE_OBJECT_LOCAL_WRITE, &object->flags))
+		fscache_prepare_to_write(cookie, object, 0);
+
 	fscache_do_put_object(object, fscache_obj_put);
 	_leave(" [done]");
 }
@@ -239,7 +248,7 @@ void fscache_lookup_object(struct fscache_cookie *cookie,
 			   struct fscache_object *object, int param)
 {
 	down_read(&fscache_addremove_sem);
-	fscache_lookup_object_locked(cookie);
+	fscache_lookup_object_locked(cookie, param);
 	up_read(&fscache_addremove_sem);
 	__fscache_unuse_cookie(cookie, NULL, NULL);
 }
@@ -354,3 +363,12 @@ void fscache_relinquish_objects(struct fscache_cookie *cookie,
 
 	fscache_drop_cookie(cookie);
 }
+
+/*
+ * Prepare a cache object to be written to.
+ */
+void fscache_prepare_to_write(struct fscache_cookie *cookie,
+			      struct fscache_object *object, int param)
+{
+	object->cache->ops->prepare_to_write(object);
+}
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index a0bb526735b5..eb303deb39c1 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -142,6 +142,9 @@ struct fscache_cache_ops {
 	/* Begin an operation on a cache object */
 	void (*begin_operation)(struct fscache_op_resources *opr);
 
+	/* Prepare to write to a live cache object */
+	int (*prepare_to_write)(struct fscache_object *object);
+
 	/* Display object info in /proc/fs/fscache/objects */
 	int (*display_object)(struct seq_file *m, struct fscache_object *object);
 };
@@ -170,6 +173,7 @@ struct fscache_object {
 	spinlock_t		lock;		/* state and operations lock */
 
 	unsigned long		flags;
+#define FSCACHE_OBJECT_LOCAL_WRITE	1	/* T if the object is being modified locally */
 #define FSCACHE_OBJECT_NEEDS_INVAL	8	/* T if object needs invalidation */
 #define FSCACHE_OBJECT_NEEDS_UPDATE	9	/* T if object attrs need writing to disk */
 



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 54/76] fscache: Provide resize operation
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (51 preceding siblings ...)
  2020-11-20 15:13 ` [RFC PATCH 53/76] fscache: Implement "will_modify" parameter on fscache_use_cookie() David Howells
@ 2020-11-20 15:14 ` David Howells
  2020-11-20 15:14 ` [RFC PATCH 55/76] fscache: Remove the update operation David Howells
                   ` (21 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:14 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Provide a cache operation to resize an object.  This is intended to be run
synchronously rather than deferred, as it needs to run inside the netfs
inode's lock, from ->setattr(), so that it orders correctly with respect
to other truncates and writes.
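
For illustration, a netfs ->setattr() might drive this roughly as
follows (a sketch only; example_inode_cookie() is a hypothetical helper
and error handling is omitted):

	static int example_setattr(struct dentry *dentry, struct iattr *attr)
	{
		struct inode *inode = d_inode(dentry);

		if (attr->ia_valid & ATTR_SIZE) {
			/* The VFS holds the inode lock here, so the
			 * cache resize is serialised against other
			 * truncates and writes on this inode.
			 */
			truncate_setsize(inode, attr->ia_size);
			fscache_resize_cookie(example_inode_cookie(inode),
					      attr->ia_size);
		}
		return 0;
	}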

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c      |   31 +++++++++++++++++++++++++++++++
 fs/fscache/internal.h          |    3 +++
 fs/fscache/io.c                |   28 ++++++++++++++++++++++++++++
 fs/fscache/stats.c             |    9 +++++++--
 include/linux/fscache-cache.h  |    2 ++
 include/linux/fscache.h        |   18 ++++++++++++++++++
 include/trace/events/fscache.h |   26 ++++++++++++++++++++++++++
 7 files changed, 115 insertions(+), 2 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 56ae8d956174..aca08e4227b9 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -249,6 +249,36 @@ static bool cachefiles_shorten_object(struct cachefiles_object *object, loff_t n
 	return true;
 }
 
+/*
+ * Resize the backing object.
+ */
+static void cachefiles_resize_object(struct fscache_object *_object, loff_t new_size)
+{
+	struct cachefiles_object *object =
+		container_of(_object, struct cachefiles_object, fscache);
+	struct cachefiles_cache *cache =
+		container_of(object->fscache.cache, struct cachefiles_cache, cache);
+	const struct cred *saved_cred;
+	loff_t old_size = object->fscache.cookie->object_size;
+
+	_enter("%llu->%llu", old_size, new_size);
+
+	if (new_size < old_size) {
+		cachefiles_begin_secure(cache, &saved_cred);
+		cachefiles_shorten_content_map(object, new_size);
+		cachefiles_shorten_object(object, new_size);
+		cachefiles_end_secure(cache, saved_cred);
+		object->fscache.cookie->object_size = new_size;
+		return;
+	}
+
+	/* The file is being expanded.  We don't need to do anything in
+	 * particular: cookie->initial_size doesn't change, so the point
+	 * before which we have to download doesn't change either.
+	 */
+	object->fscache.cookie->object_size = new_size;
+}
+
 /*
  * Trim excess stored data off of an object.
  */
@@ -652,6 +682,7 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
 	.free_lookup_data	= cachefiles_free_lookup_data,
 	.grab_object		= cachefiles_grab_object,
 	.update_object		= cachefiles_update_object,
+	.resize_object		= cachefiles_resize_object,
 	.invalidate_object	= cachefiles_invalidate_object,
 	.drop_object		= cachefiles_drop_object,
 	.put_object		= cachefiles_put_object,
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 3c408da2c837..ff193f61a4c5 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -194,6 +194,9 @@ extern atomic_t fscache_n_updates_run;
 extern atomic_t fscache_n_relinquishes;
 extern atomic_t fscache_n_relinquishes_retire;
 
+extern atomic_t fscache_n_resizes;
+extern atomic_t fscache_n_resizes_null;
+
 extern atomic_t fscache_n_cookie_index;
 extern atomic_t fscache_n_cookie_data;
 extern atomic_t fscache_n_cookie_special;
diff --git a/fs/fscache/io.c b/fs/fscache/io.c
index f13a7729bad3..5401c9ed347b 100644
--- a/fs/fscache/io.c
+++ b/fs/fscache/io.c
@@ -232,3 +232,31 @@ void fscache_put_super(struct super_block *sb,
 	}
 }
 EXPORT_SYMBOL(fscache_put_super);
+
+/*
+ * Change the size of a backing object.
+ */
+void __fscache_resize_cookie(struct fscache_cookie *cookie, loff_t new_size)
+{
+	struct fscache_op_resources opr;
+
+	ASSERT(cookie->type != FSCACHE_COOKIE_TYPE_INDEX);
+
+	trace_fscache_resize(cookie, new_size);
+	if (fscache_begin_operation(cookie, &opr, FSCACHE_WANT_WRITE) != -ENOBUFS) {
+		struct fscache_object *object = opr.object;
+
+		fscache_stat(&fscache_n_resizes);
+		set_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->flags);
+
+		/* We cannot defer a resize as we need to do it inside the
+		 * netfs's inode lock so that we're serialised with respect to
+		 * writes.
+		 */
+		object->cache->ops->resize_object(object, new_size);
+		fscache_end_operation(&opr);
+	} else {
+		fscache_stat(&fscache_n_resizes_null);
+	}
+}
+EXPORT_SYMBOL(__fscache_resize_cookie);
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 952214305853..2a2df9d1649e 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -28,6 +28,9 @@ atomic_t fscache_n_updates_run;
 atomic_t fscache_n_relinquishes;
 atomic_t fscache_n_relinquishes_retire;
 
+atomic_t fscache_n_resizes;
+atomic_t fscache_n_resizes_null;
+
 atomic_t fscache_n_cookie_index;
 atomic_t fscache_n_cookie_data;
 atomic_t fscache_n_cookie_special;
@@ -99,9 +102,11 @@ static int fscache_stats_show(struct seq_file *m, void *v)
 	seq_printf(m, "Invals : n=%u\n",
 		   atomic_read(&fscache_n_invalidates));
 
-	seq_printf(m, "Updates: n=%u run=%u\n",
+	seq_printf(m, "Updates: n=%u nul=%u rsz=%u rsn=%u\n",
 		   atomic_read(&fscache_n_updates),
-		   atomic_read(&fscache_n_updates_run));
+		   atomic_read(&fscache_n_updates_run),
+		   atomic_read(&fscache_n_resizes),
+		   atomic_read(&fscache_n_resizes_null));
 
 	seq_printf(m, "Relinqs: n=%u rtr=%u\n",
 		   atomic_read(&fscache_n_relinquishes),
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index eb303deb39c1..958fa899917d 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -117,6 +117,8 @@ struct fscache_cache_ops {
 
 	/* store the updated auxiliary data on an object */
 	void (*update_object)(struct fscache_object *object);
+	/* Change the size of a data object */
+	void (*resize_object)(struct fscache_object *object, loff_t new_size);
 
 	/* Invalidate an object */
 	bool (*invalidate_object)(struct fscache_object *object,
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 3c173fb660a6..1c1ea3558421 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -221,6 +221,7 @@ extern int __fscache_begin_operation(struct fscache_cookie *, struct fscache_op_
 				     enum fscache_want_stage);
 extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
 extern void __fscache_update_cookie(struct fscache_cookie *, const void *, const loff_t *);
+extern void __fscache_resize_cookie(struct fscache_cookie *, loff_t);
 extern void __fscache_invalidate(struct fscache_cookie *, const void *, loff_t, unsigned int);
 extern void fscache_put_super(struct super_block *,
 			      struct fscache_cookie *(*get_cookie)(struct inode *));
@@ -417,6 +418,23 @@ void fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data,
 		__fscache_update_cookie(cookie, aux_data, object_size);
 }
 
+/**
+ * fscache_resize_cookie - Request that a cache object be resized
+ * @cookie: The cookie representing the cache object
+ * @new_size: The new size of the object
+ *
+ * Request that the size of an object be changed.
+ *
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
+ * description.
+ */
+static inline
+void fscache_resize_cookie(struct fscache_cookie *cookie, loff_t new_size)
+{
+	if (fscache_cookie_enabled(cookie))
+		__fscache_resize_cookie(cookie, new_size);
+}
+
 /**
  * fscache_pin_cookie - Pin a data-storage cache object in its cache
  * @cookie: The cookie representing the cache object
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index adb5618ce0c1..20bf21d12d0c 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -219,6 +219,32 @@ TRACE_EVENT(fscache_invalidate,
 		      __entry->cookie, __entry->new_size)
 	    );
 
+TRACE_EVENT(fscache_resize,
+	    TP_PROTO(struct fscache_cookie *cookie, loff_t new_size),
+
+	    TP_ARGS(cookie, new_size),
+
+	    TP_STRUCT__entry(
+		    __field(unsigned int,		cookie		)
+		    __field(loff_t,			old_size	)
+		    __field(loff_t,			zero_point	)
+		    __field(loff_t,			new_size	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->cookie	= cookie->debug_id;
+		    __entry->old_size	= cookie->object_size;
+		    __entry->zero_point	= cookie->zero_point;
+		    __entry->new_size	= new_size;
+			   ),
+
+	    TP_printk("c=%08x os=%08llx zp=%08llx sz=%08llx",
+		      __entry->cookie,
+		      __entry->old_size,
+		      __entry->zero_point,
+		      __entry->new_size)
+	    );
+
 #endif /* _TRACE_FSCACHE_H */
 
 /* This part must be outside protection */



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 55/76] fscache: Remove the update operation
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (52 preceding siblings ...)
  2020-11-20 15:14 ` [RFC PATCH 54/76] fscache: Provide resize operation David Howells
@ 2020-11-20 15:14 ` David Howells
  2020-11-20 15:14 ` [RFC PATCH 56/76] afs: Pass page into dirty region helpers to provide THP size David Howells
                   ` (20 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:14 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Remove the cache-side of the object update operation as it doesn't
serialise with other setattr, O_TRUNC and write operations.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/interface.c     |   59 -----------------------------------------
 fs/fscache/internal.h         |    2 -
 fs/fscache/obj.c              |   14 ----------
 fs/fscache/stats.c            |    6 +---
 include/linux/fscache-cache.h |    2 -
 5 files changed, 2 insertions(+), 81 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index aca08e4227b9..3609ff2fb491 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -150,64 +150,6 @@ struct fscache_object *cachefiles_grab_object(struct fscache_object *_object,
 	return &object->fscache;
 }
 
-/*
- * update the auxiliary data for an object object on disk
- */
-static void cachefiles_update_object(struct fscache_object *_object)
-{
-	struct cachefiles_object *object;
-	struct cachefiles_cache *cache;
-	const struct cred *saved_cred;
-	struct inode *inode;
-	loff_t object_size, i_size;
-	int ret;
-
-	_enter("{OBJ%x}", _object->debug_id);
-
-	object = container_of(_object, struct cachefiles_object, fscache);
-	cache = container_of(object->fscache.cache, struct cachefiles_cache,
-			     cache);
-
-	cachefiles_begin_secure(cache, &saved_cred);
-
-	object_size = object->fscache.cookie->object_size;
-	inode = d_inode(object->dentry);
-	i_size = i_size_read(inode);
-	if (i_size > object_size) {
-		struct path path = {
-			.mnt	= cache->mnt,
-			.dentry	= object->dentry
-		};
-		_debug("trunc %llx -> %llx", i_size, object_size);
-		trace_cachefiles_trunc(object, inode, i_size, object_size);
-		ret = vfs_truncate(&path, object_size);
-		if (ret < 0) {
-			cachefiles_io_error_obj(object, "Trunc-to-size failed");
-			cachefiles_remove_object_xattr(cache, object->dentry);
-			goto out;
-		}
-
-		object_size = round_up(object_size, CACHEFILES_DIO_BLOCK_SIZE);
-		i_size = i_size_read(inode);
-		_debug("trunc %llx -> %llx", i_size, object_size);
-		if (i_size < object_size) {
-			trace_cachefiles_trunc(object, inode, i_size, object_size);
-			ret = vfs_truncate(&path, object_size);
-			if (ret < 0) {
-				cachefiles_io_error_obj(object, "Trunc-to-dio-size failed");
-				cachefiles_remove_object_xattr(cache, object->dentry);
-				goto out;
-			}
-		}
-	}
-
-	cachefiles_set_object_xattr(object);
-
-out:
-	cachefiles_end_secure(cache, saved_cred);
-	_leave("");
-}
-
 /*
  * Shorten the backing object to discard any dirty data and free up
  * any unused granules.
@@ -681,7 +623,6 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
 	.lookup_object		= cachefiles_lookup_object,
 	.free_lookup_data	= cachefiles_free_lookup_data,
 	.grab_object		= cachefiles_grab_object,
-	.update_object		= cachefiles_update_object,
 	.resize_object		= cachefiles_resize_object,
 	.invalidate_object	= cachefiles_invalidate_object,
 	.drop_object		= cachefiles_drop_object,
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index ff193f61a4c5..510f166103bf 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -189,7 +189,6 @@ extern atomic_t fscache_n_acquires_oom;
 extern atomic_t fscache_n_invalidates;
 
 extern atomic_t fscache_n_updates;
-extern atomic_t fscache_n_updates_run;
 
 extern atomic_t fscache_n_relinquishes;
 extern atomic_t fscache_n_relinquishes_retire;
@@ -214,7 +213,6 @@ extern atomic_t fscache_n_cop_alloc_object;
 extern atomic_t fscache_n_cop_lookup_object;
 extern atomic_t fscache_n_cop_create_object;
 extern atomic_t fscache_n_cop_invalidate_object;
-extern atomic_t fscache_n_cop_update_object;
 extern atomic_t fscache_n_cop_drop_object;
 extern atomic_t fscache_n_cop_put_object;
 extern atomic_t fscache_n_cop_sync_cache;
diff --git a/fs/fscache/obj.c b/fs/fscache/obj.c
index a36de6af2182..2b0c99095a42 100644
--- a/fs/fscache/obj.c
+++ b/fs/fscache/obj.c
@@ -54,14 +54,6 @@ static int fscache_do_create_object(struct fscache_object *object, void *data)
 	return ret;
 }
 
-static void fscache_do_update_object(struct fscache_object *object)
-{
-	fscache_stat(&fscache_n_updates_run);
-	fscache_stat(&fscache_n_cop_update_object);
-	object->cache->ops->update_object(object);
-	fscache_stat_d(&fscache_n_cop_update_object);
-}
-
 static void fscache_do_drop_object(struct fscache_cache *cache,
 				   struct fscache_object *object,
 				   bool invalidate)
@@ -291,12 +283,6 @@ void fscache_drop_object(struct fscache_cookie *cookie,
 	_enter("{o=%08x,%d},%u",
 	       object->debug_id, object->n_children, invalidate);
 
-	if (!invalidate &&
-	    test_bit(FSCACHE_OBJECT_NEEDS_UPDATE, &object->flags)) {
-		_debug("final update");
-		fscache_do_update_object(object);
-	}
-
 	spin_lock(&cache->object_list_lock);
 	list_del_init(&object->cache_link);
 	spin_unlock(&cache->object_list_lock);
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 2a2df9d1649e..350e65870dff 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -102,9 +102,8 @@ static int fscache_stats_show(struct seq_file *m, void *v)
 	seq_printf(m, "Invals : n=%u\n",
 		   atomic_read(&fscache_n_invalidates));
 
-	seq_printf(m, "Updates: n=%u nul=%u rsz=%u rsn=%u\n",
+	seq_printf(m, "Updates: n=%u rsz=%u rsn=%u\n",
 		   atomic_read(&fscache_n_updates),
-		   atomic_read(&fscache_n_updates_run),
 		   atomic_read(&fscache_n_resizes),
 		   atomic_read(&fscache_n_resizes_null));
 
@@ -115,9 +114,8 @@ static int fscache_stats_show(struct seq_file *m, void *v)
 	seq_printf(m, "CacheOp: alo=%d luo=%d\n",
 		   atomic_read(&fscache_n_cop_alloc_object),
 		   atomic_read(&fscache_n_cop_lookup_object));
-	seq_printf(m, "CacheOp: inv=%d upo=%d dro=%d pto=%d atc=%d syn=%d\n",
+	seq_printf(m, "CacheOp: inv=%d dro=%d pto=%d atc=%d syn=%d\n",
 		   atomic_read(&fscache_n_cop_invalidate_object),
-		   atomic_read(&fscache_n_cop_update_object),
 		   atomic_read(&fscache_n_cop_drop_object),
 		   atomic_read(&fscache_n_cop_put_object),
 		   atomic_read(&fscache_n_cop_attr_changed),
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 958fa899917d..74a09738c899 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -115,8 +115,6 @@ struct fscache_cache_ops {
 	/* unpin an object in the cache */
 	void (*unpin_object)(struct fscache_object *object);
 
-	/* store the updated auxiliary data on an object */
-	void (*update_object)(struct fscache_object *object);
 	/* Change the size of a data object */
 	void (*resize_object)(struct fscache_object *object, loff_t new_size);
 



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 56/76] afs: Pass page into dirty region helpers to provide THP size
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (53 preceding siblings ...)
  2020-11-20 15:14 ` [RFC PATCH 55/76] fscache: Remove the update operation David Howells
@ 2020-11-20 15:14 ` David Howells
  2020-11-20 15:14 ` [RFC PATCH 57/76] afs: Print the operation debug_id when logging an unexpected data version David Howells
                   ` (19 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:14 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Pass a pointer to the page being accessed into the dirty region helpers
so that the size of the page can be determined in case it's a
transparent huge page.

This also requires the page to be passed into the afs_page_dirty trace
point - there's then no need to pass in the index or private data
separately, as these can be retrieved directly from the page struct.
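
To illustrate the revised helpers (a sketch; page, from and to stand for
the netfs's own state around write_begin/write_end):

	unsigned long priv;
	size_t f, t;

	/* Pack the dirty range into page->private, scaled to this
	 * page's size - which may now be a THP.
	 */
	priv = afs_page_dirty(page, from, to);
	set_page_private(page, priv);

	/* Recover the bounds later; the page supplies the scaling. */
	f = afs_page_dirty_from(page, page_private(page));
	t = afs_page_dirty_to(page, page_private(page));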

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/file.c              |   20 +++++++--------
 fs/afs/internal.h          |   16 ++++++------
 fs/afs/write.c             |   60 ++++++++++++++++++--------------------------
 include/trace/events/afs.h |   23 ++++++++++-------
 4 files changed, 55 insertions(+), 64 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index ba8874720626..c06197e4339b 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -528,8 +528,8 @@ static void afs_invalidate_dirty(struct page *page, unsigned int offset,
 		return;
 
 	/* We may need to shorten the dirty region */
-	f = afs_page_dirty_from(priv);
-	t = afs_page_dirty_to(priv);
+	f = afs_page_dirty_from(page, priv);
+	t = afs_page_dirty_to(page, priv);
 
 	if (t <= offset || f >= end)
 		return; /* Doesn't overlap */
@@ -547,17 +547,17 @@ static void afs_invalidate_dirty(struct page *page, unsigned int offset,
 	if (f == t)
 		goto undirty;
 
-	priv = afs_page_dirty(f, t);
+	priv = afs_page_dirty(page, f, t);
 	set_page_private(page, priv);
-	trace_afs_page_dirty(vnode, tracepoint_string("trunc"), page->index, priv);
+	trace_afs_page_dirty(vnode, tracepoint_string("trunc"), page);
 	return;
 
 undirty:
-	trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page->index, priv);
+	trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page);
 	clear_page_dirty_for_io(page);
 full_invalidate:
-	priv = (unsigned long)detach_page_private(page);
-	trace_afs_page_dirty(vnode, tracepoint_string("inval"), page->index, priv);
+	detach_page_private(page);
+	trace_afs_page_dirty(vnode, tracepoint_string("inval"), page);
 }
 
 /*
@@ -585,7 +585,6 @@ static void afs_invalidatepage(struct page *page, unsigned int offset,
 static int afs_releasepage(struct page *page, gfp_t gfp_flags)
 {
 	struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
-	unsigned long priv;
 
 	_enter("{{%llx:%llu}[%lu],%lx},%x",
 	       vnode->fid.vid, vnode->fid.vnode, page->index, page->flags,
@@ -594,9 +593,8 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)
 	/* deny if page is being written to the cache and the caller hasn't
 	 * elected to wait */
 	if (PagePrivate(page)) {
-		priv = (unsigned long)detach_page_private(page);
-		trace_afs_page_dirty(vnode, tracepoint_string("rel"),
-				     page->index, priv);
+		detach_page_private(page);
+		trace_afs_page_dirty(vnode, tracepoint_string("rel"), page);
 	}
 
 	/* indicate that the page can be released */
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 12bb08eaeeff..9f75d99f5b2e 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -874,31 +874,31 @@ struct afs_vnode_cache_aux {
 #define __AFS_PAGE_PRIV_MMAPPED	0x8000UL
 #endif
 
-static inline unsigned int afs_page_dirty_resolution(void)
+static inline unsigned int afs_page_dirty_resolution(struct page *page)
 {
-	int shift = PAGE_SHIFT - (__AFS_PAGE_PRIV_SHIFT - 1);
+	int shift = thp_order(page) + PAGE_SHIFT - (__AFS_PAGE_PRIV_SHIFT - 1);
 	return (shift > 0) ? shift : 0;
 }
 
-static inline size_t afs_page_dirty_from(unsigned long priv)
+static inline size_t afs_page_dirty_from(struct page *page, unsigned long priv)
 {
 	unsigned long x = priv & __AFS_PAGE_PRIV_MASK;
 
 	/* The lower bound is inclusive */
-	return x << afs_page_dirty_resolution();
+	return x << afs_page_dirty_resolution(page);
 }
 
-static inline size_t afs_page_dirty_to(unsigned long priv)
+static inline size_t afs_page_dirty_to(struct page *page, unsigned long priv)
 {
 	unsigned long x = (priv >> __AFS_PAGE_PRIV_SHIFT) & __AFS_PAGE_PRIV_MASK;
 
 	/* The upper bound is immediately beyond the region */
-	return (x + 1) << afs_page_dirty_resolution();
+	return (x + 1) << afs_page_dirty_resolution(page);
 }
 
-static inline unsigned long afs_page_dirty(size_t from, size_t to)
+static inline unsigned long afs_page_dirty(struct page *page, size_t from, size_t to)
 {
-	unsigned int res = afs_page_dirty_resolution();
+	unsigned int res = afs_page_dirty_resolution(page);
 	from >>= res;
 	to = (to - 1) >> res;
 	return (to << __AFS_PAGE_PRIV_SHIFT) | from;
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 92eaa88000d7..9d0cef35ecba 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -112,15 +112,14 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
 	t = f = 0;
 	if (PagePrivate(page)) {
 		priv = page_private(page);
-		f = afs_page_dirty_from(priv);
-		t = afs_page_dirty_to(priv);
+		f = afs_page_dirty_from(page, priv);
+		t = afs_page_dirty_to(page, priv);
 		ASSERTCMP(f, <=, t);
 	}
 
 	if (f != t) {
 		if (PageWriteback(page)) {
-			trace_afs_page_dirty(vnode, tracepoint_string("alrdy"),
-					     page->index, priv);
+			trace_afs_page_dirty(vnode, tracepoint_string("alrdy"), page);
 			goto flush_conflicting_write;
 		}
 		/* If the file is being filled locally, allow inter-write
@@ -204,21 +203,19 @@ int afs_write_end(struct file *file, struct address_space *mapping,
 
 	if (PagePrivate(page)) {
 		priv = page_private(page);
-		f = afs_page_dirty_from(priv);
-		t = afs_page_dirty_to(priv);
+		f = afs_page_dirty_from(page, priv);
+		t = afs_page_dirty_to(page, priv);
 		if (from < f)
 			f = from;
 		if (to > t)
 			t = to;
-		priv = afs_page_dirty(f, t);
+		priv = afs_page_dirty(page, f, t);
 		set_page_private(page, priv);
-		trace_afs_page_dirty(vnode, tracepoint_string("dirty+"),
-				     page->index, priv);
+		trace_afs_page_dirty(vnode, tracepoint_string("dirty+"), page);
 	} else {
-		priv = afs_page_dirty(from, to);
+		priv = afs_page_dirty(page, from, to);
 		attach_page_private(page, (void *)priv);
-		trace_afs_page_dirty(vnode, tracepoint_string("dirty"),
-				     page->index, priv);
+		trace_afs_page_dirty(vnode, tracepoint_string("dirty"), page);
 	}
 
 	set_page_dirty(page);
@@ -321,7 +318,6 @@ static void afs_pages_written_back(struct afs_vnode *vnode,
 				   pgoff_t first, pgoff_t last)
 {
 	struct pagevec pv;
-	unsigned long priv;
 	unsigned count, loop;
 
 	_enter("{%llx:%llu},{%lx-%lx}",
@@ -340,9 +336,9 @@ static void afs_pages_written_back(struct afs_vnode *vnode,
 		ASSERTCMP(pv.nr, ==, count);
 
 		for (loop = 0; loop < count; loop++) {
-			priv = (unsigned long)detach_page_private(pv.pages[loop]);
+			detach_page_private(pv.pages[loop]);
 			trace_afs_page_dirty(vnode, tracepoint_string("clear"),
-					     pv.pages[loop]->index, priv);
+					     pv.pages[loop]);
 			end_page_writeback(pv.pages[loop]);
 		}
 		first += count;
@@ -516,15 +512,13 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
 	 */
 	start = primary_page->index;
 	priv = page_private(primary_page);
-	offset = afs_page_dirty_from(priv);
-	to = afs_page_dirty_to(priv);
-	trace_afs_page_dirty(vnode, tracepoint_string("store"),
-			     primary_page->index, priv);
+	offset = afs_page_dirty_from(primary_page, priv);
+	to = afs_page_dirty_to(primary_page, priv);
+	trace_afs_page_dirty(vnode, tracepoint_string("store"), primary_page);
 
 	WARN_ON(offset == to);
 	if (offset == to)
-		trace_afs_page_dirty(vnode, tracepoint_string("WARN"),
-				     primary_page->index, priv);
+		trace_afs_page_dirty(vnode, tracepoint_string("WARN"), primary_page);
 
 	if (start >= final_page ||
 	    (to < PAGE_SIZE && !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)))
@@ -562,8 +556,8 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
 			}
 
 			priv = page_private(page);
-			f = afs_page_dirty_from(priv);
-			t = afs_page_dirty_to(priv);
+			f = afs_page_dirty_from(page, priv);
+			t = afs_page_dirty_to(page, priv);
 			if (f != 0 &&
 			    !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)) {
 				unlock_page(page);
@@ -571,8 +565,7 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
 			}
 			to = t;
 
-			trace_afs_page_dirty(vnode, tracepoint_string("store+"),
-					     page->index, priv);
+			trace_afs_page_dirty(vnode, tracepoint_string("store+"), page);
 
 			if (!clear_page_dirty_for_io(page))
 				BUG();
@@ -861,14 +854,13 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 	 */
 	wait_on_page_writeback(vmf->page);
 
-	priv = afs_page_dirty(0, PAGE_SIZE);
+	priv = afs_page_dirty(vmf->page, 0, PAGE_SIZE);
 	priv = afs_page_dirty_mmapped(priv);
-	trace_afs_page_dirty(vnode, tracepoint_string("mkwrite"),
-			     vmf->page->index, priv);
 	if (PagePrivate(vmf->page))
 		set_page_private(vmf->page, priv);
 	else
 		attach_page_private(vmf->page, (void *)priv);
+	trace_afs_page_dirty(vnode, tracepoint_string("mkwrite"), vmf->page);
 	file_update_time(file);
 
 	sb_end_pagefault(inode->i_sb);
@@ -921,17 +913,15 @@ int afs_launder_page(struct page *page)
 		f = 0;
 		t = PAGE_SIZE;
 		if (PagePrivate(page)) {
-			f = afs_page_dirty_from(priv);
-			t = afs_page_dirty_to(priv);
+			f = afs_page_dirty_from(page, priv);
+			t = afs_page_dirty_to(page, priv);
 		}
 
-		trace_afs_page_dirty(vnode, tracepoint_string("launder"),
-				     page->index, priv);
+		trace_afs_page_dirty(vnode, tracepoint_string("launder"), page);
 		ret = afs_store_data(mapping, page->index, page->index, t, f, true);
 	}
 
-	priv = (unsigned long)detach_page_private(page);
-	trace_afs_page_dirty(vnode, tracepoint_string("laundered"),
-			     page->index, priv);
+	detach_page_private(page);
+	trace_afs_page_dirty(vnode, tracepoint_string("laundered"), page);
 	return ret;
 }
diff --git a/include/trace/events/afs.h b/include/trace/events/afs.h
index 4eef374d4413..f0b2565db2d9 100644
--- a/include/trace/events/afs.h
+++ b/include/trace/events/afs.h
@@ -967,30 +967,33 @@ TRACE_EVENT(afs_dir_check_failed,
 	    );
 
 TRACE_EVENT(afs_page_dirty,
-	    TP_PROTO(struct afs_vnode *vnode, const char *where,
-		     pgoff_t page, unsigned long priv),
+	    TP_PROTO(struct afs_vnode *vnode, const char *where, struct page *page),
 
-	    TP_ARGS(vnode, where, page, priv),
+	    TP_ARGS(vnode, where, page),
 
 	    TP_STRUCT__entry(
 		    __field(struct afs_vnode *,		vnode		)
 		    __field(const char *,		where		)
 		    __field(pgoff_t,			page		)
-		    __field(unsigned long,		priv		)
+		    __field(unsigned long,		from		)
+		    __field(unsigned long,		to		)
 			     ),
 
 	    TP_fast_assign(
 		    __entry->vnode = vnode;
 		    __entry->where = where;
-		    __entry->page = page;
-		    __entry->priv = priv;
+		    __entry->page = page->index;
+		    __entry->from = afs_page_dirty_from(page, page->private);
+		    __entry->to = afs_page_dirty_to(page, page->private);
+		    __entry->to |= (afs_is_page_dirty_mmapped(page->private) ?
+				    (1UL << (BITS_PER_LONG - 1)) : 0);
 			   ),
 
-	    TP_printk("vn=%p %lx %s %zx-%zx%s",
+	    TP_printk("vn=%p %lx %s %lx-%lx%s",
 		      __entry->vnode, __entry->page, __entry->where,
-		      afs_page_dirty_from(__entry->priv),
-		      afs_page_dirty_to(__entry->priv),
-		      afs_is_page_dirty_mmapped(__entry->priv) ? " M" : "")
+		      __entry->from,
+		      __entry->to & ~(1UL << (BITS_PER_LONG - 1)),
+		      __entry->to & (1UL << (BITS_PER_LONG - 1)) ? " M" : "")
 	    );
 
 TRACE_EVENT(afs_call_state,




* [RFC PATCH 57/76] afs: Print the operation debug_id when logging an unexpected data version
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (54 preceding siblings ...)
  2020-11-20 15:14 ` [RFC PATCH 56/76] afs: Pass page into dirty region helpers to provide THP size David Howells
@ 2020-11-20 15:14 ` David Howells
  2020-11-20 15:14 ` [RFC PATCH 58/76] afs: Move key to afs_read struct David Howells
                   ` (18 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:14 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Print the afs_operation debug_id when logging an unexpected change in the
data version.  This allows the logged message to be matched against
tracelines.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/inode.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 3930f051f39a..177baeea5a1e 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -215,11 +215,12 @@ static void afs_apply_status(struct afs_operation *op,
 
 	if (vp->dv_before + vp->dv_delta != status->data_version) {
 		if (test_bit(AFS_VNODE_CB_PROMISED, &vnode->flags))
-			pr_warn("kAFS: vnode modified {%llx:%llu} %llx->%llx %s\n",
+			pr_warn("kAFS: vnode modified {%llx:%llu} %llx->%llx %s (op=%x)\n",
 				vnode->fid.vid, vnode->fid.vnode,
 				(unsigned long long)vp->dv_before + vp->dv_delta,
 				(unsigned long long)status->data_version,
-				op->type ? op->type->name : "???");
+				op->type ? op->type->name : "???",
+				op->debug_id);
 
 		vnode->invalid_before = status->data_version;
 		if (vnode->status.type == AFS_FTYPE_DIR) {




* [RFC PATCH 58/76] afs: Move key to afs_read struct
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (55 preceding siblings ...)
  2020-11-20 15:14 ` [RFC PATCH 57/76] afs: Print the operation debug_id when logging an unexpected data version David Howells
@ 2020-11-20 15:14 ` David Howells
  2020-11-20 15:15 ` [RFC PATCH 59/76] afs: Don't truncate iter during data fetch David Howells
                   ` (17 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:14 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Stash the key used to authenticate read operations in the afs_read struct.
This will be necessary to reissue the operation against the server if a
read from the cache fails in upcoming cache changes.
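
In outline, the ownership rule this establishes (a userspace sketch of the
refcounting only, not the real key API):

#include <stdlib.h>

struct key { int refs; };
struct read_req { int usage; struct key *key; };

static struct key *key_get(struct key *k) { k->refs++; return k; }
static void key_put(struct key *k) { if (--k->refs == 0) free(k); }

/* The allocator of the request pins the key... */
static struct read_req *alloc_read(struct key *key)
{
	struct read_req *req = calloc(1, sizeof(*req));
	req->usage = 1;
	req->key = key_get(key);
	return req;
}

/* ...and the final put of the request drops it, so code that reissues
 * the read can take the key straight from the request. */
static void put_read(struct read_req *req)
{
	if (--req->usage == 0) {
		key_put(req->key);
		free(req);
	}
}

int main(void)
{
	struct key *k = calloc(1, sizeof(*k));
	k->refs = 1;
	struct read_req *req = alloc_read(k);
	put_read(req);	/* drops the request's ref on the key */
	key_put(k);	/* drops the caller's original ref */
	return 0;
}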

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/dir.c      |    3 ++-
 fs/afs/file.c     |   16 +++++++++-------
 fs/afs/internal.h |    3 ++-
 fs/afs/write.c    |   12 ++++++------
 4 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 1bb5b9d7f0a2..ec4be0d50d2f 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -241,6 +241,7 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
 		return ERR_PTR(-ENOMEM);
 
 	refcount_set(&req->usage, 1);
+	req->key = key_get(key);
 	req->nr_pages = nr_pages;
 	req->actual_len = i_size; /* May change */
 	req->len = nr_pages * PAGE_SIZE; /* We can ask for more than there is */
@@ -305,7 +306,7 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
 
 	if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) {
 		trace_afs_reload_dir(dvnode);
-		ret = afs_fetch_data(dvnode, key, req);
+		ret = afs_fetch_data(dvnode, req);
 		if (ret < 0)
 			goto error_unlock;
 
diff --git a/fs/afs/file.c b/fs/afs/file.c
index c06197e4339b..55f4c6cd617f 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -212,6 +212,7 @@ void afs_put_read(struct afs_read *req)
 			if (req->pages != req->array)
 				kfree(req->pages);
 		}
+		key_put(req->key);
 		kfree(req);
 	}
 }
@@ -242,7 +243,7 @@ static const struct afs_operation_ops afs_fetch_data_operation = {
 /*
  * Fetch file data from the volume.
  */
-int afs_fetch_data(struct afs_vnode *vnode, struct key *key, struct afs_read *req)
+int afs_fetch_data(struct afs_vnode *vnode, struct afs_read *req)
 {
 	struct afs_operation *op;
 
@@ -251,9 +252,9 @@ int afs_fetch_data(struct afs_vnode *vnode, struct key *key, struct afs_read *re
 	       vnode->fid.vid,
 	       vnode->fid.vnode,
 	       vnode->fid.unique,
-	       key_serial(key));
+	       key_serial(req->key));
 
-	op = afs_alloc_operation(key, vnode->volume);
+	op = afs_alloc_operation(req->key, vnode->volume);
 	if (IS_ERR(op))
 		return PTR_ERR(op);
 
@@ -292,6 +293,7 @@ int afs_page_filler(void *data, struct page *page)
 	 * unmarshalling code will clear the unfilled space.
 	 */
 	refcount_set(&req->usage, 1);
+	req->key = key_get(key);
 	req->pos = (loff_t)page->index << PAGE_SHIFT;
 	req->len = PAGE_SIZE;
 	req->nr_pages = 1;
@@ -301,7 +303,7 @@ int afs_page_filler(void *data, struct page *page)
 
 	/* read the contents of the file from the server into the
 	 * page */
-	ret = afs_fetch_data(vnode, key, req);
+	ret = afs_fetch_data(vnode, req);
 	afs_put_read(req);
 
 	if (ret < 0) {
@@ -386,7 +388,6 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping,
 	struct afs_read *req;
 	struct list_head *p;
 	struct page *first, *page;
-	struct key *key = afs_file_key(file);
 	pgoff_t index;
 	int ret, n, i;
 
@@ -410,6 +411,7 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping,
 
 	refcount_set(&req->usage, 1);
 	req->vnode = vnode;
+	req->key = key_get(afs_file_key(file));
 	req->page_done = afs_readpages_page_done;
 	req->pos = first->index;
 	req->pos <<= PAGE_SHIFT;
@@ -439,11 +441,11 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping,
 	} while (req->nr_pages < n);
 
 	if (req->nr_pages == 0) {
-		kfree(req);
+		afs_put_read(req);
 		return 0;
 	}
 
-	ret = afs_fetch_data(vnode, key, req);
+	ret = afs_fetch_data(vnode, req);
 	if (ret < 0)
 		goto error;
 
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 9f75d99f5b2e..faddb0bc3559 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -204,6 +204,7 @@ struct afs_read {
 	loff_t			actual_len;	/* How much we're actually getting */
 	loff_t			remain;		/* Amount remaining */
 	loff_t			file_size;	/* File size returned by server */
+	struct key		*key;		/* The key to use to reissue the read */
 	afs_dataversion_t	data_version;	/* Version number returned by server */
 	refcount_t		usage;
 	unsigned int		index;		/* Which page we're reading into */
@@ -1037,7 +1038,7 @@ extern int afs_cache_wb_key(struct afs_vnode *, struct afs_file *);
 extern void afs_put_wb_key(struct afs_wb_key *);
 extern int afs_open(struct inode *, struct file *);
 extern int afs_release(struct inode *, struct file *);
-extern int afs_fetch_data(struct afs_vnode *, struct key *, struct afs_read *);
+extern int afs_fetch_data(struct afs_vnode *, struct afs_read *);
 extern int afs_page_filler(void *, struct page *);
 extern void afs_put_read(struct afs_read *);
 
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 9d0cef35ecba..7eba0d3201ba 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -25,9 +25,10 @@ int afs_set_page_dirty(struct page *page)
 /*
  * partly or wholly fill a page that's under preparation for writing
  */
-static int afs_fill_page(struct afs_vnode *vnode, struct key *key,
+static int afs_fill_page(struct file *file,
 			 loff_t pos, unsigned int len, struct page *page)
 {
+	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
 	struct afs_read *req;
 	size_t p;
 	void *data;
@@ -49,6 +50,7 @@ static int afs_fill_page(struct afs_vnode *vnode, struct key *key,
 		return -ENOMEM;
 
 	refcount_set(&req->usage, 1);
+	req->key = key_get(afs_file_key(file));
 	req->pos = pos;
 	req->len = len;
 	req->nr_pages = 1;
@@ -56,7 +58,7 @@ static int afs_fill_page(struct afs_vnode *vnode, struct key *key,
 	req->pages[0] = page;
 	get_page(page);
 
-	ret = afs_fetch_data(vnode, key, req);
+	ret = afs_fetch_data(vnode, req);
 	afs_put_read(req);
 	if (ret < 0) {
 		if (ret == -ENOENT) {
@@ -80,7 +82,6 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
 {
 	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
 	struct page *page;
-	struct key *key = afs_file_key(file);
 	unsigned long priv;
 	unsigned f, from = pos & (PAGE_SIZE - 1);
 	unsigned t, to = from + len;
@@ -95,7 +96,7 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
 		return -ENOMEM;
 
 	if (!PageUptodate(page) && len != PAGE_SIZE) {
-		ret = afs_fill_page(vnode, key, pos & PAGE_MASK, PAGE_SIZE, page);
+		ret = afs_fill_page(file, pos & PAGE_MASK, PAGE_SIZE, page);
 		if (ret < 0) {
 			unlock_page(page);
 			put_page(page);
@@ -163,7 +164,6 @@ int afs_write_end(struct file *file, struct address_space *mapping,
 		  struct page *page, void *fsdata)
 {
 	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
-	struct key *key = afs_file_key(file);
 	unsigned long priv;
 	unsigned int f, from = pos & (PAGE_SIZE - 1);
 	unsigned int t, to = from + copied;
@@ -193,7 +193,7 @@ int afs_write_end(struct file *file, struct address_space *mapping,
 			 * unmarshalling routine will take care of clearing any
 			 * bits that are beyond the EOF.
 			 */
-			ret = afs_fill_page(vnode, key, pos + copied,
+			ret = afs_fill_page(file, pos + copied,
 					    len - copied, page);
 			if (ret < 0)
 				goto out;




* [RFC PATCH 59/76] afs: Don't truncate iter during data fetch
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (56 preceding siblings ...)
  2020-11-20 15:14 ` [RFC PATCH 58/76] afs: Move key to afs_read struct David Howells
@ 2020-11-20 15:15 ` David Howells
  2020-11-20 15:15 ` [RFC PATCH 60/76] afs: Log remote unmarshalling errors David Howells
                   ` (16 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:15 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Don't truncate the iterator to correspond to the actual data size when
fetching the data from the server - rather, pass the length we want to read
to rxrpc.

This will allow the clear-after-read code in future to simply clear the
remaining iterator capacity rather than having to reinitialise the
iterator.
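
The shape of the new contract, roughly (a userspace sketch, not the rxrpc
code: the buffer models the iterator, and the separate *_len models the
size_t pointer that rxrpc_kernel_recv_data() now takes):

#include <stdio.h>
#include <string.h>

struct iter { char *buf; size_t size, used; };

/* Read up to *_len bytes into the iterator and decrease *_len by the
 * amount consumed, leaving any spare iterator capacity untouched. */
static int recv_data(struct iter *iter, size_t *_len,
		     const char *src, size_t avail)
{
	size_t n = avail < *_len ? avail : *_len;

	memcpy(iter->buf + iter->used, src, n);
	iter->used += n;
	*_len -= n;
	return *_len ? -11 /* -EAGAIN: more wanted */ : 0;
}

int main(void)
{
	char buf[4096];
	struct iter iter = { buf, sizeof(buf), 0 };
	size_t len = 100;	/* how much we actually want to read */

	recv_data(&iter, &len, "payload", 7);

	/* Clear-after-read: zero the rest of the buffer in place instead
	 * of rebuilding the iterator around the unfilled tail. */
	memset(iter.buf + iter.used, 0, iter.size - iter.used);
	printf("still wanted %zu bytes\n", len);	/* 93 */
	return 0;
}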

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/fsclient.c      |    6 ++++--
 fs/afs/internal.h      |    6 ++++++
 fs/afs/rxrpc.c         |   13 +++++++++----
 fs/afs/yfsclient.c     |    6 ++++--
 include/net/af_rxrpc.h |    2 +-
 net/rxrpc/recvmsg.c    |    9 +++++----
 6 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 1d95ed9dd86e..4a57c6c6f12b 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -305,8 +305,9 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
 	unsigned int size;
 	int ret;
 
-	_enter("{%u,%zu/%llu}",
-	       call->unmarshall, iov_iter_count(call->iter), req->actual_len);
+	_enter("{%u,%zu,%zu/%llu}",
+	       call->unmarshall, call->iov_len, iov_iter_count(call->iter),
+	       req->actual_len);
 
 	switch (call->unmarshall) {
 	case 0:
@@ -343,6 +344,7 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
 			size = PAGE_SIZE - req->offset;
 		else
 			size = req->remain;
+		call->iov_len = size;
 		call->bvec[0].bv_len = size;
 		call->bvec[0].bv_offset = req->offset;
 		call->bvec[0].bv_page = req->pages[req->index];
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index faddb0bc3559..ea5b780518c7 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -104,6 +104,7 @@ struct afs_call {
 	struct afs_server	*server;	/* The fileserver record if fs op (pins ref) */
 	struct afs_vlserver	*vlserver;	/* The vlserver record if vl op */
 	void			*request;	/* request data (first part) */
+	size_t			iov_len;	/* Size of *iter to be used */
 	struct iov_iter		def_iter;	/* Default buffer/data iterator */
 	struct iov_iter		*iter;		/* Iterator currently in use */
 	union {	/* Convenience for ->def_iter */
@@ -1262,6 +1263,7 @@ static inline void afs_make_op_call(struct afs_operation *op, struct afs_call *c
 
 static inline void afs_extract_begin(struct afs_call *call, void *buf, size_t size)
 {
+	call->iov_len = size;
 	call->kvec[0].iov_base = buf;
 	call->kvec[0].iov_len = size;
 	iov_iter_kvec(&call->def_iter, READ, call->kvec, 1, size);
@@ -1269,21 +1271,25 @@ static inline void afs_extract_begin(struct afs_call *call, void *buf, size_t si
 
 static inline void afs_extract_to_tmp(struct afs_call *call)
 {
+	call->iov_len = sizeof(call->tmp);
 	afs_extract_begin(call, &call->tmp, sizeof(call->tmp));
 }
 
 static inline void afs_extract_to_tmp64(struct afs_call *call)
 {
+	call->iov_len = sizeof(call->tmp64);
 	afs_extract_begin(call, &call->tmp64, sizeof(call->tmp64));
 }
 
 static inline void afs_extract_discard(struct afs_call *call, size_t size)
 {
+	call->iov_len = size;
 	iov_iter_discard(&call->def_iter, READ, size);
 }
 
 static inline void afs_extract_to_buf(struct afs_call *call, size_t size)
 {
+	call->iov_len = size;
 	afs_extract_begin(call, call->buffer, size);
 }
 
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 8be709cb8542..0ec38b758f29 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -363,6 +363,7 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
 	struct rxrpc_call *rxcall;
 	struct msghdr msg;
 	struct kvec iov[1];
+	size_t len;
 	s64 tx_total_len;
 	int ret;
 
@@ -466,9 +467,10 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
 		rxrpc_kernel_abort_call(call->net->socket, rxcall,
 					RX_USER_ABORT, ret, "KSD");
 	} else {
+		len = 0;
 		iov_iter_kvec(&msg.msg_iter, READ, NULL, 0, 0);
 		rxrpc_kernel_recv_data(call->net->socket, rxcall,
-				       &msg.msg_iter, false,
+				       &msg.msg_iter, &len, false,
 				       &call->abort_code, &call->service_id);
 		ac->abort_code = call->abort_code;
 		ac->responded = true;
@@ -504,6 +506,7 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
 static void afs_deliver_to_call(struct afs_call *call)
 {
 	enum afs_call_state state;
+	size_t len;
 	u32 abort_code, remote_abort = 0;
 	int ret;
 
@@ -516,10 +519,11 @@ static void afs_deliver_to_call(struct afs_call *call)
 	       state == AFS_CALL_SV_AWAIT_ACK
 	       ) {
 		if (state == AFS_CALL_SV_AWAIT_ACK) {
+			len = 0;
 			iov_iter_kvec(&call->def_iter, READ, NULL, 0, 0);
 			ret = rxrpc_kernel_recv_data(call->net->socket,
 						     call->rxcall, &call->def_iter,
-						     false, &remote_abort,
+						     &len, false, &remote_abort,
 						     &call->service_id);
 			trace_afs_receive_data(call, &call->def_iter, false, ret);
 
@@ -929,10 +933,11 @@ int afs_extract_data(struct afs_call *call, bool want_more)
 	u32 remote_abort = 0;
 	int ret;
 
-	_enter("{%s,%zu},%d", call->type->name, iov_iter_count(iter), want_more);
+	_enter("{%s,%zu,%zu},%d",
+	       call->type->name, call->iov_len, iov_iter_count(iter), want_more);
 
 	ret = rxrpc_kernel_recv_data(net->socket, call->rxcall, iter,
-				     want_more, &remote_abort,
+				     &call->iov_len, want_more, &remote_abort,
 				     &call->service_id);
 	if (ret == 0 || ret == -EAGAIN)
 		return ret;
diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c
index bd787e71a657..6c45d32da13c 100644
--- a/fs/afs/yfsclient.c
+++ b/fs/afs/yfsclient.c
@@ -363,8 +363,9 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call)
 	unsigned int size;
 	int ret;
 
-	_enter("{%u,%zu/%llu}",
-	       call->unmarshall, iov_iter_count(call->iter), req->actual_len);
+	_enter("{%u,%zu, %zu/%llu}",
+	       call->unmarshall, call->iov_len, iov_iter_count(call->iter),
+	       req->actual_len);
 
 	switch (call->unmarshall) {
 	case 0:
@@ -396,6 +397,7 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call)
 			size = PAGE_SIZE - req->offset;
 		else
 			size = req->remain;
+		call->iov_len = size;
 		call->bvec[0].bv_len = size;
 		call->bvec[0].bv_offset = req->offset;
 		call->bvec[0].bv_page = req->pages[req->index];
diff --git a/include/net/af_rxrpc.h b/include/net/af_rxrpc.h
index f6abcc0bbd6e..cee5f83c0f11 100644
--- a/include/net/af_rxrpc.h
+++ b/include/net/af_rxrpc.h
@@ -53,7 +53,7 @@ int rxrpc_kernel_send_data(struct socket *, struct rxrpc_call *,
 			   struct msghdr *, size_t,
 			   rxrpc_notify_end_tx_t);
 int rxrpc_kernel_recv_data(struct socket *, struct rxrpc_call *,
-			   struct iov_iter *, bool, u32 *, u16 *);
+			   struct iov_iter *, size_t *, bool, u32 *, u16 *);
 bool rxrpc_kernel_abort_call(struct socket *, struct rxrpc_call *,
 			     u32, int, const char *);
 void rxrpc_kernel_end_call(struct socket *, struct rxrpc_call *);
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 2c842851d72e..876833feac53 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -669,6 +669,7 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
  * @sock: The socket that the call exists on
  * @call: The call to send data through
  * @iter: The buffer to receive into
+ * @_len: The amount of data we want to receive (decreased on return)
  * @want_more: True if more data is expected to be read
  * @_abort: Where the abort code is stored if -ECONNABORTED is returned
  * @_service: Where to store the actual service ID (may be upgraded)
@@ -684,7 +685,7 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
  * *_abort should also be initialised to 0.
  */
 int rxrpc_kernel_recv_data(struct socket *sock, struct rxrpc_call *call,
-			   struct iov_iter *iter,
+			   struct iov_iter *iter, size_t *_len,
 			   bool want_more, u32 *_abort, u16 *_service)
 {
 	size_t offset = 0;
@@ -692,7 +693,7 @@ int rxrpc_kernel_recv_data(struct socket *sock, struct rxrpc_call *call,
 
 	_enter("{%d,%s},%zu,%d",
 	       call->debug_id, rxrpc_call_states[call->state],
-	       iov_iter_count(iter), want_more);
+	       *_len, want_more);
 
 	ASSERTCMP(call->state, !=, RXRPC_CALL_SERVER_SECURING);
 
@@ -703,8 +704,8 @@ int rxrpc_kernel_recv_data(struct socket *sock, struct rxrpc_call *call,
 	case RXRPC_CALL_SERVER_RECV_REQUEST:
 	case RXRPC_CALL_SERVER_ACK_REQUEST:
 		ret = rxrpc_recvmsg_data(sock, call, NULL, iter,
-					 iov_iter_count(iter), 0,
-					 &offset);
+					 *_len, 0, &offset);
+		*_len -= offset;
 		if (ret < 0)
 			goto out;
 




* [RFC PATCH 60/76] afs: Log remote unmarshalling errors
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (57 preceding siblings ...)
  2020-11-20 15:15 ` [RFC PATCH 59/76] afs: Don't truncate iter during data fetch David Howells
@ 2020-11-20 15:15 ` David Howells
  2020-11-20 15:15 ` [RFC PATCH 61/76] afs: Set up the iov_iter before calling afs_extract_data() David Howells
                   ` (15 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:15 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Log unmarshalling errors reported by the peer (ie. it can't parse what we
sent it).  Limit the maximum number of messages to 3.  The counter is a
plain static int, so the limit is approximate under concurrency, but
that's acceptable for a log throttle.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/rxrpc.c |   34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 0ec38b758f29..ae68576f822f 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -500,6 +500,39 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
 	_leave(" = %d", ret);
 }
 
+/*
+ * Log remote abort codes that indicate that we have a protocol disagreement
+ * with the server.
+ */
+static void afs_log_error(struct afs_call *call, s32 remote_abort)
+{
+	static int max = 0;
+	const char *msg;
+	int m;
+
+	switch (remote_abort) {
+	case RX_EOF:		 msg = "unexpected EOF";	break;
+	case RXGEN_CC_MARSHAL:	 msg = "client marshalling";	break;
+	case RXGEN_CC_UNMARSHAL: msg = "client unmarshalling";	break;
+	case RXGEN_SS_MARSHAL:	 msg = "server marshalling";	break;
+	case RXGEN_SS_UNMARSHAL: msg = "server unmarshalling";	break;
+	case RXGEN_DECODE:	 msg = "opcode decode";		break;
+	case RXGEN_SS_XDRFREE:	 msg = "server XDR cleanup";	break;
+	case RXGEN_CC_XDRFREE:	 msg = "client XDR cleanup";	break;
+	case -32:		 msg = "insufficient data";	break;
+	default:
+		return;
+	}
+
+	m = max;
+	if (m < 3) {
+		max = m + 1;
+		pr_notice("kAFS: Peer reported %s failure on %s [%pISp]\n",
+			  msg, call->type->name,
+			  &call->alist->addrs[call->addr_ix].transport);
+	}
+}
+
 /*
  * deliver messages to a call
  */
@@ -563,6 +596,7 @@ static void afs_deliver_to_call(struct afs_call *call)
 			goto out;
 		case -ECONNABORTED:
 			ASSERTCMP(state, ==, AFS_CALL_COMPLETE);
+			afs_log_error(call, call->abort_code);
 			goto done;
 		case -ENOTSUPP:
 			abort_code = RXGEN_OPCODE;




* [RFC PATCH 61/76] afs: Set up the iov_iter before calling afs_extract_data()
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (58 preceding siblings ...)
  2020-11-20 15:15 ` [RFC PATCH 60/76] afs: Log remote unmarshalling errors David Howells
@ 2020-11-20 15:15 ` David Howells
  2020-11-20 15:15 ` [RFC PATCH 62/76] afs: Use ITER_XARRAY for writing David Howells
                   ` (14 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:15 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

afs_extract_data() sets up a temporary iov_iter and passes it to AF_RXRPC
each time it is called to describe the remaining buffer to be filled.

Instead:

 (1) Put an iterator in the afs_call struct.

 (2) Set the iterator for each marshalling stage to load data into the
     appropriate places.  A number of convenience functions are provided to
     this end (eg. afs_extract_to_buf()).

     This iterator is then passed to afs_extract_data().

 (3) Use the new ITER_XARRAY iterator when reading data to load directly
     into the inode's pages without needing to create a list of them.

This will allow O_DIRECT calls to be supported in future patches.
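
As a rough model of the calling convention (a self-contained sketch, not
the kernel state machine: extract_to() stands in for the afs_extract_to_*()
helpers that prime call->iter, and -11 stands in for -EAGAIN):

#include <stdio.h>
#include <string.h>

struct call_sketch {
	unsigned int unmarshall;	/* which marshalling stage we're in */
	char *dst; size_t want;		/* stands in for call->iter/iov_len */
	unsigned long long tmp64;
	char buffer[64];
};

/* Prime the "iterator" for the next stage before asking for data. */
static void extract_to(struct call_sketch *c, void *buf, size_t size)
{
	c->dst = buf;
	c->want = size;
}

/* Returns 0 when the stage's buffer is full, -11 if more is needed. */
static int extract_data(struct call_sketch *c, const char **src, size_t *avail)
{
	size_t n = *avail < c->want ? *avail : c->want;

	memcpy(c->dst, *src, n);
	c->dst += n; c->want -= n;
	*src += n; *avail -= n;
	return c->want ? -11 : 0;
}

static int deliver(struct call_sketch *c, const char *src, size_t avail)
{
	switch (c->unmarshall) {
	case 0:
		extract_to(c, &c->tmp64, sizeof(c->tmp64));
		c->unmarshall++;
		/* fall through */
	case 1:	/* extract the data length */
		if (extract_data(c, &src, &avail) < 0)
			return -11;
		extract_to(c, c->buffer, sizeof(c->buffer));
		c->unmarshall++;
		/* fall through */
	case 2:	/* extract the data itself */
		return extract_data(c, &src, &avail);
	}
	return 0;
}

int main(void)
{
	struct call_sketch c = { 0 };
	char pkt[8 + 64] = { 0 };

	printf("%d\n", deliver(&c, pkt, sizeof(pkt)));	/* prints 0 */
	return 0;
}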

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/dir.c       |  222 +++++++++++++++++++++++++++++++++++-----------------
 fs/afs/file.c      |  190 ++++++++++++++++++++++++++-------------------
 fs/afs/fsclient.c  |   54 +++----------
 fs/afs/internal.h  |   16 ++--
 fs/afs/write.c     |   27 ++++--
 fs/afs/yfsclient.c |   54 +++----------
 6 files changed, 314 insertions(+), 249 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index ec4be0d50d2f..215f73121391 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -102,6 +102,35 @@ struct afs_lookup_cookie {
 	struct afs_fid		fids[50];
 };
 
+/*
+ * Drop the refs that we're holding on the pages we were reading into.  We've
+ * got refs on the first nr_pages pages.
+ */
+static void afs_dir_read_cleanup(struct afs_read *req)
+{
+	struct address_space *mapping = req->vnode->vfs_inode.i_mapping;
+	struct page *page;
+	pgoff_t last = req->nr_pages - 1;
+
+	XA_STATE(xas, &mapping->i_pages, 0);
+
+	if (unlikely(!req->nr_pages))
+		return;
+
+	rcu_read_lock();
+	xas_for_each(&xas, page, last) {
+		if (xas_retry(&xas, page))
+			continue;
+		BUG_ON(xa_is_value(page));
+		BUG_ON(PageCompound(page));
+		ASSERTCMP(page->mapping, ==, mapping);
+
+		put_page(page);
+	}
+
+	rcu_read_unlock();
+}
+
 /*
  * check that a directory page is valid
  */
@@ -127,7 +156,7 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, struct page *page,
 	qty /= sizeof(union afs_xdr_dir_block);
 
 	/* check them */
-	dbuf = kmap(page);
+	dbuf = kmap_atomic(page);
 	for (tmp = 0; tmp < qty; tmp++) {
 		if (dbuf->blocks[tmp].hdr.magic != AFS_DIR_MAGIC) {
 			printk("kAFS: %s(%lx): bad magic %d/%d is %04hx\n",
@@ -146,7 +175,7 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, struct page *page,
 		((u8 *)&dbuf->blocks[tmp])[AFS_DIR_BLOCK_SIZE - 1] = 0;
 	}
 
-	kunmap(page);
+	kunmap_atomic(dbuf);
 
 checked:
 	afs_stat_v(dvnode, n_read_dir);
@@ -157,35 +186,74 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, struct page *page,
 }
 
 /*
- * Check the contents of a directory that we've just read.
+ * Dump the contents of a directory.
  */
-static bool afs_dir_check_pages(struct afs_vnode *dvnode, struct afs_read *req)
+static void afs_dir_dump(struct afs_vnode *dvnode, struct afs_read *req)
 {
 	struct afs_xdr_dir_page *dbuf;
-	unsigned int i, j, qty = PAGE_SIZE / sizeof(union afs_xdr_dir_block);
+	struct address_space *mapping = dvnode->vfs_inode.i_mapping;
+	struct page *page;
+	unsigned int i, qty = PAGE_SIZE / sizeof(union afs_xdr_dir_block);
+	pgoff_t last = req->nr_pages - 1;
 
-	for (i = 0; i < req->nr_pages; i++)
-		if (!afs_dir_check_page(dvnode, req->pages[i], req->actual_len))
-			goto bad;
-	return true;
+	XA_STATE(xas, &mapping->i_pages, 0);
 
-bad:
-	pr_warn("DIR %llx:%llx f=%llx l=%llx al=%llx r=%llx\n",
+	pr_warn("DIR %llx:%llx f=%llx l=%llx al=%llx\n",
 		dvnode->fid.vid, dvnode->fid.vnode,
-		req->file_size, req->len, req->actual_len, req->remain);
-	pr_warn("DIR %llx %x %x %x\n",
-		req->pos, req->index, req->nr_pages, req->offset);
+		req->file_size, req->len, req->actual_len);
+	pr_warn("DIR %llx %x %zx %zx\n",
+		req->pos, req->nr_pages,
+		req->iter->iov_offset,  iov_iter_count(req->iter));
 
-	for (i = 0; i < req->nr_pages; i++) {
-		dbuf = kmap(req->pages[i]);
-		for (j = 0; j < qty; j++) {
-			union afs_xdr_dir_block *block = &dbuf->blocks[j];
+	xas_for_each(&xas, page, last) {
+		if (xas_retry(&xas, page))
+			continue;
+
+		BUG_ON(PageCompound(page));
+		BUG_ON(page->mapping != mapping);
+
+		dbuf = kmap_atomic(page);
+		for (i = 0; i < qty; i++) {
+			union afs_xdr_dir_block *block = &dbuf->blocks[i];
 
-			pr_warn("[%02x] %32phN\n", i * qty + j, block);
+			pr_warn("[%02lx] %32phN\n", page->index * qty + i, block);
 		}
-		kunmap(req->pages[i]);
+		kunmap_atomic(dbuf);
 	}
-	return false;
+}
+
+/*
+ * Check all the pages in a directory.  All the pages are held pinned.
+ */
+static int afs_dir_check(struct afs_vnode *dvnode, struct afs_read *req)
+{
+	struct address_space *mapping = dvnode->vfs_inode.i_mapping;
+	struct page *page;
+	pgoff_t last = req->nr_pages - 1;
+	int ret = 0;
+
+	XA_STATE(xas, &mapping->i_pages, 0);
+
+	if (unlikely(!req->nr_pages))
+		return 0;
+
+	rcu_read_lock();
+	xas_for_each(&xas, page, last) {
+		if (xas_retry(&xas, page))
+			continue;
+
+		BUG_ON(PageCompound(page));
+		BUG_ON(page->mapping != mapping);
+
+		ret = afs_dir_check_page(dvnode, page, req->file_size);
+		if (ret < 0) {
+			afs_dir_dump(dvnode, req);
+			break;
+		}
+	}
+
+	rcu_read_unlock();
+	return ret;
 }
 
 /*
@@ -214,58 +282,57 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
 {
 	struct afs_read *req;
 	loff_t i_size;
-	int nr_pages, nr_inline, i, n;
-	int ret = -ENOMEM;
+	int nr_pages, i, n;
+	int ret;
+
+	_enter("");
 
-retry:
+	req = kzalloc(sizeof(*req), GFP_KERNEL);
+	if (!req)
+		return ERR_PTR(-ENOMEM);
+
+	refcount_set(&req->usage, 1);
+	req->vnode = dvnode;
+	req->key = key_get(key);
+	req->cleanup = afs_dir_read_cleanup;
+
+expand:
 	i_size = i_size_read(&dvnode->vfs_inode);
-	if (i_size < 2048)
-		return ERR_PTR(afs_bad(dvnode, afs_file_error_dir_small));
+	if (i_size < 2048) {
+		ret = afs_bad(dvnode, afs_file_error_dir_small);
+		goto error;
+	}
 	if (i_size > 2048 * 1024) {
 		trace_afs_file_error(dvnode, -EFBIG, afs_file_error_dir_big);
-		return ERR_PTR(-EFBIG);
+		ret = -EFBIG;
+		goto error;
 	}
 
 	_enter("%llu", i_size);
 
-	/* Get a request record to hold the page list.  We want to hold it
-	 * inline if we can, but we don't want to make an order 1 allocation.
-	 */
 	nr_pages = (i_size + PAGE_SIZE - 1) / PAGE_SIZE;
-	nr_inline = nr_pages;
-	if (nr_inline > (PAGE_SIZE - sizeof(*req)) / sizeof(struct page *))
-		nr_inline = 0;
 
-	req = kzalloc(struct_size(req, array, nr_inline), GFP_KERNEL);
-	if (!req)
-		return ERR_PTR(-ENOMEM);
-
-	refcount_set(&req->usage, 1);
-	req->key = key_get(key);
-	req->nr_pages = nr_pages;
 	req->actual_len = i_size; /* May change */
 	req->len = nr_pages * PAGE_SIZE; /* We can ask for more than there is */
 	req->data_version = dvnode->status.data_version; /* May change */
-	if (nr_inline > 0) {
-		req->pages = req->array;
-	} else {
-		req->pages = kcalloc(nr_pages, sizeof(struct page *),
-				     GFP_KERNEL);
-		if (!req->pages)
-			goto error;
-	}
+	iov_iter_xarray(&req->def_iter, READ, &dvnode->vfs_inode.i_mapping->i_pages,
+			0, i_size);
+	req->iter = &req->def_iter;
 
-	/* Get a list of all the pages that hold or will hold the directory
-	 * content.  We need to fill in any gaps that we might find where the
-	 * memory reclaimer has been at work.  If there are any gaps, we will
+	/* Fill in any gaps that we might find where the memory reclaimer has
+	 * been at work and pin all the pages.  If there are any gaps, we will
 	 * need to reread the entire directory contents.
 	 */
-	i = 0;
-	do {
+	i = req->nr_pages;
+	while (i < nr_pages) {
+		struct page *pages[8], *page;
+
 		n = find_get_pages_contig(dvnode->vfs_inode.i_mapping, i,
-					  req->nr_pages - i,
-					  req->pages + i);
-		_debug("find %u at %u/%u", n, i, req->nr_pages);
+					  min_t(unsigned int, nr_pages - i,
+						ARRAY_SIZE(pages)),
+					  pages);
+		_debug("find %u at %u/%u", n, i, nr_pages);
+
 		if (n == 0) {
 			gfp_t gfp = dvnode->vfs_inode.i_mapping->gfp_mask;
 
@@ -273,22 +340,24 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
 				afs_stat_v(dvnode, n_inval);
 
 			ret = -ENOMEM;
-			req->pages[i] = __page_cache_alloc(gfp);
-			if (!req->pages[i])
+			page = __page_cache_alloc(gfp);
+			if (!page)
 				goto error;
-			ret = add_to_page_cache_lru(req->pages[i],
+			ret = add_to_page_cache_lru(page,
 						    dvnode->vfs_inode.i_mapping,
 						    i, gfp);
 			if (ret < 0)
 				goto error;
 
-			attach_page_private(req->pages[i], (void *)1);
-			unlock_page(req->pages[i]);
+			attach_page_private(page, (void *)1);
+			unlock_page(page);
+			req->nr_pages++;
 			i++;
 		} else {
+			req->nr_pages += n;
 			i += n;
 		}
-	} while (i < req->nr_pages);
+	}
 
 	/* If we're going to reload, we need to lock all the pages to prevent
 	 * races.
@@ -312,12 +381,17 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
 
 		task_io_account_read(PAGE_SIZE * req->nr_pages);
 
-		if (req->len < req->file_size)
-			goto content_has_grown;
+		if (req->len < req->file_size) {
+			/* The content has grown, so we need to expand the
+			 * buffer.
+			 */
+			up_write(&dvnode->validate_lock);
+			goto expand;
+		}
 
 		/* Validate the data we just read. */
-		ret = -EIO;
-		if (!afs_dir_check_pages(dvnode, req))
+		ret = afs_dir_check(dvnode, req);
+		if (ret < 0)
 			goto error_unlock;
 
 		// TODO: Trim excess pages
@@ -335,11 +409,6 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
 	afs_put_read(req);
 	_leave(" = %d", ret);
 	return ERR_PTR(ret);
-
-content_has_grown:
-	up_write(&dvnode->validate_lock);
-	afs_put_read(req);
-	goto retry;
 }
 
 /*
@@ -446,6 +515,7 @@ static int afs_dir_iterate(struct inode *dir, struct dir_context *ctx,
 	struct afs_read *req;
 	struct page *page;
 	unsigned blkoff, limit;
+	void __rcu **slot;
 	int ret;
 
 	_enter("{%lu},%u,,", dir->i_ino, (unsigned)ctx->pos);
@@ -470,9 +540,15 @@ static int afs_dir_iterate(struct inode *dir, struct dir_context *ctx,
 		blkoff = ctx->pos & ~(sizeof(union afs_xdr_dir_block) - 1);
 
 		/* Fetch the appropriate page from the directory and re-add it
-		 * to the LRU.
+		 * to the LRU.  We have all the pages pinned with an extra ref.
 		 */
-		page = req->pages[blkoff / PAGE_SIZE];
+		rcu_read_lock();
+		page = NULL;
+		slot = radix_tree_lookup_slot(&dvnode->vfs_inode.i_mapping->i_pages,
+					      blkoff / PAGE_SIZE);
+		if (slot)
+			page = radix_tree_deref_slot(slot);
+		rcu_read_unlock();
 		if (!page) {
 			ret = afs_bad(dvnode, afs_file_error_dir_missing_page);
 			break;
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 55f4c6cd617f..3f409ddc1c1f 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -197,21 +197,72 @@ int afs_release(struct inode *inode, struct file *file)
 	return ret;
 }
 
+/*
+ * Handle completion of a read operation.
+ */
+static void afs_file_read_done(struct afs_read *req)
+{
+	struct afs_vnode *vnode = req->vnode;
+	struct page *page;
+	pgoff_t index = req->pos >> PAGE_SHIFT;
+	pgoff_t last = index + req->nr_pages - 1;
+
+	XA_STATE(xas, &vnode->vfs_inode.i_mapping->i_pages, index);
+
+	if (iov_iter_count(req->iter) > 0) {
+		/* The read was short - clear the excess buffer. */
+		_debug("afterclear %zx %zx %llx/%llx",
+		       req->iter->iov_offset,
+		       iov_iter_count(req->iter),
+		       req->actual_len, req->len);
+		iov_iter_zero(iov_iter_count(req->iter), req->iter);
+	}
+
+	rcu_read_lock();
+	xas_for_each(&xas, page, last) {
+		page_endio(page, false, 0);
+		put_page(page);
+	}
+	rcu_read_unlock();
+
+	task_io_account_read(req->len);
+	req->cleanup = NULL;
+}
+
+/*
+ * Dispose of our locks and refs on the pages if the read failed.
+ */
+static void afs_file_read_cleanup(struct afs_read *req)
+{
+	struct page *page;
+	pgoff_t index = req->pos >> PAGE_SHIFT;
+	pgoff_t last = index + req->nr_pages - 1;
+
+	if (req->iter) {
+		XA_STATE(xas, &req->vnode->vfs_inode.i_mapping->i_pages, index);
+
+		_enter("%lu,%u,%zu", index, req->nr_pages, iov_iter_count(req->iter));
+
+		rcu_read_lock();
+		xas_for_each(&xas, page, last) {
+			BUG_ON(xa_is_value(page));
+			BUG_ON(PageCompound(page));
+
+			page_endio(page, false, req->error);
+			put_page(page);
+		}
+		rcu_read_unlock();
+	}
+}
+
 /*
  * Dispose of a ref to a read record.
  */
 void afs_put_read(struct afs_read *req)
 {
-	int i;
-
 	if (refcount_dec_and_test(&req->usage)) {
-		if (req->pages) {
-			for (i = 0; i < req->nr_pages; i++)
-				if (req->pages[i])
-					put_page(req->pages[i]);
-			if (req->pages != req->array)
-				kfree(req->pages);
-		}
+		if (req->cleanup)
+			req->cleanup(req);
 		key_put(req->key);
 		kfree(req);
 	}
@@ -229,6 +280,7 @@ static void afs_fetch_data_success(struct afs_operation *op)
 
 static void afs_fetch_data_put(struct afs_operation *op)
 {
+	op->fetch.req->error = op->error;
 	afs_put_read(op->fetch.req);
 }
 
@@ -268,12 +320,11 @@ int afs_fetch_data(struct afs_vnode *vnode, struct afs_read *req)
 /*
  * read page from file, directory or symlink, given a key to use
  */
-int afs_page_filler(void *data, struct page *page)
+static int afs_page_filler(struct key *key, struct page *page)
 {
 	struct inode *inode = page->mapping->host;
 	struct afs_vnode *vnode = AFS_FS_I(inode);
 	struct afs_read *req;
-	struct key *key = data;
 	int ret;
 
 	_enter("{%x},{%lu},{%lu}", key_serial(key), inode->i_ino, page->index);
@@ -284,53 +335,52 @@ int afs_page_filler(void *data, struct page *page)
 	if (test_bit(AFS_VNODE_DELETED, &vnode->flags))
 		goto error;
 
-	req = kzalloc(struct_size(req, array, 1), GFP_KERNEL);
+	req = kzalloc(sizeof(struct afs_read), GFP_KERNEL);
 	if (!req)
 		goto enomem;
 
-	/* We request a full page.  If the page is a partial one at the
-	 * end of the file, the server will return a short read and the
-	 * unmarshalling code will clear the unfilled space.
-	 */
 	refcount_set(&req->usage, 1);
-	req->key = key_get(key);
-	req->pos = (loff_t)page->index << PAGE_SHIFT;
-	req->len = PAGE_SIZE;
-	req->nr_pages = 1;
-	req->pages = req->array;
-	req->pages[0] = page;
+	req->vnode		= vnode;
+	req->key		= key_get(key);
+	req->pos		= (loff_t)page->index << PAGE_SHIFT;
+	req->len		= PAGE_SIZE;
+	req->nr_pages		= 1;
+	req->done		= afs_file_read_done;
+	req->cleanup		= afs_file_read_cleanup;
+
 	get_page(page);
+	iov_iter_xarray(&req->def_iter, READ, &page->mapping->i_pages,
+			req->pos, req->len);
+	req->iter = &req->def_iter;
 
-	/* read the contents of the file from the server into the
-	 * page */
 	ret = afs_fetch_data(vnode, req);
-	afs_put_read(req);
-
-	if (ret < 0) {
-		if (ret == -ENOENT) {
-			_debug("got NOENT from server"
-			       " - marking file deleted and stale");
-			set_bit(AFS_VNODE_DELETED, &vnode->flags);
-			ret = -ESTALE;
-		}
-
-		if (ret == -EINTR ||
-		    ret == -ENOMEM ||
-		    ret == -ERESTARTSYS ||
-		    ret == -EAGAIN)
-			goto error;
-		goto io_error;
-	}
-
-	SetPageUptodate(page);
-	unlock_page(page);
+	if (ret < 0)
+		goto fetch_error;
 
+	afs_put_read(req);
 	_leave(" = 0");
 	return 0;
 
-io_error:
-	SetPageError(page);
-	goto error;
+fetch_error:
+	switch (ret) {
+	case -EINTR:
+	case -ENOMEM:
+	case -ERESTARTSYS:
+	case -EAGAIN:
+		afs_put_read(req);
+		goto error;
+	case -ENOENT:
+		_debug("got NOENT from server - marking file deleted and stale");
+		set_bit(AFS_VNODE_DELETED, &vnode->flags);
+		ret = -ESTALE;
+		/* Fall through */
+	default:
+		page_endio(page, false, ret);
+		afs_put_read(req);
+		_leave(" = %d", ret);
+		return ret;
+	}
+
 enomem:
 	ret = -ENOMEM;
 error:
@@ -365,19 +415,6 @@ static int afs_readpage(struct file *file, struct page *page)
 	return ret;
 }
 
-/*
- * Make pages available as they're filled.
- */
-static void afs_readpages_page_done(struct afs_read *req)
-{
-	struct page *page = req->pages[req->index];
-
-	req->pages[req->index] = NULL;
-	SetPageUptodate(page);
-	unlock_page(page);
-	put_page(page);
-}
-
 /*
  * Read a contiguous set of pages.
  */
@@ -389,7 +426,7 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping,
 	struct list_head *p;
 	struct page *first, *page;
 	pgoff_t index;
-	int ret, n, i;
+	int ret, n;
 
 	/* Count the number of contiguous pages at the front of the list.  Note
 	 * that the list goes prev-wards rather than next-wards.
@@ -405,21 +442,20 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping,
 		n++;
 	}
 
-	req = kzalloc(struct_size(req, array, n), GFP_NOFS);
+	req = kzalloc(sizeof(struct afs_read), GFP_NOFS);
 	if (!req)
 		return -ENOMEM;
 
 	refcount_set(&req->usage, 1);
 	req->vnode = vnode;
 	req->key = key_get(afs_file_key(file));
-	req->page_done = afs_readpages_page_done;
+	req->done = afs_file_read_done;
+	req->cleanup = afs_file_read_cleanup;
 	req->pos = first->index;
 	req->pos <<= PAGE_SHIFT;
-	req->pages = req->array;
 
-	/* Transfer the pages to the request.  We add them in until one fails
-	 * to add to the LRU and then we stop (as that'll make a hole in the
-	 * contiguous run.
+	/* Add pages to the LRU until it fails.  We keep the pages ref'd and
+	 * locked until the read is complete.
 	 *
 	 * Note that it's possible for the file size to change whilst we're
 	 * doing this, but we rely on the server returning less than we asked
@@ -436,8 +472,7 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping,
 			break;
 		}
 
-		req->pages[req->nr_pages++] = page;
-		req->len += PAGE_SIZE;
+		req->nr_pages++;
 	} while (req->nr_pages < n);
 
 	if (req->nr_pages == 0) {
@@ -445,30 +480,25 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping,
 		return 0;
 	}
 
+	req->len = req->nr_pages * PAGE_SIZE;
+	iov_iter_xarray(&req->def_iter, READ, &file->f_mapping->i_pages,
+			req->pos, req->len);
+	req->iter = &req->def_iter;
+
 	ret = afs_fetch_data(vnode, req);
 	if (ret < 0)
 		goto error;
 
-	task_io_account_read(PAGE_SIZE * req->nr_pages);
 	afs_put_read(req);
 	return 0;
 
 error:
 	if (ret == -ENOENT) {
-		_debug("got NOENT from server"
-		       " - marking file deleted and stale");
+		_debug("got NOENT from server - marking file deleted and stale");
 		set_bit(AFS_VNODE_DELETED, &vnode->flags);
 		ret = -ESTALE;
 	}
 
-	for (i = 0; i < req->nr_pages; i++) {
-		page = req->pages[i];
-		if (page) {
-			SetPageError(page);
-			unlock_page(page);
-		}
-	}
-
 	afs_put_read(req);
 	return ret;
 }
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 4a57c6c6f12b..897b37301851 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -302,7 +302,6 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
 	struct afs_vnode_param *vp = &op->file[0];
 	struct afs_read *req = op->fetch.req;
 	const __be32 *bp;
-	unsigned int size;
 	int ret;
 
 	_enter("{%u,%zu,%zu/%llu}",
@@ -312,8 +311,6 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
 	switch (call->unmarshall) {
 	case 0:
 		req->actual_len = 0;
-		req->index = 0;
-		req->offset = req->pos & (PAGE_SIZE - 1);
 		call->unmarshall++;
 		if (call->operation_ID == FSFETCHDATA64) {
 			afs_extract_to_tmp64(call);
@@ -323,7 +320,10 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
 		}
 		fallthrough;
 
-		/* extract the returned data length */
+		/* Extract the returned data length into
+		 * ->actual_len.  This may indicate more or less data than was
+		 * requested will be returned.
+		 */
 	case 1:
 		_debug("extract data length");
 		ret = afs_extract_data(call, true);
@@ -332,45 +332,25 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
 
 		req->actual_len = be64_to_cpu(call->tmp64);
 		_debug("DATA length: %llu", req->actual_len);
-		req->remain = min(req->len, req->actual_len);
-		if (req->remain == 0)
+
+		if (req->actual_len == 0)
 			goto no_more_data;
 
+		call->iter = req->iter;
+		call->iov_len = min(req->actual_len, req->len);
 		call->unmarshall++;
-
-	begin_page:
-		ASSERTCMP(req->index, <, req->nr_pages);
-		if (req->remain > PAGE_SIZE - req->offset)
-			size = PAGE_SIZE - req->offset;
-		else
-			size = req->remain;
-		call->iov_len = size;
-		call->bvec[0].bv_len = size;
-		call->bvec[0].bv_offset = req->offset;
-		call->bvec[0].bv_page = req->pages[req->index];
-		iov_iter_bvec(&call->def_iter, READ, call->bvec, 1, size);
-		ASSERTCMP(size, <=, PAGE_SIZE);
 		fallthrough;
 
 		/* extract the returned data */
 	case 2:
 		_debug("extract data %zu/%llu",
-		       iov_iter_count(call->iter), req->remain);
+		       iov_iter_count(call->iter), req->actual_len);
 
 		ret = afs_extract_data(call, true);
 		if (ret < 0)
 			return ret;
-		req->remain -= call->bvec[0].bv_len;
-		req->offset += call->bvec[0].bv_len;
-		ASSERTCMP(req->offset, <=, PAGE_SIZE);
-		if (req->offset == PAGE_SIZE) {
-			req->offset = 0;
-			req->index++;
-			if (req->remain > 0)
-				goto begin_page;
-		}
 
-		ASSERTCMP(req->remain, ==, 0);
+		call->iter = &call->def_iter;
 		if (req->actual_len <= req->len)
 			goto no_more_data;
 
@@ -412,16 +392,8 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
 		break;
 	}
 
-	for (; req->index < req->nr_pages; req->index++) {
-		if (req->offset < PAGE_SIZE)
-			zero_user_segment(req->pages[req->index],
-					  req->offset, PAGE_SIZE);
-		req->offset = 0;
-	}
-
-	if (req->page_done)
-		for (req->index = 0; req->index < req->nr_pages; req->index++)
-			req->page_done(req);
+	if (req->done)
+		req->done(req);
 
 	_leave(" = 0 [done]");
 	return 0;
@@ -496,6 +468,8 @@ void afs_fs_fetch_data(struct afs_operation *op)
 	if (!call)
 		return afs_op_nomem(op);
 
+	req->call_debug_id = call->debug_id;
+
 	/* marshall the parameters */
 	bp = call->request;
 	bp[0] = htonl(FSFETCHDATA);
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index ea5b780518c7..a643d87315ed 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -31,6 +31,7 @@
 
 struct pagevec;
 struct afs_call;
+struct afs_vnode;
 
 /*
  * Partial file-locking emulation mode.  (The problem being that AFS3 only
@@ -203,18 +204,18 @@ struct afs_read {
 	loff_t			pos;		/* Where to start reading */
 	loff_t			len;		/* How much we're asking for */
 	loff_t			actual_len;	/* How much we're actually getting */
-	loff_t			remain;		/* Amount remaining */
 	loff_t			file_size;	/* File size returned by server */
 	struct key		*key;		/* The key to use to reissue the read */
+	struct afs_vnode	*vnode;		/* The file being read into. */
 	afs_dataversion_t	data_version;	/* Version number returned by server */
 	refcount_t		usage;
-	unsigned int		index;		/* Which page we're reading into */
+	unsigned int		call_debug_id;
 	unsigned int		nr_pages;
-	unsigned int		offset;		/* offset into current page */
-	struct afs_vnode	*vnode;
-	void (*page_done)(struct afs_read *);
-	struct page		**pages;
-	struct page		*array[];
+	int			error;
+	void (*done)(struct afs_read *);
+	void (*cleanup)(struct afs_read *);
+	struct iov_iter		*iter;		/* Iterator representing the buffer */
+	struct iov_iter		def_iter;	/* Default iterator */
 };
 
 /*
@@ -1040,7 +1041,6 @@ extern void afs_put_wb_key(struct afs_wb_key *);
 extern int afs_open(struct inode *, struct file *);
 extern int afs_release(struct inode *, struct file *);
 extern int afs_fetch_data(struct afs_vnode *, struct afs_read *);
-extern int afs_page_filler(void *, struct page *);
 extern void afs_put_read(struct afs_read *);
 
 static inline struct afs_read *afs_get_read(struct afs_read *req)
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 7eba0d3201ba..e78a9bc3b02d 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -22,6 +22,16 @@ int afs_set_page_dirty(struct page *page)
 	return __set_page_dirty_nobuffers(page);
 }
 
+/*
+ * Handle completion of a read operation to fill a page.
+ */
+static void afs_fill_hole(struct afs_read *req)
+{
+	if (iov_iter_count(req->iter) > 0)
+		/* The read was short - clear the excess buffer. */
+		iov_iter_zero(iov_iter_count(req->iter), req->iter);
+}
+
 /*
  * partly or wholly fill a page that's under preparation for writing
  */
@@ -45,18 +55,19 @@ static int afs_fill_page(struct file *file,
 		return 0;
 	}
 
-	req = kzalloc(struct_size(req, array, 1), GFP_KERNEL);
+	req = kzalloc(sizeof(struct afs_read), GFP_KERNEL);
 	if (!req)
 		return -ENOMEM;
 
 	refcount_set(&req->usage, 1);
-	req->key = key_get(afs_file_key(file));
-	req->pos = pos;
-	req->len = len;
-	req->nr_pages = 1;
-	req->pages = req->array;
-	req->pages[0] = page;
-	get_page(page);
+	req->vnode	= vnode;
+	req->done	= afs_fill_hole;
+	req->key	= key_get(afs_file_key(file));
+	req->pos	= pos;
+	req->len	= len;
+	req->nr_pages	= 1;
+	req->iter	= &req->def_iter;
+	iov_iter_xarray(&req->def_iter, READ, &file->f_mapping->i_pages, pos, len);
 
 	ret = afs_fetch_data(vnode, req);
 	afs_put_read(req);
diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c
index 6c45d32da13c..abcec145db4b 100644
--- a/fs/afs/yfsclient.c
+++ b/fs/afs/yfsclient.c
@@ -360,7 +360,6 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call)
 	struct afs_vnode_param *vp = &op->file[0];
 	struct afs_read *req = op->fetch.req;
 	const __be32 *bp;
-	unsigned int size;
 	int ret;
 
 	_enter("{%u,%zu, %zu/%llu}",
@@ -370,13 +369,14 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call)
 	switch (call->unmarshall) {
 	case 0:
 		req->actual_len = 0;
-		req->index = 0;
-		req->offset = req->pos & (PAGE_SIZE - 1);
 		afs_extract_to_tmp64(call);
 		call->unmarshall++;
 		fallthrough;
 
-		/* extract the returned data length */
+		/* Extract the returned data length into ->actual_len.  This
+		 * may indicate that more or less data than was requested
+		 * will be returned.
+		 */
 	case 1:
 		_debug("extract data length");
 		ret = afs_extract_data(call, true);
@@ -385,45 +385,25 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call)
 
 		req->actual_len = be64_to_cpu(call->tmp64);
 		_debug("DATA length: %llu", req->actual_len);
-		req->remain = min(req->len, req->actual_len);
-		if (req->remain == 0)
+
+		if (req->actual_len == 0)
 			goto no_more_data;
 
+		call->iter = req->iter;
+		call->iov_len = min(req->actual_len, req->len);
 		call->unmarshall++;
-
-	begin_page:
-		ASSERTCMP(req->index, <, req->nr_pages);
-		if (req->remain > PAGE_SIZE - req->offset)
-			size = PAGE_SIZE - req->offset;
-		else
-			size = req->remain;
-		call->iov_len = size;
-		call->bvec[0].bv_len = size;
-		call->bvec[0].bv_offset = req->offset;
-		call->bvec[0].bv_page = req->pages[req->index];
-		iov_iter_bvec(&call->def_iter, READ, call->bvec, 1, size);
-		ASSERTCMP(size, <=, PAGE_SIZE);
 		fallthrough;
 
 		/* extract the returned data */
 	case 2:
 		_debug("extract data %zu/%llu",
-		       iov_iter_count(call->iter), req->remain);
+		       iov_iter_count(call->iter), req->actual_len);
 
 		ret = afs_extract_data(call, true);
 		if (ret < 0)
 			return ret;
-		req->remain -= call->bvec[0].bv_len;
-		req->offset += call->bvec[0].bv_len;
-		ASSERTCMP(req->offset, <=, PAGE_SIZE);
-		if (req->offset == PAGE_SIZE) {
-			req->offset = 0;
-			req->index++;
-			if (req->remain > 0)
-				goto begin_page;
-		}
 
-		ASSERTCMP(req->remain, ==, 0);
+		call->iter = &call->def_iter;
 		if (req->actual_len <= req->len)
 			goto no_more_data;
 
@@ -469,16 +449,8 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call)
 		break;
 	}
 
-	for (; req->index < req->nr_pages; req->index++) {
-		if (req->offset < PAGE_SIZE)
-			zero_user_segment(req->pages[req->index],
-					  req->offset, PAGE_SIZE);
-		req->offset = 0;
-	}
-
-	if (req->page_done)
-		for (req->index = 0; req->index < req->nr_pages; req->index++)
-			req->page_done(req);
+	if (req->done)
+		req->done(req);
 
 	_leave(" = 0 [done]");
 	return 0;
@@ -518,6 +490,8 @@ void yfs_fs_fetch_data(struct afs_operation *op)
 	if (!call)
 		return afs_op_nomem(op);
 
+	req->call_debug_id = call->debug_id;
+
 	/* marshall the parameters */
 	bp = call->request;
 	bp = xdr_encode_u32(bp, YFSFETCHDATA64);




* [RFC PATCH 62/76] afs: Use ITER_XARRAY for writing
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (59 preceding siblings ...)
  2020-11-20 15:15 ` [RFC PATCH 61/76] afs: Set up the iov_iter before calling afs_extract_data() David Howells
@ 2020-11-20 15:15 ` David Howells
  2020-11-20 15:16 ` [RFC PATCH 63/76] afs: Wait on PG_fscache before modifying/releasing a page David Howells
                   ` (13 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:15 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Use a single ITER_XARRAY iterator to describe the portion of a file to be
transmitted to the server rather than generating a series of small
ITER_BVEC iterators on the fly.  This will make it easier to implement AIO
in afs.

In theory we could maybe use one giant ITER_BVEC, but that means
potentially allocating a huge array of bio_vec structs (max 256 per page)
when in fact the pagecache already has a structure listing all the relevant
pages (radix_tree/xarray) that can be walked over.
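
For illustration only (a minimal sketch, not part of the patch; the
helper name is made up), the new scheme boils down to one call that
walks the pagecache xarray directly:

	#include <linux/fs.h>
	#include <linux/uio.h>

	/* Describe the byte range [pos, pos + len) of a file's pagecache
	 * with a single ITER_XARRAY iterator, ready to be attached to a
	 * call (or copied into msg.msg_iter) in place of bio_vec batches.
	 */
	static void example_make_write_iter(struct iov_iter *iter,
					    struct address_space *mapping,
					    loff_t pos, size_t len)
	{
		iov_iter_xarray(iter, WRITE, &mapping->i_pages, pos, len);
	}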

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/fsclient.c          |   50 +++++++++------------
 fs/afs/internal.h          |   15 +++---
 fs/afs/rxrpc.c             |  103 ++++++--------------------------------------
 fs/afs/write.c             |  100 ++++++++++++++++++++++++-------------------
 fs/afs/yfsclient.c         |   25 +++--------
 include/trace/events/afs.h |   51 ++++++++--------------
 6 files changed, 126 insertions(+), 218 deletions(-)

diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 897b37301851..31e6b3635541 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -1055,8 +1055,7 @@ static const struct afs_call_type afs_RXFSStoreData64 = {
 /*
  * store a set of pages to a very large file
  */
-static void afs_fs_store_data64(struct afs_operation *op,
-				loff_t pos, loff_t size, loff_t i_size)
+static void afs_fs_store_data64(struct afs_operation *op)
 {
 	struct afs_vnode_param *vp = &op->file[0];
 	struct afs_call *call;
@@ -1071,7 +1070,7 @@ static void afs_fs_store_data64(struct afs_operation *op,
 	if (!call)
 		return afs_op_nomem(op);
 
-	call->send_pages = true;
+	call->write_iter = op->store.write_iter;
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -1087,47 +1086,38 @@ static void afs_fs_store_data64(struct afs_operation *op,
 	*bp++ = 0; /* unix mode */
 	*bp++ = 0; /* segment size */
 
-	*bp++ = htonl(upper_32_bits(pos));
-	*bp++ = htonl(lower_32_bits(pos));
-	*bp++ = htonl(upper_32_bits(size));
-	*bp++ = htonl(lower_32_bits(size));
-	*bp++ = htonl(upper_32_bits(i_size));
-	*bp++ = htonl(lower_32_bits(i_size));
+	*bp++ = htonl(upper_32_bits(op->store.pos));
+	*bp++ = htonl(lower_32_bits(op->store.pos));
+	*bp++ = htonl(upper_32_bits(op->store.size));
+	*bp++ = htonl(lower_32_bits(op->store.size));
+	*bp++ = htonl(upper_32_bits(op->store.i_size));
+	*bp++ = htonl(lower_32_bits(op->store.i_size));
 
 	trace_afs_make_fs_call(call, &vp->fid);
 	afs_make_op_call(op, call, GFP_NOFS);
 }
 
 /*
- * store a set of pages
+ * Write data to a file on the server.
  */
 void afs_fs_store_data(struct afs_operation *op)
 {
 	struct afs_vnode_param *vp = &op->file[0];
 	struct afs_call *call;
-	loff_t size, pos, i_size;
 	__be32 *bp;
 
 	_enter(",%x,{%llx:%llu},,",
 	       key_serial(op->key), vp->fid.vid, vp->fid.vnode);
 
-	size = (loff_t)op->store.last_to - (loff_t)op->store.first_offset;
-	if (op->store.first != op->store.last)
-		size += (loff_t)(op->store.last - op->store.first) << PAGE_SHIFT;
-	pos = (loff_t)op->store.first << PAGE_SHIFT;
-	pos += op->store.first_offset;
-
-	i_size = i_size_read(&vp->vnode->vfs_inode);
-	if (pos + size > i_size)
-		i_size = size + pos;
-
 	_debug("size %llx, at %llx, i_size %llx",
-	       (unsigned long long) size, (unsigned long long) pos,
-	       (unsigned long long) i_size);
+	       (unsigned long long)op->store.size,
+	       (unsigned long long)op->store.pos,
+	       (unsigned long long)op->store.i_size);
 
-	if (upper_32_bits(pos) || upper_32_bits(i_size) || upper_32_bits(size) ||
-	    upper_32_bits(pos + size))
-		return afs_fs_store_data64(op, pos, size, i_size);
+	if (upper_32_bits(op->store.pos) ||
+	    upper_32_bits(op->store.size) ||
+	    upper_32_bits(op->store.i_size))
+		return afs_fs_store_data64(op);
 
 	call = afs_alloc_flat_call(op->net, &afs_RXFSStoreData,
 				   (4 + 6 + 3) * 4,
@@ -1135,7 +1125,7 @@ void afs_fs_store_data(struct afs_operation *op)
 	if (!call)
 		return afs_op_nomem(op);
 
-	call->send_pages = true;
+	call->write_iter = op->store.write_iter;
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -1151,9 +1141,9 @@ void afs_fs_store_data(struct afs_operation *op)
 	*bp++ = 0; /* unix mode */
 	*bp++ = 0; /* segment size */
 
-	*bp++ = htonl(lower_32_bits(pos));
-	*bp++ = htonl(lower_32_bits(size));
-	*bp++ = htonl(lower_32_bits(i_size));
+	*bp++ = htonl(lower_32_bits(op->store.pos));
+	*bp++ = htonl(lower_32_bits(op->store.size));
+	*bp++ = htonl(lower_32_bits(op->store.i_size));
 
 	trace_afs_make_fs_call(call, &vp->fid);
 	afs_make_op_call(op, call, GFP_NOFS);
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index a643d87315ed..3556500a1247 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -107,6 +107,7 @@ struct afs_call {
 	void			*request;	/* request data (first part) */
 	size_t			iov_len;	/* Size of *iter to be used */
 	struct iov_iter		def_iter;	/* Default buffer/data iterator */
+	struct iov_iter		*write_iter;	/* Iterator defining write to be made */
 	struct iov_iter		*iter;		/* Iterator currently in use */
 	union {	/* Convenience for ->def_iter */
 		struct kvec	kvec[1];
@@ -133,7 +134,6 @@ struct afs_call {
 	unsigned char		unmarshall;	/* unmarshalling phase */
 	unsigned char		addr_ix;	/* Address in ->alist */
 	bool			drop_ref;	/* T if need to drop ref for incoming call */
-	bool			send_pages;	/* T if data from mapping should be sent */
 	bool			need_attention;	/* T if RxRPC poked us */
 	bool			async;		/* T if asynchronous */
 	bool			upgrade;	/* T to request service upgrade */
@@ -810,12 +810,13 @@ struct afs_operation {
 			afs_lock_type_t type;
 		} lock;
 		struct {
-			struct address_space *mapping;	/* Pages being written from */
-			pgoff_t		first;		/* first page in mapping to deal with */
-			pgoff_t		last;		/* last page in mapping to deal with */
-			unsigned	first_offset;	/* offset into mapping[first] */
-			unsigned	last_to;	/* amount of mapping[last] */
-			bool		laundering;	/* Laundering page, PG_writeback not set */
+			struct iov_iter	*write_iter;
+			loff_t	pos;
+			loff_t	size;
+			loff_t	i_size;
+			pgoff_t	first;		/* first page in mapping to deal with */
+			pgoff_t	last;		/* last page in mapping to deal with */
+			bool	laundering;	/* Laundering page, PG_writeback not set */
 		} store;
 		struct {
 			struct iattr	*attr;
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index ae68576f822f..23a1a92d64bb 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -271,40 +271,6 @@ void afs_flat_call_destructor(struct afs_call *call)
 	call->buffer = NULL;
 }
 
-#define AFS_BVEC_MAX 8
-
-/*
- * Load the given bvec with the next few pages.
- */
-static void afs_load_bvec(struct afs_call *call, struct msghdr *msg,
-			  struct bio_vec *bv, pgoff_t first, pgoff_t last,
-			  unsigned offset)
-{
-	struct afs_operation *op = call->op;
-	struct page *pages[AFS_BVEC_MAX];
-	unsigned int nr, n, i, to, bytes = 0;
-
-	nr = min_t(pgoff_t, last - first + 1, AFS_BVEC_MAX);
-	n = find_get_pages_contig(op->store.mapping, first, nr, pages);
-	ASSERTCMP(n, ==, nr);
-
-	msg->msg_flags |= MSG_MORE;
-	for (i = 0; i < nr; i++) {
-		to = PAGE_SIZE;
-		if (first + i >= last) {
-			to = op->store.last_to;
-			msg->msg_flags &= ~MSG_MORE;
-		}
-		bv[i].bv_page = pages[i];
-		bv[i].bv_len = to - offset;
-		bv[i].bv_offset = offset;
-		bytes += to - offset;
-		offset = 0;
-	}
-
-	iov_iter_bvec(&msg->msg_iter, WRITE, bv, nr, bytes);
-}
-
 /*
  * Advance the AFS call state when the RxRPC call ends the transmit phase.
  */
@@ -317,42 +283,6 @@ static void afs_notify_end_request_tx(struct sock *sock,
 	afs_set_call_state(call, AFS_CALL_CL_REQUESTING, AFS_CALL_CL_AWAIT_REPLY);
 }
 
-/*
- * attach the data from a bunch of pages on an inode to a call
- */
-static int afs_send_pages(struct afs_call *call, struct msghdr *msg)
-{
-	struct afs_operation *op = call->op;
-	struct bio_vec bv[AFS_BVEC_MAX];
-	unsigned int bytes, nr, loop, offset;
-	pgoff_t first = op->store.first, last = op->store.last;
-	int ret;
-
-	offset = op->store.first_offset;
-	op->store.first_offset = 0;
-
-	do {
-		afs_load_bvec(call, msg, bv, first, last, offset);
-		trace_afs_send_pages(call, msg, first, last, offset);
-
-		offset = 0;
-		bytes = msg->msg_iter.count;
-		nr = msg->msg_iter.nr_segs;
-
-		ret = rxrpc_kernel_send_data(op->net->socket, call->rxcall, msg,
-					     bytes, afs_notify_end_request_tx);
-		for (loop = 0; loop < nr; loop++)
-			put_page(bv[loop].bv_page);
-		if (ret < 0)
-			break;
-
-		first += nr;
-	} while (first <= last);
-
-	trace_afs_sent_pages(call, op->store.first, last, first, ret);
-	return ret;
-}
-
 /*
  * Initiate a call and synchronously queue up the parameters for dispatch.  Any
  * error is stored into the call struct, which the caller must check for.
@@ -384,21 +314,8 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
 	 * after the initial fixed part.
 	 */
 	tx_total_len = call->request_size;
-	if (call->send_pages) {
-		struct afs_operation *op = call->op;
-
-		if (op->store.last == op->store.first) {
-			tx_total_len += op->store.last_to - op->store.first_offset;
-		} else {
-			/* It looks mathematically like you should be able to
-			 * combine the following lines with the ones above, but
-			 * unsigned arithmetic is fun when it wraps...
-			 */
-			tx_total_len += PAGE_SIZE - op->store.first_offset;
-			tx_total_len += op->store.last_to;
-			tx_total_len += (op->store.last - op->store.first - 1) * PAGE_SIZE;
-		}
-	}
+	if (call->write_iter)
+		tx_total_len += iov_iter_count(call->write_iter);
 
 	/* If the call is going to be asynchronous, we need an extra ref for
 	 * the call to hold itself so the caller need not hang on to its ref.
@@ -440,7 +357,7 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
 	iov_iter_kvec(&msg.msg_iter, WRITE, iov, 1, call->request_size);
 	msg.msg_control		= NULL;
 	msg.msg_controllen	= 0;
-	msg.msg_flags		= MSG_WAITALL | (call->send_pages ? MSG_MORE : 0);
+	msg.msg_flags		= MSG_WAITALL | (call->write_iter ? MSG_MORE : 0);
 
 	ret = rxrpc_kernel_send_data(call->net->socket, rxcall,
 				     &msg, call->request_size,
@@ -448,8 +365,18 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
 	if (ret < 0)
 		goto error_do_abort;
 
-	if (call->send_pages) {
-		ret = afs_send_pages(call, &msg);
+	if (call->write_iter) {
+		msg.msg_iter = *call->write_iter;
+		msg.msg_flags &= ~MSG_MORE;
+		trace_afs_send_data(call, &msg);
+
+		ret = rxrpc_kernel_send_data(call->net->socket,
+					     call->rxcall, &msg,
+					     iov_iter_count(&msg.msg_iter),
+					     afs_notify_end_request_tx);
+		*call->write_iter = msg.msg_iter;
+
+		trace_afs_sent_data(call, &msg, ret);
 		if (ret < 0)
 			goto error_do_abort;
 	}
diff --git a/fs/afs/write.c b/fs/afs/write.c
index e78a9bc3b02d..dd4dc1c868b5 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -325,36 +325,27 @@ static void afs_redirty_pages(struct writeback_control *wbc,
 /*
  * completion of write to server
  */
-static void afs_pages_written_back(struct afs_vnode *vnode,
-				   pgoff_t first, pgoff_t last)
+static void afs_pages_written_back(struct afs_vnode *vnode, pgoff_t start, pgoff_t last)
 {
-	struct pagevec pv;
-	unsigned count, loop;
+	struct address_space *mapping = vnode->vfs_inode.i_mapping;
+	struct page *page;
+
+	XA_STATE(xas, &mapping->i_pages, start);
 
 	_enter("{%llx:%llu},{%lx-%lx}",
-	       vnode->fid.vid, vnode->fid.vnode, first, last);
+	       vnode->fid.vid, vnode->fid.vnode, start, last);
 
-	pagevec_init(&pv);
+	rcu_read_lock();
 
-	do {
-		_debug("done %lx-%lx", first, last);
+	xas_for_each(&xas, page, last) {
+		ASSERT(PageWriteback(page));
 
-		count = last - first + 1;
-		if (count > PAGEVEC_SIZE)
-			count = PAGEVEC_SIZE;
-		pv.nr = find_get_pages_contig(vnode->vfs_inode.i_mapping,
-					      first, count, pv.pages);
-		ASSERTCMP(pv.nr, ==, count);
+		detach_page_private(page);
+		trace_afs_page_dirty(vnode, tracepoint_string("clear"), page);
+		page_endio(page, true, 0);
+	}
 
-		for (loop = 0; loop < count; loop++) {
-			detach_page_private(pv.pages[loop]);
-			trace_afs_page_dirty(vnode, tracepoint_string("clear"),
-					     pv.pages[loop]);
-			end_page_writeback(pv.pages[loop]);
-		}
-		first += count;
-		__pagevec_release(&pv);
-	} while (first <= last);
+	rcu_read_unlock();
 
 	afs_prune_wb_keys(vnode);
 	_leave("");
@@ -411,9 +402,7 @@ static void afs_store_data_success(struct afs_operation *op)
 		if (!op->store.laundering)
 			afs_pages_written_back(vnode, op->store.first, op->store.last);
 		afs_stat_v(vnode, n_stores);
-		atomic_long_add((op->store.last * PAGE_SIZE + op->store.last_to) -
-				(op->store.first * PAGE_SIZE + op->store.first_offset),
-				&afs_v2net(vnode)->n_store_bytes);
+		atomic_long_add(op->store.size, &afs_v2net(vnode)->n_store_bytes);
 	}
 }
 
@@ -426,21 +415,21 @@ static const struct afs_operation_ops afs_store_data_operation = {
 /*
  * write to a file
  */
-static int afs_store_data(struct address_space *mapping,
-			  pgoff_t first, pgoff_t last,
-			  unsigned offset, unsigned to, bool laundering)
+static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter,
+			  loff_t pos, pgoff_t first, pgoff_t last,
+			  bool laundering)
 {
-	struct afs_vnode *vnode = AFS_FS_I(mapping->host);
 	struct afs_operation *op;
 	struct afs_wb_key *wbk = NULL;
-	int ret;
+	loff_t size = iov_iter_count(iter), i_size;
+	int ret = -ENOKEY;
 
-	_enter("%s{%llx:%llu.%u},%lx,%lx,%x,%x",
+	_enter("%s{%llx:%llu.%u},%llx,%llx",
 	       vnode->volume->name,
 	       vnode->fid.vid,
 	       vnode->fid.vnode,
 	       vnode->fid.unique,
-	       first, last, offset, to);
+	       size, pos);
 
 	ret = afs_get_writeback_key(vnode, &wbk);
 	if (ret) {
@@ -454,13 +443,16 @@ static int afs_store_data(struct address_space *mapping,
 		return -ENOMEM;
 	}
 
+	i_size = i_size_read(&vnode->vfs_inode);
+
 	afs_op_set_vnode(op, 0, vnode);
 	op->file[0].dv_delta = 1;
-	op->store.mapping = mapping;
+	op->store.write_iter = iter;
+	op->store.pos = pos;
 	op->store.first = first;
 	op->store.last = last;
-	op->store.first_offset = offset;
-	op->store.last_to = to;
+	op->store.size = size;
+	op->store.i_size = max(pos + size, i_size);
 	op->store.laundering = laundering;
 	op->mtime = vnode->vfs_inode.i_mtime;
 	op->flags |= AFS_OPERATION_UNINTR;
@@ -503,11 +495,12 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
 					   pgoff_t final_page)
 {
 	struct afs_vnode *vnode = AFS_FS_I(mapping->host);
+	struct iov_iter iter;
 	struct page *pages[8], *page;
 	unsigned long count, priv;
 	unsigned n, offset, to, f, t;
 	pgoff_t start, first, last;
-	loff_t i_size, end;
+	loff_t i_size, pos, end;
 	int loop, ret;
 
 	_enter(",%lx", primary_page->index);
@@ -604,15 +597,28 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
 
 	first = primary_page->index;
 	last = first + count - 1;
+	_debug("write back %lx[%u..] to %lx[..%u]", first, offset, last, to);
 
-	end = (loff_t)last * PAGE_SIZE + to;
-	i_size = i_size_read(&vnode->vfs_inode);
+	pos = first;
+	pos <<= PAGE_SHIFT;
+	pos += offset;
+	end = last;
+	end <<= PAGE_SHIFT;
+	end += to;
 
-	_debug("write back %lx[%u..] to %lx[..%u]", first, offset, last, to);
+	/* Trim the actual write down to the EOF */
+	i_size = i_size_read(&vnode->vfs_inode);
 	if (end > i_size)
-		to = i_size & ~PAGE_MASK;
+		end = i_size;
+
+	if (pos < i_size) {
+		iov_iter_xarray(&iter, WRITE, &mapping->i_pages, pos, end - pos);
+		ret = afs_store_data(vnode, &iter, pos, first, last, false);
+	} else {
+		/* The dirty region was entirely beyond the EOF. */
+		ret = 0;
+	}
 
-	ret = afs_store_data(mapping, first, last, offset, to, false);
 	switch (ret) {
 	case 0:
 		ret = count;
@@ -913,6 +919,8 @@ int afs_launder_page(struct page *page)
 {
 	struct address_space *mapping = page->mapping;
 	struct afs_vnode *vnode = AFS_FS_I(mapping->host);
+	struct iov_iter iter;
+	struct bio_vec bv[1];
 	unsigned long priv;
 	unsigned int f, t;
 	int ret = 0;
@@ -928,8 +936,14 @@ int afs_launder_page(struct page *page)
 			t = afs_page_dirty_to(page, priv);
 		}
 
+		bv[0].bv_page = page;
+		bv[0].bv_offset = f;
+		bv[0].bv_len = t - f;
+		iov_iter_bvec(&iter, WRITE, bv, 1, bv[0].bv_len);
+
 		trace_afs_page_dirty(vnode, tracepoint_string("launder"), page);
-		ret = afs_store_data(mapping, page->index, page->index, t, f, true);
+		ret = afs_store_data(vnode, &iter, (loff_t)page->index << PAGE_SHIFT,
+				     page->index, page->index, true);
 	}
 
 	detach_page_private(page);
diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c
index abcec145db4b..363d6dd276c0 100644
--- a/fs/afs/yfsclient.c
+++ b/fs/afs/yfsclient.c
@@ -1078,25 +1078,15 @@ void yfs_fs_store_data(struct afs_operation *op)
 {
 	struct afs_vnode_param *vp = &op->file[0];
 	struct afs_call *call;
-	loff_t size, pos, i_size;
 	__be32 *bp;
 
 	_enter(",%x,{%llx:%llu},,",
 	       key_serial(op->key), vp->fid.vid, vp->fid.vnode);
 
-	size = (loff_t)op->store.last_to - (loff_t)op->store.first_offset;
-	if (op->store.first != op->store.last)
-		size += (loff_t)(op->store.last - op->store.first) << PAGE_SHIFT;
-	pos = (loff_t)op->store.first << PAGE_SHIFT;
-	pos += op->store.first_offset;
-
-	i_size = i_size_read(&vp->vnode->vfs_inode);
-	if (pos + size > i_size)
-		i_size = size + pos;
-
 	_debug("size %llx, at %llx, i_size %llx",
-	       (unsigned long long)size, (unsigned long long)pos,
-	       (unsigned long long)i_size);
+	       (unsigned long long)op->store.size,
+	       (unsigned long long)op->store.pos,
+	       (unsigned long long)op->store.i_size);
 
 	call = afs_alloc_flat_call(op->net, &yfs_RXYFSStoreData64,
 				   sizeof(__be32) +
@@ -1109,8 +1099,7 @@ void yfs_fs_store_data(struct afs_operation *op)
 	if (!call)
 		return afs_op_nomem(op);
 
-	call->key = op->key;
-	call->send_pages = true;
+	call->write_iter = op->store.write_iter;
 
 	/* marshall the parameters */
 	bp = call->request;
@@ -1118,9 +1107,9 @@ void yfs_fs_store_data(struct afs_operation *op)
 	bp = xdr_encode_u32(bp, 0); /* RPC flags */
 	bp = xdr_encode_YFSFid(bp, &vp->fid);
 	bp = xdr_encode_YFSStoreStatus_mtime(bp, &op->mtime);
-	bp = xdr_encode_u64(bp, pos);
-	bp = xdr_encode_u64(bp, size);
-	bp = xdr_encode_u64(bp, i_size);
+	bp = xdr_encode_u64(bp, op->store.pos);
+	bp = xdr_encode_u64(bp, op->store.size);
+	bp = xdr_encode_u64(bp, op->store.i_size);
 	yfs_check_req(call, bp);
 
 	trace_afs_make_fs_call(call, &vp->fid);
diff --git a/include/trace/events/afs.h b/include/trace/events/afs.h
index f0b2565db2d9..f9e196c8798d 100644
--- a/include/trace/events/afs.h
+++ b/include/trace/events/afs.h
@@ -884,65 +884,52 @@ TRACE_EVENT(afs_call_done,
 		      __entry->rx_call)
 	    );
 
-TRACE_EVENT(afs_send_pages,
-	    TP_PROTO(struct afs_call *call, struct msghdr *msg,
-		     pgoff_t first, pgoff_t last, unsigned int offset),
+TRACE_EVENT(afs_send_data,
+	    TP_PROTO(struct afs_call *call, struct msghdr *msg),
 
-	    TP_ARGS(call, msg, first, last, offset),
+	    TP_ARGS(call, msg),
 
 	    TP_STRUCT__entry(
 		    __field(unsigned int,		call		)
-		    __field(pgoff_t,			first		)
-		    __field(pgoff_t,			last		)
-		    __field(unsigned int,		nr		)
-		    __field(unsigned int,		bytes		)
-		    __field(unsigned int,		offset		)
 		    __field(unsigned int,		flags		)
+		    __field(loff_t,			offset		)
+		    __field(loff_t,			count		)
 			     ),
 
 	    TP_fast_assign(
 		    __entry->call = call->debug_id;
-		    __entry->first = first;
-		    __entry->last = last;
-		    __entry->nr = msg->msg_iter.nr_segs;
-		    __entry->bytes = msg->msg_iter.count;
-		    __entry->offset = offset;
 		    __entry->flags = msg->msg_flags;
+		    __entry->offset = msg->msg_iter.xarray_start + msg->msg_iter.iov_offset;
+		    __entry->count = iov_iter_count(&msg->msg_iter);
 			   ),
 
-	    TP_printk(" c=%08x %lx-%lx-%lx b=%x o=%x f=%x",
-		      __entry->call,
-		      __entry->first, __entry->first + __entry->nr - 1, __entry->last,
-		      __entry->bytes, __entry->offset,
+	    TP_printk(" c=%08x o=%llx n=%llx f=%x",
+		      __entry->call, __entry->offset, __entry->count,
 		      __entry->flags)
 	    );
 
-TRACE_EVENT(afs_sent_pages,
-	    TP_PROTO(struct afs_call *call, pgoff_t first, pgoff_t last,
-		     pgoff_t cursor, int ret),
+TRACE_EVENT(afs_sent_data,
+	    TP_PROTO(struct afs_call *call, struct msghdr *msg, int ret),
 
-	    TP_ARGS(call, first, last, cursor, ret),
+	    TP_ARGS(call, msg, ret),
 
 	    TP_STRUCT__entry(
 		    __field(unsigned int,		call		)
-		    __field(pgoff_t,			first		)
-		    __field(pgoff_t,			last		)
-		    __field(pgoff_t,			cursor		)
 		    __field(int,			ret		)
+		    __field(loff_t,			offset		)
+		    __field(loff_t,			count		)
 			     ),
 
 	    TP_fast_assign(
 		    __entry->call = call->debug_id;
-		    __entry->first = first;
-		    __entry->last = last;
-		    __entry->cursor = cursor;
 		    __entry->ret = ret;
+		    __entry->offset = msg->msg_iter.xarray_start + msg->msg_iter.iov_offset;
+		    __entry->count = iov_iter_count(&msg->msg_iter);
 			   ),
 
-	    TP_printk(" c=%08x %lx-%lx c=%lx r=%d",
-		      __entry->call,
-		      __entry->first, __entry->last,
-		      __entry->cursor, __entry->ret)
+	    TP_printk(" c=%08x o=%llx n=%llx r=%x",
+		      __entry->call, __entry->offset, __entry->count,
+		      __entry->ret)
 	    );
 
 TRACE_EVENT(afs_dir_check_failed,




* [RFC PATCH 63/76] afs: Wait on PG_fscache before modifying/releasing a page
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (60 preceding siblings ...)
  2020-11-20 15:15 ` [RFC PATCH 62/76] afs: Use ITER_XARRAY for writing David Howells
@ 2020-11-20 15:16 ` David Howells
  2020-11-20 15:16 ` [RFC PATCH 64/76] afs: Extract writeback extension into its own function David Howells
                   ` (12 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:16 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

PG_fscache is going to be used to indicate that a page is being written to
the cache, and that the page should not be modified or released until the
write has finished.

Make afs_invalidatepage() and afs_releasepage() wait for it.
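
As a condensed sketch of the gating pattern (the wrapper name is made
up; the logic mirrors the afs_releasepage() hunk below, and assumes the
PG_fscache helpers introduced earlier in this series are in scope):

	#ifdef CONFIG_AFS_FSCACHE
	static bool example_can_release_page(struct page *page, gfp_t gfp)
	{
		if (PageFsCache(page)) {
			/* We may only wait for the cache write if the
			 * caller allows blocking and fs recursion.
			 */
			if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS))
				return false;
			wait_on_page_fscache(page);
		}
		return true;
	}
	#endif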

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/file.c  |    9 +++++++++
 fs/afs/write.c |   10 ++++++++++
 2 files changed, 19 insertions(+)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 3f409ddc1c1f..a6c6cc9b1d1c 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -607,6 +607,7 @@ static void afs_invalidatepage(struct page *page, unsigned int offset,
 	if (PagePrivate(page))
 		afs_invalidate_dirty(page, offset, length);
 
+	wait_on_page_fscache(page);
 	_leave("");
 }
 
@@ -624,6 +625,14 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)
 
 	/* deny if page is being written to the cache and the caller hasn't
 	 * elected to wait */
+#ifdef CONFIG_AFS_FSCACHE
+	if (PageFsCache(page)) {
+		if (!(gfp_flags & __GFP_DIRECT_RECLAIM) || !(gfp_flags & __GFP_FS))
+			return false;
+		wait_on_page_fscache(page);
+	}
+#endif
+
 	if (PagePrivate(page)) {
 		detach_page_private(page);
 		trace_afs_page_dirty(vnode, tracepoint_string("rel"), page);
diff --git a/fs/afs/write.c b/fs/afs/write.c
index dd4dc1c868b5..e1791de90478 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -117,6 +117,10 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
 		SetPageUptodate(page);
 	}
 
+#ifdef CONFIG_AFS_FSCACHE
+	wait_on_page_fscache(page);
+#endif
+
 try_again:
 	/* See if this page is already partially written in a way that we can
 	 * merge the new write with.
@@ -857,6 +861,11 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 	/* Wait for the page to be written to the cache before we allow it to
 	 * be modified.  We then assume the entire page will need writing back.
 	 */
+#ifdef CONFIG_AFS_FSCACHE
+	if (PageFsCache(vmf->page) &&
+	    wait_on_page_bit_killable(vmf->page, PG_fscache) < 0)
+		return VM_FAULT_RETRY;
+#endif
 
 	if (PageWriteback(vmf->page) &&
 	    wait_on_page_bit_killable(vmf->page, PG_writeback) < 0)
@@ -948,5 +957,6 @@ int afs_launder_page(struct page *page)
 
 	detach_page_private(page);
 	trace_afs_page_dirty(vnode, tracepoint_string("laundered"), page);
+	wait_on_page_fscache(page);
 	return ret;
 }




* [RFC PATCH 64/76] afs: Extract writeback extension into its own function
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (61 preceding siblings ...)
  2020-11-20 15:16 ` [RFC PATCH 63/76] afs: Wait on PG_fscache before modifying/releasing a page David Howells
@ 2020-11-20 15:16 ` David Howells
  2020-11-20 15:16 ` [RFC PATCH 65/76] afs: Prepare for use of THPs David Howells
                   ` (11 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:16 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Extract the writeback extension code into its own function to break up the
main writeback function a bit.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/write.c |  109 ++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 67 insertions(+), 42 deletions(-)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index e1791de90478..89c804bfe253 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -490,47 +490,25 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter,
 }
 
 /*
- * Synchronously write back the locked page and any subsequent non-locked dirty
- * pages.
+ * Extend the region to be written back to include subsequent contiguously
+ * dirty pages if possible, but don't sleep while doing so.
+ *
+ * If this page holds new content, then we can include filler zeros in the
+ * writeback.
  */
-static int afs_write_back_from_locked_page(struct address_space *mapping,
-					   struct writeback_control *wbc,
-					   struct page *primary_page,
-					   pgoff_t final_page)
+static void afs_extend_writeback(struct address_space *mapping,
+				 struct afs_vnode *vnode,
+				 long *_count,
+				 pgoff_t start,
+				 pgoff_t final_page,
+				 unsigned *_offset,
+				 unsigned *_to,
+				 bool new_content)
 {
-	struct afs_vnode *vnode = AFS_FS_I(mapping->host);
-	struct iov_iter iter;
 	struct page *pages[8], *page;
-	unsigned long count, priv;
-	unsigned n, offset, to, f, t;
-	pgoff_t start, first, last;
-	loff_t i_size, pos, end;
-	int loop, ret;
-
-	_enter(",%lx", primary_page->index);
-
-	count = 1;
-	if (test_set_page_writeback(primary_page))
-		BUG();
-
-	/* Find all consecutive lockable dirty pages that have contiguous
-	 * written regions, stopping when we find a page that is not
-	 * immediately lockable, is not dirty or is missing, or we reach the
-	 * end of the range.
-	 */
-	start = primary_page->index;
-	priv = page_private(primary_page);
-	offset = afs_page_dirty_from(primary_page, priv);
-	to = afs_page_dirty_to(primary_page, priv);
-	trace_afs_page_dirty(vnode, tracepoint_string("store"), primary_page);
-
-	WARN_ON(offset == to);
-	if (offset == to)
-		trace_afs_page_dirty(vnode, tracepoint_string("WARN"), primary_page);
-
-	if (start >= final_page ||
-	    (to < PAGE_SIZE && !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)))
-		goto no_more;
+	unsigned long count = *_count, priv;
+	unsigned offset = *_offset, to = *_to, n, f, t;
+	int loop;
 
 	start++;
 	do {
@@ -551,8 +529,7 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
 
 		for (loop = 0; loop < n; loop++) {
 			page = pages[loop];
-			if (to != PAGE_SIZE &&
-			    !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags))
+			if (to != PAGE_SIZE && !new_content)
 				break;
 			if (page->index > final_page)
 				break;
@@ -566,8 +543,7 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
 			priv = page_private(page);
 			f = afs_page_dirty_from(page, priv);
 			t = afs_page_dirty_to(page, priv);
-			if (f != 0 &&
-			    !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)) {
+			if (f != 0 && !new_content) {
 				unlock_page(page);
 				break;
 			}
@@ -593,6 +569,55 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
 	} while (start <= final_page && count < 65536);
 
 no_more:
+	*_count = count;
+	*_offset = offset;
+	*_to = to;
+}
+
+/*
+ * Synchronously write back the locked page and any subsequent non-locked dirty
+ * pages.
+ */
+static int afs_write_back_from_locked_page(struct address_space *mapping,
+					   struct writeback_control *wbc,
+					   struct page *primary_page,
+					   pgoff_t final_page)
+{
+	struct afs_vnode *vnode = AFS_FS_I(mapping->host);
+	struct iov_iter iter;
+	unsigned long count, priv;
+	unsigned offset, to;
+	pgoff_t start, first, last;
+	loff_t i_size, pos, end;
+	bool new_content = test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags);
+	int ret;
+
+	_enter(",%lx", primary_page->index);
+
+	count = 1;
+	if (test_set_page_writeback(primary_page))
+		BUG();
+
+	/* Find all consecutive lockable dirty pages that have contiguous
+	 * written regions, stopping when we find a page that is not
+	 * immediately lockable, is not dirty or is missing, or we reach the
+	 * end of the range.
+	 */
+	start = primary_page->index;
+	priv = page_private(primary_page);
+	offset = afs_page_dirty_from(primary_page, priv);
+	to = afs_page_dirty_to(primary_page, priv);
+	trace_afs_page_dirty(vnode, tracepoint_string("store"), primary_page);
+
+	WARN_ON(offset == to);
+	if (offset == to)
+		trace_afs_page_dirty(vnode, tracepoint_string("WARN"), primary_page);
+
+	if (start < final_page &&
+	    (to == PAGE_SIZE || new_content))
+		afs_extend_writeback(mapping, vnode, &count, start, final_page,
+				     &offset, &to, new_content);
+
 	/* We now have a contiguous set of dirty pages, each with writeback
 	 * set; the first page is still locked at this point, but all the rest
 	 * have been unlocked.




* [RFC PATCH 65/76] afs: Prepare for use of THPs
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (62 preceding siblings ...)
  2020-11-20 15:16 ` [RFC PATCH 64/76] afs: Extract writeback extension into its own function David Howells
@ 2020-11-20 15:16 ` David Howells
  2020-11-20 15:16 ` [RFC PATCH 66/76] afs: Use the fs operation ops to handle FetchData completion David Howells
                   ` (10 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:16 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

As a prelude to supporting transparent huge pages, use thp_size() and
similar rather than PAGE_SIZE/SHIFT.

Further, try to frame everything in terms of file positions and lengths
rather than page indices and numbers of pages.
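
As an illustrative sketch (not part of the patch; the helper name is
made up, and the kernel's THP helpers are assumed to be in scope), the
byte extent of a possibly-compound page works out as:

	/* A page's extent is thp_size() bytes starting at the head page's
	 * index, so ranges are carried as (pos, len) in bytes rather than
	 * as page indices and counts.
	 */
	static void example_page_extent(struct page *page,
					loff_t *pos, size_t *len)
	{
		struct page *head = thp_head(page);

		*pos = (loff_t)head->index * PAGE_SIZE;
		*len = thp_size(head);
	}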

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/dir.c      |    2 
 fs/afs/file.c     |    8 -
 fs/afs/internal.h |    2 
 fs/afs/write.c    |  436 +++++++++++++++++++++++++++++------------------------
 4 files changed, 245 insertions(+), 203 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 215f73121391..26ac5cfdf3af 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -2078,6 +2078,6 @@ static void afs_dir_invalidatepage(struct page *page, unsigned int offset,
 		afs_stat_v(dvnode, n_inval);
 
 	/* we clean up only if the entire page is being invalidated */
-	if (offset == 0 && length == PAGE_SIZE)
+	if (offset == 0 && length == thp_size(page))
 		detach_page_private(page);
 }
diff --git a/fs/afs/file.c b/fs/afs/file.c
index a6c6cc9b1d1c..b4a78afad643 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -343,8 +343,8 @@ static int afs_page_filler(struct key *key, struct page *page)
 	req->vnode		= vnode;
 	req->key		= key_get(key);
 	req->pos		= (loff_t)page->index << PAGE_SHIFT;
-	req->len		= PAGE_SIZE;
-	req->nr_pages		= 1;
+	req->len		= thp_size(page);
+	req->nr_pages		= thp_nr_pages(page);
 	req->done		= afs_file_read_done;
 	req->cleanup		= afs_file_read_cleanup;
 
@@ -588,8 +588,8 @@ static void afs_invalidate_dirty(struct page *page, unsigned int offset,
 	trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page);
 	clear_page_dirty_for_io(page);
 full_invalidate:
-	detach_page_private(page);
 	trace_afs_page_dirty(vnode, tracepoint_string("inval"), page);
+	detach_page_private(page);
 }
 
 /*
@@ -634,8 +634,8 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)
 #endif
 
 	if (PagePrivate(page)) {
-		detach_page_private(page);
 		trace_afs_page_dirty(vnode, tracepoint_string("rel"), page);
+		detach_page_private(page);
 	}
 
 	/* indicate that the page can be released */
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 3556500a1247..ce1f05d5c94b 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -814,8 +814,6 @@ struct afs_operation {
 			loff_t	pos;
 			loff_t	size;
 			loff_t	i_size;
-			pgoff_t	first;		/* first page in mapping to deal with */
-			pgoff_t	last;		/* last page in mapping to deal with */
 			bool	laundering;	/* Laundering page, PG_writeback not set */
 		} store;
 		struct {
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 89c804bfe253..e672833c99bc 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -94,15 +94,15 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
 	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
 	struct page *page;
 	unsigned long priv;
-	unsigned f, from = pos & (PAGE_SIZE - 1);
-	unsigned t, to = from + len;
-	pgoff_t index = pos >> PAGE_SHIFT;
+	unsigned f, from;
+	unsigned t, to;
+	pgoff_t index;
 	int ret;
 
-	_enter("{%llx:%llu},{%lx},%u,%u",
-	       vnode->fid.vid, vnode->fid.vnode, index, from, to);
+	_enter("{%llx:%llu},%llx,%x",
+	       vnode->fid.vid, vnode->fid.vnode, pos, len);
 
-	page = grab_cache_page_write_begin(mapping, index, flags);
+	page = grab_cache_page_write_begin(mapping, pos / PAGE_SIZE, flags);
 	if (!page)
 		return -ENOMEM;
 
@@ -121,19 +121,20 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
 	wait_on_page_fscache(page);
 #endif
 
+	index = page->index;
+	from = pos - index * PAGE_SIZE;
+	to = from + len;
+
 try_again:
 	/* See if this page is already partially written in a way that we can
 	 * merge the new write with.
 	 */
-	t = f = 0;
 	if (PagePrivate(page)) {
 		priv = page_private(page);
 		f = afs_page_dirty_from(page, priv);
 		t = afs_page_dirty_to(page, priv);
 		ASSERTCMP(f, <=, t);
-	}
 
-	if (f != t) {
 		if (PageWriteback(page)) {
 			trace_afs_page_dirty(vnode, tracepoint_string("alrdy"), page);
 			goto flush_conflicting_write;
@@ -180,7 +181,7 @@ int afs_write_end(struct file *file, struct address_space *mapping,
 {
 	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
 	unsigned long priv;
-	unsigned int f, from = pos & (PAGE_SIZE - 1);
+	unsigned int f, from = pos & (thp_size(page) - 1);
 	unsigned int t, to = from + copied;
 	loff_t i_size, maybe_i_size;
 	int ret = 0;
@@ -233,9 +234,8 @@ int afs_write_end(struct file *file, struct address_space *mapping,
 		trace_afs_page_dirty(vnode, tracepoint_string("dirty"), page);
 	}
 
-	set_page_dirty(page);
-	if (PageDirty(page))
-		_debug("dirtied");
+	if (set_page_dirty(page))
+		_debug("dirtied %lx", page->index);
 	ret = copied;
 
 out:
@@ -248,40 +248,43 @@ int afs_write_end(struct file *file, struct address_space *mapping,
  * kill all the pages in the given range
  */
 static void afs_kill_pages(struct address_space *mapping,
-			   pgoff_t first, pgoff_t last)
+			   loff_t start, loff_t len)
 {
 	struct afs_vnode *vnode = AFS_FS_I(mapping->host);
 	struct pagevec pv;
-	unsigned count, loop;
+	unsigned int loop, psize;
 
-	_enter("{%llx:%llu},%lx-%lx",
-	       vnode->fid.vid, vnode->fid.vnode, first, last);
+	_enter("{%llx:%llu},%llx @%llx",
+	       vnode->fid.vid, vnode->fid.vnode, len, start);
 
 	pagevec_init(&pv);
 
 	do {
-		_debug("kill %lx-%lx", first, last);
+		_debug("kill %llx @%llx", len, start);
 
-		count = last - first + 1;
-		if (count > PAGEVEC_SIZE)
-			count = PAGEVEC_SIZE;
-		pv.nr = find_get_pages_contig(mapping, first, count, pv.pages);
-		ASSERTCMP(pv.nr, ==, count);
+		pv.nr = find_get_pages_contig(mapping, start / PAGE_SIZE,
+					      PAGEVEC_SIZE, pv.pages);
+		if (pv.nr == 0)
+			break;
 
-		for (loop = 0; loop < count; loop++) {
+		for (loop = 0; loop < pv.nr; loop++) {
 			struct page *page = pv.pages[loop];
+
+			if (page->index * PAGE_SIZE >= start + len)
+				break;
+
+			psize = thp_size(page);
+			start += psize;
+			len -= psize;
 			ClearPageUptodate(page);
-			SetPageError(page);
 			end_page_writeback(page);
-			if (page->index >= first)
-				first = page->index + 1;
 			lock_page(page);
 			generic_error_remove_page(mapping, page);
 			unlock_page(page);
 		}
 
 		__pagevec_release(&pv);
-	} while (first <= last);
+	} while (len > 0);
 
 	_leave("");
 }
@@ -291,37 +294,40 @@ static void afs_kill_pages(struct address_space *mapping,
  */
 static void afs_redirty_pages(struct writeback_control *wbc,
 			      struct address_space *mapping,
-			      pgoff_t first, pgoff_t last)
+			      loff_t start, loff_t len)
 {
 	struct afs_vnode *vnode = AFS_FS_I(mapping->host);
 	struct pagevec pv;
-	unsigned count, loop;
+	unsigned int loop, psize;
 
-	_enter("{%llx:%llu},%lx-%lx",
-	       vnode->fid.vid, vnode->fid.vnode, first, last);
+	_enter("{%llx:%llu},%llx @%llx",
+	       vnode->fid.vid, vnode->fid.vnode, len, start);
 
 	pagevec_init(&pv);
 
 	do {
-		_debug("redirty %lx-%lx", first, last);
+		_debug("redirty %llx @%llx", len, start);
 
-		count = last - first + 1;
-		if (count > PAGEVEC_SIZE)
-			count = PAGEVEC_SIZE;
-		pv.nr = find_get_pages_contig(mapping, first, count, pv.pages);
-		ASSERTCMP(pv.nr, ==, count);
+		pv.nr = find_get_pages_contig(mapping, start / PAGE_SIZE,
+					      PAGEVEC_SIZE, pv.pages);
+		if (pv.nr == 0)
+			break;
 
-		for (loop = 0; loop < count; loop++) {
+		for (loop = 0; loop < pv.nr; loop++) {
 			struct page *page = pv.pages[loop];
 
+			if (page->index * PAGE_SIZE >= start + len)
+				break;
+
+			psize = thp_size(page);
+			start += psize;
+			len -= psize;
 			redirty_page_for_writepage(wbc, page);
 			end_page_writeback(page);
-			if (page->index >= first)
-				first = page->index + 1;
 		}
 
 		__pagevec_release(&pv);
-	} while (first <= last);
+	} while (len > 0);
 
 	_leave("");
 }
@@ -329,23 +335,28 @@ static void afs_redirty_pages(struct writeback_control *wbc,
 /*
  * completion of write to server
  */
-static void afs_pages_written_back(struct afs_vnode *vnode, pgoff_t start, pgoff_t last)
+static void afs_pages_written_back(struct afs_vnode *vnode, loff_t start, unsigned int len)
 {
 	struct address_space *mapping = vnode->vfs_inode.i_mapping;
 	struct page *page;
+	pgoff_t end;
 
-	XA_STATE(xas, &mapping->i_pages, start);
+	XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
 
-	_enter("{%llx:%llu},{%lx-%lx}",
-	       vnode->fid.vid, vnode->fid.vnode, start, last);
+	_enter("{%llx:%llu},{%x @%llx}",
+	       vnode->fid.vid, vnode->fid.vnode, len, start);
 
 	rcu_read_lock();
 
-	xas_for_each(&xas, page, last) {
-		ASSERT(PageWriteback(page));
+	end = (start + len - 1) / PAGE_SIZE;
+	xas_for_each(&xas, page, end) {
+		if (!PageWriteback(page)) {
+			kdebug("bad %x @%llx page %lx %lx", len, start, page->index, end);
+			ASSERT(PageWriteback(page));
+		}
 
-		detach_page_private(page);
 		trace_afs_page_dirty(vnode, tracepoint_string("clear"), page);
+		detach_page_private(page);
 		page_endio(page, true, 0);
 	}
 
@@ -404,7 +415,7 @@ static void afs_store_data_success(struct afs_operation *op)
 	afs_vnode_commit_status(op, &op->file[0]);
 	if (op->error == 0) {
 		if (!op->store.laundering)
-			afs_pages_written_back(vnode, op->store.first, op->store.last);
+			afs_pages_written_back(vnode, op->store.pos, op->store.size);
 		afs_stat_v(vnode, n_stores);
 		atomic_long_add(op->store.size, &afs_v2net(vnode)->n_store_bytes);
 	}
@@ -419,8 +430,7 @@ static const struct afs_operation_ops afs_store_data_operation = {
 /*
  * write to a file
  */
-static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter,
-			  loff_t pos, pgoff_t first, pgoff_t last,
+static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, loff_t pos,
 			  bool laundering)
 {
 	struct afs_operation *op;
@@ -453,8 +463,6 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter,
 	op->file[0].dv_delta = 1;
 	op->store.write_iter = iter;
 	op->store.pos = pos;
-	op->store.first = first;
-	op->store.last = last;
 	op->store.size = size;
 	op->store.i_size = max(pos + size, i_size);
 	op->store.laundering = laundering;
@@ -499,40 +507,49 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter,
 static void afs_extend_writeback(struct address_space *mapping,
 				 struct afs_vnode *vnode,
 				 long *_count,
-				 pgoff_t start,
-				 pgoff_t final_page,
-				 unsigned *_offset,
-				 unsigned *_to,
-				 bool new_content)
+				 loff_t start,
+				 loff_t max_len,
+				 bool new_content,
+				 unsigned int *_len)
 {
-	struct page *pages[8], *page;
-	unsigned long count = *_count, priv;
-	unsigned offset = *_offset, to = *_to, n, f, t;
-	int loop;
+	struct pagevec pvec;
+	struct page *page;
+	unsigned long priv;
+	unsigned int psize, filler = 0;
+	unsigned int f, t;
+	loff_t len = *_len;
+	pgoff_t index = (start + len) / PAGE_SIZE;
+	bool stop = true;
+	unsigned int i;
+
+	XA_STATE(xas, &mapping->i_pages, index);
+	pagevec_init(&pvec);
 
-	start++;
 	do {
-		_debug("more %lx [%lx]", start, count);
-		n = final_page - start + 1;
-		if (n > ARRAY_SIZE(pages))
-			n = ARRAY_SIZE(pages);
-		n = find_get_pages_contig(mapping, start, ARRAY_SIZE(pages), pages);
-		_debug("fgpc %u", n);
-		if (n == 0)
-			goto no_more;
-		if (pages[0]->index != start) {
-			do {
-				put_page(pages[--n]);
-			} while (n > 0);
-			goto no_more;
-		}
+		/* Firstly, we gather up a batch of contiguous dirty pages
+		 * under the RCU read lock - but we can't clear the dirty flags
+		 * there if any of those pages are mapped.
+		 */
+		rcu_read_lock();
 
-		for (loop = 0; loop < n; loop++) {
-			page = pages[loop];
-			if (to != PAGE_SIZE && !new_content)
+		xas_for_each(&xas, page, ULONG_MAX) {
+			stop = true;
+			if (xas_retry(&xas, page))
+				continue;
+			if (xa_is_value(page))
+				break;
+			if (page->index != index)
 				break;
-			if (page->index > final_page)
+
+			if (!page_cache_get_speculative(page)) {
+				xas_reset(&xas);
+				continue;
+			}
+
+			/* Has the page moved or been split? */
+			if (unlikely(page != xas_reload(&xas)))
 				break;
+
 			if (!trylock_page(page))
 				break;
 			if (!PageDirty(page) || PageWriteback(page)) {
@@ -540,6 +557,7 @@ static void afs_extend_writeback(struct address_space *mapping,
 				break;
 			}
 
+			psize = thp_size(page);
 			priv = page_private(page);
 			f = afs_page_dirty_from(page, priv);
 			t = afs_page_dirty_to(page, priv);
@@ -547,110 +565,126 @@ static void afs_extend_writeback(struct address_space *mapping,
 				unlock_page(page);
 				break;
 			}
-			to = t;
 
+			len += filler + t;
+			filler = psize - t;
+			if (len >= max_len || *_count <= 0)
+				stop = true;
+			else if (t == psize || new_content)
+				stop = false;
+
+			index += thp_nr_pages(page);
+			if (!pagevec_add(&pvec, page))
+				break;
+			if (stop)
+				break;
+		}
+
+		if (!stop)
+			xas_pause(&xas);
+		rcu_read_unlock();
+
+		/* Now, if we obtained any pages, we can shift them to being
+		 * writable and mark them for caching.
+		 */
+		if (!pagevec_count(&pvec))
+			break;
+
+		for (i = 0; i < pagevec_count(&pvec); i++) {
+			page = pvec.pages[i];
 			trace_afs_page_dirty(vnode, tracepoint_string("store+"), page);
 
 			if (!clear_page_dirty_for_io(page))
 				BUG();
 			if (test_set_page_writeback(page))
 				BUG();
+
+			*_count -= thp_nr_pages(page);
 			unlock_page(page);
-			put_page(page);
-		}
-		count += loop;
-		if (loop < n) {
-			for (; loop < n; loop++)
-				put_page(pages[loop]);
-			goto no_more;
 		}
 
-		start += loop;
-	} while (start <= final_page && count < 65536);
+		pagevec_release(&pvec);
+		cond_resched();
+	} while (!stop);
 
-no_more:
-	*_count = count;
-	*_offset = offset;
-	*_to = to;
+	*_len = len;
 }
 
 /*
  * Synchronously write back the locked page and any subsequent non-locked dirty
  * pages.
  */
-static int afs_write_back_from_locked_page(struct address_space *mapping,
-					   struct writeback_control *wbc,
-					   struct page *primary_page,
-					   pgoff_t final_page)
+static ssize_t afs_write_back_from_locked_page(struct address_space *mapping,
+					       struct writeback_control *wbc,
+					       struct page *page,
+					       loff_t start, loff_t end)
 {
 	struct afs_vnode *vnode = AFS_FS_I(mapping->host);
 	struct iov_iter iter;
-	unsigned long count, priv;
-	unsigned offset, to;
-	pgoff_t start, first, last;
-	loff_t i_size, pos, end;
+	unsigned long priv;
+	unsigned int offset, to, len, max_len;
+	loff_t i_size = i_size_read(&vnode->vfs_inode);
 	bool new_content = test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags);
+	long count = wbc->nr_to_write;
 	int ret;
 
-	_enter(",%lx", primary_page->index);
+	_enter(",%lx,%llx-%llx", page->index, start, end);
 
-	count = 1;
-	if (test_set_page_writeback(primary_page))
+	if (test_set_page_writeback(page))
 		BUG();
 
+	count -= thp_nr_pages(page);
+
 	/* Find all consecutive lockable dirty pages that have contiguous
 	 * written regions, stopping when we find a page that is not
 	 * immediately lockable, is not dirty or is missing, or we reach the
 	 * end of the range.
 	 */
-	start = primary_page->index;
-	priv = page_private(primary_page);
-	offset = afs_page_dirty_from(primary_page, priv);
-	to = afs_page_dirty_to(primary_page, priv);
-	trace_afs_page_dirty(vnode, tracepoint_string("store"), primary_page);
-
-	WARN_ON(offset == to);
-	if (offset == to)
-		trace_afs_page_dirty(vnode, tracepoint_string("WARN"), primary_page);
-
-	if (start < final_page &&
-	    (to == PAGE_SIZE || new_content))
-		afs_extend_writeback(mapping, vnode, &count, start, final_page,
-				     &offset, &to, new_content);
+	priv = page_private(page);
+	offset = afs_page_dirty_from(page, priv);
+	to = afs_page_dirty_to(page, priv);
+	trace_afs_page_dirty(vnode, tracepoint_string("store"), page);
+
+	len = to - offset;
+	start += offset;
+	if (start < i_size) {
+		/* Trim the write to the EOF; the extra data is ignored.  Also
+		 * put an upper limit on the size of a single storedata op.
+		 */
+		max_len = 65536 * 4096;
+		max_len = min_t(unsigned long long, max_len, end - start + 1);
+		max_len = min_t(unsigned long long, max_len, i_size - start);
+
+		if (len < max_len &&
+		    (to == thp_size(page) || new_content))
+			afs_extend_writeback(mapping, vnode, &count,
+					     start, max_len, new_content, &len);
+		len = min_t(loff_t, len, max_len);
+	}
 
 	/* We now have a contiguous set of dirty pages, each with writeback
 	 * set; the first page is still locked at this point, but all the rest
 	 * have been unlocked.
 	 */
-	unlock_page(primary_page);
-
-	first = primary_page->index;
-	last = first + count - 1;
-	_debug("write back %lx[%u..] to %lx[..%u]", first, offset, last, to);
-
-	pos = first;
-	pos <<= PAGE_SHIFT;
-	pos += offset;
-	end = last;
-	end <<= PAGE_SHIFT;
-	end += to;
+	unlock_page(page);
 
-	/* Trim the actual write down to the EOF */
-	i_size = i_size_read(&vnode->vfs_inode);
-	if (end > i_size)
-		end = i_size;
+	if (start < i_size) {
+		_debug("write back %x @%llx [%llx]", len, start, i_size);
 
-	if (pos < i_size) {
-		iov_iter_xarray(&iter, WRITE, &mapping->i_pages, pos, end - pos);
-		ret = afs_store_data(vnode, &iter, pos, first, last, false);
+		iov_iter_xarray(&iter, WRITE, &mapping->i_pages, start, len);
+		ret = afs_store_data(vnode, &iter, start, false);
 	} else {
+		_debug("write discard %x @%llx [%llx]", len, start, i_size);
+
 		/* The dirty region was entirely beyond the EOF. */
+		afs_pages_written_back(vnode, start, len);
 		ret = 0;
 	}
 
 	switch (ret) {
 	case 0:
-		ret = count;
+		wbc->nr_to_write = count;
+		ret = len;
 		break;
 
 	default:
@@ -662,13 +696,13 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
 	case -EKEYEXPIRED:
 	case -EKEYREJECTED:
 	case -EKEYREVOKED:
-		afs_redirty_pages(wbc, mapping, first, last);
+		afs_redirty_pages(wbc, mapping, start, len);
 		mapping_set_error(mapping, ret);
 		break;
 
 	case -EDQUOT:
 	case -ENOSPC:
-		afs_redirty_pages(wbc, mapping, first, last);
+		afs_redirty_pages(wbc, mapping, start, len);
 		mapping_set_error(mapping, -ENOSPC);
 		break;
 
@@ -680,7 +714,7 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
 	case -ENOMEDIUM:
 	case -ENXIO:
 		trace_afs_file_error(vnode, ret, afs_file_error_writeback_fail);
-		afs_kill_pages(mapping, first, last);
+		afs_kill_pages(mapping, start, len);
 		mapping_set_error(mapping, ret);
 		break;
 	}
@@ -695,19 +729,19 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
  */
 int afs_writepage(struct page *page, struct writeback_control *wbc)
 {
-	int ret;
+	ssize_t ret;
+	loff_t start;
 
 	_enter("{%lx},", page->index);
 
+	start = page->index * PAGE_SIZE;
 	ret = afs_write_back_from_locked_page(page->mapping, wbc, page,
-					      wbc->range_end >> PAGE_SHIFT);
+					      start, LLONG_MAX - start);
 	if (ret < 0) {
-		_leave(" = %d", ret);
-		return 0;
+		_leave(" = %zd", ret);
+		return ret;
 	}
 
-	wbc->nr_to_write -= ret;
-
 	_leave(" = 0");
 	return 0;
 }
@@ -717,35 +751,46 @@ int afs_writepage(struct page *page, struct writeback_control *wbc)
  */
 static int afs_writepages_region(struct address_space *mapping,
 				 struct writeback_control *wbc,
-				 pgoff_t index, pgoff_t end, pgoff_t *_next)
+				 loff_t start, loff_t end, loff_t *_next)
 {
 	struct page *page;
-	int ret, n;
+	ssize_t ret;
+	int n;
 
-	_enter(",,%lx,%lx,", index, end);
+	_enter("%llx,%llx,", start, end);
 
 	do {
-		n = find_get_pages_range_tag(mapping, &index, end,
-					PAGECACHE_TAG_DIRTY, 1, &page);
+		pgoff_t index = start / PAGE_SIZE;
+
+		n = find_get_pages_range_tag(mapping, &index, end / PAGE_SIZE,
+					     PAGECACHE_TAG_DIRTY, 1, &page);
 		if (!n)
 			break;
 
+		start = (loff_t)page->index * PAGE_SIZE; /* May regress with THPs */
+
 		_debug("wback %lx", page->index);
 
-		/*
-		 * at this point we hold neither the i_pages lock nor the
+		/* At this point we hold neither the i_pages lock nor the
 		 * page lock: the page may be truncated or invalidated
 		 * (changing page->mapping to NULL), or even swizzled
 		 * back from swapper_space to tmpfs file mapping
 		 */
-		ret = lock_page_killable(page);
-		if (ret < 0) {
-			put_page(page);
-			_leave(" = %d", ret);
-			return ret;
+		if (wbc->sync_mode != WB_SYNC_NONE) {
+			ret = lock_page_killable(page);
+			if (ret < 0) {
+				put_page(page);
+				return ret;
+			}
+		} else {
+			if (!trylock_page(page)) {
+				put_page(page);
+				return 0;
+			}
 		}
 
 		if (page->mapping != mapping || !PageDirty(page)) {
+			start += thp_size(page);
 			unlock_page(page);
 			put_page(page);
 			continue;
@@ -761,20 +806,20 @@ static int afs_writepages_region(struct address_space *mapping,
 
 		if (!clear_page_dirty_for_io(page))
 			BUG();
-		ret = afs_write_back_from_locked_page(mapping, wbc, page, end);
+		ret = afs_write_back_from_locked_page(mapping, wbc, page, start, end);
 		put_page(page);
 		if (ret < 0) {
-			_leave(" = %d", ret);
+			_leave(" = %zd", ret);
 			return ret;
 		}
 
-		wbc->nr_to_write -= ret;
+		start += ret * PAGE_SIZE;
 
 		cond_resched();
-	} while (index < end && wbc->nr_to_write > 0);
+	} while (wbc->nr_to_write > 0);
 
-	*_next = index;
-	_leave(" = 0 [%lx]", *_next);
+	*_next = start;
+	_leave(" = 0 [%llx]", *_next);
 	return 0;
 }
 
@@ -785,7 +830,7 @@ int afs_writepages(struct address_space *mapping,
 		   struct writeback_control *wbc)
 {
 	struct afs_vnode *vnode = AFS_FS_I(mapping->host);
-	pgoff_t start, end, next;
+	loff_t start, next;
 	int ret;
 
 	_enter("");
@@ -800,22 +845,19 @@ int afs_writepages(struct address_space *mapping,
 		return 0;
 
 	if (wbc->range_cyclic) {
-		start = mapping->writeback_index;
-		end = -1;
-		ret = afs_writepages_region(mapping, wbc, start, end, &next);
+		start = mapping->writeback_index * PAGE_SIZE;
+		ret = afs_writepages_region(mapping, wbc, start, LLONG_MAX, &next);
 		if (start > 0 && wbc->nr_to_write > 0 && ret == 0)
 			ret = afs_writepages_region(mapping, wbc, 0, start,
 						    &next);
-		mapping->writeback_index = next;
+		mapping->writeback_index = next / PAGE_SIZE;
 	} else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) {
-		end = (pgoff_t)(LLONG_MAX >> PAGE_SHIFT);
-		ret = afs_writepages_region(mapping, wbc, 0, end, &next);
+		ret = afs_writepages_region(mapping, wbc, 0, LLONG_MAX, &next);
 		if (wbc->nr_to_write > 0)
 			mapping->writeback_index = next;
 	} else {
-		start = wbc->range_start >> PAGE_SHIFT;
-		end = wbc->range_end >> PAGE_SHIFT;
-		ret = afs_writepages_region(mapping, wbc, start, end, &next);
+		ret = afs_writepages_region(mapping, wbc,
+					    wbc->range_start, wbc->range_end, &next);
 	}
 
 	up_read(&vnode->validate_lock);
@@ -873,13 +915,13 @@ int afs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
  */
 vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 {
+	struct page *page = thp_head(vmf->page);
 	struct file *file = vmf->vma->vm_file;
 	struct inode *inode = file_inode(file);
 	struct afs_vnode *vnode = AFS_FS_I(inode);
 	unsigned long priv;
 
-	_enter("{{%llx:%llu}},{%lx}",
-	       vnode->fid.vid, vnode->fid.vnode, vmf->page->index);
+	_enter("{{%llx:%llu}},{%lx}", vnode->fid.vid, vnode->fid.vnode, page->index);
 
 	sb_start_pagefault(inode->i_sb);
 
@@ -887,31 +929,33 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
 	 * be modified.  We then assume the entire page will need writing back.
 	 */
 #ifdef CONFIG_AFS_FSCACHE
-	if (PageFsCache(vmf->page) &&
-	    wait_on_page_bit_killable(vmf->page, PG_fscache) < 0)
+	if (PageFsCache(page) &&
+	    wait_on_page_bit_killable(page, PG_fscache) < 0)
 		return VM_FAULT_RETRY;
 #endif
 
-	if (PageWriteback(vmf->page) &&
-	    wait_on_page_bit_killable(vmf->page, PG_writeback) < 0)
+	if (PageWriteback(page) &&
+	    wait_on_page_bit_killable(page, PG_writeback) < 0)
 		return VM_FAULT_RETRY;
 
-	if (lock_page_killable(vmf->page) < 0)
+	if (lock_page_killable(page) < 0)
 		return VM_FAULT_RETRY;
 
 	/* We mustn't change page->private until writeback is complete as that
 	 * details the portion of the page we need to write back and we might
 	 * need to redirty the page if there's a problem.
 	 */
-	wait_on_page_writeback(vmf->page);
+	wait_on_page_writeback(page);
 
-	priv = afs_page_dirty(vmf->page, 0, PAGE_SIZE);
+	priv = afs_page_dirty(page, 0, thp_size(page));
 	priv = afs_page_dirty_mmapped(priv);
-	if (PagePrivate(vmf->page))
-		set_page_private(vmf->page, priv);
-	else
-		attach_page_private(vmf->page, (void *)priv);
-	trace_afs_page_dirty(vnode, tracepoint_string("mkwrite"), vmf->page);
+	if (PagePrivate(page)) {
+		set_page_private(page, priv);
+		trace_afs_page_dirty(vnode, tracepoint_string("mkwrite+"), page);
+	} else {
+		attach_page_private(page, (void *)priv);
+		trace_afs_page_dirty(vnode, tracepoint_string("mkwrite"), page);
+	}
 	file_update_time(file);
 
 	sb_end_pagefault(inode->i_sb);
@@ -964,7 +1008,7 @@ int afs_launder_page(struct page *page)
 	priv = page_private(page);
 	if (clear_page_dirty_for_io(page)) {
 		f = 0;
-		t = PAGE_SIZE;
+		t = thp_size(page);
 		if (PagePrivate(page)) {
 			f = afs_page_dirty_from(page, priv);
 			t = afs_page_dirty_to(page, priv);
@@ -976,12 +1020,12 @@ int afs_launder_page(struct page *page)
 		iov_iter_bvec(&iter, WRITE, bv, 1, bv[0].bv_len);
 
 		trace_afs_page_dirty(vnode, tracepoint_string("launder"), page);
-		ret = afs_store_data(vnode, &iter, (loff_t)page->index << PAGE_SHIFT,
-				     page->index, page->index, true);
+		ret = afs_store_data(vnode, &iter, (loff_t)page->index * PAGE_SIZE,
+				     true);
 	}
 
-	detach_page_private(page);
 	trace_afs_page_dirty(vnode, tracepoint_string("laundered"), page);
+	detach_page_private(page);
 	wait_on_page_fscache(page);
 	return ret;
 }




* [RFC PATCH 66/76] afs: Use the fs operation ops to handle FetchData completion
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (63 preceding siblings ...)
  2020-11-20 15:16 ` [RFC PATCH 65/76] afs: Prepare for use of THPs David Howells
@ 2020-11-20 15:16 ` David Howells
  2020-11-20 15:17 ` [RFC PATCH 67/76] afs: Use new fscache read helper API David Howells
                   ` (9 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:16 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Use the 'success' and 'aborted' afs_operation_ops methods, and add a
'failed' method, to handle the completion of an AFS.FetchData,
AFS.FetchData64 or YFS.FetchData64 RPC operation, rather than directly
calling the done function pointed to by the afs_read struct from the call
delivery handler.

This means the done function will also be called on error, not just on
successful completion.

This is a step towards asynchronous data reception on data fetch calls, and
it allows any error to be handed off to the fscache read helper in the same
place as a successful completion.
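
To illustrate, here is a minimal sketch of a done handler under the new
scheme; the handler and its completion hook are hypothetical, standing in
for the consumer added later in the series, but they show how both outcomes
now arrive in one place:

	/* Hypothetical handler: invoked for success and failure alike now
	 * that ->failed() funnels errors through the same notify path.
	 */
	static void example_fetch_done(struct afs_read *req)
	{
		if (req->error < 0)
			example_complete(req, req->error);	/* hypothetical hook */
		else
			example_complete(req, req->actual_len);
	}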

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/file.c         |   15 +++++++++++++++
 fs/afs/fs_operation.c |    4 +++-
 fs/afs/fsclient.c     |    3 ---
 fs/afs/internal.h     |    1 +
 fs/afs/yfsclient.c    |    3 ---
 5 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index b4a78afad643..38dda8990b4c 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -268,6 +268,19 @@ void afs_put_read(struct afs_read *req)
 	}
 }
 
+static void afs_fetch_data_notify(struct afs_operation *op)
+{
+	struct afs_read *req = op->fetch.req;
+	int error = op->error;
+
+	if (error == -ECONNABORTED)
+		error = afs_abort_to_error(op->ac.abort_code);
+	req->error = error;
+
+	if (req->done)
+		req->done(req);
+}
+
 static void afs_fetch_data_success(struct afs_operation *op)
 {
 	struct afs_vnode *vnode = op->file[0].vnode;
@@ -276,6 +289,7 @@ static void afs_fetch_data_success(struct afs_operation *op)
 	afs_vnode_commit_status(op, &op->file[0]);
 	afs_stat_v(vnode, n_fetches);
 	atomic_long_add(op->fetch.req->actual_len, &op->net->n_fetch_bytes);
+	afs_fetch_data_notify(op);
 }
 
 static void afs_fetch_data_put(struct afs_operation *op)
@@ -289,6 +303,7 @@ static const struct afs_operation_ops afs_fetch_data_operation = {
 	.issue_yfs_rpc	= yfs_fs_fetch_data,
 	.success	= afs_fetch_data_success,
 	.aborted	= afs_check_for_remote_deletion,
+	.failed		= afs_fetch_data_notify,
 	.put		= afs_fetch_data_put,
 };
 
diff --git a/fs/afs/fs_operation.c b/fs/afs/fs_operation.c
index 97cab12b0a6c..938e28a00101 100644
--- a/fs/afs/fs_operation.c
+++ b/fs/afs/fs_operation.c
@@ -195,8 +195,10 @@ void afs_wait_for_operation(struct afs_operation *op)
 	case -ECONNABORTED:
 		if (op->ops->aborted)
 			op->ops->aborted(op);
-		break;
+		fallthrough;
 	default:
+		if (op->ops->failed)
+			op->ops->failed(op);
 		break;
 	}
 
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 31e6b3635541..5e34f4dbd385 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -392,9 +392,6 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
 		break;
 	}
 
-	if (req->done)
-		req->done(req);
-
 	_leave(" = 0 [done]");
 	return 0;
 }
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index ce1f05d5c94b..e7ec2f4acc4d 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -742,6 +742,7 @@ struct afs_operation_ops {
 	void (*issue_yfs_rpc)(struct afs_operation *op);
 	void (*success)(struct afs_operation *op);
 	void (*aborted)(struct afs_operation *op);
+	void (*failed)(struct afs_operation *op);
 	void (*edit_dir)(struct afs_operation *op);
 	void (*put)(struct afs_operation *op);
 };
diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c
index 363d6dd276c0..2b35cba8ad62 100644
--- a/fs/afs/yfsclient.c
+++ b/fs/afs/yfsclient.c
@@ -449,9 +449,6 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call)
 		break;
 	}
 
-	if (req->done)
-		req->done(req);
-
 	_leave(" = 0 [done]");
 	return 0;
 }




* [RFC PATCH 67/76] afs: Use new fscache read helper API
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (64 preceding siblings ...)
  2020-11-20 15:16 ` [RFC PATCH 66/76] afs: Use the fs operation ops to handle FetchData completion David Howells
@ 2020-11-20 15:17 ` David Howells
  2020-11-20 15:17 ` [RFC PATCH 68/76] netfs: Add write_begin helper David Howells
                   ` (8 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:17 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Make AFS use the new fscache read helpers to implement the VM read
operations:

 - afs_readpage() now hands off responsibility to netfs_readpage().

 - afs_readpages() is gone and replaced with afs_readahead().

 - afs_readahead() just hands off responsibility to netfs_readahead().

These make use of the cache if a cookie is supplied, and otherwise just
call the ->issue_op() method enough times to complete the entire request,
as sketched below.
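
As a rough sketch of the contract (the filesystem-side callbacks are
placeholders; only ->issue_op() is mandatory):

	static void myfs_issue_op(struct netfs_read_subrequest *subreq)
	{
		/* Kick off an async read of subreq->len - subreq->transferred
		 * bytes at subreq->start + subreq->transferred, then call
		 * netfs_subreq_terminated() with the byte count or an error.
		 */
	}

	static const struct netfs_read_request_ops myfs_req_ops = {
		.issue_op	= myfs_issue_op,
	};

	static int myfs_readpage(struct file *file, struct page *page)
	{
		return netfs_readpage(file, page, &myfs_req_ops, NULL);
	}

	static void myfs_readahead(struct readahead_control *ractl)
	{
		netfs_readahead(ractl, &myfs_req_ops, NULL);
	}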

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/file.c     |  326 +++++++++++++----------------------------------------
 fs/afs/fsclient.c |    1 
 fs/afs/internal.h |    2 
 fs/afs/super.c    |   24 ++++
 4 files changed, 108 insertions(+), 245 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 38dda8990b4c..8772f5e1cbd3 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -14,6 +14,7 @@
 #include <linux/gfp.h>
 #include <linux/task_io_accounting_ops.h>
 #include <linux/mm.h>
+#include <linux/netfs.h>
 #include "internal.h"
 
 static int afs_file_mmap(struct file *file, struct vm_area_struct *vma);
@@ -22,8 +23,7 @@ static void afs_invalidatepage(struct page *page, unsigned int offset,
 			       unsigned int length);
 static int afs_releasepage(struct page *page, gfp_t gfp_flags);
 
-static int afs_readpages(struct file *filp, struct address_space *mapping,
-			 struct list_head *pages, unsigned nr_pages);
+static void afs_readahead(struct readahead_control *ractl);
 
 const struct file_operations afs_file_operations = {
 	.open		= afs_open,
@@ -48,7 +48,7 @@ const struct inode_operations afs_file_inode_operations = {
 
 const struct address_space_operations afs_fs_aops = {
 	.readpage	= afs_readpage,
-	.readpages	= afs_readpages,
+	.readahead	= afs_readahead,
 	.set_page_dirty	= afs_set_page_dirty,
 	.launder_page	= afs_launder_page,
 	.releasepage	= afs_releasepage,
@@ -198,61 +198,17 @@ int afs_release(struct inode *inode, struct file *file)
 }
 
 /*
- * Handle completion of a read operation.
+ * Allocate a new read record.
  */
-static void afs_file_read_done(struct afs_read *req)
+struct afs_read *afs_alloc_read(gfp_t gfp)
 {
-	struct afs_vnode *vnode = req->vnode;
-	struct page *page;
-	pgoff_t index = req->pos >> PAGE_SHIFT;
-	pgoff_t last = index + req->nr_pages - 1;
-
-	XA_STATE(xas, &vnode->vfs_inode.i_mapping->i_pages, index);
-
-	if (iov_iter_count(req->iter) > 0) {
-		/* The read was short - clear the excess buffer. */
-		_debug("afterclear %zx %zx %llx/%llx",
-		       req->iter->iov_offset,
-		       iov_iter_count(req->iter),
-		       req->actual_len, req->len);
-		iov_iter_zero(iov_iter_count(req->iter), req->iter);
-	}
-
-	rcu_read_lock();
-	xas_for_each(&xas, page, last) {
-		page_endio(page, false, 0);
-		put_page(page);
-	}
-	rcu_read_unlock();
-
-	task_io_account_read(req->len);
-	req->cleanup = NULL;
-}
-
-/*
- * Dispose of our locks and refs on the pages if the read failed.
- */
-static void afs_file_read_cleanup(struct afs_read *req)
-{
-	struct page *page;
-	pgoff_t index = req->pos >> PAGE_SHIFT;
-	pgoff_t last = index + req->nr_pages - 1;
-
-	if (req->iter) {
-		XA_STATE(xas, &req->vnode->vfs_inode.i_mapping->i_pages, index);
-
-		_enter("%lu,%u,%zu", index, req->nr_pages, iov_iter_count(req->iter));
+	struct afs_read *req;
 
-		rcu_read_lock();
-		xas_for_each(&xas, page, last) {
-			BUG_ON(xa_is_value(page));
-			BUG_ON(PageCompound(page));
+	req = kzalloc(sizeof(struct afs_read), gfp);
+	if (req)
+		refcount_set(&req->usage, 1);
 
-			page_endio(page, false, req->error);
-			put_page(page);
-		}
-		rcu_read_unlock();
-	}
+	return req;
 }
 
 /*
@@ -271,14 +227,20 @@ void afs_put_read(struct afs_read *req)
 static void afs_fetch_data_notify(struct afs_operation *op)
 {
 	struct afs_read *req = op->fetch.req;
+	struct netfs_read_subrequest *subreq = req->subreq;
 	int error = op->error;
 
 	if (error == -ECONNABORTED)
 		error = afs_abort_to_error(op->ac.abort_code);
 	req->error = error;
 
-	if (req->done)
+	if (subreq) {
+		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+		netfs_subreq_terminated(subreq, error ?: req->actual_len);
+		req->subreq = NULL;
+	} else if (req->done) {
 		req->done(req);
+	}
 }
 
 static void afs_fetch_data_success(struct afs_operation *op)
@@ -332,222 +294,96 @@ int afs_fetch_data(struct afs_vnode *vnode, struct afs_read *req)
 	return afs_do_sync_operation(op);
 }
 
-/*
- * read page from file, directory or symlink, given a key to use
- */
-static int afs_page_filler(struct key *key, struct page *page)
+static void afs_req_issue_op(struct netfs_read_subrequest *subreq)
 {
-	struct inode *inode = page->mapping->host;
-	struct afs_vnode *vnode = AFS_FS_I(inode);
-	struct afs_read *req;
+	struct afs_vnode *vnode = AFS_FS_I(subreq->rreq->inode);
+	struct afs_read *fsreq;
 	int ret;
 
-	_enter("{%x},{%lu},{%lu}", key_serial(key), inode->i_ino, page->index);
+	fsreq = afs_alloc_read(GFP_NOFS);
+	if (!fsreq) {
+		subreq->error = -ENOMEM;
+		return;
+	}
 
-	BUG_ON(!PageLocked(page));
+	fsreq->subreq	= subreq;
+	fsreq->pos	= subreq->start + subreq->transferred;
+	fsreq->len	= subreq->len   - subreq->transferred;
+	fsreq->key	= subreq->rreq->netfs_priv;
+	fsreq->vnode	= vnode;
+	fsreq->iter	= &fsreq->def_iter;
 
-	ret = -ESTALE;
-	if (test_bit(AFS_VNODE_DELETED, &vnode->flags))
-		goto error;
+	iov_iter_xarray(&fsreq->def_iter, READ,
+			&fsreq->vnode->vfs_inode.i_mapping->i_pages,
+			fsreq->pos, fsreq->len);
 
-	req = kzalloc(sizeof(struct afs_read), GFP_KERNEL);
-	if (!req)
-		goto enomem;
-
-	refcount_set(&req->usage, 1);
-	req->vnode		= vnode;
-	req->key		= key_get(key);
-	req->pos		= (loff_t)page->index << PAGE_SHIFT;
-	req->len		= thp_size(page);
-	req->nr_pages		= thp_nr_pages(page);
-	req->done		= afs_file_read_done;
-	req->cleanup		= afs_file_read_cleanup;
-
-	get_page(page);
-	iov_iter_xarray(&req->def_iter, READ, &page->mapping->i_pages,
-			req->pos, req->len);
-	req->iter = &req->def_iter;
-
-	ret = afs_fetch_data(vnode, req);
+	ret = afs_fetch_data(fsreq->vnode, fsreq);
 	if (ret < 0)
-		goto fetch_error;
-
-	afs_put_read(req);
-	_leave(" = 0");
-	return 0;
-
-fetch_error:
-	switch (ret) {
-	case -EINTR:
-	case -ENOMEM:
-	case -ERESTARTSYS:
-	case -EAGAIN:
-		afs_put_read(req);
-		goto error;
-	case -ENOENT:
-		_debug("got NOENT from server - marking file deleted and stale");
-		set_bit(AFS_VNODE_DELETED, &vnode->flags);
-		ret = -ESTALE;
-		/* Fall through */
-	default:
-		page_endio(page, false, ret);
-		afs_put_read(req);
-		_leave(" = %d", ret);
-		return ret;
-	}
-
-enomem:
-	ret = -ENOMEM;
-error:
-	unlock_page(page);
-	_leave(" = %d", ret);
-	return ret;
+		subreq->error = ret;
 }
 
-/*
- * read page from file, directory or symlink, given a file to nominate the key
- * to be used
- */
-static int afs_readpage(struct file *file, struct page *page)
+static int afs_symlink_readpage(struct page *page)
 {
-	struct key *key;
+	struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
+	struct afs_read *fsreq;
 	int ret;
 
-	if (file) {
-		key = afs_file_key(file);
-		ASSERT(key != NULL);
-		ret = afs_page_filler(key, page);
-	} else {
-		struct inode *inode = page->mapping->host;
-		key = afs_request_key(AFS_FS_S(inode->i_sb)->cell);
-		if (IS_ERR(key)) {
-			ret = PTR_ERR(key);
-		} else {
-			ret = afs_page_filler(key, page);
-			key_put(key);
-		}
-	}
-	return ret;
-}
-
-/*
- * Read a contiguous set of pages.
- */
-static int afs_readpages_one(struct file *file, struct address_space *mapping,
-			     struct list_head *pages)
-{
-	struct afs_vnode *vnode = AFS_FS_I(mapping->host);
-	struct afs_read *req;
-	struct list_head *p;
-	struct page *first, *page;
-	pgoff_t index;
-	int ret, n;
-
-	/* Count the number of contiguous pages at the front of the list.  Note
-	 * that the list goes prev-wards rather than next-wards.
-	 */
-	first = lru_to_page(pages);
-	index = first->index + 1;
-	n = 1;
-	for (p = first->lru.prev; p != pages; p = p->prev) {
-		page = list_entry(p, struct page, lru);
-		if (page->index != index)
-			break;
-		index++;
-		n++;
-	}
-
-	req = kzalloc(sizeof(struct afs_read), GFP_NOFS);
-	if (!req)
+	fsreq = afs_alloc_read(GFP_NOFS);
+	if (!fsreq)
 		return -ENOMEM;
 
-	refcount_set(&req->usage, 1);
-	req->vnode = vnode;
-	req->key = key_get(afs_file_key(file));
-	req->done = afs_file_read_done;
-	req->cleanup = afs_file_read_cleanup;
-	req->pos = first->index;
-	req->pos <<= PAGE_SHIFT;
-
-	/* Add pages to the LRU until it fails.  We keep the pages ref'd and
-	 * locked until the read is complete.
-	 *
-	 * Note that it's possible for the file size to change whilst we're
-	 * doing this, but we rely on the server returning less than we asked
-	 * for if the file shrank.  We also rely on this to deal with a partial
-	 * page at the end of the file.
-	 */
-	do {
-		page = lru_to_page(pages);
-		list_del(&page->lru);
-		index = page->index;
-		if (add_to_page_cache_lru(page, mapping, index,
-					  readahead_gfp_mask(mapping))) {
-			put_page(page);
-			break;
-		}
-
-		req->nr_pages++;
-	} while (req->nr_pages < n);
-
-	if (req->nr_pages == 0) {
-		afs_put_read(req);
-		return 0;
-	}
+	fsreq->pos	= page->index * PAGE_SIZE;
+	fsreq->len	= PAGE_SIZE;
+	fsreq->vnode	= vnode;
+	fsreq->iter	= &fsreq->def_iter;
+	iov_iter_xarray(&fsreq->def_iter, READ, &page->mapping->i_pages,
+			fsreq->pos, fsreq->len);
 
-	req->len = req->nr_pages * PAGE_SIZE;
-	iov_iter_xarray(&req->def_iter, READ, &file->f_mapping->i_pages,
-			req->pos, req->len);
-	req->iter = &req->def_iter;
+	ret = afs_fetch_data(fsreq->vnode, fsreq);
+	page_endio(page, false, ret);
+	return ret;
+}
 
-	ret = afs_fetch_data(vnode, req);
-	if (ret < 0)
-		goto error;
+static void afs_init_rreq(struct netfs_read_request *rreq, struct file *file)
+{
+	struct afs_vnode *vnode = AFS_FS_I(rreq->inode);
 
-	afs_put_read(req);
-	return 0;
+	rreq->netfs_priv = key_get(afs_file_key(file));
+	if (afs_vnode_cache(vnode))
+		rreq->cookie_debug_id = afs_vnode_cache(vnode)->debug_id;
+}
 
-error:
-	if (ret == -ENOENT) {
-		_debug("got NOENT from server - marking file deleted and stale");
-		set_bit(AFS_VNODE_DELETED, &vnode->flags);
-		ret = -ESTALE;
-	}
+static int afs_begin_cache_operation(struct netfs_read_request *rreq)
+{
+	struct afs_vnode *vnode = AFS_FS_I(rreq->inode);
 
-	afs_put_read(req);
-	return ret;
+	return fscache_begin_operation(afs_vnode_cache(vnode), &rreq->cache_resources,
+				       FSCACHE_WANT_PARAMS);
 }
 
-/*
- * read a set of pages
- */
-static int afs_readpages(struct file *file, struct address_space *mapping,
-			 struct list_head *pages, unsigned nr_pages)
+static void afs_priv_cleanup(struct address_space *mapping, void *netfs_priv)
 {
-	struct key *key = afs_file_key(file);
-	struct afs_vnode *vnode;
-	int ret = 0;
-
-	_enter("{%d},{%lu},,%d",
-	       key_serial(key), mapping->host->i_ino, nr_pages);
+	key_put(netfs_priv);
+}
 
-	ASSERT(key != NULL);
+static const struct netfs_read_request_ops afs_req_ops = {
+	.init_rreq		= afs_init_rreq,
+	.begin_cache_operation	= afs_begin_cache_operation,
+	.issue_op		= afs_req_issue_op,
+	.cleanup		= afs_priv_cleanup,
+};
 
-	vnode = AFS_FS_I(mapping->host);
-	if (test_bit(AFS_VNODE_DELETED, &vnode->flags)) {
-		_leave(" = -ESTALE");
-		return -ESTALE;
-	}
+static int afs_readpage(struct file *file, struct page *page)
+{
+	if (!file)
+		return afs_symlink_readpage(page);
 
-	/* attempt to read as many of the pages as possible */
-	while (!list_empty(pages)) {
-		ret = afs_readpages_one(file, mapping, pages);
-		if (ret < 0)
-			break;
-	}
+	return netfs_readpage(file, page, &afs_req_ops, NULL);
+}
 
-	_leave(" = %d [netting]", ret);
-	return ret;
+static void afs_readahead(struct readahead_control *ractl)
+{
+	netfs_readahead(ractl, &afs_req_ops, NULL);
 }
 
 /*
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 5e34f4dbd385..2f695a260442 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -10,6 +10,7 @@
 #include <linux/sched.h>
 #include <linux/circ_buf.h>
 #include <linux/iversion.h>
+#include <linux/netfs.h>
 #include "internal.h"
 #include "afs_fs.h"
 #include "xdr_fs.h"
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index e7ec2f4acc4d..1f7caa560c5c 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -207,6 +207,7 @@ struct afs_read {
 	loff_t			file_size;	/* File size returned by server */
 	struct key		*key;		/* The key to use to reissue the read */
 	struct afs_vnode	*vnode;		/* The file being read into. */
+	struct netfs_read_subrequest *subreq;	/* Fscache helper read request this belongs to */
 	afs_dataversion_t	data_version;	/* Version number returned by server */
 	refcount_t		usage;
 	unsigned int		call_debug_id;
@@ -1041,6 +1042,7 @@ extern void afs_put_wb_key(struct afs_wb_key *);
 extern int afs_open(struct inode *, struct file *);
 extern int afs_release(struct inode *, struct file *);
 extern int afs_fetch_data(struct afs_vnode *, struct afs_read *);
+extern struct afs_read *afs_alloc_read(gfp_t);
 extern void afs_put_read(struct afs_read *);
 
 static inline struct afs_read *afs_get_read(struct afs_read *req)
diff --git a/fs/afs/super.c b/fs/afs/super.c
index 6c5900df6aa5..d745e228961a 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -40,6 +40,12 @@ static int afs_show_options(struct seq_file *m, struct dentry *root);
 static int afs_init_fs_context(struct fs_context *fc);
 static const struct fs_parameter_spec afs_fs_parameters[];
 
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_put_super(struct super_block *sb);
+#else
+#define afs_put_super NULL
+#endif
+
 struct file_system_type afs_fs_type = {
 	.owner			= THIS_MODULE,
 	.name			= "afs",
@@ -61,6 +67,7 @@ static const struct super_operations afs_super_ops = {
 	.evict_inode	= afs_evict_inode,
 	.show_devname	= afs_show_devname,
 	.show_options	= afs_show_options,
+	.put_super	= afs_put_super,
 };
 
 static struct kmem_cache *afs_inode_cachep;
@@ -546,6 +553,23 @@ static void afs_kill_super(struct super_block *sb)
 	afs_destroy_sbi(as);
 }
 
+#ifdef CONFIG_AFS_FSCACHE
+static struct fscache_cookie *afs_put_super_get_cookie(struct inode *inode)
+{
+	return afs_vnode_cache(AFS_FS_I(inode));
+}
+
+static void afs_put_super(struct super_block *sb)
+{
+	struct afs_super_info *as = AFS_FS_S(sb);
+
+	if (as &&
+	    as->volume &&
+	    as->volume->cache)
+		fscache_put_super(sb, afs_put_super_get_cookie);
+}
+#endif
+
 /*
  * Get an AFS superblock and root directory.
  */




* [RFC PATCH 68/76] netfs: Add write_begin helper
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (65 preceding siblings ...)
  2020-11-20 15:17 ` [RFC PATCH 67/76] afs: Use new fscache read helper API David Howells
@ 2020-11-20 15:17 ` David Howells
  2020-11-20 15:17 ` [RFC PATCH 69/76] fscache: Add support for writing to the cache David Howells
                   ` (7 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:17 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Add a helper to do the pre-reading work for the netfs write_begin address
space op.
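
As a sketch, a filesystem's ->write_begin() then reduces to a single call
(placeholder names; the AFS conversion follows later in the series):

	static int myfs_write_begin(struct file *file, struct address_space *mapping,
				    loff_t pos, unsigned int len, unsigned int flags,
				    struct page **pagep, void **fsdata)
	{
		/* Grabs and locks the page, pre-reading from the cache or the
		 * server as needed and zero-filling beyond the EOF.
		 */
		return netfs_write_begin(file, mapping, pos, len, flags,
					 pagep, fsdata, &myfs_req_ops, NULL);
	}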

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/file.c                |    8 ++
 fs/netfs/internal.h          |    2 +
 fs/netfs/read_helper.c       |  168 ++++++++++++++++++++++++++++++++++++++++++
 fs/netfs/stats.c             |   10 ++-
 include/linux/netfs.h        |    8 ++
 include/trace/events/netfs.h |    4 +
 6 files changed, 196 insertions(+), 4 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 8772f5e1cbd3..84a222587ca3 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -353,6 +353,13 @@ static void afs_init_rreq(struct netfs_read_request *rreq, struct file *file)
 		rreq->cookie_debug_id = afs_vnode_cache(vnode)->debug_id;
 }
 
+static bool afs_is_cache_enabled(struct inode *inode)
+{
+	struct fscache_cookie *cookie = afs_vnode_cache(AFS_FS_I(inode));
+
+	return fscache_cookie_enabled(cookie) && !hlist_empty(&cookie->backing_objects);
+}
+
 static int afs_begin_cache_operation(struct netfs_read_request *rreq)
 {
 	struct afs_vnode *vnode = AFS_FS_I(rreq->inode);
@@ -368,6 +375,7 @@ static void afs_priv_cleanup(struct address_space *mapping, void *netfs_priv)
 
 static const struct netfs_read_request_ops afs_req_ops = {
 	.init_rreq		= afs_init_rreq,
+	.is_cache_enabled	= afs_is_cache_enabled,
 	.begin_cache_operation	= afs_begin_cache_operation,
 	.issue_op		= afs_req_issue_op,
 	.cleanup		= afs_priv_cleanup,
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index d83317b1eb9d..3d7ab3ab5743 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -34,8 +34,10 @@ extern atomic_t netfs_n_rh_read_failed;
 extern atomic_t netfs_n_rh_zero;
 extern atomic_t netfs_n_rh_short_read;
 extern atomic_t netfs_n_rh_write;
+extern atomic_t netfs_n_rh_write_begin;
 extern atomic_t netfs_n_rh_write_done;
 extern atomic_t netfs_n_rh_write_failed;
+extern atomic_t netfs_n_rh_write_zskip;
 
 
 static inline void netfs_stat(atomic_t *stat)
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index e1ce197d8d18..3e59d7d2f77e 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -955,3 +955,171 @@ int netfs_readpage(struct file *file,
 	return ret;
 }
 EXPORT_SYMBOL(netfs_readpage);
+
+static void netfs_clear_thp(struct page *page)
+{
+	unsigned int i;
+
+	for (i = 0; i < thp_nr_pages(page); i++)
+		clear_highpage(page + i);
+}
+
+/**
+ * netfs_write_begin - Helper to prepare for writing
+ * @file: The file to read from
+ * @mapping: The mapping to read from
+ * @pos: File position at which the write will begin
+ * @len: The length of the write in this page
+ * @flags: AOP_* flags
+ * @_page: Where to put the resultant page
+ * @_fsdata: Place for the netfs to store a cookie
+ * @ops: The network filesystem's operations for the helper to use
+ * @netfs_priv: Private netfs data to be retained in the request
+ *
+ * Pre-read data for a write-begin request by drawing data from the cache if
+ * possible, or the netfs if not.  Space beyond the EOF is zero-filled.
+ * Multiple I/O requests from different sources will get munged together.  If
+ * necessary, the readahead window can be expanded in either direction to a
+ * more convenient alignment for RPC efficiency or to make storage in the cache
+ * feasible.
+ *
+ * The calling netfs must provide a table of operations, only one of which,
+ * issue_op, is mandatory.
+ *
+ * The check_write_begin() operation can be provided to check for and flush
+ * conflicting writes once the page is grabbed and locked.  It is passed a
+ * pointer to the fsdata cookie that gets returned to the VM to be passed to
+ * write_end.  It is permitted to sleep.  It should return 0 if the request
+ * should go ahead; unlock the page and return -EAGAIN to cause the page to be
+ * grabbed again; or return an error.
+ *
+ * This is usable whether or not caching is enabled.
+ */
+int netfs_write_begin(struct file *file, struct address_space *mapping,
+		      loff_t pos, unsigned int len, unsigned int flags,
+		      struct page **_page, void **_fsdata,
+		      const struct netfs_read_request_ops *ops,
+		      void *netfs_priv)
+{
+	struct netfs_read_request *rreq;
+	struct page *page, *xpage;
+	struct inode *inode = file_inode(file);
+	unsigned int debug_index = 0;
+	pgoff_t index = pos >> PAGE_SHIFT;
+	int pos_in_page = pos & ~PAGE_MASK;
+	loff_t size;
+	int ret;
+
+	struct readahead_control ractl = {
+		.file		= file,
+		.mapping	= mapping,
+		._index		= index,
+		._nr_pages	= 0,
+	};
+
+retry:
+	page = grab_cache_page_write_begin(mapping, index, 0);
+	if (!page)
+		return -ENOMEM;
+
+	if (ops->check_write_begin) {
+		/* Allow the netfs (e.g. ceph) to flush conflicts. */
+		ret = ops->check_write_begin(file, pos, len, page, _fsdata);
+		if (ret < 0) {
+			if (ret == -EAGAIN)
+				goto retry;
+			goto error;
+		}
+	}
+
+	if (PageUptodate(page))
+		goto have_page;
+
+	/* If the page is beyond the EOF, we want to clear it - unless it's
+	 * within the cache granule containing the EOF, in which case we need
+	 * to preload the granule.
+	 */
+	size = i_size_read(inode);
+	if (!ops->is_cache_enabled(inode) &&
+	    ((pos_in_page == 0 && len == thp_size(page)) ||
+	     (pos >= size) ||
+	     (pos_in_page == 0 && (pos + len) >= size))) {
+		netfs_clear_thp(page);
+		SetPageUptodate(page);
+		netfs_stat(&netfs_n_rh_write_zskip);
+		goto have_page_no_wait;
+	}
+
+	ret = -ENOMEM;
+	rreq = netfs_alloc_read_request(ops, netfs_priv, file);
+	if (!rreq)
+		goto error;
+	rreq->mapping		= page->mapping;
+	rreq->start		= page->index * PAGE_SIZE;
+	rreq->len		= thp_size(page);
+	rreq->no_unlock_page	= page->index;
+	__set_bit(NETFS_RREQ_NO_UNLOCK_PAGE, &rreq->flags);
+	netfs_priv = NULL;
+
+	netfs_stat(&netfs_n_rh_write_begin);
+	trace_netfs_read(rreq, pos, len, netfs_read_trace_write_begin);
+
+	if (ops->begin_cache_operation)
+		ops->begin_cache_operation(rreq);
+
+	/* Expand the request to meet caching requirements and download
+	 * preferences.
+	 */
+	ractl._nr_pages = thp_nr_pages(page);
+	netfs_rreq_expand(rreq, &ractl);
+	netfs_get_read_request(rreq);
+
+	/* We hold the page locks, so we can drop the references */
+	while ((xpage = readahead_page(&ractl)))
+		if (xpage != page)
+			put_page(xpage);
+
+	atomic_set(&rreq->nr_rd_ops, 1);
+	do {
+		if (!netfs_rreq_submit_slice(rreq, &debug_index))
+			break;
+
+	} while (rreq->submitted < rreq->len);
+
+	// TODO: If we didn't submit enough read requests, we need to clean up
+
+	/* Keep nr_rd_ops incremented so that the ref always belongs to us, and
+	 * the service code isn't punted off to a random thread pool to
+	 * process.
+	 */
+	for (;;) {
+		wait_var_event(&rreq->nr_rd_ops, atomic_read(&rreq->nr_rd_ops) == 1);
+		netfs_rreq_assess(rreq);
+		if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags))
+			break;
+		cond_resched();
+	}
+
+	ret = rreq->error;
+	netfs_put_read_request(rreq);
+	if (ret < 0)
+		goto error;
+
+have_page:
+	wait_on_page_fscache(page);
+have_page_no_wait:
+	if (netfs_priv)
+		ops->cleanup(netfs_priv, mapping);
+	*_page = page;
+	_leave(" = 0");
+	return 0;
+
+error:
+	unlock_page(page);
+	put_page(page);
+	if (netfs_priv)
+		ops->cleanup(netfs_priv, mapping);
+	_leave(" = %d", ret);
+	return ret;
+}
+EXPORT_SYMBOL(netfs_write_begin);
diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c
index 3a7a3c10e1cd..cdd09e30ce75 100644
--- a/fs/netfs/stats.c
+++ b/fs/netfs/stats.c
@@ -23,19 +23,23 @@ atomic_t netfs_n_rh_read_failed;
 atomic_t netfs_n_rh_zero;
 atomic_t netfs_n_rh_short_read;
 atomic_t netfs_n_rh_write;
+atomic_t netfs_n_rh_write_begin;
 atomic_t netfs_n_rh_write_done;
 atomic_t netfs_n_rh_write_failed;
+atomic_t netfs_n_rh_write_zskip;
 
 void netfs_stats_show(struct seq_file *m)
 {
-	seq_printf(m, "RdHelp : RA=%u RP=%u rr=%u sr=%u\n",
+	seq_printf(m, "RdHelp : RA=%u RP=%u WB=%u rr=%u sr=%u\n",
 		   atomic_read(&netfs_n_rh_readahead),
 		   atomic_read(&netfs_n_rh_readpage),
+		   atomic_read(&netfs_n_rh_write_begin),
 		   atomic_read(&netfs_n_rh_rreq),
 		   atomic_read(&netfs_n_rh_sreq));
-	seq_printf(m, "RdHelp : ZR=%u sh=%u\n",
+	seq_printf(m, "RdHelp : ZR=%u sh=%u sk=%u\n",
 		   atomic_read(&netfs_n_rh_zero),
-		   atomic_read(&netfs_n_rh_short_read));
+		   atomic_read(&netfs_n_rh_short_read),
+		   atomic_read(&netfs_n_rh_write_zskip));
 	seq_printf(m, "RdHelp : DL=%u ds=%u df=%u di=%u\n",
 		   atomic_read(&netfs_n_rh_download),
 		   atomic_read(&netfs_n_rh_download_done),
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index d9cd9a1d1f6d..aced292b7e6d 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -78,12 +78,15 @@ struct netfs_read_request {
  * Operations the network filesystem can/must provide to the helpers.
  */
 struct netfs_read_request_ops {
+	bool (*is_cache_enabled)(struct inode *inode);
 	void (*init_rreq)(struct netfs_read_request *rreq, struct file *file);
 	int (*begin_cache_operation)(struct netfs_read_request *rreq);
 	void (*expand_readahead)(struct netfs_read_request *rreq);
 	bool (*clamp_length)(struct netfs_read_subrequest *subreq);
 	void (*issue_op)(struct netfs_read_subrequest *subreq);
 	bool (*is_still_valid)(struct netfs_read_request *rreq);
+	int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
+				 struct page *page, void **_fsdata);
 	void (*done)(struct netfs_read_request *rreq);
 	void (*cleanup)(struct address_space *mapping, void *netfs_priv);
 };
@@ -96,6 +99,11 @@ extern int netfs_readpage(struct file *,
 			  struct page *,
 			  const struct netfs_read_request_ops *,
 			  void *);
+extern int netfs_write_begin(struct file *, struct address_space *,
+			     loff_t, unsigned int, unsigned int, struct page **,
+			     void **,
+			     const struct netfs_read_request_ops *,
+			     void *);
 
 extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t);
 extern void netfs_stats_show(struct seq_file *);
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 56a734a3fb0d..b199dbb59256 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -23,6 +23,7 @@ enum netfs_read_trace {
 	netfs_read_trace_expanded,
 	netfs_read_trace_readahead,
 	netfs_read_trace_readpage,
+	netfs_read_trace_write_begin,
 };
 
 enum netfs_rreq_trace {
@@ -51,7 +52,8 @@ enum netfs_sreq_trace {
 #define netfs_read_traces					\
 	EM(netfs_read_trace_expanded,		"EXPANDED ")	\
 	EM(netfs_read_trace_readahead,		"READAHEAD")	\
-	E_(netfs_read_trace_readpage,		"READPAGE ")
+	EM(netfs_read_trace_readpage,		"READPAGE ")	\
+	E_(netfs_read_trace_write_begin,	"WRITEBEGN")
 
 #define netfs_rreq_traces					\
 	EM(netfs_rreq_trace_assess,		"ASSESS")	\




* [RFC PATCH 69/76] fscache: Add support for writing to the cache
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (66 preceding siblings ...)
  2020-11-20 15:17 ` [RFC PATCH 68/76] netfs: Add write_begin helper David Howells
@ 2020-11-20 15:17 ` David Howells
  2020-11-20 15:17 ` [RFC PATCH 70/76] afs: Use the fscache_write_begin() helper David Howells
                   ` (6 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:17 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Add a pair of helpers for use by a netfs to write data to the cache.
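
By way of illustration, a sketch of how a netfs might drive the helper from
its writeback path (the cookie accessor and the completion callback are
placeholders):

	static void myfs_write_done(void *priv, ssize_t transferred_or_error)
	{
		/* PG_fscache has already been cleared on the pages here. */
		if (transferred_or_error < 0)
			pr_warn("cache write failed: %zd\n", transferred_or_error);
	}

	static void myfs_write_to_cache(struct inode *inode, loff_t start, size_t len)
	{
		fscache_write_to_cache(myfs_i_cookie(inode), inode->i_mapping,
				       start, len, i_size_read(inode),
				       myfs_write_done, NULL);
	}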

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/content-map.c |   38 +++++++++++++
 fs/cachefiles/interface.c   |    1 
 fs/cachefiles/internal.h    |    2 +
 fs/fscache/io.c             |  128 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/fscache.h     |   34 +++++++++++
 5 files changed, 203 insertions(+)

diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index da0a81e3f751..d1e8a509a8cc 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -204,6 +204,44 @@ enum netfs_read_source cachefiles_prepare_read(struct netfs_read_subrequest *sub
 	return NETFS_READ_FROM_CACHE;
 }
 
+/*
+ * Prepare for a write to occur.
+ */
+int cachefiles_prepare_write(struct fscache_op_resources *opr,
+			     loff_t *_start, size_t *_len, loff_t i_size)
+{
+	struct cachefiles_object *object =
+		container_of(opr->object, struct cachefiles_object, fscache);
+	loff_t start = *_start, map_limit;
+	size_t len = *_len, down;
+	long granule = start / CACHEFILES_GRAN_SIZE;
+
+	if (start >= CACHEFILES_SIZE_LIMIT)
+		return -ENOBUFS;
+
+	if (granule / 8 >= object->content_map_size) {
+		cachefiles_expand_content_map(object, i_size);
+		if (granule / 8 >= object->content_map_size)
+			return -ENOBUFS;
+	}
+
+	map_limit = object->content_map_size * 8 * CACHEFILES_GRAN_SIZE;
+	if (start >= map_limit)
+		return -ENOBUFS;
+	if (len > map_limit - start)
+		len = map_limit - start;
+
+	/* Assume that the preparation to write involved preloading any
+	 * bits of the cache that weren't to be written and filling any
+	 * gaps that didn't end up being written.
+	 */
+
+	down = start - round_down(start, CACHEFILES_DIO_BLOCK_SIZE);
+	*_start = start - down;
+	*_len = round_up(down + len, CACHEFILES_DIO_BLOCK_SIZE);
+	return 0;
+}
+
 /*
  * Allocate a new content map.
  */
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 3609ff2fb491..a9725ca72ad5 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -609,6 +609,7 @@ static const struct fscache_op_ops cachefiles_io_ops = {
 	.write			= cachefiles_write,
 	.expand_readahead	= cachefiles_expand_readahead,
 	.prepare_read		= cachefiles_prepare_read,
+	.prepare_write		= cachefiles_prepare_write,
 };
 
 static void cachefiles_begin_operation(struct fscache_op_resources *opr)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index e2d06c0860a2..38149868c331 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -137,6 +137,8 @@ extern void cachefiles_mark_content_map(struct cachefiles_object *object,
 					loff_t start, loff_t len, unsigned int inval_counter);
 extern void cachefiles_expand_content_map(struct cachefiles_object *object, loff_t size);
 extern void cachefiles_shorten_content_map(struct cachefiles_object *object, loff_t new_size);
+extern int cachefiles_prepare_write(struct fscache_op_resources *opr,
+				    loff_t *_start, size_t *_len, loff_t i_size);
 extern bool cachefiles_load_content_map(struct cachefiles_object *object);
 extern void cachefiles_save_content_map(struct cachefiles_object *object);
 extern int cachefiles_display_object(struct seq_file *m, struct fscache_object *object);
diff --git a/fs/fscache/io.c b/fs/fscache/io.c
index 5401c9ed347b..295a89af4269 100644
--- a/fs/fscache/io.c
+++ b/fs/fscache/io.c
@@ -10,6 +10,7 @@
 #include <linux/fscache-cache.h>
 #include <linux/slab.h>
 #include <linux/netfs.h>
+#include <linux/uio.h>
 #include "internal.h"
 
 /*
@@ -260,3 +261,130 @@ void __fscache_resize_cookie(struct fscache_cookie *cookie, loff_t new_size)
 	}
 }
 EXPORT_SYMBOL(__fscache_resize_cookie);
+
+struct fscache_write_request {
+	struct fscache_op_resources cache_resources;
+	struct address_space	*mapping;
+	loff_t			start;
+	size_t			len;
+	fscache_io_terminated_t	term_func;
+	void			*term_func_priv;
+};
+
+/**
+ * fscache_clear_page_bits - Clear the PG_fscache bits from a set of pages
+ * @mapping: The netfs inode to use as the source
+ * @start: The start position in @mapping
+ * @len: The amount of data to unlock
+ *
+ * Clear the PG_fscache flag from a sequence of pages and wake up anyone who's
+ * waiting.
+ */
+void __fscache_clear_page_bits(struct address_space *mapping,
+			       loff_t start, size_t len)
+{
+	pgoff_t first = start / PAGE_SIZE;
+	pgoff_t last = (start + len - 1) / PAGE_SIZE;
+	struct page *page;
+
+	if (len) {
+		XA_STATE(xas, &mapping->i_pages, first);
+
+		rcu_read_lock();
+		xas_for_each(&xas, page, last) {
+			unlock_page_fscache(page);
+		}
+		rcu_read_unlock();
+	}
+}
+EXPORT_SYMBOL(__fscache_clear_page_bits);
+
+/*
+ * Deal with the completion of writing the data to the cache.
+ */
+static void fscache_wreq_done(void *priv, ssize_t transferred_or_error)
+{
+	struct fscache_write_request *wreq = priv;
+
+	fscache_clear_page_bits(wreq->mapping, wreq->start, wreq->len);
+
+	if (wreq->term_func)
+		wreq->term_func(wreq->term_func_priv, transferred_or_error);
+	fscache_end_operation(&wreq->cache_resources);
+	kfree(wreq);
+}
+
+/**
+ * fscache_write_to_cache - Save a write to the cache and clear PG_fscache
+ * @cookie: The cookie representing the cache object
+ * @mapping: The netfs inode to use as the source
+ * @start: The start position in @mapping
+ * @len: The amount of data to write back
+ * @i_size: The new size of the inode
+ * @term_func: The function to call upon completion
+ * @term_func_priv: The private data for @term_func
+ *
+ * Helper function for a netfs to write dirty data from an inode into the cache
+ * object that's backing it.
+ *
+ * @start and @len describe the range of the data.  This does not need to be
+ * page-aligned, but to satisfy DIO requirements, the cache may expand it up to
+ * the page boundaries on either end.  All the pages covering the range must be
+ * marked with PG_fscache.
+ *
+ * If given, @term_func will be called upon completion and supplied with
+ * @term_func_priv.  Note that the PG_fscache flags will have been cleared by
+ * this point, so the netfs must retain its own pin on the mapping.
+ */
+void __fscache_write_to_cache(struct fscache_cookie *cookie,
+			      struct address_space *mapping,
+			      loff_t start, size_t len, loff_t i_size,
+			      fscache_io_terminated_t term_func,
+			      void *term_func_priv)
+{
+	struct fscache_write_request *wreq;
+	struct fscache_op_resources *opr;
+	struct iov_iter iter;
+	int ret = -ENOBUFS;
+
+	if (!fscache_cookie_valid(cookie) || len == 0)
+		goto abandon;
+
+	_enter("%llx,%zx", start, len);
+
+	wreq = kzalloc(sizeof(struct fscache_write_request), GFP_NOFS);
+	if (!wreq)
+		goto abandon;
+	wreq->mapping		= mapping;
+	wreq->start		= start;
+	wreq->len		= len;
+	wreq->term_func		= term_func;
+	wreq->term_func_priv	= term_func_priv;
+
+	opr = &wreq->cache_resources;
+	if (fscache_begin_operation(cookie, opr, FSCACHE_WANT_WRITE) < 0)
+		goto abandon_free;
+
+	ret = opr->ops->prepare_write(opr, &start, &len, i_size);
+	if (ret < 0)
+		goto abandon_end;
+
+	/* TODO: Consider clearing page bits now for space the write isn't
+	 * covering.  This is more complicated than it appears when THPs are
+	 * taken into account.
+	 */
+
+	iov_iter_xarray(&iter, WRITE, &mapping->i_pages, start, len);
+	fscache_write(opr, start, &iter, fscache_wreq_done, wreq);
+	return;
+
+abandon_end:
+	return fscache_wreq_done(wreq, ret);
+abandon_free:
+	kfree(wreq);
+abandon:
+	fscache_clear_page_bits(mapping, start, len);
+	if (term_func)
+		term_func(term_func_priv, ret);
+}
+EXPORT_SYMBOL(__fscache_write_to_cache);
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 1c1ea3558421..0613ccea88c1 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -192,6 +192,12 @@ struct fscache_op_ops {
 	 */
 	enum netfs_read_source (*prepare_read)(struct netfs_read_subrequest *subreq,
 					       loff_t i_size);
+
+	/* Prepare a write operation, working out what part of the write we can
+	 * actually do.
+	 */
+	int (*prepare_write)(struct fscache_op_resources *opr,
+			     loff_t *_start, size_t *_len, loff_t i_size);
 };
 
 /*
@@ -226,6 +232,10 @@ extern void __fscache_invalidate(struct fscache_cookie *, const void *, loff_t,
 extern void fscache_put_super(struct super_block *,
 			      struct fscache_cookie *(*get_cookie)(struct inode *));
 
+extern void __fscache_write_to_cache(struct fscache_cookie *, struct address_space *,
+				     loff_t, size_t, loff_t, fscache_io_terminated_t, void *);
+extern void __fscache_clear_page_bits(struct address_space *, loff_t, size_t);
+
 /**
  * fscache_register_netfs - Register a filesystem as desiring caching services
  * @netfs: The description of the filesystem
@@ -627,6 +637,30 @@ int fscache_write(struct fscache_op_resources *opr,
 	return ops->write(opr, start_pos, iter, term_func, term_func_priv);
 }
 
+static inline void fscache_clear_page_bits(struct address_space *mapping,
+					   loff_t start, size_t len)
+{
+	if (fscache_available())
+		__fscache_clear_page_bits(mapping, start, len);
+}
+
+static inline void fscache_write_to_cache(struct fscache_cookie *cookie,
+					  struct address_space *mapping,
+					  loff_t start, size_t len, loff_t i_size,
+					  fscache_io_terminated_t term_func,
+					  void *term_func_priv)
+{
+	if (fscache_available()) {
+		__fscache_write_to_cache(cookie, mapping, start, len, i_size,
+					 term_func, term_func_priv);
+	} else {
+		fscache_clear_page_bits(mapping, start, len);
+		if (term_func)
+			term_func(term_func_priv, -ENOBUFS);
+	}
+
+}
+
 #if __fscache_available
 extern int fscache_set_page_dirty(struct page *page, struct fscache_cookie *cookie);
 #else




* [RFC PATCH 70/76] afs: Use the fscache_write_begin() helper
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (67 preceding siblings ...)
  2020-11-20 15:17 ` [RFC PATCH 69/76] fscache: Add support for writing to the cache David Howells
@ 2020-11-20 15:17 ` David Howells
  2020-11-20 15:18 ` [RFC PATCH 71/76] afs: Copy local writes to the cache when writing to the server David Howells
                   ` (5 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:17 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Make AFS use the new netfs_write_begin() helper to do the pre-reading
required before the write.  If successful, the helper returns with the
required page filled in and locked.  It may read more than just one page,
expanding the read to meet cache granularity requirements as necessary.

Note: A more advanced version of this could be made to do
generic_perform_write() for a whole cache granule, which would make it
easier to avoid downloading/reading the data that is about to be
overwritten.
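
One consequence, sketched below with placeholder names, is that
->write_end() no longer needs a fill-on-short-copy path: the page is
guaranteed up to date before the copy takes place:

	static int myfs_write_end(struct file *file, struct address_space *mapping,
				  loff_t pos, unsigned int len, unsigned int copied,
				  struct page *page, void *fsdata)
	{
		/* netfs_write_begin() returned the page fully populated, so a
		 * short copy leaves no hole that must be filled from the server.
		 */
		WARN_ON(!PageUptodate(page));
		set_page_dirty(page);
		unlock_page(page);
		put_page(page);
		return copied;
	}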

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/file.c     |   11 +++++
 fs/afs/internal.h |    1 
 fs/afs/write.c    |  108 ++++++-----------------------------------------------
 3 files changed, 23 insertions(+), 97 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 84a222587ca3..21c0ea627bd4 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -368,15 +368,24 @@ static int afs_begin_cache_operation(struct netfs_read_request *rreq)
 				       FSCACHE_WANT_PARAMS);
 }
 
+static int afs_check_write_begin(struct file *file, loff_t pos, unsigned len,
+				 struct page *page, void **_fsdata)
+{
+	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
+
+	return test_bit(AFS_VNODE_DELETED, &vnode->flags) ? -ESTALE : 0;
+}
+
 static void afs_priv_cleanup(struct address_space *mapping, void *netfs_priv)
 {
 	key_put(netfs_priv);
 }
 
-static const struct netfs_read_request_ops afs_req_ops = {
+const struct netfs_read_request_ops afs_req_ops = {
 	.init_rreq		= afs_init_rreq,
 	.is_cache_enabled	= afs_is_cache_enabled,
 	.begin_cache_operation	= afs_begin_cache_operation,
+	.check_write_begin	= afs_check_write_begin,
 	.issue_op		= afs_req_issue_op,
 	.cleanup		= afs_priv_cleanup,
 };
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 1f7caa560c5c..7a76bedd19d9 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -1036,6 +1036,7 @@ extern void afs_dynroot_depopulate(struct super_block *);
 extern const struct address_space_operations afs_fs_aops;
 extern const struct inode_operations afs_file_inode_operations;
 extern const struct file_operations afs_file_operations;
+extern const struct netfs_read_request_ops afs_req_ops;
 
 extern int afs_cache_wb_key(struct afs_vnode *, struct afs_file *);
 extern void afs_put_wb_key(struct afs_wb_key *);
diff --git a/fs/afs/write.c b/fs/afs/write.c
index e672833c99bc..b2e03de09c24 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -11,6 +11,8 @@
 #include <linux/pagemap.h>
 #include <linux/writeback.h>
 #include <linux/pagevec.h>
+#include <linux/netfs.h>
+#include <linux/fscache.h>
 #include "internal.h"
 
 /*
@@ -22,68 +24,6 @@ int afs_set_page_dirty(struct page *page)
 	return __set_page_dirty_nobuffers(page);
 }
 
-/*
- * Handle completion of a read operation to fill a page.
- */
-static void afs_fill_hole(struct afs_read *req)
-{
-	if (iov_iter_count(req->iter) > 0)
-		/* The read was short - clear the excess buffer. */
-		iov_iter_zero(iov_iter_count(req->iter), req->iter);
-}
-
-/*
- * partly or wholly fill a page that's under preparation for writing
- */
-static int afs_fill_page(struct file *file,
-			 loff_t pos, unsigned int len, struct page *page)
-{
-	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
-	struct afs_read *req;
-	size_t p;
-	void *data;
-	int ret;
-
-	_enter(",,%llu", (unsigned long long)pos);
-
-	if (pos >= vnode->vfs_inode.i_size) {
-		p = pos & ~PAGE_MASK;
-		ASSERTCMP(p + len, <=, PAGE_SIZE);
-		data = kmap(page);
-		memset(data + p, 0, len);
-		kunmap(page);
-		return 0;
-	}
-
-	req = kzalloc(sizeof(struct afs_read), GFP_KERNEL);
-	if (!req)
-		return -ENOMEM;
-
-	refcount_set(&req->usage, 1);
-	req->vnode	= vnode;
-	req->done	= afs_fill_hole;
-	req->key	= key_get(afs_file_key(file));
-	req->pos	= pos;
-	req->len	= len;
-	req->nr_pages	= 1;
-	req->iter	= &req->def_iter;
-	iov_iter_xarray(&req->def_iter, READ, &file->f_mapping->i_pages, pos, len);
-
-	ret = afs_fetch_data(vnode, req);
-	afs_put_read(req);
-	if (ret < 0) {
-		if (ret == -ENOENT) {
-			_debug("got NOENT from server"
-			       " - marking file deleted and stale");
-			set_bit(AFS_VNODE_DELETED, &vnode->flags);
-			ret = -ESTALE;
-		}
-	}
-
-	_leave(" = %d", ret);
-	return ret;
-}
-
 /*
  * prepare to perform part of a write to a page
  */
@@ -102,24 +42,14 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
 	_enter("{%llx:%llu},%llx,%x",
 	       vnode->fid.vid, vnode->fid.vnode, pos, len);
 
-	page = grab_cache_page_write_begin(mapping, pos / PAGE_SIZE, flags);
-	if (!page)
-		return -ENOMEM;
-
-	if (!PageUptodate(page) && len != PAGE_SIZE) {
-		ret = afs_fill_page(file, pos & PAGE_MASK, PAGE_SIZE, page);
-		if (ret < 0) {
-			unlock_page(page);
-			put_page(page);
-			_leave(" = %d [prep]", ret);
-			return ret;
-		}
-		SetPageUptodate(page);
-	}
-
-#ifdef CONFIG_AFS_FSCACHE
-	wait_on_page_fscache(page);
-#endif
+	/* Prefetch area to be written into the cache if we're caching this
+	 * file.  We need to do this before we get a lock on the page in case
+	 * there's more than one writer competing for the same cache block.
+	 */
+	ret = netfs_write_begin(file, mapping, pos, len, flags, &page, fsdata,
+				&afs_req_ops, NULL);
+	if (ret < 0)
+		return ret;
 
 	index = page->index;
 	from = pos - index * PAGE_SIZE;
@@ -184,7 +114,6 @@ int afs_write_end(struct file *file, struct address_space *mapping,
 	unsigned int f, from = pos & (thp_size(page) - 1);
 	unsigned int t, to = from + copied;
 	loff_t i_size, maybe_i_size;
-	int ret = 0;
 
 	_enter("{%llx:%llu},{%lx}",
 	       vnode->fid.vid, vnode->fid.vnode, page->index);
@@ -203,19 +132,7 @@ int afs_write_end(struct file *file, struct address_space *mapping,
 		write_sequnlock(&vnode->cb_lock);
 	}
 
-	if (!PageUptodate(page)) {
-		if (copied < len) {
-			/* Try and load any missing data from the server.  The
-			 * unmarshalling routine will take care of clearing any
-			 * bits that are beyond the EOF.
-			 */
-			ret = afs_fill_page(file, pos + copied,
-					    len - copied, page);
-			if (ret < 0)
-				goto out;
-		}
-		SetPageUptodate(page);
-	}
+	ASSERT(PageUptodate(page));
 
 	if (PagePrivate(page)) {
 		priv = page_private(page);
@@ -236,12 +153,11 @@ int afs_write_end(struct file *file, struct address_space *mapping,
 
 	if (set_page_dirty(page))
 		_debug("dirtied %lx", page->index);
-	ret = copied;
 
 out:
 	unlock_page(page);
 	put_page(page);
-	return ret;
+	return copied;
 }
 
 /*




* [RFC PATCH 71/76] afs: Copy local writes to the cache when writing to the server
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (68 preceding siblings ...)
  2020-11-20 15:17 ` [RFC PATCH 70/76] afs: Use the fscache_write_begin() helper David Howells
@ 2020-11-20 15:18 ` David Howells
  2020-11-20 15:18 ` [RFC PATCH 72/76] afs: Invoke fscache_resize_cookie() when handling ATTR_SIZE for setattr David Howells
                   ` (4 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:18 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

When writing to the server from afs_writepage() or afs_writepages(), copy
the data to the cache object too.

To make this possible, the cookie must have its active users count
incremented when a page is dirtied, and kept incremented until all the
pages have been cleaned up.  This allows the writeback to take place after
the last file struct is released.
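
The pinning lifecycle, sketched with a placeholder cookie accessor:
dirtying a page takes a pin on the cookie via the new fscache helper, and
the pin is dropped from ->write_inode() once writeback no longer needs it:

	static int myfs_set_page_dirty(struct page *page)
	{
		/* Pins the cookie's active-user count on first dirtying. */
		return fscache_set_page_dirty(page, myfs_i_cookie(page->mapping->host));
	}

	static int myfs_write_inode(struct inode *inode, struct writeback_control *wbc)
	{
		/* Drops the pin taken when the inode's pages were dirtied. */
		fscache_unpin_writeback(wbc, myfs_i_cookie(inode));
		return 0;
	}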

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/file.c     |    6 ++++
 fs/afs/inode.c    |   11 ++++++--
 fs/afs/internal.h |    1 +
 fs/afs/super.c    |    1 +
 fs/afs/write.c    |   74 +++++++++++++++++++++++++++++++++++++++++++++--------
 5 files changed, 79 insertions(+), 14 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 21c0ea627bd4..bd070684de53 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -403,6 +403,12 @@ static void afs_readahead(struct readahead_control *ractl)
 	netfs_readahead(ractl, &afs_req_ops, NULL);
 }
 
+int afs_write_inode(struct inode *inode, struct writeback_control *wbc)
+{
+	fscache_unpin_writeback(wbc, afs_vnode_cache(AFS_FS_I(inode)));
+	return 0;
+}
+
 /*
  * Adjust the dirty region of the page on truncation or full invalidation,
  * getting rid of the markers altogether if the region is entirely invalidated.
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 177baeea5a1e..51e55bfadb54 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -776,9 +776,7 @@ int afs_drop_inode(struct inode *inode)
  */
 void afs_evict_inode(struct inode *inode)
 {
-	struct afs_vnode *vnode;
-
-	vnode = AFS_FS_I(inode);
+	struct afs_vnode *vnode = AFS_FS_I(inode);
 
 	_enter("{%llx:%llu.%d}",
 	       vnode->fid.vid,
@@ -790,6 +788,13 @@ void afs_evict_inode(struct inode *inode)
 	ASSERTCMP(inode->i_ino, ==, vnode->fid.vnode);
 
 	truncate_inode_pages_final(&inode->i_data);
+
+	if (inode->i_state & I_PINNING_FSCACHE_WB) {
+		struct afs_vnode_cache_aux aux;
+		loff_t i_size = i_size_read(&vnode->vfs_inode);
+		aux.data_version = vnode->status.data_version;
+		fscache_unuse_cookie(afs_vnode_cache(vnode), &aux, &i_size);
+	}
 	clear_inode(inode);
 
 	while (!list_empty(&vnode->wb_keys)) {
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 7a76bedd19d9..bc76c08b9f38 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -1045,6 +1045,7 @@ extern int afs_release(struct inode *, struct file *);
 extern int afs_fetch_data(struct afs_vnode *, struct afs_read *);
 extern struct afs_read *afs_alloc_read(gfp_t);
 extern void afs_put_read(struct afs_read *);
+extern int afs_write_inode(struct inode *, struct writeback_control *);
 
 static inline struct afs_read *afs_get_read(struct afs_read *req)
 {
diff --git a/fs/afs/super.c b/fs/afs/super.c
index d745e228961a..c95523e90fe6 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -61,6 +61,7 @@ int afs_net_id;
 static const struct super_operations afs_super_ops = {
 	.statfs		= afs_statfs,
 	.alloc_inode	= afs_alloc_inode,
+	.write_inode	= afs_write_inode,
 	.drop_inode	= afs_drop_inode,
 	.destroy_inode	= afs_destroy_inode,
 	.free_inode	= afs_free_inode,
diff --git a/fs/afs/write.c b/fs/afs/write.c
index b2e03de09c24..627b08d8de1f 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -15,13 +15,16 @@
 #include <linux/fscache.h>
 #include "internal.h"
 
+static void afs_write_to_cache(struct afs_vnode *vnode, loff_t start, size_t len,
+			       loff_t i_size);
+
 /*
- * mark a page as having been made dirty and thus needing writeback
+ * Mark a page as having been made dirty and thus needing writeback.  We also
+ * need to pin the cache object to write back to.
  */
 int afs_set_page_dirty(struct page *page)
 {
-	_enter("");
-	return __set_page_dirty_nobuffers(page);
+	return fscache_set_page_dirty(page, afs_vnode_cache(AFS_FS_I(page->mapping->host)));
 }
 
 /*
@@ -113,7 +116,7 @@ int afs_write_end(struct file *file, struct address_space *mapping,
 	unsigned long priv;
 	unsigned int f, from = pos & (thp_size(page) - 1);
 	unsigned int t, to = from + copied;
-	loff_t i_size, maybe_i_size;
+	loff_t i_size, write_end_pos;
 
 	_enter("{%llx:%llu},{%lx}",
 	       vnode->fid.vid, vnode->fid.vnode, page->index);
@@ -121,15 +124,16 @@ int afs_write_end(struct file *file, struct address_space *mapping,
 	if (copied == 0)
 		goto out;
 
-	maybe_i_size = pos + copied;
+	write_end_pos = pos + copied;
 
 	i_size = i_size_read(&vnode->vfs_inode);
-	if (maybe_i_size > i_size) {
+	if (write_end_pos > i_size) {
 		write_seqlock(&vnode->cb_lock);
 		i_size = i_size_read(&vnode->vfs_inode);
-		if (maybe_i_size > i_size)
-			i_size_write(&vnode->vfs_inode, maybe_i_size);
+		if (write_end_pos > i_size)
+			i_size_write(&vnode->vfs_inode, write_end_pos);
 		write_sequnlock(&vnode->cb_lock);
+		fscache_update_cookie(afs_vnode_cache(vnode), NULL, &write_end_pos);
 	}
 
 	ASSERT(PageUptodate(page));
@@ -468,7 +472,8 @@ static void afs_extend_writeback(struct address_space *mapping,
 
 			if (!trylock_page(page))
 				break;
-			if (!PageDirty(page) || PageWriteback(page)) {
+			if (!PageDirty(page) || PageWriteback(page) ||
+			    PageFsCache(page)) {
 				unlock_page(page);
 				break;
 			}
@@ -514,6 +519,8 @@ static void afs_extend_writeback(struct address_space *mapping,
 				BUG();
 			if (test_set_page_writeback(page))
 				BUG();
+			if (TestSetPageFsCache(page))
+				BUG();
 
 			*_count -= thp_nr_pages(page);
 			unlock_page(page);
@@ -548,6 +555,8 @@ static ssize_t afs_write_back_from_locked_page(struct address_space *mapping,
 
 	if (test_set_page_writeback(page))
 		BUG();
+	if (TestSetPageFsCache(page))
+		BUG();
 
 	count -= thp_nr_pages(page);
 
@@ -587,12 +596,18 @@ static ssize_t afs_write_back_from_locked_page(struct address_space *mapping,
 	if (start < i_size) {
 		_debug("write back %x @%llx [%llx]", len, start, i_size);
 
+		/* Speculatively write to the cache.  We have to fix this up
+		 * later if the store fails.
+		 */
+		afs_write_to_cache(vnode, start, len, i_size);
+
 		iov_iter_xarray(&iter, WRITE, &mapping->i_pages, start, len);
 		ret = afs_store_data(vnode, &iter, start, false);
 	} else {
 		_debug("write discard %x @%llx [%llx]", len, start, i_size);
 
 		/* The dirty region was entirely beyond the EOF. */
+		fscache_clear_page_bits(mapping, start, len);
 		afs_pages_written_back(vnode, start, len);
 		ret = 0;
 	}
@@ -650,6 +665,10 @@ int afs_writepage(struct page *page, struct writeback_control *wbc)
 
 	_enter("{%lx},", page->index);
 
+#ifdef CONFIG_AFS_FSCACHE
+	wait_on_page_fscache(page);
+#endif
+
 	start = page->index * PAGE_SIZE;
 	ret = afs_write_back_from_locked_page(page->mapping, wbc, page,
 					      start, LLONG_MAX - start);
@@ -712,10 +731,14 @@ static int afs_writepages_region(struct address_space *mapping,
 			continue;
 		}
 
-		if (PageWriteback(page)) {
+		if (PageWriteback(page) || PageFsCache(page)) {
 			unlock_page(page);
-			if (wbc->sync_mode != WB_SYNC_NONE)
+			if (wbc->sync_mode != WB_SYNC_NONE) {
 				wait_on_page_writeback(page);
+#ifdef CONFIG_AFS_FSCACHE
+				wait_on_page_fscache(page);
+#endif
+			}
 			put_page(page);
 			continue;
 		}
@@ -945,3 +968,32 @@ int afs_launder_page(struct page *page)
 	wait_on_page_fscache(page);
 	return ret;
 }
+
+/*
+ * Deal with the completion of writing the data to the cache.
+ */
+static void afs_write_to_cache_done(void *priv, ssize_t transferred_or_error)
+{
+	struct afs_vnode *vnode = priv;
+
+	if (IS_ERR_VALUE(transferred_or_error) &&
+	    transferred_or_error != -ENOBUFS) {
+		struct afs_vnode_cache_aux aux = {
+			.data_version = vnode->status.data_version,
+		};
+
+		fscache_invalidate(afs_vnode_cache(vnode), &aux,
+				   i_size_read(&vnode->vfs_inode), 0);
+	}
+}
+
+/*
+ * Save the write to the cache also.
+ */
+static void afs_write_to_cache(struct afs_vnode *vnode,
+			       loff_t start, size_t len, loff_t i_size)
+{
+	fscache_write_to_cache(afs_vnode_cache(vnode),
+			       vnode->vfs_inode.i_mapping, start, len, i_size,
+			       afs_write_to_cache_done, vnode);
+}




* [RFC PATCH 72/76] afs: Invoke fscache_resize_cookie() when handling ATTR_SIZE for setattr
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (69 preceding siblings ...)
  2020-11-20 15:18 ` [RFC PATCH 71/76] afs: Copy local writes to the cache when writing to the server David Howells
@ 2020-11-20 15:18 ` David Howells
  2020-11-20 15:18 ` [RFC PATCH 73/76] afs: Add O_DIRECT read support David Howells
                   ` (3 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:18 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Invoke fscache_resize_cookie() to adjust the size of the backing cache
object when setattr is called with ATTR_SIZE.  This discards any data that
then lies beyond the revised EOF and frees up space.
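
For illustration, the bracketing this expects in a netfs setattr path is
roughly the following (a sketch assuming the cookie use-counting API from
earlier in this series):

	fscache_use_cookie(cookie, true);	/* may modify the object */

	if (attr->ia_valid & ATTR_SIZE)
		fscache_resize_cookie(cookie, attr->ia_size);

	/* ... issue the setattr RPC to the server ... */

	fscache_unuse_cookie(cookie, NULL, NULL);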

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/inode.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 51e55bfadb54..c639de101065 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -847,6 +847,9 @@ static void afs_setattr_edit_file(struct afs_operation *op)
 
 		if (size < i_size)
 			truncate_pagecache(inode, size);
+		if (size != i_size)
+			fscache_resize_cookie(afs_vnode_cache(vp->vnode),
+					      vp->scb.status.size);
 	}
 }
 
@@ -889,6 +892,8 @@ int afs_setattr(struct dentry *dentry, struct iattr *attr)
 			attr->ia_valid &= ~ATTR_SIZE;
 	}
 
+	fscache_use_cookie(afs_vnode_cache(vnode), true);
+
 	/* flush any dirty data outstanding on a regular file */
 	if (S_ISREG(vnode->vfs_inode.i_mode))
 		filemap_write_and_wait(vnode->vfs_inode.i_mapping);
@@ -919,6 +924,7 @@ int afs_setattr(struct dentry *dentry, struct iattr *attr)
 
 out_unlock:
 	up_write(&vnode->validate_lock);
+	fscache_unuse_cookie(afs_vnode_cache(vnode), NULL, NULL);
 	_leave(" = %d", ret);
 	return ret;
 }




* [RFC PATCH 73/76] afs: Add O_DIRECT read support
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (70 preceding siblings ...)
  2020-11-20 15:18 ` [RFC PATCH 72/76] afs: Invoke fscache_resize_cookie() when handling ATTR_SIZE for setattr David Howells
@ 2020-11-20 15:18 ` David Howells
  2020-11-20 15:18 ` [RFC PATCH 74/76] afs: Skip truncation on the server of data we haven't written yet David Howells
                   ` (2 subsequent siblings)
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:18 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Add synchronous O_DIRECT read support to AFS (no AIO yet).  It can
theoretically handle reads up to the maximum size describable by loff_t,
given an iterator with sufficient capacity to handle that and support for
such reads on the server.
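
As a usage example, a userspace program could exercise the new path like
this (hypothetical path; the 4096-byte alignment is an assumption, not
something this patch specifies):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(void)
	{
		void *buf;
		ssize_t n;
		int fd = open("/afs/example.org/data", O_RDONLY | O_DIRECT);

		if (fd < 0)
			return 1;
		if (posix_memalign(&buf, 4096, 65536))
			return 1;
		n = read(fd, buf, 65536);	/* served by afs_file_direct_read() */
		close(fd);
		free(buf);
		return n < 0;
	}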

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/file.c      |   59 +++++++++++++++++++++++++++++++++++
 fs/afs/fsclient.c  |   18 ++++++++---
 fs/afs/internal.h  |    2 +
 fs/afs/write.c     |   88 +++++++++++++++++++++++++++++++++++++++++++++++-----
 fs/afs/yfsclient.c |   12 +++++--
 5 files changed, 161 insertions(+), 18 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index bd070684de53..27445866531c 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -24,6 +24,7 @@ static void afs_invalidatepage(struct page *page, unsigned int offset,
 static int afs_releasepage(struct page *page, gfp_t gfp_flags);
 
 static void afs_readahead(struct readahead_control *ractl);
+static ssize_t afs_direct_IO(struct kiocb *iocb, struct iov_iter *iter);
 
 const struct file_operations afs_file_operations = {
 	.open		= afs_open,
@@ -53,6 +54,7 @@ const struct address_space_operations afs_fs_aops = {
 	.launder_page	= afs_launder_page,
 	.releasepage	= afs_releasepage,
 	.invalidatepage	= afs_invalidatepage,
+	.direct_IO	= afs_direct_IO,
 	.write_begin	= afs_write_begin,
 	.write_end	= afs_write_end,
 	.writepage	= afs_writepage,
@@ -529,3 +531,60 @@ static int afs_file_mmap(struct file *file, struct vm_area_struct *vma)
 		vma->vm_ops = &afs_vm_ops;
 	return ret;
 }
+
+/*
+ * Direct file read operation for an AFS file.
+ *
+ * TODO: To support AIO, the pages in the iterator have to be copied and
+ * refs taken on them.  Then -EIOCBQUEUED needs to be returned.
+ * iocb->ki_complete must then be called upon completion of the operation.
+ */
+static ssize_t afs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter)
+{
+	struct file *file = iocb->ki_filp;
+	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
+	struct afs_read *req;
+	ssize_t ret, transferred;
+
+	_enter("%llx,%zx", iocb->ki_pos, iov_iter_count(iter));
+
+	req = afs_alloc_read(GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+
+	req->vnode	= vnode;
+	req->key	= key_get(afs_file_key(file));
+	req->pos	= iocb->ki_pos;
+	req->len	= iov_iter_count(iter);
+	req->iter	= iter;
+
+	task_io_account_read(req->len);
+
+	// TODO nfs_start_io_direct(inode);
+	ret = afs_fetch_data(vnode, req);
+	if (ret == 0)
+		transferred = req->actual_len;
+	afs_put_read(req);
+
+	// TODO nfs_end_io_direct(inode);
+
+	if (ret == 0)
+		ret = transferred;
+
+	BUG_ON(ret == -EIOCBQUEUED); // TODO
+	//if (iocb->ki_complete)
+	//	iocb->ki_complete(iocb, ret, 0); // only if ret == -EIOCBQUEUED
+
+	_leave(" = %zd", ret);
+	return ret;
+}
+
+/*
+ * Do direct I/O.
+ */
+static ssize_t afs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
+{
+	if (iov_iter_rw(iter) == READ)
+		return afs_file_direct_read(iocb, iter);
+	return afs_file_direct_write(iocb, iter);
+}
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 2f695a260442..5e42af4d1ded 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -438,7 +438,7 @@ static void afs_fs_fetch_data64(struct afs_operation *op)
 	bp[3] = htonl(vp->fid.unique);
 	bp[4] = htonl(upper_32_bits(req->pos));
 	bp[5] = htonl(lower_32_bits(req->pos));
-	bp[6] = 0;
+	bp[6] = htonl(upper_32_bits(req->len));
 	bp[7] = htonl(lower_32_bits(req->len));
 
 	trace_afs_make_fs_call(call, &vp->fid);
@@ -1058,6 +1058,7 @@ static void afs_fs_store_data64(struct afs_operation *op)
 	struct afs_vnode_param *vp = &op->file[0];
 	struct afs_call *call;
 	__be32 *bp;
+	u32 mask = 0;
 
 	_enter(",%x,{%llx:%llu},,",
 	       key_serial(op->key), vp->fid.vid, vp->fid.vnode);
@@ -1070,6 +1071,9 @@ static void afs_fs_store_data64(struct afs_operation *op)
 
 	call->write_iter = op->store.write_iter;
 
+	if (op->flags & AFS_OPERATION_SET_MTIME)
+		mask |= AFS_SET_MTIME;
+
 	/* marshall the parameters */
 	bp = call->request;
 	*bp++ = htonl(FSSTOREDATA64);
@@ -1077,8 +1081,8 @@ static void afs_fs_store_data64(struct afs_operation *op)
 	*bp++ = htonl(vp->fid.vnode);
 	*bp++ = htonl(vp->fid.unique);
 
-	*bp++ = htonl(AFS_SET_MTIME); /* mask */
-	*bp++ = htonl(op->mtime.tv_sec); /* mtime */
+	*bp++ = htonl(mask);
+	*bp++ = htonl(op->mtime.tv_sec);
 	*bp++ = 0; /* owner */
 	*bp++ = 0; /* group */
 	*bp++ = 0; /* unix mode */
@@ -1103,6 +1107,7 @@ void afs_fs_store_data(struct afs_operation *op)
 	struct afs_vnode_param *vp = &op->file[0];
 	struct afs_call *call;
 	__be32 *bp;
+	u32 mask = 0;
 
 	_enter(",%x,{%llx:%llu},,",
 	       key_serial(op->key), vp->fid.vid, vp->fid.vnode);
@@ -1125,6 +1130,9 @@ void afs_fs_store_data(struct afs_operation *op)
 
 	call->write_iter = op->store.write_iter;
 
+	if (op->flags & AFS_OPERATION_SET_MTIME)
+		mask |= AFS_SET_MTIME;
+
 	/* marshall the parameters */
 	bp = call->request;
 	*bp++ = htonl(FSSTOREDATA);
@@ -1132,8 +1140,8 @@ void afs_fs_store_data(struct afs_operation *op)
 	*bp++ = htonl(vp->fid.vnode);
 	*bp++ = htonl(vp->fid.unique);
 
-	*bp++ = htonl(AFS_SET_MTIME); /* mask */
-	*bp++ = htonl(op->mtime.tv_sec); /* mtime */
+	*bp++ = htonl(mask);
+	*bp++ = htonl(op->mtime.tv_sec);
 	*bp++ = 0; /* owner */
 	*bp++ = 0; /* group */
 	*bp++ = 0; /* unix mode */
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index bc76c08b9f38..e80fb6fe15b3 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -853,6 +853,7 @@ struct afs_operation {
 #define AFS_OPERATION_TRIED_ALL		0x0400	/* Set if we've tried all the fileservers */
 #define AFS_OPERATION_RETRY_SERVER	0x0800	/* Set if we should retry the current server */
 #define AFS_OPERATION_DIR_CONFLICT	0x1000	/* Set if we detected a 3rd-party dir change */
+#define AFS_OPERATION_SET_MTIME		0x2000	/* Set if we should try to store the mtime */
 };
 
 /*
@@ -1506,6 +1507,7 @@ extern int afs_fsync(struct file *, loff_t, loff_t, int);
 extern vm_fault_t afs_page_mkwrite(struct vm_fault *vmf);
 extern void afs_prune_wb_keys(struct afs_vnode *);
 extern int afs_launder_page(struct page *);
+extern ssize_t afs_file_direct_write(struct kiocb *, struct iov_iter *);
 
 /*
  * xattr.c
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 627b08d8de1f..bab110c00abd 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -387,7 +387,7 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, loff_t
 	op->store.i_size = max(pos + size, i_size);
 	op->store.laundering = laundering;
 	op->mtime = vnode->vfs_inode.i_mtime;
-	op->flags |= AFS_OPERATION_UNINTR;
+	op->flags |= AFS_OPERATION_SET_MTIME | AFS_OPERATION_UNINTR;
 	op->ops = &afs_store_data_operation;
 
 try_next_key:
@@ -810,7 +810,6 @@ int afs_writepages(struct address_space *mapping,
 ssize_t afs_file_write(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct afs_vnode *vnode = AFS_FS_I(file_inode(iocb->ki_filp));
-	ssize_t result;
 	size_t count = iov_iter_count(from);
 
 	_enter("{%llx:%llu},{%zu},",
@@ -822,13 +821,7 @@ ssize_t afs_file_write(struct kiocb *iocb, struct iov_iter *from)
 		return -EBUSY;
 	}
 
-	if (!count)
-		return 0;
-
-	result = generic_file_write_iter(iocb, from);
-
-	_leave(" = %zd", result);
-	return result;
+	return generic_file_write_iter(iocb, from);
 }
 
 /*
@@ -997,3 +990,80 @@ static void afs_write_to_cache(struct afs_vnode *vnode,
 			       vnode->vfs_inode.i_mapping, start, len, i_size,
 			       afs_write_to_cache_done, vnode);
 }
+
+static void afs_dio_store_data_success(struct afs_operation *op)
+{
+	struct afs_vnode *vnode = op->file[0].vnode;
+
+	op->ctime = op->file[0].scb.status.mtime_client;
+	afs_vnode_commit_status(op, &op->file[0]);
+	if (op->error == 0) {
+		afs_stat_v(vnode, n_stores);
+		atomic_long_add(op->store.size, &afs_v2net(vnode)->n_store_bytes);
+	}
+}
+
+static const struct afs_operation_ops afs_dio_store_data_operation = {
+	.issue_afs_rpc	= afs_fs_store_data,
+	.issue_yfs_rpc	= yfs_fs_store_data,
+	.success	= afs_dio_store_data_success,
+};
+
+/*
+ * Direct file write operation for an AFS file.
+ *
+ * TODO: To support AIO, the pages in the iterator have to be copied and
+ * refs taken on them.  Then -EIOCBQUEUED needs to be returned.
+ * iocb->ki_complete must then be called upon completion of the operation.
+ */
+ssize_t afs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
+{
+	struct file *file = iocb->ki_filp;
+	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
+	struct afs_operation *op;
+	loff_t size = iov_iter_count(iter), i_size;
+	ssize_t ret;
+
+	_enter("%s{%llx:%llu.%u},%llx,%llx",
+	       vnode->volume->name,
+	       vnode->fid.vid,
+	       vnode->fid.vnode,
+	       vnode->fid.unique,
+	       size, iocb->ki_pos);
+
+	op = afs_alloc_operation(afs_file_key(file), vnode->volume);
+	if (IS_ERR(op))
+		return PTR_ERR(op);
+
+	i_size = i_size_read(&vnode->vfs_inode);
+
+	afs_op_set_vnode(op, 0, vnode);
+	op->file[0].dv_delta	= 1;
+	op->file[0].set_size	= true;
+	op->store.write_iter	= iter;
+	op->store.pos		= iocb->ki_pos;
+	op->store.size		= size;
+	op->store.i_size	= max(iocb->ki_pos + size, i_size);
+	op->ops			= &afs_dio_store_data_operation;
+
+	//if (!is_sync_kiocb(iocb)) {
+
+	ret = afs_do_sync_operation(op);
+	if (ret == 0)
+		ret = size;
+
+	{
+		struct afs_vnode_cache_aux aux = {
+			.data_version = vnode->status.data_version,
+		};
+		fscache_invalidate(afs_vnode_cache(vnode), &aux,
+				   i_size_read(&vnode->vfs_inode),
+				   FSCACHE_INVAL_DIO_WRITE);
+	}
+
+	//if (iocb->ki_complete)
+	//	iocb->ki_complete(iocb, ret, 0); // only if ret == -EIOCBQUEUED
+
+	_leave(" = %zd", ret);
+	return ret;
+}
diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c
index 2b35cba8ad62..a8c4e230002d 100644
--- a/fs/afs/yfsclient.c
+++ b/fs/afs/yfsclient.c
@@ -95,12 +95,16 @@ static __be32 *xdr_encode_YFSStoreStatus_mode(__be32 *bp, mode_t mode)
 	return bp + xdr_size(x);
 }
 
-static __be32 *xdr_encode_YFSStoreStatus_mtime(__be32 *bp, const struct timespec64 *t)
+static __be32 *xdr_encode_YFSStoreStatus_mtime(__be32 *bp, struct afs_operation *op)
 {
 	struct yfs_xdr_YFSStoreStatus *x = (void *)bp;
-	s64 mtime = linux_to_yfs_time(t);
+	s64 mtime = linux_to_yfs_time(&op->mtime);
+	u32 mask = 0;
 
-	x->mask		= htonl(AFS_SET_MTIME);
+	if (op->flags & AFS_OPERATION_SET_MTIME)
+		mask |= AFS_SET_MTIME;
+
+	x->mask		= htonl(mask);
 	x->mode		= htonl(0);
 	x->mtime_client	= u64_to_xdr(mtime);
 	x->owner	= u64_to_xdr(0);
@@ -1103,7 +1107,7 @@ void yfs_fs_store_data(struct afs_operation *op)
 	bp = xdr_encode_u32(bp, YFSSTOREDATA64);
 	bp = xdr_encode_u32(bp, 0); /* RPC flags */
 	bp = xdr_encode_YFSFid(bp, &vp->fid);
-	bp = xdr_encode_YFSStoreStatus_mtime(bp, &op->mtime);
+	bp = xdr_encode_YFSStoreStatus_mtime(bp, op);
 	bp = xdr_encode_u64(bp, op->store.pos);
 	bp = xdr_encode_u64(bp, op->store.size);
 	bp = xdr_encode_u64(bp, op->store.i_size);




* [RFC PATCH 74/76] afs: Skip truncation on the server of data we haven't written yet
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (71 preceding siblings ...)
  2020-11-20 15:18 ` [RFC PATCH 73/76] afs: Add O_DIRECT read support David Howells
@ 2020-11-20 15:18 ` David Howells
  2020-11-20 15:19 ` [RFC PATCH 75/76] afs: Make afs_write_begin() return the THP subpage David Howells
  2020-11-20 15:19 ` [RFC PATCH 76/76] afs: Fix speculative status fetch going out of order wrt to modifications David Howells
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:18 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

Don't send a truncation RPC to the server if we're only shortening data
that's in the pagecache and lies beyond the server's EOF.

Also don't automatically force writeback on setattr, but do wait for any
store RPCs that cover the region to be removed by a shortening truncation.
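
Concretely, the case being optimised is roughly the following (a sketch of
the new test; in afs_setattr() below the server's size is
vnode->status.size, and the variable names here are illustrative):

	/* Shortening to a size that is still above the server's EOF only
	 * cuts data the server has never seen, so do it locally.
	 */
	if (new_size < i_size && new_size > server_size) {
		truncate_pagecache(inode, new_size);
		fscache_resize_cookie(cookie, new_size);
		i_size_write(inode, new_size);
		return 0;		/* No truncation RPC required */
	}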

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/inode.c |   45 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 35 insertions(+), 10 deletions(-)

diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index c639de101065..9d8f759f77f7 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -865,42 +865,67 @@ static const struct afs_operation_ops afs_setattr_operation = {
  */
 int afs_setattr(struct dentry *dentry, struct iattr *attr)
 {
+	const unsigned int supported =
+		ATTR_SIZE | ATTR_MODE | ATTR_UID | ATTR_GID |
+		ATTR_MTIME | ATTR_MTIME_SET | ATTR_TIMES_SET | ATTR_TOUCH;
 	struct afs_operation *op;
 	struct afs_vnode *vnode = AFS_FS_I(d_inode(dentry));
+	struct inode *inode = &vnode->vfs_inode;
+	loff_t i_size;
 	int ret;
 
 	_enter("{%llx:%llu},{n=%pd},%x",
 	       vnode->fid.vid, vnode->fid.vnode, dentry,
 	       attr->ia_valid);
 
-	if (!(attr->ia_valid & (ATTR_SIZE | ATTR_MODE | ATTR_UID | ATTR_GID |
-				ATTR_MTIME | ATTR_MTIME_SET | ATTR_TIMES_SET |
-				ATTR_TOUCH))) {
+	if (!(attr->ia_valid & supported)) {
 		_leave(" = 0 [unsupported]");
 		return 0;
 	}
 
+	i_size = i_size_read(inode);
 	if (attr->ia_valid & ATTR_SIZE) {
-		if (!S_ISREG(vnode->vfs_inode.i_mode))
+		if (!S_ISREG(inode->i_mode))
 			return -EISDIR;
 
-		ret = inode_newsize_ok(&vnode->vfs_inode, attr->ia_size);
+		ret = inode_newsize_ok(inode, attr->ia_size);
 		if (ret)
 			return ret;
 
-		if (attr->ia_size == i_size_read(&vnode->vfs_inode))
+		if (attr->ia_size == i_size)
 			attr->ia_valid &= ~ATTR_SIZE;
 	}
 
 	fscache_use_cookie(afs_vnode_cache(vnode), true);
 
-	/* flush any dirty data outstanding on a regular file */
-	if (S_ISREG(vnode->vfs_inode.i_mode))
-		filemap_write_and_wait(vnode->vfs_inode.i_mapping);
-
 	/* Prevent any new writebacks from starting whilst we do this. */
 	down_write(&vnode->validate_lock);
 
+	if ((attr->ia_valid & ATTR_SIZE) && S_ISREG(inode->i_mode)) {
+		loff_t size = attr->ia_size;
+
+		/* Wait for any outstanding writes to the server to complete */
+		loff_t from = min(size, i_size);
+		loff_t to = max(size, i_size);
+		ret = filemap_fdatawait_range(inode->i_mapping, from, to);
+		if (ret < 0)
+			goto out_unlock;
+
+		/* Don't talk to the server if we're just shortening in-memory
+		 * writes that haven't gone to the server yet.
+		 */
+		if (!(attr->ia_valid & (supported & ~ATTR_SIZE & ~ATTR_MTIME)) &&
+		    attr->ia_size < i_size &&
+		    attr->ia_size > vnode->status.size) {
+			truncate_pagecache(inode, attr->ia_size);
+			fscache_resize_cookie(afs_vnode_cache(vnode),
+					      attr->ia_size);
+			i_size_write(inode, attr->ia_size);
+			ret = 0;
+			goto out_unlock;
+		}
+	}
+
 	op = afs_alloc_operation(((attr->ia_valid & ATTR_FILE) ?
 				  afs_file_key(attr->ia_file) : NULL),
 				 vnode->volume);




* [RFC PATCH 75/76] afs: Make afs_write_begin() return the THP subpage
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (72 preceding siblings ...)
  2020-11-20 15:18 ` [RFC PATCH 74/76] afs: Skip truncation on the server of data we haven't written yet David Howells
@ 2020-11-20 15:19 ` David Howells
  2020-11-20 15:19 ` [RFC PATCH 76/76] afs: Fix speculative status fetch going out of order wrt to modifications David Howells
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:19 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

generic_perform_write() can't handle a THP, so we have to return the
subpage of that THP from afs_write_begin() and then convert it back into
the head page on entry to afs_write_end().
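
For example, with 4KiB pages the two conversions look like this (a sketch
of the helpers as the patch below uses them):

	/* write_begin: hand back the subpage covering pos */
	*_page = find_subpage(page, pos / PAGE_SIZE);	/* e.g. pos 0x5000 -> subpage 5 */

	/* write_end: recover the compound head from the subpage */
	page = thp_head(subpage);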

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/write.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index bab110c00abd..a88c6ae71868 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -81,7 +81,7 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
 			goto flush_conflicting_write;
 	}
 
-	*_page = page;
+	*_page = find_subpage(page, pos / PAGE_SIZE);
 	_leave(" = 0");
 	return 0;
 
@@ -110,9 +110,10 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
  */
 int afs_write_end(struct file *file, struct address_space *mapping,
 		  loff_t pos, unsigned len, unsigned copied,
-		  struct page *page, void *fsdata)
+		  struct page *subpage, void *fsdata)
 {
 	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
+	struct page *page = thp_head(subpage);
 	unsigned long priv;
 	unsigned int f, from = pos & (thp_size(page) - 1);
 	unsigned int t, to = from + copied;




* [RFC PATCH 76/76] afs: Fix speculative status fetch going out of order wrt to modifications
  2020-11-20 15:02 [RFC PATCH 00/76] fscache: Modernisation David Howells
                   ` (73 preceding siblings ...)
  2020-11-20 15:19 ` [RFC PATCH 75/76] afs: Make afs_write_begin() return the THP subpage David Howells
@ 2020-11-20 15:19 ` David Howells
  74 siblings, 0 replies; 76+ messages in thread
From: David Howells @ 2020-11-20 15:19 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: dhowells, Jeff Layton, Matthew Wilcox, Alexander Viro,
	linux-cachefs, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-fsdevel, linux-kernel

When doing a lookup in a directory, the afs filesystem uses a bulk status
fetch to speculatively retrieve the statuses of up to 48 other vnodes found
in the same directory, and it will then either update extant inodes or
create new ones - effectively doing 'lookup ahead'.

To avoid the possibility of deadlocking itself, however, the filesystem
doesn't lock all of those inodes; rather just the directory inode is locked
(by the VFS).  When the operation completes, afs_inode_init_from_status()
or afs_apply_status() is called, depending on whether the inode already
exists, to commit the new status.

A case exists, however, where the speculative status fetch operation may
straddle a modification operation on one of those vnodes.  What can then
happen is that the speculative bulk status RPC retrieves the old status;
whilst that RPC is in flight, the modification takes place and returns an
updated status, which is committed; we then attempt to commit the stale
speculative status.

This results in something like the following being seen in dmesg:

	kAFS: vnode modified {100058:861} 8->9 YFS.InlineBulkStatus

showing that for vnode 861 on volume 100058, we saw YFS.InlineBulkStatus
say that the vnode had data version 8 when we'd already recorded version 9
due to a local modification.  This was causing the cache to be invalidated
for that vnode when it shouldn't have been.  If it happens on a data file,
this might lead to local changes being lost.

Fix this by ignoring speculative status updates if the data version doesn't
match the expected value.
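
The failing interleaving and the guard against it look roughly like this
(a sketch of the check as applied in afs_vnode_commit_status() below):

	/*   bulk status RPC issued        (server DV is 8)
	 *   modification RPC completes    (server DV becomes 9, committed)
	 *   bulk status reply processed   (still reports DV 8)
	 */
	if (vp->speculative &&
	    vp->dv_before + vp->dv_delta != vp->scb.status.data_version)
		goto out;	/* Stale speculative status; don't apply */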

Note that it is possible to get a DV regression if a volume gets restored
from a backup - but we should get a callback break in such a case that
should trigger a recheck anyway.  It might be worth checking the volume
creation time in the volsync info and, if a change is observed in that (as
would happen on a restore), invalidate all caches associated with the
volume.

Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/dir.c      |    1 +
 fs/afs/inode.c    |    8 ++++++++
 fs/afs/internal.h |    1 +
 3 files changed, 10 insertions(+)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 26ac5cfdf3af..157529b0e422 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -900,6 +900,7 @@ static struct inode *afs_do_lookup(struct inode *dir, struct dentry *dentry,
 				vp->cb_break_before = afs_calc_vnode_cb_break(vnode);
 				vp->vnode = vnode;
 				vp->put_vnode = true;
+				vp->speculative = true; /* vnode not locked */
 			}
 		}
 	}
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 9d8f759f77f7..acb47a862893 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -295,6 +295,13 @@ void afs_vnode_commit_status(struct afs_operation *op, struct afs_vnode_param *v
 			op->flags &= ~AFS_OPERATION_DIR_CONFLICT;
 		}
 	} else if (vp->scb.have_status) {
+		if (vp->dv_before + vp->dv_delta != vp->scb.status.data_version &&
+		    vp->speculative)
+			/* Ignore the result of a speculative bulk status fetch
+			 * if it splits around a modification op, thereby
+			 * appearing to regress the data version.
+			 */
+			goto out;
 		afs_apply_status(op, vp);
 		if (vp->scb.have_cb)
 			afs_apply_callback(op, vp);
@@ -306,6 +313,7 @@ void afs_vnode_commit_status(struct afs_operation *op, struct afs_vnode_param *v
 		}
 	}
 
+out:
 	write_sequnlock(&vnode->cb_lock);
 
 	if (vp->scb.have_status)
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index e80fb6fe15b3..07471deeb6a8 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -760,6 +760,7 @@ struct afs_vnode_param {
 	bool			update_ctime:1;	/* Need to update the ctime */
 	bool			set_size:1;	/* Must update i_size */
 	bool			op_unlinked:1;	/* True if file was unlinked by op */
+	bool			speculative:1;	/* T if speculative status fetch (no vnode lock) */
 };
 
 /*



