CEPH-Devel Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 00/32] Network fs helper library & fscache kiocb API [ver #2]
@ 2021-01-25 21:30 David Howells
  2021-01-25 21:31 ` [PATCH 01/32] iov_iter: Add ITER_XARRAY David Howells
                   ` (31 more replies)
  0 siblings, 32 replies; 42+ messages in thread
From: David Howells @ 2021-01-25 21:30 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Dominique Martinet
  Cc: linux-afs, Dave Wysochanski, Matthew Wilcox,
	Matthew Wilcox (Oracle),
	dhowells, Jeff Layton, David Wysochanski, Matthew Wilcox,
	Alexander Viro, linux-cachefs, linux-afs, linux-nfs, linux-cifs,
	ceph-devel, v9fs-developer, linux-fsdevel, linux-kernel


Here's a set of patches to do two things:

 (1) Add a helper library to handle the new VM readahead interface.  This
     is intended to be used unconditionally by the filesystem (whether or
     not caching is enabled) and provides a common framework for doing
     caching, transparent huge pages and, in the future, possibly fscrypt
     and read bandwidth maximisation.  It also allows the netfs and the
     cache to align, expand and slice up a read request from the VM in
     various ways; the netfs need only provide a function to read a stretch
     of data to the pagecache and the helper takes care of the rest.

 (2) Add an alternative fscache/cachfiles I/O API that uses the kiocb
     facility to do async DIO to transfer data to/from the netfs's pages,
     rather than using readpage with wait queue snooping on one side and
     vfs_write() on the other.  It also uses less memory, since it doesn't
     do buffered I/O on the backing file.

     Note that this uses SEEK_HOLE/SEEK_DATA to locate the data available
     to be read from the cache.  Whilst this is an improvement from the
     bmap interface, it still has a problem with regard to a modern
     extent-based filesystem inserting or removing bridging blocks of
     zeros.  Fixing that requires a much greater overhaul.

This is a step towards overhauling the fscache API.  The change is opt-in
on the part of the network filesystem.  A netfs should not try to mix the
old and the new API because of conflicting ways of handling pages and the
PG_fscache page flag and because it would be mixing DIO with buffered I/O.
Further, the helper library can't be used with the old API.

This does not change any of the fscache cookie handling APIs or the way
invalidation is done.

In the near term, I intend to deprecate and remove the old I/O API
(fscache_allocate_page{,s}(), fscache_read_or_alloc_page{,s}(),
fscache_write_page() and fscache_uncache_page()) and eventually replace
most of fscache/cachefiles with something simpler and easier to follow.

The patchset contains five parts:

 (1) Some helper patches, including provision of an ITER_XARRAY iov
     iterator and a function to do readahead expansion.

 (2) Patches to add the netfs helper library.

 (3) A patch to add the fscache/cachefiles kiocb API

 (4) Patches to add support in AFS for this.

 (5) Patches from David Wysochanski to add support in NFS for this.

Jeff Layton also has patches for Ceph for this, though they're not included
on this branch.

With this, AFS without a cache passes all expected xfstests; with a cache,
there's an extra failure, but that's also there before these patches.
Fixing that probably requires a greater overhaul.  Ceph and NFS also pass
the expected tests.

These patches can be found also on:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-netfs-lib


Changes
=======

 (v2) Fix some bugs and add NFS support.

David
---
Dave Wysochanski (8):
      NFS: Clean up nfs_readpage() and nfs_readpages()
      NFS: In nfs_readpage() only increment NFSIOS_READPAGES when read succeeds
      NFS: Refactor nfs_readpage() and nfs_readpage_async() to use nfs_readdesc
      NFS: Call readpage_async_filler() from nfs_readpage_async()
      NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()
      NFS: Allow internal use of read structs and functions
      NFS: Convert to the netfs API and nfs_readpage to use netfs_readpage
      NFS: Convert readpage to readahead and use netfs_readahead for fscache

David Howells (24):
      iov_iter: Add ITER_XARRAY
      vm: Add wait/unlock functions for PG_fscache
      mm: Implement readahead_control pageset expansion
      vfs: Export rw_verify_area() for use by cachefiles
      netfs: Make a netfs helper module
      netfs: Provide readahead and readpage netfs helpers
      netfs: Add tracepoints
      netfs: Gather stats
      netfs: Add write_begin helper
      netfs: Define an interface to talk to a cache
      fscache, cachefiles: Add alternate API to use kiocb for read/write to cache
      afs: Disable use of the fscache I/O routines
      afs: Pass page into dirty region helpers to provide THP size
      afs: Print the operation debug_id when logging an unexpected data version
      afs: Move key to afs_read struct
      afs: Don't truncate iter during data fetch
      afs: Log remote unmarshalling errors
      afs: Set up the iov_iter before calling afs_extract_data()
      afs: Use ITER_XARRAY for writing
      afs: Wait on PG_fscache before modifying/releasing a page
      afs: Extract writeback extension into its own function
      afs: Prepare for use of THPs
      afs: Use the fs operation ops to handle FetchData completion
      afs: Use new fscache read helper API


 fs/Kconfig                    |    1 +
 fs/Makefile                   |    1 +
 fs/afs/Kconfig                |    1 +
 fs/afs/dir.c                  |  225 ++++---
 fs/afs/file.c                 |  472 ++++----------
 fs/afs/fs_operation.c         |    4 +-
 fs/afs/fsclient.c             |  108 +--
 fs/afs/inode.c                |    7 +-
 fs/afs/internal.h             |   57 +-
 fs/afs/rxrpc.c                |  150 ++---
 fs/afs/write.c                |  610 +++++++++--------
 fs/afs/yfsclient.c            |   82 +--
 fs/cachefiles/Makefile        |    1 +
 fs/cachefiles/interface.c     |    5 +-
 fs/cachefiles/internal.h      |    9 +
 fs/cachefiles/rdwr2.c         |  412 ++++++++++++
 fs/fscache/Kconfig            |    1 +
 fs/fscache/Makefile           |    3 +-
 fs/fscache/internal.h         |    3 +
 fs/fscache/page.c             |    2 +-
 fs/fscache/page2.c            |  116 ++++
 fs/fscache/stats.c            |    1 +
 fs/internal.h                 |    5 -
 fs/netfs/Kconfig              |   23 +
 fs/netfs/Makefile             |    5 +
 fs/netfs/internal.h           |   97 +++
 fs/netfs/read_helper.c        | 1155 +++++++++++++++++++++++++++++++++
 fs/netfs/stats.c              |   59 ++
 fs/nfs/file.c                 |    2 +-
 fs/nfs/fscache.c              |  206 +++---
 fs/nfs/fscache.h              |   66 +-
 fs/nfs/internal.h             |    8 +
 fs/nfs/pagelist.c             |    2 +
 fs/nfs/read.c                 |  244 ++++---
 fs/read_write.c               |    1 +
 include/linux/fs.h            |    1 +
 include/linux/fscache-cache.h |    4 +
 include/linux/fscache.h       |   28 +-
 include/linux/netfs.h         |  167 +++++
 include/linux/nfs_fs.h        |    5 +-
 include/linux/nfs_iostat.h    |    2 +-
 include/linux/nfs_page.h      |    1 +
 include/linux/nfs_xdr.h       |    1 +
 include/linux/pagemap.h       |   16 +
 include/net/af_rxrpc.h        |    2 +-
 include/trace/events/afs.h    |   74 +--
 include/trace/events/netfs.h  |  201 ++++++
 mm/filemap.c                  |   18 +
 mm/readahead.c                |   70 ++
 net/rxrpc/recvmsg.c           |    9 +-
 50 files changed, 3457 insertions(+), 1286 deletions(-)
 create mode 100644 fs/cachefiles/rdwr2.c
 create mode 100644 fs/fscache/page2.c
 create mode 100644 fs/netfs/Kconfig
 create mode 100644 fs/netfs/Makefile
 create mode 100644 fs/netfs/internal.h
 create mode 100644 fs/netfs/read_helper.c
 create mode 100644 fs/netfs/stats.c
 create mode 100644 include/linux/netfs.h
 create mode 100644 include/trace/events/netfs.h



^ permalink raw reply	[flat|nested] 42+ messages in thread
* [PATCH 00/32] fscache: Rewrite 2: Make the I/O interface use kiocb/iov_iter
@ 2020-07-13 16:30 David Howells
  2020-07-13 16:31 ` [PATCH 02/32] vm: Add wait/unlock functions for PG_fscache David Howells
  0 siblings, 1 reply; 42+ messages in thread
From: David Howells @ 2020-07-13 16:30 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Steve French, Alexander Viro,
	Matthew Wilcox
  Cc: linux-cifs, linux-nfs, linux-kernel, linux-cachefs,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-afs


Here's a set of patches that massively overhauls the object lifecycle
management and the I/O API of the local caching for network filesystems code
for a reduction of about 3000 LoC and a 1000 line Documentation reduction.
The ability to use async DIO to pass data to/from the cache gives a huge speed
bonus and uses less memory.

So far, only AFS has been ported to use this.  This series disabled fscache
support in all other filesystems that use it.  I'm working on NFS, but it's
tricky and I could use some help.

The following parts have been removed:

    - The object state machine
    - The I/O operation manager
    - All non-transient references from fscache to the netfs's data
    - All non-transient callbacks from fscache to the netfs
    - The backing page I/O monitoring
    - The tracking of netfs pages that fscache knows about
    - The tracking of netfs pages that need writing to the cache
    - The use of bmap to work out if a page is stored in the cache
    - The copy of data to/from backing pages to netfs pages.

Instead, the I/O to the cache is driven much more from the netfs.  There are a
number of aspects to the I/O API:

 (1) The lowest level I/O primitives just take an iov_iter and start async
     DIO on the cache objects.  The caller gets a callback upon completion.
     The PG_fscache bit is now just used to indicate that there's a write
     to the cache in progress.  The cache will keep track in xattrs as to
     what areas of the backing file are occupied.

     - struct fscache_io_request
     - fscache_read() and fscache_write().

 (2) The cache may require that the I/O fulfil certain alignment and length
     granularity constraints.  This, and whether the cache contains the
     desired data, can be queried.

     - struct fscache_request_shape
     - fscache_shape_request()

 (3) The cookie that's obtained when an inode is set up must be 'used' when a
     file is opened (with an indication as to whether it might be modified)
     and 'unused' when it is done with.  At the point of unuse, the auxdata
     and file size can be specified.

     - fscache_use_cookie(), fscache_unuse_cookie()

 (4) The cookie can be invalidated at any time, and new auxiliary data and a
     new size provided.  Any in-progress I/O will either cause new I/O to
     wait, or a replacement tmpfile will be created and the in-progress I/O
     will just be abandoned.  The on-disk auxdata (in xattrs, say) are updated
     lazily.

     - fscache_invalidate()

 (5) A helper API for reads is provided to combine the (1), (2) above and
     read_cache_pages() and also do read-(re)issue to the network in the case
     that the data isn't present or the cache fails.  This requires that an
     operation descriptor be allocated and given some operations.  This needs
     to be used for ->readpage(), ->readpages() and prefetching for
     ->write_begin().

     - struct fscache_io_request
     - fscache_read_helper()

I've also simplified the cookie management API to remove struct
fscache_cookie_def.  Instead, the pertinent details are supplied when a cookie
is created and the file size, key and auxdata are stored in the cookie.
Callbacks and backpointers are simply removed.

I've added some pieces outside of the API also:

 (1) An inode flag to mark a backing cachefile as being in use by the kernel.
     This prevents multiple caches mounted in the same directory from fighting
     over the same files.  It can also be extended to exclude other kernel
     users (such as swap) and could also be used to prevent userspace
     interfering with the file.

 (2) A new I/O iterator class, ITER_MAPPING, that iterates over the specified
     byte range of a mapping.  The caller is required to make sure that the
     pages don't evaporate under the callee (eg. pinning them by PG_locked,
     PG_writeback, PG_fscache or usage count).

     It may make sense to make this more generic and just take an xarray that
     is known to be laden with pages.  The one tricky bit there is that the
     iteration routines call find_get_page_contig() - though the only thing
     the mapping is used for is to find the xarray containing the pages.

     This is better than an ITER_BVEC as no allocation of bio_vec structs is
     required since the xarray holds pointers to all the pages involved.

 (3) Wait and unlock functions for PG_fscache.  These are in the core, so no
     need to call into fscache for it.

So, to be done hopefully before the next merge window:

 (1) Need to investigate whether I need to be lazy, but only to an extent,
     about closing up backing files.

 (2) Make NFS, CIFS, Ceph, 9P work with it.  Jeff Layton is working on Ceph
     and Dave Wysochanski is working on NFS.

And beyond that:

 (3) Put in support for versioned monolithic objects (eg. AFS directories).

 (4) Currently it cachefiles only caches large files up to 1GiB.  File data
     beyond that isn't cached.  The problem here is that I'm using an xattr to
     hold the content map, and xattrs may be limited in size and I've limited
     myself to using a 512 byte xattr.  I can easily make it cache a
     non-sparse file of any size with no map, but as soon as it becomes
     sparse, I need a different strategy.

 (5) Change the indexing strategy so that the culling mechanism is brought
     into the kernel, rather than doing that in userspace, and use an index
     table of files with a LRU list.

These patches can be found also on:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter

David
---
David Howells (1):
      cachefiles: Shape write requests


 fs/9p/Kconfig                          |    2 +-
 fs/Makefile                            |    2 +-
 fs/afs/Kconfig                         |    1 +
 fs/afs/cache.c                         |   54 --
 fs/afs/cell.c                          |    9 +-
 fs/afs/dir.c                           |  242 +++++--
 fs/afs/file.c                          |  575 +++++++--------
 fs/afs/fs_operation.c                  |    4 +-
 fs/afs/fsclient.c                      |  154 ++--
 fs/afs/inode.c                         |   57 +-
 fs/afs/internal.h                      |   58 +-
 fs/afs/rxrpc.c                         |  150 ++--
 fs/afs/volume.c                        |    9 +-
 fs/afs/write.c                         |  413 +++++++----
 fs/afs/yfsclient.c                     |  113 ++-
 fs/cachefiles/Makefile                 |    3 +-
 fs/cachefiles/bind.c                   |   11 +-
 fs/cachefiles/content-map.c            |  476 ++++++++++++
 fs/cachefiles/daemon.c                 |   10 +-
 fs/cachefiles/interface.c              |  564 ++++++++-------
 fs/cachefiles/internal.h               |  139 ++--
 fs/cachefiles/io.c                     |  279 +++++++
 fs/cachefiles/main.c                   |   12 +-
 fs/cachefiles/namei.c                  |  508 +++++--------
 fs/cachefiles/rdwr.c                   |  974 -------------------------
 fs/cachefiles/xattr.c                  |  263 +++----
 fs/ceph/Kconfig                        |    2 +-
 fs/cifs/Kconfig                        |    2 +-
 fs/fscache/Kconfig                     |    8 +
 fs/fscache/Makefile                    |   10 +-
 fs/fscache/cache.c                     |  136 ++--
 fs/fscache/cookie.c                    |  769 ++++++++------------
 fs/fscache/dispatcher.c                |  150 ++++
 fs/fscache/fsdef.c                     |   56 +-
 fs/fscache/histogram.c                 |    2 +-
 fs/fscache/internal.h                  |  260 +++----
 fs/fscache/io.c                        |  201 +++++
 fs/fscache/main.c                      |   35 +-
 fs/fscache/netfs.c                     |   10 +-
 fs/fscache/obj.c                       |  345 +++++++++
 fs/fscache/object-list.c               |  129 +---
 fs/fscache/object.c                    | 1133 -----------------------------
 fs/fscache/object_bits.c               |  120 +++
 fs/fscache/operation.c                 |  633 ----------------
 fs/fscache/page.c                      | 1248 --------------------------------
 fs/fscache/proc.c                      |   13 +-
 fs/fscache/read_helper.c               |  688 ++++++++++++++++++
 fs/fscache/stats.c                     |  265 +++----
 fs/internal.h                          |    5 -
 fs/nfs/Kconfig                         |    2 +-
 fs/nfs/fscache-index.c                 |    4 +-
 fs/read_write.c                        |    1 +
 include/linux/fs.h                     |    2 +
 include/linux/fscache-cache.h          |  508 +++----------
 include/linux/fscache-obsolete.h       |   13 +
 include/linux/fscache.h                |  814 ++++++++-------------
 include/linux/mm.h                     |    1 +
 include/linux/pagemap.h                |   14 +
 include/linux/uio.h                    |   11 +
 include/net/af_rxrpc.h                 |    2 +-
 include/trace/events/afs.h             |   51 +-
 include/trace/events/cachefiles.h      |  285 ++++++--
 include/trace/events/fscache.h         |  421 ++---------
 include/trace/events/fscache_support.h |   91 +++
 lib/iov_iter.c                         |  286 +++++++-
 mm/filemap.c                           |   27 +-
 net/rxrpc/recvmsg.c                    |    9 +-
 67 files changed, 5598 insertions(+), 8246 deletions(-)
 create mode 100644 fs/cachefiles/content-map.c
 create mode 100644 fs/cachefiles/io.c
 delete mode 100644 fs/cachefiles/rdwr.c
 create mode 100644 fs/fscache/dispatcher.c
 create mode 100644 fs/fscache/io.c
 create mode 100644 fs/fscache/obj.c
 delete mode 100644 fs/fscache/object.c
 create mode 100644 fs/fscache/object_bits.c
 delete mode 100644 fs/fscache/operation.c
 delete mode 100644 fs/fscache/page.c
 create mode 100644 fs/fscache/read_helper.c
 create mode 100644 include/linux/fscache-obsolete.h
 create mode 100644 include/trace/events/fscache_support.h

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, back to index

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-25 21:30 [PATCH 00/32] Network fs helper library & fscache kiocb API [ver #2] David Howells
2021-01-25 21:31 ` [PATCH 01/32] iov_iter: Add ITER_XARRAY David Howells
2021-01-25 21:31 ` [PATCH 02/32] vm: Add wait/unlock functions for PG_fscache David Howells
2021-01-25 21:31 ` [PATCH 03/32] mm: Implement readahead_control pageset expansion David Howells
2021-01-25 21:31 ` [PATCH 04/32] vfs: Export rw_verify_area() for use by cachefiles David Howells
2021-01-25 21:31 ` [PATCH 05/32] netfs: Make a netfs helper module David Howells
2021-01-25 21:32 ` [PATCH 06/32] netfs: Provide readahead and readpage netfs helpers David Howells
2021-01-25 21:32 ` [PATCH 07/32] netfs: Add tracepoints David Howells
2021-01-25 21:32 ` [PATCH 08/32] netfs: Gather stats David Howells
2021-01-25 21:32 ` [PATCH 09/32] netfs: Add write_begin helper David Howells
2021-01-25 21:32 ` [PATCH 10/32] netfs: Define an interface to talk to a cache David Howells
2021-01-25 21:32 ` [PATCH 11/32] fscache, cachefiles: Add alternate API to use kiocb for read/write to cache David Howells
2021-01-25 21:33 ` [PATCH 12/32] afs: Disable use of the fscache I/O routines David Howells
2021-01-25 21:33 ` [PATCH 13/32] afs: Pass page into dirty region helpers to provide THP size David Howells
2021-01-25 21:33 ` [PATCH 14/32] afs: Print the operation debug_id when logging an unexpected data version David Howells
2021-01-25 21:33 ` [PATCH 15/32] afs: Move key to afs_read struct David Howells
2021-01-25 21:34 ` [PATCH 16/32] afs: Don't truncate iter during data fetch David Howells
2021-01-25 21:34 ` [PATCH 17/32] afs: Log remote unmarshalling errors David Howells
2021-01-25 21:34 ` [PATCH 18/32] afs: Set up the iov_iter before calling afs_extract_data() David Howells
2021-01-25 21:34 ` [PATCH 19/32] afs: Use ITER_XARRAY for writing David Howells
2021-01-25 21:34 ` [PATCH 20/32] afs: Wait on PG_fscache before modifying/releasing a page David Howells
2021-01-25 21:35 ` [PATCH 21/32] afs: Extract writeback extension into its own function David Howells
2021-01-25 21:35 ` [PATCH 22/32] afs: Prepare for use of THPs David Howells
2021-01-25 21:35 ` [PATCH 23/32] afs: Use the fs operation ops to handle FetchData completion David Howells
2021-01-25 21:35 ` [PATCH 24/32] afs: Use new fscache read helper API David Howells
2021-01-25 21:35 ` [PATCH 25/32] NFS: Clean up nfs_readpage() and nfs_readpages() David Howells
2021-01-26  3:59   ` Matthew Wilcox
2021-01-26 15:33     ` David Wysochanski
2021-01-25 21:36 ` [PATCH 26/32] NFS: In nfs_readpage() only increment NFSIOS_READPAGES when read succeeds David Howells
2021-01-25 21:36 ` [PATCH 27/32] NFS: Refactor nfs_readpage() and nfs_readpage_async() to use nfs_readdesc David Howells
2021-01-26  4:05   ` Matthew Wilcox
2021-01-26 15:24     ` David Wysochanski
2021-01-25 21:36 ` [PATCH 28/32] NFS: Call readpage_async_filler() from nfs_readpage_async() David Howells
2021-01-25 21:36 ` [PATCH 29/32] NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async() David Howells
2021-01-25 21:37 ` [PATCH 30/32] NFS: Allow internal use of read structs and functions David Howells
2021-01-25 21:37 ` [PATCH 31/32] NFS: Convert to the netfs API and nfs_readpage to use netfs_readpage David Howells
2021-01-25 21:37 ` [PATCH 32/32] NFS: Convert readpage to readahead and use netfs_readahead for fscache David Howells
2021-01-26  1:36   ` Matthew Wilcox
2021-01-26 15:24     ` Chuck Lever
2021-01-26 19:10       ` David Wysochanski
2021-01-26 18:25     ` David Wysochanski
  -- strict thread matches above, loose matches on Subject: below --
2020-07-13 16:30 [PATCH 00/32] fscache: Rewrite 2: Make the I/O interface use kiocb/iov_iter David Howells
2020-07-13 16:31 ` [PATCH 02/32] vm: Add wait/unlock functions for PG_fscache David Howells

CEPH-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/ceph-devel/0 ceph-devel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 ceph-devel ceph-devel/ https://lore.kernel.org/ceph-devel \
		ceph-devel@vger.kernel.org
	public-inbox-index ceph-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.ceph-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git