From: Jeff Layton <jlayton@kernel.org>
To: David Howells <dhowells@redhat.com>,
	Christian Brauner <christian@brauner.io>,
	Gao Xiang <hsiangkao@linux.alibaba.com>,
	Dominique Martinet <asmadeus@codewreck.org>
Cc: Matthew Wilcox <willy@infradead.org>,
	Steve French <smfrench@gmail.com>,
	Marc Dionne <marc.dionne@auristor.com>,
	Paulo Alcantara <pc@manguebit.com>,
	Shyam Prasad N <sprasad@microsoft.com>,
	Tom Talpey <tom@talpey.com>,
	Eric Van Hensbergen <ericvh@kernel.org>,
	Ilya Dryomov <idryomov@gmail.com>,
	netfs@lists.linux.dev, linux-cachefs@redhat.com,
	linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org,
	linux-nfs@vger.kernel.org, ceph-devel@vger.kernel.org,
	v9fs@lists.linux.dev, linux-erofs@lists.ozlabs.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/26] netfs, afs, 9p, cifs: Rework netfs to use ->writepages() to copy to cache
Date: Mon, 15 Apr 2024 08:49:39 -0400
Message-ID: <5c905e500499a07c5e4b0dcf9983b90e8746ed81.camel@kernel.org>
In-Reply-To: <20240328163424.2781320-1-dhowells@redhat.com>

On Thu, 2024-03-28 at 16:33 +0000, David Howells wrote:
> Hi Christian, Willy,
> 
> The primary purpose of these patches is to rework the netfslib writeback
> implementation such that pages read from the cache are written to the cache
> through ->writepages(), thereby allowing the fscache page flag to be
> retired.
> 
> The reworking also:
> 
>  (1) builds on top of the new writeback_iter() infrastructure;
> 
>  (2) makes it possible to use vectored write RPCs, because discontiguous
>      streams of pages can be accommodated;
> 
>  (3) makes it easier to do simultaneous content crypto and stream division;
> 
>  (4) provides support for retrying writes and re-dividing a stream;
> 
>  (5) replaces the ->launder_folio() op, so that ->writepages() is used
>      instead;
> 
>  (6) uses mempools to allocate the netfs_io_request and netfs_io_subrequest
>      structs to avoid allocation failure in the writeback path.
> 
> Some code that uses the fscache page flag is retained for compatibility
> with nfs and ceph.  That code is switched to using the synonymous
> private_2 label instead and marked with deprecation comments.  I have a
> separate set of patches that convert cifs to use this code.
> 
> -~-
> 
> In this new implementation, writeback_iter() is used to pump folios,
> progressively creating two parallel, but separate, streams.  Either or both
> streams can contain gaps, and the subrequests in each stream can be of
> variable size, don't need to align with each other and don't need to align
> with the folios.  (Note that more streams can be added if we have multiple
> servers to duplicate data to.)
> 
> Indeed, a subrequest can cross folio boundaries or cover several folios,
> and a folio may be spanned by multiple subrequests, e.g.:
> 
>          +---+---+-----+-----+---+----------+
> Folios:  |   |   |     |     |   |          |
>          +---+---+-----+-----+---+----------+
> 
>            +------+------+     +----+----+
> Upload:    |      |      |.....|    |    |
>            +------+------+     +----+----+
> 
>          +------+------+------+------+------+
> Cache:   |      |      |      |      |      |
>          +------+------+------+------+------+
> 
> Data read from the server that needs copying to the cache is stored in
> folios that are marked dirty and have folio->private set to a special
> value.
> 
> The progressive subrequest construction permits the algorithm to be
> preparing both the next upload to the server and the next write to the
> cache whilst the previous ones are already in progress.  Throttling can be
> applied to control the rate of production of subrequests - and, in any
> case, we probably want to write them to the server in ascending order,
> particularly if the file will be extended.
> 
> Content crypto can also be prepared at the same time as the subrequests and
> run asynchronously, with the prepped requests being stalled until the
> crypto catches up with them.  This might also be useful for transport
> crypto, but that happens at a lower layer, so probably would be harder to
> pull off.
> 
> The algorithm is split into three parts:
> 
>  (1) The issuer.  This walks through the data, packaging it up, encrypting
>      it and creating subrequests.  The part of this that generates
>      subrequests only deals with file positions and spans and so is usable
>      for DIO/unbuffered writes as well as buffered writes.
> 
>  (2) The collector.  This asynchronously collects completed subrequests,
>      unlocks folios, frees crypto buffers and performs any retries.  This
>      runs in a work queue so that the issuer can return to the caller for
>      writeback (so that the VM can have its kswapd thread back) or async
>      writes.
> 
>      Collection is slightly complex as the collector has to work out where
>      discontiguities happen in the folio list so that it doesn't try to
>      collect folios that weren't included in the write out.
> 
>  (3) The retryer.  This pauses the issuer, waits for all outstanding
>      subrequests to complete and then goes through the failed subrequests
>      to reissue them.  This may involve reprepping them (with cifs, the
>      credits must be renegotiated and a subrequest may need splitting), and
>      doing RMW for content crypto if there's a conflicting change on the
>      server.
> 
> David
> 
> David Howells (26):
>   cifs: Fix duplicate fscache cookie warnings
>   9p: Clean up some kdoc and unused var warnings.
>   netfs: Update i_blocks when write committed to pagecache
>   netfs: Replace PG_fscache by setting folio->private and marking dirty
>   mm: Remove the PG_fscache alias for PG_private_2
>   netfs: Remove deprecated use of PG_private_2 as a second writeback
>     flag
>   netfs: Make netfs_io_request::subreq_counter an atomic_t
>   netfs: Use subreq_counter to allocate subreq debug_index values
>   mm: Provide a means of invalidation without using launder_folio
>   cifs: Use alternative invalidation to using launder_folio
>   9p: Use alternative invalidation to using launder_folio
>   afs: Use alternative invalidation to using launder_folio
>   netfs: Remove ->launder_folio() support
>   netfs: Use mempools for allocating requests and subrequests
>   mm: Export writeback_iter()
>   netfs: Switch to using unsigned long long rather than loff_t
>   netfs: Fix writethrough-mode error handling
>   netfs: Add some write-side stats and clean up some stat names
>   netfs: New writeback implementation
>   netfs, afs: Implement helpers for new write code
>   netfs, 9p: Implement helpers for new write code
>   netfs, cachefiles: Implement helpers for new write code
>   netfs: Cut over to using new writeback code
>   netfs: Remove the old writeback code
>   netfs: Miscellaneous tidy ups
>   netfs, afs: Use writeback retry to deal with alternate keys
> 
>  fs/9p/vfs_addr.c             |  60 +--
>  fs/9p/vfs_inode_dotl.c       |   4 -
>  fs/afs/file.c                |   8 +-
>  fs/afs/internal.h            |   6 +-
>  fs/afs/validation.c          |   4 +-
>  fs/afs/write.c               | 187 ++++----
>  fs/cachefiles/io.c           |  75 +++-
>  fs/ceph/addr.c               |  24 +-
>  fs/ceph/inode.c              |   2 +
>  fs/netfs/Makefile            |   3 +-
>  fs/netfs/buffered_read.c     |  40 +-
>  fs/netfs/buffered_write.c    | 832 ++++-------------------------------
>  fs/netfs/direct_write.c      |  30 +-
>  fs/netfs/fscache_io.c        |  14 +-
>  fs/netfs/internal.h          |  55 ++-
>  fs/netfs/io.c                | 155 +------
>  fs/netfs/main.c              |  55 ++-
>  fs/netfs/misc.c              |  10 +-
>  fs/netfs/objects.c           |  81 +++-
>  fs/netfs/output.c            | 478 --------------------
>  fs/netfs/stats.c             |  17 +-
>  fs/netfs/write_collect.c     | 813 ++++++++++++++++++++++++++++++++++
>  fs/netfs/write_issue.c       | 673 ++++++++++++++++++++++++++++
>  fs/nfs/file.c                |   8 +-
>  fs/nfs/fscache.h             |   6 +-
>  fs/nfs/write.c               |   4 +-
>  fs/smb/client/cifsfs.h       |   1 -
>  fs/smb/client/file.c         | 136 +-----
>  fs/smb/client/fscache.c      |  16 +-
>  fs/smb/client/inode.c        |  27 +-
>  include/linux/fscache.h      |  22 +-
>  include/linux/netfs.h        | 196 +++++----
>  include/linux/pagemap.h      |   1 +
>  include/net/9p/client.h      |   2 +
>  include/trace/events/netfs.h | 249 ++++++++++-
>  mm/filemap.c                 |  52 ++-
>  mm/page-writeback.c          |   1 +
>  net/9p/Kconfig               |   1 +
>  net/9p/client.c              |  49 +++
>  net/9p/trans_fd.c            |   1 -
>  40 files changed, 2492 insertions(+), 1906 deletions(-)
>  delete mode 100644 fs/netfs/output.c
>  create mode 100644 fs/netfs/write_collect.c
>  create mode 100644 fs/netfs/write_issue.c
> 

This all looks pretty reasonable. There is at least one bugfix (#17) that
looks like it ought to go in independently. #19 is huge, complex and hard
to review; that will need some cycles in -next, I think. In any case, for
any patches I didn't send comments on, you can add:

    Reviewed-by: Jeff Layton <jlayton@kernel.org>
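
Illustrative note (not part of the patch set): the quoted cover letter says
that the upload and cache streams are divided into subrequests independently,
so subrequests need not align with each other or with the folios. A minimal
userspace sketch of that idea follows. The names struct span, divide_stream()
and spans_touching() are hypothetical and are not netfslib interfaces, and the
per-stream size limit is an assumed stand-in for whatever limit a server or
cache would actually impose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, illustration-only type; not a netfslib structure. */
struct span {
	unsigned long long start;	/* file position of the first byte */
	unsigned long long len;		/* number of bytes covered */
};

/*
 * Divide the byte range [start, start + len) into consecutive spans of at
 * most max_len bytes, ignoring folio boundaries entirely.  Returns the
 * number of spans the range needs; only the first cap spans are stored.
 */
static size_t divide_stream(unsigned long long start, unsigned long long len,
			    unsigned long long max_len,
			    struct span *out, size_t cap)
{
	size_t n = 0;

	assert(max_len > 0);	/* a zero limit would never make progress */

	while (len > 0) {
		unsigned long long part = len < max_len ? len : max_len;

		if (n < cap) {
			out[n].start = start;
			out[n].len = part;
		}
		n++;
		start += part;
		len -= part;
	}
	return n;
}

/*
 * Count the spans that overlap the folio-like range
 * [fstart, fstart + flen).  A result above one means that the folio is
 * spanned by several subrequests of that stream.
 */
static size_t spans_touching(const struct span *s, size_t n,
			     unsigned long long fstart,
			     unsigned long long flen)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++) {
		if (s[i].start < fstart + flen &&
		    fstart < s[i].start + s[i].len)
			hits++;
	}
	return hits;
}
```

Calling divide_stream() twice over the same 16384-byte range, once with an
assumed 6144-byte upload limit and once with an assumed 5000-byte cache
limit, gives two span lists whose boundaries differ from each other and from
4096-byte folio boundaries, which is the situation the diagram in the cover
letter depicts. The sketch covers only the position-and-span arithmetic; it
does not model gaps, retries, crypto or asynchronous collection.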
