All of lore.kernel.org
 help / color / mirror / Atom feed
From: Trond Myklebust <trondmy@hammerspace.com>
To: "hch@lst.de" <hch@lst.de>,
	"miklos@szeredi.hu" <miklos@szeredi.hu>,
	"willy@infradead.org" <willy@infradead.org>,
	"dhowells@redhat.com" <dhowells@redhat.com>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
	"linux-cifs@vger.kernel.org" <linux-cifs@vger.kernel.org>,
	"brauner@kernel.org" <brauner@kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"v9fs@lists.linux.dev" <v9fs@lists.linux.dev>,
	"netfs@lists.linux.dev" <netfs@lists.linux.dev>,
	"jlayton@kernel.org" <jlayton@kernel.org>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"devel@lists.orangefs.org" <devel@lists.orangefs.org>,
	"linux-afs@lists.infradead.org" <linux-afs@lists.infradead.org>
Subject: Re: [RFC PATCH] mm, netfs: Provide a means of invalidation without using launder_folio
Date: Wed, 27 Mar 2024 15:56:50 +0000	[thread overview]
Message-ID: <37514eae34c02cefb11fc4c6d3f4ae2296fb6ab5.camel@hammerspace.com> (raw)
In-Reply-To: <2318298.1711551844@warthog.procyon.org.uk>

On Wed, 2024-03-27 at 15:04 +0000, David Howells wrote:
> Implement a replacement for launder_folio[1].  The key feature of
> invalidate_inode_pages2() is that it locks each folio individually,
> unmaps
> it to prevent mmap'd accesses interfering and calls the -
> >launder_folio()
> address_space op to flush it.  This has problems: firstly, each folio
> is
> written individually as one or more small writes; secondly, adjacent
> folios
> cannot be added so easily into the laundry; thirdly, it's yet another
> op to
> implement.
> 
> Here's a bit of a hacked together solution which should probably be
> moved
> to mm/:
> 
> Use the mmap lock to cause future faulting to wait, then unmap all
> the
> folios if we have mmaps, then, conditionally, use ->writepages() to
> flush
> any dirty data back and then discard all pages.  The caller needs to
> hold a
> lock to prevent ->write_iter() getting underfoot.
> 
> Note that this does not prevent ->read_iter() from accessing the file
> whilst we do this since that may operate without locking.
> 
> We also have the writeback_control available and so have the
> opportunity to
> set a flag in it to tell the filesystem that we're doing an
> invalidation.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: Miklos Szeredi <miklos@szeredi.hu>
> cc: Trond Myklebust <trond.myklebust@hammerspace.com>
> cc: Christoph Hellwig <hch@lst.de>
> cc: Andrew Morton <akpm@linux-foundation.org>
> cc: Alexander Viro <viro@zeniv.linux.org.uk>
> cc: Christian Brauner <brauner@kernel.org>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: linux-mm@kvack.org
> cc: linux-fsdevel@vger.kernel.org
> cc: netfs@lists.linux.dev
> cc: v9fs@lists.linux.dev
> cc: linux-afs@lists.infradead.org
> cc: ceph-devel@vger.kernel.org
> cc: linux-cifs@vger.kernel.org
> cc: linux-nfs@vger.kernel.org
> cc: devel@lists.orangefs.org
> Link:
> https://lore.kernel.org/r/1668172.1709764777@warthog.procyon.org.uk/ 
> [1]
> ---
>  fs/netfs/misc.c       |   56
> ++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/netfs.h |    3 ++
>  mm/memory.c           |    3 +-
>  3 files changed, 61 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
> index bc1fc54fb724..774ce825fbec 100644
> --- a/fs/netfs/misc.c
> +++ b/fs/netfs/misc.c
> @@ -250,3 +250,59 @@ bool netfs_release_folio(struct folio *folio,
> gfp_t gfp)
>  	return true;
>  }
>  EXPORT_SYMBOL(netfs_release_folio);
> +
> +extern void unmap_mapping_range_tree(struct rb_root_cached *root,
> +				     pgoff_t first_index,
> +				     pgoff_t last_index,
> +				     struct zap_details *details);
> +
> +/**
> + * netfs_invalidate_inode - Invalidate/forcibly write back an
> inode's pagecache
> + * @inode: The inode to flush
> + * @flush: Set to write back rather than simply invalidate.
> + *
> + * Invalidate all the folios on an inode, possibly writing them back
> first.
> + * Whilst the operation is undertaken, the mmap lock is held to
> prevent
> + * ->fault() from reinstalling the folios.  The caller must hold a
> lock on the
> + * inode sufficient to prevent ->write_iter() from dirtying more
> folios.
> + */
> +int netfs_invalidate_inode(struct inode *inode, bool flush)
> +{
> +	struct address_space *mapping = inode->i_mapping;
> +
> +	if (!mapping || !mapping->nrpages)
> +		goto out;
> +
> +	/* Prevent folios from being faulted in. */
> +	i_mmap_lock_write(mapping);
> +
> +	if (!mapping->nrpages)
> +		goto unlock;
> +
> +	/* Assume there are probably PTEs only if there are mmaps.
> */
> +	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))) {
> +		struct zap_details details = { };
> +
> +		unmap_mapping_range_tree(&mapping->i_mmap, 0,
> LLONG_MAX, &details);
> +	}
> +
> +	/* Write back the data if we're asked to. */
> +	if (flush) {
> +		struct writeback_control wbc = {
> +			.sync_mode	= WB_SYNC_ALL,
> +			.nr_to_write	= LONG_MAX,
> +			.range_start	= 0,
> +			.range_end	= LLONG_MAX,
> +		};
> +
> +		filemap_fdatawrite_wbc(mapping, &wbc);
> +	}
> +
> +	/* Wait for writeback to complete on all folios and discard.
> */
> +	truncate_inode_pages_range(mapping, 0, LLONG_MAX);
> +
> +unlock:
> +	i_mmap_unlock_write(mapping);
> +out:
> +	return filemap_check_errors(mapping);
> +}
> diff --git a/include/linux/netfs.h b/include/linux/netfs.h
> index 298552f5122c..40dc34ee291d 100644
> --- a/include/linux/netfs.h
> +++ b/include/linux/netfs.h
> @@ -400,6 +400,9 @@ ssize_t netfs_buffered_write_iter_locked(struct
> kiocb *iocb, struct iov_iter *fr
>  ssize_t netfs_unbuffered_write_iter(struct kiocb *iocb, struct
> iov_iter *from);
>  ssize_t netfs_file_write_iter(struct kiocb *iocb, struct iov_iter
> *from);
>  
> +/* High-level invalidation API */
> +int netfs_invalidate_inode(struct inode *inode, bool flush);
> +
>  /* Address operations API */
>  struct readahead_control;
>  void netfs_readahead(struct readahead_control *);
> diff --git a/mm/memory.c b/mm/memory.c
> index f2bc6dd15eb8..106f32c7d7fb 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3665,7 +3665,7 @@ static void unmap_mapping_range_vma(struct
> vm_area_struct *vma,
>  	zap_page_range_single(vma, start_addr, end_addr -
> start_addr, details);
>  }
>  
> -static inline void unmap_mapping_range_tree(struct rb_root_cached
> *root,
> +inline void unmap_mapping_range_tree(struct rb_root_cached *root,
>  					    pgoff_t first_index,
>  					    pgoff_t last_index,
>  					    struct zap_details
> *details)
> @@ -3685,6 +3685,7 @@ static inline void
> unmap_mapping_range_tree(struct rb_root_cached *root,
>  				details);
>  	}
>  }
> +EXPORT_SYMBOL_GPL(unmap_mapping_range_tree);
>  
>  /**
>   * unmap_mapping_folio() - Unmap single folio from processes.
> 

This is hardly a drop-in replacement for launder_page. The whole point
of using invalidate_inode_pages2() was that it only requires taking the
page locks, allowing us to use it in contexts such as
nfs_release_file().

The above use of truncate_inode_pages_range() will require any caller
to grab several locks in order to prevent data loss through races with
write system calls.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



  reply	other threads:[~2024-03-27 15:56 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-27 15:04 [RFC PATCH] mm, netfs: Provide a means of invalidation without using launder_folio David Howells
2024-03-27 15:56 ` Trond Myklebust [this message]
2024-03-27 17:46   ` Matthew Wilcox
2024-03-27 17:55 ` [RFC PATCH v2] " David Howells
2024-03-27 18:45   ` Matthew Wilcox
2024-03-27 20:37   ` David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=37514eae34c02cefb11fc4c6d3f4ae2296fb6ab5.camel@hammerspace.com \
    --to=trondmy@hammerspace.com \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=devel@lists.orangefs.org \
    --cc=dhowells@redhat.com \
    --cc=hch@lst.de \
    --cc=jlayton@kernel.org \
    --cc=linux-afs@lists.infradead.org \
    --cc=linux-cifs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=netfs@lists.linux.dev \
    --cc=v9fs@lists.linux.dev \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.