From: David Howells <dhowells@redhat.com>
To: Matthew Wilcox <willy@infradead.org>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Trond Myklebust <trond.myklebust@hammerspace.com>,
	Christoph Hellwig <hch@lst.de>
Cc: dhowells@redhat.com, Andrew Morton <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>,
	Jeff Layton <jlayton@kernel.org>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	netfs@lists.linux.dev, v9fs@lists.linux.dev,
	linux-afs@lists.infradead.org, ceph-devel@vger.kernel.org,
	linux-cifs@vger.kernel.org, linux-nfs@vger.kernel.org,
	devel@lists.orangefs.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH] mm, netfs: Provide a means of invalidation without using launder_folio
Date: Wed, 27 Mar 2024 15:04:04 +0000
Message-ID: <2318298.1711551844@warthog.procyon.org.uk>

Implement a replacement for launder_folio[1].  The key feature of
invalidate_inode_pages2() is that it locks each folio individually, unmaps
it to prevent mmap'd accesses from interfering and calls the
->launder_folio() address_space op to flush it.  This has several problems:
firstly, each folio is written back individually as one or more small
writes; secondly, adjacent folios cannot easily be combined into the same
laundry pass; thirdly, it's yet another op for a filesystem to implement.
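
For illustration, the per-folio pattern in invalidate_inode_pages2() looks
roughly like the sketch below (simplified, not the actual mm/truncate.c
code; the helper name is invented and unmap_mapping_folio() is mm-internal).
It shows why each dirty folio ends up as its own small write:

	/* Illustrative sketch only: the shape of the per-folio step. */
	static int launder_one_folio_sketch(struct address_space *mapping,
					    struct folio *folio)
	{
		int ret = 0;

		folio_lock(folio);
		/* Zap any PTEs that map just this one folio. */
		if (folio_mapped(folio))
			unmap_mapping_folio(folio);
		/* Flush this folio on its own - one or more small writes. */
		if (folio_test_dirty(folio) && mapping->a_ops->launder_folio)
			ret = mapping->a_ops->launder_folio(folio);
		/* ...the folio is then dropped from the pagecache if clean. */
		folio_unlock(folio);
		return ret;
	}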

Here's a bit of a hacked-together solution that should probably be moved
to mm/:

Take the mapping's i_mmap lock to make future faulting wait, unmap all the
folios if there are any mmaps, then, if asked to flush, use ->writepages()
to write any dirty data back, and finally discard all the folios.  The
caller needs to hold a lock sufficient to prevent ->write_iter() from
dirtying more folios underfoot.

Note that this does not prevent ->read_iter() from accessing the file
whilst we do this, since reads may operate without taking any locks.

We also have the writeback_control available and so have the opportunity to
set a flag in it to tell the filesystem that we're doing an invalidation.
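
As an illustration of the intended usage, a hypothetical network filesystem
("myfs" and its types are invented names) that has just been told by its
server that the file has changed might do something like:

	static void myfs_remote_invalidation(struct myfs_inode *mi,
					     bool remote_changed)
	{
		struct inode *inode = &mi->vfs_inode;

		/* Serialise against ->write_iter() dirtying more folios. */
		inode_lock(inode);
		/*
		 * If the server's copy changed under us, our dirty data is
		 * stale, so just drop the cache; otherwise flush it back
		 * before discarding.
		 */
		netfs_invalidate_inode(inode, !remote_changed);
		inode_unlock(inode);
	}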

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Miklos Szeredi <miklos@szeredi.hu>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Andrew Morton <akpm@linux-foundation.org>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Christian Brauner <brauner@kernel.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-mm@kvack.org
cc: linux-fsdevel@vger.kernel.org
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: devel@lists.orangefs.org
Link: https://lore.kernel.org/r/1668172.1709764777@warthog.procyon.org.uk/ [1]
---
 fs/netfs/misc.c       |   56 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/netfs.h |    3 ++
 mm/memory.c           |    3 +-
 3 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index bc1fc54fb724..774ce825fbec 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -250,3 +250,59 @@ bool netfs_release_folio(struct folio *folio, gfp_t gfp)
 	return true;
 }
 EXPORT_SYMBOL(netfs_release_folio);
+
+extern void unmap_mapping_range_tree(struct rb_root_cached *root,
+				     pgoff_t first_index,
+				     pgoff_t last_index,
+				     struct zap_details *details);
+
+/**
+ * netfs_invalidate_inode - Invalidate/forcibly write back an inode's pagecache
+ * @inode: The inode to flush
+ * @flush: Set to write back rather than simply invalidate.
+ *
+ * Invalidate all the folios on an inode, possibly writing them back first.
+ * Whilst the operation is undertaken, the i_mmap lock is held to prevent
+ * ->fault() from reinstalling the folios.  The caller must hold a lock on the
+ * inode sufficient to prevent ->write_iter() from dirtying more folios.
+ */
+int netfs_invalidate_inode(struct inode *inode, bool flush)
+{
+	struct address_space *mapping = inode->i_mapping;
+
+	if (!mapping || !mapping->nrpages)
+		goto out;
+
+	/* Prevent folios from being faulted in. */
+	i_mmap_lock_write(mapping);
+
+	if (!mapping->nrpages)
+		goto unlock;
+
+	/* Assume there are probably PTEs only if there are mmaps. */
+	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))) {
+		struct zap_details details = { };
+
+		unmap_mapping_range_tree(&mapping->i_mmap, 0, LLONG_MAX, &details);
+	}
+
+	/* Write back the data if we're asked to. */
+	if (flush) {
+		struct writeback_control wbc = {
+			.sync_mode	= WB_SYNC_ALL,
+			.nr_to_write	= LONG_MAX,
+			.range_start	= 0,
+			.range_end	= LLONG_MAX,
+		};
+
+		filemap_fdatawrite_wbc(mapping, &wbc);
+	}
+
+	/* Wait for writeback to complete on all folios and discard. */
+	truncate_inode_pages_range(mapping, 0, LLONG_MAX);
+
+unlock:
+	i_mmap_unlock_write(mapping);
+out:
+	return filemap_check_errors(mapping);
+}
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 298552f5122c..40dc34ee291d 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -400,6 +400,9 @@ ssize_t netfs_buffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *fr
 ssize_t netfs_unbuffered_write_iter(struct kiocb *iocb, struct iov_iter *from);
 ssize_t netfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from);
 
+/* High-level invalidation API */
+int netfs_invalidate_inode(struct inode *inode, bool flush);
+
 /* Address operations API */
 struct readahead_control;
 void netfs_readahead(struct readahead_control *);
diff --git a/mm/memory.c b/mm/memory.c
index f2bc6dd15eb8..106f32c7d7fb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3665,7 +3665,7 @@ static void unmap_mapping_range_vma(struct vm_area_struct *vma,
 	zap_page_range_single(vma, start_addr, end_addr - start_addr, details);
 }
 
-static inline void unmap_mapping_range_tree(struct rb_root_cached *root,
+inline void unmap_mapping_range_tree(struct rb_root_cached *root,
 					    pgoff_t first_index,
 					    pgoff_t last_index,
 					    struct zap_details *details)
@@ -3685,6 +3685,7 @@ static inline void unmap_mapping_range_tree(struct rb_root_cached *root,
 				details);
 	}
 }
+EXPORT_SYMBOL_GPL(unmap_mapping_range_tree);
 
 /**
  * unmap_mapping_folio() - Unmap single folio from processes.

