Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
To: linux-fsdevel@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>, linux-mm@kvack.org
Subject: [PATCH v3 03/18] fs: Add AOP_UPDATED_PAGE return value
Date: Fri, 16 Oct 2020 17:04:28 +0100
Message-ID: <20201016160443.18685-4-willy@infradead.org> (raw)
In-Reply-To: <20201016160443.18685-1-willy@infradead.org>

Allow synchronous ->readpage implementations to execute more
efficiently by skipping the re-locking of the page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/filesystems/locking.rst |  7 ++++---
 Documentation/filesystems/vfs.rst     | 21 ++++++++++++++-------
 fs/buffer.c                           | 15 +++++++++------
 fs/cachefiles/rdwr.c                  |  8 ++++++++
 include/linux/fs.h                    |  5 +++++
 mm/filemap.c                          | 15 +++++++++++++--
 mm/page_io.c                          |  7 ++++++-
 mm/readahead.c                        |  3 ++-
 8 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 64f94a18d97e..06a7a8bf2362 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -269,7 +269,7 @@ locking rules:
 ops			PageLocked(page)	 i_rwsem
 ======================	======================== =========
 writepage:		yes, unlocks (see below)
-readpage:		yes, unlocks
+readpage:		yes, may unlock
 writepages:
 set_page_dirty		no
 readahead:		yes, unlocks
@@ -294,8 +294,9 @@ swap_deactivate:	no
 ->write_begin(), ->write_end() and ->readpage() may be called from
 the request handler (/dev/loop).
 
-->readpage() unlocks the page, either synchronously or via I/O
-completion.
+->readpage() may return AOP_UPDATED_PAGE if the page is now Uptodate
+or 0 if the page will be unlocked asynchronously by I/O completion.
+If it returns -errno, it should unlock the page.
 
 ->readahead() unlocks the pages that I/O is attempted on like ->readpage().
 
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index ca52c82e5bb5..16248c299aaa 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -643,7 +643,7 @@ set_page_dirty to write data into the address_space, and writepage and
 writepages to writeback data to storage.
 
 Adding and removing pages to/from an address_space is protected by the
-inode's i_mutex.
+inode's i_rwsem held exclusively.
 
 When data is written to a page, the PG_Dirty flag should be set.  It
 typically remains set until writepage asks for it to be written.  This
@@ -757,12 +757,19 @@ cache in your filesystem.  The following members are defined:
 
 ``readpage``
 	called by the VM to read a page from backing store.  The page
-	will be Locked when readpage is called, and should be unlocked
-	and marked uptodate once the read completes.  If ->readpage
-	discovers that it needs to unlock the page for some reason, it
-	can do so, and then return AOP_TRUNCATED_PAGE.  In this case,
-	the page will be relocated, relocked and if that all succeeds,
-	->readpage will be called again.
+	will be Locked and !Uptodate when readpage is called.  Ideally,
+	the filesystem will bring the page Uptodate and return
+	AOP_UPDATED_PAGE.  If the filesystem encounters an error, it
+	should unlock the page and return a negative errno without marking
+	the page Uptodate.  It does not need to mark the page as Error.
+	If the filesystem returns 0, this means the page will be unlocked
+	asynchronously by I/O completion.  The VFS will wait for the
+	page to be unlocked, so there is no advantage to executing this
+	operation asynchronously.
+
+	The filesystem can also return AOP_TRUNCATED_PAGE to indicate
+	that it had to unlock the page to avoid a deadlock.  The caller
+	will re-check the page cache and call ->readpage again.
 
 ``writepages``
 	called by the VM to write out pages associated with the
diff --git a/fs/buffer.c b/fs/buffer.c
index 50bbc99e3d96..1d5337517dcd 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2856,14 +2856,17 @@ int nobh_truncate_page(struct address_space *mapping,
 	/* Ok, it's mapped. Make sure it's up-to-date */
 	if (!PageUptodate(page)) {
 		err = mapping->a_ops->readpage(NULL, page);
-		if (err) {
+		if (err < 0) {
 			put_page(page);
 			goto out;
-		}
-		lock_page(page);
-		if (!PageUptodate(page)) {
-			err = -EIO;
-			goto unlock;
+		} else if (err == 0) {
+			lock_page(page);
+			if (!PageUptodate(page)) {
+				err = -EIO;
+				goto unlock;
+			}
+		} else {
+			err = 0;
 		}
 		if (page_has_buffers(page))
 			goto has_buffers;
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 8bda092e60c5..cc4cc535caf5 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -122,6 +122,10 @@ static int cachefiles_read_reissue(struct cachefiles_object *object,
 		ret = bmapping->a_ops->readpage(NULL, backpage);
 		if (ret < 0)
 			goto discard;
+		if (ret == AOP_UPDATED_PAGE) {
+			ret = 0;
+			goto unlock_discard;
+		}
 	}
 
 	/* but the page may have been read before the monitor was installed, so
@@ -285,6 +289,8 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
 	ret = bmapping->a_ops->readpage(NULL, backpage);
 	if (ret < 0)
 		goto read_error;
+	if (ret == AOP_UPDATED_PAGE)
+		unlock_page(backpage);
 
 	/* set the monitor to transfer the data across */
 monitor_backing_page:
@@ -523,6 +529,8 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 		ret = bmapping->a_ops->readpage(NULL, backpage);
 		if (ret < 0)
 			goto read_error;
+		if (ret == AOP_UPDATED_PAGE)
+			unlock_page(backpage);
 
 		/* add the netfs page to the pagecache and LRU, and set the
 		 * monitor to transfer the data across */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae003a08..badf80e133fd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -273,6 +273,10 @@ struct iattr {
  *  			reference, it should drop it before retrying.  Returned
  *  			by readpage().
  *
+ * @AOP_UPDATED_PAGE: The readpage method has brought the page Uptodate
+ * without releasing the page lock.  This is suitable for synchronous
+ * implementations of readpage.
+ *
  * address_space_operation functions return these large constants to indicate
  * special semantics to the caller.  These are much larger than the bytes in a
  * page to allow for functions that return the number of bytes operated on in a
@@ -282,6 +286,7 @@ struct iattr {
 enum positive_aop_returns {
 	AOP_WRITEPAGE_ACTIVATE	= 0x80000,
 	AOP_TRUNCATED_PAGE	= 0x80001,
+	AOP_UPDATED_PAGE	= 0x80002,
 };
 
 #define AOP_FLAG_CONT_EXPAND		0x0001 /* called from cont_expand */
diff --git a/mm/filemap.c b/mm/filemap.c
index 1aaea26556cc..95b68ec1f22c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2254,8 +2254,13 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		 * PG_error will be set again if readpage fails.
 		 */
 		ClearPageError(page);
-		/* Start the actual read. The read will unlock the page. */
+		/* Start the actual read. The read may unlock the page. */
 		error = mapping->a_ops->readpage(filp, page);
+		if (error == AOP_UPDATED_PAGE) {
+			unlock_page(page);
+			error = 0;
+			goto page_ok;
+		}
 
 		if (unlikely(error)) {
 			if (error == AOP_TRUNCATED_PAGE) {
@@ -2619,7 +2624,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	 */
 	if (unlikely(!PageUptodate(page)))
 		goto page_not_uptodate;
-
+page_ok:
 	/*
 	 * We've made it this far and we had to drop our mmap_lock, now is the
 	 * time to return to the upper layer and have it re-find the vma and
@@ -2654,6 +2659,8 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	ClearPageError(page);
 	fpin = maybe_unlock_mmap_for_io(vmf, fpin);
 	error = mapping->a_ops->readpage(file, page);
+	if (error == AOP_UPDATED_PAGE)
+		goto page_ok;
 	if (!error) {
 		wait_on_page_locked(page);
 		if (!PageUptodate(page))
@@ -2867,6 +2874,10 @@ static struct page *do_read_cache_page(struct address_space *mapping,
 			err = filler(data, page);
 		else
 			err = mapping->a_ops->readpage(data, page);
+		if (err == AOP_UPDATED_PAGE) {
+			unlock_page(page);
+			goto out;
+		}
 
 		if (err < 0) {
 			put_page(page);
diff --git a/mm/page_io.c b/mm/page_io.c
index faf5ccb42946..f141a171d09c 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -376,13 +376,18 @@ static int mapping_readpage(struct file *file, struct address_space *mapping,
 		._index = page->index,
 		._nr_pages = 1,
 	};
+	int ret;
 
 	if (!synchronous && mapping->a_ops->readahead) {
 		mapping->a_ops->readahead(&ractl);
 		return 0;
 	}
 
-	return mapping->a_ops->readpage(file, page);
+	ret = mapping->a_ops->readpage(file, page);
+	if (ret != AOP_UPDATED_PAGE)
+		return ret;
+	unlock_page(page);
+	return 0;
 }
 
 int swap_readpage(struct page *page, bool synchronous)
diff --git a/mm/readahead.c b/mm/readahead.c
index 3c9a8dd7c56c..f182fa731ab3 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -142,7 +142,8 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
 		rac->_nr_pages = 0;
 	} else {
 		while ((page = readahead_page(rac))) {
-			aops->readpage(rac->file, page);
+			if (aops->readpage(rac->file, page) == AOP_UPDATED_PAGE)
+				unlock_page(page);
 			put_page(page);
 		}
 	}
-- 
2.28.0


  parent reply index

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-16 16:04 [PATCH v3 00/18] Allow readpage to return a locked page Matthew Wilcox (Oracle)
2020-10-16 16:04 ` [PATCH v3 01/18] cachefiles: Handle readpage error correctly Matthew Wilcox (Oracle)
2020-10-16 16:04 ` [PATCH v3 02/18] swap: Call aops->readahead if appropriate Matthew Wilcox (Oracle)
2020-10-16 16:04 ` Matthew Wilcox (Oracle) [this message]
2020-10-16 16:04 ` [PATCH v3 04/18] mm/filemap: Inline wait_on_page_read into its one caller Matthew Wilcox (Oracle)
2020-10-16 16:04 ` [PATCH v3 05/18] 9p: Tell the VFS that readpage was synchronous Matthew Wilcox (Oracle)
2020-10-16 16:04 ` [PATCH v3 06/18] afs: " Matthew Wilcox (Oracle)
2020-10-16 16:04 ` [PATCH v3 07/18] ceph: " Matthew Wilcox (Oracle)
2020-10-16 16:04 ` [PATCH v3 08/18] cifs: " Matthew Wilcox (Oracle)
2020-10-16 16:04 ` [PATCH v3 09/18] cramfs: " Matthew Wilcox (Oracle)
2020-10-16 16:04 ` [PATCH v3 10/18] ecryptfs: " Matthew Wilcox (Oracle)
2020-10-16 16:04 ` [PATCH v3 11/18] ext4: " Matthew Wilcox (Oracle)
2020-10-18 14:24   ` Theodore Y. Ts'o
2020-10-16 16:04 ` [PATCH v3 12/18] ext4: Return error from ext4_readpage Matthew Wilcox (Oracle)
2020-10-18 14:25   ` Theodore Y. Ts'o
2020-10-18 15:04     ` Matthew Wilcox
2020-10-16 16:04 ` [PATCH v3 13/18] fuse: Tell the VFS that readpage was synchronous Matthew Wilcox (Oracle)
2020-10-20 14:13   ` Miklos Szeredi
2020-10-16 16:04 ` [PATCH v3 14/18] hostfs: " Matthew Wilcox (Oracle)
2020-10-16 16:04 ` [PATCH v3 15/18] jffs2: " Matthew Wilcox (Oracle)
2020-10-16 16:04 ` [PATCH v3 16/18] ubifs: " Matthew Wilcox (Oracle)
2020-10-16 16:04 ` [PATCH v3 17/18] udf: " Matthew Wilcox (Oracle)
2020-10-16 16:04 ` [PATCH v3 18/18] vboxsf: " Matthew Wilcox (Oracle)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201016160443.18685-4-willy@infradead.org \
    --to=willy@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git