ceph-devel.vger.kernel.org archive mirror
* [PATCH v2 0/3] netfs, afs: Fix netfs_write_begin and THP handling
@ 2021-06-17  8:23 David Howells
  2021-06-17  8:24 ` [PATCH v2 3/3] netfs: fix test for whether we can skip read when writing beyond EOF David Howells
  2021-06-18  3:46 ` [PATCH v2 0/3] netfs, afs: Fix netfs_write_begin and THP handling Al Viro
  0 siblings, 2 replies; 4+ messages in thread
From: David Howells @ 2021-06-17  8:23 UTC
  To: linux-cachefs, linux-afs
  Cc: Matthew Wilcox (Oracle),
	ceph-devel, Jeff Layton, Matthew Wilcox, Al Viro, Andrew W Elble,
	dhowells, Jeff Layton, Matthew Wilcox (Oracle),
	linux-fsdevel, linux-kernel


Here are some patches to fix netfs_write_begin() and the handling of THPs in
that and afs_write_begin/end() in the following ways:

 (1) Use offset_in_thp() rather than manually calculating the offset into
     the page.

 (2) In the future, the len parameter may extend beyond the page allocated.
     This is because the page allocation is deferred to write_begin() and
     that gets to decide what size of THP to allocate.

 (3) In netfs_write_begin(), extract the decision about whether to skip a
     page out to its own helper and have that clear around the region to be
     written, but not clear that region.  This requires the filesystem to
     patch it up afterwards if the hole doesn't get completely filled.

 (4) Due to (3), afs_write_end() now needs to handle a short data write into
     the page by generic_perform_write().  I've adopted an approach
     analogous to ceph's of just returning 0 in this case and letting the
     caller go round again.
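
As an illustration of points (1) and (2), here is a minimal userspace
sketch of computing the offset into a power-of-two sized page and clamping
len so that it does not run past the end of that page.  The names
model_offset_in_page() and model_clamp_len() are hypothetical stand-ins and
are not taken from the patches; the kernel's offset_in_thp() operates on a
struct page rather than on a bare size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for offset_in_thp(): the offset of pos within a
 * page whose size is a power of two.
 */
static size_t model_offset_in_page(uint64_t pos, size_t page_size)
{
	/* The mask trick only works for power-of-two sizes. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (size_t)(pos & (page_size - 1));
}

/*
 * Clamp a write length to the room left in the page, for the case where
 * the caller's len extends beyond the page that was actually allocated.
 */
static size_t model_clamp_len(uint64_t pos, size_t len, size_t page_size)
{
	size_t offset = model_offset_in_page(pos, page_size);
	size_t room = page_size - offset;

	return len < room ? len : room;
}
```

With a 4096-byte page, a 200-byte write at position 4000 would be clamped
to the 96 bytes that remain in the page.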

I wonder if generic_perform_write() should pass in a flag indicating
whether this is the first attempt or a second attempt at this, and on the
second attempt we just completely prefill the page and just let the partial
write stand - which we have to do if the page was already uptodate when we
started.

The patches can be found here:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=afs-fixes

David

Link: https://lore.kernel.org/r/20210613233345.113565-1-jlayton@kernel.org/
Link: https://lore.kernel.org/r/162367681795.460125.11729955608839747375.stgit@warthog.procyon.org.uk/

Changes
=======

ver #2:
   - Removed a var that's no longer used (spotted by the kernel test robot)
   - Removed a forgotten "noinline".

ver #1:
   - Prefixed Jeff's new helper with "netfs_".
   - Don't call zero_user_segments() for a full-page write.
   - Altered the beyond-last-page check to avoid a DIV.
   - Removed redundant zero-length-file check.
   - Added patches to fix afs.

---
David Howells (2):
      afs: Handle len being extending over page end in write_begin/write_end
      afs: Fix afs_write_end() to handle short writes

Jeff Layton (1):
      netfs: fix test for whether we can skip read when writing beyond EOF


 fs/afs/write.c         | 12 +++++++++--
 fs/netfs/read_helper.c | 49 +++++++++++++++++++++++++++++++-----------
 2 files changed, 46 insertions(+), 15 deletions(-)




* [PATCH v2 3/3] netfs: fix test for whether we can skip read when writing beyond EOF
  2021-06-17  8:23 [PATCH v2 0/3] netfs, afs: Fix netfs_write_begin and THP handling David Howells
@ 2021-06-17  8:24 ` David Howells
  2021-06-21 14:50   ` Matthew Wilcox
  2021-06-18  3:46 ` [PATCH v2 0/3] netfs, afs: Fix netfs_write_begin and THP handling Al Viro
  1 sibling, 1 reply; 4+ messages in thread
From: David Howells @ 2021-06-17  8:24 UTC
  To: linux-cachefs, linux-afs
  Cc: Andrew W Elble, Jeff Layton, ceph-devel, dhowells, Jeff Layton,
	Matthew Wilcox (Oracle),
	linux-fsdevel, linux-kernel

From: Jeff Layton <jlayton@kernel.org>

It's not sufficient to skip reading when the pos is beyond the EOF.
There may be data at the head of the page that we need to fill in
before the write.

Add a new helper function that corrects and clarifies the logic of
when we can skip reads, and have it only zero out the part of the page
that won't have data copied in for the write.

Finally, don't set the page Uptodate after zeroing. It's not up to date
since the write data won't have been copied in yet.

[DH made the following changes:

 - Prefixed the new function with "netfs_".

 - Don't call zero_user_segments() for a full-page write.

 - Altered the beyond-last-page check to avoid a DIV instruction and got
   rid of the then-redundant zero-length file check.
]

Fixes: e1b1240c1ff5f ("netfs: Add write_begin helper")
Reported-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: ceph-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20210613233345.113565-1-jlayton@kernel.org/
Link: https://lore.kernel.org/r/162367683365.460125.4467036947364047314.stgit@warthog.procyon.org.uk/ # v1
---

 fs/netfs/read_helper.c |   49 +++++++++++++++++++++++++++++++++++-------------
 1 file changed, 36 insertions(+), 13 deletions(-)

diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 725614625ed4..0b6cd3b8734c 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -1011,12 +1011,42 @@ int netfs_readpage(struct file *file,
 }
 EXPORT_SYMBOL(netfs_readpage);
 
-static void netfs_clear_thp(struct page *page)
+/**
+ * netfs_skip_page_read - prep a page for writing without reading first
+ * @page: page being prepared
+ * @pos: starting position for the write
+ * @len: length of write
+ *
+ * In some cases, write_begin doesn't need to read at all:
+ * - full page write
+ * - write that lies in a page that is completely beyond EOF
+ * - write that covers the page from start to EOF or beyond it
+ *
+ * If any of these criteria are met, then zero out the unwritten parts
+ * of the page and return true. Otherwise, return false.
+ */
+static bool netfs_skip_page_read(struct page *page, loff_t pos, size_t len)
 {
-	unsigned int i;
+	struct inode *inode = page->mapping->host;
+	loff_t i_size = i_size_read(inode);
+	size_t offset = offset_in_thp(page, pos);
+
+	/* Full page write */
+	if (offset == 0 && len >= thp_size(page))
+		return true;
+
+	/* pos beyond last page in the file */
+	if (pos - offset >= i_size)
+		goto zero_out;
+
+	/* Write that covers from the start of the page to EOF or beyond */
+	if (offset == 0 && (pos + len) >= i_size)
+		goto zero_out;
 
-	for (i = 0; i < thp_nr_pages(page); i++)
-		clear_highpage(page + i);
+	return false;
+zero_out:
+	zero_user_segments(page, 0, offset, offset + len, thp_size(page));
+	return true;
 }
 
 /**
@@ -1024,7 +1054,7 @@ static void netfs_clear_thp(struct page *page)
  * @file: The file to read from
  * @mapping: The mapping to read from
  * @pos: File position at which the write will begin
- * @len: The length of the write in this page
+ * @len: The length of the write (may extend beyond the end of the page chosen)
  * @flags: AOP_* flags
  * @_page: Where to put the resultant page
  * @_fsdata: Place for the netfs to store a cookie
@@ -1061,8 +1091,6 @@ int netfs_write_begin(struct file *file, struct address_space *mapping,
 	struct inode *inode = file_inode(file);
 	unsigned int debug_index = 0;
 	pgoff_t index = pos >> PAGE_SHIFT;
-	int pos_in_page = pos & ~PAGE_MASK;
-	loff_t size;
 	int ret;
 
 	DEFINE_READAHEAD(ractl, file, NULL, mapping, index);
@@ -1090,13 +1118,8 @@ int netfs_write_begin(struct file *file, struct address_space *mapping,
 	 * within the cache granule containing the EOF, in which case we need
 	 * to preload the granule.
 	 */
-	size = i_size_read(inode);
 	if (!ops->is_cache_enabled(inode) &&
-	    ((pos_in_page == 0 && len == thp_size(page)) ||
-	     (pos >= size) ||
-	     (pos_in_page == 0 && (pos + len) >= size))) {
-		netfs_clear_thp(page);
-		SetPageUptodate(page);
+	    netfs_skip_page_read(page, pos, len)) {
 		netfs_stat(&netfs_n_rh_write_zskip);
 		goto have_page_no_wait;
 	}
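
The decision made by the new helper can be modelled in userspace roughly as
follows.  This is only a sketch under stated assumptions: the page is a
plain byte buffer, i_size is passed in rather than read from an inode,
model_skip_page_read() is a hypothetical name, and the end of the write is
clamped to the page size, which the patch above does not do itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace model of the skip-read decision.  Returns true if the page can
 * be prepared for writing without reading it first, zeroing only the parts
 * that the write will not cover.
 */
static bool model_skip_page_read(unsigned char *page, size_t page_size,
				 uint64_t pos, size_t len, uint64_t i_size)
{
	size_t offset, end;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	offset = (size_t)(pos & (page_size - 1));

	/* Full page write: nothing to read and nothing to zero. */
	if (offset == 0 && len >= page_size)
		return true;

	/*
	 * Either the page starts at or beyond EOF, or the write runs from
	 * the start of the page to EOF or beyond.
	 */
	if (pos - offset >= i_size ||
	    (offset == 0 && pos + len >= i_size)) {
		/* Assumption of this model: clamp the write end to the page. */
		end = offset + len;
		if (end > page_size)
			end = page_size;
		memset(page, 0, offset);		/* before the write */
		memset(page + end, 0, page_size - end);	/* after the write */
		return true;
	}

	/* Existing data in the page must be read in first. */
	return false;
}
```

The case the old test got wrong is a write that starts beyond EOF but in
the same page as EOF: with a 4096-byte page, i_size of 4200 and pos of
5000, the page still holds file data at its head, so the model returns
false and leaves the buffer alone.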




* Re: [PATCH v2 0/3] netfs, afs: Fix netfs_write_begin and THP handling
  2021-06-17  8:23 [PATCH v2 0/3] netfs, afs: Fix netfs_write_begin and THP handling David Howells
  2021-06-17  8:24 ` [PATCH v2 3/3] netfs: fix test for whether we can skip read when writing beyond EOF David Howells
@ 2021-06-18  3:46 ` Al Viro
  1 sibling, 0 replies; 4+ messages in thread
From: Al Viro @ 2021-06-18  3:46 UTC
  To: David Howells
  Cc: linux-cachefs, linux-afs, Matthew Wilcox (Oracle),
	ceph-devel, Jeff Layton, Andrew W Elble, linux-fsdevel,
	linux-kernel

On Thu, Jun 17, 2021 at 09:23:51AM +0100, David Howells wrote:
> 
> Here are some patches to fix netfs_write_begin() and the handling of THPs in
> that and afs_write_begin/end() in the following ways:
> 
>  (1) Use offset_in_thp() rather than manually calculating the offset into
>      the page.
> 
>  (2) In the future, the len parameter may extend beyond the page allocated.
>      This is because the page allocation is deferred to write_begin() and
>      that gets to decide what size of THP to allocate.
> 
>  (3) In netfs_write_begin(), extract the decision about whether to skip a
>      page out to its own helper and have that clear around the region to be
>      written, but not clear that region.  This requires the filesystem to
>      patch it up afterwards if the hole doesn't get completely filled.
> 
>  (4) Due to (3), afs_write_end() now needs to handle a short data write into
>      the page by generic_perform_write().  I've adopted an approach
>      analogous to ceph's of just returning 0 in this case and letting the
>      caller go round again.

Series looks sane.  I'd like to hear about the thp-related plans in
more detail, but that's a separate story.

> I wonder if generic_perform_write() should pass in a flag indicating
> whether this is the first attempt or a second attempt at this, and on the
> second attempt we just completely prefill the page and just let the partial
> write stand - which we have to do if the page was already uptodate when we
> started.

Not really - we'll simply get a shorter chunk next time around (with
the patches in -next right now it'll be "the amount we'd actually
managed to copy this time around" in case ->write_begin() tells us
to take a hike), and that shorter chunk is what ->write_begin() will
see.  No need for the flags...
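
A toy userspace model of that retry behaviour, as a sketch only: the names
model_copy(), model_write_end() and model_perform_write() are hypothetical,
the single short copy on the first attempt is an assumption made for
illustration, and the real generic_perform_write() deals with pages, iov
iterators and faulting rather than bare byte counts.

```c
#include <assert.h>
#include <stddef.h>

struct model_state {
	size_t first_copy_limit;	/* bytes the first copy manages */
	unsigned int attempts;		/* number of passes round the loop */
};

/* Copy step: the first attempt is short, later attempts copy everything. */
static size_t model_copy(const struct model_state *st, size_t bytes)
{
	if (st->attempts == 1 && bytes > st->first_copy_limit)
		return st->first_copy_limit;
	return bytes;
}

/* write_end stand-in: reject a short copy by returning 0. */
static size_t model_write_end(size_t bytes, size_t copied)
{
	return copied < bytes ? 0 : copied;
}

static size_t model_perform_write(struct model_state *st, size_t total)
{
	size_t written = 0;
	size_t bytes = total;

	while (written < total) {
		size_t copied, accepted;

		st->attempts++;
		copied = model_copy(st, bytes);
		accepted = model_write_end(bytes, copied);
		if (accepted == 0) {
			if (copied == 0)
				break;	/* this model just gives up */
			/* Go round again with the amount actually copied. */
			bytes = copied;
			continue;
		}
		written += accepted;
		bytes = total - written;
	}
	return written;
}
```

The retry needs no extra flag: the second pass is simply asked for the
shorter chunk, which then succeeds.  For a 300-byte write whose first copy
manages only 100 bytes, the model takes three passes (300 rejected, 100
accepted, 200 accepted).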


* Re: [PATCH v2 3/3] netfs: fix test for whether we can skip read when writing beyond EOF
  2021-06-17  8:24 ` [PATCH v2 3/3] netfs: fix test for whether we can skip read when writing beyond EOF David Howells
@ 2021-06-21 14:50   ` Matthew Wilcox
  0 siblings, 0 replies; 4+ messages in thread
From: Matthew Wilcox @ 2021-06-21 14:50 UTC
  To: David Howells
  Cc: linux-cachefs, linux-afs, Andrew W Elble, Jeff Layton,
	ceph-devel, linux-fsdevel, linux-kernel

On Thu, Jun 17, 2021 at 09:24:27AM +0100, David Howells wrote:
> From: Jeff Layton <jlayton@kernel.org>
> 
> It's not sufficient to skip reading when the pos is beyond the EOF.
> There may be data at the head of the page that we need to fill in
> before the write.
> 
> Add a new helper function that corrects and clarifies the logic of
> when we can skip reads, and have it only zero out the part of the page
> that won't have data copied in for the write.
> 
> Finally, don't set the page Uptodate after zeroing. It's not up to date
> since the write data won't have been copied in yet.
> 
> [DH made the following changes:
> 
>  - Prefixed the new function with "netfs_".
> 
>  - Don't call zero_user_segments() for a full-page write.
> 
>  - Altered the beyond-last-page check to avoid a DIV instruction and got
>    rid of the then-redundant zero-length file check.
> ]
> 
> Fixes: e1b1240c1ff5f ("netfs: Add write_begin helper")
> Reported-by: Andrew W Elble <aweits@rit.edu>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: ceph-devel@vger.kernel.org
> Link: https://lore.kernel.org/r/20210613233345.113565-1-jlayton@kernel.org/
> Link: https://lore.kernel.org/r/162367683365.460125.4467036947364047314.stgit@warthog.procyon.org.uk/ # v1

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>

