linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] afs: Handle len being extending over page end in write_begin/write_end
@ 2021-06-14 13:20 David Howells
  2021-06-14 13:20 ` [PATCH 2/3] afs: Fix afs_write_end() to handle short writes David Howells
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: David Howells @ 2021-06-14 13:20 UTC (permalink / raw)
  To: jlayton, willy
  Cc: linux-afs, dhowells, linux-afs, ceph-devel, linux-fsdevel, linux-kernel

With transparent huge pages, in the future, write_begin() and write_end()
may be passed a length parameter that, in combination with the offset into
the page, exceeds the length of that page.  This allows
grab_cache_page_write_begin() to better choose the size of THP to allocate.

Fix afs's functions to handle this by trimming the length as needed after
the page has been allocated.

Fixes: e1b1240c1ff5 ("netfs: Add write_begin helper")
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
---

 fs/afs/write.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index a523bb86915d..f68274c39f47 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -25,7 +25,8 @@ int afs_set_page_dirty(struct page *page)
 }
 
 /*
- * prepare to perform part of a write to a page
+ * Prepare to perform part of a write to a page.  Note that len may extend
+ * beyond the end of the page.
  */
 int afs_write_begin(struct file *file, struct address_space *mapping,
 		    loff_t pos, unsigned len, unsigned flags,
@@ -52,7 +53,8 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
 		return ret;
 
 	index = page->index;
-	from = pos - index * PAGE_SIZE;
+	from = offset_in_thp(page, pos);
+	len = min_t(size_t, len, thp_size(page) - from);
 	to = from + len;
 
 try_again:
@@ -103,7 +105,8 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
 }
 
 /*
- * finalise part of a write to a page
+ * Finalise part of a write to a page.  Note that len may extend beyond the end
+ * of the page.
  */
 int afs_write_end(struct file *file, struct address_space *mapping,
 		  loff_t pos, unsigned len, unsigned copied,
@@ -111,7 +114,7 @@ int afs_write_end(struct file *file, struct address_space *mapping,
 {
 	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
 	unsigned long priv;
-	unsigned int f, from = pos & (thp_size(page) - 1);
+	unsigned int f, from = offset_in_thp(page, pos);
 	unsigned int t, to = from + copied;
 	loff_t i_size, maybe_i_size;
 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/3] afs: Fix afs_write_end() to handle short writes
  2021-06-14 13:20 [PATCH 1/3] afs: Handle len being extending over page end in write_begin/write_end David Howells
@ 2021-06-14 13:20 ` David Howells
  2021-06-14 13:28   ` Matthew Wilcox
                     ` (3 more replies)
  2021-06-14 13:20 ` [PATCH 3/3] netfs: fix test for whether we can skip read when writing beyond EOF David Howells
                   ` (2 subsequent siblings)
  3 siblings, 4 replies; 15+ messages in thread
From: David Howells @ 2021-06-14 13:20 UTC (permalink / raw)
  To: jlayton, willy
  Cc: linux-afs, dhowells, linux-afs, ceph-devel, linux-fsdevel, linux-kernel

Fix afs_write_end() to correctly handle a short copy into the intended
write region of the page.  Two things are necessary:

 (1) If the page is not up to date, then we should just return 0
     (ie. indicating a zero-length copy).  The loop in
     generic_perform_write() will go around again, possibly breaking up the
     iterator into discrete chunks.

     This is analogous to commit b9de313cf05fe08fa59efaf19756ec5283af672a
     for ceph.

 (2) The page should not have been set uptodate if it wasn't completely set
     up by netfs_write_begin() (this will be fixed in the next page), so we
     need to set PG_uptodate here in such a case.

Fixes: 3003bbd0697b ("afs: Use the netfs_write_begin() helper")
Reported-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
---

 fs/afs/write.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index f68274c39f47..4b3f16ca2810 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -121,6 +121,16 @@ int afs_write_end(struct file *file, struct address_space *mapping,
 	_enter("{%llx:%llu},{%lx}",
 	       vnode->fid.vid, vnode->fid.vnode, page->index);
 
+	len = min_t(size_t, len, thp_size(page) - from);
+	if (!PageUptodate(page)) {
+		if (copied < len) {
+			copied = 0;
+			goto out;
+		}
+
+		SetPageUptodate(page);
+	}
+
 	if (copied == 0)
 		goto out;
 
@@ -135,8 +145,6 @@ int afs_write_end(struct file *file, struct address_space *mapping,
 		write_sequnlock(&vnode->cb_lock);
 	}
 
-	ASSERT(PageUptodate(page));
-
 	if (PagePrivate(page)) {
 		priv = page_private(page);
 		f = afs_page_dirty_from(page, priv);



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/3] netfs: fix test for whether we can skip read when writing beyond EOF
  2021-06-14 13:20 [PATCH 1/3] afs: Handle len being extending over page end in write_begin/write_end David Howells
  2021-06-14 13:20 ` [PATCH 2/3] afs: Fix afs_write_end() to handle short writes David Howells
@ 2021-06-14 13:20 ` David Howells
  2021-06-15 17:01   ` Jeff Layton
  2021-06-14 13:33 ` David Howells
  2021-06-17  7:43 ` [PATCH 1/3] afs: Handle len being extending over page end in write_begin/write_end kernel test robot
  3 siblings, 1 reply; 15+ messages in thread
From: David Howells @ 2021-06-14 13:20 UTC (permalink / raw)
  To: jlayton, willy
  Cc: Andrew W Elble, ceph-devel, dhowells, linux-afs, ceph-devel,
	linux-fsdevel, linux-kernel

From: Jeff Layton <jlayton@kernel.org>

It's not sufficient to skip reading when the pos is beyond the EOF.
There may be data at the head of the page that we need to fill in
before the write.

Add a new helper function that corrects and clarifies the logic of
when we can skip reads, and have it only zero out the part of the page
that won't have data copied in for the write.

Finally, don't set the page Uptodate after zeroing. It's not up to date
since the write data won't have been copied in yet.

[DH made the following changes:

 - Prefixed the new function with "netfs_".

 - Don't call zero_user_segments() for a full-page write.

 - Altered the beyond-last-page check to avoid a DIV instruction and got
   rid of then-redundant zero-length file check.
]

Fixes: e1b1240c1ff5f ("netfs: Add write_begin helper")
Reported-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: ceph-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20210613233345.113565-1-jlayton@kernel.org/
---

 fs/netfs/read_helper.c |   49 +++++++++++++++++++++++++++++++++++-------------
 1 file changed, 36 insertions(+), 13 deletions(-)

diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 725614625ed4..70a5b1a19a50 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -1011,12 +1011,42 @@ int netfs_readpage(struct file *file,
 }
 EXPORT_SYMBOL(netfs_readpage);
 
-static void netfs_clear_thp(struct page *page)
+/**
+ * netfs_skip_page_read - prep a page for writing without reading first
+ * @page: page being prepared
+ * @pos: starting position for the write
+ * @len: length of write
+ *
+ * In some cases, write_begin doesn't need to read at all:
+ * - full page write
+ * - write that lies in a page that is completely beyond EOF
+ * - write that covers the the page from start to EOF or beyond it
+ *
+ * If any of these criteria are met, then zero out the unwritten parts
+ * of the page and return true. Otherwise, return false.
+ */
+static noinline bool netfs_skip_page_read(struct page *page, loff_t pos, size_t len)
 {
-	unsigned int i;
+	struct inode *inode = page->mapping->host;
+	loff_t i_size = i_size_read(inode);
+	size_t offset = offset_in_thp(page, pos);
+
+	/* Full page write */
+	if (offset == 0 && len >= thp_size(page))
+		return true;
+
+	/* pos beyond last page in the file */
+	if (pos - offset >= i_size)
+		goto zero_out;
+
+	/* Write that covers from the start of the page to EOF or beyond */
+	if (offset == 0 && (pos + len) >= i_size)
+		goto zero_out;
 
-	for (i = 0; i < thp_nr_pages(page); i++)
-		clear_highpage(page + i);
+	return false;
+zero_out:
+	zero_user_segments(page, 0, offset, offset + len, thp_size(page));
+	return true;
 }
 
 /**
@@ -1024,7 +1054,7 @@ static void netfs_clear_thp(struct page *page)
  * @file: The file to read from
  * @mapping: The mapping to read from
  * @pos: File position at which the write will begin
- * @len: The length of the write in this page
+ * @len: The length of the write (may extend beyond the end of the page chosen)
  * @flags: AOP_* flags
  * @_page: Where to put the resultant page
  * @_fsdata: Place for the netfs to store a cookie
@@ -1061,8 +1091,6 @@ int netfs_write_begin(struct file *file, struct address_space *mapping,
 	struct inode *inode = file_inode(file);
 	unsigned int debug_index = 0;
 	pgoff_t index = pos >> PAGE_SHIFT;
-	int pos_in_page = pos & ~PAGE_MASK;
-	loff_t size;
 	int ret;
 
 	DEFINE_READAHEAD(ractl, file, NULL, mapping, index);
@@ -1090,13 +1118,8 @@ int netfs_write_begin(struct file *file, struct address_space *mapping,
 	 * within the cache granule containing the EOF, in which case we need
 	 * to preload the granule.
 	 */
-	size = i_size_read(inode);
 	if (!ops->is_cache_enabled(inode) &&
-	    ((pos_in_page == 0 && len == thp_size(page)) ||
-	     (pos >= size) ||
-	     (pos_in_page == 0 && (pos + len) >= size))) {
-		netfs_clear_thp(page);
-		SetPageUptodate(page);
+	    netfs_skip_page_read(page, pos, len)) {
 		netfs_stat(&netfs_n_rh_write_zskip);
 		goto have_page_no_wait;
 	}



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] afs: Fix afs_write_end() to handle short writes
  2021-06-14 13:20 ` [PATCH 2/3] afs: Fix afs_write_end() to handle short writes David Howells
@ 2021-06-14 13:28   ` Matthew Wilcox
  2021-06-14 13:37   ` David Howells
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Matthew Wilcox @ 2021-06-14 13:28 UTC (permalink / raw)
  To: David Howells; +Cc: jlayton, linux-afs, ceph-devel, linux-fsdevel, linux-kernel

On Mon, Jun 14, 2021 at 02:20:25PM +0100, David Howells wrote:
> Fix afs_write_end() to correctly handle a short copy into the intended
> write region of the page.  Two things are necessary:
> 
>  (1) If the page is not up to date, then we should just return 0
>      (ie. indicating a zero-length copy).  The loop in
>      generic_perform_write() will go around again, possibly breaking up the
>      iterator into discrete chunks.

Does this actually work?  What about the situation where you're reading
the last page of a file and thus (almost) always reading fewer bytes
than a PAGE_SIZE?

>      This is analogous to commit b9de313cf05fe08fa59efaf19756ec5283af672a
>      for ceph.
> 
>  (2) The page should not have been set uptodate if it wasn't completely set
>      up by netfs_write_begin() (this will be fixed in the next page), so we
>      need to set PG_uptodate here in such a case.

s/page/patch/

and you have a really bad habit of breaking the layering and referring
to PG_foo.  Please just refer to PageFoo.  Filesystems shouldn't
care what the implementation of PageFoo is.

> +	len = min_t(size_t, len, thp_size(page) - from);
> +	if (!PageUptodate(page)) {
> +		if (copied < len) {
> +			copied = 0;
> +			goto out;
> +		}
> +
> +		SetPageUptodate(page);
> +	}

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] netfs: fix test for whether we can skip read when writing beyond EOF
  2021-06-14 13:20 [PATCH 1/3] afs: Handle len being extending over page end in write_begin/write_end David Howells
  2021-06-14 13:20 ` [PATCH 2/3] afs: Fix afs_write_end() to handle short writes David Howells
  2021-06-14 13:20 ` [PATCH 3/3] netfs: fix test for whether we can skip read when writing beyond EOF David Howells
@ 2021-06-14 13:33 ` David Howells
  2021-06-17  7:43 ` [PATCH 1/3] afs: Handle len being extending over page end in write_begin/write_end kernel test robot
  3 siblings, 0 replies; 15+ messages in thread
From: David Howells @ 2021-06-14 13:33 UTC (permalink / raw)
  Cc: dhowells, jlayton, willy, Andrew W Elble, ceph-devel, linux-afs,
	linux-fsdevel, linux-kernel

David Howells <dhowells@redhat.com> wrote:

> +static noinline bool netfs_skip_page_read(struct page *page, loff_t pos, size_t len)

I should've removed the "noinline".

David


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] afs: Fix afs_write_end() to handle short writes
  2021-06-14 13:20 ` [PATCH 2/3] afs: Fix afs_write_end() to handle short writes David Howells
  2021-06-14 13:28   ` Matthew Wilcox
@ 2021-06-14 13:37   ` David Howells
  2021-06-14 13:46     ` Matthew Wilcox
  2021-06-14 14:04     ` David Howells
  2021-06-14 14:37   ` Matthew Wilcox
  2021-06-14 15:38   ` David Howells
  3 siblings, 2 replies; 15+ messages in thread
From: David Howells @ 2021-06-14 13:37 UTC (permalink / raw)
  To: Matthew Wilcox, Al Viro
  Cc: dhowells, jlayton, linux-afs, ceph-devel, linux-fsdevel, linux-kernel

Matthew Wilcox <willy@infradead.org> wrote:

> >  (1) If the page is not up to date, then we should just return 0
> >      (ie. indicating a zero-length copy).  The loop in
> >      generic_perform_write() will go around again, possibly breaking up the
> >      iterator into discrete chunks.
> 
> Does this actually work?  What about the situation where you're reading
> the last page of a file and thus (almost) always reading fewer bytes
> than a PAGE_SIZE?

Al Viro made such a change for Ceph - and we're writing, not reading.

I was thinking that it would break if reading from a pipe, but Jeff pointed
out that the iov_iter_advance() in generic_perform_write() uses the return
value of ->write_end() to advance the iterator.  So it might loop endlessly,
but it doesn't appear it will corrupt your data.

David


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] afs: Fix afs_write_end() to handle short writes
  2021-06-14 13:37   ` David Howells
@ 2021-06-14 13:46     ` Matthew Wilcox
  2021-06-18  3:40       ` Al Viro
  2021-06-14 14:04     ` David Howells
  1 sibling, 1 reply; 15+ messages in thread
From: Matthew Wilcox @ 2021-06-14 13:46 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, jlayton, linux-afs, ceph-devel, linux-fsdevel, linux-kernel

On Mon, Jun 14, 2021 at 02:37:12PM +0100, David Howells wrote:
> Matthew Wilcox <willy@infradead.org> wrote:
> 
> > >  (1) If the page is not up to date, then we should just return 0
> > >      (ie. indicating a zero-length copy).  The loop in
> > >      generic_perform_write() will go around again, possibly breaking up the
> > >      iterator into discrete chunks.
> > 
> > Does this actually work?  What about the situation where you're reading
> > the last page of a file and thus (almost) always reading fewer bytes
> > than a PAGE_SIZE?
> 
> Al Viro made such a change for Ceph - and we're writing, not reading.

I'd feel better if you said "xfstests doesn't show any new problems"
than arguing to authority.

I know the operation which triggers this path is a call to write(),
but if, say, the file is 32 bytes long, not in cache, and you write
bytes 32-63, the client must READ bytes 0-31 from the server, which
is less than a full page.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] afs: Fix afs_write_end() to handle short writes
  2021-06-14 13:37   ` David Howells
  2021-06-14 13:46     ` Matthew Wilcox
@ 2021-06-14 14:04     ` David Howells
  1 sibling, 0 replies; 15+ messages in thread
From: David Howells @ 2021-06-14 14:04 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: dhowells, Al Viro, jlayton, linux-afs, ceph-devel, linux-fsdevel,
	linux-kernel

Matthew Wilcox <willy@infradead.org> wrote:

> > Al Viro made such a change for Ceph - and we're writing, not reading.
> 
> I'd feel better if you said "xfstests doesn't show any new problems"
> than arguing to authority.

I'm kind of referring it to Al - I added him to the to: list.  And xfstests
doesn't show any new problems - but that doesn't mean that this path got
tested.

David


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] afs: Fix afs_write_end() to handle short writes
  2021-06-14 13:20 ` [PATCH 2/3] afs: Fix afs_write_end() to handle short writes David Howells
  2021-06-14 13:28   ` Matthew Wilcox
  2021-06-14 13:37   ` David Howells
@ 2021-06-14 14:37   ` Matthew Wilcox
  2021-06-14 15:38   ` David Howells
  3 siblings, 0 replies; 15+ messages in thread
From: Matthew Wilcox @ 2021-06-14 14:37 UTC (permalink / raw)
  To: David Howells; +Cc: jlayton, linux-afs, ceph-devel, linux-fsdevel, linux-kernel

On Mon, Jun 14, 2021 at 02:20:25PM +0100, David Howells wrote:
> @@ -135,8 +145,6 @@ int afs_write_end(struct file *file, struct address_space *mapping,
>  		write_sequnlock(&vnode->cb_lock);
>  	}
>  
> -	ASSERT(PageUptodate(page));
> -
>  	if (PagePrivate(page)) {
>  		priv = page_private(page);
>  		f = afs_page_dirty_from(page, priv);

Why are you removing this assertion?  Does AFS now support dirty,
partially-uptodate pages?  If so, a subsequent read() to that
page is going to need to be careful to only read the parts of the page
from the server that haven't been written ...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] afs: Fix afs_write_end() to handle short writes
  2021-06-14 13:20 ` [PATCH 2/3] afs: Fix afs_write_end() to handle short writes David Howells
                     ` (2 preceding siblings ...)
  2021-06-14 14:37   ` Matthew Wilcox
@ 2021-06-14 15:38   ` David Howells
  2021-06-14 15:43     ` Matthew Wilcox
  2021-06-14 21:11     ` David Howells
  3 siblings, 2 replies; 15+ messages in thread
From: David Howells @ 2021-06-14 15:38 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: dhowells, jlayton, linux-afs, ceph-devel, linux-fsdevel, linux-kernel

Matthew Wilcox <willy@infradead.org> wrote:

> > -	ASSERT(PageUptodate(page));
> > -
> >  	if (PagePrivate(page)) {
> >  		priv = page_private(page);
> >  		f = afs_page_dirty_from(page, priv);
> 
> Why are you removing this assertion?  Does AFS now support dirty,
> partially-uptodate pages?  If so, a subsequent read() to that
> page is going to need to be careful to only read the parts of the page
> from the server that haven't been written ...

Because the previous hunk in the patch:

	+	if (!PageUptodate(page)) {
	+		if (copied < len) {
	+			copied = 0;
	+			goto out;
	+		}
	+
	+		SetPageUptodate(page);
	+	}

means you can't get there unless PageUptodate() is true by that point.

David


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] afs: Fix afs_write_end() to handle short writes
  2021-06-14 15:38   ` David Howells
@ 2021-06-14 15:43     ` Matthew Wilcox
  2021-06-14 21:11     ` David Howells
  1 sibling, 0 replies; 15+ messages in thread
From: Matthew Wilcox @ 2021-06-14 15:43 UTC (permalink / raw)
  To: David Howells; +Cc: jlayton, linux-afs, ceph-devel, linux-fsdevel, linux-kernel

On Mon, Jun 14, 2021 at 04:38:21PM +0100, David Howells wrote:
> Matthew Wilcox <willy@infradead.org> wrote:
> 
> > > -	ASSERT(PageUptodate(page));
> > > -
> > >  	if (PagePrivate(page)) {
> > >  		priv = page_private(page);
> > >  		f = afs_page_dirty_from(page, priv);
> > 
> > Why are you removing this assertion?  Does AFS now support dirty,
> > partially-uptodate pages?  If so, a subsequent read() to that
> > page is going to need to be careful to only read the parts of the page
> > from the server that haven't been written ...
> 
> Because the previous hunk in the patch:
> 
> 	+	if (!PageUptodate(page)) {
> 	+		if (copied < len) {
> 	+			copied = 0;
> 	+			goto out;
> 	+		}
> 	+
> 	+		SetPageUptodate(page);
> 	+	}
> 
> means you can't get there unless PageUptodate() is true by that point.

Isn't the point of an assertion to check that this is true?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] afs: Fix afs_write_end() to handle short writes
  2021-06-14 15:38   ` David Howells
  2021-06-14 15:43     ` Matthew Wilcox
@ 2021-06-14 21:11     ` David Howells
  1 sibling, 0 replies; 15+ messages in thread
From: David Howells @ 2021-06-14 21:11 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: dhowells, jlayton, linux-afs, ceph-devel, linux-fsdevel, linux-kernel

Matthew Wilcox <willy@infradead.org> wrote:

> > means you can't get there unless PageUptodate() is true by that point.
> 
> Isn't the point of an assertion to check that this is true?

The assertion was meant to check that that it was true given that the page was
set uptodate somewhere else before this function was even called.  With this
patch, however, it's now set in this function if it wasn't already right at
the top - so the assertion should now be redundant.  I can put it back if you
really insist.

David


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] netfs: fix test for whether we can skip read when writing beyond EOF
  2021-06-14 13:20 ` [PATCH 3/3] netfs: fix test for whether we can skip read when writing beyond EOF David Howells
@ 2021-06-15 17:01   ` Jeff Layton
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2021-06-15 17:01 UTC (permalink / raw)
  To: open list:CEPH DISTRIBUTED...
  Cc: Andrew W Elble, ceph-devel, linux-afs, linux-fsdevel,
	linux-kernel, David Howells, willy

On Mon, 2021-06-14 at 14:20 +0100, David Howells wrote:
> From: Jeff Layton <jlayton@kernel.org>
> 
> It's not sufficient to skip reading when the pos is beyond the EOF.
> There may be data at the head of the page that we need to fill in
> before the write.
> 
> Add a new helper function that corrects and clarifies the logic of
> when we can skip reads, and have it only zero out the part of the page
> that won't have data copied in for the write.
> 
> Finally, don't set the page Uptodate after zeroing. It's not up to date
> since the write data won't have been copied in yet.
> 
> [DH made the following changes:
> 
>  - Prefixed the new function with "netfs_".
> 
>  - Don't call zero_user_segments() for a full-page write.
> 
>  - Altered the beyond-last-page check to avoid a DIV instruction and got
>    rid of then-redundant zero-length file check.
> ]
> 
> Fixes: e1b1240c1ff5f ("netfs: Add write_begin helper")
> Reported-by: Andrew W Elble <aweits@rit.edu>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: ceph-devel@vger.kernel.org
> Link: https://lore.kernel.org/r/20210613233345.113565-1-jlayton@kernel.org/
> ---
> 
>  fs/netfs/read_helper.c |   49 +++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 36 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
> index 725614625ed4..70a5b1a19a50 100644
> --- a/fs/netfs/read_helper.c
> +++ b/fs/netfs/read_helper.c
> @@ -1011,12 +1011,42 @@ int netfs_readpage(struct file *file,
>  }
>  EXPORT_SYMBOL(netfs_readpage);
>  
> -static void netfs_clear_thp(struct page *page)
> +/**
> + * netfs_skip_page_read - prep a page for writing without reading first
> + * @page: page being prepared
> + * @pos: starting position for the write
> + * @len: length of write
> + *
> + * In some cases, write_begin doesn't need to read at all:
> + * - full page write
> + * - write that lies in a page that is completely beyond EOF
> + * - write that covers the the page from start to EOF or beyond it
> + *
> + * If any of these criteria are met, then zero out the unwritten parts
> + * of the page and return true. Otherwise, return false.
> + */
> +static noinline bool netfs_skip_page_read(struct page *page, loff_t pos, size_t len)
>  {
> -	unsigned int i;
> +	struct inode *inode = page->mapping->host;
> +	loff_t i_size = i_size_read(inode);
> +	size_t offset = offset_in_thp(page, pos);
> +
> +	/* Full page write */
> +	if (offset == 0 && len >= thp_size(page))
> +		return true;
> +
> +	/* pos beyond last page in the file */
> +	if (pos - offset >= i_size)
> +		goto zero_out;
> +
> +	/* Write that covers from the start of the page to EOF or beyond */
> +	if (offset == 0 && (pos + len) >= i_size)
> +		goto zero_out;
>  
> -	for (i = 0; i < thp_nr_pages(page); i++)
> -		clear_highpage(page + i);
> +	return false;
> +zero_out:
> +	zero_user_segments(page, 0, offset, offset + len, thp_size(page));
> +	return true;
>  }
>  
>  /**
> @@ -1024,7 +1054,7 @@ static void netfs_clear_thp(struct page *page)
>   * @file: The file to read from
>   * @mapping: The mapping to read from
>   * @pos: File position at which the write will begin
> - * @len: The length of the write in this page
> + * @len: The length of the write (may extend beyond the end of the page chosen)
>   * @flags: AOP_* flags
>   * @_page: Where to put the resultant page
>   * @_fsdata: Place for the netfs to store a cookie
> @@ -1061,8 +1091,6 @@ int netfs_write_begin(struct file *file, struct address_space *mapping,
>  	struct inode *inode = file_inode(file);
>  	unsigned int debug_index = 0;
>  	pgoff_t index = pos >> PAGE_SHIFT;
> -	int pos_in_page = pos & ~PAGE_MASK;
> -	loff_t size;
>  	int ret;
>  
>  	DEFINE_READAHEAD(ractl, file, NULL, mapping, index);
> @@ -1090,13 +1118,8 @@ int netfs_write_begin(struct file *file, struct address_space *mapping,
>  	 * within the cache granule containing the EOF, in which case we need
>  	 * to preload the granule.
>  	 */
> -	size = i_size_read(inode);
>  	if (!ops->is_cache_enabled(inode) &&
> -	    ((pos_in_page == 0 && len == thp_size(page)) ||
> -	     (pos >= size) ||
> -	     (pos_in_page == 0 && (pos + len) >= size))) {
> -		netfs_clear_thp(page);
> -		SetPageUptodate(page);
> +	    netfs_skip_page_read(page, pos, len)) {
>  		netfs_stat(&netfs_n_rh_write_zskip);
>  		goto have_page_no_wait;
>  	}
> 
> 

I've gone ahead and merged this into the ceph/testing branch since it is
a data corruptor. Once this patch goes into mainline, we'll drop that
patch and rebase onto that commit.
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] afs: Handle len being extending over page end in write_begin/write_end
  2021-06-14 13:20 [PATCH 1/3] afs: Handle len being extending over page end in write_begin/write_end David Howells
                   ` (2 preceding siblings ...)
  2021-06-14 13:33 ` David Howells
@ 2021-06-17  7:43 ` kernel test robot
  3 siblings, 0 replies; 15+ messages in thread
From: kernel test robot @ 2021-06-17  7:43 UTC (permalink / raw)
  To: David Howells, jlayton, willy
  Cc: kbuild-all, linux-afs, dhowells, ceph-devel, linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 7824 bytes --]

Hi David,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.13-rc6 next-20210616]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/David-Howells/afs-Handle-len-being-extending-over-page-end-in-write_begin-write_end/20210617-091000
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 70585216fe7730d9fb5453d3e2804e149d0fe201
config: arc-allyesconfig (attached as .config)
compiler: arceb-elf-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/d5cb85d0ca85764a811baaf4baca5f1890476434
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review David-Howells/afs-Handle-len-being-extending-over-page-end-in-write_begin-write_end/20210617-091000
        git checkout d5cb85d0ca85764a811baaf4baca5f1890476434
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   fs/afs/write.c: In function 'afs_write_begin':
>> fs/afs/write.c:40:10: warning: variable 'index' set but not used [-Wunused-but-set-variable]
      40 |  pgoff_t index;
         |          ^~~~~


vim +/index +40 fs/afs/write.c

31143d5d515ece David Howells   2007-05-09   26  
31143d5d515ece David Howells   2007-05-09   27  /*
d5cb85d0ca8576 David Howells   2021-06-14   28   * Prepare to perform part of a write to a page.  Note that len may extend
d5cb85d0ca8576 David Howells   2021-06-14   29   * beyond the end of the page.
31143d5d515ece David Howells   2007-05-09   30   */
15b4650e55e06d Nicholas Piggin 2008-10-15   31  int afs_write_begin(struct file *file, struct address_space *mapping,
15b4650e55e06d Nicholas Piggin 2008-10-15   32  		    loff_t pos, unsigned len, unsigned flags,
21db2cdc667f74 David Howells   2020-10-22   33  		    struct page **_page, void **fsdata)
31143d5d515ece David Howells   2007-05-09   34  {
496ad9aa8ef448 Al Viro         2013-01-23   35  	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
15b4650e55e06d Nicholas Piggin 2008-10-15   36  	struct page *page;
4343d00872e1de David Howells   2017-11-02   37  	unsigned long priv;
e87b03f5830ecd David Howells   2020-10-20   38  	unsigned f, from;
e87b03f5830ecd David Howells   2020-10-20   39  	unsigned t, to;
e87b03f5830ecd David Howells   2020-10-20  @40  	pgoff_t index;
31143d5d515ece David Howells   2007-05-09   41  	int ret;
31143d5d515ece David Howells   2007-05-09   42  
e87b03f5830ecd David Howells   2020-10-20   43  	_enter("{%llx:%llu},%llx,%x",
e87b03f5830ecd David Howells   2020-10-20   44  	       vnode->fid.vid, vnode->fid.vnode, pos, len);
31143d5d515ece David Howells   2007-05-09   45  
3003bbd0697b65 David Howells   2020-02-06   46  	/* Prefetch area to be written into the cache if we're caching this
3003bbd0697b65 David Howells   2020-02-06   47  	 * file.  We need to do this before we get a lock on the page in case
3003bbd0697b65 David Howells   2020-02-06   48  	 * there's more than one writer competing for the same cache block.
3003bbd0697b65 David Howells   2020-02-06   49  	 */
3003bbd0697b65 David Howells   2020-02-06   50  	ret = netfs_write_begin(file, mapping, pos, len, flags, &page, fsdata,
3003bbd0697b65 David Howells   2020-02-06   51  				&afs_req_ops, NULL);
3003bbd0697b65 David Howells   2020-02-06   52  	if (ret < 0)
31143d5d515ece David Howells   2007-05-09   53  		return ret;
630f5dda8442ca David Howells   2020-02-06   54  
e87b03f5830ecd David Howells   2020-10-20   55  	index = page->index;
d5cb85d0ca8576 David Howells   2021-06-14   56  	from = offset_in_thp(page, pos);
d5cb85d0ca8576 David Howells   2021-06-14   57  	len = min_t(size_t, len, thp_size(page) - from);
e87b03f5830ecd David Howells   2020-10-20   58  	to = from + len;
e87b03f5830ecd David Howells   2020-10-20   59  
31143d5d515ece David Howells   2007-05-09   60  try_again:
4343d00872e1de David Howells   2017-11-02   61  	/* See if this page is already partially written in a way that we can
4343d00872e1de David Howells   2017-11-02   62  	 * merge the new write with.
4343d00872e1de David Howells   2017-11-02   63  	 */
4343d00872e1de David Howells   2017-11-02   64  	if (PagePrivate(page)) {
4343d00872e1de David Howells   2017-11-02   65  		priv = page_private(page);
67d78a6f6e7b38 David Howells   2020-10-28   66  		f = afs_page_dirty_from(page, priv);
67d78a6f6e7b38 David Howells   2020-10-28   67  		t = afs_page_dirty_to(page, priv);
4343d00872e1de David Howells   2017-11-02   68  		ASSERTCMP(f, <=, t);
4343d00872e1de David Howells   2017-11-02   69  
5a039c32271b9a David Howells   2017-11-18   70  		if (PageWriteback(page)) {
67d78a6f6e7b38 David Howells   2020-10-28   71  			trace_afs_page_dirty(vnode, tracepoint_string("alrdy"), page);
5a039c32271b9a David Howells   2017-11-18   72  			goto flush_conflicting_write;
5a039c32271b9a David Howells   2017-11-18   73  		}
5a8132761609bd David Howells   2018-04-06   74  		/* If the file is being filled locally, allow inter-write
5a8132761609bd David Howells   2018-04-06   75  		 * spaces to be merged into writes.  If it's not, only write
5a8132761609bd David Howells   2018-04-06   76  		 * back what the user gives us.
5a8132761609bd David Howells   2018-04-06   77  		 */
5a8132761609bd David Howells   2018-04-06   78  		if (!test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags) &&
5a8132761609bd David Howells   2018-04-06   79  		    (to < f || from > t))
4343d00872e1de David Howells   2017-11-02   80  			goto flush_conflicting_write;
31143d5d515ece David Howells   2007-05-09   81  	}
31143d5d515ece David Howells   2007-05-09   82  
21db2cdc667f74 David Howells   2020-10-22   83  	*_page = page;
4343d00872e1de David Howells   2017-11-02   84  	_leave(" = 0");
31143d5d515ece David Howells   2007-05-09   85  	return 0;
31143d5d515ece David Howells   2007-05-09   86  
4343d00872e1de David Howells   2017-11-02   87  	/* The previous write and this write aren't adjacent or overlapping, so
4343d00872e1de David Howells   2017-11-02   88  	 * flush the page out.
4343d00872e1de David Howells   2017-11-02   89  	 */
4343d00872e1de David Howells   2017-11-02   90  flush_conflicting_write:
31143d5d515ece David Howells   2007-05-09   91  	_debug("flush conflict");
4343d00872e1de David Howells   2017-11-02   92  	ret = write_one_page(page);
21db2cdc667f74 David Howells   2020-10-22   93  	if (ret < 0)
21db2cdc667f74 David Howells   2020-10-22   94  		goto error;
31143d5d515ece David Howells   2007-05-09   95  
4343d00872e1de David Howells   2017-11-02   96  	ret = lock_page_killable(page);
21db2cdc667f74 David Howells   2020-10-22   97  	if (ret < 0)
21db2cdc667f74 David Howells   2020-10-22   98  		goto error;
21db2cdc667f74 David Howells   2020-10-22   99  	goto try_again;
21db2cdc667f74 David Howells   2020-10-22  100  
21db2cdc667f74 David Howells   2020-10-22  101  error:
21db2cdc667f74 David Howells   2020-10-22  102  	put_page(page);
4343d00872e1de David Howells   2017-11-02  103  	_leave(" = %d", ret);
4343d00872e1de David Howells   2017-11-02  104  	return ret;
4343d00872e1de David Howells   2017-11-02  105  }
31143d5d515ece David Howells   2007-05-09  106  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 68158 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] afs: Fix afs_write_end() to handle short writes
  2021-06-14 13:46     ` Matthew Wilcox
@ 2021-06-18  3:40       ` Al Viro
  0 siblings, 0 replies; 15+ messages in thread
From: Al Viro @ 2021-06-18  3:40 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: David Howells, jlayton, linux-afs, ceph-devel, linux-fsdevel,
	linux-kernel

On Mon, Jun 14, 2021 at 02:46:03PM +0100, Matthew Wilcox wrote:
> On Mon, Jun 14, 2021 at 02:37:12PM +0100, David Howells wrote:
> > Matthew Wilcox <willy@infradead.org> wrote:
> > 
> > > >  (1) If the page is not up to date, then we should just return 0
> > > >      (ie. indicating a zero-length copy).  The loop in
> > > >      generic_perform_write() will go around again, possibly breaking up the
> > > >      iterator into discrete chunks.
> > > 
> > > Does this actually work?  What about the situation where you're reading
> > > the last page of a file and thus (almost) always reading fewer bytes
> > > than a PAGE_SIZE?
> > 
> > Al Viro made such a change for Ceph - and we're writing, not reading.
> 
> I'd feel better if you said "xfstests doesn't show any new problems"
> than arguing to authority.
> 
> I know the operation which triggers this path is a call to write(),
> but if, say, the file is 32 bytes long, not in cache, and you write
> bytes 32-63, the client must READ bytes 0-31 from the server, which
> is less than a full page.

[as commented on IRC several days ago]

Short copy has nothing to do with destination; it's about failures on
source - e.g. source page we'd prefaulted before locking the destination
got evicted by the time we got around to copying; we can't afford
page faults while holding some pages locked, so we do it with
pagefault_disable() and get a short copy on #PF.

The story with short copies is this:
	* write() is about to copy the next chunk of data into
page cache of the file we are writing into.  We have decided what
part of the destination page will be copied over, faulted the
source in and locked the destination page.

	* if the page is not uptodate, we might need to read
some parts before copying new data into it; the work that needs
to be done depends upon the part of page we are going to overwrite.
E.g. if we are going to copy over the entire thing, we do
_not_ want to bother reading anything into it - if copying
works, we'll destroy the previous contents anyway.
	That's what ->write_begin() is about - it should
do whatever's needed in preparation to copying new data.

	* NOW we can copy the data.  Hopefully the copy will
be successful (i.e. we don't run into evicted source pages,
memory errors, races with munmap(), etc.), but it might fail
halfway through - we are doing that part with page faults
disabled.

	* finally we can do write to disk/server/whatnot.
That's what ->write_end() is for.  Ideally, it'll just
send the newly copied data on its way.  However, in case of
short copy we might have problems.  Consider e.g. a block
filesystem that has 4 blocks per page; the chunk we were
going to write went from the middle of the 1st to the
middle of the 4th block.  ->write_begin() made sure that
1st and 4th blocks had been uptodate.  It had not bothered
with the 2nd and the 3rd blocks, since we were going to
overwrite them anyway.  And had the copy succeeded, we'd
be fine - page fully uptodate, can write the data to
disk and be done with that.  However, the copy failed
halfway through the 3rd block.  What do we have?
1st block: uptodate, partly old data, partly new one.
2nd block: uptodate, new data
3rd block: beginning is filled with new data, garbage in the rest
4th block: uptodate, old data.
What to do?  Everything up to the beginning of the 3rd block
is fine, but the 3rd one is a hopeless mess.  We can't write it
out - the garbage would end up on disk.  We can't replace the
garbage with valid data without reading it from disk - and that'll
lose the new data we'd managed to copy there.

The best we can do in such situation is to treat that as
having advanced to the beginning of the third block, despite
having copied more than that.  The caller (generic_perform_write())
will choose the next chunk starting at that point (beginning of
the 3rd block) and repeat the whole sequence for that chunk,
including the fault-in.

So ->write_end() gets 3 numbers - two describing the range we
prepared for (what ->write_begin() had received) and the third
telling how much had been actually copied.  Again, "short copy"
here does not refer to any preparations done by ->write_begin() -
it's about having told ->write_begin() we would copy over given
range and only managing to fill a part of that range.

Note that if page is uptodate, we are fine - _everything_
in that page matches what we want in file, so we can deal with
sending it to disk/server/whatnot.  If there'd been a short
copy the caller will obviously need to continue from the point
where the copy stopped, but that's not our problem.

What to do in case of short copy into non-uptodate page is
up to filesystem.  Saying "sod it, I'm not taking any of
that, just repeat the entire thing" is always fine.  We might
do better than that (see above for one such example), but
the caller will be OK if we don't.  It's a rare case, and
you either need something like race with munmap() of part of
source buffer from another thread or severe memory pressure
for that to trigger in the first place.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-06-18  3:40 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-14 13:20 [PATCH 1/3] afs: Handle len being extending over page end in write_begin/write_end David Howells
2021-06-14 13:20 ` [PATCH 2/3] afs: Fix afs_write_end() to handle short writes David Howells
2021-06-14 13:28   ` Matthew Wilcox
2021-06-14 13:37   ` David Howells
2021-06-14 13:46     ` Matthew Wilcox
2021-06-18  3:40       ` Al Viro
2021-06-14 14:04     ` David Howells
2021-06-14 14:37   ` Matthew Wilcox
2021-06-14 15:38   ` David Howells
2021-06-14 15:43     ` Matthew Wilcox
2021-06-14 21:11     ` David Howells
2021-06-14 13:20 ` [PATCH 3/3] netfs: fix test for whether we can skip read when writing beyond EOF David Howells
2021-06-15 17:01   ` Jeff Layton
2021-06-14 13:33 ` David Howells
2021-06-17  7:43 ` [PATCH 1/3] afs: Handle len being extending over page end in write_begin/write_end kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).