Linux-Fsdevel Archive on lore.kernel.org
* [PATCH] iomap: Ensure iop->uptodate matches PageUptodate
@ 2020-07-26  9:10 Matthew Wilcox (Oracle)
  2020-07-26 15:15 ` Christoph Hellwig
  2020-07-26 23:06 ` Dave Chinner
  0 siblings, 2 replies; 7+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-07-26  9:10 UTC
  To: Christoph Hellwig, Darrick J. Wong
  Cc: Matthew Wilcox (Oracle),
	linux-xfs, linux-fsdevel, Brian Foster, linux-kernel

If the filesystem has block size < page size and we end up calling
iomap_page_create() in iomap_page_mkwrite_actor(), the uptodate bits
would be zero, which causes us to skip writeback of blocks which are
!uptodate in iomap_writepage_map().  This can lead to user data loss.
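To make the failure mode concrete, here is a minimal userspace model (not kernel code) of the skip described above. It assumes a 4 KiB page with 512-byte blocks; struct model_iop and model_writeback() are hypothetical stand-ins for struct iomap_page and the per-block loop in iomap_writepage_map(), which in reality also maps blocks and builds bios.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed geometry for the model: 4 KiB page, 512-byte blocks. */
#define MODEL_PAGE_SIZE  4096u
#define MODEL_BLOCK_SIZE 512u
#define MODEL_NBLOCKS    (MODEL_PAGE_SIZE / MODEL_BLOCK_SIZE)

/* Hypothetical stand-in for struct iomap_page: one uptodate flag per block. */
struct model_iop {
	bool uptodate[MODEL_NBLOCKS];
};

/*
 * Loosely mirrors the per-block loop of iomap_writepage_map(): a block whose
 * uptodate flag is clear is skipped. Returns the number of blocks written.
 */
static size_t model_writeback(const struct model_iop *iop)
{
	size_t written = 0;

	assert(iop != NULL);
	for (size_t i = 0; i < MODEL_NBLOCKS; i++) {
		if (!iop->uptodate[i])
			continue;	/* dirty data in this block is silently dropped */
		written++;		/* a real implementation would queue I/O here */
	}
	return written;
}
```

With a freshly zeroed bitmap, model_writeback() reports zero blocks to write even if the whole page was dirtied through mmap, which is the data-loss scenario; with every flag set, all eight blocks are written.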

Found using generic/127 with the THP patches.  I don't think this can be
reproduced on mainline using that test (the THP code causes iomap_pages
to be discarded more frequently), but inspection shows it can happen
with an appropriate series of operations.

Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a2b3b5455219..f0c5027bf33f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -53,7 +53,10 @@ iomap_page_create(struct inode *inode, struct page *page)
 	atomic_set(&iop->read_count, 0);
 	atomic_set(&iop->write_count, 0);
 	spin_lock_init(&iop->uptodate_lock);
-	bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
+	if (PageUptodate(page))
+		bitmap_fill(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
+	else
+		bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
 
 	/*
 	 * migrate_page_move_mapping() assumes that pages with private data have
@@ -72,6 +75,8 @@ iomap_page_release(struct page *page)
 		return;
 	WARN_ON_ONCE(atomic_read(&iop->read_count));
 	WARN_ON_ONCE(atomic_read(&iop->write_count));
+	WARN_ON_ONCE(bitmap_full(iop->uptodate, PAGE_SIZE / SECTOR_SIZE) !=
+			PageUptodate(page));
 	kfree(iop);
 }
 
-- 
2.27.0
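The following sketch models what the patch changes, again as userspace C under assumed sizes: the bitmap is seeded from the page flag at creation (bitmap_fill() versus bitmap_zero()), and the release path checks the invariant that the new WARN_ON_ONCE() in iomap_page_release() enforces. struct model_page, struct model_iop and the model_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_NBITS (4096u / 512u)	/* PAGE_SIZE / SECTOR_SIZE, assumed values */

struct model_page {
	bool uptodate;				/* stands in for PageUptodate(page) */
};

struct model_iop {
	unsigned char uptodate[MODEL_NBITS];	/* one byte per bit, for clarity */
};

/* Mirrors the patched iomap_page_create(): seed the bitmap from the page flag. */
static struct model_iop *model_iop_create(const struct model_page *page)
{
	struct model_iop *iop = malloc(sizeof(*iop));

	if (!iop)
		return NULL;
	/* bitmap_fill() if the page is already uptodate, else bitmap_zero() */
	memset(iop->uptodate, page->uptodate ? 1 : 0, sizeof(iop->uptodate));
	return iop;
}

static bool model_bitmap_full(const struct model_iop *iop)
{
	for (size_t i = 0; i < MODEL_NBITS; i++)
		if (!iop->uptodate[i])
			return false;
	return true;
}

/*
 * Mirrors the new check in iomap_page_release(): the bitmap and the page
 * flag must agree when the structure is freed. Returns that agreement.
 */
static bool model_iop_release(struct model_iop *iop,
			      const struct model_page *page)
{
	bool consistent = model_bitmap_full(iop) == page->uptodate;

	free(iop);
	return consistent;
}
```

Creating the structure for an already-uptodate page now yields a full bitmap, so the release-time check passes instead of flagging a mismatch.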


* Re: [PATCH] iomap: Ensure iop->uptodate matches PageUptodate
  2020-07-26  9:10 [PATCH] iomap: Ensure iop->uptodate matches PageUptodate Matthew Wilcox (Oracle)
@ 2020-07-26 15:15 ` Christoph Hellwig
  2020-07-26 23:06 ` Dave Chinner
  1 sibling, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2020-07-26 15:15 UTC
  To: Matthew Wilcox (Oracle)
  Cc: Christoph Hellwig, Darrick J. Wong, linux-xfs, linux-fsdevel,
	Brian Foster, linux-kernel

On Sun, Jul 26, 2020 at 10:10:52AM +0100, Matthew Wilcox (Oracle) wrote:
> If the filesystem has block size < page size and we end up calling
> iomap_page_create() in iomap_page_mkwrite_actor(), the uptodate bits
> would be zero, which causes us to skip writeback of blocks which are
> !uptodate in iomap_writepage_map().  This can lead to user data loss.
> 
> Found using generic/127 with the THP patches.  I don't think this can be
> reproduced on mainline using that test (the THP code causes iomap_pages
> to be discarded more frequently), but inspection shows it can happen
> with an appropriate series of operations.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH] iomap: Ensure iop->uptodate matches PageUptodate
  2020-07-26  9:10 [PATCH] iomap: Ensure iop->uptodate matches PageUptodate Matthew Wilcox (Oracle)
  2020-07-26 15:15 ` Christoph Hellwig
@ 2020-07-26 23:06 ` Dave Chinner
  2020-07-26 23:20   ` Matthew Wilcox
  1 sibling, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2020-07-26 23:06 UTC
  To: Matthew Wilcox (Oracle)
  Cc: Christoph Hellwig, Darrick J. Wong, linux-xfs, linux-fsdevel,
	Brian Foster, linux-kernel

On Sun, Jul 26, 2020 at 10:10:52AM +0100, Matthew Wilcox (Oracle) wrote:
> If the filesystem has block size < page size and we end up calling
> iomap_page_create() in iomap_page_mkwrite_actor(), the uptodate bits
> would be zero, which causes us to skip writeback of blocks which are
> !uptodate in iomap_writepage_map().  This can lead to user data loss.

I'm still unclear on what condition gets us to
iomap_page_mkwrite_actor() without already having initialised the
page correctly. i.e. via a read() or write() call, or the read fault
prior to ->page_mkwrite() which would have marked the page uptodate
- that operation should have called iomap_page_create() and
iomap_set_range_uptodate() on the page....

i.e. you've described the symptom, but not the cause of the issue
you are addressing.
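For reference, the normal read-path bookkeeping Dave refers to can be modelled roughly as follows: the blocks covering a byte range are marked uptodate, and the page flag is set only once every block in the page is. This is a simplified userspace sketch loosely patterned on iomap_set_range_uptodate(); it drops the spinlock, assumes 512-byte blocks, and the model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_BLKBITS   9u				/* assumed 512-byte blocks */
#define MODEL_NBLOCKS   (MODEL_PAGE_SIZE >> MODEL_BLKBITS)

struct model_page {
	bool page_uptodate;			/* PageUptodate() stand-in */
	bool block_uptodate[MODEL_NBLOCKS];	/* iop->uptodate stand-in */
};

/*
 * Mark the blocks covering [off, off + len) uptodate, and set the page
 * flag only if every other block in the page is already uptodate too.
 */
static void model_set_range_uptodate(struct model_page *page,
				     unsigned int off, unsigned int len)
{
	unsigned int first = off >> MODEL_BLKBITS;
	unsigned int last = (off + len - 1) >> MODEL_BLKBITS;
	bool all = true;

	assert(len > 0 && off + len <= MODEL_PAGE_SIZE);
	for (unsigned int i = 0; i < MODEL_NBLOCKS; i++) {
		if (i >= first && i <= last)
			page->block_uptodate[i] = true;
		else if (!page->block_uptodate[i])
			all = false;
	}
	if (all)
		page->page_uptodate = true;
}
```

Reading the first half and then the second half of the page leaves the page flag clear after the first call and set after the second, which is the state a later ->page_mkwrite() would normally find.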

> Found using generic/127 with the THP patches.  I don't think this can be
> reproduced on mainline using that test (the THP code causes iomap_pages
> to be discarded more frequently), but inspection shows it can happen
> with an appropriate series of operations.

That sequence of operations would be? 

> Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index a2b3b5455219..f0c5027bf33f 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -53,7 +53,10 @@ iomap_page_create(struct inode *inode, struct page *page)
>  	atomic_set(&iop->read_count, 0);
>  	atomic_set(&iop->write_count, 0);
>  	spin_lock_init(&iop->uptodate_lock);
> -	bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
> +	if (PageUptodate(page))
> +		bitmap_fill(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
> +	else
> +		bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);

I suspect this bitmap_fill call belongs in the iomap_page_mkwrite()
code, as it is the only code that can call iomap_page_create() with an
uptodate page. Then iomap_page_create() could just use kzalloc() and
drop the atomic_set() and bitmap_zero() calls altogether.
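Dave's suggested restructuring can be sketched in userspace C as below. This illustrates the idea only and is not the contents of the commit linked later in the thread: calloc() stands in for kzalloc(), so the counters and bitmap start at zero without explicit initialisation, and a hypothetical mkwrite-side helper fills the bitmap when it attaches a new structure to a page that is already uptodate. All model_* names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_NBITS (4096u / 512u)	/* PAGE_SIZE / SECTOR_SIZE, assumed values */

struct model_iop {
	int read_count;				/* left at zero by calloc(), no atomic_set() */
	int write_count;
	unsigned char uptodate[MODEL_NBITS];	/* left at zero, no bitmap_zero() */
};

struct model_page {
	bool uptodate;				/* PageUptodate() stand-in */
	struct model_iop *priv;			/* attached iomap_page, if any */
};

/* Creation: a zeroing allocation, with calloc() standing in for kzalloc(). */
static struct model_iop *model_iop_create(struct model_page *page)
{
	if (!page->priv)
		page->priv = calloc(1, sizeof(*page->priv));
	return page->priv;
}

/*
 * Hypothetical mkwrite-side helper: if it is the one that attaches a new
 * structure to an already-uptodate page, it fills the bitmap itself.
 */
static struct model_iop *model_mkwrite_get_iop(struct model_page *page)
{
	bool existed = page->priv != NULL;
	struct model_iop *iop = model_iop_create(page);

	if (iop && !existed && page->uptodate)
		memset(iop->uptodate, 1, sizeof(iop->uptodate));	/* bitmap_fill() */
	return iop;
}
```

The design point is that only the caller which can legitimately see an uptodate page pays for the fill, while the common creation path stays a plain zeroed allocation.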

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH] iomap: Ensure iop->uptodate matches PageUptodate
  2020-07-26 23:06 ` Dave Chinner
@ 2020-07-26 23:20   ` Matthew Wilcox
  2020-07-26 23:53     ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2020-07-26 23:20 UTC
  To: Dave Chinner
  Cc: Christoph Hellwig, Darrick J. Wong, linux-xfs, linux-fsdevel,
	Brian Foster, linux-kernel

On Mon, Jul 27, 2020 at 09:06:57AM +1000, Dave Chinner wrote:
> On Sun, Jul 26, 2020 at 10:10:52AM +0100, Matthew Wilcox (Oracle) wrote:
> > If the filesystem has block size < page size and we end up calling
> > iomap_page_create() in iomap_page_mkwrite_actor(), the uptodate bits
> > would be zero, which causes us to skip writeback of blocks which are
> > !uptodate in iomap_writepage_map().  This can lead to user data loss.
> 
> I'm still unclear on what condition gets us to
> iomap_page_mkwrite_actor() without already having initialised the
> page correctly. i.e. via a read() or write() call, or the read fault
> prior to ->page_mkwrite() which would have marked the page uptodate
> - that operation should have called iomap_page_create() and
> iomap_set_range_uptodate() on the page....
> 
> i.e. you've described the symptom, but not the cause of the issue
> you are addressing.

I don't know exactly what condition gets us there either.  It must be
possible, or there wouldn't be a call to iomap_page_create() but rather
one to to_iomap_page() like the one in iomap_finish_page_writeback().

> > Found using generic/127 with the THP patches.  I don't think this can be
> > reproduced on mainline using that test (the THP code causes iomap_pages
> > to be discarded more frequently), but inspection shows it can happen
> > with an appropriate series of operations.
> 
> That sequence of operations would be? 
> 
> > Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> >  fs/iomap/buffered-io.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index a2b3b5455219..f0c5027bf33f 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -53,7 +53,10 @@ iomap_page_create(struct inode *inode, struct page *page)
> >  	atomic_set(&iop->read_count, 0);
> >  	atomic_set(&iop->write_count, 0);
> >  	spin_lock_init(&iop->uptodate_lock);
> > -	bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
> > +	if (PageUptodate(page))
> > +		bitmap_fill(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
> > +	else
> > +		bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
> 
> I suspect this bitmap_fill call belongs in the iomap_page_mkwrite()
> code, as it is the only code that can call iomap_page_create() with an
> uptodate page. Then iomap_page_create() could just use kzalloc() and
> drop the atomic_set() and bitmap_zero() calls altogether.

Way ahead of you
http://git.infradead.org/users/willy/pagecache.git/commitdiff/5a1de6fc4f815797caa4a2f37c208c67afd7c20b

* Re: [PATCH] iomap: Ensure iop->uptodate matches PageUptodate
  2020-07-26 23:20   ` Matthew Wilcox
@ 2020-07-26 23:53     ` Dave Chinner
  2020-07-28  9:23       ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2020-07-26 23:53 UTC
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Darrick J. Wong, linux-xfs, linux-fsdevel,
	Brian Foster, linux-kernel

On Mon, Jul 27, 2020 at 12:20:22AM +0100, Matthew Wilcox wrote:
> On Mon, Jul 27, 2020 at 09:06:57AM +1000, Dave Chinner wrote:
> > On Sun, Jul 26, 2020 at 10:10:52AM +0100, Matthew Wilcox (Oracle) wrote:
> > > If the filesystem has block size < page size and we end up calling
> > > iomap_page_create() in iomap_page_mkwrite_actor(), the uptodate bits
> > > would be zero, which causes us to skip writeback of blocks which are
> > > !uptodate in iomap_writepage_map().  This can lead to user data loss.
> > 
> > I'm still unclear on what condition gets us to
> > iomap_page_mkwrite_actor() without already having initialised the
> > page correctly. i.e. via a read() or write() call, or the read fault
> > prior to ->page_mkwrite() which would have marked the page uptodate
> > - that operation should have called iomap_page_create() and
> > iomap_set_range_uptodate() on the page....
> > 
> > i.e. you've described the symptom, but not the cause of the issue
> > you are addressing.
> 
> I don't know exactly what condition gets us there either.  It must be
> possible, or there wouldn't be a call to iomap_page_create() but rather
> one to to_iomap_page() like the one in iomap_finish_page_writeback().

Yes, I understand the code accepts it can happen; what I dislike is
code that asserts subtle behaviour can happen, then doesn't describe
exactly why/how that condition can occur. And then, because we
don't know exactly how something happens, we add workarounds to
hide issues we can't reason through fully. That's .... suboptimal.

Christoph might know off the top of his head how we get into this
state. Once we work it out, then we need to add comments...

> > > reproduced on mainline using that test (the THP code causes iomap_pages
> > > to be discarded more frequently), but inspection shows it can happen
> > > with an appropriate series of operations.
> > 
> > That sequence of operations would be? 
> > 
> > > Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > ---
> > >  fs/iomap/buffered-io.c | 7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > index a2b3b5455219..f0c5027bf33f 100644
> > > --- a/fs/iomap/buffered-io.c
> > > +++ b/fs/iomap/buffered-io.c
> > > @@ -53,7 +53,10 @@ iomap_page_create(struct inode *inode, struct page *page)
> > >  	atomic_set(&iop->read_count, 0);
> > >  	atomic_set(&iop->write_count, 0);
> > >  	spin_lock_init(&iop->uptodate_lock);
> > > -	bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
> > > +	if (PageUptodate(page))
> > > +		bitmap_fill(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
> > > +	else
> > > +		bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
> > 
> > I suspect this bitmap_fill call belongs in the iomap_page_mkwrite()
> > code, as it is the only code that can call iomap_page_create() with an
> > uptodate page. Then iomap_page_create() could just use kzalloc() and
> > drop the atomic_set() and bitmap_zero() calls altogether.
> 
> Way ahead of you
> http://git.infradead.org/users/willy/pagecache.git/commitdiff/5a1de6fc4f815797caa4a2f37c208c67afd7c20b

*nod*

I would suggest breaking that out as a separate cleanup patch and
not hiding it in a patch that contains both THP modifications and bug
fixes. It stands alone as a valid cleanup.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH] iomap: Ensure iop->uptodate matches PageUptodate
  2020-07-26 23:53     ` Dave Chinner
@ 2020-07-28  9:23       ` Christoph Hellwig
  2020-07-28 13:15         ` Matthew Wilcox
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2020-07-28  9:23 UTC
  To: Dave Chinner
  Cc: Matthew Wilcox, Christoph Hellwig, Darrick J. Wong, linux-xfs,
	linux-fsdevel, Brian Foster, linux-kernel

On Mon, Jul 27, 2020 at 09:53:35AM +1000, Dave Chinner wrote:
> Yes, I understand the code accepts it can happen; what I dislike is
> code that asserts subtle behaviour can happen, then doesn't describe
> exactly why/how that condition can occur. And then, because we
> don't know exactly how something happens, we add workarounds to
> hide issues we can't reason through fully. That's .... suboptimal.
> 
> Christoph might know off the top of his head how we get into this
> state. Once we work it out, then we need to add comments...

Unfortunately I don't know offhand.  I'll need to spend some more
quality time with this code first.

> > Way ahead of you
> > http://git.infradead.org/users/willy/pagecache.git/commitdiff/5a1de6fc4f815797caa4a2f37c208c67afd7c20b
> 
> *nod*
> 
> I would suggest breaking that out as a separate cleanup patch and
> not hiding it in a patch that contains both THP modifications and bug
> fixes. It stands alone as a valid cleanup.

I'm pretty sure I already suggested that when it first showed up.

That being said I have another somewhat related thing in this area
that I really want to get done before THP support, and maybe I can
offload it to willy:

Currently we always allocate the iomap_page structure for blocksize
< PAGE_SIZE.  While this was easy to implement and a major improvement
over buffer heads, it is actually quite silly, as we only need it
if we either have sub-page uptodate state or have extent
boundaries in the page.  So what I'd like to do is to only
allocate it in that case.  Doing the allocation lazily should also
help us never allocate one that is marked all uptodate from the start.
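A minimal sketch of the lazy-allocation idea, restricted to the sub-page uptodate case (the extent-boundary condition is not modelled): a whole-page update only sets the page flag, and per-block state is allocated the first time only part of the page becomes uptodate. The geometry, the bitmask representation and the model_* names are assumptions for illustration, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_BLKBITS   9u					/* assumed 512-byte blocks */
#define MODEL_NBLOCKS   (MODEL_PAGE_SIZE >> MODEL_BLKBITS)	/* 8 */
#define MODEL_ALL_MASK  ((1u << MODEL_NBLOCKS) - 1)		/* 0xff */

struct model_iop {
	unsigned int uptodate;		/* bit i set => block i uptodate */
};

struct model_page {
	bool uptodate;			/* PageUptodate() stand-in */
	struct model_iop *iop;		/* stays NULL unless sub-page state exists */
};

/* Returns false only if the lazy allocation fails. */
static bool model_mark_uptodate(struct model_page *page,
				unsigned int off, unsigned int len)
{
	unsigned int first, last, mask;

	assert(len > 0 && off + len <= MODEL_PAGE_SIZE);
	if (page->uptodate)
		return true;			/* nothing left to track */
	if (!page->iop && off == 0 && len == MODEL_PAGE_SIZE) {
		page->uptodate = true;		/* whole page: no allocation needed */
		return true;
	}
	if (!page->iop) {
		page->iop = calloc(1, sizeof(*page->iop));	/* allocate lazily */
		if (!page->iop)
			return false;
	}
	first = off >> MODEL_BLKBITS;
	last = (off + len - 1) >> MODEL_BLKBITS;
	mask = ((1u << (last - first + 1)) - 1) << first;
	page->iop->uptodate |= mask;
	if (page->iop->uptodate == MODEL_ALL_MASK)
		page->uptodate = true;
	return true;
}
```

Marking a whole page uptodate leaves page->iop NULL; marking 1 KiB allocates the structure, and completing the remaining 3 KiB sets the page flag.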

* Re: [PATCH] iomap: Ensure iop->uptodate matches PageUptodate
  2020-07-28  9:23       ` Christoph Hellwig
@ 2020-07-28 13:15         ` Matthew Wilcox
  0 siblings, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2020-07-28 13:15 UTC
  To: Christoph Hellwig
  Cc: Dave Chinner, Darrick J. Wong, linux-xfs, linux-fsdevel,
	Brian Foster, linux-kernel

On Tue, Jul 28, 2020 at 10:23:01AM +0100, Christoph Hellwig wrote:
> On Mon, Jul 27, 2020 at 09:53:35AM +1000, Dave Chinner wrote:
> > Yes, I understand the code accepts it can happen; what I dislike is
> > code that asserts subtle behaviour can happen, then doesn't describe
> > exactly why/how that condition can occur. And then, because we
> > don't know exactly how something happens, we add workarounds to
> > hide issues we can't reason through fully. That's .... suboptimal.
> > 
> > Christoph might know off the top of his head how we get into this
> > state. Once we work it out, then we need to add comments...
> 
> Unfortunately I don't know offhand.  I'll need to spend some more
> quality time with this code first.

The code reads like you had several ideas for how the uptodate array
works, changing your mind as you went along, and it didn't quite get to
a coherent state before it was merged.  For example, there are parts
of the code which think that a clear bit in the uptodate array means
there's a hole in the file, e.g.

fs/iomap/seek.c:page_seek_hole_data() calls iomap_is_partially_uptodate()

but we set the uptodate bits when zeroing the parts of the page which
are covered by holes in iomap_readpage_actor().
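The inconsistency can be shown with a small userspace model: the read side zero-fills a block backed by a hole and marks it uptodate, while the seek side treats an uptodate block as data. This ignores locking, extent types and how the real seek code selects which pages to inspect; model_read_block() and model_seek_sees_data() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_BLOCK_SIZE 512u		/* assumed block size */
#define MODEL_NBLOCKS    8u		/* one 4 KiB page */

struct model_page {
	unsigned char data[MODEL_NBLOCKS * MODEL_BLOCK_SIZE];
	bool block_uptodate[MODEL_NBLOCKS];
};

/*
 * Read side, loosely after iomap_readpage_actor(): a block backed by a hole
 * is zero-filled and then marked uptodate, just like a block read from disk.
 */
static void model_read_block(struct model_page *page, unsigned int blk,
			     bool is_hole)
{
	assert(blk < MODEL_NBLOCKS);
	if (is_hole)
		memset(page->data + blk * MODEL_BLOCK_SIZE, 0, MODEL_BLOCK_SIZE);
	/* (a real read would fill the block from disk here) */
	page->block_uptodate[blk] = true;
}

/*
 * Seek side, loosely after page_seek_hole_data(): a block counts as data
 * when its uptodate bit is set, i.e. a clear bit is read as "hole".
 */
static bool model_seek_sees_data(const struct model_page *page,
				 unsigned int blk)
{
	assert(blk < MODEL_NBLOCKS);
	return page->block_uptodate[blk];
}
```

A block starts out reported as a hole; after model_read_block() zero-fills it, the same block is reported as data even though the file still has a hole there.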

> > > Way ahead of you
> > > http://git.infradead.org/users/willy/pagecache.git/commitdiff/5a1de6fc4f815797caa4a2f37c208c67afd7c20b
> > 
> > *nod*
> > 
> > I would suggest breaking that out as a separate cleanup patch and
> > not hiding it in a patch that contains both THP modifications and bug
> > fixes. It stands alone as a valid cleanup.
> 
> I'm pretty sure I already suggested that when it first showed up.
> 
> That being said I have another somewhat related thing in this area
> that I really want to get done before THP support, and maybe I can
> offload it to willy:
> 
> Currently we always allocate the iomap_page structure for blocksize
> < PAGE_SIZE.  While this was easy to implement and a major improvement
> over buffer heads, it is actually quite silly, as we only need it
> if we either have sub-page uptodate state or have extent
> boundaries in the page.  So what I'd like to do is to only
> allocate it in that case.  Doing the allocation lazily should also
> help us never allocate one that is marked all uptodate from the start.

Hah, I want to do that too, and I was afraid I was going to have to
argue with you about it!

My thinking was to skip the allocation if the page lies entirely within
an iomap extent.  That will let us skip the allocation even for THPs
unless the file is fragmented.
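Willy's proposed test reduces to a range-containment check, sketched below under assumptions: struct model_extent is a hypothetical simplification of the offset and length of a struct iomap, and page_size may be a THP size as well as PAGE_SIZE. The sketch does not guard against 64-bit overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one mapped extent. */
struct model_extent {
	uint64_t offset;	/* file offset of the extent, in bytes */
	uint64_t length;	/* extent length, in bytes */
};

/*
 * True if [pos, pos + page_size) lies entirely inside the extent, in which
 * case (under the idea above) no per-block state would need allocating.
 */
static bool model_page_within_extent(const struct model_extent *ext,
				     uint64_t pos, uint64_t page_size)
{
	assert(page_size > 0);
	return pos >= ext->offset &&
	       pos + page_size <= ext->offset + ext->length;
}
```

A 2 MiB page inside an 8 MiB extent needs no per-block state; the same page against a 1 MiB extent does, which is the fragmented-file case.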

I don't think it needs to get done before THP support; they're pretty
orthogonal.
