* FAILED: patch "[PATCH] Btrfs: don't clean dirty pages during buffered writes" failed to apply to 4.14-stable tree
@ 2018-11-11 20:18 gregkh
  2018-11-12 21:46 ` Chris Mason
  0 siblings, 1 reply; 2+ messages in thread
From: gregkh @ 2018-11-11 20:18 UTC (permalink / raw)
  To: clm, dsterba; +Cc: stable


The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 7703bdd8d23e6ef057af3253958a793ec6066b28 Mon Sep 17 00:00:00 2001
From: Chris Mason <clm@fb.com>
Date: Wed, 20 Jun 2018 07:56:11 -0700
Subject: [PATCH] Btrfs: don't clean dirty pages during buffered writes

During buffered writes, we follow this basic series of steps:

again:
	lock all the pages
	wait for writeback on all the pages
	take the extent range lock
	wait for ordered extents on the whole range
	clean all the pages

	if (copy_from_user_in_atomic() hits a fault) {
		drop our locks
		goto again;
	}

	dirty all the pages
	release all the locks

The extra waiting, cleaning and locking are there to make sure we don't
modify pages that are in flight to the drive after they've been crc'd.

If some of the pages in the range were already dirty when the write
began, and we need to goto again, we create a window where a dirty page
has been cleaned and unlocked.  It may be reclaimed before we're able to
lock it again, which means we'll read the old contents off the drive and
lose any modifications that had been pending writeback.

We don't actually need to clean the pages.  All of the other locking in
place makes sure we don't start IO on the pages, so we can just leave
them dirty for the duration of the write.
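
Schematically, the lost update described above looks like this (a sketch of
the problem sequence, not verbatim kernel code):

	page P is dirty with data pending writeback
	lock P, wait for writeback, take the extent lock, wait for ordered extents
	clean P				<- P no longer looks dirty to reclaim
	copy_from_user_in_atomic() faults
	drop the locks, unlock P	<- window: reclaim can free the clean P
	goto again
	lock P again			<- P may be re-read from disk, old contents
	copy in the user data, dirty P	<- the pending modifications are lost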

Fixes: 73d59314e6ed (the original btrfs merge)
CC: stable@vger.kernel.org # v4.4+
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index d254cf94545f..15b925142793 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -531,6 +531,14 @@ int btrfs_dirty_pages(struct inode *inode, struct page **pages,
 
 	end_of_last_block = start_pos + num_bytes - 1;
 
+	/*
+	 * The pages may have already been dirty, clear out old accounting so
+	 * we can set things up properly
+	 */
+	clear_extent_bit(&BTRFS_I(inode)->io_tree, start_pos, end_of_last_block,
+			 EXTENT_DIRTY | EXTENT_DELALLOC |
+			 EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, 0, 0, cached);
+
 	if (!btrfs_is_free_space_inode(BTRFS_I(inode))) {
 		if (start_pos >= isize &&
 		    !(BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC)) {
@@ -1500,18 +1508,27 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
 		}
 		if (ordered)
 			btrfs_put_ordered_extent(ordered);
-		clear_extent_bit(&inode->io_tree, start_pos, last_pos,
-				 EXTENT_DIRTY | EXTENT_DELALLOC |
-				 EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
-				 0, 0, cached_state);
+
 		*lockstart = start_pos;
 		*lockend = last_pos;
 		ret = 1;
 	}
 
+	/*
+	 * It's possible the pages are dirty right now, but we don't want
+	 * to clean them yet because copy_from_user may catch a page fault
+	 * and we might have to fall back to one page at a time.  If that
+	 * happens, we'll unlock these pages and we'd have a window where
+	 * reclaim could sneak in and drop the once-dirty page on the floor
+	 * without writing it.
+	 *
+	 * We have the pages locked and the extent range locked, so there's
+	 * no way someone can start IO on any dirty pages in this range.
+	 *
+	 * We'll call btrfs_dirty_pages() later on, and that will flip around
+	 * delalloc bits and dirty the pages as required.
+	 */
 	for (i = 0; i < num_pages; i++) {
-		if (clear_page_dirty_for_io(pages[i]))
-			account_page_redirty(pages[i]);
 		set_page_extent_mapped(pages[i]);
 		WARN_ON(!PageLocked(pages[i]));
 	}
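
With both hunks applied, the changelog's sequence loses its "clean all the
pages" step.  Schematically (mirroring the diff above, not verbatim kernel
code):

again:
	lock all the pages
	wait for writeback on all the pages
	take the extent range lock
	wait for ordered extents on the whole range
	(pages are left dirty; no clear_page_dirty_for_io())

	if (copy_from_user_in_atomic() hits a fault) {
		drop our locks		(still-dirty pages remain safe from reclaim)
		goto again;
	}

	btrfs_dirty_pages()	(clears the stale extent bits, then redirties)
	release all the locks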


* Re: FAILED: patch "[PATCH] Btrfs: don't clean dirty pages during buffered writes" failed to apply to 4.14-stable tree
  2018-11-11 20:18 FAILED: patch "[PATCH] Btrfs: don't clean dirty pages during buffered writes" failed to apply to 4.14-stable tree gregkh
@ 2018-11-12 21:46 ` Chris Mason
  0 siblings, 0 replies; 2+ messages in thread
From: Chris Mason @ 2018-11-12 21:46 UTC (permalink / raw)
  To: gregkh; +Cc: dsterba, stable

On 11 Nov 2018, at 12:18, gregkh@linuxfoundation.org wrote:

> The patch below does not apply to the 4.14-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@vger.kernel.org>.

This is a tiny race window that has been there since day one in btrfs.
I'm fine with leaving the fix out of older kernels, but Dave, if you want
this backported, just let me know.

-chris
