From: Chris Mason
To: "gregkh@linuxfoundation.org"
CC: "dsterba@suse.com", "stable@vger.kernel.org"
Subject: Re: FAILED: patch "[PATCH] Btrfs: don't clean dirty pages during buffered writes" failed to apply to 4.14-stable tree
Date: Mon, 12 Nov 2018 21:46:23 +0000
Message-ID: <42748700-6ABF-4308-98F9-86FC4DC642CF@fb.com>
In-Reply-To: <1541967508228252@kroah.com>
References: <1541967508228252@kroah.com>

On 11 Nov 2018, at 12:18, gregkh@linuxfoundation.org wrote:

> The patch below does not apply to the 4.14-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@vger.kernel.org>.

This is a tiny race window that's been broken since day one in btrfs.  I'm
fine with leaving the fix out of older kernels, but Dave, if you want this
backported, just let me know.

-chris

>
> thanks,
>
> greg k-h
>
> ------------------ original commit in Linus's tree ------------------
>
> From 7703bdd8d23e6ef057af3253958a793ec6066b28 Mon Sep 17 00:00:00 2001
> From: Chris Mason
> Date: Wed, 20 Jun 2018 07:56:11 -0700
> Subject: [PATCH] Btrfs: don't clean dirty pages during buffered writes
>
> During buffered writes, we follow this basic series of steps:
>
> again:
>     lock all the pages
>     wait for writeback on all the pages
>     take the extent range lock
>     wait for ordered extents on the whole range
>     clean all the pages
>
>     if (copy_from_user_in_atomic() hits a fault) {
>         drop our locks
>         goto again;
>     }
>
>     dirty all the pages
>     release all the locks
>
> The extra waiting, cleaning and locking are there to make sure we don't
> modify pages in flight to the drive, after they've been crc'd.
>
> If some of the pages in the range were already dirty when the write
> began, and we need to goto again, we create a window where a dirty page
> has been cleaned and unlocked.  It may be reclaimed before we're able to
> lock it again, which means we'll read the old contents off the drive and
> lose any modifications that had been pending writeback.
>
> We don't actually need to clean the pages.  All of the other locking in
> place makes sure we don't start IO on the pages, so we can just leave
> them dirty for the duration of the write.
>
> Fixes: 73d59314e6ed (the original btrfs merge)
> CC: stable@vger.kernel.org # v4.4+
> Signed-off-by: Chris Mason
> Reviewed-by: David Sterba
> Signed-off-by: David Sterba
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index d254cf94545f..15b925142793 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -531,6 +531,14 @@ int btrfs_dirty_pages(struct inode *inode, struct page **pages,
>
>  	end_of_last_block = start_pos + num_bytes - 1;
>
> +	/*
> +	 * The pages may have already been dirty, clear out old accounting so
> +	 * we can set things up properly
> +	 */
> +	clear_extent_bit(&BTRFS_I(inode)->io_tree, start_pos, end_of_last_block,
> +			 EXTENT_DIRTY | EXTENT_DELALLOC |
> +			 EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, 0, 0, cached);
> +
>  	if (!btrfs_is_free_space_inode(BTRFS_I(inode))) {
>  		if (start_pos >= isize &&
>  		    !(BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC)) {
> @@ -1500,18 +1508,27 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
>  		}
>  		if (ordered)
>  			btrfs_put_ordered_extent(ordered);
> -		clear_extent_bit(&inode->io_tree, start_pos, last_pos,
> -				 EXTENT_DIRTY | EXTENT_DELALLOC |
> -				 EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
> -				 0, 0, cached_state);
> +
>  		*lockstart = start_pos;
>  		*lockend = last_pos;
>  		ret = 1;
>  	}
>
> +	/*
> +	 * It's possible the pages are dirty right now, but we don't want
> +	 * to clean them yet because copy_from_user may catch a page fault
> +	 * and we might have to fall back to one page at a time.  If that
> +	 * happens, we'll unlock these pages and we'd have a window where
> +	 * reclaim could sneak in and drop the once-dirty page on the floor
> +	 * without writing it.
> +	 *
> +	 * We have the pages locked and the extent range locked, so there's
> +	 * no way someone can start IO on any dirty pages in this range.
> +	 *
> +	 * We'll call btrfs_dirty_pages() later on, and that will flip around
> +	 * delalloc bits and dirty the pages as required.
> +	 */
>  	for (i = 0; i < num_pages; i++) {
> -		if (clear_page_dirty_for_io(pages[i]))
> -			account_page_redirty(pages[i]);
>  		set_page_extent_mapped(pages[i]);
>  		WARN_ON(!PageLocked(pages[i]));
>  	}
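
For anyone who wants to see the race in isolation, below is a minimal
standalone C sketch of the scenario the commit message describes.  It is an
illustration only, not btrfs code: struct fake_page, maybe_reclaim() and the
two buffered_write_*() helpers are invented names, and "reclaim" here simply
stands in for the VM dropping a clean, unlocked page.

/*
 * Simplified userspace model of the race -- not kernel code.
 * A "page" is just a dirty flag plus contents; maybe_reclaim() models the
 * VM dropping a clean page so the next read of it comes back from disk.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct fake_page {
	bool dirty;            /* PageDirty() analogue */
	char contents[16];     /* modification still pending writeback */
};

static void maybe_reclaim(struct fake_page *page, const char *on_disk)
{
	if (!page->dirty) {
		/* clean pages can be dropped; rereading returns disk data */
		memset(page->contents, 0, sizeof(page->contents));
		strncpy(page->contents, on_disk, sizeof(page->contents) - 1);
	}
}

/* Old flow: clean the page up front, then fault in copy_from_user and retry. */
static void buffered_write_old(struct fake_page *page, const char *on_disk)
{
	page->dirty = false;           /* the clear_page_dirty_for_io() step */
	/* fault -> unlock pages and goto again; page is now clean + unlocked */
	maybe_reclaim(page, on_disk);  /* the window: pending data is lost */
}

/* New flow: leave the page dirty; the locks already keep IO from starting. */
static void buffered_write_new(struct fake_page *page, const char *on_disk)
{
	maybe_reclaim(page, on_disk);  /* no-op: the page is still dirty */
}

int main(void)
{
	const char *on_disk = "old data";
	struct fake_page p = { .dirty = true, .contents = "pending write" };
	struct fake_page q = { .dirty = true, .contents = "pending write" };

	buffered_write_old(&p, on_disk);
	buffered_write_new(&q, on_disk);
	printf("old flow after fault+retry: \"%s\"\n", p.contents);
	printf("new flow after fault+retry: \"%s\"\n", q.contents);
	return 0;
}

Built with any C compiler, the old flow ends up with the stale on-disk
contents while the new flow still has the pending modification, which is
exactly the difference the patch is after.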