[PATCH] Btrfs: fix negative subv_writers counter and data space leak after buffered write

linux-btrfs.vger.kernel.org archive mirror

* [PATCH] Btrfs: fix negative subv_writers counter and data space leak after buffered write
@ 2019-10-09 16:44 fdmanana
  2019-10-11 13:27 ` Josef Bacik
  2019-10-11 15:41 ` [PATCH v2] " fdmanana
  0 siblings, 2 replies; 6+ messages in thread
From: fdmanana @ 2019-10-09 16:44 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When doing a buffered write it's possible to leave the subv_writers
counter of the root, used for synchronization between buffered nocow
writers and snapshotting, with a negative value. This happens in an
exceptional case like the following:

1) We fail to allocate data space for the write, since there's not
   enough available data space nor enough unallocated space for allocating
   a new data block group;

2) Because of that failure, we try to go to NOCOW mode, which succeeds
   and therefore we set the local variable 'only_release_metadata' to true
   and set the root's subv_writers counter to 1 through the call to
   btrfs_start_write_no_snapshotting() made by check_can_nocow();

3) The call to btrfs_copy_from_user() returns zero, which is very unlikely
   to happen but not impossible;

4) No pages are copied because btrfs_copy_from_user() returned zero;

5) We call btrfs_end_write_no_snapshotting() which decrements the root's
   subv_writers counter to 0;

6) We don't set 'only_release_metadata' back to 'false' because we do
   it only if 'copied', the value returned by btrfs_copy_from_user(), is
   greater than zero;

7) On the next iteration of the while loop, which processes the same
   page range, we are now able to allocate data space for the write (we
   got enough data space released in the meanwhile);

8) After this if we fail at btrfs_delalloc_reserve_metadata(), because
   now there isn't enough free metadata space, or in some other place
   further below (prepare_pages(), lock_and_cleanup_extent_if_need(),
   btrfs_dirty_pages()), we break out of the while loop with
   'only_release_metadata' having a value of 'true';

9) Because 'only_release_metadata' is 'true' we end up decrementing the
   root's subv_writers counter to -1, and we also end up not releasing the
   data space previously reserved through btrfs_check_data_free_space().
   As a consequence the mechanism for synchronizing NOCOW buffered writes
   with snapshotting gets broken.

Fix this by always setting 'only_release_metadata' to false whenever it
currently has a true value, independently of having been able to copy any
data to the pages.

Fixes: 8257b2dc3c1a10 ("Btrfs: introduce btrfs_{start, end}_nocow_write() for each subvolume")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/file.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 27e5b269e729..c98c1d10fd3a 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1780,18 +1780,19 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		}
 
 		release_bytes = 0;
-		if (only_release_metadata)
+		if (only_release_metadata) {
 			btrfs_end_write_no_snapshotting(root);
-
-		if (only_release_metadata && copied > 0) {
-			lockstart = round_down(pos,
-					       fs_info->sectorsize);
-			lockend = round_up(pos + copied,
-					   fs_info->sectorsize) - 1;
-
-			set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
-				       lockend, EXTENT_NORESERVE, NULL,
-				       NULL, GFP_NOFS);
+			if (copied > 0) {
+				lockstart = round_down(pos,
+						       fs_info->sectorsize);
+				lockend = round_up(pos + copied,
+						   fs_info->sectorsize) - 1;
+
+				set_extent_bit(&BTRFS_I(inode)->io_tree,
+					       lockstart, lockend,
+					       EXTENT_NORESERVE, NULL, NULL,
+					       GFP_NOFS);
+			}
 			only_release_metadata = false;
 		}
 
-- 
2.11.0



* Re: [PATCH] Btrfs: fix negative subv_writers counter and data space leak after buffered write
  2019-10-09 16:44 [PATCH] Btrfs: fix negative subv_writers counter and data space leak after buffered write fdmanana
@ 2019-10-11 13:27 ` Josef Bacik
  2019-10-11 15:40   ` Filipe Manana
  2019-10-11 15:41 ` [PATCH v2] " fdmanana
  1 sibling, 1 reply; 6+ messages in thread
From: Josef Bacik @ 2019-10-11 13:27 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Wed, Oct 09, 2019 at 05:44:22PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When doing a buffered write it's possible to leave the subv_writers
> counter of the root, used for synchronization between buffered nocow
> writers and snapshotting, with a negative value. This happens in an
> exceptional case like the following:
> 
> 1) We fail to allocate data space for the write, since there's not
>    enough available data space nor enough unallocated space for allocating
>    a new data block group;
> 
> 2) Because of that failure, we try to go to NOCOW mode, which succeeds
>    and therefore we set the local variable 'only_release_metadata' to true
>    and set the root's subv_writers counter to 1 through the call to
>    btrfs_start_write_no_snapshotting() made by check_can_nocow();
> 
> 3) The call to btrfs_copy_from_user() returns zero, which is very unlikely
>    to happen but not impossible;
> 
> 4) No pages are copied because btrfs_copy_from_user() returned zero;
> 
> 5) We call btrfs_end_write_no_snapshotting() which decrements the root's
>    subv_writers counter to 0;
> 
> 6) We don't set 'only_release_metadata' back to 'false' because we do
>    it only if 'copied', the value returned by btrfs_copy_from_user(), is
>    greater than zero;
> 
> 7) On the next iteration of the while loop, which processes the same
>    page range, we are now able to allocate data space for the write (we
>    got enough data space released in the meanwhile);
> 
> 8) After this if we fail at btrfs_delalloc_reserve_metadata(), because
>    now there isn't enough free metadata space, or in some other place
>    further below (prepare_pages(), lock_and_cleanup_extent_if_need(),
>    btrfs_dirty_pages()), we break out of the while loop with
>    'only_release_metadata' having a value of 'true';
> 
> 9) Because 'only_release_metadata' is 'true' we end up decrementing the
>    root's subv_writers counter to -1, and we also end up not releasing the
>    data space previously reserved through btrfs_check_data_free_space().
>    As a consequence the mechanism for synchronizing NOCOW buffered writes
>    with snapshotting gets broken.
> 
> Fix this by always setting 'only_release_metadata' to false whenever it
> currently has a true value, independently of having been able to copy any
> data to the pages.

Can we accomplish the same thing by just doing

only_release_metadata = false;

at the start of the loop?  That way we only ever deal with it in its current
scope?  Thanks,

Josef


* Re: [PATCH] Btrfs: fix negative subv_writers counter and data space leak after buffered write
  2019-10-11 13:27 ` Josef Bacik
@ 2019-10-11 15:40   ` Filipe Manana
  0 siblings, 0 replies; 6+ messages in thread
From: Filipe Manana @ 2019-10-11 15:40 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

On Fri, Oct 11, 2019 at 2:27 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Wed, Oct 09, 2019 at 05:44:22PM +0100, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When doing a buffered write it's possible to leave the subv_writers
> > counter of the root, used for synchronization between buffered nocow
> > writers and snapshotting, with a negative value. This happens in an
> > exceptional case like the following:
> >
> > 1) We fail to allocate data space for the write, since there's not
> >    enough available data space nor enough unallocated space for allocating
> >    a new data block group;
> >
> > 2) Because of that failure, we try to go to NOCOW mode, which succeeds
> >    and therefore we set the local variable 'only_release_metadata' to true
> >    and set the root's subv_writers counter to 1 through the call to
> >    btrfs_start_write_no_snapshotting() made by check_can_nocow();
> >
> > 3) The call to btrfs_copy_from_user() returns zero, which is very unlikely
> >    to happen but not impossible;
> >
> > 4) No pages are copied because btrfs_copy_from_user() returned zero;
> >
> > 5) We call btrfs_end_write_no_snapshotting() which decrements the root's
> >    subv_writers counter to 0;
> >
> > 6) We don't set 'only_release_metadata' back to 'false' because we do
> >    it only if 'copied', the value returned by btrfs_copy_from_user(), is
> >    greater than zero;
> >
> > 7) On the next iteration of the while loop, which processes the same
> >    page range, we are now able to allocate data space for the write (we
> >    got enough data space released in the meanwhile);
> >
> > 8) After this if we fail at btrfs_delalloc_reserve_metadata(), because
> >    now there isn't enough free metadata space, or in some other place
> >    further below (prepare_pages(), lock_and_cleanup_extent_if_need(),
> >    btrfs_dirty_pages()), we break out of the while loop with
> >    'only_release_metadata' having a value of 'true';
> >
> > 9) Because 'only_release_metadata' is 'true' we end up decrementing the
> >    root's subv_writers counter to -1, and we also end up not releasing the
> >    data space previously reserved through btrfs_check_data_free_space().
> >    As a consequence the mechanism for synchronizing NOCOW buffered writes
> >    with snapshotting gets broken.
> >
> > Fix this by always setting 'only_release_metadata' to false whenever it
> > currently has a true value, independently of having been able to copy any
> > data to the pages.
>
> Can we accomplish the same thing by just doing
>
> only_release_metadata = false;
>
> at the start of the loop?  That way we only ever deal with it in its current
> scope?  Thanks,

Yeah, that's probably better. I just preferred to leave it closer to the
last place where it's used.
Thanks.

>
> Josef


* [PATCH v2] Btrfs: fix negative subv_writers counter and data space leak after buffered write
  2019-10-09 16:44 [PATCH] Btrfs: fix negative subv_writers counter and data space leak after buffered write fdmanana
  2019-10-11 13:27 ` Josef Bacik
@ 2019-10-11 15:41 ` fdmanana
  2019-10-11 17:14   ` Josef Bacik
  2019-10-11 18:23   ` David Sterba
  1 sibling, 2 replies; 6+ messages in thread
From: fdmanana @ 2019-10-11 15:41 UTC (permalink / raw)
  To: linux-btrfs; +Cc: josef, Filipe Manana

From: Filipe Manana <fdmanana@suse.com>

When doing a buffered write it's possible to leave the subv_writers
counter of the root, used for synchronization between buffered nocow
writers and snapshotting, with a negative value. This happens in an
exceptional case like the following:

1) We fail to allocate data space for the write, since there's not
   enough available data space nor enough unallocated space for allocating
   a new data block group;

2) Because of that failure, we try to go to NOCOW mode, which succeeds
   and therefore we set the local variable 'only_release_metadata' to true
   and set the root's subv_writers counter to 1 through the call to
   btrfs_start_write_no_snapshotting() made by check_can_nocow();

3) The call to btrfs_copy_from_user() returns zero, which is very unlikely
   to happen but not impossible;

4) No pages are copied because btrfs_copy_from_user() returned zero;

5) We call btrfs_end_write_no_snapshotting() which decrements the root's
   subv_writers counter to 0;

6) We don't set 'only_release_metadata' back to 'false' because we do
   it only if 'copied', the value returned by btrfs_copy_from_user(), is
   greater than zero;

7) On the next iteration of the while loop, which processes the same
   page range, we are now able to allocate data space for the write (we
   got enough data space released in the meanwhile);

8) After this if we fail at btrfs_delalloc_reserve_metadata(), because
   now there isn't enough free metadata space, or in some other place
   further below (prepare_pages(), lock_and_cleanup_extent_if_need(),
   btrfs_dirty_pages()), we break out of the while loop with
   'only_release_metadata' having a value of 'true';

9) Because 'only_release_metadata' is 'true' we end up decrementing the
   root's subv_writers counter to -1 (through a call to
   btrfs_end_write_no_snapshotting()), and we also end up not releasing the
   data space previously reserved through btrfs_check_data_free_space().
   As a consequence the mechanism for synchronizing NOCOW buffered writes
   with snapshotting gets broken.

Fix this by always setting 'only_release_metadata' to false at the start
of each iteration.

Fixes: 8257b2dc3c1a10 ("Btrfs: introduce btrfs_{start, end}_nocow_write() for each subvolume")
Fixes: 7ee9e4405f264e ("Btrfs: check if we can nocow if we don't have data space")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---

V2: Moved the assignment of false to only_release_metadata to the beginning
    of the loop instead. Added another "Fixes:" tag that corresponds to the
    data space leak, since the other is for the counter dropping to -1 bug.

 fs/btrfs/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 27e5b269e729..352928b45d2a 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1636,6 +1636,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 			break;
 		}
 
+		only_release_metadata = false;
 		sector_offset = pos & (fs_info->sectorsize - 1);
 		reserve_bytes = round_up(write_bytes + sector_offset,
 				fs_info->sectorsize);
@@ -1792,7 +1793,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 			set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
 				       lockend, EXTENT_NORESERVE, NULL,
 				       NULL, GFP_NOFS);
-			only_release_metadata = false;
 		}
 
 		btrfs_drop_pages(pages, num_pages);
-- 
2.11.0



* Re: [PATCH v2] Btrfs: fix negative subv_writers counter and data space leak after buffered write
  2019-10-11 15:41 ` [PATCH v2] " fdmanana
@ 2019-10-11 17:14   ` Josef Bacik
  2019-10-11 18:23   ` David Sterba
  1 sibling, 0 replies; 6+ messages in thread
From: Josef Bacik @ 2019-10-11 17:14 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, josef, Filipe Manana

On Fri, Oct 11, 2019 at 04:41:20PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When doing a buffered write it's possible to leave the subv_writers
> counter of the root, used for synchronization between buffered nocow
> writers and snapshotting, with a negative value. This happens in an
> exceptional case like the following:
> 
> 1) We fail to allocate data space for the write, since there's not
>    enough available data space nor enough unallocated space for allocating
>    a new data block group;
> 
> 2) Because of that failure, we try to go to NOCOW mode, which succeeds
>    and therefore we set the local variable 'only_release_metadata' to true
>    and set the root's subv_writers counter to 1 through the call to
>    btrfs_start_write_no_snapshotting() made by check_can_nocow();
> 
> 3) The call to btrfs_copy_from_user() returns zero, which is very unlikely
>    to happen but not impossible;
> 
> 4) No pages are copied because btrfs_copy_from_user() returned zero;
> 
> 5) We call btrfs_end_write_no_snapshotting() which decrements the root's
>    subv_writers counter to 0;
> 
> 6) We don't set 'only_release_metadata' back to 'false' because we do
>    it only if 'copied', the value returned by btrfs_copy_from_user(), is
>    greater than zero;
> 
> 7) On the next iteration of the while loop, which processes the same
>    page range, we are now able to allocate data space for the write (we
>    got enough data space released in the meanwhile);
> 
> 8) After this if we fail at btrfs_delalloc_reserve_metadata(), because
>    now there isn't enough free metadata space, or in some other place
>    further below (prepare_pages(), lock_and_cleanup_extent_if_need(),
>    btrfs_dirty_pages()), we break out of the while loop with
>    'only_release_metadata' having a value of 'true';
> 
> 9) Because 'only_release_metadata' is 'true' we end up decrementing the
>    root's subv_writers counter to -1 (through a call to
>    btrfs_end_write_no_snapshotting()), and we also end up not releasing the
>    data space previously reserved through btrfs_check_data_free_space().
>    As a consequence the mechanism for synchronizing NOCOW buffered writes
>    with snapshotting gets broken.
> 
> Fix this by always setting 'only_release_metadata' to false at the start
> of each iteration.
> 
> Fixes: 8257b2dc3c1a10 ("Btrfs: introduce btrfs_{start, end}_nocow_write() for each subvolume")
> Fixes: 7ee9e4405f264e ("Btrfs: check if we can nocow if we don't have data space")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef


* Re: [PATCH v2] Btrfs: fix negative subv_writers counter and data space leak after buffered write
  2019-10-11 15:41 ` [PATCH v2] " fdmanana
  2019-10-11 17:14   ` Josef Bacik
@ 2019-10-11 18:23   ` David Sterba
  1 sibling, 0 replies; 6+ messages in thread
From: David Sterba @ 2019-10-11 18:23 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, josef, Filipe Manana

On Fri, Oct 11, 2019 at 04:41:20PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When doing a buffered write it's possible to leave the subv_writers
> counter of the root, used for synchronization between buffered nocow
> writers and snapshotting, with a negative value. This happens in an
> exceptional case like the following:
> 
> 1) We fail to allocate data space for the write, since there's not
>    enough available data space nor enough unallocated space for allocating
>    a new data block group;
> 
> 2) Because of that failure, we try to go to NOCOW mode, which succeeds
>    and therefore we set the local variable 'only_release_metadata' to true
>    and set the root's subv_writers counter to 1 through the call to
>    btrfs_start_write_no_snapshotting() made by check_can_nocow();
> 
> 3) The call to btrfs_copy_from_user() returns zero, which is very unlikely
>    to happen but not impossible;
> 
> 4) No pages are copied because btrfs_copy_from_user() returned zero;
> 
> 5) We call btrfs_end_write_no_snapshotting() which decrements the root's
>    subv_writers counter to 0;
> 
> 6) We don't set 'only_release_metadata' back to 'false' because we do
>    it only if 'copied', the value returned by btrfs_copy_from_user(), is
>    greater than zero;
> 
> 7) On the next iteration of the while loop, which processes the same
>    page range, we are now able to allocate data space for the write (we
>    got enough data space released in the meanwhile);
> 
> 8) After this if we fail at btrfs_delalloc_reserve_metadata(), because
>    now there isn't enough free metadata space, or in some other place
>    further below (prepare_pages(), lock_and_cleanup_extent_if_need(),
>    btrfs_dirty_pages()), we break out of the while loop with
>    'only_release_metadata' having a value of 'true';
> 
> 9) Because 'only_release_metadata' is 'true' we end up decrementing the
>    root's subv_writers counter to -1 (through a call to
>    btrfs_end_write_no_snapshotting()), and we also end up not releasing the
>    data space previously reserved through btrfs_check_data_free_space().
>    As a consequence the mechanism for synchronizing NOCOW buffered writes
>    with snapshotting gets broken.
> 
> Fix this by always setting 'only_release_metadata' to false at the start
> of each iteration.
> 
> Fixes: 8257b2dc3c1a10 ("Btrfs: introduce btrfs_{start, end}_nocow_write() for each subvolume")
> Fixes: 7ee9e4405f264e ("Btrfs: check if we can nocow if we don't have data space")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> 
> V2: Moved the assignment of false to only_release_metadata to the beginning
>     of the loop instead. Added another "Fixes:" tag that corresponds to the
>     data space leak, since the other is for the counter dropping to -1 bug.

V2 looks better indeed. I'll add a stable tag but will not queue the
patch for 5.4-rc due to 3) and otherwise low chances to hit the problem.
Thanks.

