All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] btrfs: fix possible free space tree corruption with online conversion
@ 2020-12-10 14:32 Josef Bacik
  2020-12-11 10:22 ` Filipe Manana
  2020-12-15 16:49 ` David Sterba
  0 siblings, 2 replies; 3+ messages in thread
From: Josef Bacik @ 2020-12-10 14:32 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: stable

While running btrfs/011 in a loop I would often ASSERT() while trying to
add a new free space entry that already existed, or get an -EEXIST while
adding a new block to the extent tree, which is another indication of
double allocation.

This occurs because when we do the free space tree population, we create
the new root and then populate the tree and commit the transaction.
The problem is when you create a new root, the root node and commit root
node are the same.  This means that caching a block group before the
transaction is committed can race with other operations modifying the
free space tree, and thus you can get double adds and other sort of
shenanigans.  This is only a problem for the first transaction, once
we've committed the transaction we created the free space tree in we're
OK to use the free space tree to cache block groups.

Fix this by marking the fs_info as unsafe to load the free space tree,
and fall back on the old slow method.  We could be smarter than this,
for example caching the block group while we're populating the free
space tree, but since this is a serious problem I've opted for the
simplest solution.

cc: stable@vger.kernel.org
Fixes: a5ed91828518 ("Btrfs: implement the free space B-tree")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c     | 11 ++++++++++-
 fs/btrfs/ctree.h           |  3 +++
 fs/btrfs/free-space-tree.c | 10 +++++++++-
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 52f2198d44c9..b8bbdd95743e 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -673,7 +673,16 @@ static noinline void caching_thread(struct btrfs_work *work)
 		wake_up(&caching_ctl->wait);
 	}
 
-	if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE))
+	/*
+	 * If we are in the transaction that populated the free space tree we
+	 * can't actually cache from the free space tree as our commit root and
+	 * real root are the same, so we could change the contents of the blocks
+	 * while caching.  Instead do the slow caching in this case, and after
+	 * the transaction has committed we will be safe.
+	 */
+	if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE) &&
+	    !(test_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED,
+		       &fs_info->flags)))
 		ret = load_free_space_tree(caching_ctl);
 	else
 		ret = load_extent_tree_free(caching_ctl);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3935d297d198..4a60d81da5cb 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -562,6 +562,9 @@ enum {
 
 	/* Indicate that we need to cleanup space cache v1 */
 	BTRFS_FS_CLEANUP_SPACE_CACHE_V1,
+
+	/* Indicate that we can't trust the free space tree for caching yet. */
+	BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED,
 };
 
 /*
diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c
index e33a65bd9a0c..a33bca94d133 100644
--- a/fs/btrfs/free-space-tree.c
+++ b/fs/btrfs/free-space-tree.c
@@ -1150,6 +1150,7 @@ int btrfs_create_free_space_tree(struct btrfs_fs_info *fs_info)
 		return PTR_ERR(trans);
 
 	set_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags);
+	set_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags);
 	free_space_root = btrfs_create_tree(trans,
 					    BTRFS_FREE_SPACE_TREE_OBJECTID);
 	if (IS_ERR(free_space_root)) {
@@ -1171,11 +1172,18 @@ int btrfs_create_free_space_tree(struct btrfs_fs_info *fs_info)
 	btrfs_set_fs_compat_ro(fs_info, FREE_SPACE_TREE);
 	btrfs_set_fs_compat_ro(fs_info, FREE_SPACE_TREE_VALID);
 	clear_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags);
+	ret = btrfs_commit_transaction(trans);
 
-	return btrfs_commit_transaction(trans);
+	/*
+	 * Now that we've committed the transaction any reading of our commit
+	 * root will be safe, so we can cache from the free space tree now.
+	 */
+	clear_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags);
+	return ret;
 
 abort:
 	clear_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags);
+	clear_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags);
 	btrfs_abort_transaction(trans, ret);
 	btrfs_end_transaction(trans);
 	return ret;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] btrfs: fix possible free space tree corruption with online conversion
  2020-12-10 14:32 [PATCH v2] btrfs: fix possible free space tree corruption with online conversion Josef Bacik
@ 2020-12-11 10:22 ` Filipe Manana
  2020-12-15 16:49 ` David Sterba
  1 sibling, 0 replies; 3+ messages in thread
From: Filipe Manana @ 2020-12-11 10:22 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team, stable

On Fri, Dec 11, 2020 at 8:56 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> While running btrfs/011 in a loop I would often ASSERT() while trying to
> add a new free space entry that already existed, or get an -EEXIST while
> adding a new block to the extent tree, which is another indication of
> double allocation.
>
> This occurs because when we do the free space tree population, we create
> the new root and then populate the tree and commit the transaction.
> The problem is when you create a new root, the root node and commit root
> node are the same.  This means that caching a block group before the
> transaction is committed can race with other operations modifying the
> free space tree, and thus you can get double adds and other sort of
> shenanigans.  This is only a problem for the first transaction, once
> we've committed the transaction we created the free space tree in we're
> OK to use the free space tree to cache block groups.
>
> Fix this by marking the fs_info as unsafe to load the free space tree,
> and fall back on the old slow method.  We could be smarter than this,
> for example caching the block group while we're populating the free
> space tree, but since this is a serious problem I've opted for the
> simplest solution.
>
> cc: stable@vger.kernel.org
> Fixes: a5ed91828518 ("Btrfs: implement the free space B-tree")
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Looks good, but the explanation you gave in the reply to Nikolay's
comment makes it easier to realize how the problem happens.
I think it should be mentioned, in the changelog, that the operations
that can concurrently modify the free space tree are the insertions
from running delayed references.

Anyway,

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> ---
>  fs/btrfs/block-group.c     | 11 ++++++++++-
>  fs/btrfs/ctree.h           |  3 +++
>  fs/btrfs/free-space-tree.c | 10 +++++++++-
>  3 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 52f2198d44c9..b8bbdd95743e 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -673,7 +673,16 @@ static noinline void caching_thread(struct btrfs_work *work)
>                 wake_up(&caching_ctl->wait);
>         }
>
> -       if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE))
> +       /*
> +        * If we are in the transaction that populated the free space tree we
> +        * can't actually cache from the free space tree as our commit root and
> +        * real root are the same, so we could change the contents of the blocks
> +        * while caching.  Instead do the slow caching in this case, and after
> +        * the transaction has committed we will be safe.
> +        */
> +       if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE) &&
> +           !(test_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED,
> +                      &fs_info->flags)))
>                 ret = load_free_space_tree(caching_ctl);
>         else
>                 ret = load_extent_tree_free(caching_ctl);
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 3935d297d198..4a60d81da5cb 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -562,6 +562,9 @@ enum {
>
>         /* Indicate that we need to cleanup space cache v1 */
>         BTRFS_FS_CLEANUP_SPACE_CACHE_V1,
> +
> +       /* Indicate that we can't trust the free space tree for caching yet. */
> +       BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED,
>  };
>
>  /*
> diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c
> index e33a65bd9a0c..a33bca94d133 100644
> --- a/fs/btrfs/free-space-tree.c
> +++ b/fs/btrfs/free-space-tree.c
> @@ -1150,6 +1150,7 @@ int btrfs_create_free_space_tree(struct btrfs_fs_info *fs_info)
>                 return PTR_ERR(trans);
>
>         set_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags);
> +       set_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags);
>         free_space_root = btrfs_create_tree(trans,
>                                             BTRFS_FREE_SPACE_TREE_OBJECTID);
>         if (IS_ERR(free_space_root)) {
> @@ -1171,11 +1172,18 @@ int btrfs_create_free_space_tree(struct btrfs_fs_info *fs_info)
>         btrfs_set_fs_compat_ro(fs_info, FREE_SPACE_TREE);
>         btrfs_set_fs_compat_ro(fs_info, FREE_SPACE_TREE_VALID);
>         clear_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags);
> +       ret = btrfs_commit_transaction(trans);
>
> -       return btrfs_commit_transaction(trans);
> +       /*
> +        * Now that we've committed the transaction any reading of our commit
> +        * root will be safe, so we can cache from the free space tree now.
> +        */
> +       clear_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags);
> +       return ret;
>
>  abort:
>         clear_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags);
> +       clear_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags);
>         btrfs_abort_transaction(trans, ret);
>         btrfs_end_transaction(trans);
>         return ret;
> --
> 2.26.2
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] btrfs: fix possible free space tree corruption with online conversion
  2020-12-10 14:32 [PATCH v2] btrfs: fix possible free space tree corruption with online conversion Josef Bacik
  2020-12-11 10:22 ` Filipe Manana
@ 2020-12-15 16:49 ` David Sterba
  1 sibling, 0 replies; 3+ messages in thread
From: David Sterba @ 2020-12-15 16:49 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team, stable

On Thu, Dec 10, 2020 at 09:32:31AM -0500, Josef Bacik wrote:
> While running btrfs/011 in a loop I would often ASSERT() while trying to
> add a new free space entry that already existed, or get an -EEXIST while
> adding a new block to the extent tree, which is another indication of
> double allocation.

Do you have the stack traces? I'll update the changelog if you send it.

> This occurs because when we do the free space tree population, we create
> the new root and then populate the tree and commit the transaction.
> The problem is when you create a new root, the root node and commit root
> node are the same.  This means that caching a block group before the
> transaction is committed can race with other operations modifying the
> free space tree, and thus you can get double adds and other sort of
> shenanigans.  This is only a problem for the first transaction, once
> we've committed the transaction we created the free space tree in we're
> OK to use the free space tree to cache block groups.
> 
> Fix this by marking the fs_info as unsafe to load the free space tree,
> and fall back on the old slow method.  We could be smarter than this,
> for example caching the block group while we're populating the free
> space tree, but since this is a serious problem I've opted for the
> simplest solution.

Makes sense, this is a one-time thing during setup.

> cc: stable@vger.kernel.org

CC: stable@vger.kernel.org

If you send patch with the tag it's good to also note the minimal
version where it's relevant as you can easily reuse the knowledge from
developing the fix. The patch may not apply due to other changes, but
trivial context fixups are done by people interested in the backports.

> Fixes: a5ed91828518 ("Btrfs: implement the free space B-tree")

So if you know the offending commit, run

  $ git describe --contains a5ed91828518
  v4.5-rc1~21^2~22^2^2~3

which is 4.9.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-12-15 16:52 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-10 14:32 [PATCH v2] btrfs: fix possible free space tree corruption with online conversion Josef Bacik
2020-12-11 10:22 ` Filipe Manana
2020-12-15 16:49 ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.