linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] btrfs: free space tree mounting fixes
@ 2020-09-03 20:33 Boris Burkov
  2020-09-03 20:33 ` [PATCH 1/2] btrfs: support remount of ro fs with free space tree Boris Burkov
  2020-09-03 20:33 ` [PATCH 2/2] btrfs: remove free space items when creating " Boris Burkov
  0 siblings, 2 replies; 6+ messages in thread
From: Boris Burkov @ 2020-09-03 20:33 UTC (permalink / raw)
  To: Dave Sterba, Josef Bacik, Chris Mason
  Cc: Boris Burkov, linux-btrfs, kernel-team

A couple fixes for issues with mounting the btrfs free space tree
(aka space_cache v2). These are not dependent, and are only related
loosely, in that they both apply to mounting the file system with
the free space tree.

The first patch fixes -o remount,space_cache=v2.

The second patch fixes the slight oversight of not cleaning up the
space cache free space object or free space inodes when migrating to
the free space tree.

Boris Burkov (2):
  btrfs: support remount of ro fs with free space tree
  btrfs: remove free space items when creating free space tree

 fs/btrfs/block-group.c      | 42 ++++---------------------------
 fs/btrfs/free-space-cache.c | 49 ++++++++++++++++++++++++++++++++++++-
 fs/btrfs/free-space-cache.h |  2 ++
 fs/btrfs/free-space-tree.c  |  3 +++
 fs/btrfs/super.c            | 17 +++++++++++++
 5 files changed, 75 insertions(+), 38 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] btrfs: support remount of ro fs with free space tree
  2020-09-03 20:33 [PATCH 0/2] btrfs: free space tree mounting fixes Boris Burkov
@ 2020-09-03 20:33 ` Boris Burkov
  2020-09-03 21:40   ` Josef Bacik
  2020-09-03 20:33 ` [PATCH 2/2] btrfs: remove free space items when creating " Boris Burkov
  1 sibling, 1 reply; 6+ messages in thread
From: Boris Burkov @ 2020-09-03 20:33 UTC (permalink / raw)
  To: Dave Sterba, Josef Bacik, Chris Mason
  Cc: Boris Burkov, linux-btrfs, kernel-team

When a user attempts to remount a btrfs filesystem with
'mount -o remount,space_cache=v2', that operation succeeds.
Unfortunately, this is misleading, because the remount does not create
the free space tree. /proc/mounts will incorrectly show space_cache=v2,
but on the next mount, the file system will revert to the old
space_cache.

For now, we handle only the easier case, where the existing mount is
read only. In that case, we can create the free space tree without
contending with the block groups changing as we go. If it is not read
only, we fail more explicitly so the user knows that the remount was not
successful, and we don't end up in a state where /proc/mounts is giving
misleading information.

References: https://github.com/btrfs/btrfs-todo/issues/5
Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/super.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 3ebe7240c63d..cbb2cdb0b488 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -47,6 +47,7 @@
 #include "tests/btrfs-tests.h"
 #include "block-group.h"
 #include "discard.h"
+#include "free-space-tree.h"
 
 #include "qgroup.h"
 #define CREATE_TRACE_POINTS
@@ -1862,6 +1863,22 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 	btrfs_resize_thread_pool(fs_info,
 		fs_info->thread_pool_size, old_thread_pool_size);
 
+	if (btrfs_test_opt(fs_info, FREE_SPACE_TREE) &&
+	    !btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) {
+		if (!sb_rdonly(sb)) {
+			btrfs_warn(fs_info, "cannot create free space tree on writeable mount");
+			ret = -EINVAL;
+			goto restore;
+		}
+		btrfs_info(fs_info, "creating free space tree");
+		ret = btrfs_create_free_space_tree(fs_info);
+		if (ret) {
+			btrfs_warn(fs_info,
+				   "failed to create free space tree: %d", ret);
+			goto restore;
+		}
+	}
+
 	if ((bool)(*flags & SB_RDONLY) == sb_rdonly(sb))
 		goto out;
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] btrfs: remove free space items when creating free space tree
  2020-09-03 20:33 [PATCH 0/2] btrfs: free space tree mounting fixes Boris Burkov
  2020-09-03 20:33 ` [PATCH 1/2] btrfs: support remount of ro fs with free space tree Boris Burkov
@ 2020-09-03 20:33 ` Boris Burkov
  2020-09-03 21:43   ` Josef Bacik
  1 sibling, 1 reply; 6+ messages in thread
From: Boris Burkov @ 2020-09-03 20:33 UTC (permalink / raw)
  To: Dave Sterba, Josef Bacik, Chris Mason
  Cc: Boris Burkov, linux-btrfs, kernel-team

When the file system transitions from space cache v1 to v2 it removes
the old cached data, but does not remove the FREE_SPACE items nor the
free space inodes they point to. This doesn't cause any issues besides
being a bit inefficient, since these items no longer do anything useful.

To fix it, as part of populating the free space tree, destroy each block
group's free space item and free space inode. This code is lifted from
the existing code for removing them when removing the block group.

Furthermore, cache_save_setup is called unconditionally from transaction
commit on dirty block groups, so we must also stop creating these items
when we are not using SPACE_CACHE.

References: https://github.com/btrfs/btrfs-todo/issues/5
Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/block-group.c      | 42 ++++---------------------------
 fs/btrfs/free-space-cache.c | 49 ++++++++++++++++++++++++++++++++++++-
 fs/btrfs/free-space-cache.h |  2 ++
 fs/btrfs/free-space-tree.c  |  3 +++
 4 files changed, 58 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 01e8ba1da1d3..d9cda5696649 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -892,8 +892,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	struct btrfs_path *path;
 	struct btrfs_block_group *block_group;
 	struct btrfs_free_cluster *cluster;
-	struct btrfs_root *tree_root = fs_info->tree_root;
-	struct btrfs_key key;
 	struct inode *inode;
 	struct kobject *kobj = NULL;
 	int ret;
@@ -971,42 +969,9 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	spin_unlock(&trans->transaction->dirty_bgs_lock);
 	mutex_unlock(&trans->transaction->cache_write_mutex);
 
-	if (!IS_ERR(inode)) {
-		ret = btrfs_orphan_add(trans, BTRFS_I(inode));
-		if (ret) {
-			btrfs_add_delayed_iput(inode);
-			goto out;
-		}
-		clear_nlink(inode);
-		/* One for the block groups ref */
-		spin_lock(&block_group->lock);
-		if (block_group->iref) {
-			block_group->iref = 0;
-			block_group->inode = NULL;
-			spin_unlock(&block_group->lock);
-			iput(inode);
-		} else {
-			spin_unlock(&block_group->lock);
-		}
-		/* One for our lookup ref */
-		btrfs_add_delayed_iput(inode);
-	}
-
-	key.objectid = BTRFS_FREE_SPACE_OBJECTID;
-	key.type = 0;
-	key.offset = block_group->start;
-
-	ret = btrfs_search_slot(trans, tree_root, &key, path, -1, 1);
-	if (ret < 0)
+	ret = remove_free_space_inode(trans, block_group);
+	if (ret)
 		goto out;
-	if (ret > 0)
-		btrfs_release_path(path);
-	if (ret == 0) {
-		ret = btrfs_del_item(trans, tree_root, path);
-		if (ret)
-			goto out;
-		btrfs_release_path(path);
-	}
 
 	spin_lock(&fs_info->block_group_cache_lock);
 	rb_erase(&block_group->cache_node,
@@ -2343,6 +2308,9 @@ static int cache_save_setup(struct btrfs_block_group *block_group,
 	int retries = 0;
 	int ret = 0;
 
+	if (!btrfs_test_opt(fs_info, SPACE_CACHE))
+		return 0;
+
 	/*
 	 * If this block group is smaller than 100 megs don't bother caching the
 	 * block group.
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 8759f5a1d6a0..52612d99a842 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -207,6 +207,54 @@ int create_free_space_inode(struct btrfs_trans_handle *trans,
 					 ino, block_group->start);
 }
 
+int remove_free_space_inode(struct btrfs_trans_handle *trans,
+			    struct btrfs_block_group *block_group)
+{
+	struct btrfs_path *path;
+	struct inode *inode;
+	struct btrfs_key key;
+	int ret = 0;
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	inode = lookup_free_space_inode(block_group, path);
+	if (IS_ERR(inode)) {
+		if (PTR_ERR(inode) != -ENOENT)
+			ret = PTR_ERR(inode);
+		goto out;
+	}
+	ret = btrfs_orphan_add(trans, BTRFS_I(inode));
+	if (ret) {
+		btrfs_add_delayed_iput(inode);
+		goto out;
+	}
+	clear_nlink(inode);
+	/* One for the block groups ref */
+	spin_lock(&block_group->lock);
+	if (block_group->iref) {
+		block_group->iref = 0;
+		block_group->inode = NULL;
+		spin_unlock(&block_group->lock);
+		iput(inode);
+	} else {
+		spin_unlock(&block_group->lock);
+	}
+	/* One for our lookup ref */
+	btrfs_add_delayed_iput(inode);
+
+	key.objectid = BTRFS_FREE_SPACE_OBJECTID;
+	key.type = 0;
+	key.offset = block_group->start;
+	ret = btrfs_search_slot(trans, trans->fs_info->tree_root, &key, path, -1, 1);
+	if (ret != 0)
+		goto out;
+	ret = btrfs_del_item(trans, trans->fs_info->tree_root, path);
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
 int btrfs_check_trunc_cache_free_space(struct btrfs_fs_info *fs_info,
 				       struct btrfs_block_rsv *rsv)
 {
@@ -2806,7 +2854,6 @@ void btrfs_remove_free_space_cache(struct btrfs_block_group *block_group)
 	__btrfs_remove_free_space_cache_locked(ctl);
 	btrfs_discard_update_discardable(block_group, ctl);
 	spin_unlock(&ctl->tree_lock);
-
 }
 
 /**
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index e3d5e0ad8f8e..f75302efc025 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -84,6 +84,8 @@ struct inode *lookup_free_space_inode(struct btrfs_block_group *block_group,
 int create_free_space_inode(struct btrfs_trans_handle *trans,
 			    struct btrfs_block_group *block_group,
 			    struct btrfs_path *path);
+int remove_free_space_inode(struct btrfs_trans_handle *trans,
+			    struct btrfs_block_group *block_group);
 
 int btrfs_check_trunc_cache_free_space(struct btrfs_fs_info *fs_info,
 				       struct btrfs_block_rsv *rsv);
diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c
index 6b9faf3b0e96..30943d585854 100644
--- a/fs/btrfs/free-space-tree.c
+++ b/fs/btrfs/free-space-tree.c
@@ -1165,6 +1165,9 @@ int btrfs_create_free_space_tree(struct btrfs_fs_info *fs_info)
 		block_group = rb_entry(node, struct btrfs_block_group,
 				       cache_node);
 		ret = populate_free_space_tree(trans, block_group);
+		if (ret)
+			goto abort;
+		ret = remove_free_space_inode(trans, block_group);
 		if (ret)
 			goto abort;
 		node = rb_next(node);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] btrfs: support remount of ro fs with free space tree
  2020-09-03 20:33 ` [PATCH 1/2] btrfs: support remount of ro fs with free space tree Boris Burkov
@ 2020-09-03 21:40   ` Josef Bacik
  2020-09-03 23:34     ` Boris Burkov
  0 siblings, 1 reply; 6+ messages in thread
From: Josef Bacik @ 2020-09-03 21:40 UTC (permalink / raw)
  To: Boris Burkov, Dave Sterba, Chris Mason; +Cc: linux-btrfs, kernel-team

On 9/3/20 4:33 PM, Boris Burkov wrote:
> When a user attempts to remount a btrfs filesystem with
> 'mount -o remount,space_cache=v2', that operation succeeds.
> Unfortunately, this is misleading, because the remount does not create
> the free space tree. /proc/mounts will incorrectly show space_cache=v2,
> but on the next mount, the file system will revert to the old
> space_cache.
> 
> For now, we handle only the easier case, where the existing mount is
> read only. In that case, we can create the free space tree without
> contending with the block groups changing as we go. If it is not read
> only, we fail more explicitly so the user knows that the remount was not
> successful, and we don't end up in a state where /proc/mounts is giving
> misleading information.
> 
> References: https://github.com/btrfs/btrfs-todo/issues/5
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
>   fs/btrfs/super.c | 17 +++++++++++++++++
>   1 file changed, 17 insertions(+)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 3ebe7240c63d..cbb2cdb0b488 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -47,6 +47,7 @@
>   #include "tests/btrfs-tests.h"
>   #include "block-group.h"
>   #include "discard.h"
> +#include "free-space-tree.h"
>   
>   #include "qgroup.h"
>   #define CREATE_TRACE_POINTS
> @@ -1862,6 +1863,22 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
>   	btrfs_resize_thread_pool(fs_info,
>   		fs_info->thread_pool_size, old_thread_pool_size);
>   
> +	if (btrfs_test_opt(fs_info, FREE_SPACE_TREE) &&
> +	    !btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) {
> +		if (!sb_rdonly(sb)) {
> +			btrfs_warn(fs_info, "cannot create free space tree on writeable mount");
> +			ret = -EINVAL;
> +			goto restore;
> +		}
> +		btrfs_info(fs_info, "creating free space tree");
> +		ret = btrfs_create_free_space_tree(fs_info);
> +		if (ret) {
> +			btrfs_warn(fs_info,
> +				   "failed to create free space tree: %d", ret);
> +			goto restore;
> +		}
> +	}
> +

This whole chunk actually needs to be moved down into the

} else {
	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {

bit, and after all the checks to make sure it's ok to flip RW, but _before_ we 
do btrfs_cleanup_roots.

Also put a comment there saying something like

/*
  * Don't do anything that writes above this, otherwise bad things will happen.
  */

So that nobody accidentally messes this up in the future.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] btrfs: remove free space items when creating free space tree
  2020-09-03 20:33 ` [PATCH 2/2] btrfs: remove free space items when creating " Boris Burkov
@ 2020-09-03 21:43   ` Josef Bacik
  0 siblings, 0 replies; 6+ messages in thread
From: Josef Bacik @ 2020-09-03 21:43 UTC (permalink / raw)
  To: Boris Burkov, Dave Sterba, Chris Mason; +Cc: linux-btrfs, kernel-team

On 9/3/20 4:33 PM, Boris Burkov wrote:
> When the file system transitions from space cache v1 to v2 it removes
> the old cached data, but does not remove the FREE_SPACE items nor the
> free space inodes they point to. This doesn't cause any issues besides
> being a bit inefficient, since these items no longer do anything useful.
> 
> To fix it, as part of populating the free space tree, destroy each block
> group's free space item and free space inode. This code is lifted from
> the existing code for removing them when removing the block group.
> 
> Furthermore, cache_save_setup is called unconditionally from transaction
> commit on dirty block groups, so we must also stop creating these items
> when we are not using SPACE_CACHE.
> 
> References: https://github.com/btrfs/btrfs-todo/issues/5
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
>   fs/btrfs/block-group.c      | 42 ++++---------------------------
>   fs/btrfs/free-space-cache.c | 49 ++++++++++++++++++++++++++++++++++++-
>   fs/btrfs/free-space-cache.h |  2 ++
>   fs/btrfs/free-space-tree.c  |  3 +++
>   4 files changed, 58 insertions(+), 38 deletions(-)
> 
<snip>

>   
> +	if (!btrfs_test_opt(fs_info, SPACE_CACHE))
> +		return 0;
> +

This is functionally unrelated, so it needs to be it's own patch.

>   	/*
>   	 * If this block group is smaller than 100 megs don't bother caching the
>   	 * block group.
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index 8759f5a1d6a0..52612d99a842 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -207,6 +207,54 @@ int create_free_space_inode(struct btrfs_trans_handle *trans,
>   					 ino, block_group->start);
>   }
>   
> +int remove_free_space_inode(struct btrfs_trans_handle *trans,
> +			    struct btrfs_block_group *block_group)
> +{

It's public, lets call this btrfs_remove_free_space_inode().

<snip>

> @@ -2806,7 +2854,6 @@ void btrfs_remove_free_space_cache(struct btrfs_block_group *block_group)
>   	__btrfs_remove_free_space_cache_locked(ctl);
>   	btrfs_discard_update_discardable(block_group, ctl);
>   	spin_unlock(&ctl->tree_lock);
> -
>   }

Thou shall not change random whitespace that's not anywhere near the code you're 
modifying.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] btrfs: support remount of ro fs with free space tree
  2020-09-03 21:40   ` Josef Bacik
@ 2020-09-03 23:34     ` Boris Burkov
  0 siblings, 0 replies; 6+ messages in thread
From: Boris Burkov @ 2020-09-03 23:34 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Dave Sterba, Chris Mason, linux-btrfs, kernel-team

On Thu, Sep 03, 2020 at 05:40:39PM -0400, Josef Bacik wrote:
> On 9/3/20 4:33 PM, Boris Burkov wrote:
> >When a user attempts to remount a btrfs filesystem with
> >'mount -o remount,space_cache=v2', that operation succeeds.
> >Unfortunately, this is misleading, because the remount does not create
> >the free space tree. /proc/mounts will incorrectly show space_cache=v2,
> >but on the next mount, the file system will revert to the old
> >space_cache.
> >
> >For now, we handle only the easier case, where the existing mount is
> >read only. In that case, we can create the free space tree without
> >contending with the block groups changing as we go. If it is not read
> >only, we fail more explicitly so the user knows that the remount was not
> >successful, and we don't end up in a state where /proc/mounts is giving
> >misleading information.
> >
> >References: https://github.com/btrfs/btrfs-todo/issues/5
> >Signed-off-by: Boris Burkov <boris@bur.io>
> >---
> >  fs/btrfs/super.c | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> >
> >diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> >index 3ebe7240c63d..cbb2cdb0b488 100644
> >--- a/fs/btrfs/super.c
> >+++ b/fs/btrfs/super.c
> >@@ -47,6 +47,7 @@
> >  #include "tests/btrfs-tests.h"
> >  #include "block-group.h"
> >  #include "discard.h"
> >+#include "free-space-tree.h"
> >  #include "qgroup.h"
> >  #define CREATE_TRACE_POINTS
> >@@ -1862,6 +1863,22 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
> >  	btrfs_resize_thread_pool(fs_info,
> >  		fs_info->thread_pool_size, old_thread_pool_size);
> >+	if (btrfs_test_opt(fs_info, FREE_SPACE_TREE) &&
> >+	    !btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) {
> >+		if (!sb_rdonly(sb)) {
> >+			btrfs_warn(fs_info, "cannot create free space tree on writeable mount");
> >+			ret = -EINVAL;
> >+			goto restore;
> >+		}
> >+		btrfs_info(fs_info, "creating free space tree");
> >+		ret = btrfs_create_free_space_tree(fs_info);
> >+		if (ret) {
> >+			btrfs_warn(fs_info,
> >+				   "failed to create free space tree: %d", ret);
> >+			goto restore;
> >+		}
> >+	}
> >+
> 
> This whole chunk actually needs to be moved down into the
> 
> } else {
> 	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
> 
> bit, and after all the checks to make sure it's ok to flip RW, but
> _before_ we do btrfs_cleanup_roots.
> 
> Also put a comment there saying something like
> 
> /*
>  * Don't do anything that writes above this, otherwise bad things will happen.
>  */
> 
> So that nobody accidentally messes this up in the future.  Thanks,
> 
> Josef

This makes sense, since we need to be able to write to create the tree.
My mistake. With that said, do you think I should keep the logic that
makes remount explicitly fail when -o space_cache=v2 won't actually be
honored?

e.g.:
- fs is rw
- fs is ro, remount is ro

I think that loudly failing in these cases makes the user experience
quite a bit better, but it's not as simple as just throwing the create
code in the appropriate block for the ro->rw transition case.

Thanks for the review,
Boris

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-09-03 23:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-03 20:33 [PATCH 0/2] btrfs: free space tree mounting fixes Boris Burkov
2020-09-03 20:33 ` [PATCH 1/2] btrfs: support remount of ro fs with free space tree Boris Burkov
2020-09-03 21:40   ` Josef Bacik
2020-09-03 23:34     ` Boris Burkov
2020-09-03 20:33 ` [PATCH 2/2] btrfs: remove free space items when creating " Boris Burkov
2020-09-03 21:43   ` Josef Bacik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).