* [PATCH 0/7] mkfs: Remove temporary chunks
From: Qu Wenruo @ 2015-07-07  8:15 UTC
  To: linux-btrfs; +Cc: dsterba

mkfs has a long-standing problem: it always creates temporary chunks.

Normally this is OK and does no harm.
But if someone creates a btrfs with RAID0 data/metadata, mounts it, and
immediately runs a balance (without extra options), the data and system
chunk profiles will magically turn to SINGLE.

The detailed bug and fix can be found in the last patch.

This patchset fixes it in a quick way:
just remove the temporary chunks at the end of mkfs.

But IMHO the fix is neither elegant nor perfect.
1) Too much code
   About 500+ lines to fix the problem,
   although most of them are missing infrastructure to remove things.

2) It causes holes in devices.
   The removed temporary chunks will leave holes of about 18 megabytes
   at the beginning of device 1.

TODO:
The perfect solution would be something like below:
0) Create a bare-bones in-memory fs_info.
   This allows us to use the btrfs infrastructure.

1) Check all devices and add them into in-memory structures.
   Maybe fs_info->devices.

2) Create the system chunk first.
   We now have all devices, so we can create one perfect system chunk.

3) Create the chunk root in the created system chunk.
   Also set up things like avail_alloc_bits.

4) Create the rest of the tree roots.
   Metadata chunks will be allocated on demand.

5) Create the data chunk,
   even though it will stay empty through the whole mkfs.

But the perfect solution needs a lot of code changes, as the whole
workflow is changed.

So I took the current fix: it's not perfect, but it's the most practical one.

Already rebased to David's devel branch.

Qu Wenruo (7):
  btrfs-progs: disk-io: Support commit transaction on chunk tree.
  btrfs-progs: extent-tree: Introduce free_block_group_item function.
  btrfs-progs: extent-tree: Introduce functions to free dev extents in a
    chunk
  btrfs-progs: extent-tree: Introduce functions to free chunk items
  btrfs-progs: extent-tree: Introduce functions to free in-memory block
    group cache
  btrfs-progs: extent-tree: Introduce btrfs_free_block_group function.
  btrfs-progs: mkfs: Cleanup temporary chunk to avoid strange balance
    behavior.

 ctree.h       |   2 +
 disk-io.c     |   2 +
 extent-tree.c | 376 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mkfs.c        | 150 +++++++++++++++++++++++
 4 files changed, 530 insertions(+)

-- 
2.4.4



* [PATCH 1/7] btrfs-progs: disk-io: Support commit transaction on chunk tree.
From: Qu Wenruo @ 2015-07-07  8:15 UTC
  To: linux-btrfs; +Cc: dsterba

As the chunk tree root is only stored in the super block, a chunk tree
commit doesn't need to go through the tree root update.
Otherwise a BUG_ON will be triggered.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 disk-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/disk-io.c b/disk-io.c
index 6a53843..fdcfd6d 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -571,6 +571,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 		goto commit_tree;
 	if (root == root->fs_info->tree_root)
 		goto commit_tree;
+	if (root == root->fs_info->chunk_root)
+		goto commit_tree;
 
 	free_extent_buffer(root->commit_root);
 	root->commit_root = NULL;
-- 
2.4.4



* [PATCH 2/7] btrfs-progs: extent-tree: Introduce free_block_group_item function.
From: Qu Wenruo @ 2015-07-07  8:15 UTC
  To: linux-btrfs; +Cc: dsterba

This function is used to free a block group item.
It must be called with all the space in the block group pinned.
Otherwise tree blocks may be allocated into the range.

The function is used by the later block group/chunk free function.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 extent-tree.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/extent-tree.c b/extent-tree.c
index c24af6a..d002a4f 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -3448,6 +3448,42 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 }
 
 /*
+ * Just remove a block group item from the extent tree.
+ * Caller should ensure the block group is empty and all space is pinned.
+ * Otherwise new tree blocks/data may be allocated into it.
+ */
+static int free_block_group_item(struct btrfs_trans_handle *trans,
+				 struct btrfs_fs_info *fs_info,
+				 u64 bytenr, u64 len)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct btrfs_root *root = fs_info->extent_root;
+	int ret = 0;
+
+	key.objectid = bytenr;
+	key.offset = len;
+	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
+	if (ret > 0) {
+		ret = -ENOENT;
+		goto out;
+	}
+	if (ret < 0)
+		goto out;
+
+	ret = btrfs_del_item(trans, root, path);
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+/*
  * Fixup block accounting. The initial block accounting created by
  * make_block_groups isn't accuracy in this case.
  */
-- 
2.4.4



* [PATCH 3/7] btrfs-progs: extent-tree: Introduce functions to free dev extents in a chunk
From: Qu Wenruo @ 2015-07-07  8:15 UTC
  To: linux-btrfs; +Cc: dsterba

Introduce two functions, free_dev_extent_item and
free_chunk_dev_extent_items, to free dev extent items in a chunk.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 extent-tree.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/extent-tree.c b/extent-tree.c
index d002a4f..afc7822 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -3483,6 +3483,80 @@ out:
 	return ret;
 }
 
+static int free_dev_extent_item(struct btrfs_trans_handle *trans,
+				struct btrfs_fs_info *fs_info,
+				u64 devid, u64 dev_offset)
+{
+	struct btrfs_root *root = fs_info->dev_root;
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	int ret;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+
+	key.objectid = devid;
+	key.type = BTRFS_DEV_EXTENT_KEY;
+	key.offset = dev_offset;
+
+	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
+	if (ret < 0)
+		goto out;
+	if (ret > 0) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	ret = btrfs_del_item(trans, root, path);
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+static int free_chunk_dev_extent_items(struct btrfs_trans_handle *trans,
+				       struct btrfs_fs_info *fs_info,
+				       u64 chunk_offset)
+{
+	struct btrfs_chunk *chunk = NULL;
+	struct btrfs_root *root = fs_info->chunk_root;
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	u16 num_stripes;
+	int i;
+	int ret;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+	key.type = BTRFS_CHUNK_ITEM_KEY;
+	key.offset = chunk_offset;
+
+	ret = btrfs_search_slot(trans, root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+	if (ret > 0) {
+		ret = -ENOENT;
+		goto out;
+	}
+	chunk = btrfs_item_ptr(path->nodes[0], path->slots[0],
+			       struct btrfs_chunk);
+	num_stripes = btrfs_chunk_num_stripes(path->nodes[0], chunk);
+	for (i = 0; i < num_stripes; i++) {
+		ret = free_dev_extent_item(trans, fs_info,
+			btrfs_stripe_devid_nr(path->nodes[0], chunk, i),
+			btrfs_stripe_offset_nr(path->nodes[0], chunk, i));
+		if (ret < 0)
+			goto out;
+	}
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
 /*
  * Fixup block accounting. The initial block accounting created by
  * make_block_groups isn't accuracy in this case.
-- 
2.4.4



* [PATCH 4/7] btrfs-progs: extent-tree: Introduce functions to free chunk items
From: Qu Wenruo @ 2015-07-07  8:15 UTC
  To: linux-btrfs; +Cc: dsterba

Introduce two functions, free_chunk_item and free_system_chunk_item.
The former frees a chunk item in the chunk tree.
The latter frees a system chunk item in the super block.

They are used by the later chunk/block group free function.
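
For illustration, free_system_chunk_item treats sys_chunk_array as a
packed run of variable-length (key, chunk) records and deletes one by
sliding the tail down with memmove. A minimal standalone sketch of that
technique follows. struct rec_hdr, packed_remove and the one-byte length
field are hypothetical, chosen only for the example; this is not the
btrfs on-disk format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical record layout (NOT the btrfs on-disk format):
 * a 2-byte header followed by `len` payload bytes.
 */
struct rec_hdr {
	uint8_t id;	/* stands in for the disk key */
	uint8_t len;	/* payload length, stands in for the stripe count */
};

/*
 * Remove the first record whose id matches from a packed array.
 * Returns 0 and shrinks *size on success, -1 if no record matches
 * or a record runs past the end of the array.
 */
static int packed_remove(uint8_t *buf, size_t *size, uint8_t id)
{
	size_t cur = 0;

	assert(buf && size);

	while (cur + sizeof(struct rec_hdr) <= *size) {
		struct rec_hdr hdr;
		size_t rec_len;

		memcpy(&hdr, buf + cur, sizeof(hdr));
		rec_len = sizeof(hdr) + hdr.len;
		if (cur + rec_len > *size)
			return -1;	/* truncated record */

		if (hdr.id == id) {
			/* Slide the tail down over the removed record. */
			memmove(buf + cur, buf + cur + rec_len,
				*size - cur - rec_len);
			*size -= rec_len;
			return 0;
		}
		cur += rec_len;
	}
	return -1;
}
```

Unlike the patch, the sketch also rejects a record that runs past the
end of the array; whether the real array needs that check is left as an
open assumption.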

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 extent-tree.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

diff --git a/extent-tree.c b/extent-tree.c
index afc7822..499838c 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -3557,6 +3557,92 @@ out:
 	return ret;
 }
 
+static int free_system_chunk_item(struct btrfs_super_block *super,
+				  struct btrfs_key *key)
+{
+	struct btrfs_disk_key *disk_key;
+	struct btrfs_key cpu_key;
+	u32 array_size = btrfs_super_sys_array_size(super);
+	char *ptr = (char *)super->sys_chunk_array;
+	int cur = 0;
+	int ret = -ENOENT;
+
+	while (cur < btrfs_super_sys_array_size(super)) {
+		struct btrfs_chunk *chunk;
+		u32 num_stripes;
+		u32 chunk_len;
+
+		disk_key = (struct btrfs_disk_key *)(ptr + cur);
+		btrfs_disk_key_to_cpu(&cpu_key, disk_key);
+		if (cpu_key.type != BTRFS_CHUNK_ITEM_KEY) {
+			/* just in case */
+			ret = -EIO;
+			goto out;
+		}
+
+		chunk = (struct btrfs_chunk *)(ptr + cur + sizeof(*disk_key));
+		num_stripes = btrfs_stack_chunk_num_stripes(chunk);
+		chunk_len = btrfs_chunk_item_size(num_stripes) +
+			    sizeof(*disk_key);
+
+		if (key->objectid == cpu_key.objectid &&
+		    key->offset == cpu_key.offset &&
+		    key->type == cpu_key.type) {
+			memmove(ptr + cur, ptr + cur + chunk_len,
+				array_size - cur - chunk_len);
+			array_size -= chunk_len;
+			btrfs_set_super_sys_array_size(super, array_size);
+			ret = 0;
+			goto out;
+		}
+
+		cur += chunk_len;
+	}
+out:
+	return ret;
+}
+
+static int free_chunk_item(struct btrfs_trans_handle *trans,
+			   struct btrfs_fs_info *fs_info,
+			   u64 bytenr, u64 len)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct btrfs_root *root = fs_info->chunk_root;
+	struct btrfs_chunk *chunk;
+	u64 chunk_type;
+	int ret;
+
+	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+	key.offset = bytenr;
+	key.type = BTRFS_CHUNK_ITEM_KEY;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
+	if (ret > 0) {
+		ret = -ENOENT;
+		goto out;
+	}
+	if (ret < 0)
+		goto out;
+	chunk = btrfs_item_ptr(path->nodes[0], path->slots[0],
+			       struct btrfs_chunk);
+	chunk_type = btrfs_chunk_type(path->nodes[0], chunk);
+
+	ret = btrfs_del_item(trans, root, path);
+	if (ret < 0)
+		goto out;
+
+	if (chunk_type & BTRFS_BLOCK_GROUP_SYSTEM)
+		ret = free_system_chunk_item(fs_info->super_copy, &key);
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
 /*
  * Fixup block accounting. The initial block accounting created by
  * make_block_groups isn't accuracy in this case.
-- 
2.4.4



* [PATCH 5/7] btrfs-progs: extent-tree: Introduce functions to free in-memory block group cache
From: Qu Wenruo @ 2015-07-07  8:15 UTC
  To: linux-btrfs; +Cc: dsterba

Introduce two functions, free_space_info and free_block_group_cache.

The former subtracts the space of an empty block group from its
space_info.
The latter frees the in-memory block group cache along with its
space in space_info and device space.
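
To release device space, the patch needs the length of the dev extent
that one chunk occupies on each device, which depends on the profile.
A standalone sketch of that arithmetic follows. enum profile and
dev_extent_len are hypothetical stand-ins for the BTRFS_BLOCK_GROUP_*
bits and for get_dev_extent_len. Note that the switch in this patch has
no RAID0 case, so a RAID0 map would reach its default branch; the RAID0
divisor below is an assumption added for completeness. The temporary
chunks removed by this series are SINGLE, so that path should not be
taken here.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the btrfs profile bits; illustration only. */
enum profile {
	PROFILE_SINGLE,
	PROFILE_DUP,
	PROFILE_RAID0,
	PROFILE_RAID1,
	PROFILE_RAID5,
	PROFILE_RAID6,
	PROFILE_RAID10,
};

/*
 * Bytes occupied on each device by one chunk of logical size chunk_len.
 * Returns 0 for an unknown profile.
 */
static uint64_t dev_extent_len(enum profile p, int num_stripes,
			       int sub_stripes, uint64_t chunk_len)
{
	int div;

	switch (p) {
	case PROFILE_SINGLE:
	case PROFILE_DUP:
	case PROFILE_RAID1:
		div = 1;		/* every stripe holds a full copy */
		break;
	case PROFILE_RAID0:
		div = num_stripes;	/* assumption: not in the patch */
		break;
	case PROFILE_RAID5:
		div = num_stripes - 1;	/* one stripe of parity */
		break;
	case PROFILE_RAID6:
		div = num_stripes - 2;	/* two stripes of parity */
		break;
	case PROFILE_RAID10:
		assert(sub_stripes > 0);
		div = num_stripes / sub_stripes;
		break;
	default:
		return 0;
	}
	assert(div > 0);
	return chunk_len / (uint64_t)div;
}
```

For example, a 1 GiB RAID10 chunk over 4 stripes with 2 sub-stripes has
a divisor of 2, so each device holds a 512 MiB dev extent.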

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 extent-tree.c | 100 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 100 insertions(+)

diff --git a/extent-tree.c b/extent-tree.c
index 499838c..39d6288 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1801,6 +1801,30 @@ static struct btrfs_space_info *__find_space_info(struct btrfs_fs_info *info,
 
 }
 
+static int free_space_info(struct btrfs_fs_info *fs_info, u64 flags,
+                          u64 total_bytes, u64 bytes_used,
+                          struct btrfs_space_info **space_info)
+{
+	struct btrfs_space_info *found;
+	/* only support freeing an empty block group */
+	if (bytes_used)
+		return -ENOTTY;
+
+	found = __find_space_info(fs_info, flags);
+	if (!found)
+		return -ENOENT;
+	if (found->total_bytes < total_bytes) {
+		fprintf(stderr,
+			"warning, bad space info to free %llu only have %llu\n",
+			total_bytes, found->total_bytes);
+		return -EINVAL;
+	}
+	found->total_bytes -= total_bytes;
+	if (space_info)
+		*space_info = found;
+	return 0;
+}
+
 static int update_space_info(struct btrfs_fs_info *info, u64 flags,
 			     u64 total_bytes, u64 bytes_used,
 			     struct btrfs_space_info **space_info)
@@ -3643,6 +3667,82 @@ out:
 	return ret;
 }
 
+static u64 get_dev_extent_len(struct map_lookup *map)
+{
+	int div;
+
+	switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
+	case 0: /* Single */
+	case BTRFS_BLOCK_GROUP_DUP:
+	case BTRFS_BLOCK_GROUP_RAID1:
+		div = 1;
+		break;
+	case BTRFS_BLOCK_GROUP_RAID5:
+		div = (map->num_stripes - 1);
+		break;
+	case BTRFS_BLOCK_GROUP_RAID6:
+		div = (map->num_stripes - 2);
+		break;
+	case BTRFS_BLOCK_GROUP_RAID10:
+		div = (map->num_stripes / map->sub_stripes);
+		break;
+	default:
+		/* normally, read chunk security hook should have handled it */
+		BUG_ON(1);
+	}
+	return map->ce.size / div;
+}
+
+/* free block group/chunk related caches */
+static int free_block_group_cache(struct btrfs_trans_handle *trans,
+				  struct btrfs_fs_info *fs_info,
+				  u64 bytenr, u64 len)
+{
+	struct btrfs_block_group_cache *cache;
+	struct cache_extent *ce;
+	struct map_lookup *map;
+	int ret;
+	int i;
+	u64 flags;
+
+	/* Free block group cache first */
+	cache = btrfs_lookup_block_group(fs_info, bytenr);
+	if (!cache)
+		return -ENOENT;
+	flags = cache->flags;
+	if (cache->free_space_ctl) {
+		btrfs_remove_free_space_cache(cache);
+		kfree(cache->free_space_ctl);
+	}
+	clear_extent_bits(&fs_info->block_group_cache, bytenr, bytenr + len,
+			  (unsigned int)-1, GFP_NOFS);
+	ret = free_space_info(fs_info, flags, len, 0, NULL);
+	if (ret < 0)
+		goto out;
+	kfree(cache);
+
+	/* Then free mapping info and dev usage info */
+	ce = search_cache_extent(&fs_info->mapping_tree.cache_tree, bytenr);
+	if (!ce || ce->start != bytenr) {
+		ret = -ENOENT;
+		goto out;
+	}
+	map = container_of(ce, struct map_lookup, ce);
+	for (i = 0; i < map->num_stripes; i++) {
+		struct btrfs_device *device;
+
+		device = map->stripes[i].dev;
+		device->bytes_used -= get_dev_extent_len(map);
+		ret = btrfs_update_device(trans, device);
+		if (ret < 0)
+			goto out;
+	}
+	remove_cache_extent(&fs_info->mapping_tree.cache_tree, ce);
+	free(map);
+out:
+	return ret;
+}
+
 /*
  * Fixup block accounting. The initial block accounting created by
  * make_block_groups isn't accuracy in this case.
-- 
2.4.4



* [PATCH 6/7] btrfs-progs: extent-tree: Introduce btrfs_free_block_group function.
From: Qu Wenruo @ 2015-07-07  8:15 UTC
  To: linux-btrfs; +Cc: dsterba

This function will be used to free an empty chunk.

This provides the basis for the later temp chunk cleanup.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 ctree.h       |  2 ++
 extent-tree.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)

diff --git a/ctree.h b/ctree.h
index f14a795..5550d45 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2275,6 +2275,8 @@ int btrfs_record_file_extent(struct btrfs_trans_handle *trans,
 			      struct btrfs_inode_item *inode,
 			      u64 file_pos, u64 disk_bytenr,
 			      u64 num_bytes);
+int btrfs_free_block_group(struct btrfs_trans_handle *trans,
+			   struct btrfs_fs_info *fs_info, u64 bytenr, u64 len);
 /* ctree.c */
 int btrfs_comp_cpu_keys(struct btrfs_key *k1, struct btrfs_key *k2);
 int btrfs_del_ptr(struct btrfs_trans_handle *trans, struct btrfs_root *root,
diff --git a/extent-tree.c b/extent-tree.c
index 39d6288..358bd19 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -3743,6 +3743,86 @@ out:
 	return ret;
 }
 
+int btrfs_free_block_group(struct btrfs_trans_handle *trans,
+			   struct btrfs_fs_info *fs_info, u64 bytenr, u64 len)
+{
+	struct btrfs_root *extent_root = fs_info->extent_root;
+	struct btrfs_path *path;
+	struct btrfs_block_group_item *bgi;
+	struct btrfs_key key;
+	int ret = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = bytenr;
+	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
+	key.offset = len;
+
+	/* Double check the block group to ensure it's empty */
+	ret = btrfs_search_slot(trans, extent_root, &key, path, 0, 0);
+	if (ret > 0) {
+		ret = -ENOENT;
+		goto out;
+	}
+	if (ret < 0)
+		goto out;
+
+	bgi = btrfs_item_ptr(path->nodes[0], path->slots[0],
+			     struct btrfs_block_group_item);
+	if (btrfs_disk_block_group_used(path->nodes[0], bgi)) {
+		fprintf(stderr,
+			"warning: block group [%llu,%llu) is not empty to free\n",
+			bytenr, bytenr + len);
+		ret = -EINVAL;
+		goto out;
+	}
+	btrfs_release_path(path);
+
+	/*
+	 * Now pin all space in the block group, to prevent further
+	 * allocation from it within this transaction.
+	 * Every operation that needs a transaction must be in the range.
+	 */
+	btrfs_pin_extent(fs_info, bytenr, len);
+
+	/* delete block group item and chunk item */
+	ret = free_block_group_item(trans, fs_info, bytenr, len);
+	if (ret < 0) {
+		fprintf(stderr,
+			"failed to free block group item for [%llu,%llu)\n",
+			bytenr, bytenr + len);
+		btrfs_unpin_extent(fs_info, bytenr, len);
+		goto out;
+	}
+
+	ret = free_chunk_dev_extent_items(trans, fs_info, bytenr);
+	if (ret < 0) {
+		fprintf(stderr,
+			"failed to free dev extents belonging to [%llu,%llu)\n",
+			bytenr, bytenr + len);
+		btrfs_unpin_extent(fs_info, bytenr, len);
+		goto out;
+	}
+	ret = free_chunk_item(trans, fs_info, bytenr, len);
+	if (ret < 0) {
+		fprintf(stderr,
+			"failed to free chunk for [%llu,%llu)\n",
+			bytenr, bytenr + len);
+		btrfs_unpin_extent(fs_info, bytenr, len);
+		goto out;
+	}
+
+	/* Now release the block_group_cache */
+	ret = free_block_group_cache(trans, fs_info, bytenr, len);
+	btrfs_unpin_extent(fs_info, bytenr, len);
+
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
 /*
  * Fixup block accounting. The initial block accounting created by
  * make_block_groups isn't accuracy in this case.
-- 
2.4.4



* [PATCH 7/7] btrfs-progs: mkfs: Cleanup temporary chunk to avoid strange balance behavior.
From: Qu Wenruo @ 2015-07-07  8:15 UTC
  To: linux-btrfs; +Cc: dsterba

[BUG]
 # mkfs.btrfs /dev/sdb /dev/sdd -m raid0 -d raid0
 # mount /dev/sdb /mnt/btrfs
 # btrfs balance start /mnt/btrfs
 # btrfs fi df /mnt/btrfs
 Data, single: total=1.00GiB, used=320.00KiB
 System, single: total=32.00MiB, used=16.00KiB
 Metadata, RAID0: total=256.00MiB, used=112.00KiB
 GlobalReserve, single: total=16.00MiB, used=0.00B

Only metadata stays RAID0. Data and system go from RAID0 to single.

[REASON]
The problem is caused by the temporary single chunks.
mkfs always creates single data/metadata/sys chunks first and then
adds the devices into the temporary btrfs.

When balancing all chunks, the data and system chunks are almost
empty, so balance moves their contents into the single chunks and
removes the old RAID0 chunks.
Metadata has more data and kicks in the metadata chunk preallocation,
so a new RAID0 chunk is allocated and the old metadata is moved
there. The old RAID0 and single chunks are removed.

[FIX]
Now we add a new function to clean up the temporary chunks at the end
of the mkfs routine.
It cleans up the chunks which are empty and whose profile differs from
the mkfs profile.
So in balance, btrfs will always allocate a new chunk to keep the
profile, rather than moving data into the single chunk.
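
The rule above (empty, and profile differs from the one requested at
mkfs time) can be sketched as a standalone predicate. The BG_* macros
and is_temp_sketch below are hypothetical stand-ins for the
BTRFS_BLOCK_GROUP_* flags and for is_temp_block_group in this patch,
reduced to two profile bits for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the BTRFS_BLOCK_GROUP_* bits; illustration only. */
#define BG_DATA		(1ULL << 0)
#define BG_SYSTEM	(1ULL << 1)
#define BG_METADATA	(1ULL << 2)
#define BG_RAID0	(1ULL << 3)
#define BG_RAID1	(1ULL << 4)
#define BG_TYPE_MASK	(BG_DATA | BG_SYSTEM | BG_METADATA)
#define BG_PROFILE_MASK	(BG_RAID0 | BG_RAID1)

/*
 * Return 1 if a block group looks like a mkfs leftover: it is empty
 * and its profile is not the one requested for its type.
 * A profile value of 0 means SINGLE.
 */
static int is_temp_sketch(uint64_t flags, uint64_t used,
			  uint64_t data_profile, uint64_t meta_profile,
			  uint64_t sys_profile)
{
	uint64_t profile = flags & BG_PROFILE_MASK;
	uint64_t want;

	if (used != 0)
		return 0;

	switch (flags & BG_TYPE_MASK) {
	case BG_DATA:
	case BG_DATA | BG_METADATA:	/* mixed block groups */
		want = data_profile;
		break;
	case BG_METADATA:
		want = meta_profile;
		break;
	case BG_SYSTEM:
		want = sys_profile;
		break;
	default:
		return 0;
	}
	return profile != (want & BG_PROFILE_MASK);
}

/* Usage: the "-m raid0 -d raid0" case from the [BUG] section. */
void is_temp_sketch_example(void)
{
	/* Empty SINGLE data chunk on a RAID0 filesystem: temporary. */
	assert(is_temp_sketch(BG_DATA, 0, BG_RAID0, BG_RAID0, BG_RAID0));
	/* Empty RAID0 data chunk: matches the mkfs profile, so keep it. */
	assert(!is_temp_sketch(BG_DATA | BG_RAID0, 0,
			       BG_RAID0, BG_RAID0, BG_RAID0));
}
```

On a SINGLE/SINGLE filesystem the profiles always match, so this rule
alone flags nothing there.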

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 mkfs.c | 150 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 150 insertions(+)

diff --git a/mkfs.c b/mkfs.c
index b60fc5a..ee8a3cb 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1182,6 +1182,149 @@ static void list_all_devices(struct btrfs_root *root)
 	printf("\n");
 }
 
+static int is_temp_block_group(struct extent_buffer *node,
+			       struct btrfs_block_group_item *bgi,
+			       u64 data_profile, u64 meta_profile,
+			       u64 sys_profile)
+{
+	u64 flag = btrfs_disk_block_group_flags(node, bgi);
+	u64 flag_type = flag & BTRFS_BLOCK_GROUP_TYPE_MASK;
+	u64 flag_profile = flag & BTRFS_BLOCK_GROUP_PROFILE_MASK;
+	u64 used = btrfs_disk_block_group_used(node, bgi);
+
+	/*
+	 * A chunk meeting all the following conditions is a temp chunk
+	 * 1) Empty chunk
+	 * Temp chunk is always empty.
+	 *
+	 * 2) Profile mismatches the mkfs profile.
+	 * Temp chunk is always in SINGLE
+	 *
+	 * 3) Size differs from mkfs_alloc
+	 * Special case for SINGLE/SINGLE btrfs.
+	 * In that case, temp data chunk and real data chunk are always empty.
+	 * So we need to use mkfs_alloc to be sure which chunk is the newly
+	 * allocated one.
+	 *
+	 * Normally, new chunk size is equal to mkfs one (One chunk)
+	 * If it has multiple chunks, we just refuse to delete any one.
+	 * As they are all single, no real problem will happen.
+	 * So only use condition 1) and 2) to judge them.
+	 */
+	if (used != 0)
+		return 0;
+	switch (flag_type) {
+	case BTRFS_BLOCK_GROUP_DATA:
+	case BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA:
+		data_profile &= BTRFS_BLOCK_GROUP_PROFILE_MASK;
+		if (flag_profile != data_profile)
+			return 1;
+		break;
+	case BTRFS_BLOCK_GROUP_METADATA:
+		meta_profile &= BTRFS_BLOCK_GROUP_PROFILE_MASK;
+		if (flag_profile != meta_profile)
+			return 1;
+		break;
+	case BTRFS_BLOCK_GROUP_SYSTEM:
+		sys_profile &= BTRFS_BLOCK_GROUP_PROFILE_MASK;
+		if (flag_profile != sys_profile)
+			return 1;
+		break;
+	}
+	return 0;
+}
+
+/* Note: if current is a block group, it will skip it anyway */
+static int next_block_group(struct btrfs_root *root,
+			    struct btrfs_path *path)
+{
+	struct btrfs_key key;
+	int ret = 0;
+
+	while (1) {
+		ret = btrfs_next_item(root, path);
+		if (ret)
+			goto out;
+
+		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+		if (key.type == BTRFS_BLOCK_GROUP_ITEM_KEY)
+			goto out;
+	}
+out:
+	return ret;
+}
+
+/* Clean up the empty temporary chunks left over by mkfs */
+static int cleanup_temp_chunks(struct btrfs_fs_info *fs_info,
+			       struct mkfs_allocation *alloc,
+			       u64 data_profile, u64 meta_profile,
+			       u64 sys_profile)
+{
+	struct btrfs_trans_handle *trans = NULL;
+	struct btrfs_block_group_item *bgi;
+	struct btrfs_root *root = fs_info->extent_root;
+	struct btrfs_key key;
+	struct btrfs_key found_key;
+	struct btrfs_path *path;
+	int ret = 0;
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	trans = btrfs_start_transaction(root, 1);
+
+	key.objectid = 0;
+	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
+	key.offset = 0;
+
+	while (1) {
+		/*
+		 * as the rest of the loop may modify the tree, we need to
+		 * start a new search each time.
+		 */
+		ret = btrfs_search_slot(trans, root, &key, path, 0, 0);
+		if (ret < 0)
+			goto out;
+
+		btrfs_item_key_to_cpu(path->nodes[0], &found_key,
+				      path->slots[0]);
+		if (found_key.objectid < key.objectid)
+			goto out;
+		if (found_key.type != BTRFS_BLOCK_GROUP_ITEM_KEY) {
+			ret = next_block_group(root, path);
+			if (ret < 0)
+				goto out;
+			if (ret > 0) {
+				ret = 0;
+				goto out;
+			}
+			btrfs_item_key_to_cpu(path->nodes[0], &found_key,
+					      path->slots[0]);
+		}
+
+		bgi = btrfs_item_ptr(path->nodes[0], path->slots[0],
+				     struct btrfs_block_group_item);
+		if (is_temp_block_group(path->nodes[0], bgi,
+					data_profile, meta_profile,
+					sys_profile)) {
+			ret = btrfs_free_block_group(trans, fs_info,
+					found_key.objectid, found_key.offset);
+			if (ret < 0)
+				goto out;
+		}
+		btrfs_release_path(path);
+		key.objectid = found_key.objectid + found_key.offset;
+	}
+out:
+	if (trans)
+		btrfs_commit_transaction(trans, root);
+	btrfs_free_path(path);
+	return ret;
+}
+
 int main(int ac, char **av)
 {
 	char *file;
@@ -1669,6 +1812,12 @@ skip_multidev:
 		ret = make_image(source_dir, root, fd);
 		BUG_ON(ret);
 	}
+	ret = cleanup_temp_chunks(root->fs_info, &allocation, data_profile,
+				  metadata_profile, metadata_profile);
+	if (ret < 0) {
+		fprintf(stderr, "Failed to cleanup temporary chunks\n");
+		goto out;
+	}
 
 	if (verbose) {
 		char features_buf[64];
@@ -1703,6 +1852,7 @@ skip_multidev:
 		list_all_devices(root);
 	}
 
+out:
 	ret = close_ctree(root);
 	BUG_ON(ret);
 	free(label);
-- 
2.4.4



* Re: [PATCH 0/7] mkfs: Remove temporary chunks
From: David Sterba @ 2015-07-10 12:46 UTC
  To: Qu Wenruo; +Cc: linux-btrfs, dsterba

On Tue, Jul 07, 2015 at 04:15:21PM +0800, Qu Wenruo wrote:
[...]
> So I took current fix even it's not perfect, but most practice one.

Thanks for working on the mkfs bug. Agreed on the approach.

> Qu Wenruo (7):
>   btrfs-progs: disk-io: Support commit transaction on chunk tree.
>   btrfs-progs: extent-tree: Introduce free_block_group_item function.
>   btrfs-progs: extent-tree: Introduce functions to free dev extents in a
>     chunk
>   btrfs-progs: extent-tree: Introduce functions to free chunk items
>   btrfs-progs: extent-tree: Introduce functions to free in-memory block 
>        group cache
>   btrfs-progs: extent-tree: Introduce btrfs_free_block_group function.

All of the above merged as they're only preparatory.

>   btrfs-progs: mkfs: Cleanup temporary chunk to avoid strange balance
>     behavior.

Looks good.


* Re: [PATCH 7/7] btrfs-progs: mkfs: Cleanup temporary chunk to avoid strange balance behavior.
From: David Sterba @ 2015-07-14 17:33 UTC
  To: Qu Wenruo; +Cc: linux-btrfs, dsterba

On Tue, Jul 07, 2015 at 04:15:28PM +0800, Qu Wenruo wrote:
[...]
> 
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

Applied, thanks a lot. I've tested several data/metadata combinations
and the resulting 'fi df' looks ok.
