* [PATCH RFC 00/10] btrfs-progs: mkfs: add --seed option to sprout a seed device at mkfs time
@ 2022-04-20  0:19 Qu Wenruo
  2022-04-20  0:19 ` [PATCH RFC 01/10] btrfs-progs: refactor find_free_dev_extent_start() for later expansion Qu Wenruo
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: Qu Wenruo @ 2022-04-20  0:19 UTC (permalink / raw)
  To: linux-btrfs

This branch can be fetched from github:
https://github.com/adam900710/btrfs-progs/tree/mkfs_seed

Which contains all the fixes submitted to the mailing list, along with
the patchset.


Some time ago I proposed doing the seed sprout in user space; here is
the working prototype.

Supporting such a seemingly easy feature turned out to be far more
complex than I thought:

- Delay chunk/dev extent items insertion
- Add a way to track above delayed items for chunk allocation
  This allows btrfs to allocate multiple chunks in the same transaction,
  and should reduce possible false ENOSPC bugs in progs.

  Further cleanup is not included in this patchset, though.

- Implement quirks for sprout
  * Update device item for seed device
  * Force metadata/system chunk allocation during sprout
  * Relocate system chunks to avoid seed device in sys_chunk_array
  * Remove empty seed chunks

  Those quirks all have specific helpers implemented in mkfs/sprout.c.

  Personally speaking, we could enhance kernel/progs to handle seed
  devices in sys_chunk_array, but since this is long-standing behavior,
  I chose to keep it to avoid compatibility problems.

With all these solved, I intentionally added some limitations to the
new --seed option:

- Ignore -m/-d profile specifications
  Just like the kernel, we inherit the old profile from the seed device.
  So if those options are set, we just output a warning and continue.

- Limit the source seed device to a single-device fs
  In fact, I don't even think a multi-device seed fs is sane.
  With a two-disk RAID1 seed fs, adding a device would force us to
  allocate SINGLE chunks, as we have no way to add two devices in one go.

- Only accept one sprout device
  This is a completely artificial limit, just because I don't see much
  use in adding a completely empty device.

- Reject --rootdir
  This is a preventive measure, as --rootdir can easily conflict with
  the existing contents of the seed device.
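
The limitations above can be sketched as one validation step. This is
only an illustration: struct seed_opts and check_seed_opts() are
hypothetical names that do not exist in mkfs/main.c, and the real option
handling in the patchset may well look different.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical summary of the parsed command line, not the real mkfs state. */
struct seed_opts {
	bool seed_set;		/* --seed was given */
	bool profiles_set;	/* -m and/or -d was given */
	bool rootdir_set;	/* --rootdir was given */
	int nr_seed_devs;	/* devices in the seed filesystem */
	int nr_sprout_devs;	/* target devices on the command line */
};

/* Return 0 if the combination is acceptable, -EINVAL otherwise. */
static int check_seed_opts(struct seed_opts *o)
{
	if (!o->seed_set)
		return 0;

	if (o->rootdir_set) {
		fprintf(stderr, "ERROR: --rootdir can not be used with --seed\n");
		return -EINVAL;
	}
	if (o->nr_seed_devs != 1) {
		fprintf(stderr, "ERROR: seed fs must have exactly one device\n");
		return -EINVAL;
	}
	if (o->nr_sprout_devs != 1) {
		fprintf(stderr, "ERROR: only one sprout device is accepted\n");
		return -EINVAL;
	}
	/* Profiles are inherited from the seed: warn, drop them, continue. */
	if (o->profiles_set) {
		fprintf(stderr, "WARNING: -m/-d are ignored with --seed\n");
		o->profiles_set = false;
	}
	return 0;
}
```

Note that only the profile case is a warning; every other violation is
treated as a hard error in this sketch.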

Currently there is only one use case that can not be replaced by the
user space sprout:

   Read-only mount of the seed device and sprout without unmount.

Which can be useful for certain liveCD usage.

However, even liveCDs no longer provide an old-school RO fs as the root
fs. Nowadays most liveCDs use a memory block device, combined with
overlayfs, to provide a RW root fs.

Thus if we go that way, the user space sprout can still be a very useful
feature.

Although the feature works fine locally, and I have already split the
changes into separate patches, the main point of the patchset is still
to evaluate whether this is a solid idea.

Thus any feedback on the feature (including rejection) is welcome.

Qu Wenruo (10):
  btrfs-progs: refactor find_free_dev_extent_start() for later expansion
  btrfs-progs: delay chunk and device extent items insertion
  btrfs-progs: mkfs: introduce helper to set seed flag
  btrfs-progs: mkfs: avoid error out if some trees exist
  btrfs-progs: extract btrfs_fs_devices structure allocation into a
    helper
  btrfs-progs: mkfs/sprout: add a helper to update generation for seed
    device
  btrfs-progs: mkfs/sprout: introduce helper to force allocating a chunk
  btrfs-progs: mkfs/sprout: introduce a helper to relocate system chunks
  btrfs-progs: mkfs/sprout: introduce a helper to remove empty system
    chunks from seed device
  btrfs-progs: mkfs: add support for seed sprout

 Documentation/mkfs.btrfs.rst |  13 +
 Makefile                     |   2 +-
 kernel-shared/ctree.h        |   2 +
 kernel-shared/extent-tree.c  |  17 +-
 kernel-shared/transaction.c  |  77 ++++++
 kernel-shared/transaction.h  |  12 +
 kernel-shared/volumes.c      | 280 ++++++++++++---------
 kernel-shared/volumes.h      |  12 +
 mkfs/main.c                  | 133 +++++++++-
 mkfs/sprout.c                | 465 +++++++++++++++++++++++++++++++++++
 mkfs/sprout.h                |  13 +
 11 files changed, 900 insertions(+), 126 deletions(-)
 create mode 100644 mkfs/sprout.c
 create mode 100644 mkfs/sprout.h

-- 
2.35.1



* [PATCH RFC 01/10] btrfs-progs: refactor find_free_dev_extent_start() for later expansion
  2022-04-20  0:19 [PATCH RFC 00/10] btrfs-progs: mkfs: add --seed option to sprout a seed device at mkfs time Qu Wenruo
@ 2022-04-20  0:19 ` Qu Wenruo
  2022-04-20  0:19 ` [PATCH RFC 02/10] btrfs-progs: delay chunk and device extent items insertion Qu Wenruo
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2022-04-20  0:19 UTC (permalink / raw)
  To: linux-btrfs

[CURRENT BEHAVIOR]

We iterate through all dev extents, and calculate the hole immediately.

This is fine, but for the later incoming delayed chunk allocation, it's
pretty hard to handle both dev extents in dev tree and in delayed
chunks.

[REFACTOR]

This patch splits the search into two parts:

1. Populate @used_root cache tree
   That tree will record all used device extents

2. Iterate through each cache extent to calculate holes

With such separate loops, we can easily handle multiple dev extent
sources: just queue them all into @used_root.

The 2nd part will handle everything well.

Since we're here, also add a comment on why we may want to re-search
after iterating all dev extents.
(This is needed for zoned device boundaries.)
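
The two-part search can be sketched in isolation as follows. This is a
simplified model, not the btrfs-progs code: struct range and find_hole()
are made-up names, a sorted array stands in for the cache tree, used
ranges are assumed not to need merging, and the zoned re-search is left
out entirely.

```c
#include <stdint.h>
#include <stdlib.h>

/* One used device extent: [start, start + len). */
struct range {
	uint64_t start;
	uint64_t len;
};

static int cmp_range(const void *a, const void *b)
{
	const struct range *ra = a;
	const struct range *rb = b;

	if (ra->start < rb->start)
		return -1;
	return ra->start > rb->start;
}

/*
 * Look for a hole of at least @want bytes in [search_start, search_end).
 * Returns 0 and stops at the first hole that is large enough, otherwise
 * returns -1 and reports the largest hole seen.
 */
static int find_hole(struct range *used, size_t nr, uint64_t search_start,
		     uint64_t search_end, uint64_t want,
		     uint64_t *hole_start, uint64_t *hole_len)
{
	uint64_t max_start = search_start;
	uint64_t max_len = 0;
	size_t i;

	/* Part 1: ranges from every source are already in used[]; order them. */
	qsort(used, nr, sizeof(*used), cmp_range);

	/* Part 2: walk the ordered ranges and measure the gaps between them. */
	for (i = 0; i < nr; i++) {
		uint64_t end = used[i].start + used[i].len;

		if (used[i].start >= search_end)
			break;
		if (used[i].start > search_start) {
			uint64_t hole = used[i].start - search_start;

			if (hole > max_len) {
				max_start = search_start;
				max_len = hole;
			}
			if (hole >= want)
				goto out;
		}
		if (end > search_start)
			search_start = end;
	}
	/* The tail of the device may also be a hole. */
	if (search_end > search_start && search_end - search_start > max_len) {
		max_start = search_start;
		max_len = search_end - search_start;
	}
out:
	*hole_start = max_start;
	*hole_len = max_len;
	return max_len >= want ? 0 : -1;
}
```

Because part 2 only looks at the collected ranges, adding another source
of used space only means appending more entries before the walk.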

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 kernel-shared/volumes.c | 88 ++++++++++++++++++++++++-----------------
 1 file changed, 51 insertions(+), 37 deletions(-)

diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c
index 97c09a1a4931..c61fb51c4def 100644
--- a/kernel-shared/volumes.c
+++ b/kernel-shared/volumes.c
@@ -653,58 +653,59 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 				      u64 num_bytes, u64 search_start,
 				      u64 *start, u64 *len)
 {
-	struct btrfs_key key;
 	struct btrfs_root *root = device->dev_root;
-	struct btrfs_dev_extent *dev_extent;
-	struct btrfs_path *path;
-	u64 hole_size;
+	struct btrfs_path path = { 0 };
+	struct btrfs_key key;
+	struct cache_tree used_root;
+	struct cache_extent *pe;
 	u64 max_hole_start;
 	u64 max_hole_size;
-	u64 extent_end;
 	u64 search_end = device->total_bytes;
+	u64 hole_size;
 	int ret;
-	int slot;
-	struct extent_buffer *l;
 	u64 zone_size = 0;
 
+	cache_tree_init(&used_root);
 	if (device->zone_info)
 		zone_size = device->zone_info->zone_size;
 
 	search_start = dev_extent_search_start(device, search_start);
 
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
+	btrfs_init_path(&path);
+	path.reada = READA_FORWARD;
 
 	max_hole_start = search_start;
 	max_hole_size = 0;
 
-again:
 	if (search_start >= search_end) {
 		ret = -ENOSPC;
 		goto out;
 	}
 
-	path->reada = READA_FORWARD;
-
 	key.objectid = device->devid;
 	key.offset = search_start;
 	key.type = BTRFS_DEV_EXTENT_KEY;
 
-	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	ret = btrfs_search_slot(NULL, root, &key, &path, 0, 0);
 	if (ret < 0)
 		goto out;
 	if (ret > 0) {
-		ret = btrfs_previous_item(root, path, key.objectid, key.type);
+		ret = btrfs_previous_item(root, &path, key.objectid, key.type);
 		if (ret < 0)
 			goto out;
 	}
 
+	/*
+	 * Iterate through all the dev extents in the device tree, and add them
+	 * into the used tree
+	 */
 	while (1) {
-		l = path->nodes[0];
-		slot = path->slots[0];
+		struct extent_buffer *l = path.nodes[0];
+		int slot = path.slots[0];
+		struct btrfs_dev_extent *de;
+
 		if (slot >= btrfs_header_nritems(l)) {
-			ret = btrfs_next_leaf(root, path);
+			ret = btrfs_next_leaf(root, &path);
 			if (ret == 0)
 				continue;
 			if (ret < 0)
@@ -723,8 +724,27 @@ again:
 		if (key.type != BTRFS_DEV_EXTENT_KEY)
 			goto next;
 
-		if (key.offset > search_start) {
-			hole_size = key.offset - search_start;
+		/* Got a dev extent item, add it to used_root */
+		de = btrfs_item_ptr(l, slot, struct btrfs_dev_extent);
+		ret = add_merge_cache_extent(&used_root, key.offset,
+					     btrfs_dev_extent_length(l, de));
+		if (ret < 0)
+			goto out;
+next:
+		path.slots[0]++;
+		cond_resched();
+	}
+
+again:
+	/*
+	 * Now used_root contains all the dev extents. Iterate through the tree
+	 * to grab holes.
+	 */
+	for (pe = first_cache_extent(&used_root); pe;
+	     pe = next_cache_extent(pe)) {
+		if (pe->start > search_start) {
+
+			hole_size = pe->start - search_start;
 			dev_extent_hole_check(device, &search_start, &hole_size,
 					      num_bytes);
 
@@ -732,7 +752,6 @@ again:
 				max_hole_start = search_start;
 				max_hole_size = hole_size;
 			}
-
 			/*
 			 * If this free space is greater than which we need,
 			 * it must be the max free space that we have found
@@ -747,15 +766,7 @@ again:
 				goto out;
 			}
 		}
-
-		dev_extent = btrfs_item_ptr(l, slot, struct btrfs_dev_extent);
-		extent_end = key.offset + btrfs_dev_extent_length(l,
-								  dev_extent);
-		if (extent_end > search_start)
-			search_start = extent_end;
-next:
-		path->slots[0]++;
-		cond_resched();
+		search_start = max(search_start, pe->start + pe->size);
 	}
 
 	/*
@@ -765,15 +776,17 @@ next:
 	 */
 	if (search_end > search_start) {
 		hole_size = search_end - search_start;
+
+		/*
+		 * Our hole crossed zone boundary, need to re-do the search
+		 * from the zone boundary.
+		 */
 		if (dev_extent_hole_check(device, &search_start, &hole_size,
-					  num_bytes)) {
-			btrfs_release_path(path);
+					  num_bytes))
 			goto again;
-		}
-
-		if (hole_size > max_hole_size) {
-			max_hole_start = search_start;
+		if (hole_size > max_hole_size) {
 			max_hole_size = hole_size;
+			max_hole_start = search_start;
 		}
 	}
 
@@ -784,8 +797,9 @@ next:
 		ret = 0;
 
 out:
+	btrfs_release_path(&path);
+	free_extent_cache_tree(&used_root);
 	ASSERT(zone_size == 0 || IS_ALIGNED(max_hole_start, zone_size));
-	btrfs_free_path(path);
 	*start = max_hole_start;
 	if (len)
 		*len = max_hole_size;
-- 
2.35.1



* [PATCH RFC 02/10] btrfs-progs: delay chunk and device extent items insertion
  2022-04-20  0:19 [PATCH RFC 00/10] btrfs-progs: mkfs: add --seed option to sprout a seed device at mkfs time Qu Wenruo
  2022-04-20  0:19 ` [PATCH RFC 01/10] btrfs-progs: refactor find_free_dev_extent_start() for later expansion Qu Wenruo
@ 2022-04-20  0:19 ` Qu Wenruo
  2022-04-20  0:19 ` [PATCH RFC 03/10] btrfs-progs: mkfs: introduce helper to set seed flag Qu Wenruo
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2022-04-20  0:19 UTC (permalink / raw)
  To: linux-btrfs

Currently btrfs-progs always inserts chunk and device extent items at
btrfs_chunk_alloc() time.

This behavior has one limitation: if we don't have enough space even for
CoWing the chunk and device trees, then we can not allocate new chunks
to fulfill our btrfs_reserve_extent() call.

This has not been a problem so far, as we always make sure we have
enough space.

But it's going to cause problems for the incoming sprout support at mkfs
time.

When sprouting the seed fs, initially there is no RW block group at
all, so we must allocate new chunks to do anything.

To resolve the problem, we need to delay chunk item insertion, so that
in do_chunk_alloc() we can create a new chunk mapping with a new block
group cache without triggering tree block CoW.

With the block group cache inserted, we're then able to call
btrfs_reserve_extent() and do regular tree block CoW or whatever.
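
The delayed scheme can be modelled with a toy allocator. This is only a
sketch under stated assumptions: struct toy_trans, toy_reserve() and
toy_commit() are invented names, and fixed-size arrays stand in for the
device tree and for the transaction's reserved list. The point it shows
is that reserving touches only in-memory state, that later reservations
must see earlier pending ones, and that the "tree" is updated only at
commit time.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_EXTENTS 16

struct extent {
	uint64_t start;
	uint64_t len;
};

struct toy_trans {
	struct extent on_disk[MAX_EXTENTS];	/* stands in for the dev tree */
	size_t nr_on_disk;
	struct extent reserved[MAX_EXTENTS];	/* pending, not yet inserted */
	size_t nr_reserved;
};

static int overlaps(const struct extent *e, uint64_t start, uint64_t len)
{
	return start < e->start + e->len && e->start < start + len;
}

/* Reserve @len bytes without touching on_disk[]; only reserved[] grows. */
static int toy_reserve(struct toy_trans *t, uint64_t dev_size, uint64_t len,
		       uint64_t *out)
{
	uint64_t off = 0;
	size_t i;
	int moved;

	if (t->nr_reserved == MAX_EXTENTS)
		return -1;
	do {
		moved = 0;
		/* Skip past anything already inserted ... */
		for (i = 0; i < t->nr_on_disk; i++) {
			if (overlaps(&t->on_disk[i], off, len)) {
				off = t->on_disk[i].start + t->on_disk[i].len;
				moved = 1;
			}
		}
		/* ... and past anything reserved earlier in this transaction. */
		for (i = 0; i < t->nr_reserved; i++) {
			if (overlaps(&t->reserved[i], off, len)) {
				off = t->reserved[i].start + t->reserved[i].len;
				moved = 1;
			}
		}
	} while (moved);

	if (off + len > dev_size)
		return -1;
	t->reserved[t->nr_reserved].start = off;
	t->reserved[t->nr_reserved].len = len;
	t->nr_reserved++;
	*out = off;
	return 0;
}

/* At commit time, finally "insert" the pending extents. */
static int toy_commit(struct toy_trans *t)
{
	size_t i;

	if (t->nr_on_disk + t->nr_reserved > MAX_EXTENTS)
		return -1;
	for (i = 0; i < t->nr_reserved; i++)
		t->on_disk[t->nr_on_disk++] = t->reserved[i];
	t->nr_reserved = 0;
	return 0;
}
```

Two reservations in the same transaction get distinct offsets even
though nothing was inserted in between, which is what allows several
chunks to be allocated per transaction.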

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 kernel-shared/extent-tree.c |   1 -
 kernel-shared/transaction.c |  77 ++++++++++++++++++
 kernel-shared/transaction.h |  12 +++
 kernel-shared/volumes.c     | 154 ++++++++++++++++++++----------------
 kernel-shared/volumes.h     |  10 +++
 5 files changed, 187 insertions(+), 67 deletions(-)

diff --git a/kernel-shared/extent-tree.c b/kernel-shared/extent-tree.c
index 697a8a1e4dec..da801b1d9926 100644
--- a/kernel-shared/extent-tree.c
+++ b/kernel-shared/extent-tree.c
@@ -1741,7 +1741,6 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 		trans->allocating_chunk = 0;
 		return 0;
 	}
-
 	BUG_ON(ret);
 
 	ret = btrfs_make_block_group(trans, fs_info, 0, flags, start,
diff --git a/kernel-shared/transaction.c b/kernel-shared/transaction.c
index 56828ee1714b..eb4e2b01cd83 100644
--- a/kernel-shared/transaction.c
+++ b/kernel-shared/transaction.c
@@ -54,6 +54,8 @@ struct btrfs_trans_handle* btrfs_start_transaction(struct btrfs_root *root,
 	root->commit_root = root->node;
 	extent_buffer_get(root->node);
 	INIT_LIST_HEAD(&h->dirty_bgs);
+	INIT_LIST_HEAD(&h->new_chunks);
+	INIT_LIST_HEAD(&h->reserved_dev_extents);
 
 	return h;
 }
@@ -162,16 +164,74 @@ again:
 	return 0;
 }
 
+static int insert_items_for_one_chunk(struct btrfs_trans_handle *trans,
+				      struct map_lookup *map)
+{
+	const u64 dev_extent_len = calc_stripe_length(map->type, map->ce.size,
+						      map->num_stripes);
+	int ret;
+	int i;
+
+	/* Insert dev extents */
+	for (i = 0; i < map->num_stripes; i++) {
+		ret = btrfs_insert_dev_extent(trans, map->stripes[i].dev,
+					      map->ce.start, dev_extent_len,
+					      map->stripes[i].physical);
+		/*
+		 * Since we're delaying chunk allocation, normally there should
+		 * be no dev extent. But there are call sites like btrfs convert
+		 * that manually insert dev extents before creating the chunk.
+		 *
+		 * So here we're safe to ignore -EEXIST error.
+		 */
+		if (ret == -EEXIST)
+			ret = 0;
+		if (ret < 0)
+			goto out;
+		ret = btrfs_update_device(trans, map->stripes[i].dev);
+		if (ret < 0)
+			goto out;
+
+	}
+	/* Insert chunk item */
+	ret = btrfs_insert_chunk_item(trans, map);
+out:
+	return ret;
+}
+
 int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root)
 {
 	u64 transid = trans->transid;
 	int ret = 0;
+	struct map_lookup *map;
+	struct map_lookup *tmp;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_space_info *sinfo;
 
 	if (trans->fs_info->transaction_aborted)
 		return -EROFS;
+
+	/* Finish the items insert for new chunks */
+	list_for_each_entry_safe(map, tmp, &trans->new_chunks, list) {
+		ret = insert_items_for_one_chunk(trans, map);
+		if (ret < 0)
+			goto error;
+		list_del_init(&map->list);
+	}
+	/*
+	 * And cleanup the reserved extents, they have been inserted into dev
+	 * tree in above insert_items_for_one_chunk().
+	 */
+	while (!list_empty(&trans->reserved_dev_extents)) {
+		struct btrfs_reserved_dev_extent *reserved;
+
+		reserved = list_entry(trans->reserved_dev_extents.next,
+				struct btrfs_reserved_dev_extent, list);
+		list_del_init(&reserved->list);
+		free(reserved);
+	}
+
 	/*
 	 * Flush all accumulated delayed refs so that root-tree updates are
 	 * consistent
@@ -249,6 +309,23 @@ error:
 	return ret;
 }
 
+int btrfs_add_reserved_device_extent(struct btrfs_trans_handle *trans,
+				     struct btrfs_device *dev, u64 physical,
+				     u64 length)
+{
+	struct btrfs_reserved_dev_extent *reserved;
+
+	reserved = malloc(sizeof(*reserved));
+	if (!reserved)
+		return -ENOMEM;
+
+	reserved->dev = dev;
+	reserved->length = length;
+	reserved->physical = physical;
+	list_add_tail(&reserved->list, &trans->reserved_dev_extents);
+	return 0;
+}
+
 void btrfs_abort_transaction(struct btrfs_trans_handle *trans, int error)
 {
 	trans->fs_info->transaction_aborted = error;
diff --git a/kernel-shared/transaction.h b/kernel-shared/transaction.h
index 599cc95408de..b325cbe8ea3e 100644
--- a/kernel-shared/transaction.h
+++ b/kernel-shared/transaction.h
@@ -37,6 +37,15 @@ struct btrfs_trans_handle {
 	struct btrfs_block_group *block_group;
 	struct btrfs_delayed_ref_root delayed_refs;
 	struct list_head dirty_bgs;
+	struct list_head new_chunks;
+	struct list_head reserved_dev_extents;
+};
+
+struct btrfs_reserved_dev_extent {
+	struct list_head list;
+	struct btrfs_device *dev;
+	u64 physical;
+	u64 length;
 };
 
 struct btrfs_trans_handle* btrfs_start_transaction(struct btrfs_root *root,
@@ -48,5 +57,8 @@ int commit_tree_roots(struct btrfs_trans_handle *trans,
 int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root);
 void btrfs_abort_transaction(struct btrfs_trans_handle *trans, int error);
+int btrfs_add_reserved_device_extent(struct btrfs_trans_handle *trans,
+				     struct btrfs_device *dev, u64 physical,
+				     u64 length);
 
 #endif
diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c
index c61fb51c4def..923e1a9378d5 100644
--- a/kernel-shared/volumes.c
+++ b/kernel-shared/volumes.c
@@ -653,6 +653,7 @@ static int find_free_dev_extent_start(struct btrfs_device *device,
 				      u64 num_bytes, u64 search_start,
 				      u64 *start, u64 *len)
 {
+	struct btrfs_trans_handle *trans = device->fs_info->running_transaction;
 	struct btrfs_root *root = device->dev_root;
 	struct btrfs_path path = { 0 };
 	struct btrfs_key key;
@@ -735,6 +736,21 @@ next:
 		cond_resched();
 	}
 
+	/* Add reserved dev extents into @used_root */
+	if (trans) {
+		struct btrfs_reserved_dev_extent *reserved;
+
+		list_for_each_entry(reserved, &trans->reserved_dev_extents,
+				    list) {
+			if (reserved->dev != device)
+				continue;
+
+			ret = add_merge_cache_extent(&used_root,
+					reserved->physical, reserved->length);
+			if (ret < 0)
+				goto out;
+		}
+	}
 again:
 	/*
 	 * Now used_root contains all the dev extents. Iterate through the tree
@@ -795,7 +811,6 @@ again:
 		ret = -ENOSPC;
 	else
 		ret = 0;
-
 out:
 	btrfs_release_path(&path);
 	free_extent_cache_tree(&used_root);
@@ -863,26 +878,12 @@ err:
 	return ret;
 }
 
-/*
- * Allocate one free dev extent and insert it into the fs.
- */
-static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans,
-				  struct btrfs_device *device,
-				  u64 chunk_offset, u64 num_bytes, u64 *start)
-{
-	int ret;
-
-	ret = find_free_dev_extent(device, num_bytes, start, NULL);
-	if (ret)
-		return ret;
-	return btrfs_insert_dev_extent(trans, device, chunk_offset, num_bytes,
-					*start);
-}
-
 static int find_next_chunk(struct btrfs_fs_info *fs_info, u64 *offset)
 {
+	struct btrfs_trans_handle *trans = fs_info->running_transaction;
 	struct btrfs_root *root = fs_info->chunk_root;
 	struct btrfs_path *path;
+	u64 new_chunk_end = 0;
 	int ret;
 	struct btrfs_key key;
 	struct btrfs_chunk *chunk;
@@ -917,6 +918,16 @@ static int find_next_chunk(struct btrfs_fs_info *fs_info, u64 *offset)
 				btrfs_chunk_length(path->nodes[0], chunk);
 		}
 	}
+
+	/* Still need to check the new chunks to avoid conflicts */
+	if (trans && !list_empty(&trans->new_chunks)) {
+		struct map_lookup *map;
+
+		list_for_each_entry(map, &trans->new_chunks, list)
+			new_chunk_end = max(new_chunk_end, map->ce.start +
+					    map->ce.size);
+		*offset = max(*offset, new_chunk_end);
+	}
 	ret = 0;
 error:
 	btrfs_free_path(path);
@@ -1362,10 +1373,7 @@ static int create_chunk(struct btrfs_trans_handle *trans,
 			struct btrfs_fs_info *info, struct alloc_chunk_ctl *ctl,
 			struct list_head *private_devs)
 {
-	struct btrfs_root *chunk_root = info->chunk_root;
-	struct btrfs_stripe *stripes;
 	struct btrfs_device *device = NULL;
-	struct btrfs_chunk *chunk;
 	struct list_head *dev_list = &info->fs_devices->devices;
 	struct list_head *cur;
 	struct map_lookup *map;
@@ -1387,22 +1395,14 @@ static int create_chunk(struct btrfs_trans_handle *trans,
 	key.type = BTRFS_CHUNK_ITEM_KEY;
 	key.offset = offset;
 
-	chunk = kmalloc(btrfs_chunk_item_size(ctl->num_stripes), GFP_NOFS);
-	if (!chunk)
-		return -ENOMEM;
-
 	map = kmalloc(btrfs_map_lookup_size(ctl->num_stripes), GFP_NOFS);
-	if (!map) {
-		kfree(chunk);
+	if (!map)
 		return -ENOMEM;
-	}
 
-	stripes = &chunk->stripe;
 	ctl->num_bytes = chunk_bytes_by_type(ctl);
 	index = 0;
 	while (index < ctl->num_stripes) {
 		u64 dev_offset;
-		struct btrfs_stripe *stripe;
 
 		BUG_ON(list_empty(private_devs));
 		cur = private_devs->next;
@@ -1414,45 +1414,30 @@ static int create_chunk(struct btrfs_trans_handle *trans,
 			list_move(&device->dev_list, dev_list);
 
 		if (!ctl->dev_offset) {
-			ret = btrfs_alloc_dev_extent(trans, device, key.offset,
-					ctl->stripe_size, &dev_offset);
+			ret = find_free_dev_extent(device, ctl->stripe_size, &dev_offset, NULL);
+			if (ret < 0)
+				goto out_chunk_map;
+			/*
+			 * Add this dev extent to trans::reserved_dev_extents,
+			 * to prevent it from being allocated again.
+			 */
+			ret = btrfs_add_reserved_device_extent(trans, device,
+					dev_offset, ctl->stripe_size);
 			if (ret < 0)
 				goto out_chunk_map;
 		} else {
 			dev_offset = ctl->dev_offset;
-			ret = btrfs_insert_dev_extent(trans, device, key.offset,
-						      ctl->stripe_size,
-						      ctl->dev_offset);
-			BUG_ON(ret);
 		}
 
 		ASSERT(!zone_size || IS_ALIGNED(dev_offset, zone_size));
 
 		device->bytes_used += ctl->stripe_size;
-		ret = btrfs_update_device(trans, device);
-		if (ret < 0)
-			goto out_chunk_map;
-
 		map->stripes[index].dev = device;
 		map->stripes[index].physical = dev_offset;
-		stripe = stripes + index;
-		btrfs_set_stack_stripe_devid(stripe, device->devid);
-		btrfs_set_stack_stripe_offset(stripe, dev_offset);
-		memcpy(stripe->dev_uuid, device->uuid, BTRFS_UUID_SIZE);
 		index++;
 	}
 	BUG_ON(!list_empty(private_devs));
 
-	/* key was set above */
-	btrfs_set_stack_chunk_length(chunk, ctl->num_bytes);
-	btrfs_set_stack_chunk_owner(chunk, BTRFS_EXTENT_TREE_OBJECTID);
-	btrfs_set_stack_chunk_stripe_len(chunk, BTRFS_STRIPE_LEN);
-	btrfs_set_stack_chunk_type(chunk, ctl->type);
-	btrfs_set_stack_chunk_num_stripes(chunk, ctl->num_stripes);
-	btrfs_set_stack_chunk_io_align(chunk, BTRFS_STRIPE_LEN);
-	btrfs_set_stack_chunk_io_width(chunk, BTRFS_STRIPE_LEN);
-	btrfs_set_stack_chunk_sector_size(chunk, info->sectorsize);
-	btrfs_set_stack_chunk_sub_stripes(chunk, ctl->sub_stripes);
 	map->sector_size = info->sectorsize;
 	map->stripe_len = BTRFS_STRIPE_LEN;
 	map->io_align = BTRFS_STRIPE_LEN;
@@ -1461,9 +1446,6 @@ static int create_chunk(struct btrfs_trans_handle *trans,
 	map->num_stripes = ctl->num_stripes;
 	map->sub_stripes = ctl->sub_stripes;
 
-	ret = btrfs_insert_item(trans, chunk_root, &key, chunk,
-				btrfs_chunk_item_size(ctl->num_stripes));
-	BUG_ON(ret);
 	ctl->start = key.offset;
 
 	map->ce.start = key.offset;
@@ -1472,21 +1454,16 @@ static int create_chunk(struct btrfs_trans_handle *trans,
 	ret = insert_cache_extent(&info->mapping_tree.cache_tree, &map->ce);
 	if (ret < 0)
 		goto out_chunk_map;
+	/*
+	 * Add the new chunk to new_chunks list so at commit trans time we can
+	 * finish the items insert.
+	 */
+	list_add(&map->list, &trans->new_chunks);
 
-	if (ctl->type & BTRFS_BLOCK_GROUP_SYSTEM) {
-		ret = btrfs_add_system_chunk(info, &key,
-			    chunk, btrfs_chunk_item_size(ctl->num_stripes));
-		if (ret < 0)
-			goto out_chunk;
-	}
-
-	kfree(chunk);
 	return ret;
 
 out_chunk_map:
 	kfree(map);
-out_chunk:
-	kfree(chunk);
 	return ret;
 }
 
@@ -1594,6 +1571,51 @@ again:
 	return ret;
 }
 
+int btrfs_insert_chunk_item(struct btrfs_trans_handle *trans,
+			    struct map_lookup *map)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_stripe *stripe;
+	struct btrfs_chunk *chunk;
+	struct btrfs_key key;
+	int i;
+	int ret;
+
+	chunk = malloc(btrfs_chunk_item_size(map->num_stripes));
+	if (!chunk)
+		return -ENOMEM;
+
+	btrfs_set_stack_chunk_length(chunk, map->ce.size);
+	btrfs_set_stack_chunk_owner(chunk, BTRFS_EXTENT_TREE_OBJECTID);
+	btrfs_set_stack_chunk_stripe_len(chunk, map->stripe_len);
+	btrfs_set_stack_chunk_type(chunk, map->type);
+	btrfs_set_stack_chunk_num_stripes(chunk, map->num_stripes);
+	btrfs_set_stack_chunk_io_align(chunk, BTRFS_STRIPE_LEN);
+	btrfs_set_stack_chunk_io_width(chunk, BTRFS_STRIPE_LEN);
+	btrfs_set_stack_chunk_sector_size(chunk, fs_info->sectorsize);
+	btrfs_set_stack_chunk_sub_stripes(chunk, map->sub_stripes);
+	for (i = 0, stripe = &chunk->stripe; i < map->num_stripes;
+	     i++, stripe++) {
+		struct btrfs_device *device = map->stripes[i].dev;
+
+		btrfs_set_stack_stripe_devid(stripe, device->devid);
+		btrfs_set_stack_stripe_offset(stripe, map->stripes[i].physical);
+		memcpy(stripe->dev_uuid, device->uuid, BTRFS_UUID_SIZE);
+	}
+
+	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+	key.type = BTRFS_CHUNK_ITEM_KEY;
+	key.offset = map->ce.start;
+	ret = btrfs_insert_item(trans, fs_info->chunk_root, &key, chunk,
+				btrfs_chunk_item_size(map->num_stripes));
+	if (ret == 0 && (map->type & BTRFS_BLOCK_GROUP_SYSTEM))
+		ret = btrfs_add_system_chunk(fs_info, &key, chunk,
+				btrfs_chunk_item_size(map->num_stripes));
+
+	free(chunk);
+	return ret;
+}
+
 /*
  * Alloc a DATA chunk with SINGLE profile.
  *
diff --git a/kernel-shared/volumes.h b/kernel-shared/volumes.h
index 6e9103a933b7..2beae2d02fad 100644
--- a/kernel-shared/volumes.h
+++ b/kernel-shared/volumes.h
@@ -113,6 +113,14 @@ struct btrfs_multi_bio {
 
 struct map_lookup {
 	struct cache_extent ce;
+
+	/*
+	 * Newly allocated chunk map will be added to trans::new_chunks,
+	 * and its chunk/dev_extent/block_group items will be inserted into
+	 * the trees at transaction commit time.
+	 */
+	struct list_head list;
+
 	u64 type;
 	int io_align;
 	int io_width;
@@ -264,6 +272,8 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		      u64 *num_bytes, u64 type);
 int btrfs_alloc_data_chunk(struct btrfs_trans_handle *trans,
 			   struct btrfs_fs_info *fs_info, u64 *start, u64 num_bytes);
+int btrfs_insert_chunk_item(struct btrfs_trans_handle *trans,
+			    struct map_lookup *map);
 int btrfs_open_devices(struct btrfs_fs_info *fs_info,
 		       struct btrfs_fs_devices *fs_devices, int flags);
 int btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
-- 
2.35.1



* [PATCH RFC 03/10] btrfs-progs: mkfs: introduce helper to set seed flag
  2022-04-20  0:19 [PATCH RFC 00/10] btrfs-progs: mkfs: add --seed option to sprout a seed device at mkfs time Qu Wenruo
  2022-04-20  0:19 ` [PATCH RFC 01/10] btrfs-progs: refactor find_free_dev_extent_start() for later expansion Qu Wenruo
  2022-04-20  0:19 ` [PATCH RFC 02/10] btrfs-progs: delay chunk and device extent items insertion Qu Wenruo
@ 2022-04-20  0:19 ` Qu Wenruo
  2022-04-20  0:19 ` [PATCH RFC 04/10] btrfs-progs: mkfs: avoid error out if some trees exist Qu Wenruo
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2022-04-20  0:19 UTC (permalink / raw)
  To: linux-btrfs

The new helper, prepare_seed_device(), will be used for later mkfs time
seed sprouting.

It has far more checks than btrfstune:

- csum_type/sectorsize/nodesize/features checks
  Any mismatch means we can not use that seed device.
  Normally this should not be a problem for default mkfs profiles,
  but since we're going to do the sprout at mkfs time, we must
  do these checks.

- Device number check
  I see neither a reason nor a use case to support nested/multi-device
  seeds at mkfs time.

Currently mkfs.btrfs will accept --seed (undocumented and experimental)
and call the helper to set the seed flag on the target, but will not
actually do the sprout yet.
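
The parameter check can be sketched on its own. This is a minimal
illustration, assuming a hypothetical struct fs_params and helper
seed_compatible(); it mirrors the rule used in the patch below, where
sizes and checksum type must match exactly while the target's incompat
features only need to be a subset of the seed's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bundle of the parameters that are compared. */
struct fs_params {
	uint64_t incompat_flags;
	uint32_t csum_type;
	uint32_t sectorsize;
	uint32_t nodesize;
};

static bool seed_compatible(const struct fs_params *seed,
			    const struct fs_params *target)
{
	/* These three must be identical. */
	if (seed->sectorsize != target->sectorsize ||
	    seed->nodesize != target->nodesize ||
	    seed->csum_type != target->csum_type)
		return false;

	/*
	 * Subset test: a bit that is set in the target but clear in the
	 * seed survives the mask and makes the result non-zero.
	 */
	return (~seed->incompat_flags & target->incompat_flags) == 0;
}
```

For example, a seed with flags 0x7 accepts a target with 0x3, but not a
target with 0x8.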

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 Makefile      |   2 +-
 mkfs/main.c   |  17 ++++++++
 mkfs/sprout.c | 114 ++++++++++++++++++++++++++++++++++++++++++++++++++
 mkfs/sprout.h |  10 +++++
 4 files changed, 142 insertions(+), 1 deletion(-)
 create mode 100644 mkfs/sprout.c
 create mode 100644 mkfs/sprout.h

diff --git a/Makefile b/Makefile
index af4908f9d8de..4e56326e4746 100644
--- a/Makefile
+++ b/Makefile
@@ -226,7 +226,7 @@ libbtrfsutil_objects = libbtrfsutil/errors.o libbtrfsutil/filesystem.o \
 convert_objects = convert/main.o convert/common.o convert/source-fs.o \
 		  convert/source-ext2.o convert/source-reiserfs.o \
 		  mkfs/common.o
-mkfs_objects = mkfs/main.o mkfs/common.o mkfs/rootdir.o
+mkfs_objects = mkfs/main.o mkfs/common.o mkfs/rootdir.o mkfs/sprout.o
 image_objects = image/main.o image/sanitize.o
 all_objects = $(objects) $(cmds_objects) $(libbtrfs_objects) $(convert_objects) \
 	      $(mkfs_objects) $(image_objects) $(libbtrfsutil_objects)
diff --git a/mkfs/main.c b/mkfs/main.c
index 4e0a46a77aa5..7b7793f8b996 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -48,6 +48,7 @@
 #include "common/parse-utils.h"
 #include "mkfs/common.h"
 #include "mkfs/rootdir.h"
+#include "mkfs/sprout.h"
 #include "common/fsfeatures.h"
 #include "common/box.h"
 #include "common/units.h"
@@ -999,6 +1000,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	bool force_overwrite = false;
 	int oflags;
 	char *source_dir = NULL;
+	char *seed_dev = NULL;
 	bool source_dir_set = false;
 	bool shrink_rootdir = false;
 	u64 source_dir_size = 0;
@@ -1024,6 +1026,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 			GETOPT_VAL_SHRINK = 257,
 			GETOPT_VAL_CHECKSUM,
 			GETOPT_VAL_GLOBAL_ROOTS,
+			GETOPT_VAL_SEED_DEV,
 		};
 		static const struct option long_options[] = {
 			{ "byte-count", required_argument, NULL, 'b' },
@@ -1050,6 +1053,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 			{ "shrink", no_argument, NULL, GETOPT_VAL_SHRINK },
 #if EXPERIMENTAL
 			{ "num-global-roots", required_argument, NULL, GETOPT_VAL_GLOBAL_ROOTS },
+			{ "seed", required_argument, NULL, GETOPT_VAL_SEED_DEV },
 #endif
 			{ "help", no_argument, NULL, GETOPT_VAL_HELP },
 			{ NULL, 0, NULL, 0}
@@ -1158,6 +1162,9 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 			case GETOPT_VAL_GLOBAL_ROOTS:
 				nr_global_roots = (int)arg_strtou64(optarg);
 				break;
+			case GETOPT_VAL_SEED_DEV:
+				seed_dev = optarg;
+				break;
 			case GETOPT_VAL_HELP:
 			default:
 				print_usage(c != GETOPT_VAL_HELP);
@@ -1207,6 +1214,16 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 		}
 	}
 
+	if (seed_dev) {
+		ret = prepare_seed_device(seed_dev, features, csum_type,
+					  sectorsize, nodesize);
+		if (ret < 0) {
+			errno = -ret;
+			error("failed to set seed flag on %s: %m", seed_dev);
+			goto error;
+		}
+	}
+
 	while (dev_cnt-- > 0) {
 		file = argv[optind++];
 		if (source_dir_set && path_exists(file) == 0)
diff --git a/mkfs/sprout.c b/mkfs/sprout.c
new file mode 100644
index 000000000000..eb423d082c7c
--- /dev/null
+++ b/mkfs/sprout.c
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include "kernel-shared/ctree.h"
+#include "kernel-shared/disk-io.h"
+#include "kernel-shared/volumes.h"
+#include "kernel-shared/transaction.h"
+#include "common/messages.h"
+#include "mkfs/common.h"
+
+int prepare_seed_device(const char *path, u64 features, u32 csum_type,
+			u32 sectorsize, u32 nodesize)
+{
+	struct open_ctree_flags ocf = { 0 };
+	struct btrfs_trans_handle *trans;
+	struct btrfs_fs_devices *fs_devs;
+	struct btrfs_fs_info *fs_info;
+	int nr_devs = 0;
+	int ret;
+
+	ocf.filename = path;
+	ocf.flags = OPEN_CTREE_WRITES;
+
+	fs_info = open_ctree_fs_info(&ocf);
+	if (!fs_info) {
+		error("can not open btrfs on %s", path);
+		return -EINVAL;
+	}
+
+	fs_devs = fs_info->fs_devices;
+	while (fs_devs) {
+		struct list_head *list;
+
+		list_for_each(list, &fs_devs->devices)
+			nr_devs++;
+		fs_devs = fs_devs->seed;
+	}
+
+	/*
+	 * Multi-device seed is not recommended, we just reject them.
+	 * This also rejects sprouted fs which still has seed attached.
+	 */
+	if (nr_devs > 1) {
+		ret = -EINVAL;
+		error("the seed filesystem has multiple devices, have %d expect 1",
+			nr_devs);
+		goto out;
+	}
+
+	/* METADATA_UUID feature should not be enabled on seed device */
+	if (btrfs_fs_incompat(fs_info, METADATA_UUID)) {
+		ret = -EINVAL;
+		error("the seed filesystem can not have METADATA_UUID feature enabled");
+		goto out;
+	}
+	/* A filesystem with transient flags can not be a seed target */
+	if (btrfs_super_flags(fs_info->super_copy) &
+	    (BTRFS_SUPER_FLAG_CHANGING_CSUM |
+	     BTRFS_SUPER_FLAG_CHANGING_FSID |
+	     BTRFS_SUPER_FLAG_CHANGING_FSID_V2 |
+	     BTRFS_SUPER_FLAG_METADUMP |
+	     BTRFS_SUPER_FLAG_METADUMP_V2)) {
+		ret = -EINVAL;
+		error("the seed filesystem has transient flags: 0x%llx",
+		      btrfs_super_flags(fs_info->super_copy));
+		goto out;
+	}
+	/*
+	 * Make sure the seed device matches all the criteria.
+	 * For incompat flags, we only require that our target features are a
+	 * subset of those of the seed device.
+	 */
+	if (fs_info->sectorsize != sectorsize ||
+	    fs_info->nodesize != nodesize ||
+	    fs_info->csum_type != csum_type ||
+	    ~btrfs_super_incompat_flags(fs_info->super_copy) & features) {
+		ret = -EINVAL;
+		error("the seed filesystem parameters don't match the target");
+		error("  seed features=0x%llx csum_type=%u sectorsize=%u nodesize=%u",
+			btrfs_super_incompat_flags(fs_info->super_copy),
+			fs_info->csum_type, fs_info->sectorsize,
+			fs_info->nodesize);
+		error("  target features=0x%llx csum_type=%u sectorsize=%u nodesize=%u",
+			features, csum_type, sectorsize, nodesize);
+		goto out;
+	}
+
+	/* Already has seed flag */
+	if (btrfs_super_flags(fs_info->super_copy) & BTRFS_SUPER_FLAG_SEEDING) {
+		ret = 0;
+		goto out;
+	}
+
+	/* All checks passed, set seed flag */
+	trans = btrfs_start_transaction(fs_info->tree_root, 0);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		errno = -ret;
+		error("failed to start transaction for setting seed flag: %m");
+		goto out;
+	}
+	btrfs_set_super_flags(fs_info->super_copy, BTRFS_SUPER_FLAG_SEEDING |
+			      btrfs_super_flags(fs_info->super_copy));
+	ret = btrfs_commit_transaction(trans, fs_info->tree_root);
+	if (ret < 0) {
+		errno = -ret;
+		error("failed to commit transaction for setting seed flag: %m");
+		goto out;
+	} else {
+		printf("Seed flag set for %s\n", path);
+	}
+out:
+	close_ctree(fs_info->tree_root);
+	return ret;
+}
diff --git a/mkfs/sprout.h b/mkfs/sprout.h
new file mode 100644
index 000000000000..2e8b794c93e4
--- /dev/null
+++ b/mkfs/sprout.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/* Functions only used for seed sprout at mkfs time. */
+#ifndef __BTRFS_MKFS_SPROUT_H__
+#define __BTRFS_MKFS_SPROUT_H__
+
+int prepare_seed_device(const char *path, u64 features, u32 csum_type,
+			u32 sectorsize, u32 nodesize);
+
+#endif
-- 
2.35.1


* [PATCH RFC 04/10] btrfs-progs: mkfs: avoid error out if some trees exist
  2022-04-20  0:19 [PATCH RFC 00/10] btrfs-progs: mkfs: add --seed option to sprout a seed device at mkfs time Qu Wenruo
                   ` (2 preceding siblings ...)
  2022-04-20  0:19 ` [PATCH RFC 03/10] btrfs-progs: mkfs: introduce helper to set seed flag Qu Wenruo
@ 2022-04-20  0:19 ` Qu Wenruo
  2022-04-20  0:19 ` [PATCH RFC 05/10] btrfs-progs: extract btrfs_fs_devices structure allocation into a helper Qu Wenruo
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2022-04-20  0:19 UTC (permalink / raw)
  To: linux-btrfs

With the incoming seed sprout support at mkfs time, quite a few trees
can already exist when mkfs tries to create them, including:

- data reloc tree
- uuid tree
- quota tree
- root dir

Handle the existing trees properly so we won't error out just because
the seed device already has the same tree.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 mkfs/main.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 72 insertions(+), 2 deletions(-)
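
Note for reviewers: the return convention of check_seed_existing_tree() and
the way its callers use it can be sketched in isolation as below. This is only
an illustration under stated assumptions: the demo_* names are hypothetical,
and @search_ret stands in for the result of btrfs_search_slot() (0 for an exact
match, >0 for not found, <0 for an error).

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for check_seed_existing_tree().
 *
 * Return >0 if there is a seed device and the tree already exists.
 * Return 0 if there is no seed device or the tree doesn't exist.
 * Return <0 for error.
 */
static int demo_check_seed_existing_tree(bool has_seed, int search_ret)
{
	if (!has_seed)
		return 0;
	if (search_ret < 0)
		return search_ret;
	return search_ret == 0 ? 1 : 0;
}

/* Caller pattern: skip the creation if the tree is inherited from the seed. */
static int demo_create_tree(bool has_seed, int search_ret, int *nr_created)
{
	int ret = demo_check_seed_existing_tree(has_seed, search_ret);

	if (ret < 0)
		return ret;
	if (ret > 0)
		return 0;
	/* No inherited tree, this is where the real creation would happen. */
	(*nr_created)++;
	return 0;
}
```

Only a positive return short-circuits the creation, so a regular mkfs run
without a seed device takes the same path as before.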

diff --git a/mkfs/main.c b/mkfs/main.c
index 7b7793f8b996..ca035cbb27f7 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -712,6 +712,33 @@ static void update_chunk_allocation(struct btrfs_fs_info *fs_info,
 	}
 }
 
+/*
+ * Check if we already have an existing tree from seed device.
+ *
+ * Return >0 if we have seed device and already have the tree.
+ * Return 0 if we don't have seed device or the tree doesn't exist.
+ * Return <0 for error.
+ */
+static int check_seed_existing_tree(struct btrfs_fs_info *fs_info,
+				    struct btrfs_key *root_key)
+{
+	struct btrfs_path path;
+	int ret;
+
+	if (!fs_info->fs_devices->seed)
+		return 0;
+
+	btrfs_init_path(&path);
+	ret = btrfs_search_slot(NULL, fs_info->tree_root, root_key, &path, 0, 0);
+	btrfs_release_path(&path);
+	if (ret < 0)
+		return ret;
+
+	if (ret == 0)
+		return 1;
+	return 0;
+}
+
 static int create_data_reloc_tree(struct btrfs_trans_handle *trans)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
@@ -726,6 +753,12 @@ static int create_data_reloc_tree(struct btrfs_trans_handle *trans)
 	char *name = "..";
 	int ret;
 
+	ret = check_seed_existing_tree(fs_info, &key);
+	if (ret < 0)
+		return ret;
+	if (ret > 0)
+		return 0;
+
 	root = btrfs_create_tree(trans, fs_info, &key);
 	if (IS_ERR(root)) {
 		ret = PTR_ERR(root);
@@ -792,7 +825,12 @@ static int create_uuid_tree(struct btrfs_trans_handle *trans)
 	};
 	int ret = 0;
 
-	ASSERT(fs_info->uuid_root == NULL);
+	ret = check_seed_existing_tree(fs_info, &key);
+	if (ret < 0)
+		return ret;
+	if (ret > 0)
+		return 0;
+
 	root = btrfs_create_tree(trans, fs_info, &key);
 	if (IS_ERR(root)) {
 		ret = PTR_ERR(root);
@@ -896,7 +934,7 @@ static int setup_quota_root(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_trans_handle *trans;
 	struct btrfs_qgroup_status_item *qsi;
-	struct btrfs_root *quota_root;
+	struct btrfs_root *quota_root = fs_info->quota_root;
 	struct btrfs_path path;
 	struct btrfs_key key;
 	int qgroup_repaired = 0;
@@ -909,6 +947,15 @@ static int setup_quota_root(struct btrfs_fs_info *fs_info)
 		error("failed to start transaction: %d (%m)", ret);
 		return ret;
 	}
+	key.objectid = BTRFS_QUOTA_TREE_OBJECTID;
+	key.type = BTRFS_ROOT_ITEM_KEY;
+	key.offset = 0;
+	ret = check_seed_existing_tree(fs_info, &key);
+	if (ret < 0)
+		goto fail;
+	if (ret > 0)
+		goto insert_status;
+
 	ret = btrfs_create_root(trans, fs_info, BTRFS_QUOTA_TREE_OBJECTID);
 	if (ret < 0) {
 		error("failed to create quota root: %d (%m)", ret);
@@ -927,6 +974,25 @@ static int setup_quota_root(struct btrfs_fs_info *fs_info)
 		error("failed to insert qgroup status item: %d (%m)", ret);
 		goto fail;
 	}
+	btrfs_release_path(&path);
+
+insert_status:
+	/*
+	 * We reach here either after creating a new quota root, or when using
+	 * the existing quota root from the seed.
+	 * So here we intentionally do a search, rather than reusing the
+	 * inserted item, to handle both cases well.
+	 */
+	key.objectid = 0;
+	key.type = BTRFS_QGROUP_STATUS_KEY;
+	key.offset = 0;
+	ret = btrfs_search_slot(trans, quota_root, &key, &path, 1, 0);
+	if (ret > 0)
+		ret = -ENOENT;
+	if (ret < 0) {
+		btrfs_release_path(&path);
+		goto fail;
+	}
 
 	qsi = btrfs_item_ptr(path.nodes[0], path.slots[0],
 			     struct btrfs_qgroup_status_item);
@@ -941,6 +1007,8 @@ static int setup_quota_root(struct btrfs_fs_info *fs_info)
 
 	/* Currently mkfs will only create one subvolume */
 	ret = insert_qgroup_items(trans, fs_info, BTRFS_FS_TREE_OBJECTID);
+	if (ret == -EEXIST)
+		ret = 0;
 	if (ret < 0) {
 		error("failed to insert qgroup items: %d (%m)", ret);
 		goto fail;
@@ -1554,6 +1622,8 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	}
 
 	ret = make_root_dir(trans, root);
+	if (ret == -EEXIST && root->fs_info->fs_devices->seed)
+		ret = 0;
 	if (ret) {
 		error("failed to setup the root directory: %d", ret);
 		goto error;
-- 
2.35.1


* [PATCH RFC 05/10] btrfs-progs: extract btrfs_fs_devices structure allocation into a helper
  2022-04-20  0:19 [PATCH RFC 00/10] btrfs-progs: mkfs: add --seed option to sprout a seed device at mkfs time Qu Wenruo
                   ` (3 preceding siblings ...)
  2022-04-20  0:19 ` [PATCH RFC 04/10] btrfs-progs: mkfs: avoid error out if some trees exist Qu Wenruo
@ 2022-04-20  0:19 ` Qu Wenruo
  2022-04-20  0:19 ` [PATCH RFC 06/10] btrfs-progs: mkfs/sprout: add a helper to update generation for seed device Qu Wenruo
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2022-04-20  0:19 UTC (permalink / raw)
  To: linux-btrfs

The new helper function, btrfs_alloc_fs_devices() will allocate and
initialize a btrfs_fs_devices structure.

This helper will later be used by seed sprout, as we will need to create
a new fs_devices for the sprouted fs.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 kernel-shared/volumes.c | 38 +++++++++++++++++++++++++++-----------
 kernel-shared/volumes.h |  2 ++
 2 files changed, 29 insertions(+), 11 deletions(-)
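
Note for reviewers: the initialization rules of the new helper, in particular
the metadata_uuid fallback to fsid, can be sketched standalone as below. This
is a simplified illustration, not the real structure: struct demo_fs_devices
and demo_alloc_fs_devices() are hypothetical, calloc() stands in for kzalloc(),
and the list handling is left out.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_FSID_SIZE 16	/* mirrors BTRFS_FSID_SIZE */

/* Hypothetical, reduced version of struct btrfs_fs_devices. */
struct demo_fs_devices {
	uint8_t fsid[DEMO_FSID_SIZE];
	uint8_t metadata_uuid[DEMO_FSID_SIZE];
	uint64_t latest_trans;
	uint64_t latest_devid;
	uint64_t lowest_devid;
};

static struct demo_fs_devices *demo_alloc_fs_devices(const uint8_t *fsid,
						     const uint8_t *metadata_uuid)
{
	struct demo_fs_devices *fs_devices;

	fs_devices = calloc(1, sizeof(*fs_devices));
	if (!fs_devices)
		return NULL;

	memcpy(fs_devices->fsid, fsid, DEMO_FSID_SIZE);
	/* Without a separate metadata_uuid, it is the same as the fsid. */
	memcpy(fs_devices->metadata_uuid, metadata_uuid ? metadata_uuid : fsid,
	       DEMO_FSID_SIZE);

	fs_devices->latest_trans = 0;
	fs_devices->latest_devid = UINT64_MAX;	/* same value as (u64)-1 */
	fs_devices->lowest_devid = UINT64_MAX;
	return fs_devices;
}
```

Callers which already know the real devid and transid, like
device_list_add(), overwrite the defaults right after the allocation.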

diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c
index 923e1a9378d5..fe4e61710951 100644
--- a/kernel-shared/volumes.c
+++ b/kernel-shared/volumes.c
@@ -331,6 +331,31 @@ static struct btrfs_fs_devices *find_fsid(u8 *fsid, u8 *metadata_uuid)
 	return NULL;
 }
 
+struct btrfs_fs_devices *btrfs_alloc_fs_devices(const u8 *fsid,
+						const u8 *metadata_uuid)
+{
+	struct btrfs_fs_devices *fs_devices;
+
+	fs_devices = kzalloc(sizeof(*fs_devices), GFP_NOFS);
+	if (!fs_devices)
+		return NULL;
+
+	INIT_LIST_HEAD(&fs_devices->devices);
+	list_add(&fs_devices->list, &fs_uuids);
+	memcpy(fs_devices->fsid, fsid, BTRFS_FSID_SIZE);
+
+	if (metadata_uuid)
+		memcpy(fs_devices->metadata_uuid, metadata_uuid,
+		       BTRFS_FSID_SIZE);
+	else
+		memcpy(fs_devices->metadata_uuid, fsid, BTRFS_FSID_SIZE);
+
+	fs_devices->latest_trans = 0;
+	fs_devices->latest_devid = (u64)-1;
+	fs_devices->lowest_devid = (u64)-1;
+	return fs_devices;
+}
+
 static int device_list_add(const char *path,
 			   struct btrfs_super_block *disk_super,
 			   u64 devid, struct btrfs_fs_devices **fs_devices_ret)
@@ -348,22 +373,13 @@ static int device_list_add(const char *path,
 		fs_devices = find_fsid(disk_super->fsid, NULL);
 
 	if (!fs_devices) {
-		fs_devices = kzalloc(sizeof(*fs_devices), GFP_NOFS);
+		fs_devices = btrfs_alloc_fs_devices(disk_super->fsid,
+				metadata_uuid ? disk_super->metadata_uuid : NULL);
 		if (!fs_devices)
 			return -ENOMEM;
-		INIT_LIST_HEAD(&fs_devices->devices);
-		list_add(&fs_devices->list, &fs_uuids);
-		memcpy(fs_devices->fsid, disk_super->fsid, BTRFS_FSID_SIZE);
-		if (metadata_uuid)
-			memcpy(fs_devices->metadata_uuid,
-			       disk_super->metadata_uuid, BTRFS_FSID_SIZE);
-		else
-			memcpy(fs_devices->metadata_uuid, fs_devices->fsid,
-			       BTRFS_FSID_SIZE);
 
 		fs_devices->latest_devid = devid;
 		fs_devices->latest_trans = found_transid;
-		fs_devices->lowest_devid = (u64)-1;
 		fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_REGULAR;
 		device = NULL;
 	} else {
diff --git a/kernel-shared/volumes.h b/kernel-shared/volumes.h
index 2beae2d02fad..f9e564e4dc5e 100644
--- a/kernel-shared/volumes.h
+++ b/kernel-shared/volumes.h
@@ -276,6 +276,8 @@ int btrfs_insert_chunk_item(struct btrfs_trans_handle *trans,
 			    struct map_lookup *map);
 int btrfs_open_devices(struct btrfs_fs_info *fs_info,
 		       struct btrfs_fs_devices *fs_devices, int flags);
+struct btrfs_fs_devices *btrfs_alloc_fs_devices(const u8 *fsid,
+						const u8 *metadata_uuid);
 int btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
 void btrfs_close_all_devices(void);
 int btrfs_insert_dev_extent(struct btrfs_trans_handle *trans,
-- 
2.35.1


* [PATCH RFC 06/10] btrfs-progs: mkfs/sprout: add a helper to update generation for seed device
  2022-04-20  0:19 [PATCH RFC 00/10] btrfs-progs: mkfs: add --seed option to sprout a seed device at mkfs time Qu Wenruo
                   ` (4 preceding siblings ...)
  2022-04-20  0:19 ` [PATCH RFC 05/10] btrfs-progs: extract btrfs_fs_devices structure allocation into a helper Qu Wenruo
@ 2022-04-20  0:19 ` Qu Wenruo
  2022-04-20  0:19 ` [PATCH RFC 07/10] btrfs-progs: mkfs/sprout: introduce helper to force allocating a chunk Qu Wenruo
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2022-04-20  0:19 UTC (permalink / raw)
  To: linux-btrfs

For a non-sprouted fs, the generation of a device item is always 0.

But for a sprouted fs, we use btrfs_dev_item::generation to store the
expected generation of the seed device.

Here we introduce a helper to update the generation of the seed device, for
the incoming seed sprout at mkfs time.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 mkfs/sprout.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/mkfs/sprout.c b/mkfs/sprout.c
index eb423d082c7c..5977a73644f5 100644
--- a/mkfs/sprout.c
+++ b/mkfs/sprout.c
@@ -112,3 +112,40 @@ out:
 	close_ctree(fs_info->tree_root);
 	return ret;
 }
+
+static int update_seed_dev_geneartion(struct btrfs_trans_handle *trans)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_fs_devices *seed_devs = fs_info->fs_devices->seed;
+	struct btrfs_device *dev;
+	struct btrfs_path path;
+	int ret;
+
+	ASSERT(seed_devs);
+
+	btrfs_init_path(&path);
+	list_for_each_entry(dev, &seed_devs->devices, dev_list) {
+		struct btrfs_dev_item *di;
+		struct btrfs_key key;
+
+		key.objectid = BTRFS_DEV_ITEMS_OBJECTID;
+		key.type = BTRFS_DEV_ITEM_KEY;
+		key.offset = dev->devid;
+
+		ret = btrfs_search_slot(trans, fs_info->chunk_root, &key, &path,
+					0, 1);
+		if (ret > 0)
+			ret = -ENOENT;
+		if (ret < 0)
+			break;
+
+		di = btrfs_item_ptr(path.nodes[0], path.slots[0],
+				    struct btrfs_dev_item);
+		btrfs_set_device_generation(path.nodes[0], di, dev->generation);
+		btrfs_mark_buffer_dirty(path.nodes[0]);
+		btrfs_release_path(&path);
+	}
+	btrfs_release_path(&path);
+	return ret;
+}
+
-- 
2.35.1


* [PATCH RFC 07/10] btrfs-progs: mkfs/sprout: introduce helper to force allocating a chunk
  2022-04-20  0:19 [PATCH RFC 00/10] btrfs-progs: mkfs: add --seed option to sprout a seed device at mkfs time Qu Wenruo
                   ` (5 preceding siblings ...)
  2022-04-20  0:19 ` [PATCH RFC 06/10] btrfs-progs: mkfs/sprout: add a helper to update generation for seed device Qu Wenruo
@ 2022-04-20  0:19 ` Qu Wenruo
  2022-04-20  0:19 ` [PATCH RFC 08/10] btrfs-progs: mkfs/sprout: introduce a helper to relocate system chunks Qu Wenruo
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2022-04-20  0:19 UTC (permalink / raw)
  To: linux-btrfs

The new helper, sprout_alloc_one_chunk(), will force allocating a chunk,
mostly using the profile from the seed device.

With the delayed chunk allocation, we really only need to grab the
target profile, and call btrfs_alloc_chunk() followed by
btrfs_make_block_group().

For mixed block groups, we need btrfs_space_info to determine if a
profile needs extra types, so here we export btrfs_find_space_info().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 kernel-shared/ctree.h       |  2 ++
 kernel-shared/extent-tree.c | 16 ++++++++--------
 mkfs/sprout.c               | 38 +++++++++++++++++++++++++++++++++++++
 3 files changed, 48 insertions(+), 8 deletions(-)
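
Note for reviewers: the way the chunk flags are composed can be sketched
standalone as below. The DEMO_BG_* values mirror the on-disk
BTRFS_BLOCK_GROUP_* bits, while demo_chunk_flags() is a hypothetical helper.
It also assumes that the space info flags only carry block group type bits,
which is what makes the mixed case work.

```c
#include <stdint.h>

/* These mirror the BTRFS_BLOCK_GROUP_* bits. */
#define DEMO_BG_DATA		(1ULL << 0)
#define DEMO_BG_SYSTEM		(1ULL << 1)
#define DEMO_BG_METADATA	(1ULL << 2)
#define DEMO_BG_RAID1		(1ULL << 4)
#define DEMO_BG_DUP		(1ULL << 5)

/*
 * Hypothetical helper: take the profile bits which are both available and
 * selected, then add the type bits of the space info.
 * For mixed block groups the space info has both DATA and METADATA set, so
 * the new chunk gets both types too.
 */
static uint64_t demo_chunk_flags(uint64_t avail_alloc_bits,
				 uint64_t alloc_profile,
				 uint64_t space_info_flags)
{
	return (avail_alloc_bits & alloc_profile) | space_info_flags;
}
```

For example, a DUP metadata profile inherited from the seed device results
in DEMO_BG_DUP | DEMO_BG_METADATA for the new chunk.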

diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
index 68943ff294cc..99ed81a9790a 100644
--- a/kernel-shared/ctree.h
+++ b/kernel-shared/ctree.h
@@ -2614,6 +2614,8 @@ u64 btrfs_name_hash(const char *name, int len);
 u64 btrfs_extref_hash(u64 parent_objectid, const char *name, int len);
 
 /* extent-tree.c */
+struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info,
+					       u64 flags);
 int btrfs_reserve_extent(struct btrfs_trans_handle *trans,
 			 struct btrfs_root *root,
 			 u64 num_bytes, u64 empty_size,
diff --git a/kernel-shared/extent-tree.c b/kernel-shared/extent-tree.c
index da801b1d9926..63ace7cea681 100644
--- a/kernel-shared/extent-tree.c
+++ b/kernel-shared/extent-tree.c
@@ -1592,8 +1592,8 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans)
 	return ret;
 }
 
-static struct btrfs_space_info *__find_space_info(struct btrfs_fs_info *info,
-						  u64 flags)
+struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info,
+					       u64 flags)
 {
 	struct btrfs_space_info *found;
 
@@ -1617,7 +1617,7 @@ static int free_space_info(struct btrfs_fs_info *fs_info, u64 flags,
 	if (bytes_used)
 		return -ENOTEMPTY;
 
-	found = __find_space_info(fs_info, flags);
+	found = btrfs_find_space_info(fs_info, flags);
 	if (!found)
 		return -ENOENT;
 	if (found->total_bytes < total_bytes) {
@@ -1638,7 +1638,7 @@ int update_space_info(struct btrfs_fs_info *info, u64 flags,
 {
 	struct btrfs_space_info *found;
 
-	found = __find_space_info(info, flags);
+	found = btrfs_find_space_info(info, flags);
 	if (found) {
 		found->total_bytes += total_bytes;
 		found->bytes_used += bytes_used;
@@ -1694,7 +1694,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 	u64 num_bytes;
 	int ret;
 
-	space_info = __find_space_info(fs_info, flags);
+	space_info = btrfs_find_space_info(fs_info, flags);
 	if (!space_info) {
 		ret = update_space_info(fs_info, flags, 0, 0, &space_info);
 		BUG_ON(ret);
@@ -2381,7 +2381,7 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
 	u64 start, end;
 	int ret;
 
-	sinfo = __find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
+	sinfo = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
 	ASSERT(sinfo);
 
 	ins.objectid = node->bytenr;
@@ -2478,7 +2478,7 @@ static int alloc_tree_block(struct btrfs_trans_handle *trans,
 	if (!extent_op)
 		return -ENOMEM;
 
-	sinfo = __find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
+	sinfo = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
 	if (!sinfo) {
 		error("Corrupted fs, no valid METADATA block group found");
 		return -EUCLEAN;
@@ -3657,7 +3657,7 @@ int cleanup_ref_head(struct btrfs_trans_handle *trans,
 		if (!head->is_data) {
 			struct btrfs_space_info *sinfo;
 
-			sinfo = __find_space_info(trans->fs_info,
+			sinfo = btrfs_find_space_info(trans->fs_info,
 					BTRFS_BLOCK_GROUP_METADATA);
 			ASSERT(sinfo);
 			sinfo->bytes_reserved -= head->num_bytes;
diff --git a/mkfs/sprout.c b/mkfs/sprout.c
index 5977a73644f5..66119bbe975f 100644
--- a/mkfs/sprout.c
+++ b/mkfs/sprout.c
@@ -149,3 +149,41 @@ static int update_seed_dev_geneartion(struct btrfs_trans_handle *trans)
 	return ret;
 }
 
+static int sprout_alloc_one_chunk(struct btrfs_trans_handle *trans, u64 type)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_space_info *space_info;
+	u64 chunk_start;
+	u64 chunk_size;
+	u64 chunk_flags;
+	int ret;
+
+	space_info = btrfs_find_space_info(fs_info, type);
+	if (!space_info)
+		return -ENOENT;
+
+	if (type & BTRFS_BLOCK_GROUP_METADATA)
+		chunk_flags = fs_info->avail_metadata_alloc_bits &
+			      fs_info->metadata_alloc_profile;
+	if (type & BTRFS_BLOCK_GROUP_SYSTEM)
+		chunk_flags = fs_info->avail_system_alloc_bits &
+			      fs_info->system_alloc_profile;
+	if (type & BTRFS_BLOCK_GROUP_DATA)
+		chunk_flags = fs_info->avail_data_alloc_bits &
+			      fs_info->data_alloc_profile;
+	/* This is for mixed block groups */
+	chunk_flags |= space_info->flags;
+
+	ret = btrfs_alloc_chunk(trans, fs_info, &chunk_start, &chunk_size,
+				chunk_flags);
+	if (ret < 0)
+		goto error;
+	ret = btrfs_make_block_group(trans, fs_info, 0, chunk_flags,
+				     chunk_start, chunk_size);
+	if (ret < 0)
+		goto error;
+	return ret;
+error:
+	btrfs_abort_transaction(trans, ret);
+	return ret;
+}
-- 
2.35.1


* [PATCH RFC 08/10] btrfs-progs: mkfs/sprout: introduce a helper to relocate system chunks
  2022-04-20  0:19 [PATCH RFC 00/10] btrfs-progs: mkfs: add --seed option to sprout a seed device at mkfs time Qu Wenruo
                   ` (6 preceding siblings ...)
  2022-04-20  0:19 ` [PATCH RFC 07/10] btrfs-progs: mkfs/sprout: introduce helper to force allocating a chunk Qu Wenruo
@ 2022-04-20  0:19 ` Qu Wenruo
  2022-04-20  0:19 ` [PATCH RFC 09/10] btrfs-progs: mkfs/sprout: introduce a helper to remove empty system chunks from seed device Qu Wenruo
  2022-04-20  0:19 ` [PATCH RFC 10/10] btrfs-progs: mkfs: add support for seed sprout Qu Wenruo
  9 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2022-04-20  0:19 UTC (permalink / raw)
  To: linux-btrfs

In the kernel, seed sprout always relocates all system chunks.

The reason is a little complex. At mount time, especially during
sys_chunk_array processing, we have no idea which device is a seed
device.

All we can access are the devices with the same fsid, without even
knowing if that fsid has any seed device.

Thus the kernel chooses to relocate all system chunks, and removes the
empty seed system chunks to allow a proper mount.

Here we do the same thing, but since progs has no chunk relocation
ability, we just CoW every leaf. All nodes will then also be CoWed,
thus the whole chunk tree will be relocated.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 mkfs/sprout.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)
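
Note for reviewers: the leaf walk can be sketched on a toy model as below.
Everything here is hypothetical: the tree is reduced to an array of leaves
sorted by their first key, and demo_search_cow() stands in for
btrfs_search_slot() with cow set, which CoWs the leaf that the search lands
on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical leaf: only its first key and whether it has been CoWed. */
struct demo_leaf {
	uint64_t first_key;
	bool cowed;
};

/*
 * Find the leaf covering @key (the last leaf whose first key is not bigger
 * than @key, or the first leaf), and mark it as CoWed.
 */
static size_t demo_search_cow(struct demo_leaf *leaves, size_t nr, uint64_t key)
{
	size_t found = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		if (leaves[i].first_key <= key)
			found = i;
	leaves[found].cowed = true;
	return found;
}

/* CoW every leaf by restarting the search from the next leaf's first key. */
static void demo_cow_all_leaves(struct demo_leaf *leaves, size_t nr)
{
	uint64_t key = 0;

	if (nr == 0)
		return;
	while (true) {
		size_t cur = demo_search_cow(leaves, nr, key);

		/* No next leaf, all leaves have been visited. */
		if (cur + 1 >= nr)
			break;
		/* Save the key for the next search. */
		key = leaves[cur + 1].first_key;
	}
}
```

In the real tree each CoW search also CoWs the nodes on the path to the
leaf, which is why visiting every leaf is enough to rewrite the whole tree.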

diff --git a/mkfs/sprout.c b/mkfs/sprout.c
index 66119bbe975f..38d80d789084 100644
--- a/mkfs/sprout.c
+++ b/mkfs/sprout.c
@@ -187,3 +187,57 @@ error:
 	btrfs_abort_transaction(trans, ret);
 	return ret;
 }
+
+/*
+ * Relocate all chunk tree blocks by CoWing every leaf.
+ *
+ * Kernel and btrfs-progs require sys_chunk_array to only contain devices from
+ * the current fsid.
+ *
+ * btrfs_stripe only contains devid and dev uuid, with no fsid to determine if
+ * the current device is a seed.
+ *
+ * And at sys_chunk_array read time, btrfs doesn't have seed devices set up,
+ * thus if we have a chunk with a seed device in it, kernel and progs will
+ * treat it as missing directly.
+ *
+ * So this function relocates all system chunks from the seed device, so that
+ * later we can clean up those system chunks.
+ */
+static int sprout_relocate_chunk_tree(struct btrfs_trans_handle *trans)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_root *chunk_root = fs_info->chunk_root;
+	struct btrfs_key key = { 0 };
+	struct btrfs_path path;
+	int ret;
+
+	while (true) {
+		btrfs_init_path(&path);
+		ret = btrfs_search_slot(trans, chunk_root, &key, &path, 0, 1);
+		if (ret < 0)
+			break;
+		/*
+		 * This is for the first search, we should be at the first item
+		 * of chunk tree. That's expected.
+		 */
+		if (ret > 0) {
+			ASSERT(key.offset == 0 && key.type == 0 &&
+			       key.objectid == 0);
+			ret = 0;
+		}
+
+		ret = btrfs_next_leaf(chunk_root, &path);
+		if (ret < 0)
+			break;
+		if (ret > 0) {
+			ret = 0;
+			break;
+		}
+		/* Save the key for next search */
+		btrfs_item_key_to_cpu(path.nodes[0], &key, 0);
+		btrfs_release_path(&path);
+	}
+	btrfs_release_path(&path);
+	return ret;
+}
-- 
2.35.1


* [PATCH RFC 09/10] btrfs-progs: mkfs/sprout: introduce a helper to remove empty system chunks from seed device
  2022-04-20  0:19 [PATCH RFC 00/10] btrfs-progs: mkfs: add --seed option to sprout a seed device at mkfs time Qu Wenruo
                   ` (7 preceding siblings ...)
  2022-04-20  0:19 ` [PATCH RFC 08/10] btrfs-progs: mkfs/sprout: introduce a helper to relocate system chunks Qu Wenruo
@ 2022-04-20  0:19 ` Qu Wenruo
  2022-04-20  0:19 ` [PATCH RFC 10/10] btrfs-progs: mkfs: add support for seed sprout Qu Wenruo
  9 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2022-04-20  0:19 UTC (permalink / raw)
  To: linux-btrfs

Even an empty seed system chunk that is still in sys_chunk_array will
make the kernel reject the mount.

So here we introduce a helper to do the removal, by iterating through
all block groups, and removing a block group if it's a system chunk and
contains a stripe which points to the seed device.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 mkfs/sprout.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)
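
Note for reviewers: the deletion-safe iteration used by the helper can be
sketched on a toy model as below. The demo_* names are hypothetical, block
groups are an array sorted by start, and demo_lookup_first() stands in for
btrfs_lookup_first_block_group(), returning the first remaining block group
that ends after the given bytenr.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced block group. */
struct demo_bg {
	uint64_t start;
	uint64_t length;
	bool is_system;
	bool on_seed;	/* at least one stripe points to the seed device */
	bool removed;
};

static struct demo_bg *demo_lookup_first(struct demo_bg *bgs, size_t nr,
					 uint64_t bytenr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!bgs[i].removed && bgs[i].start + bgs[i].length > bytenr)
			return &bgs[i];
	return NULL;
}

/* Returns the number of removed block groups. */
static int demo_remove_seed_sys_chunks(struct demo_bg *bgs, size_t nr)
{
	struct demo_bg *bg = demo_lookup_first(bgs, nr, 0);
	int nr_removed = 0;

	while (bg) {
		/* Save the end before a removal can invalidate @bg. */
		const uint64_t cur = bg->start + bg->length;

		if (bg->is_system && bg->on_seed) {
			bg->removed = true;
			nr_removed++;
		}
		bg = demo_lookup_first(bgs, nr, cur);
	}
	return nr_removed;
}
```

Looking up the next block group by the saved end offset, instead of
following a pointer from the current one, is what keeps the loop valid after
a removal.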

diff --git a/mkfs/sprout.c b/mkfs/sprout.c
index 38d80d789084..049098872d3e 100644
--- a/mkfs/sprout.c
+++ b/mkfs/sprout.c
@@ -241,3 +241,56 @@ static int sprout_relocate_chunk_tree(struct btrfs_trans_handle *trans)
 	btrfs_release_path(&path);
 	return ret;
 }
+
+static int sprout_remove_seed_sys_chunk(struct btrfs_trans_handle *trans)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_block_group *bg;
+	struct btrfs_device *seed_dev;
+	int ret;
+
+	/* We should have exactly one seed device */
+	ASSERT(fs_info->fs_devices->seed);
+	ASSERT(fs_info->fs_devices->seed->devices.next ==
+	       fs_info->fs_devices->seed->devices.prev);
+	seed_dev = list_entry(fs_info->fs_devices->seed->devices.next,
+			      struct btrfs_device, dev_list);
+	bg = btrfs_lookup_first_block_group(fs_info, 0);
+	while (bg) {
+		const u64 cur = bg->start + bg->length;
+
+		struct cache_extent *ce;
+		struct map_lookup *map;
+		bool delete = false;
+		int i;
+
+		if (!(bg->flags & BTRFS_BLOCK_GROUP_SYSTEM))
+			goto next;
+
+		ce = search_cache_extent(&fs_info->mapping_tree.cache_tree,
+					 bg->start);
+		if (!ce) {
+			/* No chunk map for a bg, a big problem */
+			error("no chunk map for block group at %llu", bg->start);
+			return -EUCLEAN;
+		}
+		map = container_of(ce, struct map_lookup, ce);
+
+		for (i = 0; i < map->num_stripes; i++) {
+			if (map->stripes[i].dev == seed_dev) {
+				delete = true;
+				break;
+			}
+		}
+		if (!delete)
+			goto next;
+
+		ret = btrfs_remove_block_group(trans, bg->start, bg->length);
+		if (ret < 0)
+			return ret;
+next:
+		/* Must use @cur, as the current bg may have been deleted */
+		bg = btrfs_lookup_first_block_group(fs_info, cur);
+	}
+	return 0;
+}
-- 
2.35.1


* [PATCH RFC 10/10] btrfs-progs: mkfs: add support for seed sprout
  2022-04-20  0:19 [PATCH RFC 00/10] btrfs-progs: mkfs: add --seed option to sprout a seed device at mkfs time Qu Wenruo
                   ` (8 preceding siblings ...)
  2022-04-20  0:19 ` [PATCH RFC 09/10] btrfs-progs: mkfs/sprout: introduce a helper to remove empty system chunks from seed device Qu Wenruo
@ 2022-04-20  0:19 ` Qu Wenruo
  9 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2022-04-20  0:19 UTC (permalink / raw)
  To: linux-btrfs

This patch adds a new option "--seed <device>" to mkfs, allowing us
to mark the specified device as seed (if not yet a seed device), then
sprout it with the extra rw device.

The new --seed option is hidden behind the experimental features.

Currently, --seed has extra limitations, including:

- csum type/sectorsize/nodesize/features must match the seed device
  These are the very basic sanity checks.

- Ignoring -m/-d options
  As we will inherit the profiles from the seed device.

- Only accept a single-device seed
  This is not a technical limit, but purely to make seed usage less
  flexible, and less error prone.

- Only accept a single rw device
  We can already add as many devices as regular mkfs, I just didn't see
  much usefulness in it.

- Rejects --rootdir option
  It's super easy for --rootdir to conflict with the existing files
  inherited from the seed fs.

With the new "--seed" option, we can replace the following workflow:

 mkfs.btrfs -f $seed
 mount $seed $mnt
 # Populate $mnt with contents we want
 umount $mnt
 btrfstune -S1 $seed
 mount $seed $mnt
 btrfs dev add -f $rw_dev $mnt
 mount -o remount,rw $mnt

With far fewer commands:

 mkfs.btrfs -f $seed
 mount $seed $mnt
 # Populate $mnt with contents we want
 umount $mnt
 mkfs.btrfs --seed $seed $rw_dev
 mount $rw_dev $mnt

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 Documentation/mkfs.btrfs.rst |  13 +++
 mkfs/main.c                  |  42 +++++++++
 mkfs/sprout.c                | 169 +++++++++++++++++++++++++++++++++++
 mkfs/sprout.h                |   3 +
 4 files changed, 227 insertions(+)
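
Note for reviewers: the "must match the seed device" limitation comes down
to the check sketched below. struct demo_params and demo_seed_matches() are
hypothetical names for illustration. The size and csum parameters must be
equal, while for the incompat features the target only needs to be a subset of
the seed, which is what (~seed & target) == 0 tests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bundle of the parameters compared against the seed fs. */
struct demo_params {
	uint64_t features;	/* incompat feature bits */
	uint32_t csum_type;
	uint32_t sectorsize;
	uint32_t nodesize;
};

static bool demo_seed_matches(const struct demo_params *seed,
			      const struct demo_params *target)
{
	if (seed->sectorsize != target->sectorsize ||
	    seed->nodesize != target->nodesize ||
	    seed->csum_type != target->csum_type)
		return false;
	/* Any target feature bit missing from the seed is a mismatch. */
	return (~seed->features & target->features) == 0;
}
```

So a seed with more incompat features than the mkfs defaults is still
accepted, while requesting a feature that the seed lacks is rejected.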

diff --git a/Documentation/mkfs.btrfs.rst b/Documentation/mkfs.btrfs.rst
index df3b5d807ca5..e73a3c5644fa 100644
--- a/Documentation/mkfs.btrfs.rst
+++ b/Documentation/mkfs.btrfs.rst
@@ -136,6 +136,19 @@ OPTIONS
                 contain the files from *rootdir*. Since version 4.14.1 the filesystem size is
                 not minimized. Please see option *--shrink* if you need that functionality.
 
+--seed <seed>
+        Use *seed* as the seed device, then sprout it with the extra *device*.
+        If *seed* is not a seed device yet, mkfs will mark it with the seed flag.
+        Other than that, no other write will reach the *seed* device.
+
+        Check *SEEDING DEVICE* section of ``btrfs(5)`` for more details and the
+        alternative way of sprouting a seed device.
+
+        .. note::
+                With this option, *seed* must only contain a single device
+                filesystem, and we only accept one single device to be added
+                to sprout the seed device.
+
 --shrink
         Shrink the filesystem to its minimal size, only works with *--rootdir* option.
 
diff --git a/mkfs/main.c b/mkfs/main.c
index ca035cbb27f7..44f1b2e5c0a2 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -408,6 +408,7 @@ static void print_usage(int ret)
 	printf("  creation:\n");
 	printf("\t-b|--byte-count SIZE        set filesystem size to SIZE (on the first device)\n");
 	printf("\t-r|--rootdir DIR            copy files from DIR to the image root directory\n");
+	printf("\t--seed DEV                  use DEV as seed device to create the fs\n");
 	printf("\t--shrink                    (with --rootdir) shrink the filled filesystem to minimal size\n");
 	printf("\t-K|--nodiscard              do not perform whole device TRIM\n");
 	printf("\t-f|--force                  force overwrite of existing filesystem\n");
@@ -1283,6 +1284,14 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	}
 
 	if (seed_dev) {
+		if (dev_cnt > 1) {
+			error("the option --seed only accepts a single device");
+			goto error;
+		}
+		if (source_dir_set) {
+			error("the option --seed conflicts with the option --rootdir");
+			goto error;
+		}
 		ret = prepare_seed_device(seed_dev, features, csum_type,
 					  sectorsize, nodesize);
 		if (ret < 0) {
@@ -1577,6 +1586,37 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	else
 		mkfs_cfg.zone_size = 0;
 
+	if (seed_dev) {
+		/*
+		 * Data/metadata profiles are inherited from the seed device,
+		 * thus setting data/metadata profile is useless.
+		 */
+		if (data_profile_opt || metadata_profile_opt)
+			warning(
+"data/metadata profiles will be inherited from the seed device, ignoring specified profiles");
+
+		ocf.filename = seed_dev;
+		/* Open the seed fs read-only */
+		ocf.flags = 0;
+
+		fs_info = open_ctree_fs_info(&ocf);
+		if (!fs_info) {
+			error("failed to open seed fs");
+			goto error;
+		}
+
+		ret = sprout_seed_fs(fs_info, &mkfs_cfg, file, dev_block_count);
+		if (ret < 0) {
+			errno = -ret;
+			error("the seed fs failed to sprout: %m");
+			goto error;
+		}
+
+		root = fs_info->fs_root;
+		/* Now the fs is opened RW, and continue the remaining work */
+		goto fs_created;
+	}
+
 	ret = make_btrfs(fd, &mkfs_cfg);
 	if (ret) {
 		errno = -ret;
@@ -1712,6 +1752,8 @@ raid_groups:
 		error("unable to commit transaction before recowing trees: %m");
 		goto out;
 	}
+
+fs_created:
 	trans = btrfs_start_transaction(root, 1);
 	if (IS_ERR(trans)) {
 		errno = -PTR_ERR(trans);
diff --git a/mkfs/sprout.c b/mkfs/sprout.c
index 049098872d3e..eecf40f4512f 100644
--- a/mkfs/sprout.c
+++ b/mkfs/sprout.c
@@ -1,10 +1,16 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 
+#include <sys/stat.h>
+#include <uuid/uuid.h>
+#include <fcntl.h>
+#include <unistd.h>
 #include "kernel-shared/ctree.h"
 #include "kernel-shared/disk-io.h"
 #include "kernel-shared/volumes.h"
 #include "kernel-shared/transaction.h"
+#include "kernel-lib/overflow.h"
 #include "common/messages.h"
+#include "common/units.h"
 #include "mkfs/common.h"
 
 int prepare_seed_device(const char *path, u64 features, u32 csum_type,
@@ -294,3 +300,166 @@ next:
 	}
 	return 0;
 }
+
+int sprout_seed_fs(struct btrfs_fs_info *fs_info,
+		   struct btrfs_mkfs_config *cfg, const char *path,
+		   u64 dev_block_count)
+{
+	struct btrfs_fs_devices *seed_fs_devs = fs_info->fs_devices;
+	struct btrfs_fs_devices *new_fs_devs;
+	struct btrfs_trans_handle *trans;
+	struct btrfs_device *dev;
+	const u64 old_size = btrfs_super_total_bytes(fs_info->super_copy);
+	u8 fsid[BTRFS_FSID_SIZE];
+	u64 new_size;
+	int fd;
+	int ret;
+
+	/* The seed fs should be RO. */
+	ASSERT(fs_info->readonly);
+
+	fd = open(path, O_RDWR);
+	if (fd < 0) {
+		ret = -errno;
+		error("failed to open %s: %m", path);
+		return ret;
+	}
+	new_fs_devs = btrfs_alloc_fs_devices(fsid, NULL);
+	if (!new_fs_devs) {
+		ret = -ENOMEM;
+		goto error_close;
+	}
+	if (!*cfg->fs_uuid) {
+		uuid_generate(new_fs_devs->fsid);
+		uuid_unparse(new_fs_devs->fsid, cfg->fs_uuid);
+	} else {
+		uuid_parse(cfg->fs_uuid, new_fs_devs->fsid);
+	}
+	memcpy(new_fs_devs->metadata_uuid, new_fs_devs->fsid, BTRFS_FSID_SIZE);
+
+	/* The seed fs should only have one device. */
+	ASSERT(seed_fs_devs->devices.next == seed_fs_devs->devices.prev);
+
+	/* All devices should be RO already. */
+	list_for_each_entry(dev, &seed_fs_devs->devices, dev_list) {
+		if (dev->fd >= 0) {
+			int old_flags;
+
+			old_flags = fcntl(dev->fd, F_GETFL);
+			if ((old_flags & O_ACCMODE) != O_RDONLY)
+				warning("devid %llu is not opened read-only",
+					dev->devid);
+		}
+	}
+
+	/* Add a new btrfs_device. */
+	dev = calloc(1, sizeof(*dev));
+	if (!dev) {
+		ret = -ENOMEM;
+		goto error_close;
+	}
+	dev->fs_info = fs_info;
+	dev->fs_devices = new_fs_devs;
+	/* The seed device is the first device thus we're the 2nd. */
+	dev->devid = 2;
+	dev->type = 0;
+	dev->io_width = cfg->sectorsize;
+	dev->io_align = cfg->sectorsize;
+	dev->fd = fd;
+	dev->writeable = 1;
+	dev->total_bytes = dev_block_count;
+	dev->sector_size = cfg->sectorsize;
+	dev->bytes_used = 0;
+	dev->dev_root = fs_info->dev_root;
+	uuid_generate(dev->uuid);
+	dev->name = strdup(path);
+	if (!dev->name) {
+		ret = -ENOMEM;
+		goto error_free;
+	}
+	if (check_add_overflow(old_size, dev_block_count, &new_size)) {
+		ret = -EOVERFLOW;
+		error(
+		"adding device of %llu (%s) bytes would exceed max file system size",
+		      dev->total_bytes, pretty_size(dev->total_bytes));
+		goto error_free;
+	}
+	list_add_tail(&dev->dev_list, &new_fs_devs->devices);
+	fs_info->readonly = 0;
+	fs_info->fs_devices = new_fs_devs;
+	new_fs_devs->seed = seed_fs_devs;
+
+	trans = btrfs_start_transaction(fs_info->tree_root, 0);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		errno = -ret;
+		error("failed to start transaction: %m");
+		goto abort;
+	}
+	/* Force chunk allocation, or we won't have any bg to do CoW. */
+	ret = sprout_alloc_one_chunk(trans, BTRFS_BLOCK_GROUP_METADATA);
+	if (ret < 0)
+		goto abort;
+	ret = sprout_alloc_one_chunk(trans, BTRFS_BLOCK_GROUP_SYSTEM);
+	if (ret < 0)
+		goto abort;
+	/*
+	 * Our device is only in memory. Now that we have new block groups,
+	 * it's safe to insert it into the chunk tree.
+	 */
+	ret = btrfs_add_device(trans, fs_info, dev);
+	if (ret < 0)
+		goto abort;
+
+	/* Update the generation of the seed device's device item. */
+	ret = update_seed_dev_geneartion(trans);
+	if (ret < 0)
+		goto abort;
+
+	/* Relocate seed system chunks, so later we can delete them */
+	ret = sprout_relocate_chunk_tree(trans);
+	if (ret < 0)
+		goto abort;
+
+	btrfs_set_super_total_bytes(fs_info->super_copy, new_size);
+	btrfs_set_super_num_devices(fs_info->super_copy,
+			btrfs_super_num_devices(fs_info->super_copy) + 1);
+	btrfs_set_super_flags(fs_info->super_copy, ~BTRFS_SUPER_FLAG_SEEDING &
+			btrfs_super_flags(fs_info->super_copy));
+	memcpy(fs_info->super_copy->fsid, new_fs_devs->fsid, BTRFS_FSID_SIZE);
+	/*
+	 * Committing the transaction updates the super block device item.
+	 *
+	 * It also updates the block group items: the update of
+	 * btrfs_block_group_item::used is delayed, so we can't delete
+	 * the empty system chunks in the same transaction.
+	 */
+	ret = btrfs_commit_transaction(trans, fs_info->tree_root);
+	if (ret < 0) {
+		errno = -ret;
+		error("failed to commit transaction: %m");
+		return ret;
+	}
+
+	trans = btrfs_start_transaction(fs_info->tree_root, 0);
+	/* Remove the empty seed system chunks. */
+	ret = sprout_remove_seed_sys_chunk(trans);
+	if (ret < 0)
+		goto abort;
+	ret = btrfs_commit_transaction(trans, fs_info->tree_root);
+	if (ret < 0) {
+		errno = -ret;
+		error("failed to commit transaction: %m");
+	}
+
+	return ret;
+error_free:
+	free(dev);
+	free(new_fs_devs);
+error_close:
+	close(fd);
+	return ret;
+abort:
+	btrfs_abort_transaction(trans, ret);
+	return ret;
+}
diff --git a/mkfs/sprout.h b/mkfs/sprout.h
index 2e8b794c93e4..5527256178e5 100644
--- a/mkfs/sprout.h
+++ b/mkfs/sprout.h
@@ -6,5 +6,8 @@
 
 int prepare_seed_device(const char *path, u64 features, u32 csum_type,
 			u32 sectorsize, u32 nodesize);
+int sprout_seed_fs(struct btrfs_fs_info *fs_info,
+		   struct btrfs_mkfs_config *cfg, const char *path,
+		   u64 dev_block_count);
 
 #endif
-- 
2.35.1
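
Editorial note, not part of the patch: sprout_seed_fs() rejects a new device
whose size, added to the seed fs size, would not fit into a u64. The sketch
below shows that kind of check in isolation. It assumes a GCC/Clang compiler
and uses the __builtin_add_overflow() builtin directly instead of the
check_add_overflow() wrapper used in the patch; the helper name
sprout_total_bytes() is hypothetical and does not exist in btrfs-progs.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of btrfs-progs: compute the total size of
 * the sprouted fs from the seed fs size and the new device size.
 *
 * Returns 0 and stores the sum in *new_size, or -EOVERFLOW if the sum does
 * not fit into 64 bits (in which case *new_size is left untouched).
 */
static int sprout_total_bytes(uint64_t old_size, uint64_t dev_bytes,
			      uint64_t *new_size)
{
	uint64_t sum;

	/* GCC/Clang builtin: returns true when the addition wraps around. */
	if (__builtin_add_overflow(old_size, dev_bytes, &sum))
		return -EOVERFLOW;

	*new_size = sum;
	return 0;
}
```

A caller would pass the seed's super block total_bytes as old_size and the
new device size as dev_bytes, and bail out before touching any on-disk
state when the helper returns a negative value.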


Thread overview: 11+ messages
2022-04-20  0:19 [PATCH RFC 00/10] btrfs-progs: mkfs: add --seed option to sprout a seed device at mkfs time Qu Wenruo
2022-04-20  0:19 ` [PATCH RFC 01/10] btrfs-progs: refactor find_free_dev_extent_start() for later expansion Qu Wenruo
2022-04-20  0:19 ` [PATCH RFC 02/10] btrfs-progs: delay chunk and device extent items insertion Qu Wenruo
2022-04-20  0:19 ` [PATCH RFC 03/10] btrfs-progs: mkfs: introduce helper to set seed flag Qu Wenruo
2022-04-20  0:19 ` [PATCH RFC 04/10] btrfs-progs: mkfs: avoid error out if some trees exist Qu Wenruo
2022-04-20  0:19 ` [PATCH RFC 05/10] btrfs-progs: extract btrfs_fs_devices structure allocation into a helper Qu Wenruo
2022-04-20  0:19 ` [PATCH RFC 06/10] btrfs-progs: mkfs/sprout: add a helper to update generation for seed device Qu Wenruo
2022-04-20  0:19 ` [PATCH RFC 07/10] btrfs-progs: mkfs/sprout: introduce helper to force allocating a chunk Qu Wenruo
2022-04-20  0:19 ` [PATCH RFC 08/10] btrfs-progs: mkfs/sprout: introduce a helper to relocate system chunks Qu Wenruo
2022-04-20  0:19 ` [PATCH RFC 09/10] btrfs-progs: mkfs/sprout: introduce a helper to remove empty system chunks from seed device Qu Wenruo
2022-04-20  0:19 ` [PATCH RFC 10/10] btrfs-progs: mkfs: add support for seed sprout Qu Wenruo