linux-btrfs.vger.kernel.org archive mirror
* [PATCH 0/9] btrfs-progs: image: Data dump support, restore optimization and small fixes
@ 2019-06-06 11:06 Qu Wenruo
  2019-06-06 11:06 ` [PATCH 1/9] btrfs-progs: image: Use SZ_* to replace intermediate size Qu Wenruo
                   ` (9 more replies)
  0 siblings, 10 replies; 13+ messages in thread
From: Qu Wenruo @ 2019-06-06 11:06 UTC
  To: linux-btrfs

This patchset can be fetched from GitHub:
https://github.com/adam900710/btrfs-progs/tree/image_data_dump
It is based on the v5.1 tag.

This patchset contains the following main features:
- Various small fixes for btrfs-image
  Ranging from an indentation misalignment and SZ_* cleanup to a
  btrfs-image hang caused by too many online CPU cores.

- btrfs-image data dump support
  This introduces a new option, -d, to dump data.
  Due to the item size limit, we have to enlarge the existing limit from
  256K (enough for tree blocks, but not for free space cache) to
  256M.
  This change breaks compatibility, so we have to introduce a new magic
  number to act as a version, while keeping the rest of the on-disk
  format the same.

- btrfs-image restore optimization
  This speeds up the chunk tree block search during restore.

Qu Wenruo (9):
  btrfs-progs: image: Use SZ_* to replace intermediate size
  btrfs-progs: image: Fix a indent misalign
  btrfs-progs: image: Fix a access-beyond-boundary bug when there are 32
    online CPUs
  btrfs-progs: image: Verify the superblock before restore
  btrfs-progs: image: Introduce framework for more dump versions
  btrfs-progs: image: Introduce -d option to dump data
  btrfs-progs: image: Allow restore to record system chunk ranges for
    later usage
  btrfs-progs: image: Introduce helper to determine if a tree block is
    in the range of system chunks
  btrfs-progs: image: Rework how we search chunk tree blocks

 disk-io.c        |   6 +-
 disk-io.h        |   1 +
 image/main.c     | 501 +++++++++++++++++++++++++++++++++++------------
 image/metadump.h |  15 +-
 4 files changed, 393 insertions(+), 130 deletions(-)

-- 
2.21.0



* [PATCH 1/9] btrfs-progs: image: Use SZ_* to replace intermediate size
  2019-06-06 11:06 [PATCH 0/9] btrfs-progs: image: Data dump support, restore optimization and small fixes Qu Wenruo
@ 2019-06-06 11:06 ` Qu Wenruo
  2019-06-06 11:06 ` [PATCH 2/9] btrfs-progs: image: Fix a indent misalign Qu Wenruo
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2019-06-06 11:06 UTC
  To: linux-btrfs

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/metadump.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/image/metadump.h b/image/metadump.h
index 8ace60f5..f85c9bcf 100644
--- a/image/metadump.h
+++ b/image/metadump.h
@@ -23,8 +23,8 @@
 #include "ctree.h"
 
 #define HEADER_MAGIC		0xbd5c25e27295668bULL
-#define MAX_PENDING_SIZE	(256 * 1024)
-#define BLOCK_SIZE		1024
+#define MAX_PENDING_SIZE	SZ_256K
+#define BLOCK_SIZE		SZ_1K
 #define BLOCK_MASK		(BLOCK_SIZE - 1)
 
 #define ITEMS_PER_CLUSTER ((BLOCK_SIZE - sizeof(struct meta_cluster)) / \
-- 
2.21.0



* [PATCH 2/9] btrfs-progs: image: Fix a indent misalign
  2019-06-06 11:06 [PATCH 0/9] btrfs-progs: image: Data dump support, restore optimization and small fixes Qu Wenruo
  2019-06-06 11:06 ` [PATCH 1/9] btrfs-progs: image: Use SZ_* to replace intermediate size Qu Wenruo
@ 2019-06-06 11:06 ` Qu Wenruo
  2019-06-06 11:06 ` [PATCH 3/9] btrfs-progs: image: Fix a access-beyond-boundary bug when there are 32 online CPUs Qu Wenruo
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2019-06-06 11:06 UTC
  To: linux-btrfs

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/image/main.c b/image/main.c
index 4fba8283..fb9fc48c 100644
--- a/image/main.c
+++ b/image/main.c
@@ -2702,7 +2702,7 @@ int main(int argc, char *argv[])
 			create = 0;
 			multi_devices = 1;
 			break;
-			case GETOPT_VAL_HELP:
+		case GETOPT_VAL_HELP:
 		default:
 			print_usage(c != GETOPT_VAL_HELP);
 		}
-- 
2.21.0



* [PATCH 3/9] btrfs-progs: image: Fix a access-beyond-boundary bug when there are 32 online CPUs
  2019-06-06 11:06 [PATCH 0/9] btrfs-progs: image: Data dump support, restore optimization and small fixes Qu Wenruo
  2019-06-06 11:06 ` [PATCH 1/9] btrfs-progs: image: Use SZ_* to replace intermediate size Qu Wenruo
  2019-06-06 11:06 ` [PATCH 2/9] btrfs-progs: image: Fix a indent misalign Qu Wenruo
@ 2019-06-06 11:06 ` Qu Wenruo
  2019-06-10  1:23   ` Su Yue
  2019-06-06 11:06 ` [PATCH 4/9] btrfs-progs: image: Verify the superblock before restore Qu Wenruo
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2019-06-06 11:06 UTC
  To: linux-btrfs

[BUG]
When there are more than 32 (in my example, 35) online CPUs,
btrfs-image -c9 will just hang.

[CAUSE]
btrfs-image has a hard-coded limit (32) on how many threads it can use.
For the "-t" option we check against this upper limit.

But when the "-t" option is not specified and the "-c" option is,
btrfs-image will try to auto-detect the number of online CPUs, and use
it without checking whether it exceeds the upper limit.

For num_threads larger than the upper limit, we overwrite the adjacent
members of metadump_struct/mdrestore_struct, corrupting the
pthread_mutex_t and pthread_cond_t and causing synchronization problems.

Nowadays, with SMT/HT and higher CPU core counts, it's not hard to go
beyond 32 threads and hit the bug.

[FIX]
Clamp num_threads to MAX_WORKER_THREADS before using the number
returned by sysconf().
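A minimal standalone sketch of the clamping this fix performs, assuming the CPU count comes from sysconf(_SC_NPROCESSORS_ONLN) (a glibc extension). The helpers clamp_num_threads() and default_num_threads() are hypothetical; the cap of 32 mirrors MAX_WORKER_THREADS:

```c
#include <unistd.h>

/* Mirrors the MAX_WORKER_THREADS cap in image/main.c */
#define MAX_WORKER_THREADS	32

/* Clamp a detected CPU count into [1, MAX_WORKER_THREADS]. */
static long clamp_num_threads(long detected)
{
	if (detected <= 0)		/* sysconf() returns -1 on error */
		detected = 1;
	if (detected > MAX_WORKER_THREADS)
		detected = MAX_WORKER_THREADS;
	return detected;
}

/* Hypothetical helper: thread count derived from online CPUs. */
static long default_num_threads(void)
{
	return clamp_num_threads(sysconf(_SC_NPROCESSORS_ONLN));
}
```

Clamping once at detection time keeps every later array index (one slot per worker) within the fixed-size arrays.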

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/image/main.c b/image/main.c
index fb9fc48c..80f09c21 100644
--- a/image/main.c
+++ b/image/main.c
@@ -2758,6 +2758,7 @@ int main(int argc, char *argv[])
 
 			if (tmp <= 0)
 				tmp = 1;
+			tmp = min_t(long, tmp, MAX_WORKER_THREADS);
 			num_threads = tmp;
 		}
 	} else {
-- 
2.21.0



* [PATCH 4/9] btrfs-progs: image: Verify the superblock before restore
  2019-06-06 11:06 [PATCH 0/9] btrfs-progs: image: Data dump support, restore optimization and small fixes Qu Wenruo
                   ` (2 preceding siblings ...)
  2019-06-06 11:06 ` [PATCH 3/9] btrfs-progs: image: Fix a access-beyond-boundary bug when there are 32 online CPUs Qu Wenruo
@ 2019-06-06 11:06 ` Qu Wenruo
  2019-06-06 11:06 ` [PATCH 5/9] btrfs-progs: image: Introduce framework for more dump versions Qu Wenruo
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2019-06-06 11:06 UTC
  To: linux-btrfs

This patch exports disk-io.c::check_super() as btrfs_check_super()
and uses it in btrfs-image for extra verification.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 disk-io.c    | 6 +++---
 disk-io.h    | 1 +
 image/main.c | 5 +++++
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index 151eb3b5..ffe4a8c5 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -1347,7 +1347,7 @@ struct btrfs_root *open_ctree_fd(int fp, const char *path, u64 sb_bytenr,
  * - number of devices   - something sane
  * - sys array size      - maximum
  */
-static int check_super(struct btrfs_super_block *sb, unsigned sbflags)
+int btrfs_check_super(struct btrfs_super_block *sb, unsigned sbflags)
 {
 	u8 result[BTRFS_CSUM_SIZE];
 	u32 crc;
@@ -1547,7 +1547,7 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr,
 		if (btrfs_super_bytenr(buf) != sb_bytenr)
 			return -EIO;
 
-		ret = check_super(buf, sbflags);
+		ret = btrfs_check_super(buf, sbflags);
 		if (ret < 0)
 			return ret;
 		memcpy(sb, buf, BTRFS_SUPER_INFO_SIZE);
@@ -1572,7 +1572,7 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr,
 		/* if magic is NULL, the device was removed */
 		if (btrfs_super_magic(buf) == 0 && i == 0)
 			break;
-		if (check_super(buf, sbflags))
+		if (btrfs_check_super(buf, sbflags))
 			continue;
 
 		if (!fsid_is_initialized) {
diff --git a/disk-io.h b/disk-io.h
index ddf3a380..c97aa234 100644
--- a/disk-io.h
+++ b/disk-io.h
@@ -171,6 +171,7 @@ static inline int close_ctree(struct btrfs_root *root)
 
 int write_all_supers(struct btrfs_fs_info *fs_info);
 int write_ctree_super(struct btrfs_trans_handle *trans);
+int btrfs_check_super(struct btrfs_super_block *sb, unsigned sbflags);
 int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr,
 		unsigned sbflags);
 int btrfs_map_bh_to_logical(struct btrfs_root *root, struct extent_buffer *bh,
diff --git a/image/main.c b/image/main.c
index 80f09c21..0b7c8736 100644
--- a/image/main.c
+++ b/image/main.c
@@ -2040,6 +2040,11 @@ static int build_chunk_tree(struct mdrestore_struct *mdres,
 
 	pthread_mutex_lock(&mdres->mutex);
 	super = (struct btrfs_super_block *)buffer;
+	ret = btrfs_check_super(super, 0);
+	if (ret < 0) {
+		error("invalid superblock");
+		return ret;
+	}
 	chunk_root_bytenr = btrfs_super_chunk_root(super);
 	mdres->nodesize = btrfs_super_nodesize(super);
 	if (btrfs_super_incompat_flags(super) &
-- 
2.21.0



* [PATCH 5/9] btrfs-progs: image: Introduce framework for more dump versions
  2019-06-06 11:06 [PATCH 0/9] btrfs-progs: image: Data dump support, restore optimization and small fixes Qu Wenruo
                   ` (3 preceding siblings ...)
  2019-06-06 11:06 ` [PATCH 4/9] btrfs-progs: image: Verify the superblock before restore Qu Wenruo
@ 2019-06-06 11:06 ` Qu Wenruo
  2019-06-06 11:06 ` [PATCH 6/9] btrfs-progs: image: Introduce -d option to dump data Qu Wenruo
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2019-06-06 11:06 UTC
  To: linux-btrfs

The original dump format only contains a @magic member to verify the
format. This means that if we want to introduce a new on-disk format or
change a certain size limit, the only option is to introduce a new magic
number that acts as a version.

This patch introduces a framework that allows multiple magic numbers to
co-exist for future features.

It introduces the following members for each dump version:

- max_pending_size
  The threshold size for a cluster. It's a soft limit rather than a hard
  one: a cluster can grow beyond max_pending_size by one item, but the
  next item goes into the next cluster.

- magic_cpu
  The magic number in CPU endianness.

- extra_sb_flags
  Whether the superblock of the restored image needs extra superblock
  flags such as BTRFS_SUPER_FLAG_METADUMP_V2.
  The upcoming data dump feature doesn't need any extra superblock
  flags.
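The soft-limit behaviour of max_pending_size can be modelled with a small, hypothetical flush decision that mirrors the add_extent() condition in this patch (type change, size over the limit, or a non-contiguous start). Here struct pending and pending_add() are illustrative stand-ins for the pending_* fields of metadump_struct, and "flushing" just resets the run instead of writing it out:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the pending_* fields of metadump_struct. */
struct pending {
	uint64_t start;
	uint64_t size;
	bool data;
};

/*
 * Append [start, start + size) to the pending run. Returns true if the
 * previous run had to be flushed first. A single item larger than
 * @max_pending still becomes one run, which is why the limit is soft.
 */
static bool pending_add(struct pending *p, uint64_t start, uint64_t size,
			bool data, uint64_t max_pending)
{
	bool flushed = false;

	if (p->size && (p->data != data ||
			p->size + size > max_pending ||
			p->start + p->size != start)) {
		p->size = 0;	/* the real code writes the run out here */
		flushed = true;
	}
	if (p->size == 0) {
		p->start = start;
		p->data = data;
	}
	p->size += size;
	return flushed;
}
```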

This change also implies that an image dump uses the same magic for all
of its clusters. No mixing is allowed, as the first cluster is used to
determine the dump version.
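A minimal sketch of the version table and first-cluster detection, assuming (as header->magic suggests) that the magic is the first little-endian u64 of the cluster header. struct dump_version_sketch and detect_version_sketch() are illustrative names; the second table entry mirrors the _DUmP_v1 version added later in this series:

```c
#include <stddef.h>
#include <stdint.h>

struct dump_version_sketch {
	uint64_t magic_cpu;
	int version;
	uint32_t max_pending_size;
	unsigned int extra_sb_flags:1;
};

static const struct dump_version_sketch versions[] = {
	{ .magic_cpu = 0xbd5c25e27295668bULL, .version = 0,
	  .max_pending_size = 256u * 1024, .extra_sb_flags = 1 },
	{ .magic_cpu = 0x31765f506d55445fULL, .version = 1,
	  .max_pending_size = 256u * 1024 * 1024, .extra_sb_flags = 0 },
};

/* Read a little-endian u64 regardless of host endianness. */
static uint64_t le64_to_host(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Pick the version whose magic matches the first cluster header. */
static const struct dump_version_sketch *
detect_version_sketch(const uint8_t *first_cluster)
{
	uint64_t magic = le64_to_host(first_cluster);

	for (size_t i = 0; i < sizeof(versions) / sizeof(versions[0]); i++)
		if (versions[i].magic_cpu == magic)
			return &versions[i];
	return NULL;	/* unrecognized header format */
}
```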

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c     | 80 ++++++++++++++++++++++++++++++++++++++++--------
 image/metadump.h | 13 ++++++--
 2 files changed, 78 insertions(+), 15 deletions(-)

diff --git a/image/main.c b/image/main.c
index 0b7c8736..e8b45a1a 100644
--- a/image/main.c
+++ b/image/main.c
@@ -41,6 +41,19 @@
 
 #define MAX_WORKER_THREADS	(32)
 
+const struct dump_version dump_versions[NR_DUMP_VERSIONS] = {
+	/*
+	 * The original format, which only supports tree blocks and
+	 * free space cache dump.
+	 */
+	{ .version = 0,
+	  .max_pending_size = SZ_256K,
+	  .magic_cpu = 0xbd5c25e27295668bULL,
+	  .extra_sb_flags = 1 }
+};
+
+const struct dump_version *current_version = &dump_versions[0];
+
 struct async_work {
 	struct list_head list;
 	struct list_head ordered;
@@ -395,7 +408,7 @@ static void meta_cluster_init(struct metadump_struct *md, u64 start)
 	md->num_items = 0;
 	md->num_ready = 0;
 	header = &md->cluster.header;
-	header->magic = cpu_to_le64(HEADER_MAGIC);
+	header->magic = cpu_to_le64(current_version->magic_cpu);
 	header->bytenr = cpu_to_le64(start);
 	header->nritems = cpu_to_le32(0);
 	header->compress = md->compress_level > 0 ?
@@ -707,7 +720,7 @@ static int add_extent(u64 start, u64 size, struct metadump_struct *md,
 {
 	int ret;
 	if (md->data != data ||
-	    md->pending_size + size > MAX_PENDING_SIZE ||
+	    md->pending_size + size > current_version->max_pending_size ||
 	    md->pending_start + md->pending_size != start) {
 		ret = flush_pending(md, 0);
 		if (ret)
@@ -1093,7 +1106,8 @@ static void update_super_old(u8 *buffer)
 	u32 sectorsize = btrfs_super_sectorsize(super);
 	u64 flags = btrfs_super_flags(super);
 
-	flags |= BTRFS_SUPER_FLAG_METADUMP;
+	if (current_version->extra_sb_flags)
+		flags |= BTRFS_SUPER_FLAG_METADUMP;
 	btrfs_set_super_flags(super, flags);
 
 	key = (struct btrfs_disk_key *)(super->sys_chunk_array);
@@ -1186,7 +1200,8 @@ static int update_super(struct mdrestore_struct *mdres, u8 *buffer)
 	if (mdres->clear_space_cache)
 		btrfs_set_super_cache_generation(super, 0);
 
-	flags |= BTRFS_SUPER_FLAG_METADUMP_V2;
+	if (current_version->extra_sb_flags)
+		flags |= BTRFS_SUPER_FLAG_METADUMP_V2;
 	btrfs_set_super_flags(super, flags);
 	btrfs_set_super_sys_array_size(super, new_array_size);
 	btrfs_set_super_num_devices(super, 1);
@@ -1374,7 +1389,7 @@ static void *restore_worker(void *data)
 	u8 *outbuf;
 	int outfd;
 	int ret;
-	int compress_size = MAX_PENDING_SIZE * 4;
+	int compress_size = current_version->max_pending_size * 4;
 
 	outfd = fileno(mdres->out);
 	buffer = malloc(compress_size);
@@ -1523,6 +1538,42 @@ static void mdrestore_destroy(struct mdrestore_struct *mdres, int num_threads)
 	pthread_mutex_destroy(&mdres->mutex);
 }
 
+static int detect_version(FILE *in)
+{
+	struct meta_cluster *cluster;
+	u8 buf[BLOCK_SIZE];
+	bool found = false;
+	int i;
+	int ret;
+
+	if (fseek(in, 0, SEEK_SET) < 0) {
+		error("seek failed: %m");
+		return -errno;
+	}
+	ret = fread(buf, BLOCK_SIZE, 1, in);
+	if (!ret) {
+		error("failed to read header");
+		return -EIO;
+	}
+
+	fseek(in, 0, SEEK_SET);
+	cluster = (struct meta_cluster *)buf;
+	for (i = 0; i < NR_DUMP_VERSIONS; i++) {
+		if (le64_to_cpu(cluster->header.magic) ==
+		    dump_versions[i].magic_cpu) {
+			found = true;
+			current_version = &dump_versions[i];
+			break;
+		}
+	}
+
+	if (!found) {
+		error("unrecognized header format");
+		return -EINVAL;
+	}
+	return 0;
+}
+
 static int mdrestore_init(struct mdrestore_struct *mdres,
 			  FILE *in, FILE *out, int old_restore,
 			  int num_threads, int fixup_offset,
@@ -1530,6 +1581,9 @@ static int mdrestore_init(struct mdrestore_struct *mdres,
 {
 	int i, ret = 0;
 
+	ret = detect_version(in);
+	if (ret < 0)
+		return ret;
 	memset(mdres, 0, sizeof(*mdres));
 	pthread_cond_init(&mdres->cond, NULL);
 	pthread_mutex_init(&mdres->mutex, NULL);
@@ -1577,9 +1631,9 @@ static int fill_mdres_info(struct mdrestore_struct *mdres,
 		return 0;
 
 	if (mdres->compress_method == COMPRESS_ZLIB) {
-		size_t size = MAX_PENDING_SIZE * 2;
+		size_t size = current_version->max_pending_size * 2;
 
-		buffer = malloc(MAX_PENDING_SIZE * 2);
+		buffer = malloc(current_version->max_pending_size * 2);
 		if (!buffer)
 			return -ENOMEM;
 		ret = uncompress(buffer, (unsigned long *)&size,
@@ -1818,7 +1872,7 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
 	u64 current_cluster = cluster_bytenr, bytenr;
 	u64 item_bytenr;
 	u32 bufsize, nritems, i;
-	u32 max_size = MAX_PENDING_SIZE * 2;
+	u32 max_size = current_version->max_pending_size * 2;
 	u8 *buffer, *tmp = NULL;
 	int ret = 0;
 
@@ -1874,7 +1928,7 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
 		ret = 0;
 
 		header = &cluster->header;
-		if (le64_to_cpu(header->magic) != HEADER_MAGIC ||
+		if (le64_to_cpu(header->magic) != current_version->magic_cpu ||
 		    le64_to_cpu(header->bytenr) != current_cluster) {
 			error("bad header in metadump image");
 			ret = -EIO;
@@ -1977,7 +2031,7 @@ static int build_chunk_tree(struct mdrestore_struct *mdres,
 	ret = 0;
 
 	header = &cluster->header;
-	if (le64_to_cpu(header->magic) != HEADER_MAGIC ||
+	if (le64_to_cpu(header->magic) != current_version->magic_cpu ||
 	    le64_to_cpu(header->bytenr) != 0) {
 		error("bad header in metadump image");
 		return -EIO;
@@ -2018,10 +2072,10 @@ static int build_chunk_tree(struct mdrestore_struct *mdres,
 	}
 
 	if (mdres->compress_method == COMPRESS_ZLIB) {
-		size_t size = MAX_PENDING_SIZE * 2;
+		size_t size = current_version->max_pending_size * 2;
 		u8 *tmp;
 
-		tmp = malloc(MAX_PENDING_SIZE * 2);
+		tmp = malloc(current_version->max_pending_size * 2);
 		if (!tmp) {
 			free(buffer);
 			return -ENOMEM;
@@ -2478,7 +2532,7 @@ static int restore_metadump(const char *input, FILE *out, int old_restore,
 			break;
 
 		header = &cluster->header;
-		if (le64_to_cpu(header->magic) != HEADER_MAGIC ||
+		if (le64_to_cpu(header->magic) != current_version->magic_cpu ||
 		    le64_to_cpu(header->bytenr) != bytenr) {
 			error("bad header in metadump image");
 			ret = -EIO;
diff --git a/image/metadump.h b/image/metadump.h
index f85c9bcf..941d4b82 100644
--- a/image/metadump.h
+++ b/image/metadump.h
@@ -22,8 +22,6 @@
 #include "kernel-lib/list.h"
 #include "ctree.h"
 
-#define HEADER_MAGIC		0xbd5c25e27295668bULL
-#define MAX_PENDING_SIZE	SZ_256K
 #define BLOCK_SIZE		SZ_1K
 #define BLOCK_MASK		(BLOCK_SIZE - 1)
 
@@ -33,6 +31,17 @@
 #define COMPRESS_NONE		0
 #define COMPRESS_ZLIB		1
 
+struct dump_version {
+	u64 magic_cpu;
+	int version;
+	int max_pending_size;
+	unsigned int extra_sb_flags:1;
+};
+
+#define NR_DUMP_VERSIONS	1
+extern const struct dump_version dump_versions[NR_DUMP_VERSIONS];
+const extern struct dump_version *current_version;
+
 struct meta_cluster_item {
 	__le64 bytenr;
 	__le32 size;
-- 
2.21.0



* [PATCH 6/9] btrfs-progs: image: Introduce -d option to dump data
  2019-06-06 11:06 [PATCH 0/9] btrfs-progs: image: Data dump support, restore optimization and small fixes Qu Wenruo
                   ` (4 preceding siblings ...)
  2019-06-06 11:06 ` [PATCH 5/9] btrfs-progs: image: Introduce framework for more dump versions Qu Wenruo
@ 2019-06-06 11:06 ` Qu Wenruo
  2019-06-06 11:06 ` [PATCH 7/9] btrfs-progs: image: Allow restore to record system chunk ranges for later usage Qu Wenruo
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2019-06-06 11:06 UTC
  To: linux-btrfs

This new data dump feature dumps the whole image: not only the existing
tree blocks but also all of its data extents(*).
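The extent selection can be sketched as a small predicate, assuming the standard on-disk values of the two extent flags. should_dump_extent() is a hypothetical helper; it dumps tree blocks unconditionally and data extents only when data dumping (-d) is requested, which is the intent of the new option:

```c
#include <stdbool.h>
#include <stdint.h>

/* Extent item flags, values as in the btrfs on-disk format. */
#define BTRFS_EXTENT_FLAG_DATA		(1ULL << 0)
#define BTRFS_EXTENT_FLAG_TREE_BLOCK	(1ULL << 1)

/*
 * Decide whether an extent item should be copied into the image, and
 * report through @is_data whether it is a data extent.
 */
static bool should_dump_extent(uint64_t flags, bool dump_data, bool *is_data)
{
	*is_data = (flags & BTRFS_EXTENT_FLAG_DATA) != 0;
	if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK)
		return true;
	return dump_data && *is_data;
}
```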

This feature relies on the new dump format (_DUmP_v1), as it needs a
much larger extent size limit, and older btrfs-image dumps can't handle
such large items/clusters.
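The new magic is just the eight ASCII bytes "_DUmP_v1" read as a little-endian u64, which is why the code comment says "no null". A small check, where ascii_magic_le64() is a hypothetical helper:

```c
#include <stdint.h>

/* Interpret 8 ASCII bytes as a little-endian u64, like an on-disk magic. */
static uint64_t ascii_magic_le64(const char *s)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | (unsigned char)s[i];
	return v;
}
```

The same helper reproduces the btrfs superblock magic from "_BHRfS_M".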

Since we're dumping all extents, including data extents, there is no
need to set any extra superblock flags on the restored image to inform
the kernel.
The kernel should just treat the restored image as an ordinary btrfs.

*: The data extents are dumped as is; that is, even for a preallocated
extent, its (meaningless) data is read out and dumped.
This behavior costs extra space in the image, but lets us skip all the
complex checks for partially shared preallocated extents.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c     | 53 +++++++++++++++++++++++++++++++++++++-----------
 image/metadump.h |  2 +-
 2 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/image/main.c b/image/main.c
index e8b45a1a..f394bfc8 100644
--- a/image/main.c
+++ b/image/main.c
@@ -49,7 +49,15 @@ const struct dump_version dump_versions[NR_DUMP_VERSIONS] = {
 	{ .version = 0,
 	  .max_pending_size = SZ_256K,
 	  .magic_cpu = 0xbd5c25e27295668bULL,
-	  .extra_sb_flags = 1 }
+	  .extra_sb_flags = 1 },
+	/*
+	 * The newer format, with much larger item size to contain
+	 * any data extent.
+	 */
+	{ .version = 1,
+	  .max_pending_size = SZ_256M,
+	  .magic_cpu = 0x31765f506d55445fULL, /* ascii _DUmP_v1, no null */
+	  .extra_sb_flags = 0 },
 };
 
 const struct dump_version *current_version = &dump_versions[0];
@@ -444,10 +452,14 @@ static void metadump_destroy(struct metadump_struct *md, int num_threads)
 
 static int metadump_init(struct metadump_struct *md, struct btrfs_root *root,
 			 FILE *out, int num_threads, int compress_level,
-			 enum sanitize_mode sanitize_names)
+			 bool dump_data, enum sanitize_mode sanitize_names)
 {
 	int i, ret = 0;
 
+	/* We need larger item/cluster limit for data extents */
+	if (dump_data)
+		current_version = &dump_versions[1];
+
 	memset(md, 0, sizeof(*md));
 	INIT_LIST_HEAD(&md->list);
 	INIT_LIST_HEAD(&md->ordered);
@@ -912,7 +924,7 @@ static int copy_space_cache(struct btrfs_root *root,
 }
 
 static int copy_from_extent_tree(struct metadump_struct *metadump,
-				 struct btrfs_path *path)
+				 struct btrfs_path *path, bool dump_data)
 {
 	struct btrfs_root *extent_root;
 	struct extent_buffer *leaf;
@@ -977,9 +989,15 @@ static int copy_from_extent_tree(struct metadump_struct *metadump,
 			ei = btrfs_item_ptr(leaf, path->slots[0],
 					    struct btrfs_extent_item);
 			if (btrfs_extent_flags(leaf, ei) &
-			    BTRFS_EXTENT_FLAG_TREE_BLOCK) {
+			    BTRFS_EXTENT_FLAG_TREE_BLOCK ||
+			    (dump_data && btrfs_extent_flags(leaf, ei) &
+			     BTRFS_EXTENT_FLAG_DATA)) {
+				bool is_data;
+
+				is_data = btrfs_extent_flags(leaf, ei) &
+					  BTRFS_EXTENT_FLAG_DATA;
 				ret = add_extent(bytenr, num_bytes, metadump,
-						 0);
+						 is_data);
 				if (ret) {
 					error("unable to add block %llu: %d",
 						(unsigned long long)bytenr, ret);
@@ -1022,7 +1040,7 @@ static int copy_from_extent_tree(struct metadump_struct *metadump,
 
 static int create_metadump(const char *input, FILE *out, int num_threads,
 			   int compress_level, enum sanitize_mode sanitize,
-			   int walk_trees)
+			   int walk_trees, bool dump_data)
 {
 	struct btrfs_root *root;
 	struct btrfs_path path;
@@ -1037,7 +1055,7 @@ static int create_metadump(const char *input, FILE *out, int num_threads,
 	}
 
 	ret = metadump_init(&metadump, root, out, num_threads,
-			    compress_level, sanitize);
+			    compress_level, dump_data, sanitize);
 	if (ret) {
 		error("failed to initialize metadump: %d", ret);
 		close_ctree(root);
@@ -1069,7 +1087,7 @@ static int create_metadump(const char *input, FILE *out, int num_threads,
 			goto out;
 		}
 	} else {
-		ret = copy_from_extent_tree(&metadump, &path);
+		ret = copy_from_extent_tree(&metadump, &path, dump_data);
 		if (ret) {
 			err = ret;
 			goto out;
@@ -2694,6 +2712,7 @@ static void print_usage(int ret)
 	printf("\t-s      \tsanitize file names, use once to just use garbage, use twice if you want crc collisions\n");
 	printf("\t-w      \twalk all trees instead of using extent tree, do this if your extent tree is broken\n");
 	printf("\t-m	   \trestore for multiple devices\n");
+	printf("\t-d	   \talso dump data, conflicts with -w\n");
 	printf("\n");
 	printf("\tIn the dump mode, source is the btrfs device and target is the output file (use '-' for stdout).\n");
 	printf("\tIn the restore mode, source is the dumped image and target is the btrfs device/file.\n");
@@ -2713,6 +2732,7 @@ int main(int argc, char *argv[])
 	int ret;
 	enum sanitize_mode sanitize = SANITIZE_NONE;
 	int dev_cnt = 0;
+	bool dump_data = false;
 	int usage_error = 0;
 	FILE *out;
 
@@ -2721,7 +2741,7 @@ int main(int argc, char *argv[])
 			{ "help", no_argument, NULL, GETOPT_VAL_HELP},
 			{ NULL, 0, NULL, 0 }
 		};
-		int c = getopt_long(argc, argv, "rc:t:oswm", long_options, NULL);
+		int c = getopt_long(argc, argv, "rc:t:oswmd", long_options, NULL);
 		if (c < 0)
 			break;
 		switch (c) {
@@ -2761,6 +2781,9 @@ int main(int argc, char *argv[])
 			create = 0;
 			multi_devices = 1;
 			break;
+		case 'd':
+			dump_data = true;
+			break;
 		case GETOPT_VAL_HELP:
 		default:
 			print_usage(c != GETOPT_VAL_HELP);
@@ -2779,10 +2802,15 @@ int main(int argc, char *argv[])
 			"create and restore cannot be used at the same time");
 			usage_error++;
 		}
+		if (dump_data && walk_trees) {
+			error("-d conflicts with -w option");
+			usage_error++;
+		}
 	} else {
-		if (walk_trees || sanitize != SANITIZE_NONE || compress_level) {
+		if (walk_trees || sanitize != SANITIZE_NONE || compress_level ||
+		    dump_data) {
 			error(
-			"using -w, -s, -c options for restore makes no sense");
+		"using -w, -s, -c, -d options for restore makes no sense");
 			usage_error++;
 		}
 		if (multi_devices && dev_cnt < 2) {
@@ -2835,7 +2863,8 @@ int main(int argc, char *argv[])
 		}
 
 		ret = create_metadump(source, out, num_threads,
-				      compress_level, sanitize, walk_trees);
+				      compress_level, sanitize, walk_trees,
+				      dump_data);
 	} else {
 		ret = restore_metadump(source, out, old_restore, num_threads,
 				       0, target, multi_devices);
diff --git a/image/metadump.h b/image/metadump.h
index 941d4b82..a04f63a9 100644
--- a/image/metadump.h
+++ b/image/metadump.h
@@ -38,7 +38,7 @@ struct dump_version {
 	unsigned int extra_sb_flags:1;
 };
 
-#define NR_DUMP_VERSIONS	1
+#define NR_DUMP_VERSIONS	2
 extern const struct dump_version dump_versions[NR_DUMP_VERSIONS];
 const extern struct dump_version *current_version;
 
-- 
2.21.0



* [PATCH 7/9] btrfs-progs: image: Allow restore to record system chunk ranges for later usage
  2019-06-06 11:06 [PATCH 0/9] btrfs-progs: image: Data dump support, restore optimization and small fixes Qu Wenruo
                   ` (5 preceding siblings ...)
  2019-06-06 11:06 ` [PATCH 6/9] btrfs-progs: image: Introduce -d option to dump data Qu Wenruo
@ 2019-06-06 11:06 ` Qu Wenruo
  2019-06-06 11:06 ` [PATCH 8/9] btrfs-progs: image: Introduce helper to determine if a tree block is in the range of system chunks Qu Wenruo
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2019-06-06 11:06 UTC
  To: linux-btrfs

Currently we do a fairly slow search for system chunks before restoring
the real data.
The current behavior is to search all clusters for the chunk tree root
first, then search all clusters again and again for every chunk tree
block.

This causes recursive calls and a slow startup. The only good news is
that the chunk tree is normally small, so we don't need to iterate too
many times and overall it's acceptable.

To address this, we can make use of the system chunk array in the
superblock.
By recording all system chunk ranges, we can easily determine whether
an extent belongs to the chunk tree, and thus find chunk tree leaves
with a single linear search.
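A standalone sketch of recording system chunk ranges from the superblock's sys_chunk_array. It assumes the standard on-disk layout (17-byte disk key, 48-byte chunk header followed by 32-byte stripes, chunk item key type 228, SYSTEM block group bit 1 << 1). collect_sys_ranges() and struct range are hypothetical, and unlike add_merge_cache_extent() this sketch does not merge adjacent ranges:

```c
#include <errno.h>
#include <stdint.h>

/* On-disk constants (assumed from the btrfs format definitions). */
#define CHUNK_ITEM_KEY		228
#define BG_SYSTEM		(1ULL << 1)
#define DISK_KEY_SIZE		17	/* objectid(8) + type(1) + offset(8) */
#define CHUNK_BASE_SIZE		48	/* struct btrfs_chunk minus stripes */
#define STRIPE_SIZE		32	/* devid(8) + offset(8) + dev_uuid(16) */

struct range { uint64_t start, len; };

static uint64_t rd_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

static uint16_t rd_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Walk a sys_chunk_array of @size bytes and store each SYSTEM chunk's
 * logical range in @out (capacity @max). Returns the number of ranges
 * found, -EUCLEAN on a malformed array, or -ENOSPC if @out is full.
 */
static int collect_sys_ranges(const uint8_t *array, uint32_t size,
			      struct range *out, int max)
{
	uint32_t cur = 0;
	int n = 0;

	while (cur < size) {
		const uint8_t *key = array + cur;
		const uint8_t *chunk;
		uint32_t item_size;
		uint16_t num_stripes;

		if (cur + DISK_KEY_SIZE > size || key[8] != CHUNK_ITEM_KEY)
			return -EUCLEAN;
		cur += DISK_KEY_SIZE;
		chunk = array + cur;

		/* At least one stripe must be present */
		if (cur + CHUNK_BASE_SIZE + STRIPE_SIZE > size)
			return -EUCLEAN;
		num_stripes = rd_le16(chunk + 44);
		if (num_stripes == 0)
			return -EUCLEAN;
		item_size = CHUNK_BASE_SIZE + (uint32_t)num_stripes * STRIPE_SIZE;
		if (cur + item_size > size)
			return -EUCLEAN;

		/* key.offset is the logical start, chunk->length at offset 0 */
		if (rd_le64(chunk + 24) & BG_SYSTEM) {
			if (n >= max)
				return -ENOSPC;
			out[n].start = rd_le64(key + 9);
			out[n].len = rd_le64(chunk);
			n++;
		}
		cur += item_size;
	}
	return n;
}
```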

This patch only lays the groundwork for later patches.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)

diff --git a/image/main.c b/image/main.c
index f394bfc8..0460a5f5 100644
--- a/image/main.c
+++ b/image/main.c
@@ -35,6 +35,7 @@
 #include "utils.h"
 #include "volumes.h"
 #include "extent_io.h"
+#include "extent-cache.h"
 #include "help.h"
 #include "image/metadump.h"
 #include "image/sanitize.h"
@@ -112,6 +113,11 @@ struct mdrestore_struct {
 	pthread_mutex_t mutex;
 	pthread_cond_t cond;
 
+	/*
+	 * Records system chunk ranges, so restore can use this to determine
+	 * if an item is in chunk tree range.
+	 */
+	struct cache_tree sys_chunks;
 	struct rb_root chunk_tree;
 	struct rb_root physical_tree;
 	struct list_head list;
@@ -121,6 +127,8 @@ struct mdrestore_struct {
 	u64 devid;
 	u64 alloced_chunks;
 	u64 last_physical_offset;
+	/* A quicker check for whether an item is in the sys chunk range */
+	u64 sys_chunk_end;
 	u8 uuid[BTRFS_UUID_SIZE];
 	u8 fsid[BTRFS_FSID_SIZE];
 
@@ -1544,6 +1552,7 @@ static void mdrestore_destroy(struct mdrestore_struct *mdres, int num_threads)
 		rb_erase(&entry->p, &mdres->physical_tree);
 		free(entry);
 	}
+	free_extent_cache_tree(&mdres->sys_chunks);
 	pthread_mutex_lock(&mdres->mutex);
 	mdres->done = 1;
 	pthread_cond_broadcast(&mdres->cond);
@@ -1607,6 +1616,7 @@ static int mdrestore_init(struct mdrestore_struct *mdres,
 	pthread_mutex_init(&mdres->mutex, NULL);
 	INIT_LIST_HEAD(&mdres->list);
 	INIT_LIST_HEAD(&mdres->overlapping_chunks);
+	cache_tree_init(&mdres->sys_chunks);
 	mdres->in = in;
 	mdres->out = out;
 	mdres->old_restore = old_restore;
@@ -2025,6 +2035,92 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
 	return ret;
 }
 
+/*
+ * Add system chunks in super blocks into mdres->sys_chunks, so later
+ * we can determine if an item is a chunk tree block.
+ */
+static int add_sys_array(struct mdrestore_struct *mdres,
+			 struct btrfs_super_block *sb)
+{
+	struct btrfs_disk_key *disk_key;
+	struct btrfs_key key;
+	struct btrfs_chunk *chunk;
+	struct cache_extent *cache;
+	u32 cur_offset;
+	u32 len = 0;
+	u32 array_size;
+	u8 *array_ptr;
+	int ret;
+
+	array_size = btrfs_super_sys_array_size(sb);
+	array_ptr = sb->sys_chunk_array;
+	cur_offset = 0;
+
+	while (cur_offset < array_size) {
+		u32 num_stripes;
+
+		disk_key = (struct btrfs_disk_key *)array_ptr;
+		len = sizeof(*disk_key);
+		if (cur_offset + len > array_size)
+			goto out_short_read;
+		btrfs_disk_key_to_cpu(&key, disk_key);
+
+		array_ptr += len;
+		cur_offset += len;
+
+		if (key.type == BTRFS_CHUNK_ITEM_KEY) {
+			chunk = (struct btrfs_chunk *)array_ptr;
+
+			/*
+			 * At least one btrfs_chunk with one stripe must be
+			 * present, exact stripe count check comes afterwards
+			 */
+			len = btrfs_chunk_item_size(1);
+			if (cur_offset + len > array_size)
+				goto out_short_read;
+			num_stripes = btrfs_stack_chunk_num_stripes(chunk);
+			if (!num_stripes) {
+				printk(
+	    "ERROR: invalid number of stripes %u in sys_array at offset %u\n",
+					num_stripes, cur_offset);
+				ret = -EIO;
+				break;
+			}
+			len = btrfs_chunk_item_size(num_stripes);
+			if (cur_offset + len > array_size)
+				goto out_short_read;
+			if (btrfs_stack_chunk_type(chunk) &
+			    BTRFS_BLOCK_GROUP_SYSTEM) {
+				ret = add_merge_cache_extent(&mdres->sys_chunks,
+					key.offset,
+					btrfs_stack_chunk_length(chunk));
+				if (ret < 0)
+					break;
+			}
+		} else {
+			error("unexpected item type %u in sys_array offset %u",
+			      key.type, cur_offset);
+			ret = -EUCLEAN;
+			break;
+		}
+		array_ptr += len;
+		cur_offset += len;
+	}
+
+	/* Get the last system chunk end as a quicker check */
+	cache = last_cache_extent(&mdres->sys_chunks);
+	if (!cache) {
+		error("no system chunk found in super block");
+		return -EUCLEAN;
+	}
+	mdres->sys_chunk_end = cache->start + cache->size - 1;
+	return ret;
+out_short_read:
+	error("sys_array too short to read %u bytes at offset %u\n",
+		len, cur_offset);
+	return -EUCLEAN;
+}
+
 static int build_chunk_tree(struct mdrestore_struct *mdres,
 			    struct meta_cluster *cluster)
 {
@@ -2117,6 +2213,13 @@ static int build_chunk_tree(struct mdrestore_struct *mdres,
 		error("invalid superblock");
 		return ret;
 	}
+	ret = add_sys_array(mdres, super);
+	if (ret < 0) {
+		error("failed to read system chunk array");
+		free(buffer);
+		pthread_mutex_unlock(&mdres->mutex);
+		return ret;
+	}
 	chunk_root_bytenr = btrfs_super_chunk_root(super);
 	mdres->nodesize = btrfs_super_nodesize(super);
 	if (btrfs_super_incompat_flags(super) &
-- 
2.21.0



* [PATCH 8/9] btrfs-progs: image: Introduce helper to determine if a tree block is in the range of system chunks
  2019-06-06 11:06 [PATCH 0/9] btrfs-progs: image: Data dump support, restore optimization and small fixes Qu Wenruo
                   ` (6 preceding siblings ...)
  2019-06-06 11:06 ` [PATCH 7/9] btrfs-progs: image: Allow restore to record system chunk ranges for later usage Qu Wenruo
@ 2019-06-06 11:06 ` Qu Wenruo
  2019-06-06 11:06 ` [PATCH 9/9] btrfs-progs: image: Rework how we search chunk tree blocks Qu Wenruo
  2019-06-14 15:48 ` [PATCH 0/9] btrfs-progs: image: Data dump support, restore optimization and small fixes David Sterba
  9 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2019-06-06 11:06 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new helper function, is_in_sys_chunks(), to determine if an
item is in the range of system chunks.

Since btrfs-image merges adjacent extents of the same type into one
item, this function is designed to return true if any byte of the given
range falls inside a system chunk.
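As a rough, stand-alone illustration of these semantics (not the patch's rb-tree walk), the sketch below answers the same question over a sorted array of non-overlapping half-open ranges using a binary search. struct range and range_hits_any() are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range type: covers [start, start + size) */
struct range {
	uint64_t start;
	uint64_t size;
};

/*
 * Return true if [start, start + len) shares at least one byte with any
 * range in @r. @r must be sorted by start and non-overlapping.
 */
static bool range_hits_any(const struct range *r, size_t nr,
			   uint64_t start, uint64_t len)
{
	size_t lo = 0;
	size_t hi = nr;

	assert(len > 0);
	/* Quick reject, analogous to the sys_chunk_end check */
	if (nr == 0 || start > r[nr - 1].start + r[nr - 1].size - 1)
		return false;
	/* Find the first range that ends after @start */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (r[mid].start + r[mid].size <= start)
			lo = mid + 1;
		else
			hi = mid;
	}
	/*
	 * Ranges before @lo end at or before @start. r[lo] ends after
	 * @start, so it overlaps iff it begins before start + len; any
	 * later range begins even later.
	 */
	return lo < nr && r[lo].start < start + len;
}
```

A range that merely touches the exclusive end of a system chunk (for example, starting exactly at start + size) is reported as not overlapping.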

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/image/main.c b/image/main.c
index 0460a5f5..dc677409 100644
--- a/image/main.c
+++ b/image/main.c
@@ -1780,6 +1780,54 @@ static int wait_for_worker(struct mdrestore_struct *mdres)
 	return ret;
 }
 
+/*
+ * Check if a range [start, start + len) has ANY bytes covered by
+ * system chunk ranges.
+ */
+static bool is_in_sys_chunks(struct mdrestore_struct *mdres, u64 start,
+			     u64 len)
+{
+	struct rb_node *node = mdres->sys_chunks.root.rb_node;
+	struct cache_extent *entry;
+	struct cache_extent *next;
+	struct cache_extent *prev;
+
+	if (start > mdres->sys_chunk_end)
+		return false;
+
+	while (node) {
+		entry = rb_entry(node, struct cache_extent, rb_node);
+		if (start > entry->start) {
+			if (!node->rb_right)
+				break;
+			node = node->rb_right;
+		} else if (start < entry->start) {
+			if (!node->rb_left)
+				break;
+			node = node->rb_left;
+		} else {
+			/* already in a system chunk */
+			return true;
+		}
+	}
+	if (!node)
+		return false;
+	entry = rb_entry(node, struct cache_extent, rb_node);
+	/* Now we have entry which is the nearest chunk around @start */
+	if (start > entry->start) {
+		prev = entry;
+		next = next_cache_extent(entry);
+	} else {
+		prev = prev_cache_extent(entry);
+		next = entry;
+	}
+	if (prev && prev->start + prev->size > start)
+		return true;
+	if (next && start + len > next->start)
+		return true;
+	return false;
+}
+
 static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer,
 			    u64 bytenr, u64 item_bytenr, u32 bufsize,
 			    u64 cluster_bytenr)
-- 
2.21.0



* [PATCH 9/9] btrfs-progs: image: Rework how we search chunk tree blocks
  2019-06-06 11:06 [PATCH 0/9] btrfs-progs: image: Data dump support, restore optimization and small fixes Qu Wenruo
                   ` (7 preceding siblings ...)
  2019-06-06 11:06 ` [PATCH 8/9] btrfs-progs: image: Introduce helper to determine if a tree block is in the range of system chunks Qu Wenruo
@ 2019-06-06 11:06 ` Qu Wenruo
  2019-06-14 15:48 ` [PATCH 0/9] btrfs-progs: image: Data dump support, restore optimization and small fixes David Sterba
  9 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2019-06-06 11:06 UTC (permalink / raw)
  To: linux-btrfs

Before this patch, we used a very inefficient way to search for
chunks:

We iterate through all clusters to find the chunk root tree block first,
then iterate through all clusters again to find every child tree block.

Every lookup of a single chunk tree block requires iterating over all
clusters.
This is obviously inefficient, especially as the chunk tree gets larger.
So the original author left a comment on it:
  /* If you have to ask you aren't worthy */
  static int search_for_chunk_blocks()

This patch changes the behavior so that we only iterate through all
clusters once.

The idea behind the optimization is that, since the superblock is
restored first, we can use the CHUNK_ITEMs in
super_block::sys_chunk_array to build a SYSTEM chunk mapping.

Then, when iterating through all items, we can easily skip unrelated
items at two levels:
- At cluster level
  If a cluster starts beyond the end of the last system chunk, it cannot
  contain any chunk tree blocks (as chunk tree blocks only live inside
  system chunks).

- At item level
  If an item has no intersection with any system chunk, it cannot
  contain any chunk tree blocks.

This way, we iterate through all clusters just once and find all
CHUNK_ITEMs.
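The two skip levels above can be modelled as a single pass in C. This is a sketch under stated assumptions: struct item and struct cluster are hypothetical, simplified stand-ins for the metadump structures, clusters are assumed to be ordered by the address that the cluster-level check compares against sys_chunk_end, and a linear overlap test replaces is_in_sys_chunks().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start; uint64_t size; };

/* Hypothetical, simplified metadump item: a logical range [bytenr, +size) */
struct item { uint64_t bytenr; uint32_t size; };

/* Hypothetical cluster: address compared against sys_chunk_end, plus items */
struct cluster {
	uint64_t start;
	const struct item *items;
	size_t nritems;
};

/* Linear overlap test; the patch uses is_in_sys_chunks() instead */
static bool overlaps_any(const struct range *sys, size_t nr_sys,
			 uint64_t start, uint64_t len)
{
	for (size_t i = 0; i < nr_sys; i++)
		if (start < sys[i].start + sys[i].size &&
		    sys[i].start < start + len)
			return true;
	return false;
}

/*
 * One pass over clusters (assumed sorted by @start). Returns how many
 * items would be read and parsed as tree blocks; the rest are skipped.
 */
static size_t count_items_to_read(const struct cluster *cl, size_t nr_clusters,
				  const struct range *sys, size_t nr_sys,
				  uint64_t sys_chunk_end)
{
	size_t nr_read = 0;

	for (size_t c = 0; c < nr_clusters; c++) {
		/* Cluster level: stop once past the last SYSTEM chunk */
		if (cl[c].start > sys_chunk_end)
			break;
		for (size_t i = 0; i < cl[c].nritems; i++) {
			const struct item *it = &cl[c].items[i];

			/* Item level: no overlap with any SYSTEM chunk */
			if (!overlaps_any(sys, nr_sys, it->bytenr, it->size))
				continue;
			nr_read++;
		}
	}
	return nr_read;
}
```

Only items that overlap a SYSTEM chunk are counted, and the loop stops at the first cluster beyond sys_chunk_end, so no cluster is visited twice.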

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c | 213 +++++++++++++++++++++++++++------------------------
 1 file changed, 113 insertions(+), 100 deletions(-)

diff --git a/image/main.c b/image/main.c
index dc677409..8cecb228 100644
--- a/image/main.c
+++ b/image/main.c
@@ -142,8 +142,6 @@ struct mdrestore_struct {
 	struct btrfs_fs_info *info;
 };
 
-static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
-				   u64 search, u64 cluster_bytenr);
 static struct extent_buffer *alloc_dummy_eb(u64 bytenr, u32 size);
 
 static void csum_block(u8 *buf, size_t len)
@@ -1828,67 +1826,17 @@ static bool is_in_sys_chunks(struct mdrestore_struct *mdres, u64 start,
 	return false;
 }
 
-static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer,
-			    u64 bytenr, u64 item_bytenr, u32 bufsize,
-			    u64 cluster_bytenr)
+static int read_chunk_tree_block(struct mdrestore_struct *mdres,
+				 struct extent_buffer *eb)
 {
-	struct extent_buffer *eb;
-	int ret = 0;
 	int i;
 
-	eb = alloc_dummy_eb(bytenr, mdres->nodesize);
-	if (!eb) {
-		ret = -ENOMEM;
-		goto out;
-	}
-
-	while (item_bytenr != bytenr) {
-		buffer += mdres->nodesize;
-		item_bytenr += mdres->nodesize;
-	}
-
-	memcpy(eb->data, buffer, mdres->nodesize);
-	if (btrfs_header_bytenr(eb) != bytenr) {
-		error("eb bytenr does not match found bytenr: %llu != %llu",
-				(unsigned long long)btrfs_header_bytenr(eb),
-				(unsigned long long)bytenr);
-		ret = -EIO;
-		goto out;
-	}
-
-	if (memcmp(mdres->fsid, eb->data + offsetof(struct btrfs_header, fsid),
-		   BTRFS_FSID_SIZE)) {
-		error("filesystem metadata UUID of eb %llu does not match",
-				(unsigned long long)bytenr);
-		ret = -EIO;
-		goto out;
-	}
-
-	if (btrfs_header_owner(eb) != BTRFS_CHUNK_TREE_OBJECTID) {
-		error("wrong eb %llu owner %llu",
-				(unsigned long long)bytenr,
-				(unsigned long long)btrfs_header_owner(eb));
-		ret = -EIO;
-		goto out;
-	}
-
 	for (i = 0; i < btrfs_header_nritems(eb); i++) {
 		struct btrfs_chunk *chunk;
 		struct fs_chunk *fs_chunk;
 		struct btrfs_key key;
 		u64 type;
 
-		if (btrfs_header_level(eb)) {
-			u64 blockptr = btrfs_node_blockptr(eb, i);
-
-			ret = search_for_chunk_blocks(mdres, blockptr,
-						      cluster_bytenr);
-			if (ret)
-				break;
-			continue;
-		}
-
-		/* Yay a leaf!  We loves leafs! */
 		btrfs_item_key_to_cpu(eb, &key, i);
 		if (key.type != BTRFS_CHUNK_ITEM_KEY)
 			continue;
@@ -1896,8 +1844,7 @@ static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer,
 		fs_chunk = malloc(sizeof(struct fs_chunk));
 		if (!fs_chunk) {
 			error("not enough memory to allocate chunk");
-			ret = -ENOMEM;
-			break;
+			return -ENOMEM;
 		}
 		memset(fs_chunk, 0, sizeof(*fs_chunk));
 		chunk = btrfs_item_ptr(eb, i, struct btrfs_chunk);
@@ -1906,19 +1853,18 @@ static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer,
 		fs_chunk->physical = btrfs_stripe_offset_nr(eb, chunk, 0);
 		fs_chunk->bytes = btrfs_chunk_length(eb, chunk);
 		INIT_LIST_HEAD(&fs_chunk->list);
+
 		if (tree_search(&mdres->physical_tree, &fs_chunk->p,
 				physical_cmp, 1) != NULL)
 			list_add(&fs_chunk->list, &mdres->overlapping_chunks);
 		else
 			tree_insert(&mdres->physical_tree, &fs_chunk->p,
 				    physical_cmp);
-
 		type = btrfs_chunk_type(eb, chunk);
 		if (type & BTRFS_BLOCK_GROUP_DUP) {
 			fs_chunk->physical_dup =
 					btrfs_stripe_offset_nr(eb, chunk, 1);
 		}
-
 		if (fs_chunk->physical_dup + fs_chunk->bytes >
 		    mdres->last_physical_offset)
 			mdres->last_physical_offset = fs_chunk->physical_dup +
@@ -1933,19 +1879,80 @@ static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer,
 			mdres->alloced_chunks += fs_chunk->bytes;
 		tree_insert(&mdres->chunk_tree, &fs_chunk->l, chunk_cmp);
 	}
-out:
+	return 0;
+}
+
+static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer,
+			    u64 item_bytenr, u32 bufsize,
+			    u64 cluster_bytenr)
+{
+	struct extent_buffer *eb;
+	u32 nodesize = mdres->nodesize;
+	u64 bytenr;
+	size_t cur_offset;
+	int ret = 0;
+
+	eb = alloc_dummy_eb(0, mdres->nodesize);
+	if (!eb)
+		return -ENOMEM;
+
+	for (cur_offset = 0; cur_offset < bufsize; cur_offset += nodesize) {
+		bytenr = item_bytenr + cur_offset;
+		if (!is_in_sys_chunks(mdres, bytenr, nodesize))
+			continue;
+		memcpy(eb->data, buffer + cur_offset, nodesize);
+		if (btrfs_header_bytenr(eb) != bytenr) {
+			error(
+			"eb bytenr does not match found bytenr: %llu != %llu",
+				(unsigned long long)btrfs_header_bytenr(eb),
+				(unsigned long long)bytenr);
+			ret = -EUCLEAN;
+			break;
+		}
+		if (memcmp(mdres->fsid, eb->data +
+			   offsetof(struct btrfs_header, fsid),
+		    BTRFS_FSID_SIZE)) {
+			error(
+			"filesystem metadata UUID of eb %llu does not match",
+				bytenr);
+			ret = -EUCLEAN;
+			break;
+		}
+		if (btrfs_header_owner(eb) != BTRFS_CHUNK_TREE_OBJECTID) {
+			error("wrong eb %llu owner %llu",
+				(unsigned long long)bytenr,
+				(unsigned long long)btrfs_header_owner(eb));
+			ret = -EUCLEAN;
+			break;
+		}
+		/*
+		 * No need to descend into nodes: we iterate over every tree
+		 * block in the system chunks anyway, so only leaves matter.
+		 */
+		if (btrfs_header_level(eb))
+			continue;
+		ret = read_chunk_tree_block(mdres, eb);
+		if (ret < 0)
+			break;
+	}
 	free(eb);
 	return ret;
 }
 
-/* If you have to ask you aren't worthy */
-static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
-				   u64 search, u64 cluster_bytenr)
+/*
+ * Find all chunk items in the dump image.
+ *
+ * Iterate over all clusters and find every item that overlaps a
+ * system chunk range.
+ * Such items are read as tree blocks, and any CHUNK_ITEMs found in
+ * them are added to @mdres.
+ */
+static int search_for_chunk_blocks(struct mdrestore_struct *mdres)
 {
 	struct meta_cluster *cluster;
 	struct meta_cluster_header *header;
 	struct meta_cluster_item *item;
-	u64 current_cluster = cluster_bytenr, bytenr;
+	u64 current_cluster = 0, bytenr;
 	u64 item_bytenr;
 	u32 bufsize, nritems, i;
 	u32 max_size = current_version->max_pending_size * 2;
@@ -1976,43 +1983,45 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
 	}
 
 	bytenr = current_cluster;
+	/* Main loop, iterating all clusters */
 	while (1) {
 		if (fseek(mdres->in, current_cluster, SEEK_SET)) {
 			error("seek failed: %m");
 			ret = -EIO;
-			break;
+			goto out;
 		}
 
 		ret = fread(cluster, BLOCK_SIZE, 1, mdres->in);
 		if (ret == 0) {
-			if (cluster_bytenr != 0) {
-				cluster_bytenr = 0;
-				current_cluster = 0;
-				bytenr = 0;
-				continue;
-			}
+			if (feof(mdres->in))
+				goto out;
 			error(
 	"unknown state after reading cluster at %llu, probably corrupted data",
-					cluster_bytenr);
+					current_cluster);
 			ret = -EIO;
-			break;
+			goto out;
 		} else if (ret < 0) {
 			error("unable to read image at %llu: %m",
-					(unsigned long long)cluster_bytenr);
-			break;
+					current_cluster);
+			goto out;
 		}
-		ret = 0;
 
 		header = &cluster->header;
 		if (le64_to_cpu(header->magic) != current_version->magic_cpu ||
 		    le64_to_cpu(header->bytenr) != current_cluster) {
 			error("bad header in metadump image");
 			ret = -EIO;
-			break;
+			goto out;
 		}
 
+		/* We're already over the system chunk end, no need to search */
+		if (current_cluster > mdres->sys_chunk_end)
+			goto out;
+
 		bytenr += BLOCK_SIZE;
 		nritems = le32_to_cpu(header->nritems);
+
+		/* Search items for tree blocks in sys chunks */
 		for (i = 0; i < nritems; i++) {
 			size_t size;
 
@@ -2020,11 +2029,21 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
 			bufsize = le32_to_cpu(item->size);
 			item_bytenr = le64_to_cpu(item->bytenr);
 
-			if (bufsize > max_size) {
-				error("item %u too big: %u > %u", i, bufsize,
-						max_size);
-				ret = -EIO;
-				break;
+			/*
+			 * Only data extents and free space cache can be that
+			 * big; adjacent tree blocks are never merged beyond
+			 * max_size.
+			 */
+			if (bufsize > max_size ||
+			    !is_in_sys_chunks(mdres, item_bytenr, bufsize)) {
+				ret = fseek(mdres->in, bufsize, SEEK_CUR);
+				if (ret < 0) {
+					error("failed to seek: %m");
+					ret = -errno;
+					goto out;
+				}
+				bytenr += bufsize;
+				continue;
 			}
 
 			if (mdres->compress_method == COMPRESS_ZLIB) {
@@ -2032,7 +2051,7 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
 				if (ret != 1) {
 					error("read error: %m");
 					ret = -EIO;
-					break;
+					goto out;
 				}
 
 				size = max_size;
@@ -2043,40 +2062,36 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
 					error("decompression failed with %d",
 							ret);
 					ret = -EIO;
-					break;
+					goto out;
 				}
 			} else {
 				ret = fread(buffer, bufsize, 1, mdres->in);
 				if (ret != 1) {
 					error("read error: %m");
 					ret = -EIO;
-					break;
+					goto out;
 				}
 				size = bufsize;
 			}
 			ret = 0;
 
-			if (item_bytenr <= search &&
-			    item_bytenr + size > search) {
-				ret = read_chunk_block(mdres, buffer, search,
-						       item_bytenr, size,
-						       current_cluster);
-				if (!ret)
-					ret = 1;
-				break;
+			ret = read_chunk_block(mdres, buffer,
+					       item_bytenr, size,
+					       current_cluster);
+			if (ret < 0) {
+				error(
+	"failed to search tree blocks in item bytenr %llu size %zu",
+					item_bytenr, size);
+				goto out;
 			}
 			bytenr += bufsize;
 		}
-		if (ret) {
-			if (ret > 0)
-				ret = 0;
-			break;
-		}
 		if (bytenr & BLOCK_MASK)
 			bytenr += BLOCK_SIZE - (bytenr & BLOCK_MASK);
 		current_cluster = bytenr;
 	}
 
+out:
 	free(tmp);
 	free(buffer);
 	free(cluster);
@@ -2175,7 +2190,6 @@ static int build_chunk_tree(struct mdrestore_struct *mdres,
 	struct btrfs_super_block *super;
 	struct meta_cluster_header *header;
 	struct meta_cluster_item *item = NULL;
-	u64 chunk_root_bytenr = 0;
 	u32 i, nritems;
 	u64 bytenr = 0;
 	u8 *buffer;
@@ -2268,7 +2282,6 @@ static int build_chunk_tree(struct mdrestore_struct *mdres,
 		pthread_mutex_unlock(&mdres->mutex);
 		return ret;
 	}
-	chunk_root_bytenr = btrfs_super_chunk_root(super);
 	mdres->nodesize = btrfs_super_nodesize(super);
 	if (btrfs_super_incompat_flags(super) &
 	    BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
@@ -2281,7 +2294,7 @@ static int build_chunk_tree(struct mdrestore_struct *mdres,
 	free(buffer);
 	pthread_mutex_unlock(&mdres->mutex);
 
-	return search_for_chunk_blocks(mdres, chunk_root_bytenr, 0);
+	return search_for_chunk_blocks(mdres);
 }
 
 static int range_contains_super(u64 physical, u64 bytes)
-- 
2.21.0

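read_chunk_block() in patch 9 walks each merged item buffer one nodesize block at a time, checks that each header's bytenr and owner match, and only parses leaves. A trimmed-down sketch of that walk follows, assuming a hypothetical struct blk_header in place of the real on-disk btrfs_header; the fsid check and the per-block is_in_sys_chunks() skip are omitted for brevity, and count_chunk_leaves() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* BTRFS_CHUNK_TREE_OBJECTID is 3 in btrfs */
#define CHUNK_TREE_OBJECTID 3ULL

/* Hypothetical, trimmed-down header; NOT the on-disk btrfs_header layout */
struct blk_header {
	uint64_t bytenr;	/* logical address the block claims */
	uint64_t owner;		/* tree that owns the block */
	uint8_t level;		/* 0 for leaves */
};

/*
 * Walk @buf, which holds @bufsize bytes of metadata starting at logical
 * address @item_bytenr, one @nodesize block at a time. Returns the
 * number of chunk tree leaves, or -1 if any block fails validation.
 */
static int count_chunk_leaves(const uint8_t *buf, uint32_t bufsize,
			      uint64_t item_bytenr, uint32_t nodesize)
{
	int leaves = 0;

	assert(nodesize >= sizeof(struct blk_header));
	/* Only look at complete blocks */
	for (uint32_t off = 0; off + nodesize <= bufsize; off += nodesize) {
		struct blk_header h;

		memcpy(&h, buf + off, sizeof(h));
		/* The block must claim the address it was found at */
		if (h.bytenr != item_bytenr + off)
			return -1;
		/* And it must belong to the chunk tree */
		if (h.owner != CHUNK_TREE_OBJECTID)
			return -1;
		/* Nodes are skipped: every block is visited anyway */
		if (h.level)
			continue;
		leaves++;
	}
	return leaves;
}
```

Unlike the patch's loop, which advances while cur_offset < bufsize, this sketch stops at the last complete block so a short buffer is never read past its end.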


* Re: [PATCH 3/9] btrfs-progs: image: Fix a access-beyond-boundary bug when there are 32 online CPUs
  2019-06-06 11:06 ` [PATCH 3/9] btrfs-progs: image: Fix a access-beyond-boundary bug when there are 32 online CPUs Qu Wenruo
@ 2019-06-10  1:23   ` Su Yue
  2019-06-10  1:28     ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: Su Yue @ 2019-06-10  1:23 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 2019/6/6 7:06 PM, Qu Wenruo wrote:
> [BUG]
> When there are more than 32 (in my example, 35) online CPUs, btrfs-image -c9
> will just hang.
>
> [CAUSE]
> Btrfs-image has a hard-coded limit (32) on how many threads we can use.
> For the "-t" option we do check against this upper limit.
>
> But when the "-t" option is not specified and the "-c" option is,
> btrfs-image will try to auto-detect the number of online CPUs and use
> it without checking whether it exceeds the upper limit.
>
> And for num_threads larger than the upper limit, we will overwrite the
> adjacent members of metadump_struct/mdrestore_struct, corrupting
> pthread_mutex_t and pthread_cond_t, causing synchronization problems.
>
> Nowadays, with SMT/HT and higher CPU core counts, it's not hard to go
> beyond 32 threads and hit the bug.
>
> [FIX]
> Just do an extra num_threads check before using the number from sysconf().
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

This does fix an issue.
As the commit message mentions, why limit the maximum number of threads to 32?
Does that still make sense with today's multi-core CPUs?
Can we increase the limit?
However, this is another story.

For this patch:
Reviewed-by: Su Yue <Damenly_Su@gmx.com>

> ---
>   image/main.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/image/main.c b/image/main.c
> index fb9fc48c..80f09c21 100644
> --- a/image/main.c
> +++ b/image/main.c
> @@ -2758,6 +2758,7 @@ int main(int argc, char *argv[])
>
>   			if (tmp <= 0)
>   				tmp = 1;
> +			tmp = min_t(long, tmp, MAX_WORKER_THREADS);
>   			num_threads = tmp;
>   		}
>   	} else {
>
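The fix boils down to clamping the auto-detected CPU count before it is used as num_threads. A minimal stand-alone sketch, assuming the limit is 32 as discussed in this thread; clamp_worker_threads() and default_worker_threads() are hypothetical helpers, MAX_WORKER_THREADS is redefined locally, and plain comparisons replace btrfs-progs' min_t().

```c
#include <assert.h>
#include <unistd.h>

/* Local stand-in for btrfs-progs' MAX_WORKER_THREADS (32 per this thread) */
#define MAX_WORKER_THREADS 32

/* Clamp a detected CPU count into [1, MAX_WORKER_THREADS] */
static long clamp_worker_threads(long ncpus)
{
	if (ncpus <= 0)			/* sysconf() failed or reported nothing */
		ncpus = 1;
	if (ncpus > MAX_WORKER_THREADS)	/* the check the patch adds */
		ncpus = MAX_WORKER_THREADS;
	return ncpus;
}

/* Default worker count when no explicit thread count is given */
static long default_worker_threads(void)
{
	long n = clamp_worker_threads(sysconf(_SC_NPROCESSORS_ONLN));

	assert(n >= 1 && n <= MAX_WORKER_THREADS);
	return n;
}
```

With the 35 online CPUs from the bug report, clamp_worker_threads(35) returns 32, so the worker count can no longer exceed the thread storage sized for 32.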



* Re: [PATCH 3/9] btrfs-progs: image: Fix a access-beyond-boundary bug when there are 32 online CPUs
  2019-06-10  1:23   ` Su Yue
@ 2019-06-10  1:28     ` Qu Wenruo
  0 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2019-06-10  1:28 UTC (permalink / raw)
  To: Su Yue, linux-btrfs



On 2019/6/10 9:23 AM, Su Yue wrote:
> 
> 
> On 2019/6/6 7:06 PM, Qu Wenruo wrote:
>> [BUG]
>> When there are more than 32 (in my example, 35) online CPUs, btrfs-image -c9
>> will just hang.
>>
>> [CAUSE]
>> Btrfs-image has a hard-coded limit (32) on how many threads we can use.
>> For the "-t" option we do check against this upper limit.
>>
>> But when the "-t" option is not specified and the "-c" option is,
>> btrfs-image will try to auto-detect the number of online CPUs and use
>> it without checking whether it exceeds the upper limit.
>>
>> And for num_threads larger than the upper limit, we will overwrite the
>> adjacent members of metadump_struct/mdrestore_struct, corrupting
>> pthread_mutex_t and pthread_cond_t, causing synchronization problems.
>>
>> Nowadays, with SMT/HT and higher CPU core counts, it's not hard to go
>> beyond 32 threads and hit the bug.
>>
>> [FIX]
>> Just do an extra num_threads check before using the number from sysconf().
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> 
> This does fix an issue.
> As the commit message mentions, why limit the maximum number of threads to 32?

That's entirely due to the hard-coded thread limit in metadump_struct.
We could switch to a dynamically allocated pthread_t array. It shouldn't
be that hard to convert.

> Does that still make sense with today's multi-core CPUs?

Well, thanks to the slow progress caused by a monopoly (cough, cough,
Intel), after a decade we're finally getting mainstream
16-core/32-thread CPUs this year.

Personally, I think 32 threads should already be good enough for such a
rarely used tool.

So I'd prefer to keep the hard-coded limit for a while.

Thanks,
Qu

> Can we increase the limit?
> However, this is another story.
> 
> For this patch:
> Reviewed-by: Su Yue <Damenly_Su@gmx.com>
> 
>> ---
>>   image/main.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/image/main.c b/image/main.c
>> index fb9fc48c..80f09c21 100644
>> --- a/image/main.c
>> +++ b/image/main.c
>> @@ -2758,6 +2758,7 @@ int main(int argc, char *argv[])
>>
>>               if (tmp <= 0)
>>                   tmp = 1;
>> +            tmp = min_t(long, tmp, MAX_WORKER_THREADS);
>>               num_threads = tmp;
>>           }
>>       } else {
>>
> 
> 




* Re: [PATCH 0/9] btrfs-progs: image: Data dump support, restore optimization and small fixes
  2019-06-06 11:06 [PATCH 0/9] btrfs-progs: image: Data dump support, restore optimization and small fixes Qu Wenruo
                   ` (8 preceding siblings ...)
  2019-06-06 11:06 ` [PATCH 9/9] btrfs-progs: image: Rework how we search chunk tree blocks Qu Wenruo
@ 2019-06-14 15:48 ` David Sterba
  9 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-06-14 15:48 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Jun 06, 2019 at 07:06:02PM +0800, Qu Wenruo wrote:
> This patchset can be fetched from github:
> https://github.com/adam900710/btrfs-progs/tree/image_data_dump
> Which is based on v5.1 tag.
> 
> This patchset contains the following main features:
> - various small fixes for btrfs-image
>   From an indentation fix and SZ_* cleanup to a btrfs-image crash
>   caused by too many CPU cores.

I've applied 1-4 to devel for now.



Thread overview: 13+ messages
2019-06-06 11:06 [PATCH 0/9] btrfs-progs: image: Data dump support, restore optimization and small fixes Qu Wenruo
2019-06-06 11:06 ` [PATCH 1/9] btrfs-progs: image: Use SZ_* to replace intermediate size Qu Wenruo
2019-06-06 11:06 ` [PATCH 2/9] btrfs-progs: image: Fix a indent misalign Qu Wenruo
2019-06-06 11:06 ` [PATCH 3/9] btrfs-progs: image: Fix a access-beyond-boundary bug when there are 32 online CPUs Qu Wenruo
2019-06-10  1:23   ` Su Yue
2019-06-10  1:28     ` Qu Wenruo
2019-06-06 11:06 ` [PATCH 4/9] btrfs-progs: image: Verify the superblock before restore Qu Wenruo
2019-06-06 11:06 ` [PATCH 5/9] btrfs-progs: image: Introduce framework for more dump versions Qu Wenruo
2019-06-06 11:06 ` [PATCH 6/9] btrfs-progs: image: Introduce -d option to dump data Qu Wenruo
2019-06-06 11:06 ` [PATCH 7/9] btrfs-progs: image: Allow restore to record system chunk ranges for later usage Qu Wenruo
2019-06-06 11:06 ` [PATCH 8/9] btrfs-progs: image: Introduce helper to determine if a tree block is in the range of system chunks Qu Wenruo
2019-06-06 11:06 ` [PATCH 9/9] btrfs-progs: image: Rework how we search chunk tree blocks Qu Wenruo
2019-06-14 15:48 ` [PATCH 0/9] btrfs-progs: image: Data dump support, restore optimization and small fixes David Sterba
