linux-btrfs.vger.kernel.org archive mirror
* [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes
@ 2019-07-02 10:07 WenRuo Qu
  2019-07-02 10:07 ` [PATCH v2 01/14] btrfs-progs: image: Use SZ_* to replace intermediate size WenRuo Qu
                   ` (14 more replies)
  0 siblings, 15 replies; 17+ messages in thread
From: WenRuo Qu @ 2019-07-02 10:07 UTC (permalink / raw)
  To: linux-btrfs; +Cc: WenRuo Qu

This patchset is based on v5.1.1 tag.

With this update, the patchset has the following features:
- various small fixes and enhancements for btrfs-image
  * Fix an indent misalign
  * Fix an access-beyond-boundary bug
  * Fix a confusing error message due to unpopulated errno
  * Output error message for chunk tree build error
  * Use SZ_* to replace intermediate number
  * Verify superblock before restore

- btrfs-image data dump support
  This introduces a new option, -d, to dump data.
  Due to the item size limit, we have to enlarge the existing limit from
  256K (enough for tree blocks, but not enough for free space cache) to
  256M.
  This change breaks compatibility, so we have to introduce a new magic
  number acting as a version, while keeping the rest of the on-disk
  format the same.

- Reduce memory usage for both compressed and uncompressed images
  Originally, for compressed extents we used 4 * max_pending_size as the
  output buffer, which can be 1G with the new 256M limit.

  Change it to use at most 512K for the compressed extent output buffer,
  and also use a fixed 512K buffer for uncompressed extents.

- btrfs-image restore optimization
  This will speed up chunk item search during restore.

Changelog:
v2:
- New small fixes:
  * Fix a confusing error message due to unpopulated errno
  * Output error message for chunk tree build error
  
- Fix a regression of the previous version
  Patch "btrfs-progs: image: Rework how we search chunk tree blocks"
  deleted a "ret = 0" line, which could cause a false early exit.

- Reduce memory usage for data dump

Qu Wenruo (14):
  btrfs-progs: image: Use SZ_* to replace intermediate size
  btrfs-progs: image: Fix an indent misalign
  btrfs-progs: image: Fix an access-beyond-boundary bug when there are
    32 online CPUs
  btrfs-progs: image: Verify the superblock before restore
  btrfs-progs: image: Introduce framework for more dump versions
  btrfs-progs: image: Introduce -d option to dump data
  btrfs-progs: image: Allow restore to record system chunk ranges for
    later usage
  btrfs-progs: image: Introduce helper to determine if a tree block is
    in the range of system chunks
  btrfs-progs: image: Rework how we search chunk tree blocks
  btrfs-progs: image: Reduce memory requirement for decompression
  btrfs-progs: image: Don't waste memory when we're just extracting
    super block
  btrfs-progs: image: Reduce memory usage for chunk tree search
  btrfs-progs: image: Output error message for chunk tree build error
  btrfs-progs: image: Fix error output to show correct return value

 disk-io.c        |   6 +-
 disk-io.h        |   1 +
 image/main.c     | 874 +++++++++++++++++++++++++++++++++++------------
 image/metadump.h |  15 +-
 4 files changed, 666 insertions(+), 230 deletions(-)

-- 
2.22.0



* [PATCH v2 01/14] btrfs-progs: image: Use SZ_* to replace intermediate size
  2019-07-02 10:07 [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes WenRuo Qu
@ 2019-07-02 10:07 ` WenRuo Qu
  2019-07-02 10:07 ` [PATCH v2 02/14] btrfs-progs: image: Fix an indent misalign WenRuo Qu
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: WenRuo Qu @ 2019-07-02 10:07 UTC (permalink / raw)
  To: linux-btrfs; +Cc: WenRuo Qu

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/metadump.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/image/metadump.h b/image/metadump.h
index 8ace60f503d0..f85c9bcfb813 100644
--- a/image/metadump.h
+++ b/image/metadump.h
@@ -23,8 +23,8 @@
 #include "ctree.h"
 
 #define HEADER_MAGIC		0xbd5c25e27295668bULL
-#define MAX_PENDING_SIZE	(256 * 1024)
-#define BLOCK_SIZE		1024
+#define MAX_PENDING_SIZE	SZ_256K
+#define BLOCK_SIZE		SZ_1K
 #define BLOCK_MASK		(BLOCK_SIZE - 1)
 
 #define ITEMS_PER_CLUSTER ((BLOCK_SIZE - sizeof(struct meta_cluster)) / \
-- 
2.22.0



* [PATCH v2 02/14] btrfs-progs: image: Fix an indent misalign
  2019-07-02 10:07 [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes WenRuo Qu
  2019-07-02 10:07 ` [PATCH v2 01/14] btrfs-progs: image: Use SZ_* to replace intermediate size WenRuo Qu
@ 2019-07-02 10:07 ` WenRuo Qu
  2019-07-02 10:07 ` [PATCH v2 03/14] btrfs-progs: image: Fix an access-beyond-boundary bug when there are 32 online CPUs WenRuo Qu
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: WenRuo Qu @ 2019-07-02 10:07 UTC (permalink / raw)
  To: linux-btrfs; +Cc: WenRuo Qu

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/image/main.c b/image/main.c
index 86845dadc958..9a07d9455e4f 100644
--- a/image/main.c
+++ b/image/main.c
@@ -2645,7 +2645,7 @@ int main(int argc, char *argv[])
 			create = 0;
 			multi_devices = 1;
 			break;
-			case GETOPT_VAL_HELP:
+		case GETOPT_VAL_HELP:
 		default:
 			print_usage(c != GETOPT_VAL_HELP);
 		}
-- 
2.22.0



* [PATCH v2 03/14] btrfs-progs: image: Fix an access-beyond-boundary bug when there are 32 online CPUs
  2019-07-02 10:07 [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes WenRuo Qu
  2019-07-02 10:07 ` [PATCH v2 01/14] btrfs-progs: image: Use SZ_* to replace intermediate size WenRuo Qu
  2019-07-02 10:07 ` [PATCH v2 02/14] btrfs-progs: image: Fix an indent misalign WenRuo Qu
@ 2019-07-02 10:07 ` WenRuo Qu
  2019-07-02 10:07 ` [PATCH v2 04/14] btrfs-progs: image: Verify the superblock before restore WenRuo Qu
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: WenRuo Qu @ 2019-07-02 10:07 UTC (permalink / raw)
  To: linux-btrfs; +Cc: WenRuo Qu

[BUG]
When there are over 32 (in my example, 35) online CPUs, btrfs-image -c9
will just hang.

[CAUSE]
Btrfs-image has a hard-coded limit (32) on how many threads we can use.
For the "-t" option we check that upper limit.

But when the "-t" option is not specified while the "-c" option is,
btrfs-image will try to auto-detect the number of online CPUs and use
that number without checking whether it exceeds the upper limit.

With num_threads larger than the upper limit, we overwrite the adjacent
members of metadump_struct/mdrestore_struct, corrupting pthread_mutex_t
and pthread_cond_t and causing synchronization problems.

Nowadays, with SMT/HT and higher CPU core counts, it's not hard to go
beyond 32 threads and hit the bug.

[FIX]
Do an extra num_threads check before using the number from sysconf().
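
For illustration only, a minimal sketch of the clamping idea (the actual
fix is the one-line min_t() change in the diff below; MAX_WORKER_THREADS
is btrfs-image's existing hard limit):

  #include <unistd.h>

  #define MAX_WORKER_THREADS  32  /* btrfs-image's existing hard limit */

  int detect_num_threads(void)
  {
          long tmp = sysconf(_SC_NPROCESSORS_ONLN);

          if (tmp <= 0)
                  tmp = 1;
          /* The missing piece: never exceed the hard-coded worker limit. */
          if (tmp > MAX_WORKER_THREADS)
                  tmp = MAX_WORKER_THREADS;
          return (int)tmp;
  }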

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/image/main.c b/image/main.c
index 9a07d9455e4f..c45d506812b2 100644
--- a/image/main.c
+++ b/image/main.c
@@ -2701,6 +2701,7 @@ int main(int argc, char *argv[])
 
 			if (tmp <= 0)
 				tmp = 1;
+			tmp = min_t(long, tmp, MAX_WORKER_THREADS);
 			num_threads = tmp;
 		}
 	} else {
-- 
2.22.0



* [PATCH v2 04/14] btrfs-progs: image: Verify the superblock before restore
  2019-07-02 10:07 [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes WenRuo Qu
                   ` (2 preceding siblings ...)
  2019-07-02 10:07 ` [PATCH v2 03/14] btrfs-progs: image: Fix an access-beyond-boundary bug when there are 32 online CPUs WenRuo Qu
@ 2019-07-02 10:07 ` WenRuo Qu
  2019-07-02 10:07 ` [PATCH v2 05/14] btrfs-progs: image: Introduce framework for more dump versions WenRuo Qu
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: WenRuo Qu @ 2019-07-02 10:07 UTC (permalink / raw)
  To: linux-btrfs; +Cc: WenRuo Qu

This patch will export disk-io.c::check_super() as btrfs_check_super()
and use it in btrfs-image for extra verification.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 disk-io.c    | 6 +++---
 disk-io.h    | 1 +
 image/main.c | 5 +++++
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index 151eb3b5a278..ffe4a8c58060 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -1347,7 +1347,7 @@ struct btrfs_root *open_ctree_fd(int fp, const char *path, u64 sb_bytenr,
  * - number of devices   - something sane
  * - sys array size      - maximum
  */
-static int check_super(struct btrfs_super_block *sb, unsigned sbflags)
+int btrfs_check_super(struct btrfs_super_block *sb, unsigned sbflags)
 {
 	u8 result[BTRFS_CSUM_SIZE];
 	u32 crc;
@@ -1547,7 +1547,7 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr,
 		if (btrfs_super_bytenr(buf) != sb_bytenr)
 			return -EIO;
 
-		ret = check_super(buf, sbflags);
+		ret = btrfs_check_super(buf, sbflags);
 		if (ret < 0)
 			return ret;
 		memcpy(sb, buf, BTRFS_SUPER_INFO_SIZE);
@@ -1572,7 +1572,7 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr,
 		/* if magic is NULL, the device was removed */
 		if (btrfs_super_magic(buf) == 0 && i == 0)
 			break;
-		if (check_super(buf, sbflags))
+		if (btrfs_check_super(buf, sbflags))
 			continue;
 
 		if (!fsid_is_initialized) {
diff --git a/disk-io.h b/disk-io.h
index ddf3a3803ed5..c97aa2344ac9 100644
--- a/disk-io.h
+++ b/disk-io.h
@@ -171,6 +171,7 @@ static inline int close_ctree(struct btrfs_root *root)
 
 int write_all_supers(struct btrfs_fs_info *fs_info);
 int write_ctree_super(struct btrfs_trans_handle *trans);
+int btrfs_check_super(struct btrfs_super_block *sb, unsigned sbflags);
 int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr,
 		unsigned sbflags);
 int btrfs_map_bh_to_logical(struct btrfs_root *root, struct extent_buffer *bh,
diff --git a/image/main.c b/image/main.c
index c45d506812b2..f155794cfc19 100644
--- a/image/main.c
+++ b/image/main.c
@@ -1983,6 +1983,11 @@ static int build_chunk_tree(struct mdrestore_struct *mdres,
 
 	pthread_mutex_lock(&mdres->mutex);
 	super = (struct btrfs_super_block *)buffer;
+	ret = btrfs_check_super(super, 0);
+	if (ret < 0) {
+		error("invalid superblock");
+		return ret;
+	}
 	chunk_root_bytenr = btrfs_super_chunk_root(super);
 	mdres->nodesize = btrfs_super_nodesize(super);
 	if (btrfs_super_incompat_flags(super) &
-- 
2.22.0



* [PATCH v2 05/14] btrfs-progs: image: Introduce framework for more dump versions
  2019-07-02 10:07 [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes WenRuo Qu
                   ` (3 preceding siblings ...)
  2019-07-02 10:07 ` [PATCH v2 04/14] btrfs-progs: image: Verify the superblock before restore WenRuo Qu
@ 2019-07-02 10:07 ` WenRuo Qu
  2019-07-02 10:07 ` [PATCH v2 06/14] btrfs-progs: image: Introduce -d option to dump data WenRuo Qu
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: WenRuo Qu @ 2019-07-02 10:07 UTC (permalink / raw)
  To: linux-btrfs; +Cc: WenRuo Qu

The original dump format only contains a @magic member to verify the
format. This means that if we want to introduce a new on-disk format or
change a certain size limit, we can only introduce a new magic number as
a kind of version.

This patch will introduce a framework that allows multiple magic numbers
to co-exist, for future features.

It introduces the following members for each dump version:

- max_pending_size
  The threshold size for a cluster. It's not a hard limit but a soft
  one: one cluster can go larger than max_pending_size for one item, but
  the next item will go to the next cluster.

- magic_cpu
  The magic number in CPU endian.

- extra_sb_flags
  Whether the super block of the restored image needs extra super block
  flags like BTRFS_SUPER_FLAG_METADUMP_V2.
  For the incoming data dump feature, we don't need any extra super
  block flags.

This change also implies that an image dump uses the same magic number
for all of its clusters. No mixing is allowed, as we use the first
cluster to determine the dump version.
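
As a condensed sketch of the idea (the real table and the detect_version()
helper are in the diff below; the struct name here is illustrative, and
the version 1 entry only arrives with the later data dump patch):

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  struct dump_version_sketch {
          int version;
          uint32_t max_pending_size;
          uint64_t magic_cpu;
          bool extra_sb_flags;
  };

  static const struct dump_version_sketch versions[] = {
          { 0, 256 * 1024, 0xbd5c25e27295668bULL, true },
          /* later: { 1, 256 * 1024 * 1024, 0x31765f506d55445fULL, false } */
  };

  /* Pick the dump version whose magic matches the first cluster header. */
  const struct dump_version_sketch *pick_version(uint64_t magic_cpu)
  {
          size_t i;

          for (i = 0; i < sizeof(versions) / sizeof(versions[0]); i++)
                  if (versions[i].magic_cpu == magic_cpu)
                          return &versions[i];
          return NULL;    /* unrecognized dump format */
  }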

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c     | 80 ++++++++++++++++++++++++++++++++++++++++--------
 image/metadump.h | 13 ++++++--
 2 files changed, 78 insertions(+), 15 deletions(-)

diff --git a/image/main.c b/image/main.c
index f155794cfc19..26c72e85f656 100644
--- a/image/main.c
+++ b/image/main.c
@@ -41,6 +41,19 @@
 
 #define MAX_WORKER_THREADS	(32)
 
+const struct dump_version dump_versions[NR_DUMP_VERSIONS] = {
+	/*
+	 * The original format, which only supports tree blocks and
+	 * free space cache dump.
+	 */
+	{ .version = 0,
+	  .max_pending_size = SZ_256K,
+	  .magic_cpu = 0xbd5c25e27295668bULL,
+	  .extra_sb_flags = 1 }
+};
+
+const struct dump_version *current_version = &dump_versions[0];
+
 struct async_work {
 	struct list_head list;
 	struct list_head ordered;
@@ -395,7 +408,7 @@ static void meta_cluster_init(struct metadump_struct *md, u64 start)
 	md->num_items = 0;
 	md->num_ready = 0;
 	header = &md->cluster.header;
-	header->magic = cpu_to_le64(HEADER_MAGIC);
+	header->magic = cpu_to_le64(current_version->magic_cpu);
 	header->bytenr = cpu_to_le64(start);
 	header->nritems = cpu_to_le32(0);
 	header->compress = md->compress_level > 0 ?
@@ -707,7 +720,7 @@ static int add_extent(u64 start, u64 size, struct metadump_struct *md,
 {
 	int ret;
 	if (md->data != data ||
-	    md->pending_size + size > MAX_PENDING_SIZE ||
+	    md->pending_size + size > current_version->max_pending_size ||
 	    md->pending_start + md->pending_size != start) {
 		ret = flush_pending(md, 0);
 		if (ret)
@@ -1036,7 +1049,8 @@ static void update_super_old(u8 *buffer)
 	u32 sectorsize = btrfs_super_sectorsize(super);
 	u64 flags = btrfs_super_flags(super);
 
-	flags |= BTRFS_SUPER_FLAG_METADUMP;
+	if (current_version->extra_sb_flags)
+		flags |= BTRFS_SUPER_FLAG_METADUMP;
 	btrfs_set_super_flags(super, flags);
 
 	key = (struct btrfs_disk_key *)(super->sys_chunk_array);
@@ -1129,7 +1143,8 @@ static int update_super(struct mdrestore_struct *mdres, u8 *buffer)
 	if (mdres->clear_space_cache)
 		btrfs_set_super_cache_generation(super, 0);
 
-	flags |= BTRFS_SUPER_FLAG_METADUMP_V2;
+	if (current_version->extra_sb_flags)
+		flags |= BTRFS_SUPER_FLAG_METADUMP_V2;
 	btrfs_set_super_flags(super, flags);
 	btrfs_set_super_sys_array_size(super, new_array_size);
 	btrfs_set_super_num_devices(super, 1);
@@ -1317,7 +1332,7 @@ static void *restore_worker(void *data)
 	u8 *outbuf;
 	int outfd;
 	int ret;
-	int compress_size = MAX_PENDING_SIZE * 4;
+	int compress_size = current_version->max_pending_size * 4;
 
 	outfd = fileno(mdres->out);
 	buffer = malloc(compress_size);
@@ -1466,6 +1481,42 @@ static void mdrestore_destroy(struct mdrestore_struct *mdres, int num_threads)
 	pthread_mutex_destroy(&mdres->mutex);
 }
 
+static int detect_version(FILE *in)
+{
+	struct meta_cluster *cluster;
+	u8 buf[BLOCK_SIZE];
+	bool found = false;
+	int i;
+	int ret;
+
+	if (fseek(in, 0, SEEK_SET) < 0) {
+		error("seek failed: %m");
+		return -errno;
+	}
+	ret = fread(buf, BLOCK_SIZE, 1, in);
+	if (!ret) {
+		error("failed to read header");
+		return -EIO;
+	}
+
+	fseek(in, 0, SEEK_SET);
+	cluster = (struct meta_cluster *)buf;
+	for (i = 0; i < NR_DUMP_VERSIONS; i++) {
+		if (le64_to_cpu(cluster->header.magic) ==
+		    dump_versions[i].magic_cpu) {
+			found = true;
+			current_version = &dump_versions[i];
+			break;
+		}
+	}
+
+	if (!found) {
+		error("unrecognized header format");
+		return -EINVAL;
+	}
+	return 0;
+}
+
 static int mdrestore_init(struct mdrestore_struct *mdres,
 			  FILE *in, FILE *out, int old_restore,
 			  int num_threads, int fixup_offset,
@@ -1473,6 +1524,9 @@ static int mdrestore_init(struct mdrestore_struct *mdres,
 {
 	int i, ret = 0;
 
+	ret = detect_version(in);
+	if (ret < 0)
+		return ret;
 	memset(mdres, 0, sizeof(*mdres));
 	pthread_cond_init(&mdres->cond, NULL);
 	pthread_mutex_init(&mdres->mutex, NULL);
@@ -1520,9 +1574,9 @@ static int fill_mdres_info(struct mdrestore_struct *mdres,
 		return 0;
 
 	if (mdres->compress_method == COMPRESS_ZLIB) {
-		size_t size = MAX_PENDING_SIZE * 2;
+		size_t size = current_version->max_pending_size * 2;
 
-		buffer = malloc(MAX_PENDING_SIZE * 2);
+		buffer = malloc(current_version->max_pending_size * 2);
 		if (!buffer)
 			return -ENOMEM;
 		ret = uncompress(buffer, (unsigned long *)&size,
@@ -1761,7 +1815,7 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
 	u64 current_cluster = cluster_bytenr, bytenr;
 	u64 item_bytenr;
 	u32 bufsize, nritems, i;
-	u32 max_size = MAX_PENDING_SIZE * 2;
+	u32 max_size = current_version->max_pending_size * 2;
 	u8 *buffer, *tmp = NULL;
 	int ret = 0;
 
@@ -1817,7 +1871,7 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
 		ret = 0;
 
 		header = &cluster->header;
-		if (le64_to_cpu(header->magic) != HEADER_MAGIC ||
+		if (le64_to_cpu(header->magic) != current_version->magic_cpu ||
 		    le64_to_cpu(header->bytenr) != current_cluster) {
 			error("bad header in metadump image");
 			ret = -EIO;
@@ -1920,7 +1974,7 @@ static int build_chunk_tree(struct mdrestore_struct *mdres,
 	ret = 0;
 
 	header = &cluster->header;
-	if (le64_to_cpu(header->magic) != HEADER_MAGIC ||
+	if (le64_to_cpu(header->magic) != current_version->magic_cpu ||
 	    le64_to_cpu(header->bytenr) != 0) {
 		error("bad header in metadump image");
 		return -EIO;
@@ -1961,10 +2015,10 @@ static int build_chunk_tree(struct mdrestore_struct *mdres,
 	}
 
 	if (mdres->compress_method == COMPRESS_ZLIB) {
-		size_t size = MAX_PENDING_SIZE * 2;
+		size_t size = current_version->max_pending_size * 2;
 		u8 *tmp;
 
-		tmp = malloc(MAX_PENDING_SIZE * 2);
+		tmp = malloc(current_version->max_pending_size * 2);
 		if (!tmp) {
 			free(buffer);
 			return -ENOMEM;
@@ -2421,7 +2475,7 @@ static int restore_metadump(const char *input, FILE *out, int old_restore,
 			break;
 
 		header = &cluster->header;
-		if (le64_to_cpu(header->magic) != HEADER_MAGIC ||
+		if (le64_to_cpu(header->magic) != current_version->magic_cpu ||
 		    le64_to_cpu(header->bytenr) != bytenr) {
 			error("bad header in metadump image");
 			ret = -EIO;
diff --git a/image/metadump.h b/image/metadump.h
index f85c9bcfb813..941d4b827a24 100644
--- a/image/metadump.h
+++ b/image/metadump.h
@@ -22,8 +22,6 @@
 #include "kernel-lib/list.h"
 #include "ctree.h"
 
-#define HEADER_MAGIC		0xbd5c25e27295668bULL
-#define MAX_PENDING_SIZE	SZ_256K
 #define BLOCK_SIZE		SZ_1K
 #define BLOCK_MASK		(BLOCK_SIZE - 1)
 
@@ -33,6 +31,17 @@
 #define COMPRESS_NONE		0
 #define COMPRESS_ZLIB		1
 
+struct dump_version {
+	u64 magic_cpu;
+	int version;
+	int max_pending_size;
+	unsigned int extra_sb_flags:1;
+};
+
+#define NR_DUMP_VERSIONS	1
+extern const struct dump_version dump_versions[NR_DUMP_VERSIONS];
+const extern struct dump_version *current_version;
+
 struct meta_cluster_item {
 	__le64 bytenr;
 	__le32 size;
-- 
2.22.0



* [PATCH v2 06/14] btrfs-progs: image: Introduce -d option to dump data
  2019-07-02 10:07 [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes WenRuo Qu
                   ` (4 preceding siblings ...)
  2019-07-02 10:07 ` [PATCH v2 05/14] btrfs-progs: image: Introduce framework for more dump versions WenRuo Qu
@ 2019-07-02 10:07 ` WenRuo Qu
  2019-07-02 10:07 ` [PATCH v2 07/14] btrfs-progs: image: Allow restore to record system chunk ranges for later usage WenRuo Qu
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: WenRuo Qu @ 2019-07-02 10:07 UTC (permalink / raw)
  To: linux-btrfs; +Cc: WenRuo Qu

This new data dump feature will dump the whole image, not only the
existing tree blocks but also all of its data extents(*).

This feature relies on the new dump format (_DUmP_v1), as it needs a
much larger extent size limit, and older btrfs-image can't handle such a
large item/cluster size.
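
A small sanity check, assuming a little-endian host (the dump header
stores the magic as a little-endian u64), showing how the version 1
constant maps back to the ASCII string "_DUmP_v1":

  #include <assert.h>
  #include <stdint.h>
  #include <string.h>

  int main(void)
  {
          /* The 8 magic bytes, no trailing NUL, as noted in the patch. */
          const unsigned char magic_str[8] = "_DUmP_v1";
          uint64_t magic;

          memcpy(&magic, magic_str, sizeof(magic));
          assert(magic == 0x31765f506d55445fULL);
          return 0;
  }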

Since we're dumping all extents including data extents, there is no need
to set any extra super block flags on the restored image to inform the
kernel.
The kernel should just treat the restored image as an ordinary btrfs.

*: The data extents will be dumped as-is; that is to say, even for a
preallocated extent, its (meaningless) data will be read out and dumped.
This behavior costs extra space in the image, but lets us skip all the
complex checks for partially shared preallocated extents.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c     | 53 +++++++++++++++++++++++++++++++++++++-----------
 image/metadump.h |  2 +-
 2 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/image/main.c b/image/main.c
index 26c72e85f656..04d914d14a66 100644
--- a/image/main.c
+++ b/image/main.c
@@ -49,7 +49,15 @@ const struct dump_version dump_versions[NR_DUMP_VERSIONS] = {
 	{ .version = 0,
 	  .max_pending_size = SZ_256K,
 	  .magic_cpu = 0xbd5c25e27295668bULL,
-	  .extra_sb_flags = 1 }
+	  .extra_sb_flags = 1 },
+	/*
+	 * The newer format, with much larger item size to contain
+	 * any data extent.
+	 */
+	{ .version = 1,
+	  .max_pending_size = SZ_256M,
+	  .magic_cpu = 0x31765f506d55445fULL, /* ascii _DUmP_v1, no null */
+	  .extra_sb_flags = 0 },
 };
 
 const struct dump_version *current_version = &dump_versions[0];
@@ -444,10 +452,14 @@ static void metadump_destroy(struct metadump_struct *md, int num_threads)
 
 static int metadump_init(struct metadump_struct *md, struct btrfs_root *root,
 			 FILE *out, int num_threads, int compress_level,
-			 enum sanitize_mode sanitize_names)
+			 bool dump_data, enum sanitize_mode sanitize_names)
 {
 	int i, ret = 0;
 
+	/* We need larger item/cluster limit for data extents */
+	if (dump_data)
+		current_version = &dump_versions[1];
+
 	memset(md, 0, sizeof(*md));
 	INIT_LIST_HEAD(&md->list);
 	INIT_LIST_HEAD(&md->ordered);
@@ -875,7 +887,7 @@ static int copy_space_cache(struct btrfs_root *root,
 }
 
 static int copy_from_extent_tree(struct metadump_struct *metadump,
-				 struct btrfs_path *path)
+				 struct btrfs_path *path, bool dump_data)
 {
 	struct btrfs_root *extent_root;
 	struct extent_buffer *leaf;
@@ -940,9 +952,15 @@ static int copy_from_extent_tree(struct metadump_struct *metadump,
 			ei = btrfs_item_ptr(leaf, path->slots[0],
 					    struct btrfs_extent_item);
 			if (btrfs_extent_flags(leaf, ei) &
-			    BTRFS_EXTENT_FLAG_TREE_BLOCK) {
+			    BTRFS_EXTENT_FLAG_TREE_BLOCK ||
+			    btrfs_extent_flags(leaf, ei) &
+			    BTRFS_EXTENT_FLAG_DATA) {
+				bool is_data;
+
+				is_data = btrfs_extent_flags(leaf, ei) &
+					  BTRFS_EXTENT_FLAG_DATA;
 				ret = add_extent(bytenr, num_bytes, metadump,
-						 0);
+						 is_data);
 				if (ret) {
 					error("unable to add block %llu: %d",
 						(unsigned long long)bytenr, ret);
@@ -965,7 +983,7 @@ static int copy_from_extent_tree(struct metadump_struct *metadump,
 
 static int create_metadump(const char *input, FILE *out, int num_threads,
 			   int compress_level, enum sanitize_mode sanitize,
-			   int walk_trees)
+			   int walk_trees, bool dump_data)
 {
 	struct btrfs_root *root;
 	struct btrfs_path path;
@@ -980,7 +998,7 @@ static int create_metadump(const char *input, FILE *out, int num_threads,
 	}
 
 	ret = metadump_init(&metadump, root, out, num_threads,
-			    compress_level, sanitize);
+			    compress_level, dump_data, sanitize);
 	if (ret) {
 		error("failed to initialize metadump: %d", ret);
 		close_ctree(root);
@@ -1012,7 +1030,7 @@ static int create_metadump(const char *input, FILE *out, int num_threads,
 			goto out;
 		}
 	} else {
-		ret = copy_from_extent_tree(&metadump, &path);
+		ret = copy_from_extent_tree(&metadump, &path, dump_data);
 		if (ret) {
 			err = ret;
 			goto out;
@@ -2637,6 +2655,7 @@ static void print_usage(int ret)
 	printf("\t-s      \tsanitize file names, use once to just use garbage, use twice if you want crc collisions\n");
 	printf("\t-w      \twalk all trees instead of using extent tree, do this if your extent tree is broken\n");
 	printf("\t-m	   \trestore for multiple devices\n");
+	printf("\t-d	   \talso dump data, conflicts with -w\n");
 	printf("\n");
 	printf("\tIn the dump mode, source is the btrfs device and target is the output file (use '-' for stdout).\n");
 	printf("\tIn the restore mode, source is the dumped image and target is the btrfs device/file.\n");
@@ -2656,6 +2675,7 @@ int main(int argc, char *argv[])
 	int ret;
 	enum sanitize_mode sanitize = SANITIZE_NONE;
 	int dev_cnt = 0;
+	bool dump_data = false;
 	int usage_error = 0;
 	FILE *out;
 
@@ -2664,7 +2684,7 @@ int main(int argc, char *argv[])
 			{ "help", no_argument, NULL, GETOPT_VAL_HELP},
 			{ NULL, 0, NULL, 0 }
 		};
-		int c = getopt_long(argc, argv, "rc:t:oswm", long_options, NULL);
+		int c = getopt_long(argc, argv, "rc:t:oswmd", long_options, NULL);
 		if (c < 0)
 			break;
 		switch (c) {
@@ -2704,6 +2724,9 @@ int main(int argc, char *argv[])
 			create = 0;
 			multi_devices = 1;
 			break;
+		case 'd':
+			dump_data = true;
+			break;
 		case GETOPT_VAL_HELP:
 		default:
 			print_usage(c != GETOPT_VAL_HELP);
@@ -2722,10 +2745,15 @@ int main(int argc, char *argv[])
 			"create and restore cannot be used at the same time");
 			usage_error++;
 		}
+		if (dump_data && walk_trees) {
+			error("-d conflicts with -f option");
+			usage_error++;
+		}
 	} else {
-		if (walk_trees || sanitize != SANITIZE_NONE || compress_level) {
+		if (walk_trees || sanitize != SANITIZE_NONE || compress_level ||
+		    dump_data) {
 			error(
-			"using -w, -s, -c options for restore makes no sense");
+		"using -w, -s, -c, -d options for restore makes no sense");
 			usage_error++;
 		}
 		if (multi_devices && dev_cnt < 2) {
@@ -2778,7 +2806,8 @@ int main(int argc, char *argv[])
 		}
 
 		ret = create_metadump(source, out, num_threads,
-				      compress_level, sanitize, walk_trees);
+				      compress_level, sanitize, walk_trees,
+				      dump_data);
 	} else {
 		ret = restore_metadump(source, out, old_restore, num_threads,
 				       0, target, multi_devices);
diff --git a/image/metadump.h b/image/metadump.h
index 941d4b827a24..a04f63a910d6 100644
--- a/image/metadump.h
+++ b/image/metadump.h
@@ -38,7 +38,7 @@ struct dump_version {
 	unsigned int extra_sb_flags:1;
 };
 
-#define NR_DUMP_VERSIONS	1
+#define NR_DUMP_VERSIONS	2
 extern const struct dump_version dump_versions[NR_DUMP_VERSIONS];
 const extern struct dump_version *current_version;
 
-- 
2.22.0



* [PATCH v2 07/14] btrfs-progs: image: Allow restore to record system chunk ranges for later usage
  2019-07-02 10:07 [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes WenRuo Qu
                   ` (5 preceding siblings ...)
  2019-07-02 10:07 ` [PATCH v2 06/14] btrfs-progs: image: Introduce -d option to dump data WenRuo Qu
@ 2019-07-02 10:07 ` WenRuo Qu
  2019-07-02 10:07 ` [PATCH v2 08/14] btrfs-progs: image: Introduce helper to determine if a tree block is in the range of system chunks WenRuo Qu
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: WenRuo Qu @ 2019-07-02 10:07 UTC (permalink / raw)
  To: linux-btrfs; +Cc: WenRuo Qu

Currently we are doing a pretty slow search for system chunks before
restoring real data.
The current behavior is to search all clusters for the chunk tree root
first, then search all clusters again and again for every chunk tree
block.

This causes recursive calls and a pretty slow start-up. The only good
news is that since the chunk tree is normally small, we don't need to
iterate too many times, so overall it's acceptable.

To address this bad behavior, we can make use of the system chunk array
in the super block.
By recording all system chunk ranges, we can easily determine whether an
extent belongs to the chunk tree, and thus do a single linear search for
chunk tree leaves.

This patch only introduces the code base for later patches.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)

diff --git a/image/main.c b/image/main.c
index 04d914d14a66..29587c0171b8 100644
--- a/image/main.c
+++ b/image/main.c
@@ -35,6 +35,7 @@
 #include "utils.h"
 #include "volumes.h"
 #include "extent_io.h"
+#include "extent-cache.h"
 #include "help.h"
 #include "image/metadump.h"
 #include "image/sanitize.h"
@@ -112,6 +113,11 @@ struct mdrestore_struct {
 	pthread_mutex_t mutex;
 	pthread_cond_t cond;
 
+	/*
+	 * Records system chunk ranges, so restore can use this to determine
+	 * if an item is in chunk tree range.
+	 */
+	struct cache_tree sys_chunks;
 	struct rb_root chunk_tree;
 	struct rb_root physical_tree;
 	struct list_head list;
@@ -121,6 +127,8 @@ struct mdrestore_struct {
 	u64 devid;
 	u64 alloced_chunks;
 	u64 last_physical_offset;
+	/* An quicker checker for if a item is in sys chunk range */
+	u64 sys_chunk_end;
 	u8 uuid[BTRFS_UUID_SIZE];
 	u8 fsid[BTRFS_FSID_SIZE];
 
@@ -1487,6 +1495,7 @@ static void mdrestore_destroy(struct mdrestore_struct *mdres, int num_threads)
 		rb_erase(&entry->p, &mdres->physical_tree);
 		free(entry);
 	}
+	free_extent_cache_tree(&mdres->sys_chunks);
 	pthread_mutex_lock(&mdres->mutex);
 	mdres->done = 1;
 	pthread_cond_broadcast(&mdres->cond);
@@ -1550,6 +1559,7 @@ static int mdrestore_init(struct mdrestore_struct *mdres,
 	pthread_mutex_init(&mdres->mutex, NULL);
 	INIT_LIST_HEAD(&mdres->list);
 	INIT_LIST_HEAD(&mdres->overlapping_chunks);
+	cache_tree_init(&mdres->sys_chunks);
 	mdres->in = in;
 	mdres->out = out;
 	mdres->old_restore = old_restore;
@@ -1968,6 +1978,92 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
 	return ret;
 }
 
+/*
+ * Add system chunks in super blocks into mdres->sys_chunks, so later
+ * we can determine if an item is a chunk tree block.
+ */
+static int add_sys_array(struct mdrestore_struct *mdres,
+			 struct btrfs_super_block *sb)
+{
+	struct btrfs_disk_key *disk_key;
+	struct btrfs_key key;
+	struct btrfs_chunk *chunk;
+	struct cache_extent *cache;
+	u32 cur_offset;
+	u32 len = 0;
+	u32 array_size;
+	u8 *array_ptr;
+	int ret;
+
+	array_size = btrfs_super_sys_array_size(sb);
+	array_ptr = sb->sys_chunk_array;
+	cur_offset = 0;
+
+	while (cur_offset < array_size) {
+		u32 num_stripes;
+
+		disk_key = (struct btrfs_disk_key *)array_ptr;
+		len = sizeof(*disk_key);
+		if (cur_offset + len > array_size)
+			goto out_short_read;
+		btrfs_disk_key_to_cpu(&key, disk_key);
+
+		array_ptr += len;
+		cur_offset += len;
+
+		if (key.type == BTRFS_CHUNK_ITEM_KEY) {
+			chunk = (struct btrfs_chunk *)array_ptr;
+
+			/*
+			 * At least one btrfs_chunk with one stripe must be
+			 * present, exact stripe count check comes afterwards
+			 */
+			len = btrfs_chunk_item_size(1);
+			if (cur_offset + len > array_size)
+				goto out_short_read;
+			num_stripes = btrfs_stack_chunk_num_stripes(chunk);
+			if (!num_stripes) {
+				printk(
+	    "ERROR: invalid number of stripes %u in sys_array at offset %u\n",
+					num_stripes, cur_offset);
+				ret = -EIO;
+				break;
+			}
+			len = btrfs_chunk_item_size(num_stripes);
+			if (cur_offset + len > array_size)
+				goto out_short_read;
+			if (btrfs_stack_chunk_type(chunk) &
+			    BTRFS_BLOCK_GROUP_SYSTEM) {
+				ret = add_merge_cache_extent(&mdres->sys_chunks,
+					key.offset,
+					btrfs_stack_chunk_length(chunk));
+				if (ret < 0)
+					break;
+			}
+		} else {
+			error("unexpected item type %u in sys_array offset %u",
+			      key.type, cur_offset);
+			ret = -EUCLEAN;
+			break;
+		}
+		array_ptr += len;
+		cur_offset += len;
+	}
+
+	/* Get the last system chunk end as a quicker check */
+	cache = last_cache_extent(&mdres->sys_chunks);
+	if (!cache) {
+		error("no system chunk found in super block");
+		return -EUCLEAN;
+	}
+	mdres->sys_chunk_end = cache->start + cache->size - 1;
+	return ret;
+out_short_read:
+	error("sys_array too short to read %u bytes at offset %u\n",
+		len, cur_offset);
+	return -EUCLEAN;
+}
+
 static int build_chunk_tree(struct mdrestore_struct *mdres,
 			    struct meta_cluster *cluster)
 {
@@ -2060,6 +2156,13 @@ static int build_chunk_tree(struct mdrestore_struct *mdres,
 		error("invalid superblock");
 		return ret;
 	}
+	ret = add_sys_array(mdres, super);
+	if (ret < 0) {
+		error("failed to read system chunk array");
+		free(buffer);
+		pthread_mutex_unlock(&mdres->mutex);
+		return ret;
+	}
 	chunk_root_bytenr = btrfs_super_chunk_root(super);
 	mdres->nodesize = btrfs_super_nodesize(super);
 	if (btrfs_super_incompat_flags(super) &
-- 
2.22.0



* [PATCH v2 08/14] btrfs-progs: image: Introduce helper to determine if a tree block is in the range of system chunks
  2019-07-02 10:07 [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes WenRuo Qu
                   ` (6 preceding siblings ...)
  2019-07-02 10:07 ` [PATCH v2 07/14] btrfs-progs: image: Allow restore to record system chunk ranges for later usage WenRuo Qu
@ 2019-07-02 10:07 ` WenRuo Qu
  2019-07-02 10:07 ` [PATCH v2 09/14] btrfs-progs: image: Rework how we search chunk tree blocks WenRuo Qu
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: WenRuo Qu @ 2019-07-02 10:07 UTC (permalink / raw)
  To: linux-btrfs; +Cc: WenRuo Qu

Introduce a new helper function, is_in_sys_chunks(), to determine if an
item is in the range of system chunks.

Since btrfs-image will merge adjacent extents of the same type into one
item, this function is designed to return true if any byte of the given
range falls inside a system chunk.
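
For illustration, a minimal sketch of the overlap test, using a plain
array instead of the cache_tree rb-tree the real helper walks (names
here are illustrative):

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  struct range {
          uint64_t start;
          uint64_t len;
  };

  /* Return true if [start, start + len) touches any recorded sys chunk. */
  bool in_sys_chunks(const struct range *chunks, size_t nr,
                     uint64_t start, uint64_t len)
  {
          size_t i;

          for (i = 0; i < nr; i++) {
                  if (start < chunks[i].start + chunks[i].len &&
                      chunks[i].start < start + len)
                          return true;
          }
          return false;
  }

The real implementation additionally keeps the end of the last system
chunk (sys_chunk_end) as a quick reject before walking the tree.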

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/image/main.c b/image/main.c
index 29587c0171b8..3493ebc4589e 100644
--- a/image/main.c
+++ b/image/main.c
@@ -1723,6 +1723,54 @@ static int wait_for_worker(struct mdrestore_struct *mdres)
 	return ret;
 }
 
+/*
+ * Check if a range [start ,start + len] has ANY bytes covered by
+ * system chunks ranges.
+ */
+static bool is_in_sys_chunks(struct mdrestore_struct *mdres, u64 start,
+			     u64 len)
+{
+	struct rb_node *node = mdres->sys_chunks.root.rb_node;
+	struct cache_extent *entry;
+	struct cache_extent *next;
+	struct cache_extent *prev;
+
+	if (start > mdres->sys_chunk_end)
+		return false;
+
+	while (node) {
+		entry = rb_entry(node, struct cache_extent, rb_node);
+		if (start > entry->start) {
+			if (!node->rb_right)
+				break;
+			node = node->rb_right;
+		} else if (start < entry->start) {
+			if (!node->rb_left)
+				break;
+			node = node->rb_left;
+		} else {
+			/* already in a system chunk */
+			return true;
+		}
+	}
+	if (!node)
+		return false;
+	entry = rb_entry(node, struct cache_extent, rb_node);
+	/* Now we have entry which is the nearst chunk around @start */
+	if (start > entry->start) {
+		prev = entry;
+		next = next_cache_extent(entry);
+	} else {
+		prev = prev_cache_extent(entry);
+		next = entry;
+	}
+	if (prev && prev->start + prev->size > start)
+		return true;
+	if (next && start + len > next->start)
+		return true;
+	return false;
+}
+
 static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer,
 			    u64 bytenr, u64 item_bytenr, u32 bufsize,
 			    u64 cluster_bytenr)
-- 
2.22.0



* [PATCH v2 09/14] btrfs-progs: image: Rework how we search chunk tree blocks
  2019-07-02 10:07 [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes WenRuo Qu
                   ` (7 preceding siblings ...)
  2019-07-02 10:07 ` [PATCH v2 08/14] btrfs-progs: image: Introduce helper to determine if a tree block is in the range of system chunks WenRuo Qu
@ 2019-07-02 10:07 ` WenRuo Qu
  2019-07-02 10:07 ` [PATCH v2 10/14] btrfs-progs: image: Reduce memory requirement for decompression WenRuo Qu
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: WenRuo Qu @ 2019-07-02 10:07 UTC (permalink / raw)
  To: linux-btrfs; +Cc: WenRuo Qu

Before this patch, we were using a very inefficient way to search
chunks:

We iterate through all clusters to find the chunk tree root block first,
then re-iterate all clusters again to find every child tree block.

Every time, we need to iterate all clusters just to find one chunk tree
block.
This is obviously inefficient, especially when the chunk tree gets
larger.
So the original author left a comment on it:
  /* If you have to ask you aren't worthy */
  static int search_for_chunk_blocks()

This patch will change the behavior so that we will only iterate all
clusters once.

The idea behind the optimization is that, since the superblock is
restored first, we can use the CHUNK_ITEMs in
super_block::sys_chunk_array to build a SYSTEM chunk mapping.

Then, when we start to iterate through all items, we can easily skip
unrelated items at two different levels:
- At the cluster level
  If a cluster starts beyond the last system chunk map, it can't contain
  any chunk tree blocks (as chunk tree blocks only live inside system
  chunks).

- At the item level
  If an item has no intersection with any system chunk map, then it
  can't contain any chunk tree blocks.

With this, we can iterate through all clusters just once and find all
the CHUNK_ITEMs.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c | 214 +++++++++++++++++++++++++++------------------------
 1 file changed, 115 insertions(+), 99 deletions(-)

diff --git a/image/main.c b/image/main.c
index 3493ebc4589e..fc5806e4e4bc 100644
--- a/image/main.c
+++ b/image/main.c
@@ -142,8 +142,6 @@ struct mdrestore_struct {
 	struct btrfs_fs_info *info;
 };
 
-static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
-				   u64 search, u64 cluster_bytenr);
 static struct extent_buffer *alloc_dummy_eb(u64 bytenr, u32 size);
 
 static void csum_block(u8 *buf, size_t len)
@@ -1771,67 +1769,17 @@ static bool is_in_sys_chunks(struct mdrestore_struct *mdres, u64 start,
 	return false;
 }
 
-static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer,
-			    u64 bytenr, u64 item_bytenr, u32 bufsize,
-			    u64 cluster_bytenr)
+static int read_chunk_tree_block(struct mdrestore_struct *mdres,
+				 struct extent_buffer *eb)
 {
-	struct extent_buffer *eb;
-	int ret = 0;
 	int i;
 
-	eb = alloc_dummy_eb(bytenr, mdres->nodesize);
-	if (!eb) {
-		ret = -ENOMEM;
-		goto out;
-	}
-
-	while (item_bytenr != bytenr) {
-		buffer += mdres->nodesize;
-		item_bytenr += mdres->nodesize;
-	}
-
-	memcpy(eb->data, buffer, mdres->nodesize);
-	if (btrfs_header_bytenr(eb) != bytenr) {
-		error("eb bytenr does not match found bytenr: %llu != %llu",
-				(unsigned long long)btrfs_header_bytenr(eb),
-				(unsigned long long)bytenr);
-		ret = -EIO;
-		goto out;
-	}
-
-	if (memcmp(mdres->fsid, eb->data + offsetof(struct btrfs_header, fsid),
-		   BTRFS_FSID_SIZE)) {
-		error("filesystem metadata UUID of eb %llu does not match",
-				(unsigned long long)bytenr);
-		ret = -EIO;
-		goto out;
-	}
-
-	if (btrfs_header_owner(eb) != BTRFS_CHUNK_TREE_OBJECTID) {
-		error("wrong eb %llu owner %llu",
-				(unsigned long long)bytenr,
-				(unsigned long long)btrfs_header_owner(eb));
-		ret = -EIO;
-		goto out;
-	}
-
 	for (i = 0; i < btrfs_header_nritems(eb); i++) {
 		struct btrfs_chunk *chunk;
 		struct fs_chunk *fs_chunk;
 		struct btrfs_key key;
 		u64 type;
 
-		if (btrfs_header_level(eb)) {
-			u64 blockptr = btrfs_node_blockptr(eb, i);
-
-			ret = search_for_chunk_blocks(mdres, blockptr,
-						      cluster_bytenr);
-			if (ret)
-				break;
-			continue;
-		}
-
-		/* Yay a leaf!  We loves leafs! */
 		btrfs_item_key_to_cpu(eb, &key, i);
 		if (key.type != BTRFS_CHUNK_ITEM_KEY)
 			continue;
@@ -1839,8 +1787,7 @@ static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer,
 		fs_chunk = malloc(sizeof(struct fs_chunk));
 		if (!fs_chunk) {
 			error("not enough memory to allocate chunk");
-			ret = -ENOMEM;
-			break;
+			return -ENOMEM;
 		}
 		memset(fs_chunk, 0, sizeof(*fs_chunk));
 		chunk = btrfs_item_ptr(eb, i, struct btrfs_chunk);
@@ -1849,19 +1796,18 @@ static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer,
 		fs_chunk->physical = btrfs_stripe_offset_nr(eb, chunk, 0);
 		fs_chunk->bytes = btrfs_chunk_length(eb, chunk);
 		INIT_LIST_HEAD(&fs_chunk->list);
+
 		if (tree_search(&mdres->physical_tree, &fs_chunk->p,
 				physical_cmp, 1) != NULL)
 			list_add(&fs_chunk->list, &mdres->overlapping_chunks);
 		else
 			tree_insert(&mdres->physical_tree, &fs_chunk->p,
 				    physical_cmp);
-
 		type = btrfs_chunk_type(eb, chunk);
 		if (type & BTRFS_BLOCK_GROUP_DUP) {
 			fs_chunk->physical_dup =
 					btrfs_stripe_offset_nr(eb, chunk, 1);
 		}
-
 		if (fs_chunk->physical_dup + fs_chunk->bytes >
 		    mdres->last_physical_offset)
 			mdres->last_physical_offset = fs_chunk->physical_dup +
@@ -1876,19 +1822,80 @@ static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer,
 			mdres->alloced_chunks += fs_chunk->bytes;
 		tree_insert(&mdres->chunk_tree, &fs_chunk->l, chunk_cmp);
 	}
-out:
+	return 0;
+}
+
+static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer,
+			    u64 item_bytenr, u32 bufsize,
+			    u64 cluster_bytenr)
+{
+	struct extent_buffer *eb;
+	u32 nodesize = mdres->nodesize;
+	u64 bytenr;
+	size_t cur_offset;
+	int ret = 0;
+
+	eb = alloc_dummy_eb(0, mdres->nodesize);
+	if (!eb)
+		return -ENOMEM;
+
+	for (cur_offset = 0; cur_offset < bufsize; cur_offset += nodesize) {
+		bytenr = item_bytenr + cur_offset;
+		if (!is_in_sys_chunks(mdres, bytenr, nodesize))
+			continue;
+		memcpy(eb->data, buffer + cur_offset, nodesize);
+		if (btrfs_header_bytenr(eb) != bytenr) {
+			error(
+			"eb bytenr does not match found bytenr: %llu != %llu",
+				(unsigned long long)btrfs_header_bytenr(eb),
+				(unsigned long long)bytenr);
+			ret = -EUCLEAN;
+			break;
+		}
+		if (memcmp(mdres->fsid, eb->data +
+			   offsetof(struct btrfs_header, fsid),
+		    BTRFS_FSID_SIZE)) {
+			error(
+			"filesystem metadata UUID of eb %llu does not match",
+				bytenr);
+			ret = -EUCLEAN;
+			break;
+		}
+		if (btrfs_header_owner(eb) != BTRFS_CHUNK_TREE_OBJECTID) {
+			error("wrong eb %llu owner %llu",
+				(unsigned long long)bytenr,
+				(unsigned long long)btrfs_header_owner(eb));
+			ret = -EUCLEAN;
+			break;
+		}
+		/*
+		 * No need to search node, as we will iterate all tree blocks
+		 * in chunk tree, only need to bother leaves.
+		 */
+		if (btrfs_header_level(eb))
+			continue;
+		ret = read_chunk_tree_block(mdres, eb);
+		if (ret < 0)
+			break;
+	}
 	free(eb);
 	return ret;
 }
 
-/* If you have to ask you aren't worthy */
-static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
-				   u64 search, u64 cluster_bytenr)
+/*
+ * This function will try to find all chunk items in the dump image.
+ *
+ * This function will iterate all clusters, and find any item inside
+ * system chunk ranges.
+ * For such item, it will try to read them as tree blocks, and find
+ * CHUNK_ITEMs, add them to @mdres.
+ */
+static int search_for_chunk_blocks(struct mdrestore_struct *mdres)
 {
 	struct meta_cluster *cluster;
 	struct meta_cluster_header *header;
 	struct meta_cluster_item *item;
-	u64 current_cluster = cluster_bytenr, bytenr;
+	u64 current_cluster = 0, bytenr;
 	u64 item_bytenr;
 	u32 bufsize, nritems, i;
 	u32 max_size = current_version->max_pending_size * 2;
@@ -1919,30 +1926,27 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
 	}
 
 	bytenr = current_cluster;
+	/* Main loop, iterating all clusters */
 	while (1) {
 		if (fseek(mdres->in, current_cluster, SEEK_SET)) {
 			error("seek failed: %m");
 			ret = -EIO;
-			break;
+			goto out;
 		}
 
 		ret = fread(cluster, BLOCK_SIZE, 1, mdres->in);
 		if (ret == 0) {
-			if (cluster_bytenr != 0) {
-				cluster_bytenr = 0;
-				current_cluster = 0;
-				bytenr = 0;
-				continue;
-			}
+			if (feof(mdres->in))
+				goto out;
 			error(
 	"unknown state after reading cluster at %llu, probably corrupted data",
-					cluster_bytenr);
+					current_cluster);
 			ret = -EIO;
-			break;
+			goto out;
 		} else if (ret < 0) {
 			error("unable to read image at %llu: %m",
-					(unsigned long long)cluster_bytenr);
-			break;
+					current_cluster);
+			goto out;
 		}
 		ret = 0;
 
@@ -1951,11 +1955,17 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
 		    le64_to_cpu(header->bytenr) != current_cluster) {
 			error("bad header in metadump image");
 			ret = -EIO;
-			break;
+			goto out;
 		}
 
+		/* We're already over the system chunk end, no need to search*/
+		if (current_cluster > mdres->sys_chunk_end)
+			goto out;
+
 		bytenr += BLOCK_SIZE;
 		nritems = le32_to_cpu(header->nritems);
+
+		/* Search items for tree blocks in sys chunks */
 		for (i = 0; i < nritems; i++) {
 			size_t size;
 
@@ -1963,11 +1973,23 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
 			bufsize = le32_to_cpu(item->size);
 			item_bytenr = le64_to_cpu(item->bytenr);
 
-			if (bufsize > max_size) {
-				error("item %u too big: %u > %u", i, bufsize,
-						max_size);
-				ret = -EIO;
-				break;
+			/*
+			 * Only data extent/free space cache can be that big,
+			 * adjacent tree blocks won't be able to be merged
+			 * beyond max_size.
+			 * Also, we can skip super block.
+			 */
+			if (bufsize > max_size ||
+			    !is_in_sys_chunks(mdres, item_bytenr, bufsize) ||
+			    item_bytenr == BTRFS_SUPER_INFO_OFFSET) {
+				ret = fseek(mdres->in, bufsize, SEEK_CUR);
+				if (ret < 0) {
+					error("failed to seek: %m");
+					ret = -errno;
+					goto out;
+				}
+				bytenr += bufsize;
+				continue;
 			}
 
 			if (mdres->compress_method == COMPRESS_ZLIB) {
@@ -1975,7 +1997,7 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
 				if (ret != 1) {
 					error("read error: %m");
 					ret = -EIO;
-					break;
+					goto out;
 				}
 
 				size = max_size;
@@ -1986,40 +2008,36 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres,
 					error("decompression failed with %d",
 							ret);
 					ret = -EIO;
-					break;
+					goto out;
 				}
 			} else {
 				ret = fread(buffer, bufsize, 1, mdres->in);
 				if (ret != 1) {
 					error("read error: %m");
 					ret = -EIO;
-					break;
+					goto out;
 				}
 				size = bufsize;
 			}
 			ret = 0;
 
-			if (item_bytenr <= search &&
-			    item_bytenr + size > search) {
-				ret = read_chunk_block(mdres, buffer, search,
-						       item_bytenr, size,
-						       current_cluster);
-				if (!ret)
-					ret = 1;
-				break;
+			ret = read_chunk_block(mdres, buffer,
+					       item_bytenr, size,
+					       current_cluster);
+			if (ret < 0) {
+				error(
+	"failed to search tree blocks in item bytenr %llu size %lu",
+					item_bytenr, size);
+				goto out;
 			}
 			bytenr += bufsize;
 		}
-		if (ret) {
-			if (ret > 0)
-				ret = 0;
-			break;
-		}
 		if (bytenr & BLOCK_MASK)
 			bytenr += BLOCK_SIZE - (bytenr & BLOCK_MASK);
 		current_cluster = bytenr;
 	}
 
+out:
 	free(tmp);
 	free(buffer);
 	free(cluster);
@@ -2118,7 +2136,6 @@ static int build_chunk_tree(struct mdrestore_struct *mdres,
 	struct btrfs_super_block *super;
 	struct meta_cluster_header *header;
 	struct meta_cluster_item *item = NULL;
-	u64 chunk_root_bytenr = 0;
 	u32 i, nritems;
 	u64 bytenr = 0;
 	u8 *buffer;
@@ -2211,7 +2228,6 @@ static int build_chunk_tree(struct mdrestore_struct *mdres,
 		pthread_mutex_unlock(&mdres->mutex);
 		return ret;
 	}
-	chunk_root_bytenr = btrfs_super_chunk_root(super);
 	mdres->nodesize = btrfs_super_nodesize(super);
 	if (btrfs_super_incompat_flags(super) &
 	    BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
@@ -2224,7 +2240,7 @@ static int build_chunk_tree(struct mdrestore_struct *mdres,
 	free(buffer);
 	pthread_mutex_unlock(&mdres->mutex);
 
-	return search_for_chunk_blocks(mdres, chunk_root_bytenr, 0);
+	return search_for_chunk_blocks(mdres);
 }
 
 static int range_contains_super(u64 physical, u64 bytes)
-- 
2.22.0



* [PATCH v2 10/14] btrfs-progs: image: Reduce memory requirement for decompression
  2019-07-02 10:07 [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes WenRuo Qu
                   ` (8 preceding siblings ...)
  2019-07-02 10:07 ` [PATCH v2 09/14] btrfs-progs: image: Rework how we search chunk tree blocks WenRuo Qu
@ 2019-07-02 10:07 ` WenRuo Qu
  2019-07-02 10:07 ` [PATCH v2 11/14] btrfs-progs: image: Don't waste memory when we're just extracting super block WenRuo Qu
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: WenRuo Qu @ 2019-07-02 10:07 UTC (permalink / raw)
  To: linux-btrfs; +Cc: WenRuo Qu

With the recent change enlarging max_pending_size to 256M for data dump,
the decompression code requires quite a lot of memory (256M * 4).

The main reason is that we're using the one-shot uncompress() wrapper,
which needs the output buffer to be large enough to hold all of the
decompressed data.

This patch reworks the decompression to use inflate(), which can resume
decompression, so that we can use a much smaller buffer.

This patch chooses a 512K buffer size.

Now the memory consumption for restore is reduced to
 Cluster data size + 512K * nr_running_threads

instead of the original
 Cluster data size + 1G * nr_running_threads
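
For illustration, a minimal sketch of streaming decompression with a
fixed-size output window, which is the idea the patch applies to
restore_worker() (error handling is reduced to the bare minimum and
consume() is a placeholder for whatever the caller does with each
decompressed piece):

  #include <stdlib.h>
  #include <string.h>
  #include <zlib.h>

  #define OUT_BUF_SIZE  (512 * 1024)

  int stream_inflate(const unsigned char *in, size_t in_len,
                     int (*consume)(const unsigned char *buf, size_t len))
  {
          unsigned char *out = malloc(OUT_BUF_SIZE);
          z_stream strm;
          int ret;

          if (!out)
                  return Z_MEM_ERROR;
          memset(&strm, 0, sizeof(strm));
          strm.next_in = (unsigned char *)in;
          strm.avail_in = (uInt)in_len;
          ret = inflateInit(&strm);
          if (ret != Z_OK) {
                  free(out);
                  return ret;
          }

          do {
                  /* Reuse the same fixed 512K window for every piece. */
                  strm.next_out = out;
                  strm.avail_out = OUT_BUF_SIZE;

                  ret = inflate(&strm, Z_NO_FLUSH);
                  if (ret != Z_OK && ret != Z_STREAM_END)
                          break;
                  if (consume(out, OUT_BUF_SIZE - strm.avail_out)) {
                          ret = Z_ERRNO;
                          break;
                  }
          } while (ret != Z_STREAM_END);

          inflateEnd(&strm);
          free(out);
          return ret == Z_STREAM_END ? Z_OK : ret;
  }

In contrast, the one-shot uncompress() call has to be given an output
buffer large enough for the whole decompressed item up front, which is
what forced the 4 * max_pending_size allocation.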

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c | 220 +++++++++++++++++++++++++++++++++------------------
 1 file changed, 145 insertions(+), 75 deletions(-)

diff --git a/image/main.c b/image/main.c
index fc5806e4e4bc..d6e21ed68b87 100644
--- a/image/main.c
+++ b/image/main.c
@@ -1347,128 +1347,198 @@ static void write_backup_supers(int fd, u8 *buf)
 	}
 }
 
-static void *restore_worker(void *data)
+/*
+ * Restore one item.
+ *
+ * For uncompressed data, it's just reading from work->buf then write to output.
+ * For compressed data, since we can have very large decompressed data
+ * (up to 256M), we need to consider memory usage. So here we will fill buffer
+ * then write the decompressed buffer to output.
+ */
+static int restore_one_work(struct mdrestore_struct *mdres,
+			    struct async_work *async, u8 *buffer, int bufsize)
 {
-	struct mdrestore_struct *mdres = (struct mdrestore_struct *)data;
-	struct async_work *async;
-	size_t size;
-	u8 *buffer;
-	u8 *outbuf;
-	int outfd;
+	z_stream strm;
+	int buf_offset = 0;	/* offset inside work->buffer */
+	int out_offset = 0;	/* offset for output */
+	int out_len;
+	int outfd = fileno(mdres->out);
+	int compress_method = mdres->compress_method;
 	int ret;
-	int compress_size = current_version->max_pending_size * 4;
 
-	outfd = fileno(mdres->out);
-	buffer = malloc(compress_size);
-	if (!buffer) {
-		error("not enough memory for restore worker buffer");
-		pthread_mutex_lock(&mdres->mutex);
-		if (!mdres->error)
-			mdres->error = -ENOMEM;
-		pthread_mutex_unlock(&mdres->mutex);
-		pthread_exit(NULL);
+	ASSERT(is_power_of_2(bufsize));
+
+	if (compress_method == COMPRESS_ZLIB) {
+		strm.zalloc = Z_NULL;
+		strm.zfree = Z_NULL;
+		strm.opaque = Z_NULL;
+		strm.avail_in = async->bufsize;
+		strm.next_in = async->buffer;
+		strm.avail_out = 0;
+		strm.next_out = Z_NULL;
+		ret = inflateInit(&strm);
+		if (ret != Z_OK) {
+			error("failed to initialize decompress parameters: %d",
+				ret);
+			return ret;
+		}
 	}
+	while (buf_offset < async->bufsize) {
+		bool compress_end = false;
+		int read_size = min_t(u64, async->bufsize - buf_offset,
+				      bufsize);
 
-	while (1) {
-		u64 bytenr, physical_dup;
-		off_t offset = 0;
-		int err = 0;
-
-		pthread_mutex_lock(&mdres->mutex);
-		while (!mdres->nodesize || list_empty(&mdres->list)) {
-			if (mdres->done) {
-				pthread_mutex_unlock(&mdres->mutex);
-				goto out;
+		/* Read part */
+		if (compress_method == COMPRESS_ZLIB) {
+			if (strm.avail_out == 0) {
+				strm.avail_out = bufsize;
+				strm.next_out = buffer;
 			}
-			pthread_cond_wait(&mdres->cond, &mdres->mutex);
-		}
-		async = list_entry(mdres->list.next, struct async_work, list);
-		list_del_init(&async->list);
-
-		if (mdres->compress_method == COMPRESS_ZLIB) {
-			size = compress_size;
 			pthread_mutex_unlock(&mdres->mutex);
-			ret = uncompress(buffer, (unsigned long *)&size,
-					 async->buffer, async->bufsize);
+			ret = inflate(&strm, Z_NO_FLUSH);
 			pthread_mutex_lock(&mdres->mutex);
-			if (ret != Z_OK) {
-				error("decompression failed with %d", ret);
-				err = -EIO;
+			switch (ret) {
+			case Z_NEED_DICT:
+				ret = Z_DATA_ERROR; /* fallthrough */
+				__attribute__ ((fallthrough));
+			case Z_DATA_ERROR:
+			case Z_MEM_ERROR:
+				goto out;
+			}
+			if (ret == Z_STREAM_END) {
+				ret = 0;
+				compress_end = true;
 			}
-			outbuf = buffer;
+			out_len = bufsize - strm.avail_out;
 		} else {
-			outbuf = async->buffer;
-			size = async->bufsize;
+			/* No compress, read as many data as possible */
+			memcpy(buffer, async->buffer + buf_offset, read_size);
+
+			buf_offset += read_size;
+			out_len = read_size;
 		}
 
+		/* Fixup part */
 		if (!mdres->multi_devices) {
 			if (async->start == BTRFS_SUPER_INFO_OFFSET) {
 				if (mdres->old_restore) {
-					update_super_old(outbuf);
+					update_super_old(buffer);
 				} else {
-					ret = update_super(mdres, outbuf);
-					if (ret)
-						err = ret;
+					ret = update_super(mdres, buffer);
+					if (ret < 0) 
+						goto out;
 				}
 			} else if (!mdres->old_restore) {
-				ret = fixup_chunk_tree_block(mdres, async, outbuf, size);
+				ret = fixup_chunk_tree_block(mdres, async,
+							     buffer, out_len);
 				if (ret)
-					err = ret;
+					goto out;
 			}
 		}
 
+		/* Write part */
 		if (!mdres->fixup_offset) {
+			int size = out_len;
+			off_t offset = 0;
+
 			while (size) {
+				u64 logical = async->start + out_offset + offset;
 				u64 chunk_size = size;
-				physical_dup = 0;
+				u64 physical_dup = 0;
+				u64 bytenr;
+
 				if (!mdres->multi_devices && !mdres->old_restore)
 					bytenr = logical_to_physical(mdres,
-						     async->start + offset,
-						     &chunk_size,
-						     &physical_dup);
+							logical, &chunk_size,
+							&physical_dup);
 				else
-					bytenr = async->start + offset;
+					bytenr = logical;
 
-				ret = pwrite64(outfd, outbuf+offset, chunk_size,
-					       bytenr);
+				ret = pwrite64(outfd, buffer + offset, chunk_size, bytenr);
 				if (ret != chunk_size)
-					goto error;
+					goto write_error;
 
 				if (physical_dup)
-					ret = pwrite64(outfd, outbuf+offset,
-						       chunk_size,
-						       physical_dup);
+					ret = pwrite64(outfd, buffer + offset,
+						       chunk_size, physical_dup);
 				if (ret != chunk_size)
-					goto error;
+					goto write_error;
 
 				size -= chunk_size;
 				offset += chunk_size;
 				continue;
-
-error:
-				if (ret < 0) {
-					error("unable to write to device: %m");
-					err = errno;
-				} else {
-					error("short write");
-					err = -EIO;
-				}
 			}
 		} else if (async->start != BTRFS_SUPER_INFO_OFFSET) {
-			ret = write_data_to_disk(mdres->info, outbuf, async->start, size, 0);
+			ret = write_data_to_disk(mdres->info, buffer,
+						 async->start, out_len, 0);
 			if (ret) {
 				error("failed to write data");
 				exit(1);
 			}
 		}
 
-
 		/* backup super blocks are already there at fixup_offset stage */
-		if (!mdres->multi_devices && async->start == BTRFS_SUPER_INFO_OFFSET)
-			write_backup_supers(outfd, outbuf);
+		if (async->start == BTRFS_SUPER_INFO_OFFSET &&
+		    !mdres->multi_devices)
+			write_backup_supers(outfd, buffer);
+		out_offset += out_len;
+		if (compress_end) {
+			inflateEnd(&strm);
+			break;
+		}
+	}
+	return ret;
+
+write_error:
+	if (ret < 0) {
+		error("unable to write to device: %m");
+		ret = -errno;
+	} else {
+		error("short write");
+		ret = -EIO;
+	}
+out:
+	if (compress_method == COMPRESS_ZLIB)
+		inflateEnd(&strm);
+	return ret;
+}
+
+static void *restore_worker(void *data)
+{
+	struct mdrestore_struct *mdres = (struct mdrestore_struct *)data;
+	struct async_work *async;
+	u8 *buffer;
+	int ret;
+	int buffer_size = SZ_512K;
+
+	buffer = malloc(buffer_size);
+	if (!buffer) {
+		error("not enough memory for restore worker buffer");
+		pthread_mutex_lock(&mdres->mutex);
+		if (!mdres->error)
+			mdres->error = -ENOMEM;
+		pthread_mutex_unlock(&mdres->mutex);
+		pthread_exit(NULL);
+	}
+
+	while (1) {
+		pthread_mutex_lock(&mdres->mutex);
+		while (!mdres->nodesize || list_empty(&mdres->list)) {
+			if (mdres->done) {
+				pthread_mutex_unlock(&mdres->mutex);
+				goto out;
+			}
+			pthread_cond_wait(&mdres->cond, &mdres->mutex);
+		}
+		async = list_entry(mdres->list.next, struct async_work, list);
+		list_del_init(&async->list);
 
-		if (err && !mdres->error)
-			mdres->error = err;
+		ret = restore_one_work(mdres, async, buffer, buffer_size);
+		if (ret < 0) {
+			mdres->error = ret;
+			pthread_mutex_unlock(&mdres->mutex);
+			goto out;
+		}
 		mdres->num_items--;
 		pthread_mutex_unlock(&mdres->mutex);
 
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 11/14] btrfs-progs: image: Don't waste memory when we're just extracting super block
  2019-07-02 10:07 [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes WenRuo Qu
                   ` (9 preceding siblings ...)
  2019-07-02 10:07 ` [PATCH v2 10/14] btrfs-progs: image: Reduce memory requirement for decompression WenRuo Qu
@ 2019-07-02 10:07 ` WenRuo Qu
  2019-07-02 10:07 ` [PATCH v2 12/14] btrfs-progs: image: Reduce memory usage for chunk tree search WenRuo Qu
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: WenRuo Qu @ 2019-07-02 10:07 UTC (permalink / raw)
  To: linux-btrfs; +Cc: WenRuo Qu

There is no need to allocate 2 * max_pending_size (which can be 256M)
if we're just extracting the super block.

A buffer of BTRFS_SUPER_INFO_SIZE is all we need.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/image/main.c b/image/main.c
index d6e21ed68b87..7ca207de3c3a 100644
--- a/image/main.c
+++ b/image/main.c
@@ -1670,9 +1670,14 @@ static int fill_mdres_info(struct mdrestore_struct *mdres,
 		return 0;
 
 	if (mdres->compress_method == COMPRESS_ZLIB) {
-		size_t size = current_version->max_pending_size * 2;
+		/*
+		 * This item is a superblock, so it should only be 4K.
+		 * No need to waste memory following max_pending_size, which
+		 * can be as large as 256M.
+		 */
+		size_t size = BTRFS_SUPER_INFO_SIZE;
 
-		buffer = malloc(current_version->max_pending_size * 2);
+		buffer = malloc(size);
 		if (!buffer)
 			return -ENOMEM;
 		ret = uncompress(buffer, (unsigned long *)&size,
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 12/14] btrfs-progs: image: Reduce memory usage for chunk tree search
  2019-07-02 10:07 [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes WenRuo Qu
                   ` (10 preceding siblings ...)
  2019-07-02 10:07 ` [PATCH v2 11/14] btrfs-progs: image: Don't waste memory when we're just extracting super block WenRuo Qu
@ 2019-07-02 10:07 ` WenRuo Qu
  2019-07-02 10:07 ` [PATCH v2 13/14] btrfs-progs: image: Output error message for chunk tree build error WenRuo Qu
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: WenRuo Qu @ 2019-07-02 10:07 UTC (permalink / raw)
  To: linux-btrfs; +Cc: WenRuo Qu

Just like the original restore_worker(), search_for_chunk_blocks() can
also use a lot of memory when restoring large uncompressed extents or
compressed extents with data dump.

Reduce the memory usage by:
- Using a fixed buffer size for uncompressed extents
  We now read uncompressed extents through a fixed 512K buffer.

  The chunk tree search reads at most 512K of data at a time, then
  searches for chunk tree blocks inside that buffer.

  This reduces the memory usage from as large as the item size to a
  fixed 512K.

- Using inflate() for compressed extents
  For compressed extents we need two buffers, one for the compressed
  data and one for the uncompressed data.
  For the compressed data we use the item size as the buffer size,
  since a compressed extent should be small enough.
  For the uncompressed data we use a 512K buffer.

  The chunk tree search fills the first 512K, searches for chunk tree
  blocks in that uncompressed 512K buffer, then loops until the
  compressed data is exhausted (see the sketch after this list).

  This reduces the memory usage from as large as 256M * 2 to 512K plus
  the compressed extent size.
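
For reference, the compressed case boils down to the standard zlib
streaming pattern: feed the whole compressed item as input once, then
repeatedly drain a small fixed-size output buffer, handling each
drained chunk before the buffer is reused. Below is a minimal
standalone sketch of that pattern, not the btrfs-progs code itself;
names like stream_decompress() and process_block() are made up for
illustration, and the error handling is simplified.

#include <errno.h>
#include <stdlib.h>
#include <zlib.h>

#define OUT_BUF_SIZE	(512 * 1024)

/* Hypothetical consumer of each decompressed chunk. */
static int process_block(const unsigned char *buf, size_t len)
{
	/* The real code would search @buf for chunk tree blocks. */
	(void)buf;
	(void)len;
	return 0;
}

/* Decompress @in_len bytes from @in through one fixed 512K buffer. */
static int stream_decompress(unsigned char *in, size_t in_len)
{
	unsigned char *out = malloc(OUT_BUF_SIZE);
	z_stream strm = { 0 };
	int ret;

	if (!out)
		return -ENOMEM;
	ret = inflateInit(&strm);
	if (ret != Z_OK) {
		free(out);
		return -EIO;
	}
	strm.next_in = in;
	strm.avail_in = (uInt)in_len;
	do {
		/* Reuse the same 512K output buffer every round. */
		strm.next_out = out;
		strm.avail_out = OUT_BUF_SIZE;
		ret = inflate(&strm, Z_NO_FLUSH);
		if (ret != Z_OK && ret != Z_STREAM_END)
			break;
		if (process_block(out, OUT_BUF_SIZE - strm.avail_out) < 0) {
			ret = Z_DATA_ERROR;
			break;
		}
	} while (ret != Z_STREAM_END);
	inflateEnd(&strm);
	free(out);
	return ret == Z_STREAM_END ? 0 : -EIO;
}

This keeps the peak allocation at 512K plus the compressed item size,
regardless of how large the decompressed data is.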

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c | 159 ++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 126 insertions(+), 33 deletions(-)

diff --git a/image/main.c b/image/main.c
index 7ca207de3c3a..2256138df079 100644
--- a/image/main.c
+++ b/image/main.c
@@ -1957,6 +1957,126 @@ static int read_chunk_block(struct mdrestore_struct *mdres, u8 *buffer,
 	return ret;
 }
 
+static int search_chunk_uncompressed(struct mdrestore_struct *mdres,
+				     struct meta_cluster_item *item,
+				     u64 current_cluster)
+{
+	u32 item_size = le32_to_cpu(item->size);
+	u64 item_bytenr = le64_to_cpu(item->bytenr);
+	int bufsize = SZ_512K;
+	int read_size;
+	u32 offset = 0;
+	u8 *buffer;
+	int ret;
+
+	ASSERT(mdres->compress_method == COMPRESS_NONE);
+	buffer = malloc(bufsize);
+	if (!buffer)
+		return -ENOMEM;
+
+	while (offset < item_size) {
+		read_size = min_t(u32, bufsize, item_size - offset);
+		ret = fread(buffer, read_size, 1, mdres->in);
+		if (ret != 1) {
+			error("read error: %m");
+			ret = -EIO;
+			goto out;
+		}
+		ret = read_chunk_block(mdres, buffer, item_bytenr, read_size,
+					current_cluster);
+		if (ret < 0) {
+			error(
+	"failed to search tree blocks in item bytenr %llu size %u",
+				item_bytenr, item_size);
+			goto out;
+		}
+		offset += read_size;
+	}
+out:
+	free(buffer);
+	return ret;
+}
+
+static int search_chunk_compressed(struct mdrestore_struct *mdres,
+				   struct meta_cluster_item *item,
+				   u64 current_cluster)
+{
+	z_stream strm;
+	u32 item_size = le32_to_cpu(item->size);
+	u64 item_bytenr = le64_to_cpu(item->bytenr);
+	int bufsize = SZ_512K;
+	int read_size;
+	u8 *out_buf = NULL;	/* uncompressed data */
+	u8 *in_buf = NULL;	/* compressed data */
+	bool end = false;
+	int ret;
+
+	ASSERT(mdres->compress_method != COMPRESS_NONE);
+	strm.zalloc = Z_NULL;
+	strm.zfree = Z_NULL;
+	strm.opaque = Z_NULL;
+	strm.avail_in = 0;
+	strm.next_in = Z_NULL;
+	strm.avail_out = 0;
+	strm.next_out = Z_NULL;
+	ret = inflateInit(&strm);
+	if (ret != Z_OK) {
+		error("failed to initialize decompress parameters: %d", ret);
+		return ret;
+	}
+
+	out_buf = malloc(bufsize);
+	in_buf = malloc(item_size);
+	if (!in_buf || !out_buf) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = fread(in_buf, item_size, 1, mdres->in);
+	if (ret != 1) {
+		error("read error: %m");
+		ret = -EIO;
+		goto out;
+	}
+	strm.avail_in = item_size;
+	strm.next_in = in_buf;
+	while (!end) {
+		if (strm.avail_out == 0) {
+			strm.avail_out = bufsize;
+			strm.next_out = out_buf;
+		}
+		ret = inflate(&strm, Z_NO_FLUSH);
+		switch (ret) {
+		case Z_NEED_DICT:
+			ret = Z_DATA_ERROR; /* fallthrough */
+			__attribute__ ((fallthrough));
+		case Z_DATA_ERROR:
+		case Z_MEM_ERROR:
+			goto out;
+		}
+		if (ret == Z_STREAM_END) {
+			ret = 0;
+			end = true;
+		}
+		read_size = bufsize - strm.avail_out;
+
+		ret = read_chunk_block(mdres, out_buf, item_bytenr, read_size,
+					current_cluster);
+		if (ret < 0) {
+			error(
+	"failed to search tree blocks in item bytenr %llu size %u",
+				item_bytenr, item_size);
+			goto out;
+		}
+	}
+
+out:
+	free(in_buf);
+	free(out_buf);
+	inflateEnd(&strm);
+	return ret;
+}
+
 /*
  * This function will try to find all chunk items in the dump image.
  *
@@ -2042,8 +2162,6 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres)
 
 		/* Search items for tree blocks in sys chunks */
 		for (i = 0; i < nritems; i++) {
-			size_t size;
-
 			item = &cluster->items[i];
 			bufsize = le32_to_cpu(item->size);
 			item_bytenr = le64_to_cpu(item->bytenr);
@@ -2068,41 +2186,16 @@ static int search_for_chunk_blocks(struct mdrestore_struct *mdres)
 			}
 
 			if (mdres->compress_method == COMPRESS_ZLIB) {
-				ret = fread(tmp, bufsize, 1, mdres->in);
-				if (ret != 1) {
-					error("read error: %m");
-					ret = -EIO;
-					goto out;
-				}
-
-				size = max_size;
-				ret = uncompress(buffer,
-						 (unsigned long *)&size, tmp,
-						 bufsize);
-				if (ret != Z_OK) {
-					error("decompression failed with %d",
-							ret);
-					ret = -EIO;
-					goto out;
-				}
+				ret = search_chunk_compressed(mdres, item,
+						current_cluster);
 			} else {
-				ret = fread(buffer, bufsize, 1, mdres->in);
-				if (ret != 1) {
-					error("read error: %m");
-					ret = -EIO;
-					goto out;
-				}
-				size = bufsize;
+				ret = search_chunk_uncompressed(mdres, item,
+						current_cluster);
 			}
-			ret = 0;
-
-			ret = read_chunk_block(mdres, buffer,
-					       item_bytenr, size,
-					       current_cluster);
 			if (ret < 0) {
 				error(
-	"failed to search tree blocks in item bytenr %llu size %lu",
-					item_bytenr, size);
+	"failed to search tree blocks in item bytenr %llu size %u",
+					item_bytenr, bufsize);
 				goto out;
 			}
 			bytenr += bufsize;
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 13/14] btrfs-progs: image: Output error message for chunk tree build error
  2019-07-02 10:07 [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes WenRuo Qu
                   ` (11 preceding siblings ...)
  2019-07-02 10:07 ` [PATCH v2 12/14] btrfs-progs: image: Reduce memory usage for chunk tree search WenRuo Qu
@ 2019-07-02 10:07 ` WenRuo Qu
  2019-07-02 10:08 ` [PATCH v2 14/14] btrfs-progs: image: Fix error output to show correct return value WenRuo Qu
  2019-07-04  2:13 ` [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes Anand Jain
  14 siblings, 0 replies; 17+ messages in thread
From: WenRuo Qu @ 2019-07-02 10:07 UTC (permalink / raw)
  To: linux-btrfs; +Cc: WenRuo Qu

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/image/main.c b/image/main.c
index 2256138df079..0be3b45569ed 100644
--- a/image/main.c
+++ b/image/main.c
@@ -2811,8 +2811,10 @@ static int restore_metadump(const char *input, FILE *out, int old_restore,
 
 	if (!multi_devices && !old_restore) {
 		ret = build_chunk_tree(&mdrestore, cluster);
-		if (ret)
+		if (ret) {
+			error("failed to build chunk tree");
 			goto out;
+		}
 		if (!list_empty(&mdrestore.overlapping_chunks))
 			remap_overlapping_chunks(&mdrestore);
 	}
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 14/14] btrfs-progs: image: Fix error output to show correct return value
  2019-07-02 10:07 [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes WenRuo Qu
                   ` (12 preceding siblings ...)
  2019-07-02 10:07 ` [PATCH v2 13/14] btrfs-progs: image: Output error message for chunk tree build error WenRuo Qu
@ 2019-07-02 10:08 ` WenRuo Qu
  2019-07-04  2:13 ` [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes Anand Jain
  14 siblings, 0 replies; 17+ messages in thread
From: WenRuo Qu @ 2019-07-02 10:08 UTC (permalink / raw)
  To: linux-btrfs; +Cc: WenRuo Qu

We can easily get a confusing error message like:
  ERROR: restore failed: Success

This is caused by incorrect "%m" usage: we normally use ret to indicate
the error without populating errno.

Fix it by outputting the return value directly, as we normally have
extra error messages that are more meaningful than the bare return
value.
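
To illustrate, here is a standalone sketch, not btrfs-progs code;
do_restore() is a made-up helper. "%m" expands to strerror(errno), so
when the failure is only carried in a return value and errno was never
set, the message claims "Success":

#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper that fails with an internal code, not errno. */
static int do_restore(void)
{
	return -EINVAL;
}

int main(void)
{
	int ret = do_restore();

	if (ret) {
		/* errno is still 0; on glibc this prints "... failed: Success" */
		fprintf(stderr, "restore failed: %s\n", strerror(errno));
		/* printing the return value itself is unambiguous */
		fprintf(stderr, "restore failed: %d\n", ret);
	}
	return ret ? 1 : 0;
}

The patch switches the top-level error message to the second form.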

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/image/main.c b/image/main.c
index 0be3b45569ed..2bf3cfe395ec 100644
--- a/image/main.c
+++ b/image/main.c
@@ -3150,7 +3150,7 @@ int main(int argc, char *argv[])
 				       0, target, multi_devices);
 	}
 	if (ret) {
-		error("%s failed: %m", (create) ? "create" : "restore");
+		error("%s failed: %d", (create) ? "create" : "restore", ret);
 		goto out;
 	}
 
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes
  2019-07-02 10:07 [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes WenRuo Qu
                   ` (13 preceding siblings ...)
  2019-07-02 10:08 ` [PATCH v2 14/14] btrfs-progs: image: Fix error output to show correct return value WenRuo Qu
@ 2019-07-04  2:13 ` Anand Jain
  2019-07-04  2:54   ` Qu Wenruo
  14 siblings, 1 reply; 17+ messages in thread
From: Anand Jain @ 2019-07-04  2:13 UTC (permalink / raw)
  To: WenRuo Qu; +Cc: linux-btrfs

On 2/7/19 6:07 PM, WenRuo Qu wrote:
> This patchset is based on v5.1.1 tag.
> 
> With this update, the patchset has the following features:
> - various small fixes and enhancements for btrfs-image
>    * Fix an indent misalign
>    * Fix an access-beyond-boundary bug
>    * Fix a confusing error message due to unpopulated errno
>    * Output error message for chunk tree build error
>    * Use SZ_* to replace intermediate number
>    * Verify superblock before restore
> 
> - btrfs-image dump support
>    This introduce a new option -d to dump data.
>    Due to item size limit, we have to enlarge the existing limit from
>    256K (enough for tree blocks, but not enough for free space cache) to
>    256M.
>    This change will cause incompatibility, thus we have to introduce a
>    new magic as version. While keeping all other on-disk format the same.
> 
> - Reduce memory usage for both compressed and uncompressed images
>    Originally for compressed extents, we will use 4 * max_pending_size as
>    output buffer, which can be 1G for 256M newer limit.
> 
>    Change it to use at most 512K for compressed extent output buf, and
>    also use 512K fixed buffer size for uncompressed extent.
> 
> - btrfs-image restore optimization
>    This will speed up chunk item search during restore.
> 
> Changelog:
> v2:
> - New small fixes:
>    * Fix a confusing error message due to unpopulated errno
>    * Output error message for chunk tree build error
>    
> - Fix a regression of previous version
>    Patch "btrfs-progs: image: Rework how we search chunk tree blocks"
>    deleted a "ret = 0" line which could cause false early exit.
> 
> - Reduce memory usage for data dump


> Qu Wenruo (14):
>    btrfs-progs: image: Use SZ_* to replace intermediate size
>    btrfs-progs: image: Fix an indent misalign
>    btrfs-progs: image: Fix an access-beyond-boundary bug when there are
>      32 online CPUs
>    btrfs-progs: image: Verify the superblock before restore
>    btrfs-progs: image: Introduce framework for more dump versions
>    btrfs-progs: image: Introduce -d option to dump data
>    btrfs-progs: image: Allow restore to record system chunk ranges for
>      later usage
>    btrfs-progs: image: Introduce helper to determine if a tree block is
>      in the range of system chunks
>    btrfs-progs: image: Rework how we search chunk tree blocks
>    btrfs-progs: image: Reduce memory requirement for decompression
>    btrfs-progs: image: Don't waste memory when we're just extracting
>      super block
>    btrfs-progs: image: Reduce memory usage for chunk tree search
>    btrfs-progs: image: Output error message for chunk tree build error
>    btrfs-progs: image: Fix error output to show correct return value
>

How about separating the -d option enhancement patch from the rest of
the patches? It looks like the -d option patch is the only one, and
the rest can go independently.


>   disk-io.c        |   6 +-
>   disk-io.h        |   1 +
>   image/main.c     | 874 +++++++++++++++++++++++++++++++++++------------
>   image/metadump.h |  15 +-
>   4 files changed, 666 insertions(+), 230 deletions(-)
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes
  2019-07-04  2:13 ` [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes Anand Jain
@ 2019-07-04  2:54   ` Qu Wenruo
  0 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2019-07-04  2:54 UTC (permalink / raw)
  To: Anand Jain, WenRuo Qu; +Cc: linux-btrfs


On 2019/7/4 10:13 AM, Anand Jain wrote:
> On 2/7/19 6:07 PM, WenRuo Qu wrote:
>> This patchset is based on v5.1.1 tag.
>>
>> With this update, the patchset has the following features:
>> - various small fixes and enhancements for btrfs-image
>>    * Fix an indent misalign
>>    * Fix an access-beyond-boundary bug
>>    * Fix a confusing error message due to unpopulated errno
>>    * Output error message for chunk tree build error
>>    * Use SZ_* to replace intermediate number
>>    * Verify superblock before restore
>>
>> - btrfs-image dump support
>>    This introduce a new option -d to dump data.
>>    Due to item size limit, we have to enlarge the existing limit from
>>    256K (enough for tree blocks, but not enough for free space cache) to
>>    256M.
>>    This change will cause incompatibility, thus we have to introduce a
>>    new magic as version. While keeping all other on-disk format the same.
>>
>> - Reduce memory usage for both compressed and uncompressed images
>>    Originally for compressed extents, we will use 4 * max_pending_size as
>>    output buffer, which can be 1G for 256M newer limit.
>>
>>    Change it to use at most 512K for compressed extent output buf, and
>>    also use 512K fixed buffer size for uncompressed extent.
>>
>> - btrfs-image restore optimization
>>    This will speed up chunk item search during restore.
>>
>> Changelog:
>> v2:
>> - New small fixes:
>>    * Fix a confusing error message due to unpopulated errno
>>    * Output error message for chunk tree build error
>>    - Fix a regression of previous version
>>    Patch "btrfs-progs: image: Rework how we search chunk tree blocks"
>>    deleted a "ret = 0" line which could cause false early exit.
>>
>> - Reduce memory usage for data dump
> 
> 
>> Qu Wenruo (14):
>>    btrfs-progs: image: Use SZ_* to replace intermediate size
>>    btrfs-progs: image: Fix an indent misalign
>>    btrfs-progs: image: Fix an access-beyond-boundary bug when there are
>>      32 online CPUs
>>    btrfs-progs: image: Verify the superblock before restore
>>    btrfs-progs: image: Introduce framework for more dump versions
>>    btrfs-progs: image: Introduce -d option to dump data
>>    btrfs-progs: image: Allow restore to record system chunk ranges for
>>      later usage
>>    btrfs-progs: image: Introduce helper to determine if a tree block is
>>      in the range of system chunks
>>    btrfs-progs: image: Rework how we search chunk tree blocks
>>    btrfs-progs: image: Reduce memory requirement for decompression
>>    btrfs-progs: image: Don't waste memory when we're just extracting
>>      super block
>>    btrfs-progs: image: Reduce memory usage for chunk tree search
>>    btrfs-progs: image: Output error message for chunk tree build error
>>    btrfs-progs: image: Fix error output to show correct return value
>>
> 
> How about separating the -d option enhancement patch from the rest of
> the patches? It looks like the -d option patch is the only one, and
> the rest can go independently.

For all the minor fixes like the error messages, the already merged
ones, and the chunk tree search part, no problem.

For the memory reduction part, it's only needed if we're going to
support data dump.

Unfortunately (or fortunately), the decompression memory usage reduction
is only needed if we enlarge the max_pending_size (used by data dump).

The original max_pending_size is just 256K, and 4 * 256K per thread is
a piece of cake for modern systems.

If we don't need data dump, the memory reduction part doesn't make much
sense.

I'll update the patchset to sort them into the following parts:
- Minor fixes
- Chunk tree search enhancement
- Data dump
- Memory reduction

Thanks,
Qu
> 
> 
>>   disk-io.c        |   6 +-
>>   disk-io.h        |   1 +
>>   image/main.c     | 874 +++++++++++++++++++++++++++++++++++------------
>>   image/metadump.h |  15 +-
>>   4 files changed, 666 insertions(+), 230 deletions(-)
>>
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-07-04  2:55 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-02 10:07 [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes WenRuo Qu
2019-07-02 10:07 ` [PATCH v2 01/14] btrfs-progs: image: Use SZ_* to replace intermediate size WenRuo Qu
2019-07-02 10:07 ` [PATCH v2 02/14] btrfs-progs: image: Fix an indent misalign WenRuo Qu
2019-07-02 10:07 ` [PATCH v2 03/14] btrfs-progs: image: Fix an access-beyond-boundary bug when there are 32 online CPUs WenRuo Qu
2019-07-02 10:07 ` [PATCH v2 04/14] btrfs-progs: image: Verify the superblock before restore WenRuo Qu
2019-07-02 10:07 ` [PATCH v2 05/14] btrfs-progs: image: Introduce framework for more dump versions WenRuo Qu
2019-07-02 10:07 ` [PATCH v2 06/14] btrfs-progs: image: Introduce -d option to dump data WenRuo Qu
2019-07-02 10:07 ` [PATCH v2 07/14] btrfs-progs: image: Allow restore to record system chunk ranges for later usage WenRuo Qu
2019-07-02 10:07 ` [PATCH v2 08/14] btrfs-progs: image: Introduce helper to determine if a tree block is in the range of system chunks WenRuo Qu
2019-07-02 10:07 ` [PATCH v2 09/14] btrfs-progs: image: Rework how we search chunk tree blocks WenRuo Qu
2019-07-02 10:07 ` [PATCH v2 10/14] btrfs-progs: image: Reduce memory requirement for decompression WenRuo Qu
2019-07-02 10:07 ` [PATCH v2 11/14] btrfs-progs: image: Don't waste memory when we're just extracting super block WenRuo Qu
2019-07-02 10:07 ` [PATCH v2 12/14] btrfs-progs: image: Reduce memory usage for chunk tree search WenRuo Qu
2019-07-02 10:07 ` [PATCH v2 13/14] btrfs-progs: image: Output error message for chunk tree build error WenRuo Qu
2019-07-02 10:08 ` [PATCH v2 14/14] btrfs-progs: image: Fix error output to show correct return value WenRuo Qu
2019-07-04  2:13 ` [PATCH v2 00/14] btrfs-progs: image: Enhance and bug fixes Anand Jain
2019-07-04  2:54   ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).