* [RFC PATCH v3 0/3] erofs-utils: compressed fragments feature
@ 2022-08-03  3:51 Yue Hu
  2022-08-03  3:51 ` [RFC PATCH v3 1/3] erofs-utils: lib: add support for fragments data decompression Yue Hu
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Yue Hu @ 2022-08-03  3:51 UTC (permalink / raw)
  To: linux-erofs; +Cc: huyue2, zbestahu, shaojunjun, zhangwen

To achieve a greater compression ratio, let's introduce the compressed
fragments feature, which can merge the tail of each file, or whole
files, into one special inode.
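
The merging idea can be pictured with a small sketch: tails are appended
to one packed stream, and each file only remembers the offset at which
its tail was stored. This is an illustration only. `frag_pool`,
`frag_pool_append` and `frag_pool_read` are hypothetical names, not
erofs-utils API; the patch itself collects the data in a temporary file
and later compresses that stream as the special inode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the packed fragments stream. */
struct frag_pool {
	unsigned char *data;
	size_t used, cap;
};

/*
 * Append one file tail to the pool. Returns the offset at which the tail
 * was stored (what a per-file fragment offset would record), or -1 if
 * memory could not be allocated.
 */
long frag_pool_append(struct frag_pool *p, const void *tail, size_t len)
{
	size_t off = p->used;

	if (p->used + len > p->cap) {
		size_t ncap = p->cap ? p->cap : 4096;
		unsigned char *n;

		while (ncap < p->used + len)
			ncap *= 2;
		n = realloc(p->data, ncap);
		if (!n)
			return -1;
		p->data = n;
		p->cap = ncap;
	}
	memcpy(p->data + off, tail, len);
	p->used += len;
	return (long)off;
}

/* Copy a previously stored tail back out of the pool. */
void frag_pool_read(const struct frag_pool *p, long off, void *out,
		    size_t len)
{
	assert(off >= 0 && (size_t)off + len <= p->used);
	memcpy(out, p->data + off, len);
}
```

Because many small tails end up adjacent in one stream, the compressor
can work on larger inputs than a single short tail would offer.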

A separate pcluster size can also be set for the fragments inode to
suit different compression requirements.

This patchset also improves the uncompressed data layout of compressed
files: the data is written starting at 'clusterofs' instead of 0, since
that layout can benefit from in-place I/O. For now, it is only used
together with fragments.

The main idea above is from Xiang.

Here is some test data of Linux 5.10.87 source code under Ubuntu 18.04:

linux-5.10.87 (erofs, uncompressed)                1.1G

linux-5.10.87 (erofs, lz4hc,12 4k fragments,4k)    301M
linux-5.10.87 (erofs, lz4hc,12 8k fragments,8k)    268M
linux-5.10.87 (erofs, lz4hc,12 16k fragments,16k)  242M
linux-5.10.87 (erofs, lz4hc,12 32k fragments,32k)  225M
linux-5.10.87 (erofs, lz4hc,12 64k fragments,64k)  217M

linux-5.10.87 (erofs, lz4hc,12 4k vanilla)         396M
linux-5.10.87 (erofs, lz4hc,12 8k vanilla)         376M
linux-5.10.87 (erofs, lz4hc,12 16k vanilla)        364M
linux-5.10.87 (erofs, lz4hc,12 32k vanilla)        359M
linux-5.10.87 (erofs, lz4hc,12 64k vanilla)        358M

Usage:
mkfs.erofs -zlz4hc,12 -C65536 -Efragments,65536 foo.erofs.img foo/
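
As a rough restatement of the pcluster-size checks that patch 3/3 adds
to z_erofs_compress_init(), the sketch below works in units of 4096-byte
blocks. The byte-to-block conversion and the helper names are
assumptions for illustration (the option parsing in mkfs/main.c is not
shown here), and the real code also consults the superblock feature bit
rather than a local flag.

```c
#include <assert.h>

#define BLKSIZ            4096u
/* Z_EROFS_PCLUSTER_MAX_SIZE is (1024 * 1024) in the patch */
#define PCLUSTER_MAX_BLKS ((1024u * 1024u) / BLKSIZ)

/* Assumption: a byte count given on the command line is block-aligned. */
unsigned int bytes_to_blks(unsigned int bytes)
{
	assert(bytes % BLKSIZ == 0);
	return bytes / BLKSIZ;
}

/* Returns 0 if the combination is accepted, -1 if it is rejected. */
int check_pclusterblks(unsigned int max_blks, unsigned int frags_blks)
{
	int big_pcluster = 0;	/* stand-in for the big pcluster feature */

	if (max_blks > 1) {
		if (max_blks > PCLUSTER_MAX_BLKS)
			return -1;	/* -C value too large */
		if (frags_blks > PCLUSTER_MAX_BLKS)
			return -1;	/* fragments value too large */
		if (frags_blks == 1)
			return -1;	/* fragments pcluster must exceed one block */
		big_pcluster = 1;
	}
	/* a multi-block fragments pcluster needs big pcluster support */
	if (!big_pcluster && frags_blks > 1)
		return -1;
	return 0;
}
```

With the usage line above, both -C65536 and the fragments size map to 16
blocks and pass these checks.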

Changes since v2:
 - mainly reimplement the decompression logic for the fragment inode to
   align with the kernel side;
 - fix a compatibility issue with old images that use the ztailpacking
   feature;
 - move code of super.c in patch 3/3 to patch 1/3;
 - minor naming change.

Changes since v1:
 - mainly optimize index space for fragment inode;
 - add merging tail with len <= pclustersize into fragments directly;
 - use an inode instead of a nid to avoid loading the fragments inode
   multiple times;
 - fix a memory leak when building fragments;
 - minor change to distinguish the special fragments inode from normal
   inodes;
 - rebase to commit cb058526 with patch [1];
 - code cleanup.

Note that the inode will use the extended version (64 bytes) because of
mtime; the 'force-inode-compact' option may be used to reduce the size
if mtime does not matter.

[1] https://lore.kernel.org/linux-erofs/20220722053610.23912-1-huyue2@coolpad.com/

Yue Hu (3):
  erofs-utils: lib: add support for fragments data decompression
  erofs-utils: lib: support on-disk offset for shifted decompression
  erofs-utils: introduce compressed fragments support

 include/erofs/compress.h   |   3 +-
 include/erofs/config.h     |   3 +-
 include/erofs/decompress.h |   3 ++
 include/erofs/fragments.h  |  25 +++++++++
 include/erofs/inode.h      |   2 +
 include/erofs/internal.h   |   9 ++++
 include/erofs_fs.h         |  27 +++++++---
 lib/Makefile.am            |   4 +-
 lib/compress.c             | 108 +++++++++++++++++++++++++++----------
 lib/data.c                 |  28 +++++++++-
 lib/decompress.c           |  10 +++-
 lib/fragments.c            |  76 ++++++++++++++++++++++++++
 lib/inode.c                |  43 ++++++++++-----
 lib/super.c                |  24 ++++++++-
 lib/zmap.c                 |  26 +++++++++
 mkfs/main.c                |  64 +++++++++++++++++++---
 16 files changed, 393 insertions(+), 62 deletions(-)
 create mode 100644 include/erofs/fragments.h
 create mode 100644 lib/fragments.c

-- 
2.17.1



* [RFC PATCH v3 1/3] erofs-utils: lib: add support for fragments data decompression
  2022-08-03  3:51 [RFC PATCH v3 0/3] erofs-utils: compressed fragments feature Yue Hu
@ 2022-08-03  3:51 ` Yue Hu
  2022-08-16 18:48   ` Gao Xiang
  2022-08-03  3:51 ` [RFC PATCH v3 2/3] erofs-utils: lib: support on-disk offset for shifted decompression Yue Hu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Yue Hu @ 2022-08-03  3:51 UTC (permalink / raw)
  To: linux-erofs; +Cc: huyue2, zbestahu, shaojunjun, zhangwen

Add compressed fragments support for erofsfuse.
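
For readers following the on-disk changes, the new map header bits can
be decoded roughly as below. This is a sketch only: `frag_info` and
`decode_frag_info` are hypothetical names, the inputs are assumed to be
already converted from little-endian, and the constants simply restate
the values introduced in include/erofs_fs.h by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADVISE_FRAGMENT_PCLUSTER 0x0010	/* Z_EROFS_ADVISE_FRAGMENT_PCLUSTER */
#define FRAGMENT_INODE_BIT       7	/* Z_EROFS_FRAGMENT_INODE_BIT */

/* Hypothetical decoded view of the header fields touched by this patch. */
struct frag_info {
	bool has_fragment;	/* tail or whole file lives in the fragments inode */
	bool whole_file;	/* bit 7 of h_clusterbits */
	unsigned int lclusterbits;
	uint32_t fragmentoff;	/* offset inside the fragments inode */
};

struct frag_info decode_frag_info(uint32_t h_fragmentoff, uint16_t h_advise,
				  uint8_t h_clusterbits)
{
	struct frag_info fi = {0};

	/* bits 0-2 hold "logical cluster bits - 12" */
	fi.lclusterbits = 12 + (h_clusterbits & 7);
	fi.has_fragment = h_advise & ADVISE_FRAGMENT_PCLUSTER;
	if (fi.has_fragment) {
		/* only meaningful when the fragment advise bit is set */
		fi.fragmentoff = h_fragmentoff;
		fi.whole_file = h_clusterbits >> FRAGMENT_INODE_BIT;
	}
	return fi;
}
```

Since h_fragmentoff shares its bytes with h_reserved1 and h_idata_size
in the union, the advise bits decide which interpretation applies.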

Signed-off-by: Yue Hu <huyue2@coolpad.com>
---
 include/erofs/internal.h |  8 ++++++++
 include/erofs_fs.h       | 26 ++++++++++++++++++++------
 lib/data.c               | 20 ++++++++++++++++++++
 lib/super.c              | 24 +++++++++++++++++++++++-
 lib/zmap.c               | 26 ++++++++++++++++++++++++++
 5 files changed, 97 insertions(+), 7 deletions(-)

diff --git a/include/erofs/internal.h b/include/erofs/internal.h
index 48498fe..5980db7 100644
--- a/include/erofs/internal.h
+++ b/include/erofs/internal.h
@@ -102,6 +102,7 @@ struct erofs_sb_info {
 		u16 devt_slotoff;		/* used for mkfs */
 		u16 device_id_mask;		/* used for others */
 	};
+	struct erofs_inode *frags_inode;
 };
 
 /* global sbi */
@@ -132,6 +133,7 @@ EROFS_FEATURE_FUNCS(big_pcluster, incompat, INCOMPAT_BIG_PCLUSTER)
 EROFS_FEATURE_FUNCS(chunked_file, incompat, INCOMPAT_CHUNKED_FILE)
 EROFS_FEATURE_FUNCS(device_table, incompat, INCOMPAT_DEVICE_TABLE)
 EROFS_FEATURE_FUNCS(ztailpacking, incompat, INCOMPAT_ZTAILPACKING)
+EROFS_FEATURE_FUNCS(fragments, incompat, INCOMPAT_FRAGMENTS)
 EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM)
 
 #define EROFS_I_EA_INITED	(1 << 0)
@@ -190,6 +192,8 @@ struct erofs_inode {
 	void *eof_tailraw;
 	unsigned int eof_tailrawsize;
 
+	erofs_off_t fragmentoff;
+
 	union {
 		void *compressmeta;
 		void *chunkindexes;
@@ -201,6 +205,7 @@ struct erofs_inode {
 			uint64_t z_tailextent_headlcn;
 			unsigned int    z_idataoff;
 #define z_idata_size	idata_size
+#define z_fragmentoff	fragmentoff
 		};
 	};
 #ifdef WITH_ANDROID
@@ -276,6 +281,7 @@ enum {
 	BH_Mapped,
 	BH_Encoded,
 	BH_FullMapped,
+	BH_Fragments,
 };
 
 /* Has a disk mapping */
@@ -286,6 +292,8 @@ enum {
 #define EROFS_MAP_ENCODED	(1 << BH_Encoded)
 /* The length of extent is full */
 #define EROFS_MAP_FULL_MAPPED	(1 << BH_FullMapped)
+/* Located in fragments */
+#define EROFS_MAP_FRAGMENTS	(1 << BH_Fragments)
 
 struct erofs_map_blocks {
 	char mpage[EROFS_BLKSIZ];
diff --git a/include/erofs_fs.h b/include/erofs_fs.h
index 08f9761..4e13566 100644
--- a/include/erofs_fs.h
+++ b/include/erofs_fs.h
@@ -25,13 +25,15 @@
 #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE	0x00000004
 #define EROFS_FEATURE_INCOMPAT_DEVICE_TABLE	0x00000008
 #define EROFS_FEATURE_INCOMPAT_ZTAILPACKING	0x00000010
+#define EROFS_FEATURE_INCOMPAT_FRAGMENTS	0x00000020
 #define EROFS_ALL_FEATURE_INCOMPAT		\
 	(EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
 	 EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
 	 EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
 	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
 	 EROFS_FEATURE_INCOMPAT_DEVICE_TABLE | \
-	 EROFS_FEATURE_INCOMPAT_ZTAILPACKING)
+	 EROFS_FEATURE_INCOMPAT_ZTAILPACKING | \
+	 EROFS_FEATURE_INCOMPAT_FRAGMENTS)
 
 #define EROFS_SB_EXTSLOT_SIZE	16
 
@@ -73,7 +75,9 @@ struct erofs_super_block {
 	} __packed u1;
 	__le16 extra_devices;	/* # of devices besides the primary device */
 	__le16 devt_slotoff;	/* startoff = devt_slotoff * devt_slotsize */
-	__u8 reserved2[38];
+	__u8 reserved[6];
+	__le64 frags_nid;	/* nid of the special fragments inode */
+	__u8 reserved2[24];
 };
 
 /*
@@ -294,16 +298,24 @@ struct z_erofs_lzma_cfgs {
  * bit 1 : HEAD1 big pcluster (0 - off; 1 - on)
  * bit 2 : HEAD2 big pcluster (0 - off; 1 - on)
  * bit 3 : tailpacking inline pcluster (0 - off; 1 - on)
+ * bit 4 : fragment pcluster (0 - off; 1 - on)
  */
 #define Z_EROFS_ADVISE_COMPACTED_2B		0x0001
 #define Z_EROFS_ADVISE_BIG_PCLUSTER_1		0x0002
 #define Z_EROFS_ADVISE_BIG_PCLUSTER_2		0x0004
 #define Z_EROFS_ADVISE_INLINE_PCLUSTER		0x0008
+#define Z_EROFS_ADVISE_FRAGMENT_PCLUSTER	0x0010
 
 struct z_erofs_map_header {
-	__le16	h_reserved1;
-	/* record the size of tailpacking data */
-	__le16  h_idata_size;
+	union {
+		/* direct addressing for fragment offset */
+		__le32	h_fragmentoff;
+		struct {
+			__le16  h_reserved1;
+			/* record the size of tailpacking data */
+			__le16	h_idata_size;
+		};
+	};
 	__le16	h_advise;
 	/*
 	 * bit 0-3 : algorithm type of head 1 (logical cluster type 01);
@@ -312,12 +324,14 @@ struct z_erofs_map_header {
 	__u8	h_algorithmtype;
 	/*
 	 * bit 0-2 : logical cluster bits - 12, e.g. 0 for 4096;
-	 * bit 3-7 : reserved.
+	 * bit 3-6 : reserved;
+	 * bit 7   : merge the whole file into fragments or not.
 	 */
 	__u8	h_clusterbits;
 };
 
 #define Z_EROFS_VLE_LEGACY_HEADER_PADDING       8
+#define Z_EROFS_FRAGMENT_INODE_BIT		7
 
 /*
  * Fixed-sized output compression ondisk Logical Extent cluster type:
diff --git a/lib/data.c b/lib/data.c
index 6bc554d..b9dd07b 100644
--- a/lib/data.c
+++ b/lib/data.c
@@ -275,6 +275,26 @@ static int z_erofs_read_data(struct erofs_inode *inode, char *buffer,
 			continue;
 		}
 
+		if (map.m_flags & EROFS_MAP_FRAGMENTS) {
+			char *out;
+
+			out = malloc(length - skip);
+			if (!out) {
+				ret = -ENOMEM;
+				break;
+			}
+			ret = z_erofs_read_data(sbi.frags_inode, out,
+						length - skip,
+						inode->z_fragmentoff + skip);
+			if (ret < 0) {
+				free(out);
+				break;
+			}
+			memcpy(buffer + end - offset, out, length - skip);
+			free(out);
+			continue;
+		}
+
 		if (map.m_plen > bufsize) {
 			bufsize = map.m_plen;
 			raw = realloc(raw, bufsize);
diff --git a/lib/super.c b/lib/super.c
index b267412..4d3ca00 100644
--- a/lib/super.c
+++ b/lib/super.c
@@ -104,6 +104,21 @@ int erofs_read_superblock(void)
 	sbi.xattr_blkaddr = le32_to_cpu(dsb->xattr_blkaddr);
 	sbi.islotbits = EROFS_ISLOTBITS;
 	sbi.root_nid = le16_to_cpu(dsb->root_nid);
+	sbi.frags_inode = NULL;
+	if (erofs_sb_has_fragments()) {
+		struct erofs_inode *inode;
+
+		inode = calloc(1, sizeof(struct erofs_inode));
+		if (!inode)
+			return -ENOMEM;
+		inode->nid = le64_to_cpu(dsb->frags_nid);
+		ret = erofs_read_inode_from_disk(inode);
+		if (ret) {
+			free(inode);
+			return ret;
+		}
+		sbi.frags_inode = inode;
+	}
 	sbi.inos = le64_to_cpu(dsb->inos);
 	sbi.checksum = le32_to_cpu(dsb->checksum);
 
@@ -111,11 +126,18 @@ int erofs_read_superblock(void)
 	sbi.build_time_nsec = le32_to_cpu(dsb->build_time_nsec);
 
 	memcpy(&sbi.uuid, dsb->uuid, sizeof(dsb->uuid));
-	return erofs_init_devices(&sbi, dsb);
+
+	ret = erofs_init_devices(&sbi, dsb);
+	if (ret && sbi.frags_inode)
+		free(sbi.frags_inode);
+	return ret;
 }
 
 void erofs_put_super(void)
 {
 	if (sbi.devs)
 		free(sbi.devs);
+
+	if (sbi.frags_inode)
+		free(sbi.frags_inode);
 }
diff --git a/lib/zmap.c b/lib/zmap.c
index 95745c5..16267ae 100644
--- a/lib/zmap.c
+++ b/lib/zmap.c
@@ -83,6 +83,20 @@ static int z_erofs_fill_inode_lazy(struct erofs_inode *vi)
 		if (ret < 0)
 			return ret;
 	}
+	if (vi->z_advise & Z_EROFS_ADVISE_FRAGMENT_PCLUSTER) {
+		vi->z_fragmentoff = le32_to_cpu(h->h_fragmentoff);
+
+		if (h->h_clusterbits >> Z_EROFS_FRAGMENT_INODE_BIT) {
+			vi->z_tailextent_headlcn = 0;
+		} else {
+			struct erofs_map_blocks map = { .index = UINT_MAX };
+
+			ret = z_erofs_do_map_blocks(vi, &map,
+						    EROFS_GET_BLOCKS_FINDTAIL);
+			if (ret < 0)
+				return ret;
+		}
+	}
 	vi->flags |= EROFS_I_Z_INITED;
 	return 0;
 }
@@ -546,6 +560,7 @@ static int z_erofs_do_map_blocks(struct erofs_inode *vi,
 				 int flags)
 {
 	bool ztailpacking = vi->z_advise & Z_EROFS_ADVISE_INLINE_PCLUSTER;
+	bool infrags = vi->z_advise & Z_EROFS_ADVISE_FRAGMENT_PCLUSTER;
 	struct z_erofs_maprecorder m = {
 		.inode = vi,
 		.map = map,
@@ -609,6 +624,9 @@ static int z_erofs_do_map_blocks(struct erofs_inode *vi,
 		map->m_flags |= EROFS_MAP_META;
 		map->m_pa = vi->z_idataoff;
 		map->m_plen = vi->z_idata_size;
+	} else if (infrags && m.lcn == vi->z_tailextent_headlcn) {
+		map->m_flags |= EROFS_MAP_FRAGMENTS;
+		DBG_BUGON(!map->m_la);
 	} else {
 		map->m_pa = blknr_to_addr(m.pblk);
 		err = z_erofs_get_extent_compressedlen(&m, initial_lcn);
@@ -652,6 +670,14 @@ int z_erofs_map_blocks_iter(struct erofs_inode *vi,
 	if (err)
 		goto out;
 
+	if ((vi->z_advise & Z_EROFS_ADVISE_FRAGMENT_PCLUSTER) &&
+	    !vi->z_tailextent_headlcn) {
+		map->m_llen = map->m_la + 1;
+		map->m_la = 0;
+		map->m_flags = EROFS_MAP_MAPPED | EROFS_MAP_FRAGMENTS;
+		goto out;
+	}
+
 	err = z_erofs_do_map_blocks(vi, map, flags);
 out:
 	DBG_BUGON(err < 0 && err != -ENOMEM);
-- 
2.17.1



* [RFC PATCH v3 2/3] erofs-utils: lib: support on-disk offset for shifted decompression
  2022-08-03  3:51 [RFC PATCH v3 0/3] erofs-utils: compressed fragments feature Yue Hu
  2022-08-03  3:51 ` [RFC PATCH v3 1/3] erofs-utils: lib: add support for fragments data decompression Yue Hu
@ 2022-08-03  3:51 ` Yue Hu
  2022-08-03  3:51 ` [RFC PATCH v3 3/3] erofs-utils: introduce compressed fragments support Yue Hu
  2022-08-03  6:07 ` [RFC PATCH v3 0/3] erofs-utils: compressed fragments feature Gao Xiang
  3 siblings, 0 replies; 8+ messages in thread
From: Yue Hu @ 2022-08-03  3:51 UTC (permalink / raw)
  To: linux-erofs; +Cc: huyue2, zbestahu, shaojunjun, zhangwen

Add support for an uncompressed data layout with an on-disk offset in
compressed files.
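
The read side of such a shifted block can be sketched as follows: data
that starts at offset `head` inside the block runs to the end of the
block and then wraps around to the beginning. `read_shifted` is a
hypothetical helper that leaves out the decodedskip handling of the
real z_erofs_decompress() path, and the block size is a parameter only
to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy `count` bytes out of a block in which the data starts at offset
 * `head` and wraps around to offset 0 when it reaches the block end.
 */
void read_shifted(unsigned char *out, const unsigned char *blk,
		  size_t blksz, size_t head, size_t count)
{
	size_t right;

	assert(head < blksz && count <= blksz);

	/* part stored between head and the end of the block */
	right = blksz - head < count ? blksz - head : count;
	memcpy(out, blk + head, right);
	/* wrapped part stored at the start of the block, if any */
	memcpy(out + right, blk, count - right);
}
```

For instance, with an 8-byte block holding "cdefghab" and head = 6, the
helper returns "abcdefgh".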

Signed-off-by: Yue Hu <huyue2@coolpad.com>
---
 include/erofs/decompress.h |  3 +++
 lib/data.c                 |  8 +++++++-
 lib/decompress.c           | 10 ++++++++--
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/erofs/decompress.h b/include/erofs/decompress.h
index 82bf7b8..b622df5 100644
--- a/include/erofs/decompress.h
+++ b/include/erofs/decompress.h
@@ -23,6 +23,9 @@ struct z_erofs_decompress_req {
 	unsigned int decodedskip;
 	unsigned int inputsize, decodedlength;
 
+	/* head offset of uncompressed data */
+	unsigned int shiftedhead;
+
 	/* indicate the algorithm will be used for decompression */
 	unsigned int alg;
 	bool partial_decoding;
diff --git a/lib/data.c b/lib/data.c
index b9dd07b..7e2e2cf 100644
--- a/lib/data.c
+++ b/lib/data.c
@@ -226,7 +226,7 @@ static int z_erofs_read_data(struct erofs_inode *inode, char *buffer,
 	};
 	struct erofs_map_dev mdev;
 	bool partial;
-	unsigned int bufsize = 0;
+	unsigned int bufsize = 0, head;
 	char *raw = NULL;
 	int ret = 0;
 
@@ -307,10 +307,16 @@ static int z_erofs_read_data(struct erofs_inode *inode, char *buffer,
 		if (ret < 0)
 			break;
 
+		head = 0;
+		if (erofs_sb_has_fragments() &&
+		    map.m_algorithmformat == Z_EROFS_COMPRESSION_SHIFTED)
+			head = erofs_blkoff(end);
+
 		ret = z_erofs_decompress(&(struct z_erofs_decompress_req) {
 					.in = raw,
 					.out = buffer + end - offset,
 					.decodedskip = skip,
+					.shiftedhead = head,
 					.inputsize = map.m_plen,
 					.decodedlength = length,
 					.alg = map.m_algorithmformat,
diff --git a/lib/decompress.c b/lib/decompress.c
index 1661f91..08a0861 100644
--- a/lib/decompress.c
+++ b/lib/decompress.c
@@ -132,14 +132,20 @@ out:
 int z_erofs_decompress(struct z_erofs_decompress_req *rq)
 {
 	if (rq->alg == Z_EROFS_COMPRESSION_SHIFTED) {
+		unsigned int count, rightpart;
+
 		if (rq->inputsize > EROFS_BLKSIZ)
 			return -EFSCORRUPTED;
 
 		DBG_BUGON(rq->decodedlength > EROFS_BLKSIZ);
 		DBG_BUGON(rq->decodedlength < rq->decodedskip);
 
-		memcpy(rq->out, rq->in + rq->decodedskip,
-		       rq->decodedlength - rq->decodedskip);
+		count = rq->decodedlength - rq->decodedskip;
+		rightpart = min(EROFS_BLKSIZ - rq->shiftedhead, count);
+
+		memcpy(rq->out, rq->in + (erofs_sb_has_fragments() ?
+		       rq->shiftedhead : rq->decodedskip), rightpart);
+		memcpy(rq->out + rightpart, rq->in, count - rightpart);
 		return 0;
 	}
 
-- 
2.17.1



* [RFC PATCH v3 3/3] erofs-utils: introduce compressed fragments support
  2022-08-03  3:51 [RFC PATCH v3 0/3] erofs-utils: compressed fragments feature Yue Hu
  2022-08-03  3:51 ` [RFC PATCH v3 1/3] erofs-utils: lib: add support for fragments data decompression Yue Hu
  2022-08-03  3:51 ` [RFC PATCH v3 2/3] erofs-utils: lib: support on-disk offset for shifted decompression Yue Hu
@ 2022-08-03  3:51 ` Yue Hu
  2022-08-03  6:07 ` [RFC PATCH v3 0/3] erofs-utils: compressed fragments feature Gao Xiang
  3 siblings, 0 replies; 8+ messages in thread
From: Yue Hu @ 2022-08-03  3:51 UTC (permalink / raw)
  To: linux-erofs; +Cc: huyue2, zbestahu, shaojunjun, zhangwen

This approach can merge tail pclusters or whole files into a special
inode in order to achieve a greater compression ratio. An option for the
pcluster size is also provided for different compression requirements.

Meanwhile, we change to write the uncompressed data from 'clusterofs'
when compressing files since it can benefit from in-place I/O. For now,
this change goes with the fragments.
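
The write side of that layout can be sketched like this: the data is
placed starting at 'clusterofs' inside the block, and whatever does not
fit before the block end wraps around to offset 0. `write_shifted` is a
hypothetical helper; unlike write_uncompressed_extent() in this patch,
it zeroes the whole block first instead of zeroing only the unused
ranges, which gives the same resulting block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Store `count` bytes into a block so that they start at offset `head`
 * (the clusterofs) and wrap around to offset 0 at the block end. Unused
 * bytes are left as zero.
 */
void write_shifted(unsigned char *blk, const unsigned char *src,
		   size_t blksz, size_t head, size_t count)
{
	size_t right;

	assert(head < blksz && count <= blksz);

	memset(blk, 0, blksz);
	/* part that fits between head and the end of the block */
	right = blksz - head < count ? blksz - head : count;
	memcpy(blk + head, src, right);
	/* remainder wraps around to the start of the block */
	memcpy(blk, src + right, count - right);
}
```

For instance, writing "abcdefgh" into an 8-byte block with head = 6
yields "cdefghab".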

Signed-off-by: Yue Hu <huyue2@coolpad.com>
---
 include/erofs/compress.h  |   3 +-
 include/erofs/config.h    |   3 +-
 include/erofs/fragments.h |  25 +++++++++
 include/erofs/inode.h     |   2 +
 include/erofs/internal.h  |   1 +
 include/erofs_fs.h        |   1 +
 lib/Makefile.am           |   4 +-
 lib/compress.c            | 108 ++++++++++++++++++++++++++++----------
 lib/fragments.c           |  76 +++++++++++++++++++++++++++
 lib/inode.c               |  43 ++++++++++-----
 mkfs/main.c               |  64 +++++++++++++++++++---
 11 files changed, 278 insertions(+), 52 deletions(-)
 create mode 100644 include/erofs/fragments.h
 create mode 100644 lib/fragments.c

diff --git a/include/erofs/compress.h b/include/erofs/compress.h
index 24f6204..d17aadb 100644
--- a/include/erofs/compress.h
+++ b/include/erofs/compress.h
@@ -18,7 +18,8 @@ extern "C"
 #define EROFS_CONFIG_COMPR_MIN_SZ           (32   * 1024)
 
 void z_erofs_drop_inline_pcluster(struct erofs_inode *inode);
-int erofs_write_compressed_file(struct erofs_inode *inode);
+int erofs_write_compressed_file_from_fd(struct erofs_inode *inode, int fd,
+					bool is_src);
 
 int z_erofs_compress_init(struct erofs_buffer_head *bh);
 int z_erofs_compress_exit(void);
diff --git a/include/erofs/config.h b/include/erofs/config.h
index 6c6d71f..b677c54 100644
--- a/include/erofs/config.h
+++ b/include/erofs/config.h
@@ -44,6 +44,7 @@ struct erofs_configure {
 	char c_chunkbits;
 	bool c_noinline_data;
 	bool c_ztailpacking;
+	bool c_fragments;
 	bool c_ignore_mtime;
 	bool c_showprogress;
 
@@ -62,7 +63,7 @@ struct erofs_configure {
 	/* < 0, xattr disabled and INT_MAX, always use inline xattrs */
 	int c_inline_xattr_tolerance;
 
-	u32 c_pclusterblks_max, c_pclusterblks_def;
+	u32 c_pclusterblks_max, c_pclusterblks_def, c_pclusterblks_frags;
 	u32 c_max_decompressed_extent_bytes;
 	u32 c_dict_size;
 	u64 c_unix_timestamp;
diff --git a/include/erofs/fragments.h b/include/erofs/fragments.h
new file mode 100644
index 0000000..89f0f18
--- /dev/null
+++ b/include/erofs/fragments.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR Apache-2.0 */
+/*
+ * Copyright (C), 2022, Coolpad Group Limited.
+ */
+#ifndef __EROFS_FRAGMENTS_H
+#define __EROFS_FRAGMENTS_H
+
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+
+#include "erofs/internal.h"
+
+int z_erofs_fill_fragments(struct erofs_inode *inode, void *data,
+			   unsigned int len);
+struct erofs_inode *erofs_mkfs_build_fragments(void);
+int erofs_fragments_init(void);
+void erofs_fragments_exit(void);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/include/erofs/inode.h b/include/erofs/inode.h
index 79b39b0..0a87c58 100644
--- a/include/erofs/inode.h
+++ b/include/erofs/inode.h
@@ -21,6 +21,8 @@ unsigned int erofs_iput(struct erofs_inode *inode);
 erofs_nid_t erofs_lookupnid(struct erofs_inode *inode);
 struct erofs_inode *erofs_mkfs_build_tree_from_path(struct erofs_inode *parent,
 						    const char *path);
+int erofs_prepare_inode_buffer(struct erofs_inode *inode);
+struct erofs_inode *erofs_generate_inode(struct stat64 *st, const char *path);
 
 #ifdef __cplusplus
 }
diff --git a/include/erofs/internal.h b/include/erofs/internal.h
index 5980db7..deb4fd6 100644
--- a/include/erofs/internal.h
+++ b/include/erofs/internal.h
@@ -193,6 +193,7 @@ struct erofs_inode {
 	unsigned int eof_tailrawsize;
 
 	erofs_off_t fragmentoff;
+	unsigned int fragment_size;
 
 	union {
 		void *compressmeta;
diff --git a/include/erofs_fs.h b/include/erofs_fs.h
index 4e13566..429274c 100644
--- a/include/erofs_fs.h
+++ b/include/erofs_fs.h
@@ -267,6 +267,7 @@ struct erofs_inode_chunk_index {
 
 /* maximum supported size of a physical compression cluster */
 #define Z_EROFS_PCLUSTER_MAX_SIZE	(1024 * 1024)
+#define Z_EROFS_PCLUSTER_MAX_BLKS	(Z_EROFS_PCLUSTER_MAX_SIZE / EROFS_BLKSIZ)
 
 /* available compression algorithm types (for h_algorithmtype) */
 enum {
diff --git a/lib/Makefile.am b/lib/Makefile.am
index 3fad357..95f1d55 100644
--- a/lib/Makefile.am
+++ b/lib/Makefile.am
@@ -22,12 +22,14 @@ noinst_HEADERS = $(top_srcdir)/include/erofs_fs.h \
       $(top_srcdir)/include/erofs/trace.h \
       $(top_srcdir)/include/erofs/xattr.h \
       $(top_srcdir)/include/erofs/compress_hints.h \
+      $(top_srcdir)/include/erofs/fragments.h \
       $(top_srcdir)/lib/liberofs_private.h
 
 noinst_HEADERS += compressor.h
 liberofs_la_SOURCES = config.c io.c cache.c super.c inode.c xattr.c exclude.c \
 		      namei.c data.c compress.c compressor.c zmap.c decompress.c \
-		      compress_hints.c hashmap.c sha256.c blobchunk.c dir.c
+		      compress_hints.c hashmap.c sha256.c blobchunk.c dir.c \
+		      fragments.c
 liberofs_la_CFLAGS = -Wall -I$(top_srcdir)/include
 if ENABLE_LZ4
 liberofs_la_CFLAGS += ${LZ4_CFLAGS}
diff --git a/lib/compress.c b/lib/compress.c
index ee3b856..713c105 100644
--- a/lib/compress.c
+++ b/lib/compress.c
@@ -18,6 +18,7 @@
 #include "compressor.h"
 #include "erofs/block_list.h"
 #include "erofs/compress_hints.h"
+#include "erofs/fragments.h"
 
 static struct erofs_compress compresshandle;
 static unsigned int algorithmtype[2];
@@ -74,9 +75,9 @@ static void vle_write_indexes(struct z_erofs_vle_compress_ctx *ctx,
 	if (!d1) {
 		/*
 		 * A lcluster cannot have three parts with the middle one which
-		 * is well-compressed for !ztailpacking cases.
+		 * is well-compressed for !ztailpacking and !fragments cases.
 		 */
-		DBG_BUGON(!raw && !cfg.c_ztailpacking);
+		DBG_BUGON(!raw && !cfg.c_ztailpacking && !cfg.c_fragments);
 		type = raw ? Z_EROFS_VLE_CLUSTER_TYPE_PLAIN :
 			Z_EROFS_VLE_CLUSTER_TYPE_HEAD;
 		advise = cpu_to_le16(type << Z_EROFS_VLE_DI_CLUSTER_TYPE_BIT);
@@ -143,7 +144,7 @@ static int write_uncompressed_extent(struct z_erofs_vle_compress_ctx *ctx,
 				     unsigned int *len, char *dst)
 {
 	int ret;
-	unsigned int count;
+	unsigned int count, offset, rcopied, rzeroed;
 
 	/* reset clusterofs to 0 if permitted */
 	if (!erofs_sb_has_lz4_0padding() && ctx->clusterofs &&
@@ -153,11 +154,21 @@ static int write_uncompressed_extent(struct z_erofs_vle_compress_ctx *ctx,
 		ctx->clusterofs = 0;
 	}
 
-	/* write uncompressed data */
+	/*
+	 * write uncompressed data from clusterofs which can benefit from
+	 * in-place I/O, loop shift right when to exceed EROFS_BLKSIZ.
+	 */
 	count = min(EROFS_BLKSIZ, *len);
 
-	memcpy(dst, ctx->queue + ctx->head, count);
-	memset(dst + count, 0, EROFS_BLKSIZ - count);
+	offset = cfg.c_fragments ? ctx->clusterofs : 0;
+	rcopied = min(EROFS_BLKSIZ - offset, count);
+	rzeroed = EROFS_BLKSIZ - offset - rcopied;
+
+	memcpy(dst + offset, ctx->queue + ctx->head, rcopied);
+	memcpy(dst, ctx->queue + ctx->head + rcopied, count - rcopied);
+
+	memset(dst + offset + rcopied, 0, rzeroed);
+	memset(dst + count - rcopied, 0, EROFS_BLKSIZ - count - rzeroed);
 
 	erofs_dbg("Writing %u uncompressed data to block %u",
 		  count, ctx->blkaddr);
@@ -167,8 +178,11 @@ static int write_uncompressed_extent(struct z_erofs_vle_compress_ctx *ctx,
 	return count;
 }
 
-static unsigned int z_erofs_get_max_pclusterblks(struct erofs_inode *inode)
+static unsigned int z_erofs_get_max_pclusterblks(struct erofs_inode *inode,
+						 bool is_src)
 {
+	if (cfg.c_fragments && !is_src)
+		return cfg.c_pclusterblks_frags;
 #ifndef NDEBUG
 	if (cfg.c_random_pclusterblks)
 		return 1 + rand() % cfg.c_pclusterblks_max;
@@ -224,7 +238,7 @@ static void tryrecompress_trailing(void *in, unsigned int *insize,
 
 static int vle_compress_one(struct erofs_inode *inode,
 			    struct z_erofs_vle_compress_ctx *ctx,
-			    bool final)
+			    bool final, bool is_src)
 {
 	struct erofs_compress *const h = &compresshandle;
 	unsigned int len = ctx->tail - ctx->head;
@@ -234,14 +248,19 @@ static int vle_compress_one(struct erofs_inode *inode,
 	char *const dst = dstbuf + EROFS_BLKSIZ;
 
 	while (len) {
-		unsigned int pclustersize =
-			z_erofs_get_max_pclusterblks(inode) * EROFS_BLKSIZ;
+		unsigned int pclustersize = EROFS_BLKSIZ *
+				z_erofs_get_max_pclusterblks(inode, is_src);
 		bool may_inline = (cfg.c_ztailpacking && final);
+		bool may_merge = (cfg.c_fragments && final && is_src);
 		bool raw;
 
 		if (len <= pclustersize) {
 			if (!final)
 				break;
+			if (may_merge) {
+				count = len;
+				goto fragments;
+			}
 			if (!may_inline && len <= EROFS_BLKSIZ)
 				goto nocompression;
 		}
@@ -294,6 +313,19 @@ nocompression:
 				return ret;
 			ctx->compressedblks = 1;
 			raw = false;
+		} else if (may_merge && len == count && ret < pclustersize) {
+fragments:
+			ret = z_erofs_fill_fragments(inode,
+						     ctx->queue + ctx->head,
+						     len);
+			if (ret < 0)
+				return ret;
+			if (inode->i_size == inode->fragment_size) {
+				ctx->head += len;
+				return 0;
+			}
+			ctx->compressedblks = 0;
+			raw = false;
 		} else {
 			unsigned int tailused, padding;
 
@@ -546,13 +578,20 @@ static void z_erofs_write_mapheader(struct erofs_inode *inode,
 {
 	struct z_erofs_map_header h = {
 		.h_advise = cpu_to_le16(inode->z_advise),
-		.h_idata_size = cpu_to_le16(inode->idata_size),
 		.h_algorithmtype = inode->z_algorithmtype[1] << 4 |
 				   inode->z_algorithmtype[0],
 		/* lclustersize */
 		.h_clusterbits = inode->z_logical_clusterbits - 12,
 	};
 
+	if (cfg.c_fragments)
+		h.h_fragmentoff = cpu_to_le32(inode->fragmentoff);
+	else
+		h.h_idata_size = cpu_to_le16(inode->idata_size);
+
+	if (inode->fragment_size && inode->i_size == inode->fragment_size)
+		h.h_clusterbits |=  1 << Z_EROFS_FRAGMENT_INODE_BIT;
+
 	memset(compressmeta, 0, Z_EROFS_LEGACY_MAP_HEADER_SIZE);
 	/* write out map header */
 	memcpy(compressmeta, &h, sizeof(struct z_erofs_map_header));
@@ -604,30 +643,25 @@ void z_erofs_drop_inline_pcluster(struct erofs_inode *inode)
 	inode->eof_tailraw = NULL;
 }
 
-int erofs_write_compressed_file(struct erofs_inode *inode)
+int erofs_write_compressed_file_from_fd(struct erofs_inode *inode, int fd,
+					bool is_src)
 {
 	struct erofs_buffer_head *bh;
 	static struct z_erofs_vle_compress_ctx ctx;
 	erofs_off_t remaining;
 	erofs_blk_t blkaddr, compressed_blocks;
 	unsigned int legacymetasize;
-	int ret, fd;
+	int ret;
 	u8 *compressmeta = malloc(vle_compressmeta_capacity(inode->i_size));
 
 	if (!compressmeta)
 		return -ENOMEM;
 
-	fd = open(inode->i_srcpath, O_RDONLY | O_BINARY);
-	if (fd < 0) {
-		ret = -errno;
-		goto err_free_meta;
-	}
-
 	/* allocate main data buffer */
 	bh = erofs_balloc(DATA, 0, 0, 0);
 	if (IS_ERR(bh)) {
 		ret = PTR_ERR(bh);
-		goto err_close;
+		goto err_free_meta;
 	}
 
 	/* initialize per-file compression setting */
@@ -648,6 +682,9 @@ int erofs_write_compressed_file(struct erofs_inode *inode)
 	inode->z_algorithmtype[1] = algorithmtype[1];
 	inode->z_logical_clusterbits = LOG_BLOCK_SIZE;
 
+	inode->idata_size = 0;
+	inode->fragment_size = 0;
+
 	blkaddr = erofs_mapbh(bh->block);	/* start_blkaddr */
 	ctx.blkaddr = blkaddr;
 	ctx.metacur = compressmeta + Z_EROFS_LEGACY_MAP_HEADER_SIZE;
@@ -667,7 +704,7 @@ int erofs_write_compressed_file(struct erofs_inode *inode)
 		remaining -= readcount;
 		ctx.tail += readcount;
 
-		ret = vle_compress_one(inode, &ctx, !remaining);
+		ret = vle_compress_one(inode, &ctx, !remaining, is_src);
 		if (ret)
 			goto err_free_idata;
 	}
@@ -681,19 +718,20 @@ int erofs_write_compressed_file(struct erofs_inode *inode)
 	vle_write_indexes_final(&ctx);
 	legacymetasize = ctx.metacur - compressmeta;
 	/* estimate if data compression saves space or not */
-	if (compressed_blocks * EROFS_BLKSIZ + inode->idata_size +
+	if (!inode->fragment_size &&
+	    compressed_blocks * EROFS_BLKSIZ + inode->idata_size +
 	    legacymetasize >= inode->i_size) {
 		ret = -ENOSPC;
 		goto err_free_idata;
 	}
 	z_erofs_write_mapheader(inode, compressmeta);
 
-	close(fd);
 	if (compressed_blocks) {
 		ret = erofs_bh_balloon(bh, blknr_to_addr(compressed_blocks));
 		DBG_BUGON(ret != EROFS_BLKSIZ);
 	} else {
-		DBG_BUGON(!inode->idata_size);
+		if (!cfg.c_fragments)
+			DBG_BUGON(!inode->idata_size);
 	}
 
 	erofs_info("compressed %s (%llu bytes) into %u blocks",
@@ -716,7 +754,8 @@ int erofs_write_compressed_file(struct erofs_inode *inode)
 		DBG_BUGON(ret);
 	}
 	inode->compressmeta = compressmeta;
-	erofs_droid_blocklist_write(inode, blkaddr, compressed_blocks);
+	if (is_src)
+		erofs_droid_blocklist_write(inode, blkaddr, compressed_blocks);
 	return 0;
 
 err_free_idata:
@@ -726,8 +765,6 @@ err_free_idata:
 	}
 err_bdrop:
 	erofs_bdrop(bh, true);	/* revoke buffer */
-err_close:
-	close(fd);
 err_free_meta:
 	free(compressmeta);
 	return ret;
@@ -833,14 +870,27 @@ int z_erofs_compress_init(struct erofs_buffer_head *sb_bh)
 	 * to be loaded in order to get those compressed block counts.
 	 */
 	if (cfg.c_pclusterblks_max > 1) {
-		if (cfg.c_pclusterblks_max >
-		    Z_EROFS_PCLUSTER_MAX_SIZE / EROFS_BLKSIZ) {
+		if (cfg.c_pclusterblks_max > Z_EROFS_PCLUSTER_MAX_BLKS) {
 			erofs_err("unsupported clusterblks %u (too large)",
 				  cfg.c_pclusterblks_max);
 			return -EINVAL;
 		}
+		if (cfg.c_pclusterblks_frags > Z_EROFS_PCLUSTER_MAX_BLKS) {
+			erofs_err("unsupported clusterblks %u (too large for fragments)",
+				  cfg.c_pclusterblks_frags);
+			return -EINVAL;
+		}
+		if (cfg.c_pclusterblks_frags == 1) {
+			erofs_err("physical cluster size of fragments should > 4096 bytes");
+			return -EINVAL;
+		}
 		erofs_sb_set_big_pcluster();
 	}
+	if (!erofs_sb_has_big_pcluster() && cfg.c_pclusterblks_frags > 1) {
+		erofs_err("invalid clusterblks %u (for fragments)",
+			  cfg.c_pclusterblks_frags);
+		return -EINVAL;
+	}
 
 	if (ret != Z_EROFS_COMPRESSION_LZ4)
 		erofs_sb_set_compr_cfgs();
diff --git a/lib/fragments.c b/lib/fragments.c
new file mode 100644
index 0000000..67e79b8
--- /dev/null
+++ b/lib/fragments.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0+ OR Apache-2.0
+/*
+ * Copyright (C), 2022, Coolpad Group Limited.
+ * Created by Yue Hu <huyue2@coolpad.com>
+ */
+#define _GNU_SOURCE
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/stat.h>
+#include "erofs/err.h"
+#include "erofs/inode.h"
+#include "erofs/compress.h"
+#include "erofs/print.h"
+#include "erofs/fragments.h"
+
+static FILE *fragmentsfp;
+
+int z_erofs_fill_fragments(struct erofs_inode *inode, void *data,
+			   unsigned int len)
+{
+	inode->z_advise |= Z_EROFS_ADVISE_FRAGMENT_PCLUSTER;
+	inode->fragmentoff = ftell(fragmentsfp);
+	inode->fragment_size = len;
+
+	if (write(fileno(fragmentsfp), data, len) < 0)
+		return -EIO;
+
+	erofs_sb_set_fragments();
+
+	erofs_dbg("Recording %u fragment data at %lu", inode->fragment_size,
+		  inode->fragmentoff);
+	return len;
+}
+
+struct erofs_inode *erofs_mkfs_build_fragments(void)
+{
+	struct stat64 st;
+	struct erofs_inode *inode;
+	int ret, fd = fileno(fragmentsfp);
+
+	ret = fstat64(fd, &st);
+	if (ret)
+		return ERR_PTR(-errno);
+
+	inode = erofs_generate_inode(&st, NULL);
+	if (IS_ERR(inode))
+		return inode;
+
+	fseek(fragmentsfp, 0, SEEK_SET);
+	ret = erofs_write_compressed_file_from_fd(inode, fd, false);
+	if (ret) {
+		erofs_err("write fragments file error");
+		return ERR_PTR(ret);
+	}
+
+	erofs_prepare_inode_buffer(inode);
+	return inode;
+}
+
+void erofs_fragments_exit(void)
+{
+	if (fragmentsfp)
+		fclose(fragmentsfp);
+}
+
+int erofs_fragments_init(void)
+{
+#ifdef HAVE_TMPFILE64
+	fragmentsfp = tmpfile64();
+#else
+	fragmentsfp = tmpfile();
+#endif
+	if (!fragmentsfp)
+		return -ENOMEM;
+	return 0;
+}
diff --git a/lib/inode.c b/lib/inode.c
index f192510..a49c7a7 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -405,7 +405,11 @@ int erofs_write_file(struct erofs_inode *inode)
 	}
 
 	if (cfg.c_compr_alg_master && erofs_file_is_compressible(inode)) {
-		ret = erofs_write_compressed_file(inode);
+		fd = open(inode->i_srcpath, O_RDONLY | O_BINARY);
+		if (fd < 0)
+			return -errno;
+		ret = erofs_write_compressed_file_from_fd(inode, fd, true);
+		close(fd);
 
 		if (!ret || ret != -ENOSPC)
 			return ret;
@@ -583,7 +587,7 @@ static int erofs_prepare_tail_block(struct erofs_inode *inode)
 	return 0;
 }
 
-static int erofs_prepare_inode_buffer(struct erofs_inode *inode)
+int erofs_prepare_inode_buffer(struct erofs_inode *inode)
 {
 	unsigned int inodesize;
 	struct erofs_buffer_head *bh, *ibh;
@@ -782,6 +786,9 @@ int erofs_droid_inode_fsconfig(struct erofs_inode *inode,
 	const char *fspath;
 	char *decorated = NULL;
 
+	if (!path)
+		return 0;
+
 	inode->capabilities = 0;
 	if (!cfg.fs_config_file && !cfg.mount_point)
 		return 0;
@@ -868,7 +875,8 @@ static int erofs_fill_inode(struct erofs_inode *inode,
 		return -EINVAL;
 	}
 
-	strncpy(inode->i_srcpath, path, sizeof(inode->i_srcpath) - 1);
+	strncpy(inode->i_srcpath, path ? path : "tmp",
+		sizeof(inode->i_srcpath) - 1);
 	inode->i_srcpath[sizeof(inode->i_srcpath) - 1] = '\0';
 
 	inode->dev = st->st_dev;
@@ -907,6 +915,23 @@ static struct erofs_inode *erofs_new_inode(void)
 	return inode;
 }
 
+struct erofs_inode *erofs_generate_inode(struct stat64 *st, const char *path)
+{
+	struct erofs_inode *inode;
+	int ret;
+
+	inode = erofs_new_inode();
+	if (IS_ERR(inode))
+		return inode;
+
+	ret = erofs_fill_inode(inode, st, path);
+	if (ret) {
+		free(inode);
+		return ERR_PTR(ret);
+	}
+	return inode;
+}
+
 /* get the inode from the (source) path */
 static struct erofs_inode *erofs_iget_from_path(const char *path, bool is_src)
 {
@@ -934,17 +959,7 @@ static struct erofs_inode *erofs_iget_from_path(const char *path, bool is_src)
 	}
 
 	/* cannot find in the inode cache */
-	inode = erofs_new_inode();
-	if (IS_ERR(inode))
-		return inode;
-
-	ret = erofs_fill_inode(inode, &st, path);
-	if (ret) {
-		free(inode);
-		return ERR_PTR(ret);
-	}
-
-	return inode;
+	return erofs_generate_inode(&st, path);
 }
 
 static void erofs_fixup_meta_blkaddr(struct erofs_inode *rootdir)
diff --git a/mkfs/main.c b/mkfs/main.c
index deb8e1f..6a03f8d 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -23,6 +23,7 @@
 #include "erofs/block_list.h"
 #include "erofs/compress_hints.h"
 #include "erofs/blobchunk.h"
+#include "erofs/fragments.h"
 #include "../lib/liberofs_private.h"
 
 #ifdef HAVE_LIBUUID
@@ -129,9 +130,9 @@ static int parse_extended_opts(const char *opts)
 		const char *p = strchr(token, ',');
 
 		next = NULL;
-		if (p)
+		if (p) {
 			next = p + 1;
-		else {
+		} else {
 			p = token + strlen(token);
 			next = p;
 		}
@@ -198,7 +199,34 @@ static int parse_extended_opts(const char *opts)
 				return -EINVAL;
 			cfg.c_ztailpacking = true;
 		}
+
+		if (MATCH_EXTENTED_OPT("fragments", token, keylen)) {
+			char *endptr;
+			u64 i;
+
+			if (vallen || cfg.c_ztailpacking)
+				return -EINVAL;
+			cfg.c_fragments = true;
+
+			i = strtoull(next, &endptr, 0);
+			if (i == 0 || (*endptr != ',' && *endptr != '\0')) {
+				cfg.c_pclusterblks_frags = 1;
+				continue;
+			}
+			if (i % EROFS_BLKSIZ) {
+				erofs_err("invalid physical clustersize %llu",
+					  i);
+				return -EINVAL;
+			}
+			cfg.c_pclusterblks_frags = i / EROFS_BLKSIZ;
+
+			if (*endptr == ',')
+				next = strchr(next, ',')  + 1;
+			else
+				goto out;
+		}
 	}
+out:
 	return 0;
 }
 
@@ -438,7 +466,8 @@ static int mkfs_parse_options_cfg(int argc, char *argv[])
 
 int erofs_mkfs_update_super_block(struct erofs_buffer_head *bh,
 				  erofs_nid_t root_nid,
-				  erofs_blk_t *blocks)
+				  erofs_blk_t *blocks,
+				  erofs_nid_t frags_nid)
 {
 	struct erofs_super_block sb = {
 		.magic     = cpu_to_le32(EROFS_SUPER_MAGIC_V1),
@@ -462,6 +491,7 @@ int erofs_mkfs_update_super_block(struct erofs_buffer_head *bh,
 	*blocks         = erofs_mapbh(NULL);
 	sb.blocks       = cpu_to_le32(*blocks);
 	sb.root_nid     = cpu_to_le16(root_nid);
+	sb.frags_nid    = cpu_to_le64(frags_nid);
 	memcpy(sb.uuid, sbi.uuid, sizeof(sb.uuid));
 
 	if (erofs_sb_has_compr_cfgs())
@@ -579,8 +609,8 @@ int main(int argc, char **argv)
 {
 	int err = 0;
 	struct erofs_buffer_head *sb_bh;
-	struct erofs_inode *root_inode;
-	erofs_nid_t root_nid;
+	struct erofs_inode *root_inode, *frags_inode;
+	erofs_nid_t root_nid, frags_nid;
 	struct stat64 st;
 	erofs_blk_t nblocks;
 	struct timeval t;
@@ -650,6 +680,14 @@ int main(int argc, char **argv)
 		erofs_warn("EXPERIMENTAL chunked file feature in use. Use at your own risk!");
 	if (cfg.c_ztailpacking)
 		erofs_warn("EXPERIMENTAL compressed inline data feature in use. Use at your own risk!");
+	if (cfg.c_fragments) {
+		err = erofs_fragments_init();
+		if (err) {
+			erofs_err("failed to initialize fragments");
+			return 1;
+		}
+		erofs_warn("EXPERIMENTAL compressed fragments feature in use. Use at your own risk!");
+	}
 	erofs_set_fs_root(cfg.c_src_path);
 #ifndef NDEBUG
 	if (cfg.c_random_pclusterblks)
@@ -719,7 +757,19 @@ int main(int argc, char **argv)
 			goto exit;
 	}
 
-	err = erofs_mkfs_update_super_block(sb_bh, root_nid, &nblocks);
+	frags_nid = 0;
+	if (cfg.c_fragments) {
+		frags_inode = erofs_mkfs_build_fragments();
+		if (IS_ERR(frags_inode)) {
+			err = PTR_ERR(frags_inode);
+			goto exit;
+		}
+		frags_nid = erofs_lookupnid(frags_inode);
+		erofs_iput(frags_inode);
+	}
+
+	err = erofs_mkfs_update_super_block(sb_bh, root_nid, &nblocks,
+					    frags_nid);
 	if (err)
 		goto exit;
 
@@ -741,6 +791,8 @@ exit:
 	erofs_cleanup_exclude_rules();
 	if (cfg.c_chunkbits)
 		erofs_blob_exit();
+	if (cfg.c_fragments)
+		erofs_fragments_exit();
 	erofs_exit_configure();
 
 	if (err) {
-- 
2.17.1



* Re: [RFC PATCH v3 0/3] erofs-utils: compressed fragments feature
  2022-08-03  3:51 [RFC PATCH v3 0/3] erofs-utils: compressed fragments feature Yue Hu
                   ` (2 preceding siblings ...)
  2022-08-03  3:51 ` [RFC PATCH v3 3/3] erofs-utils: introduce compressed fragments support Yue Hu
@ 2022-08-03  6:07 ` Gao Xiang
  2022-08-03  7:33   ` Yue Hu
  3 siblings, 1 reply; 8+ messages in thread
From: Gao Xiang @ 2022-08-03  6:07 UTC (permalink / raw)
  To: Yue Hu; +Cc: huyue2, linux-erofs, zbestahu, shaojunjun, zhangwen

Hi Yue,

On Wed, Aug 03, 2022 at 11:51:27AM +0800, Yue Hu wrote:
> In order to achieve greater compression ratio, let's introduce
> compressed fragments feature which can merge tail of per-file or the
> whole files into one special inode to reach the target.
> 
> And we can also set pcluster size to fragments inode for different
> compression requirements.
> 
> In this patchset, we also improve the uncompressed data layout of
> compressed files. Just write it from 'clusterofs' instead of 0 since it
> can benefit from in-place I/O. For now, it only goes with fragments.
> 
> The main idea above is from Xiang.

Thanks for your hard work! I will give it a thorough try this weekend.

Also I'd like to enable logical cluster size != 4k for big pcluster with
large pclustersize in order to reduce the size of compression indexes.

In such cases, I think compact indexes are unnecessary.  I think it's
already supported on the kernel side, so we just need to implement the
userspace side.
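
To put a rough number on the index saving, here is a minimal sketch. It
assumes the non-compact (legacy) index format with one 8-byte entry per
logical cluster; the helper name and the constant are made up for this
example and are not part of erofs-utils.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one 8-byte non-compact index entry per logical cluster. */
#define SKETCH_INDEX_ENTRY_SIZE	8ULL

/*
 * Hypothetical helper: bytes of index needed for a file of `file_size`
 * bytes when the logical cluster size is (1 << lclusterbits).
 */
static uint64_t sketch_index_bytes(uint64_t file_size,
				   unsigned int lclusterbits)
{
	uint64_t lcluster_size, nr_lclusters;

	assert(lclusterbits < 64);
	lcluster_size = 1ULL << lclusterbits;
	/* round up so a partial trailing lcluster still gets an entry */
	nr_lclusters = (file_size + lcluster_size - 1) >> lclusterbits;
	return nr_lclusters * SKETCH_INDEX_ENTRY_SIZE;
}
```

Under these assumptions, a 1 MiB file needs 256 entries with 4k logical
clusters but only 16 entries with 64k logical clusters.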

Thanks,
Gao Xiang

> 
> Here is some test data of Linux 5.10.87 source code under Ubuntu 18.04:
> 
> linux-5.10.87 (erofs, uncompressed)                1.1G
> 
> linux-5.10.87 (erofs, lz4hc,12 4k fragments,4k)    301M
> linux-5.10.87 (erofs, lz4hc,12 8k fragments,8k)    268M
> linux-5.10.87 (erofs, lz4hc,12 16k fragments,16k)  242M
> linux-5.10.87 (erofs, lz4hc,12 32k fragments,32k)  225M
> linux-5.10.87 (erofs, lz4hc,12 64k fragments,64k)  217M
> 
> linux-5.10.87 (erofs, lz4hc,12 4k vanilla)         396M
> linux-5.10.87 (erofs, lz4hc,12 8k vanilla)         376M
> linux-5.10.87 (erofs, lz4hc,12 16k vanilla)        364M
> linux-5.10.87 (erofs, lz4hc,12 32k vanilla)        359M
> linux-5.10.87 (erofs, lz4hc,12 64k vanilla)        358M
> 
> Usage:
> mkfs.erofs -zlz4hc,12 -C65536 -Efragments,65536 foo.erofs.img foo/
> 
> Changes since v2:
>  - mainly reimplement the decompression logic for fragment inode due to
>    kernel side;
>  - fix compatibility issue to old image with ztailpacking feature;
>  - move code of super.c in patch 3/3 to patch 1/3;
>  - minor naming change.
> 
> Changes since v1:
>  - mainly optimize index space for fragment inode;
>  - add merging tail with len <= pclustersize into fragments directly;
>  - use an inode instead of nid to avoid multiple load fragments;
>  - fix memory leak of building fragments;
>  - minor change to diff special fragments with normal inode.
>  - rebase to commit cb058526 with patch [1];
>  - code cleanup.
> 
> Note that the inode will be the extended version (64 bytes) due to mtime; you
> may use the 'force-inode-compact' option to reduce the size if mtime does not matter.
> 
> [1] https://lore.kernel.org/linux-erofs/20220722053610.23912-1-huyue2@coolpad.com/
> 
> Yue Hu (3):
>   erofs-utils: lib: add support for fragments data decompression
>   erofs-utils: lib: support on-disk offset for shifted decompression
>   erofs-utils: introduce compressed fragments support
> 
>  include/erofs/compress.h   |   3 +-
>  include/erofs/config.h     |   3 +-
>  include/erofs/decompress.h |   3 ++
>  include/erofs/fragments.h  |  25 +++++++++
>  include/erofs/inode.h      |   2 +
>  include/erofs/internal.h   |   9 ++++
>  include/erofs_fs.h         |  27 +++++++---
>  lib/Makefile.am            |   4 +-
>  lib/compress.c             | 108 +++++++++++++++++++++++++++----------
>  lib/data.c                 |  28 +++++++++-
>  lib/decompress.c           |  10 +++-
>  lib/fragments.c            |  76 ++++++++++++++++++++++++++
>  lib/inode.c                |  43 ++++++++++-----
>  lib/super.c                |  24 ++++++++-
>  lib/zmap.c                 |  26 +++++++++
>  mkfs/main.c                |  64 +++++++++++++++++++---
>  16 files changed, 393 insertions(+), 62 deletions(-)
>  create mode 100644 include/erofs/fragments.h
>  create mode 100644 lib/fragments.c
> 
> -- 
> 2.17.1


* Re: [RFC PATCH v3 0/3] erofs-utils: compressed fragments feature
  2022-08-03  6:07 ` [RFC PATCH v3 0/3] erofs-utils: compressed fragments feature Gao Xiang
@ 2022-08-03  7:33   ` Yue Hu
  0 siblings, 0 replies; 8+ messages in thread
From: Yue Hu @ 2022-08-03  7:33 UTC (permalink / raw)
  To: Gao Xiang; +Cc: huyue2, linux-erofs, zbestahu, shaojunjun, zhangwen

Hi Xiang,

On Wed, 3 Aug 2022 14:07:24 +0800
Gao Xiang <hsiangkao@linux.alibaba.com> wrote:

> Hi Yue,
> 
> On Wed, Aug 03, 2022 at 11:51:27AM +0800, Yue Hu wrote:
> > In order to achieve greater compression ratio, let's introduce
> > compressed fragments feature which can merge tail of per-file or the
> > whole files into one special inode to reach the target.
> > 
> > And we can also set pcluster size to fragments inode for different
> > compression requirements.
> > 
> > In this patchset, we also improve the uncompressed data layout of
> > compressed files. Just write it from 'clusterofs' instead of 0 since it
> > can benefit from in-place I/O. For now, it only goes with fragments.
> > 
> > The main idea above is from Xiang.  
> 
> Thanks for your hard work! I will give it a thorough try this weekend.

Got it.

> 
> Also I'd like to enable logical cluster size != 4k for big pcluster with
> large pclustersize in order to reduce the size of compression indexes.

Let me think about this first.

Thanks.

> 
> In such cases, I think compact indexes are unnecessary.  I think it's
> already supported on the kernel side, so we just need to implement the
> userspace side.
> 
> Thanks,
> Gao Xiang
> 
> > 
> > Here is some test data of Linux 5.10.87 source code under Ubuntu 18.04:
> > 
> > linux-5.10.87 (erofs, uncompressed)                1.1G
> > 
> > linux-5.10.87 (erofs, lz4hc,12 4k fragments,4k)    301M
> > linux-5.10.87 (erofs, lz4hc,12 8k fragments,8k)    268M
> > linux-5.10.87 (erofs, lz4hc,12 16k fragments,16k)  242M
> > linux-5.10.87 (erofs, lz4hc,12 32k fragments,32k)  225M
> > linux-5.10.87 (erofs, lz4hc,12 64k fragments,64k)  217M
> > 
> > linux-5.10.87 (erofs, lz4hc,12 4k vanilla)         396M
> > linux-5.10.87 (erofs, lz4hc,12 8k vanilla)         376M
> > linux-5.10.87 (erofs, lz4hc,12 16k vanilla)        364M
> > linux-5.10.87 (erofs, lz4hc,12 32k vanilla)        359M
> > linux-5.10.87 (erofs, lz4hc,12 64k vanilla)        358M
> > 
> > Usage:
> > mkfs.erofs -zlz4hc,12 -C65536 -Efragments,65536 foo.erofs.img foo/
> > 
> > Changes since v2:
> >  - mainly reimplement the decompression logic for fragment inode due to
> >    kernel side;
> >  - fix compatibility issue to old image with ztailpacking feature;
> >  - move code of super.c in patch 3/3 to patch 1/3;
> >  - minor naming change.
> > 
> > Changes since v1:
> >  - mainly optimize index space for fragment inode;
> >  - add merging tail with len <= pclustersize into fragments directly;
> >  - use an inode instead of nid to avoid multiple load fragments;
> >  - fix memory leak of building fragments;
> >  - minor change to diff special fragments with normal inode.
> >  - rebase to commit cb058526 with patch [1];
> >  - code cleanup.
> > 
> > Note that the inode will be the extended version (64 bytes) due to mtime; you
> > may use the 'force-inode-compact' option to reduce the size if mtime does not matter.
> > 
> > [1] https://lore.kernel.org/linux-erofs/20220722053610.23912-1-huyue2@coolpad.com/
> > 
> > Yue Hu (3):
> >   erofs-utils: lib: add support for fragments data decompression
> >   erofs-utils: lib: support on-disk offset for shifted decompression
> >   erofs-utils: introduce compressed fragments support
> > 
> >  include/erofs/compress.h   |   3 +-
> >  include/erofs/config.h     |   3 +-
> >  include/erofs/decompress.h |   3 ++
> >  include/erofs/fragments.h  |  25 +++++++++
> >  include/erofs/inode.h      |   2 +
> >  include/erofs/internal.h   |   9 ++++
> >  include/erofs_fs.h         |  27 +++++++---
> >  lib/Makefile.am            |   4 +-
> >  lib/compress.c             | 108 +++++++++++++++++++++++++++----------
> >  lib/data.c                 |  28 +++++++++-
> >  lib/decompress.c           |  10 +++-
> >  lib/fragments.c            |  76 ++++++++++++++++++++++++++
> >  lib/inode.c                |  43 ++++++++++-----
> >  lib/super.c                |  24 ++++++++-
> >  lib/zmap.c                 |  26 +++++++++
> >  mkfs/main.c                |  64 +++++++++++++++++++---
> >  16 files changed, 393 insertions(+), 62 deletions(-)
> >  create mode 100644 include/erofs/fragments.h
> >  create mode 100644 lib/fragments.c
> > 
> > -- 
> > 2.17.1  



* Re: [RFC PATCH v3 1/3] erofs-utils: lib: add support for fragments data decompression
  2022-08-03  3:51 ` [RFC PATCH v3 1/3] erofs-utils: lib: add support for fragments data decompression Yue Hu
@ 2022-08-16 18:48   ` Gao Xiang
  2022-08-17  3:56     ` Yue Hu
  0 siblings, 1 reply; 8+ messages in thread
From: Gao Xiang @ 2022-08-16 18:48 UTC (permalink / raw)
  To: Yue Hu; +Cc: huyue2, linux-erofs, zbestahu, shaojunjun, zhangwen

Hi Yue,

I took a rough look; some comments below...

On Wed, Aug 03, 2022 at 11:51:28AM +0800, Yue Hu wrote:
> Add compressed fragments support for erofsfuse.
> 
> Signed-off-by: Yue Hu <huyue2@coolpad.com>
> ---
>  include/erofs/internal.h |  8 ++++++++
>  include/erofs_fs.h       | 26 ++++++++++++++++++++------
>  lib/data.c               | 20 ++++++++++++++++++++
>  lib/super.c              | 24 +++++++++++++++++++++++-
>  lib/zmap.c               | 26 ++++++++++++++++++++++++++
>  5 files changed, 97 insertions(+), 7 deletions(-)
> 
> diff --git a/include/erofs/internal.h b/include/erofs/internal.h
> index 48498fe..5980db7 100644
> --- a/include/erofs/internal.h
> +++ b/include/erofs/internal.h
> @@ -102,6 +102,7 @@ struct erofs_sb_info {
>  		u16 devt_slotoff;		/* used for mkfs */
>  		u16 device_id_mask;		/* used for others */
>  	};
> +	struct erofs_inode *frags_inode;

I have rethought this feature and its naming.

I think we could name the tail (or the whole file) as a fragment.

But I tend to name the special inode as "packed inode", since
this special inode can be used as "compressed metadata" as well.

So, just name it "packed_inode"?

>  };
>  
>  /* global sbi */
> @@ -132,6 +133,7 @@ EROFS_FEATURE_FUNCS(big_pcluster, incompat, INCOMPAT_BIG_PCLUSTER)
>  EROFS_FEATURE_FUNCS(chunked_file, incompat, INCOMPAT_CHUNKED_FILE)
>  EROFS_FEATURE_FUNCS(device_table, incompat, INCOMPAT_DEVICE_TABLE)
>  EROFS_FEATURE_FUNCS(ztailpacking, incompat, INCOMPAT_ZTAILPACKING)
> +EROFS_FEATURE_FUNCS(fragments, incompat, INCOMPAT_FRAGMENTS)
>  EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM)
>  
>  #define EROFS_I_EA_INITED	(1 << 0)
> @@ -190,6 +192,8 @@ struct erofs_inode {
>  	void *eof_tailraw;
>  	unsigned int eof_tailrawsize;
>  
> +	erofs_off_t fragmentoff;

move it to the end? or find a better place?

> +
>  	union {
>  		void *compressmeta;
>  		void *chunkindexes;
> @@ -201,6 +205,7 @@ struct erofs_inode {
>  			uint64_t z_tailextent_headlcn;
>  			unsigned int    z_idataoff;
>  #define z_idata_size	idata_size
> +#define z_fragmentoff	fragmentoff

drop this line?

>  		};
>  	};
>  #ifdef WITH_ANDROID
> @@ -276,6 +281,7 @@ enum {
>  	BH_Mapped,
>  	BH_Encoded,
>  	BH_FullMapped,
> +	BH_Fragments,

	BH_Fragment,

>  };
>  
>  /* Has a disk mapping */
> @@ -286,6 +292,8 @@ enum {
>  #define EROFS_MAP_ENCODED	(1 << BH_Encoded)
>  /* The length of extent is full */
>  #define EROFS_MAP_FULL_MAPPED	(1 << BH_FullMapped)
> +/* Located in fragments */
> +#define EROFS_MAP_FRAGMENTS	(1 << BH_Fragments)


EROFS_MAP_FRAGMENT ?

>  
>  struct erofs_map_blocks {
>  	char mpage[EROFS_BLKSIZ];
> diff --git a/include/erofs_fs.h b/include/erofs_fs.h
> index 08f9761..4e13566 100644
> --- a/include/erofs_fs.h
> +++ b/include/erofs_fs.h
> @@ -25,13 +25,15 @@
>  #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE	0x00000004
>  #define EROFS_FEATURE_INCOMPAT_DEVICE_TABLE	0x00000008
>  #define EROFS_FEATURE_INCOMPAT_ZTAILPACKING	0x00000010
> +#define EROFS_FEATURE_INCOMPAT_FRAGMENTS	0x00000020
>  #define EROFS_ALL_FEATURE_INCOMPAT		\
>  	(EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
>  	 EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
>  	 EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
>  	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
>  	 EROFS_FEATURE_INCOMPAT_DEVICE_TABLE | \
> -	 EROFS_FEATURE_INCOMPAT_ZTAILPACKING)
> +	 EROFS_FEATURE_INCOMPAT_ZTAILPACKING | \
> +	 EROFS_FEATURE_INCOMPAT_FRAGMENTS)
>  
>  #define EROFS_SB_EXTSLOT_SIZE	16
>  
> @@ -73,7 +75,9 @@ struct erofs_super_block {
>  	} __packed u1;
>  	__le16 extra_devices;	/* # of devices besides the primary device */
>  	__le16 devt_slotoff;	/* startoff = devt_slotoff * devt_slotsize */
> -	__u8 reserved2[38];
> +	__u8 reserved[6];
> +	__le64 frags_nid;	/* nid of the special fragments inode */

	packed_nid; ?

> +	__u8 reserved2[24];
>  };
>  
>  /*
> @@ -294,16 +298,24 @@ struct z_erofs_lzma_cfgs {
>   * bit 1 : HEAD1 big pcluster (0 - off; 1 - on)
>   * bit 2 : HEAD2 big pcluster (0 - off; 1 - on)
>   * bit 3 : tailpacking inline pcluster (0 - off; 1 - on)
> + * bit 4 : fragment pcluster (0 - off; 1 - on)
>   */
>  #define Z_EROFS_ADVISE_COMPACTED_2B		0x0001
>  #define Z_EROFS_ADVISE_BIG_PCLUSTER_1		0x0002
>  #define Z_EROFS_ADVISE_BIG_PCLUSTER_2		0x0004
>  #define Z_EROFS_ADVISE_INLINE_PCLUSTER		0x0008
> +#define Z_EROFS_ADVISE_FRAGMENT_PCLUSTER	0x0010
>  
>  struct z_erofs_map_header {
> -	__le16	h_reserved1;
> -	/* record the size of tailpacking data */
> -	__le16  h_idata_size;
> +	union {
> +		/* direct addressing for fragment offset */
> +		__le32	h_fragmentoff;
> +		struct {
> +			__le16  h_reserved1;
> +			/* record the size of tailpacking data */
> +			__le16	h_idata_size;

That was really somewhat of a layout mistake made when introducing
the ztailpacking feature.
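
To make the overlap in this union concrete, here is a small sketch of how
the two views alias the first four header bytes. It only illustrates the
layout quoted above; all sketch_* names are hypothetical, and the fields
are assumed to be little-endian as the __le16/__le32 types indicate.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a little-endian 16-bit value from raw bytes. */
static uint16_t sketch_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode a little-endian 32-bit value from raw bytes. */
static uint32_t sketch_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* h_fragmentoff view: all four bytes of the union */
static uint32_t sketch_fragmentoff(const uint8_t *hdr)
{
	return sketch_le32(hdr);
}

/* h_idata_size view: bytes 2..3, right after the 2-byte h_reserved1 */
static uint16_t sketch_idata_size(const uint8_t *hdr)
{
	return sketch_le16(hdr + 2);
}

/* The tailpacking size always equals the upper half of the 32-bit view. */
static void sketch_check_alias(const uint8_t *hdr)
{
	assert((sketch_fragmentoff(hdr) >> 16) == sketch_idata_size(hdr));
}
```

In other words, h_idata_size occupies the upper 16 bits of what the new
layout reads as h_fragmentoff, so the same bytes cannot carry both values.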

> +		};
> +	};
>  	__le16	h_advise;
>  	/*
>  	 * bit 0-3 : algorithm type of head 1 (logical cluster type 01);
> @@ -312,12 +324,14 @@ struct z_erofs_map_header {
>  	__u8	h_algorithmtype;
>  	/*
>  	 * bit 0-2 : logical cluster bits - 12, e.g. 0 for 4096;
> -	 * bit 3-7 : reserved.
> +	 * bit 3-6 : reserved;
> +	 * bit 7   : merge the whole file into fragments or not.

Move the whole file into packed inode or not.
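
For reference, a minimal sketch of how h_clusterbits splits under the
comment above: bits 0-2 hold (logical cluster bits - 12) and bit 7 marks a
whole file moved into the packed inode. The sketch_* names are hypothetical;
only the bit positions come from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_LCLUSTERBITS_MASK	0x07	/* bits 0-2 */
#define SKETCH_FRAGMENT_INODE_BIT	7	/* mirrors Z_EROFS_FRAGMENT_INODE_BIT */

struct sketch_clusterbits {
	unsigned int lclusterbits;	/* log2 of the logical cluster size */
	bool whole_file_packed;		/* bit 7 of h_clusterbits */
};

/* Split the on-disk byte into its two documented fields. */
static struct sketch_clusterbits sketch_decode_clusterbits(uint8_t v)
{
	struct sketch_clusterbits r;

	r.lclusterbits = 12 + (v & SKETCH_LCLUSTERBITS_MASK);
	r.whole_file_packed = (v >> SKETCH_FRAGMENT_INODE_BIT) & 1;
	return r;
}

/* Inverse of the above; reserved bits 3-6 are left as zero. */
static uint8_t sketch_encode_clusterbits(unsigned int lclusterbits,
					 bool whole_file_packed)
{
	assert(lclusterbits >= 12 && lclusterbits <= 19);
	return (uint8_t)((lclusterbits - 12) |
			 ((unsigned int)whole_file_packed <<
			  SKETCH_FRAGMENT_INODE_BIT));
}
```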

>  	 */
>  	__u8	h_clusterbits;
>  };
>  
>  #define Z_EROFS_VLE_LEGACY_HEADER_PADDING       8
> +#define Z_EROFS_FRAGMENT_INODE_BIT		7

Move this forward, just before "struct z_erofs_map_header"  

>  /*
>   * Fixed-sized output compression ondisk Logical Extent cluster type:
> diff --git a/lib/data.c b/lib/data.c
> index 6bc554d..b9dd07b 100644
> --- a/lib/data.c
> +++ b/lib/data.c
> @@ -275,6 +275,26 @@ static int z_erofs_read_data(struct erofs_inode *inode, char *buffer,
>  			continue;
>  		}
>  
> +		if (map.m_flags & EROFS_MAP_FRAGMENTS) {
> +			char *out;
> +
> +			out = malloc(length - skip);
> +			if (!out) {
> +				ret = -ENOMEM;
> +				break;
> +			}
> +			ret = z_erofs_read_data(sbi.frags_inode, out,
> +						length - skip,
> +						inode->z_fragmentoff + skip);
> +			if (ret < 0) {
> +				free(out);
> +				break;
> +			}
> +			memcpy(buffer + end - offset, out, length - skip);
> +			free(out);
> +			continue;
> +		}
> +
>  		if (map.m_plen > bufsize) {
>  			bufsize = map.m_plen;
>  			raw = realloc(raw, bufsize);
> diff --git a/lib/super.c b/lib/super.c
> index b267412..4d3ca00 100644
> --- a/lib/super.c
> +++ b/lib/super.c
> @@ -104,6 +104,21 @@ int erofs_read_superblock(void)
>  	sbi.xattr_blkaddr = le32_to_cpu(dsb->xattr_blkaddr);
>  	sbi.islotbits = EROFS_ISLOTBITS;
>  	sbi.root_nid = le16_to_cpu(dsb->root_nid);
> +	sbi.frags_inode = NULL;
> +	if (erofs_sb_has_fragments()) {
> +		struct erofs_inode *inode;
> +
> +		inode = calloc(1, sizeof(struct erofs_inode));
> +		if (!inode)
> +			return -ENOMEM;
> +		inode->nid = le64_to_cpu(dsb->frags_nid);
> +		ret = erofs_read_inode_from_disk(inode);
> +		if (ret) {
> +			free(inode);
> +			return ret;
> +		}
> +		sbi.frags_inode = inode;
> +	}
>  	sbi.inos = le64_to_cpu(dsb->inos);
>  	sbi.checksum = le32_to_cpu(dsb->checksum);
>  
> @@ -111,11 +126,18 @@ int erofs_read_superblock(void)
>  	sbi.build_time_nsec = le32_to_cpu(dsb->build_time_nsec);
>  
>  	memcpy(&sbi.uuid, dsb->uuid, sizeof(dsb->uuid));
> -	return erofs_init_devices(&sbi, dsb);
> +
> +	ret = erofs_init_devices(&sbi, dsb);
> +	if (ret && sbi.frags_inode)
> +		free(sbi.frags_inode);
> +	return ret;
>  }
>  
>  void erofs_put_super(void)
>  {
>  	if (sbi.devs)
>  		free(sbi.devs);
> +
> +	if (sbi.frags_inode)
> +		free(sbi.frags_inode);
>  }
> diff --git a/lib/zmap.c b/lib/zmap.c
> index 95745c5..16267ae 100644
> --- a/lib/zmap.c
> +++ b/lib/zmap.c
> @@ -83,6 +83,20 @@ static int z_erofs_fill_inode_lazy(struct erofs_inode *vi)
>  		if (ret < 0)
>  			return ret;
>  	}
> +	if (vi->z_advise & Z_EROFS_ADVISE_FRAGMENT_PCLUSTER) {
> +		vi->z_fragmentoff = le32_to_cpu(h->h_fragmentoff);
> +
> +		if (h->h_clusterbits >> Z_EROFS_FRAGMENT_INODE_BIT) {
> +			vi->z_tailextent_headlcn = 0;
> +		} else {
> +			struct erofs_map_blocks map = { .index = UINT_MAX };
> +
> +			ret = z_erofs_do_map_blocks(vi, &map,
> +						    EROFS_GET_BLOCKS_FINDTAIL);
> +			if (ret < 0)
> +				return ret;
> +		}
> +	}
>  	vi->flags |= EROFS_I_Z_INITED;
>  	return 0;
>  }
> @@ -546,6 +560,7 @@ static int z_erofs_do_map_blocks(struct erofs_inode *vi,
>  				 int flags)
>  {
>  	bool ztailpacking = vi->z_advise & Z_EROFS_ADVISE_INLINE_PCLUSTER;
> +	bool infrags = vi->z_advise & Z_EROFS_ADVISE_FRAGMENT_PCLUSTER;

	inpacked;

Thanks,
Gao Xiang

>  	struct z_erofs_maprecorder m = {
>  		.inode = vi,
>  		.map = map,
> @@ -609,6 +624,9 @@ static int z_erofs_do_map_blocks(struct erofs_inode *vi,
>  		map->m_flags |= EROFS_MAP_META;
>  		map->m_pa = vi->z_idataoff;
>  		map->m_plen = vi->z_idata_size;
> +	} else if (infrags && m.lcn == vi->z_tailextent_headlcn) {
> +		map->m_flags |= EROFS_MAP_FRAGMENTS;
> +		DBG_BUGON(!map->m_la);
>  	} else {
>  		map->m_pa = blknr_to_addr(m.pblk);
>  		err = z_erofs_get_extent_compressedlen(&m, initial_lcn);
> @@ -652,6 +670,14 @@ int z_erofs_map_blocks_iter(struct erofs_inode *vi,
>  	if (err)
>  		goto out;
>  
> +	if ((vi->z_advise & Z_EROFS_ADVISE_FRAGMENT_PCLUSTER) &&
> +	    !vi->z_tailextent_headlcn) {
> +		map->m_llen = map->m_la + 1;
> +		map->m_la = 0;
> +		map->m_flags = EROFS_MAP_MAPPED | EROFS_MAP_FRAGMENTS;
> +		goto out;
> +	}
> +
>  	err = z_erofs_do_map_blocks(vi, map, flags);
>  out:
>  	DBG_BUGON(err < 0 && err != -ENOMEM);
> -- 
> 2.17.1
> 


* Re: [RFC PATCH v3 1/3] erofs-utils: lib: add support for fragments data decompression
  2022-08-16 18:48   ` Gao Xiang
@ 2022-08-17  3:56     ` Yue Hu
  0 siblings, 0 replies; 8+ messages in thread
From: Yue Hu @ 2022-08-17  3:56 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-erofs, zbestahu, shaojunjun, zhangwen

On Wed, 17 Aug 2022 02:48:54 +0800
Gao Xiang <xiang@kernel.org> wrote:

> Hi Yue,
> 
> I took a rough look; some comments below...

OK, I will update these in the next version.

> 
> On Wed, Aug 03, 2022 at 11:51:28AM +0800, Yue Hu wrote:
> > Add compressed fragments support for erofsfuse.
> > 
> > Signed-off-by: Yue Hu <huyue2@coolpad.com>
> > ---
> >  include/erofs/internal.h |  8 ++++++++
> >  include/erofs_fs.h       | 26 ++++++++++++++++++++------
> >  lib/data.c               | 20 ++++++++++++++++++++
> >  lib/super.c              | 24 +++++++++++++++++++++++-
> >  lib/zmap.c               | 26 ++++++++++++++++++++++++++
> >  5 files changed, 97 insertions(+), 7 deletions(-)
> > 
> > diff --git a/include/erofs/internal.h b/include/erofs/internal.h
> > index 48498fe..5980db7 100644
> > --- a/include/erofs/internal.h
> > +++ b/include/erofs/internal.h
> > @@ -102,6 +102,7 @@ struct erofs_sb_info {
> >  		u16 devt_slotoff;		/* used for mkfs */
> >  		u16 device_id_mask;		/* used for others */
> >  	};
> > +	struct erofs_inode *frags_inode;  
> 
> I have rethought this feature and its naming.
> 
> I think we could name the tail (or the whole file) as a fragment.
> 
> But I tend to name the special inode as "packed inode", since
> this special inode can be used as "compressed metadata" as well.
> 
> So, just name it "packed_inode"?
> 
> >  };
> >  
> >  /* global sbi */
> > @@ -132,6 +133,7 @@ EROFS_FEATURE_FUNCS(big_pcluster, incompat, INCOMPAT_BIG_PCLUSTER)
> >  EROFS_FEATURE_FUNCS(chunked_file, incompat, INCOMPAT_CHUNKED_FILE)
> >  EROFS_FEATURE_FUNCS(device_table, incompat, INCOMPAT_DEVICE_TABLE)
> >  EROFS_FEATURE_FUNCS(ztailpacking, incompat, INCOMPAT_ZTAILPACKING)
> > +EROFS_FEATURE_FUNCS(fragments, incompat, INCOMPAT_FRAGMENTS)
> >  EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM)
> >  
> >  #define EROFS_I_EA_INITED	(1 << 0)
> > @@ -190,6 +192,8 @@ struct erofs_inode {
> >  	void *eof_tailraw;
> >  	unsigned int eof_tailrawsize;
> >  
> > +	erofs_off_t fragmentoff;  
> 
> move it to the end? or find a better place?
> 
> > +
> >  	union {
> >  		void *compressmeta;
> >  		void *chunkindexes;
> > @@ -201,6 +205,7 @@ struct erofs_inode {
> >  			uint64_t z_tailextent_headlcn;
> >  			unsigned int    z_idataoff;
> >  #define z_idata_size	idata_size
> > +#define z_fragmentoff	fragmentoff  
> 
> drop this line?
> 
> >  		};
> >  	};
> >  #ifdef WITH_ANDROID
> > @@ -276,6 +281,7 @@ enum {
> >  	BH_Mapped,
> >  	BH_Encoded,
> >  	BH_FullMapped,
> > +	BH_Fragments,  
> 
> 	BH_Fragment,
> 
> >  };
> >  
> >  /* Has a disk mapping */
> > @@ -286,6 +292,8 @@ enum {
> >  #define EROFS_MAP_ENCODED	(1 << BH_Encoded)
> >  /* The length of extent is full */
> >  #define EROFS_MAP_FULL_MAPPED	(1 << BH_FullMapped)
> > +/* Located in fragments */
> > +#define EROFS_MAP_FRAGMENTS	(1 << BH_Fragments)  
> 
> 
> EROFS_MAP_FRAGMENT ?
> 
> >  
> >  struct erofs_map_blocks {
> >  	char mpage[EROFS_BLKSIZ];
> > diff --git a/include/erofs_fs.h b/include/erofs_fs.h
> > index 08f9761..4e13566 100644
> > --- a/include/erofs_fs.h
> > +++ b/include/erofs_fs.h
> > @@ -25,13 +25,15 @@
> >  #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE	0x00000004
> >  #define EROFS_FEATURE_INCOMPAT_DEVICE_TABLE	0x00000008
> >  #define EROFS_FEATURE_INCOMPAT_ZTAILPACKING	0x00000010
> > +#define EROFS_FEATURE_INCOMPAT_FRAGMENTS	0x00000020
> >  #define EROFS_ALL_FEATURE_INCOMPAT		\
> >  	(EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
> >  	 EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
> >  	 EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
> >  	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
> >  	 EROFS_FEATURE_INCOMPAT_DEVICE_TABLE | \
> > -	 EROFS_FEATURE_INCOMPAT_ZTAILPACKING)
> > +	 EROFS_FEATURE_INCOMPAT_ZTAILPACKING | \
> > +	 EROFS_FEATURE_INCOMPAT_FRAGMENTS)
> >  
> >  #define EROFS_SB_EXTSLOT_SIZE	16
> >  
> > @@ -73,7 +75,9 @@ struct erofs_super_block {
> >  	} __packed u1;
> >  	__le16 extra_devices;	/* # of devices besides the primary device */
> >  	__le16 devt_slotoff;	/* startoff = devt_slotoff * devt_slotsize */
> > -	__u8 reserved2[38];
> > +	__u8 reserved[6];
> > +	__le64 frags_nid;	/* nid of the special fragments inode */  
> 
> 	packed_nid; ?
> 
> > +	__u8 reserved2[24];
> >  };
> >  
> >  /*
> > @@ -294,16 +298,24 @@ struct z_erofs_lzma_cfgs {
> >   * bit 1 : HEAD1 big pcluster (0 - off; 1 - on)
> >   * bit 2 : HEAD2 big pcluster (0 - off; 1 - on)
> >   * bit 3 : tailpacking inline pcluster (0 - off; 1 - on)
> > + * bit 4 : fragment pcluster (0 - off; 1 - on)
> >   */
> >  #define Z_EROFS_ADVISE_COMPACTED_2B		0x0001
> >  #define Z_EROFS_ADVISE_BIG_PCLUSTER_1		0x0002
> >  #define Z_EROFS_ADVISE_BIG_PCLUSTER_2		0x0004
> >  #define Z_EROFS_ADVISE_INLINE_PCLUSTER		0x0008
> > +#define Z_EROFS_ADVISE_FRAGMENT_PCLUSTER	0x0010
> >  
> >  struct z_erofs_map_header {
> > -	__le16	h_reserved1;
> > -	/* record the size of tailpacking data */
> > -	__le16  h_idata_size;
> > +	union {
> > +		/* direct addressing for fragment offset */
> > +		__le32	h_fragmentoff;
> > +		struct {
> > +			__le16  h_reserved1;
> > +			/* record the size of tailpacking data */
> > +			__le16	h_idata_size;  
> 
> That was really somewhat of a layout mistake made when introducing
> the ztailpacking feature.

Oh, we forgot to change this then.

> 
> > +		};
> > +	};
> >  	__le16	h_advise;
> >  	/*
> >  	 * bit 0-3 : algorithm type of head 1 (logical cluster type 01);
> > @@ -312,12 +324,14 @@ struct z_erofs_map_header {
> >  	__u8	h_algorithmtype;
> >  	/*
> >  	 * bit 0-2 : logical cluster bits - 12, e.g. 0 for 4096;
> > -	 * bit 3-7 : reserved.
> > +	 * bit 3-6 : reserved;
> > +	 * bit 7   : merge the whole file into fragments or not.  
> 
> Move the whole file into packed inode or not.
> 
> >  	 */
> >  	__u8	h_clusterbits;
> >  };
> >  
> >  #define Z_EROFS_VLE_LEGACY_HEADER_PADDING       8
> > +#define Z_EROFS_FRAGMENT_INODE_BIT		7  
> 
> Move this forward, just before "struct z_erofs_map_header"  
> 
> >  /*
> >   * Fixed-sized output compression ondisk Logical Extent cluster type:
> > diff --git a/lib/data.c b/lib/data.c
> > index 6bc554d..b9dd07b 100644
> > --- a/lib/data.c
> > +++ b/lib/data.c
> > @@ -275,6 +275,26 @@ static int z_erofs_read_data(struct erofs_inode *inode, char *buffer,
> >  			continue;
> >  		}
> >  
> > +		if (map.m_flags & EROFS_MAP_FRAGMENTS) {
> > +			char *out;
> > +
> > +			out = malloc(length - skip);
> > +			if (!out) {
> > +				ret = -ENOMEM;
> > +				break;
> > +			}
> > +			ret = z_erofs_read_data(sbi.frags_inode, out,
> > +						length - skip,
> > +						inode->z_fragmentoff + skip);
> > +			if (ret < 0) {
> > +				free(out);
> > +				break;
> > +			}
> > +			memcpy(buffer + end - offset, out, length - skip);
> > +			free(out);
> > +			continue;
> > +		}
> > +
> >  		if (map.m_plen > bufsize) {
> >  			bufsize = map.m_plen;
> >  			raw = realloc(raw, bufsize);
> > diff --git a/lib/super.c b/lib/super.c
> > index b267412..4d3ca00 100644
> > --- a/lib/super.c
> > +++ b/lib/super.c
> > @@ -104,6 +104,21 @@ int erofs_read_superblock(void)
> >  	sbi.xattr_blkaddr = le32_to_cpu(dsb->xattr_blkaddr);
> >  	sbi.islotbits = EROFS_ISLOTBITS;
> >  	sbi.root_nid = le16_to_cpu(dsb->root_nid);
> > +	sbi.frags_inode = NULL;
> > +	if (erofs_sb_has_fragments()) {
> > +		struct erofs_inode *inode;
> > +
> > +		inode = calloc(1, sizeof(struct erofs_inode));
> > +		if (!inode)
> > +			return -ENOMEM;
> > +		inode->nid = le64_to_cpu(dsb->frags_nid);
> > +		ret = erofs_read_inode_from_disk(inode);
> > +		if (ret) {
> > +			free(inode);
> > +			return ret;
> > +		}
> > +		sbi.frags_inode = inode;
> > +	}
> >  	sbi.inos = le64_to_cpu(dsb->inos);
> >  	sbi.checksum = le32_to_cpu(dsb->checksum);
> >  
> > @@ -111,11 +126,18 @@ int erofs_read_superblock(void)
> >  	sbi.build_time_nsec = le32_to_cpu(dsb->build_time_nsec);
> >  
> >  	memcpy(&sbi.uuid, dsb->uuid, sizeof(dsb->uuid));
> > -	return erofs_init_devices(&sbi, dsb);
> > +
> > +	ret = erofs_init_devices(&sbi, dsb);
> > +	if (ret && sbi.frags_inode)
> > +		free(sbi.frags_inode);
> > +	return ret;
> >  }
> >  
> >  void erofs_put_super(void)
> >  {
> >  	if (sbi.devs)
> >  		free(sbi.devs);
> > +
> > +	if (sbi.frags_inode)
> > +		free(sbi.frags_inode);
> >  }
> > diff --git a/lib/zmap.c b/lib/zmap.c
> > index 95745c5..16267ae 100644
> > --- a/lib/zmap.c
> > +++ b/lib/zmap.c
> > @@ -83,6 +83,20 @@ static int z_erofs_fill_inode_lazy(struct erofs_inode *vi)
> >  		if (ret < 0)
> >  			return ret;
> >  	}
> > +	if (vi->z_advise & Z_EROFS_ADVISE_FRAGMENT_PCLUSTER) {
> > +		vi->z_fragmentoff = le32_to_cpu(h->h_fragmentoff);
> > +
> > +		if (h->h_clusterbits >> Z_EROFS_FRAGMENT_INODE_BIT) {
> > +			vi->z_tailextent_headlcn = 0;
> > +		} else {
> > +			struct erofs_map_blocks map = { .index = UINT_MAX };
> > +
> > +			ret = z_erofs_do_map_blocks(vi, &map,
> > +						    EROFS_GET_BLOCKS_FINDTAIL);
> > +			if (ret < 0)
> > +				return ret;
> > +		}
> > +	}
> >  	vi->flags |= EROFS_I_Z_INITED;
> >  	return 0;
> >  }
> > @@ -546,6 +560,7 @@ static int z_erofs_do_map_blocks(struct erofs_inode *vi,
> >  				 int flags)
> >  {
> >  	bool ztailpacking = vi->z_advise & Z_EROFS_ADVISE_INLINE_PCLUSTER;
> > +	bool infrags = vi->z_advise & Z_EROFS_ADVISE_FRAGMENT_PCLUSTER;  
> 
> > 	Rename this to 'inpacked'.
> 
> Thanks,
> Gao Xiang
> 
> >  	struct z_erofs_maprecorder m = {
> >  		.inode = vi,
> >  		.map = map,
> > @@ -609,6 +624,9 @@ static int z_erofs_do_map_blocks(struct erofs_inode *vi,
> >  		map->m_flags |= EROFS_MAP_META;
> >  		map->m_pa = vi->z_idataoff;
> >  		map->m_plen = vi->z_idata_size;
> > +	} else if (infrags && m.lcn == vi->z_tailextent_headlcn) {
> > +		map->m_flags |= EROFS_MAP_FRAGMENTS;
> > +		DBG_BUGON(!map->m_la);
> >  	} else {
> >  		map->m_pa = blknr_to_addr(m.pblk);
> >  		err = z_erofs_get_extent_compressedlen(&m, initial_lcn);
> > @@ -652,6 +670,14 @@ int z_erofs_map_blocks_iter(struct erofs_inode *vi,
> >  	if (err)
> >  		goto out;
> >  
> > +	if ((vi->z_advise & Z_EROFS_ADVISE_FRAGMENT_PCLUSTER) &&
> > +	    !vi->z_tailextent_headlcn) {
> > +		map->m_llen = map->m_la + 1;
> > +		map->m_la = 0;
> > +		map->m_flags = EROFS_MAP_MAPPED | EROFS_MAP_FRAGMENTS;
> > +		goto out;
> > +	}
> > +
> >  	err = z_erofs_do_map_blocks(vi, map, flags);
> >  out:
> >  	DBG_BUGON(err < 0 && err != -ENOMEM);
> > -- 
> > 2.17.1
> >
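
For reference, the fragment branch added to z_erofs_read_data() above
reads length - skip bytes from the fragments inode, starting at
z_fragmentoff + skip. A minimal sketch of that offset arithmetic follows.
It is an assumption-laden stand-in, not the patch's code: a flat in-memory
buffer replaces sbi.frags_inode, a plain memcpy() replaces the recursive
z_erofs_read_data() call, and the bounds check is an addition of this
sketch. The name sketch_read_fragment is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy `len` bytes of a fragment into `dst`, starting `skip` bytes into
 * the fragment, which begins at `fragmentoff` inside the decompressed
 * contents of the packed (fragments) inode.
 *
 * Returns 0 on success or -EINVAL if the range leaves the buffer.
 */
static int sketch_read_fragment(const uint8_t *packed, size_t packed_size,
				uint32_t fragmentoff, size_t skip,
				size_t len, uint8_t *dst)
{
	if (fragmentoff > packed_size ||
	    skip > packed_size - fragmentoff ||
	    len > packed_size - fragmentoff - skip)
		return -EINVAL;

	memcpy(dst, packed + fragmentoff + skip, len);
	return 0;
}
```

With a 15-byte packed buffer "AAAAhello world", fragmentoff 4, skip 6 and
len 5, the call copies the bytes "world" into dst.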



end of thread, other threads:[~2022-08-17  4:08 UTC | newest]

Thread overview: 8+ messages
2022-08-03  3:51 [RFC PATCH v3 0/3] erofs-utils: compressed fragments feature Yue Hu
2022-08-03  3:51 ` [RFC PATCH v3 1/3] erofs-utils: lib: add support for fragments data decompression Yue Hu
2022-08-16 18:48   ` Gao Xiang
2022-08-17  3:56     ` Yue Hu
2022-08-03  3:51 ` [RFC PATCH v3 2/3] erofs-utils: lib: support on-disk offset for shifted decompression Yue Hu
2022-08-03  3:51 ` [RFC PATCH v3 3/3] erofs-utils: introduce compressed fragments support Yue Hu
2022-08-03  6:07 ` [RFC PATCH v3 0/3] erofs-utils: compressed fragments feature Gao Xiang
2022-08-03  7:33   ` Yue Hu
