* [PATCH v2 0/3] erofs: some decompression improvements
@ 2021-10-08 20:08 ` Gao Xiang
  0 siblings, 0 replies; 33+ messages in thread
From: Gao Xiang @ 2021-10-08 20:08 UTC (permalink / raw)
  To: linux-erofs; +Cc: Chao Yu, LKML, Yue Hu, Gao Xiang

Hi folks,

This patchset mainly prepares for the upcoming LZMA support,
but it also benefits the existing LZ4 decompression.

The first patch looks up compression algorithms on mapping instead
of in the decompression frontend; the remaining patches depend on it.

The second patch introduces another compression HEAD (HEAD2) so that
each file can be compressed with at most two different algorithms,
which can be used for the upcoming LZMA compression and LZ4 range
dictionary compression for different data/access patterns.

The third patch introduces a new readmore decompression strategy
which tries to improve randread for large LZ4 big pclusters and the
upcoming LZMA decompression. It mainly addresses the issue mentioned
in the original big pcluster patchset [1]:

FIO randread
Testdata: enwik9
Kernel: Linux 5.15.0-rc2

pclustersize		Vanilla		Patched
 4096			 54.6 MiB/s	 56.1 MiB/s
16384			117.4 MiB/s	145.6 MiB/s
32768			113.6 MiB/s	203.4 MiB/s
65536			 72.8 MiB/s	236.1 MiB/s
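
The relative gains in the table above can be checked with a few lines
of C. This is only an arithmetic sketch over the numbers quoted in this
mail; the struct and function names are made up for illustration and
are not part of the patchset.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the FIO randread table quoted above (throughput in MiB/s). */
struct bench_row {
	unsigned int pclustersize;	/* bytes */
	double vanilla;
	double patched;
};

static const struct bench_row rows[] = {
	{  4096,  54.6,  56.1 },
	{ 16384, 117.4, 145.6 },
	{ 32768, 113.6, 203.4 },
	{ 65536,  72.8, 236.1 },
};

/* Ratio of patched to vanilla throughput; above 1.0 means faster. */
static double speedup(const struct bench_row *r)
{
	assert(r->vanilla > 0.0);
	return r->patched / r->vanilla;
}

/* Print one "pclustersize  ratio" line per table row. */
static void print_speedups(void)
{
	size_t i;

	for (i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%6u  %.2fx\n", rows[i].pclustersize,
		       speedup(&rows[i]));
}
```

Calling print_speedups() gives roughly 1.03x, 1.24x, 1.79x and 3.24x,
so the gain grows with the pcluster size.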

The latest version can also be fetched from
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/linux.git -b erofs/readmore

[1] https://lore.kernel.org/r/20210407043927.10623-1-xiang@kernel.org

Thanks,
Gao Xiang

Changes since v1:
 - correct the function name to z_erofs_map_blocks_iter() in the commit
   message, as pointed out by Yue;

 - fix the readmore logic, which mainly impacts the later LZMA approach;
   the Patched version was therefore tested again.

Gao Xiang (3):
  erofs: get compression algorithms directly on mapping
  erofs: introduce the secondary compression head
  erofs: introduce readmore decompression strategy

 fs/erofs/compress.h          |   5 --
 fs/erofs/erofs_fs.h          |   8 ++-
 fs/erofs/internal.h          |  25 +++++++-
 fs/erofs/zdata.c             | 111 +++++++++++++++++++++++++++--------
 fs/erofs/zmap.c              |  55 +++++++++++------
 include/trace/events/erofs.h |   2 +-
 6 files changed, 151 insertions(+), 55 deletions(-)

-- 
2.20.1


* [PATCH v2 1/3] erofs: get compression algorithms directly on mapping
  2021-10-08 20:08 ` Gao Xiang
@ 2021-10-08 20:08   ` Gao Xiang
  -1 siblings, 0 replies; 33+ messages in thread
From: Gao Xiang @ 2021-10-08 20:08 UTC (permalink / raw)
  To: linux-erofs; +Cc: Chao Yu, LKML, Yue Hu, Gao Xiang

From: Gao Xiang <hsiangkao@linux.alibaba.com>

Currently, z_erofs_map_blocks_iter() returns whether extents are
compressed or not, and the decompression frontend then looks up the
specific algorithms.

It works, but not well in several aspects, for example:
 - The decompression frontend has to deal with whether extents are
   compressed or not again and look up the algorithms if compressed.
   It's duplicated and too detailed about the on-disk mapping;

 - A new secondary compression head will be introduced later so that
   each file can have at most 2 compression algorithms for different
   types of data. It could increase the complexity of the decompression
   frontend if still handled in this way;

 - A new readmore decompression strategy will be introduced to get
   better performance for much bigger pclusters and LZMA, which needs
   the specific algorithm in advance as well.

Let's look up compression algorithms in z_erofs_map_blocks_iter()
directly instead.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/compress.h          |  5 -----
 fs/erofs/internal.h          | 12 +++++++++---
 fs/erofs/zdata.c             | 12 ++++++------
 fs/erofs/zmap.c              | 19 ++++++++++---------
 include/trace/events/erofs.h |  2 +-
 5 files changed, 26 insertions(+), 24 deletions(-)
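
For readers skimming the diff, the change in responsibility can be
modelled in userspace roughly as below. This is a simplified sketch,
not the kernel code: all demo_* names, the flag bit positions and the
-1 error value are stand-ins chosen for illustration (the patch itself
returns -EFSCORRUPTED and uses the BH_* based flag values).

```c
#include <assert.h>

/* Stand-ins for the on-disk head types handled by this patch. */
enum demo_cluster_type { DEMO_PLAIN = 0, DEMO_HEAD = 1, DEMO_NONHEAD = 2 };

/* Stand-ins for the algorithm formats; SHIFTED reuses the MAX slot. */
enum demo_algorithm {
	DEMO_COMPRESSION_LZ4 = 0,
	DEMO_COMPRESSION_MAX,
	DEMO_COMPRESSION_SHIFTED = DEMO_COMPRESSION_MAX,
};

#define DEMO_MAP_MAPPED		(1u << 0)	/* assumed bit position */
#define DEMO_MAP_ENCODED	(1u << 1)	/* assumed bit position */

struct demo_map {
	unsigned int m_flags;
	char m_algorithmformat;
};

/* Mapping side: decide the algorithm once, where the head type is known. */
static void demo_fill_map(struct demo_map *map,
			  enum demo_cluster_type headtype,
			  unsigned char primary_alg)
{
	assert(headtype != DEMO_NONHEAD);	/* only head types reach here */
	map->m_flags = DEMO_MAP_MAPPED | DEMO_MAP_ENCODED;
	if (headtype == DEMO_PLAIN)
		map->m_algorithmformat = DEMO_COMPRESSION_SHIFTED;
	else
		map->m_algorithmformat = primary_alg;
}

/* Frontend side: no decision left, just copy what the mapping reported. */
static int demo_register(const struct demo_map *map, char *pcl_alg)
{
	if (!(map->m_flags & DEMO_MAP_ENCODED))
		return -1;	/* placeholder for the patch's error path */
	*pcl_alg = map->m_algorithmformat;
	return 0;
}
```

The point of the split is that demo_register() no longer needs to know
how PLAIN and HEAD lclusters are encoded on disk.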

diff --git a/fs/erofs/compress.h b/fs/erofs/compress.h
index 3701c72bacb2..ad62d1b4d371 100644
--- a/fs/erofs/compress.h
+++ b/fs/erofs/compress.h
@@ -8,11 +8,6 @@
 
 #include "internal.h"
 
-enum {
-	Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
-	Z_EROFS_COMPRESSION_RUNTIME_MAX
-};
-
 struct z_erofs_decompress_req {
 	struct super_block *sb;
 	struct page **in, **out;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 9524e155b38f..48bfc6eb2b02 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -338,7 +338,7 @@ extern const struct address_space_operations z_erofs_aops;
  * of the corresponding uncompressed data in the file.
  */
 enum {
-	BH_Zipped = BH_PrivateStart,
+	BH_Encoded = BH_PrivateStart,
 	BH_FullMapped,
 };
 
@@ -346,8 +346,8 @@ enum {
 #define EROFS_MAP_MAPPED	(1 << BH_Mapped)
 /* Located in metadata (could be copied from bd_inode) */
 #define EROFS_MAP_META		(1 << BH_Meta)
-/* The extent has been compressed */
-#define EROFS_MAP_ZIPPED	(1 << BH_Zipped)
+/* The extent is encoded */
+#define EROFS_MAP_ENCODED	(1 << BH_Encoded)
 /* The length of extent is full */
 #define EROFS_MAP_FULL_MAPPED	(1 << BH_FullMapped)
 
@@ -355,6 +355,7 @@ struct erofs_map_blocks {
 	erofs_off_t m_pa, m_la;
 	u64 m_plen, m_llen;
 
+	char m_algorithmformat;
 	unsigned int m_flags;
 
 	struct page *mpage;
@@ -368,6 +369,11 @@ struct erofs_map_blocks {
  */
 #define EROFS_GET_BLOCKS_FIEMAP	0x0002
 
+enum {
+	Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
+	Z_EROFS_COMPRESSION_RUNTIME_MAX
+};
+
 /* zmap.c */
 extern const struct iomap_ops z_erofs_iomap_report_ops;
 
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 11c7a1aaebad..5c34ef66677f 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -476,6 +476,11 @@ static int z_erofs_register_collection(struct z_erofs_collector *clt,
 	struct erofs_workgroup *grp;
 	int err;
 
+	if (!(map->m_flags & EROFS_MAP_ENCODED)) {
+		DBG_BUGON(1);
+		return -EFSCORRUPTED;
+	}
+
 	/* no available pcluster, let's allocate one */
 	pcl = z_erofs_alloc_pcluster(map->m_plen >> PAGE_SHIFT);
 	if (IS_ERR(pcl))
@@ -483,16 +488,11 @@ static int z_erofs_register_collection(struct z_erofs_collector *clt,
 
 	atomic_set(&pcl->obj.refcount, 1);
 	pcl->obj.index = map->m_pa >> PAGE_SHIFT;
-
+	pcl->algorithmformat = map->m_algorithmformat;
 	pcl->length = (map->m_llen << Z_EROFS_PCLUSTER_LENGTH_BIT) |
 		(map->m_flags & EROFS_MAP_FULL_MAPPED ?
 			Z_EROFS_PCLUSTER_FULL_LENGTH : 0);
 
-	if (map->m_flags & EROFS_MAP_ZIPPED)
-		pcl->algorithmformat = Z_EROFS_COMPRESSION_LZ4;
-	else
-		pcl->algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
-
 	/* new pclusters should be claimed as type 1, primary and followed */
 	pcl->next = clt->owned_head;
 	clt->mode = COLLECT_PRIMARY_FOLLOWED;
diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index 7a6df35fdc91..9d9c26343dab 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -111,7 +111,7 @@ struct z_erofs_maprecorder {
 
 	unsigned long lcn;
 	/* compression extent information gathered */
-	u8  type;
+	u8  type, headtype;
 	u16 clusterofs;
 	u16 delta[2];
 	erofs_blk_t pblk, compressedlcs;
@@ -446,9 +446,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
 		}
 		return z_erofs_extent_lookback(m, m->delta[0]);
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-		map->m_flags &= ~EROFS_MAP_ZIPPED;
-		fallthrough;
 	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+		m->headtype = m->type;
 		map->m_la = (lcn << lclusterbits) | m->clusterofs;
 		break;
 	default:
@@ -472,7 +471,7 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
 
 	DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
 		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
-	if (!(map->m_flags & EROFS_MAP_ZIPPED) ||
+	if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
 	    !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
 		map->m_plen = 1 << lclusterbits;
 		return 0;
@@ -609,16 +608,13 @@ int z_erofs_map_blocks_iter(struct inode *inode,
 	if (err)
 		goto unmap_out;
 
-	map->m_flags = EROFS_MAP_ZIPPED;	/* by default, compressed */
 	end = (m.lcn + 1ULL) << lclusterbits;
 
 	switch (m.type) {
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-		if (endoff >= m.clusterofs)
-			map->m_flags &= ~EROFS_MAP_ZIPPED;
-		fallthrough;
 	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
 		if (endoff >= m.clusterofs) {
+			m.headtype = m.type;
 			map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
 			break;
 		}
@@ -650,12 +646,17 @@ int z_erofs_map_blocks_iter(struct inode *inode,
 
 	map->m_llen = end - map->m_la;
 	map->m_pa = blknr_to_addr(m.pblk);
-	map->m_flags |= EROFS_MAP_MAPPED;
+	map->m_flags = EROFS_MAP_MAPPED | EROFS_MAP_ENCODED;
 
 	err = z_erofs_get_extent_compressedlen(&m, initial_lcn);
 	if (err)
 		goto out;
 
+	if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
+		map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
+	else
+		map->m_algorithmformat = vi->z_algorithmtype[0];
+
 	if (flags & EROFS_GET_BLOCKS_FIEMAP) {
 		err = z_erofs_get_extent_decompressedlen(&m);
 		if (!err)
diff --git a/include/trace/events/erofs.h b/include/trace/events/erofs.h
index db4f2cec8360..16ae7b666810 100644
--- a/include/trace/events/erofs.h
+++ b/include/trace/events/erofs.h
@@ -24,7 +24,7 @@ struct erofs_map_blocks;
 #define show_mflags(flags) __print_flags(flags, "",	\
 	{ EROFS_MAP_MAPPED,	"M" },			\
 	{ EROFS_MAP_META,	"I" },			\
-	{ EROFS_MAP_ZIPPED,	"Z" })
+	{ EROFS_MAP_ENCODED,	"E" })
 
 TRACE_EVENT(erofs_lookup,
 
-- 
2.20.1


* [PATCH v2 2/3] erofs: introduce the secondary compression head
  2021-10-08 20:08 ` Gao Xiang
@ 2021-10-08 20:08   ` Gao Xiang
  -1 siblings, 0 replies; 33+ messages in thread
From: Gao Xiang @ 2021-10-08 20:08 UTC (permalink / raw)
  To: linux-erofs; +Cc: Chao Yu, LKML, Yue Hu, Gao Xiang

From: Gao Xiang <hsiangkao@linux.alibaba.com>

Previously, each head lcluster could be either a HEAD or a PLAIN
lcluster, indicating whether the whole pcluster is compressed or not.

In this patch, a new HEAD2 head type is introduced to specify another
compression algorithm other than the primary algorithm for each
compressed file, which can be used for upcoming LZMA compression and
LZ4 range dictionary compression for various data patterns.

It has been on the EROFS roadmap for years. Complete it now!

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/erofs_fs.h |  8 +++++---
 fs/erofs/zmap.c     | 36 +++++++++++++++++++++++++++---------
 2 files changed, 32 insertions(+), 12 deletions(-)
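
The head type to algorithm selection and the per-head big pcluster
check in this patch can be sketched in userspace as follows. All h2_*
names are hypothetical. The advise bit values, the low-nibble mask for
the primary algorithm and the SHIFTED placeholder value are assumptions
made for the sketch; only the ">> 4" for the secondary algorithm and
the head type numbers are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Head type numbers as in the enum touched by this patch. */
enum {
	H2_TYPE_PLAIN	= 0,
	H2_TYPE_HEAD1	= 1,
	H2_TYPE_NONHEAD	= 2,
	H2_TYPE_HEAD2	= 3,
};

#define H2_ADVISE_BIG_PCLUSTER_1	0x0002u	/* assumed value */
#define H2_ADVISE_BIG_PCLUSTER_2	0x0004u	/* assumed value */
#define H2_COMPRESSION_SHIFTED		16	/* placeholder, outside 0..15 */

struct h2_inode {
	unsigned char algorithmtype[2];	/* [0] for HEAD1, [1] for HEAD2 */
	unsigned int advise;
};

/* Split the packed header byte into the two per-head algorithm ids. */
static void h2_split_algorithms(struct h2_inode *vi,
				unsigned char h_algorithmtype)
{
	vi->algorithmtype[0] = h_algorithmtype & 15;	/* assumed mask */
	vi->algorithmtype[1] = h_algorithmtype >> 4;	/* as in the diff */
}

/* PLAIN is stored shifted, HEAD2 uses slot 1, HEAD1 uses slot 0. */
static int h2_pick_algorithm(const struct h2_inode *vi, int headtype)
{
	assert(headtype != H2_TYPE_NONHEAD);
	if (headtype == H2_TYPE_PLAIN)
		return H2_COMPRESSION_SHIFTED;
	if (headtype == H2_TYPE_HEAD2)
		return vi->algorithmtype[1];
	return vi->algorithmtype[0];
}

/* True if the pcluster is exactly one lcluster (no big pcluster). */
static bool h2_single_lcluster(const struct h2_inode *vi, int headtype)
{
	return headtype == H2_TYPE_PLAIN ||
	       (headtype == H2_TYPE_HEAD1 &&
		!(vi->advise & H2_ADVISE_BIG_PCLUSTER_1)) ||
	       (headtype == H2_TYPE_HEAD2 &&
		!(vi->advise & H2_ADVISE_BIG_PCLUSTER_2));
}
```

Each head type consults its own advise bit, so a file may use big
pclusters for one algorithm and single-lcluster pclusters for the
other.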

diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
index b0b23f41abc3..f579c8c78fff 100644
--- a/fs/erofs/erofs_fs.h
+++ b/fs/erofs/erofs_fs.h
@@ -21,11 +21,13 @@
 #define EROFS_FEATURE_INCOMPAT_COMPR_CFGS	0x00000002
 #define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER	0x00000002
 #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE	0x00000004
+#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2	0x00000008
 #define EROFS_ALL_FEATURE_INCOMPAT		\
 	(EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
 	 EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
 	 EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
-	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE)
+	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
+	 EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)
 
 #define EROFS_SB_EXTSLOT_SIZE	16
 
@@ -314,9 +316,9 @@ struct z_erofs_map_header {
  */
 enum {
 	Z_EROFS_VLE_CLUSTER_TYPE_PLAIN		= 0,
-	Z_EROFS_VLE_CLUSTER_TYPE_HEAD		= 1,
+	Z_EROFS_VLE_CLUSTER_TYPE_HEAD1		= 1,
 	Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD	= 2,
-	Z_EROFS_VLE_CLUSTER_TYPE_RESERVED	= 3,
+	Z_EROFS_VLE_CLUSTER_TYPE_HEAD2		= 3,
 	Z_EROFS_VLE_CLUSTER_TYPE_MAX
 };
 
diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index 9d9c26343dab..03945f15ceae 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -69,11 +69,17 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
 	vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;
 
 	if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
-		erofs_err(sb, "unknown compression format %u for nid %llu, please upgrade kernel",
+		erofs_err(sb, "unknown HEAD1 format %u for nid %llu, please upgrade kernel",
 			  vi->z_algorithmtype[0], vi->nid);
 		err = -EOPNOTSUPP;
 		goto unmap_done;
 	}
+	if (vi->z_algorithmtype[1] >= Z_EROFS_COMPRESSION_MAX) {
+		erofs_err(sb, "unknown HEAD2 format %u for nid %llu, please upgrade kernel",
+			  vi->z_algorithmtype[1], vi->nid);
+		err = -EOPNOTSUPP;
+		goto unmap_done;
+	}
 
 	vi->z_logical_clusterbits = LOG_BLOCK_SIZE + (h->h_clusterbits & 7);
 	if (!erofs_sb_has_big_pcluster(EROFS_SB(sb)) &&
@@ -189,7 +195,8 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
 		m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
 		break;
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
 		m->clusterofs = le16_to_cpu(di->di_clusterofs);
 		m->pblk = le32_to_cpu(di->di_u.blkaddr);
 		break;
@@ -446,7 +453,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
 		}
 		return z_erofs_extent_lookback(m, m->delta[0]);
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
 		m->headtype = m->type;
 		map->m_la = (lcn << lclusterbits) | m->clusterofs;
 		break;
@@ -470,13 +478,18 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
 	int err;
 
 	DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
-		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
+		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 &&
+		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD2);
+	DBG_BUGON(m->type != m->headtype);
+
 	if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
-	    !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
+	    ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1) &&
+	     !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) ||
+	    ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) &&
+	     !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_2))) {
 		map->m_plen = 1 << lclusterbits;
 		return 0;
 	}
-
 	lcn = m->lcn + 1;
 	if (m->compressedlcs)
 		goto out;
@@ -498,7 +511,8 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
 
 	switch (m->type) {
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
 		/*
 		 * if the 1st NONHEAD lcluster is actually PLAIN or HEAD type
 		 * rather than CBLKCNT, it's a 1 lcluster-sized pcluster.
@@ -553,7 +567,8 @@ static int z_erofs_get_extent_decompressedlen(struct z_erofs_maprecorder *m)
 			DBG_BUGON(!m->delta[1] &&
 				  m->clusterofs != 1 << lclusterbits);
 		} else if (m->type == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
-			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD) {
+			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 ||
+			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) {
 			/* go on until the next HEAD lcluster */
 			if (lcn != headlcn)
 				break;
@@ -612,7 +627,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
 
 	switch (m.type) {
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
 		if (endoff >= m.clusterofs) {
 			m.headtype = m.type;
 			map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
@@ -654,6 +670,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
 
 	if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
 		map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
+	else if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2)
+		map->m_algorithmformat = vi->z_algorithmtype[1];
 	else
 		map->m_algorithmformat = vi->z_algorithmtype[0];
 
-- 
2.20.1


* [PATCH v2 3/3] erofs: introduce readmore decompression strategy
  2021-10-08 20:08 ` Gao Xiang
@ 2021-10-08 20:08   ` Gao Xiang
  -1 siblings, 0 replies; 33+ messages in thread
From: Gao Xiang @ 2021-10-08 20:08 UTC (permalink / raw)
  To: linux-erofs; +Cc: Chao Yu, LKML, Yue Hu, Gao Xiang

From: Gao Xiang <hsiangkao@linux.alibaba.com>

Previously, the EROFS decompression strategy strictly followed the
readahead window in order to minimize extra memory footprint.
However, this can be inefficient when only part of the data is
requested from much bigger LZ4 pclusters or with the upcoming LZMA
implementation.

Instead, for the LZ4 approach, let's first try to request the leading
data in a pcluster without triggering memory reclaim, which boosts
100% randread on large big pclusters and has no real impact on
low-memory scenarios.

It also introduces a way to expand read lengths in order to decompress
the whole pcluster, which is useful for LZMA since the algorithm
itself is relatively slow and CPU-bound, whereas LZ4 is not.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/internal.h | 13 ++++++
 fs/erofs/zdata.c    | 99 ++++++++++++++++++++++++++++++++++++---------
 2 files changed, 93 insertions(+), 19 deletions(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 48bfc6eb2b02..7f96265ccbdb 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -307,6 +307,19 @@ static inline unsigned int erofs_inode_datalayout(unsigned int value)
 			      EROFS_I_DATALAYOUT_BITS);
 }
 
+/*
+ * Different from grab_cache_page_nowait(), reclaiming is never triggered
+ * when allocating new pages.
+ */
+static inline
+struct page *erofs_grab_cache_page_nowait(struct address_space *mapping,
+					  pgoff_t index)
+{
+	return pagecache_get_page(mapping, index,
+			FGP_LOCK|FGP_CREAT|FGP_NOFS|FGP_NOWAIT,
+			readahead_gfp_mask(mapping) & ~__GFP_RECLAIM);
+}
+
 extern const struct super_operations erofs_sops;
 
 extern const struct address_space_operations erofs_raw_access_aops;
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 5c34ef66677f..febb018e10a7 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1377,6 +1377,72 @@ static void z_erofs_runqueue(struct super_block *sb,
 	z_erofs_decompress_queue(&io[JQ_SUBMIT], pagepool);
 }
 
+/*
+ * Since partial uptodate is still unimplemented for now, we have to use
+ * approximate readmore strategies as a start.
+ */
+static void z_erofs_pcluster_readmore(struct z_erofs_decompress_frontend *f,
+				      struct readahead_control *rac,
+				      erofs_off_t end,
+				      struct list_head *pagepool,
+				      bool backmost)
+{
+	struct inode *inode = f->inode;
+	struct erofs_map_blocks *map = &f->map;
+	erofs_off_t cur;
+	int err;
+
+	if (backmost) {
+		map->m_la = end;
+		/* TODO: pass in EROFS_GET_BLOCKS_READMORE for LZMA later */
+		err = z_erofs_map_blocks_iter(inode, map, 0);
+		if (err)
+			return;
+
+		/* expand ra for the trailing edge if readahead */
+		if (rac) {
+			loff_t newstart = readahead_pos(rac);
+
+			cur = round_up(map->m_la + map->m_llen, PAGE_SIZE);
+			readahead_expand(rac, newstart, cur - newstart);
+			return;
+		}
+		end = round_up(end, PAGE_SIZE);
+	} else {
+		end = round_up(map->m_la, PAGE_SIZE);
+
+		if (!map->m_llen)
+			return;
+	}
+
+	cur = map->m_la + map->m_llen - 1;
+	while (cur >= end) {
+		pgoff_t index = cur >> PAGE_SHIFT;
+		struct page *page;
+
+		page = erofs_grab_cache_page_nowait(inode->i_mapping, index);
+		if (!page)
+			goto skip;
+
+		if (PageUptodate(page)) {
+			unlock_page(page);
+			put_page(page);
+			goto skip;
+		}
+
+		err = z_erofs_do_read_page(f, page, pagepool);
+		if (err)
+			erofs_err(inode->i_sb,
+				  "readmore error at page %lu @ nid %llu",
+				  index, EROFS_I(inode)->nid);
+		put_page(page);
+skip:
+		if (cur < PAGE_SIZE)
+			break;
+		cur = (index << PAGE_SHIFT) - 1;
+	}
+}
+
 static int z_erofs_readpage(struct file *file, struct page *page)
 {
 	struct inode *const inode = page->mapping->host;
@@ -1385,10 +1451,13 @@ static int z_erofs_readpage(struct file *file, struct page *page)
 	LIST_HEAD(pagepool);
 
 	trace_erofs_readpage(page, false);
-
 	f.headoffset = (erofs_off_t)page->index << PAGE_SHIFT;
 
+	z_erofs_pcluster_readmore(&f, NULL, f.headoffset + PAGE_SIZE - 1,
+				  &pagepool, true);
 	err = z_erofs_do_read_page(&f, page, &pagepool);
+	z_erofs_pcluster_readmore(&f, NULL, 0, &pagepool, false);
+
 	(void)z_erofs_collector_end(&f.clt);
 
 	/* if some compressed cluster ready, need submit them anyway */
@@ -1409,29 +1478,20 @@ static void z_erofs_readahead(struct readahead_control *rac)
 {
 	struct inode *const inode = rac->mapping->host;
 	struct erofs_sb_info *const sbi = EROFS_I_SB(inode);
-
-	unsigned int nr_pages = readahead_count(rac);
-	bool sync = (sbi->ctx.readahead_sync_decompress &&
-			nr_pages <= sbi->ctx.max_sync_decompress_pages);
 	struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
 	struct page *page, *head = NULL;
+	unsigned int nr_pages;
 	LIST_HEAD(pagepool);
 
-	trace_erofs_readpages(inode, readahead_index(rac), nr_pages, false);
-
 	f.readahead = true;
 	f.headoffset = readahead_pos(rac);
 
-	while ((page = readahead_page(rac))) {
-		prefetchw(&page->flags);
-
-		/*
-		 * A pure asynchronous readahead is indicated if
-		 * a PG_readahead marked page is hitted at first.
-		 * Let's also do asynchronous decompression for this case.
-		 */
-		sync &= !(PageReadahead(page) && !head);
+	z_erofs_pcluster_readmore(&f, rac, f.headoffset +
+				  readahead_length(rac) - 1, &pagepool, true);
+	nr_pages = readahead_count(rac);
+	trace_erofs_readpages(inode, readahead_index(rac), nr_pages, false);
 
+	while ((page = readahead_page(rac))) {
 		set_page_private(page, (unsigned long)head);
 		head = page;
 	}
@@ -1450,11 +1510,12 @@ static void z_erofs_readahead(struct readahead_control *rac)
 				  page->index, EROFS_I(inode)->nid);
 		put_page(page);
 	}
-
+	z_erofs_pcluster_readmore(&f, rac, 0, &pagepool, false);
 	(void)z_erofs_collector_end(&f.clt);
 
-	z_erofs_runqueue(inode->i_sb, &f, &pagepool, sync);
-
+	z_erofs_runqueue(inode->i_sb, &f, &pagepool,
+			 sbi->ctx.readahead_sync_decompress &&
+			 nr_pages <= sbi->ctx.max_sync_decompress_pages);
 	if (f.map.mpage)
 		put_page(f.map.mpage);
 
-- 
2.20.1
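
The offset arithmetic in z_erofs_pcluster_readmore() above can be tried out in userspace with a rough sketch like the one below. It assumes 4 KiB pages and only models the numbers: how far the readahead window is expanded towards the end of the extent, and which page indexes the backward walk visits. All sketch_* names are hypothetical, and the page cache calls (grabbing, locking and reading pages) are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE	(1ULL << SKETCH_PAGE_SHIFT)

static uint64_t sketch_round_up_page(uint64_t x)
{
	return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Readahead case: new window length so that the window starting at ra_pos
 * ends at the page-aligned end of the extent [m_la, m_la + m_llen).
 */
static uint64_t sketch_expanded_ra_len(uint64_t ra_pos, uint64_t m_la,
				       uint64_t m_llen)
{
	uint64_t cur = sketch_round_up_page(m_la + m_llen);

	assert(cur >= ra_pos);
	return cur - ra_pos;
}

/*
 * Backward walk: start at the last byte of the extent and step down one
 * page at a time until the offset drops below 'end'. Visited page indexes
 * are stored in 'visited' (at most 'max' of them); the count is returned.
 */
static unsigned int sketch_walk_pages(uint64_t m_la, uint64_t m_llen,
				      uint64_t end, uint64_t *visited,
				      unsigned int max)
{
	unsigned int n = 0;
	uint64_t cur;

	if (!m_llen)
		return 0;

	cur = m_la + m_llen - 1;
	while (cur >= end && n < max) {
		uint64_t index = cur >> SKETCH_PAGE_SHIFT;

		visited[n++] = index;
		if (cur < SKETCH_PAGE_SIZE)	/* page 0 reached, avoid underflow */
			break;
		cur = (index << SKETCH_PAGE_SHIFT) - 1;
	}
	return n;
}
```

For example, with a 16 KiB extent starting at offset 4096 and a single-page read of page 2, the trailing-edge walk (end rounded up to 12288) would visit pages 4 and 3. In the patch itself, pages that cannot be grabbed without reclaim or that are already up to date are simply skipped.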


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/3] erofs: get compression algorithms directly on mapping
  2021-10-08 20:08   ` Gao Xiang
@ 2021-10-09  1:52     ` Yue Hu
  -1 siblings, 0 replies; 33+ messages in thread
From: Yue Hu @ 2021-10-09  1:52 UTC (permalink / raw)
  To: Gao Xiang
  Cc: linux-erofs, Chao Yu, LKML, Gao Xiang, huyue2, zhangwen, shaojunjun

On Sat,  9 Oct 2021 04:08:37 +0800
Gao Xiang <xiang@kernel.org> wrote:

> From: Gao Xiang <hsiangkao@linux.alibaba.com>
> 
> Currently, z_erofs_map_blocks_iter() returns whether extents are
> compressed or not, and the decompression frontend gets the specific
> algorithms then.
> 
> It works but not quite well in many aspects, for example:
>  - The decompression frontend has to deal with whether extents are
>    compressed or not again and look up the algorithms if compressed.
>    It's duplicated and too detailed about the on-disk mapping;
> 
>  - A new secondary compression head will be introduced later so that
>    each file can have 2 compression algorithms at most for different
>    types of data. It could increase the complexity of the decompression
>    frontend if still handled in this way;
> 
>  - A new readmore decompression strategy will be introduced to get
>    better performance for much bigger pclusters and LZMA, which needs
>    the specific algorithm in advance as well.
> 
> Let's look up compression algorithms in z_erofs_map_blocks_iter()
> directly instead.
> 
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
>  fs/erofs/compress.h          |  5 -----
>  fs/erofs/internal.h          | 12 +++++++++---
>  fs/erofs/zdata.c             | 12 ++++++------
>  fs/erofs/zmap.c              | 19 ++++++++++---------
>  include/trace/events/erofs.h |  2 +-
>  5 files changed, 26 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/erofs/compress.h b/fs/erofs/compress.h
> index 3701c72bacb2..ad62d1b4d371 100644
> --- a/fs/erofs/compress.h
> +++ b/fs/erofs/compress.h
> @@ -8,11 +8,6 @@
>  
>  #include "internal.h"
>  
> -enum {
> -	Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
> -	Z_EROFS_COMPRESSION_RUNTIME_MAX
> -};
> -
>  struct z_erofs_decompress_req {
>  	struct super_block *sb;
>  	struct page **in, **out;
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 9524e155b38f..48bfc6eb2b02 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -338,7 +338,7 @@ extern const struct address_space_operations z_erofs_aops;
>   * of the corresponding uncompressed data in the file.
>   */
>  enum {
> -	BH_Zipped = BH_PrivateStart,
> +	BH_Encoded = BH_PrivateStart,
>  	BH_FullMapped,
>  };
>  
> @@ -346,8 +346,8 @@ enum {
>  #define EROFS_MAP_MAPPED	(1 << BH_Mapped)
>  /* Located in metadata (could be copied from bd_inode) */
>  #define EROFS_MAP_META		(1 << BH_Meta)
> -/* The extent has been compressed */
> -#define EROFS_MAP_ZIPPED	(1 << BH_Zipped)
> +/* The extent is encoded */
> +#define EROFS_MAP_ENCODED	(1 << BH_Encoded)
>  /* The length of extent is full */
>  #define EROFS_MAP_FULL_MAPPED	(1 << BH_FullMapped)
>  
> @@ -355,6 +355,7 @@ struct erofs_map_blocks {
>  	erofs_off_t m_pa, m_la;
>  	u64 m_plen, m_llen;
>  
> +	char m_algorithmformat;
>  	unsigned int m_flags;
>  
>  	struct page *mpage;
> @@ -368,6 +369,11 @@ struct erofs_map_blocks {
>   */
>  #define EROFS_GET_BLOCKS_FIEMAP	0x0002
>  
> +enum {
> +	Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
> +	Z_EROFS_COMPRESSION_RUNTIME_MAX
> +};
> +
>  /* zmap.c */
>  extern const struct iomap_ops z_erofs_iomap_report_ops;
>  
> diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> index 11c7a1aaebad..5c34ef66677f 100644
> --- a/fs/erofs/zdata.c
> +++ b/fs/erofs/zdata.c
> @@ -476,6 +476,11 @@ static int z_erofs_register_collection(struct z_erofs_collector *clt,
>  	struct erofs_workgroup *grp;
>  	int err;
>  
> +	if (!(map->m_flags & EROFS_MAP_ENCODED)) {
> +		DBG_BUGON(1);
> +		return -EFSCORRUPTED;
> +	}
> +
>  	/* no available pcluster, let's allocate one */
>  	pcl = z_erofs_alloc_pcluster(map->m_plen >> PAGE_SHIFT);
>  	if (IS_ERR(pcl))
> @@ -483,16 +488,11 @@ static int z_erofs_register_collection(struct z_erofs_collector *clt,
>  
>  	atomic_set(&pcl->obj.refcount, 1);
>  	pcl->obj.index = map->m_pa >> PAGE_SHIFT;
> -
> +	pcl->algorithmformat = map->m_algorithmformat;
>  	pcl->length = (map->m_llen << Z_EROFS_PCLUSTER_LENGTH_BIT) |
>  		(map->m_flags & EROFS_MAP_FULL_MAPPED ?
>  			Z_EROFS_PCLUSTER_FULL_LENGTH : 0);
>  
> -	if (map->m_flags & EROFS_MAP_ZIPPED)
> -		pcl->algorithmformat = Z_EROFS_COMPRESSION_LZ4;
> -	else
> -		pcl->algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
> -
>  	/* new pclusters should be claimed as type 1, primary and followed */
>  	pcl->next = clt->owned_head;
>  	clt->mode = COLLECT_PRIMARY_FOLLOWED;
> diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
> index 7a6df35fdc91..9d9c26343dab 100644
> --- a/fs/erofs/zmap.c
> +++ b/fs/erofs/zmap.c
> @@ -111,7 +111,7 @@ struct z_erofs_maprecorder {
>  
>  	unsigned long lcn;
>  	/* compression extent information gathered */
> -	u8  type;
> +	u8  type, headtype;
>  	u16 clusterofs;
>  	u16 delta[2];
>  	erofs_blk_t pblk, compressedlcs;
> @@ -446,9 +446,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
>  		}
>  		return z_erofs_extent_lookback(m, m->delta[0]);
>  	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> -		map->m_flags &= ~EROFS_MAP_ZIPPED;
> -		fallthrough;
>  	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> +		m->headtype = m->type;
>  		map->m_la = (lcn << lclusterbits) | m->clusterofs;
>  		break;
>  	default:
> @@ -472,7 +471,7 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
>  
>  	DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
>  		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
> -	if (!(map->m_flags & EROFS_MAP_ZIPPED) ||
> +	if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
>  	    !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
>  		map->m_plen = 1 << lclusterbits;
>  		return 0;
> @@ -609,16 +608,13 @@ int z_erofs_map_blocks_iter(struct inode *inode,
>  	if (err)
>  		goto unmap_out;
>  
> -	map->m_flags = EROFS_MAP_ZIPPED;	/* by default, compressed */
>  	end = (m.lcn + 1ULL) << lclusterbits;
>  
>  	switch (m.type) {
>  	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> -		if (endoff >= m.clusterofs)
> -			map->m_flags &= ~EROFS_MAP_ZIPPED;
> -		fallthrough;
>  	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
>  		if (endoff >= m.clusterofs) {
> +			m.headtype = m.type;
>  			map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
>  			break;
>  		}
> @@ -650,12 +646,17 @@ int z_erofs_map_blocks_iter(struct inode *inode,
>  
>  	map->m_llen = end - map->m_la;
>  	map->m_pa = blknr_to_addr(m.pblk);
> -	map->m_flags |= EROFS_MAP_MAPPED;
> +	map->m_flags = EROFS_MAP_MAPPED | EROFS_MAP_ENCODED;
>  
>  	err = z_erofs_get_extent_compressedlen(&m, initial_lcn);
>  	if (err)
>  		goto out;
>  
> +	if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
> +		map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
> +	else
> +		map->m_algorithmformat = vi->z_algorithmtype[0];
> +
>  	if (flags & EROFS_GET_BLOCKS_FIEMAP) {
>  		err = z_erofs_get_extent_decompressedlen(&m);
>  		if (!err)
> diff --git a/include/trace/events/erofs.h b/include/trace/events/erofs.h
> index db4f2cec8360..16ae7b666810 100644
> --- a/include/trace/events/erofs.h
> +++ b/include/trace/events/erofs.h
> @@ -24,7 +24,7 @@ struct erofs_map_blocks;
>  #define show_mflags(flags) __print_flags(flags, "",	\
>  	{ EROFS_MAP_MAPPED,	"M" },			\
>  	{ EROFS_MAP_META,	"I" },			\
> -	{ EROFS_MAP_ZIPPED,	"Z" })
> +	{ EROFS_MAP_ENCODED,	"E" })

Looks good to me.

Reviewed-by: Yue Hu <huyue2@yulong.com>

>  
>  TRACE_EVENT(erofs_lookup,
>  


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/3] erofs: introduce the secondary compression head
  2021-10-08 20:08   ` Gao Xiang
@ 2021-10-09  3:50     ` Yue Hu
  -1 siblings, 0 replies; 33+ messages in thread
From: Yue Hu @ 2021-10-09  3:50 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-erofs, Chao Yu, LKML, Gao Xiang, huyue2, zhangwen

On Sat,  9 Oct 2021 04:08:38 +0800
Gao Xiang <xiang@kernel.org> wrote:

> From: Gao Xiang <hsiangkao@linux.alibaba.com>
> 
> Previously, each head lcluster could be either a HEAD or a PLAIN
> lcluster, indicating whether the whole pcluster is compressed or not.
> 
> In this patch, a new HEAD2 head type is introduced to specify another
> compression algorithm other than the primary algorithm for each
> compressed file, which can be used for upcoming LZMA compression and
> LZ4 range dictionary compression for various data patterns.
> 
> It has stayed on the EROFS roadmap for years. Complete it now!
> 
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
>  fs/erofs/erofs_fs.h |  8 +++++---
>  fs/erofs/zmap.c     | 36 +++++++++++++++++++++++++++---------
>  2 files changed, 32 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
> index b0b23f41abc3..f579c8c78fff 100644
> --- a/fs/erofs/erofs_fs.h
> +++ b/fs/erofs/erofs_fs.h
> @@ -21,11 +21,13 @@
>  #define EROFS_FEATURE_INCOMPAT_COMPR_CFGS	0x00000002
>  #define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER	0x00000002
>  #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE	0x00000004
> +#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2	0x00000008
>  #define EROFS_ALL_FEATURE_INCOMPAT		\
>  	(EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
>  	 EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
>  	 EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
> -	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE)
> +	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
> +	 EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)
>  
>  #define EROFS_SB_EXTSLOT_SIZE	16
>  
> @@ -314,9 +316,9 @@ struct z_erofs_map_header {
>   */
>  enum {
>  	Z_EROFS_VLE_CLUSTER_TYPE_PLAIN		= 0,
> -	Z_EROFS_VLE_CLUSTER_TYPE_HEAD		= 1,
> +	Z_EROFS_VLE_CLUSTER_TYPE_HEAD1		= 1,
>  	Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD	= 2,
> -	Z_EROFS_VLE_CLUSTER_TYPE_RESERVED	= 3,
> +	Z_EROFS_VLE_CLUSTER_TYPE_HEAD2		= 3,
>  	Z_EROFS_VLE_CLUSTER_TYPE_MAX
>  };
>  
> diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
> index 9d9c26343dab..03945f15ceae 100644
> --- a/fs/erofs/zmap.c
> +++ b/fs/erofs/zmap.c
> @@ -69,11 +69,17 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
>  	vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;
>  
>  	if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
> -		erofs_err(sb, "unknown compression format %u for nid %llu, please upgrade kernel",
> +		erofs_err(sb, "unknown HEAD1 format %u for nid %llu, please upgrade kernel",
>  			  vi->z_algorithmtype[0], vi->nid);
>  		err = -EOPNOTSUPP;
>  		goto unmap_done;
>  	}
> +	if (vi->z_algorithmtype[1] >= Z_EROFS_COMPRESSION_MAX) {
> +		erofs_err(sb, "unknown HEAD2 format %u for nid %llu, please upgrade kernel",
> +			  vi->z_algorithmtype[1], vi->nid);
> +		err = -EOPNOTSUPP;
> +		goto unmap_done;
> +	}

This seems a little duplicated; how about the code below?

	if (vi->z_algorithmtype[i] >= Z_EROFS_COMPRESSION_MAX ||
	    vi->z_algorithmtype[++i] >= Z_EROFS_COMPRESSION_MAX) {
		erofs_err(sb, "unknown HEAD%u format %u for nid %llu, please upgrade kernel",
			  i, vi->z_algorithmtype[i], vi->nid);
		err = -EOPNOTSUPP;
		goto unmap_done;
	}
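
Another way to fold the two checks into a single error path, while keeping the 1-based HEAD1/HEAD2 naming in the message, is a small loop. The following is only a userspace sketch under stated assumptions: sketch_first_unknown_head() and its 'max' parameter are hypothetical and stand in for the comparison against Z_EROFS_COMPRESSION_MAX.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 0 if both algorithm types are known, otherwise the 1-based
 * number of the first HEAD whose type is >= max (1 for HEAD1, 2 for HEAD2).
 */
static int sketch_first_unknown_head(const uint8_t algo[2], unsigned int max)
{
	int i;

	assert(max > 0);
	for (i = 0; i < 2; i++)
		if (algo[i] >= max)
			return i + 1;
	return 0;
}
```

A caller could then print "unknown HEAD%d format %u" using the returned number and algo[number - 1] before bailing out with -EOPNOTSUPP, so only one erofs_err() call would be needed.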

>  
>  	vi->z_logical_clusterbits = LOG_BLOCK_SIZE + (h->h_clusterbits & 7);
>  	if (!erofs_sb_has_big_pcluster(EROFS_SB(sb)) &&
> @@ -189,7 +195,8 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
>  		m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
>  		break;
>  	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> -	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
>  		m->clusterofs = le16_to_cpu(di->di_clusterofs);
>  		m->pblk = le32_to_cpu(di->di_u.blkaddr);
>  		break;
> @@ -446,7 +453,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
>  		}
>  		return z_erofs_extent_lookback(m, m->delta[0]);
>  	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> -	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
>  		m->headtype = m->type;
>  		map->m_la = (lcn << lclusterbits) | m->clusterofs;
>  		break;
> @@ -470,13 +478,18 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
>  	int err;
>  
>  	DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
> -		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
> +		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 &&
> +		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD2);
> +	DBG_BUGON(m->type != m->headtype);
> +
>  	if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
> -	    !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
> +	    ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1) &&
> +	     !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) ||
> +	    ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) &&
> +	     !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_2))) {
>  		map->m_plen = 1 << lclusterbits;
>  		return 0;
>  	}
> -
>  	lcn = m->lcn + 1;
>  	if (m->compressedlcs)
>  		goto out;
> @@ -498,7 +511,8 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
>  
>  	switch (m->type) {
>  	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> -	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
>  		/*
>  		 * if the 1st NONHEAD lcluster is actually PLAIN or HEAD type
>  		 * rather than CBLKCNT, it's a 1 lcluster-sized pcluster.
> @@ -553,7 +567,8 @@ static int z_erofs_get_extent_decompressedlen(struct z_erofs_maprecorder *m)
>  			DBG_BUGON(!m->delta[1] &&
>  				  m->clusterofs != 1 << lclusterbits);
>  		} else if (m->type == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
> -			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD) {
> +			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 ||
> +			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) {
>  			/* go on until the next HEAD lcluster */
>  			if (lcn != headlcn)
>  				break;
> @@ -612,7 +627,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
>  
>  	switch (m.type) {
>  	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> -	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
>  		if (endoff >= m.clusterofs) {
>  			m.headtype = m.type;
>  			map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
> @@ -654,6 +670,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
>  
>  	if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
>  		map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
> +	else if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2)
> +		map->m_algorithmformat = vi->z_algorithmtype[1];
>  	else
>  		map->m_algorithmformat = vi->z_algorithmtype[0];
>  
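
As a reading aid for the z_erofs_get_extent_compressedlen() hunk quoted
above, here is a minimal userspace sketch of the early-return condition:
a pcluster is exactly one lcluster long when the head is PLAIN, or when
the big pcluster advise bit matching its head type is clear. The enum
names, the function name and the two bit values are assumptions made for
illustration only; they are not copied from the kernel headers.

```c
#include <stdbool.h>

/* Head types, numbered as in the enum changed by this patch. */
enum { PLAIN = 0, HEAD1 = 1, NONHEAD = 2, HEAD2 = 3 };

/* Illustrative advise bits (assumed values, one bit per head type). */
#define ADVISE_BIG_PCLUSTER_1	0x0002u
#define ADVISE_BIG_PCLUSTER_2	0x0004u

/*
 * Hypothetical helper: true when the pcluster occupies a single
 * lcluster, i.e. when the quoted code would set
 * map->m_plen = 1 << lclusterbits and return early.
 */
static bool is_single_lcluster_pcluster(int headtype, unsigned int advise)
{
	return headtype == PLAIN ||
	       (headtype == HEAD1 && !(advise & ADVISE_BIG_PCLUSTER_1)) ||
	       (headtype == HEAD2 && !(advise & ADVISE_BIG_PCLUSTER_2));
}
```

Each head type consults only its own advise bit, so a file could use
big pclusters for one algorithm and single-lcluster pclusters for the
other.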


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/3] erofs: introduce the secondary compression head
  2021-10-09  3:50     ` Yue Hu
@ 2021-10-09  4:47       ` Gao Xiang
  -1 siblings, 0 replies; 33+ messages in thread
From: Gao Xiang @ 2021-10-09  4:47 UTC (permalink / raw)
  To: Yue Hu; +Cc: Gao Xiang, linux-erofs, Chao Yu, LKML, huyue2, zhangwen

Hi Yue,

On Sat, Oct 09, 2021 at 11:50:32AM +0800, Yue Hu wrote:
> On Sat,  9 Oct 2021 04:08:38 +0800
> Gao Xiang <xiang@kernel.org> wrote:
> 
> > From: Gao Xiang <hsiangkao@linux.alibaba.com>
> > 
> > Previously, each HEAD lcluster could be either a HEAD or a PLAIN
> > lcluster, indicating whether the whole pcluster is compressed or not.
> > 
> > In this patch, a new HEAD2 head type is introduced to specify a second
> > compression algorithm, in addition to the primary algorithm, for each
> > compressed file. It can be used for the upcoming LZMA compression and
> > LZ4 range dictionary compression for various data patterns.
> > 
> > It has been on the EROFS roadmap for years. Complete it now!
> > 
> > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > ---
> >  fs/erofs/erofs_fs.h |  8 +++++---
> >  fs/erofs/zmap.c     | 36 +++++++++++++++++++++++++++---------
> >  2 files changed, 32 insertions(+), 12 deletions(-)
> > 
> > diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
> > index b0b23f41abc3..f579c8c78fff 100644
> > --- a/fs/erofs/erofs_fs.h
> > +++ b/fs/erofs/erofs_fs.h
> > @@ -21,11 +21,13 @@
> >  #define EROFS_FEATURE_INCOMPAT_COMPR_CFGS	0x00000002
> >  #define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER	0x00000002
> >  #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE	0x00000004
> > +#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2	0x00000008
> >  #define EROFS_ALL_FEATURE_INCOMPAT		\
> >  	(EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
> >  	 EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
> >  	 EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
> > -	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE)
> > +	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
> > +	 EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)
> >  
> >  #define EROFS_SB_EXTSLOT_SIZE	16
> >  
> > @@ -314,9 +316,9 @@ struct z_erofs_map_header {
> >   */
> >  enum {
> >  	Z_EROFS_VLE_CLUSTER_TYPE_PLAIN		= 0,
> > -	Z_EROFS_VLE_CLUSTER_TYPE_HEAD		= 1,
> > +	Z_EROFS_VLE_CLUSTER_TYPE_HEAD1		= 1,
> >  	Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD	= 2,
> > -	Z_EROFS_VLE_CLUSTER_TYPE_RESERVED	= 3,
> > +	Z_EROFS_VLE_CLUSTER_TYPE_HEAD2		= 3,
> >  	Z_EROFS_VLE_CLUSTER_TYPE_MAX
> >  };
> >  
> > diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
> > index 9d9c26343dab..03945f15ceae 100644
> > --- a/fs/erofs/zmap.c
> > +++ b/fs/erofs/zmap.c
> > @@ -69,11 +69,17 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
> >  	vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;
> >  
> >  	if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
> > -		erofs_err(sb, "unknown compression format %u for nid %llu, please upgrade kernel",
> > +		erofs_err(sb, "unknown HEAD1 format %u for nid %llu, please upgrade kernel",
> >  			  vi->z_algorithmtype[0], vi->nid);
> >  		err = -EOPNOTSUPP;
> >  		goto unmap_done;
> >  	}
> > +	if (vi->z_algorithmtype[1] >= Z_EROFS_COMPRESSION_MAX) {
> > +		erofs_err(sb, "unknown HEAD2 format %u for nid %llu, please upgrade kernel",
> > +			  vi->z_algorithmtype[1], vi->nid);
> > +		err = -EOPNOTSUPP;
> > +		goto unmap_done;
> > +	}
> 
> This seems a little duplicated. How about the code below?
> 
> 	if (vi->z_algorithmtype[i] >= Z_EROFS_COMPRESSION_MAX ||
> 	    vi->z_algorithmtype[++i] >= Z_EROFS_COMPRESSION_MAX) {
> 		erofs_err(sb, "unknown HEAD%u format %u for nid %llu, please upgrade kernel",
> 			  i, vi->z_algorithmtype[i], vi->nid);
> 		err = -EOPNOTSUPP;
> 		goto unmap_done;
> 	}

Yeah, good simplification. I will update it and rename `i' to `headnr'
here.

Thanks,
Gao Xiang
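
For readers following the thread, a minimal userspace sketch of the
simplified check may help. It relies on `||` short-circuiting: the
`++headnr` in the second operand runs only when the first algorithm type
is known, so `headnr` ends up as the 0-based index of the offending
entry, and the reported head number has to be `headnr + 1` to read as
HEAD1 or HEAD2. COMPRESSION_MAX and first_unknown_head() are
hypothetical stand-ins, not kernel symbols.

```c
/* Hypothetical stand-in for Z_EROFS_COMPRESSION_MAX (assumed value). */
#define COMPRESSION_MAX	1u

/*
 * Returns 0 when both algorithm types are known; otherwise returns the
 * 1-based number of the first unknown head (1 for HEAD1, 2 for HEAD2).
 */
static int first_unknown_head(const unsigned int algo[2])
{
	int headnr = 0;

	/* ++headnr is evaluated only if algo[0] passed the check. */
	if (algo[0] >= COMPRESSION_MAX ||
	    algo[++headnr] >= COMPRESSION_MAX)
		return headnr + 1;
	return 0;
}
```

With COMPRESSION_MAX set to 1 as above, the inputs {0, 0}, {7, 0} and
{0, 7} yield 0, 1 and 2 respectively.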

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v3 2/3] erofs: introduce the secondary compression head
  2021-10-08 20:08   ` Gao Xiang
@ 2021-10-09 18:12     ` Gao Xiang
  -1 siblings, 0 replies; 33+ messages in thread
From: Gao Xiang @ 2021-10-09 18:12 UTC (permalink / raw)
  To: linux-erofs; +Cc: Chao Yu, LKML, Yue Hu, Gao Xiang

From: Gao Xiang <hsiangkao@linux.alibaba.com>

Previously, each HEAD lcluster could be either a HEAD or a PLAIN
lcluster, indicating whether the whole pcluster is compressed or not.

In this patch, a new HEAD2 head type is introduced to specify a second
compression algorithm, in addition to the primary algorithm, for each
compressed file. It can be used for the upcoming LZMA compression and
LZ4 range dictionary compression for various data patterns.

It has been on the EROFS roadmap for years. Complete it now!

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
v2: https://lore.kernel.org/r/20211008200839.24541-3-xiang@kernel.org
changes since v2:
 - simplify the z_algorithmtype check as suggested by Yue.

 fs/erofs/erofs_fs.h |  8 +++++---
 fs/erofs/zmap.c     | 38 ++++++++++++++++++++++++++------------
 2 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
index b0b23f41abc3..f579c8c78fff 100644
--- a/fs/erofs/erofs_fs.h
+++ b/fs/erofs/erofs_fs.h
@@ -21,11 +21,13 @@
 #define EROFS_FEATURE_INCOMPAT_COMPR_CFGS	0x00000002
 #define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER	0x00000002
 #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE	0x00000004
+#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2	0x00000008
 #define EROFS_ALL_FEATURE_INCOMPAT		\
 	(EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
 	 EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
 	 EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
-	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE)
+	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
+	 EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)
 
 #define EROFS_SB_EXTSLOT_SIZE	16
 
@@ -314,9 +316,9 @@ struct z_erofs_map_header {
  */
 enum {
 	Z_EROFS_VLE_CLUSTER_TYPE_PLAIN		= 0,
-	Z_EROFS_VLE_CLUSTER_TYPE_HEAD		= 1,
+	Z_EROFS_VLE_CLUSTER_TYPE_HEAD1		= 1,
 	Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD	= 2,
-	Z_EROFS_VLE_CLUSTER_TYPE_RESERVED	= 3,
+	Z_EROFS_VLE_CLUSTER_TYPE_HEAD2		= 3,
 	Z_EROFS_VLE_CLUSTER_TYPE_MAX
 };
 
diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index 9d9c26343dab..864d9d5474d5 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -28,7 +28,7 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
 {
 	struct erofs_inode *const vi = EROFS_I(inode);
 	struct super_block *const sb = inode->i_sb;
-	int err;
+	int err, headnr;
 	erofs_off_t pos;
 	struct page *page;
 	void *kaddr;
@@ -68,9 +68,11 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
 	vi->z_algorithmtype[0] = h->h_algorithmtype & 15;
 	vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;
 
-	if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
-		erofs_err(sb, "unknown compression format %u for nid %llu, please upgrade kernel",
-			  vi->z_algorithmtype[0], vi->nid);
+	headnr = 0;
+	if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX ||
+	    vi->z_algorithmtype[++headnr] >= Z_EROFS_COMPRESSION_MAX) {
+		erofs_err(sb, "unknown HEAD%u format %u for nid %llu, please upgrade kernel",
+			  headnr + 1, vi->z_algorithmtype[headnr], vi->nid);
 		err = -EOPNOTSUPP;
 		goto unmap_done;
 	}
@@ -189,7 +191,8 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
 		m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
 		break;
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
 		m->clusterofs = le16_to_cpu(di->di_clusterofs);
 		m->pblk = le32_to_cpu(di->di_u.blkaddr);
 		break;
@@ -446,7 +449,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
 		}
 		return z_erofs_extent_lookback(m, m->delta[0]);
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
 		m->headtype = m->type;
 		map->m_la = (lcn << lclusterbits) | m->clusterofs;
 		break;
@@ -470,13 +474,18 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
 	int err;
 
 	DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
-		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
+		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 &&
+		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD2);
+	DBG_BUGON(m->type != m->headtype);
+
 	if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
-	    !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
+	    ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1) &&
+	     !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) ||
+	    ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) &&
+	     !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_2))) {
 		map->m_plen = 1 << lclusterbits;
 		return 0;
 	}
-
 	lcn = m->lcn + 1;
 	if (m->compressedlcs)
 		goto out;
@@ -498,7 +507,8 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
 
 	switch (m->type) {
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
 		/*
 		 * if the 1st NONHEAD lcluster is actually PLAIN or HEAD type
 		 * rather than CBLKCNT, it's a 1 lcluster-sized pcluster.
@@ -553,7 +563,8 @@ static int z_erofs_get_extent_decompressedlen(struct z_erofs_maprecorder *m)
 			DBG_BUGON(!m->delta[1] &&
 				  m->clusterofs != 1 << lclusterbits);
 		} else if (m->type == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
-			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD) {
+			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 ||
+			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) {
 			/* go on until the next HEAD lcluster */
 			if (lcn != headlcn)
 				break;
@@ -612,7 +623,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
 
 	switch (m.type) {
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
 		if (endoff >= m.clusterofs) {
 			m.headtype = m.type;
 			map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
@@ -654,6 +666,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
 
 	if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
 		map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
+	else if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2)
+		map->m_algorithmformat = vi->z_algorithmtype[1];
 	else
 		map->m_algorithmformat = vi->z_algorithmtype[0];
 
-- 
2.20.1
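
To illustrate how the two algorithm types relate to the head types in
the diff above, here is a small userspace sketch. It mirrors the
`& 15` / `>> 4` split of h_algorithmtype and the final selection in
z_erofs_map_blocks_iter(). The struct, the function names and the
ALGO_SHIFTED marker value are hypothetical, and h_algorithmtype is
assumed to be a single byte.

```c
#include <stdint.h>

/* Cluster types, numbered as in the enum changed by this patch. */
enum {
	CLUSTER_TYPE_PLAIN	= 0,
	CLUSTER_TYPE_HEAD1	= 1,
	CLUSTER_TYPE_NONHEAD	= 2,
	CLUSTER_TYPE_HEAD2	= 3,
};

/* Hypothetical marker for uncompressed data (not the kernel's value). */
#define ALGO_SHIFTED	0xffu

struct algo_pair {
	uint8_t head1;	/* low nibble: algorithm used by HEAD1 */
	uint8_t head2;	/* high nibble: algorithm used by HEAD2 */
};

/* Split the packed byte into the two per-head algorithm types. */
static struct algo_pair split_algorithmtype(uint8_t h_algorithmtype)
{
	struct algo_pair p = {
		.head1 = h_algorithmtype & 15,
		.head2 = h_algorithmtype >> 4,
	};
	return p;
}

/* Choose the algorithm format for an extent from its head type. */
static unsigned int pick_algorithm(struct algo_pair p, int headtype)
{
	if (headtype == CLUSTER_TYPE_PLAIN)
		return ALGO_SHIFTED;
	if (headtype == CLUSTER_TYPE_HEAD2)
		return p.head2;
	return p.head1;
}
```

For example, a packed value of 0x21 splits into 1 for HEAD1 and 2 for
HEAD2, so extents headed by HEAD1 and HEAD2 lclusters in the same file
are dispatched to different decompressors.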


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 2/3] erofs: introduce the secondary compression head
  2021-10-09 18:12     ` Gao Xiang
@ 2021-10-10  0:53       ` Yue Hu
  -1 siblings, 0 replies; 33+ messages in thread
From: Yue Hu @ 2021-10-10  0:53 UTC (permalink / raw)
  To: Gao Xiang; +Cc: LKML, huyue2, Gao Xiang, zhangwen, linux-erofs

On Sun, 10 Oct 2021 02:12:09 +0800
Gao Xiang <xiang@kernel.org> wrote:

> From: Gao Xiang <hsiangkao@linux.alibaba.com>
> 
> Previously, each HEAD lcluster could be either a HEAD or a PLAIN
> lcluster, indicating whether the whole pcluster is compressed or not.
> 
> In this patch, a new HEAD2 head type is introduced to specify a second
> compression algorithm, in addition to the primary algorithm, for each
> compressed file. It can be used for the upcoming LZMA compression and
> LZ4 range dictionary compression for various data patterns.
> 
> It has been on the EROFS roadmap for years. Complete it now!
> 
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
> v2: https://lore.kernel.org/r/20211008200839.24541-3-xiang@kernel.org
> changes since v2:
>  - simplify the z_algorithmtype check as suggested by Yue.
> 
>  fs/erofs/erofs_fs.h |  8 +++++---
>  fs/erofs/zmap.c     | 38 ++++++++++++++++++++++++++------------
>  2 files changed, 31 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
> index b0b23f41abc3..f579c8c78fff 100644
> --- a/fs/erofs/erofs_fs.h
> +++ b/fs/erofs/erofs_fs.h
> @@ -21,11 +21,13 @@
>  #define EROFS_FEATURE_INCOMPAT_COMPR_CFGS	0x00000002
>  #define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER	0x00000002
>  #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE	0x00000004
> +#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2	0x00000008
>  #define EROFS_ALL_FEATURE_INCOMPAT		\
>  	(EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
>  	 EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
>  	 EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
> -	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE)
> +	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
> +	 EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)
>  
>  #define EROFS_SB_EXTSLOT_SIZE	16
>  
> @@ -314,9 +316,9 @@ struct z_erofs_map_header {
>   */
>  enum {
>  	Z_EROFS_VLE_CLUSTER_TYPE_PLAIN		= 0,
> -	Z_EROFS_VLE_CLUSTER_TYPE_HEAD		= 1,
> +	Z_EROFS_VLE_CLUSTER_TYPE_HEAD1		= 1,
>  	Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD	= 2,
> -	Z_EROFS_VLE_CLUSTER_TYPE_RESERVED	= 3,
> +	Z_EROFS_VLE_CLUSTER_TYPE_HEAD2		= 3,
>  	Z_EROFS_VLE_CLUSTER_TYPE_MAX
>  };
>  
> diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
> index 9d9c26343dab..864d9d5474d5 100644
> --- a/fs/erofs/zmap.c
> +++ b/fs/erofs/zmap.c
> @@ -28,7 +28,7 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
>  {
>  	struct erofs_inode *const vi = EROFS_I(inode);
>  	struct super_block *const sb = inode->i_sb;
> -	int err;
> +	int err, headnr;
>  	erofs_off_t pos;
>  	struct page *page;
>  	void *kaddr;
> @@ -68,9 +68,11 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
>  	vi->z_algorithmtype[0] = h->h_algorithmtype & 15;
>  	vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;
>  
> -	if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
> -		erofs_err(sb, "unknown compression format %u for nid %llu, please upgrade kernel",
> -			  vi->z_algorithmtype[0], vi->nid);
> +	headnr = 0;
> +	if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX ||
> +	    vi->z_algorithmtype[++headnr] >= Z_EROFS_COMPRESSION_MAX) {
> +		erofs_err(sb, "unknown HEAD%u format %u for nid %llu, please upgrade kernel",
> +			  headnr + 1, vi->z_algorithmtype[headnr], vi->nid);
>  		err = -EOPNOTSUPP;
>  		goto unmap_done;
>  	}
> @@ -189,7 +191,8 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
>  		m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
>  		break;
>  	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> -	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
>  		m->clusterofs = le16_to_cpu(di->di_clusterofs);
>  		m->pblk = le32_to_cpu(di->di_u.blkaddr);
>  		break;
> @@ -446,7 +449,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
>  		}
>  		return z_erofs_extent_lookback(m, m->delta[0]);
>  	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> -	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
>  		m->headtype = m->type;
>  		map->m_la = (lcn << lclusterbits) | m->clusterofs;
>  		break;
> @@ -470,13 +474,18 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
>  	int err;
>  
>  	DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
> -		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
> +		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 &&
> +		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD2);
> +	DBG_BUGON(m->type != m->headtype);
> +
>  	if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
> -	    !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
> +	    ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1) &&
> +	     !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) ||
> +	    ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) &&
> +	     !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_2))) {
>  		map->m_plen = 1 << lclusterbits;
>  		return 0;
>  	}
> -
>  	lcn = m->lcn + 1;
>  	if (m->compressedlcs)
>  		goto out;
> @@ -498,7 +507,8 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
>  
>  	switch (m->type) {
>  	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> -	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
>  		/*
>  		 * if the 1st NONHEAD lcluster is actually PLAIN or HEAD type
>  		 * rather than CBLKCNT, it's a 1 lcluster-sized pcluster.
> @@ -553,7 +563,8 @@ static int z_erofs_get_extent_decompressedlen(struct z_erofs_maprecorder *m)
>  			DBG_BUGON(!m->delta[1] &&
>  				  m->clusterofs != 1 << lclusterbits);
>  		} else if (m->type == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
> -			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD) {
> +			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 ||
> +			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) {
>  			/* go on until the next HEAD lcluster */
>  			if (lcn != headlcn)
>  				break;
> @@ -612,7 +623,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
>  
>  	switch (m.type) {
>  	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> -	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
>  		if (endoff >= m.clusterofs) {
>  			m.headtype = m.type;
>  			map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
> @@ -654,6 +666,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
>  
>  	if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
>  		map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
> +	else if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2)
> +		map->m_algorithmformat = vi->z_algorithmtype[1];
>  	else
>  		map->m_algorithmformat = vi->z_algorithmtype[0];
>  

Looks good to me.

Reviewed-by: Yue Hu <huyue2@yulong.com>
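
For readers skimming the thread, the last hunk of the quoted patch boils
down to a three-way choice keyed on the head lcluster type. The sketch
below restates it as a standalone userspace function. The DEMO_* names and
demo_pick_algorithm() are hypothetical, and the value used for the
"stored uncompressed" pseudo-format is an assumption, not the kernel's
definition.

```c
#include <stdint.h>

/* Cluster types, numbered as in the quoted patch. */
enum demo_cluster_type {
	DEMO_TYPE_PLAIN   = 0,
	DEMO_TYPE_HEAD1   = 1,
	DEMO_TYPE_NONHEAD = 2,
	DEMO_TYPE_HEAD2   = 3,
};

/* Assumed value: a pseudo-format meaning "stored uncompressed". */
#define DEMO_FORMAT_SHIFTED 0xff

/*
 * Pick the algorithm format for an extent from the type of its head
 * lcluster. algs[0] is the primary (HEAD1) algorithm and algs[1] the
 * secondary (HEAD2) one, mirroring vi->z_algorithmtype[] in the patch.
 * Any type other than PLAIN or HEAD2 falls through to the primary
 * algorithm, as the final "else" branch in the patch does.
 */
static unsigned int demo_pick_algorithm(enum demo_cluster_type headtype,
					const uint8_t algs[2])
{
	if (headtype == DEMO_TYPE_PLAIN)
		return DEMO_FORMAT_SHIFTED;
	if (headtype == DEMO_TYPE_HEAD2)
		return algs[1];
	return algs[0];
}
```

Calling demo_pick_algorithm(DEMO_TYPE_HEAD2, algs) selects the secondary
entry, so existing images that only ever use HEAD1 keep resolving to
algs[0] as before.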


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/3] erofs: get compression algorithms directly on mapping
  2021-10-08 20:08   ` Gao Xiang
@ 2021-10-17 15:25     ` Chao Yu
  -1 siblings, 0 replies; 33+ messages in thread
From: Chao Yu @ 2021-10-17 15:25 UTC (permalink / raw)
  To: Gao Xiang, linux-erofs; +Cc: Gao Xiang, LKML

On 2021/10/9 4:08, Gao Xiang wrote:
> From: Gao Xiang <hsiangkao@linux.alibaba.com>
> 
> Currently, z_erofs_map_blocks_iter() only returns whether extents are
> compressed or not, and the decompression frontend then looks up the
> specific algorithms itself.
> 
> It works, but not quite well in many aspects, for example:
>   - The decompression frontend has to check again whether extents are
>     compressed and look up the algorithms if so. That is duplicated
>     work and exposes too much detail of the on-disk mapping;
> 
>   - A new secondary compression head will be introduced later so that
>     each file can have at most 2 compression algorithms for different
>     types of data. It would increase the complexity of the
>     decompression frontend if still handled in this way;
> 
>   - A new readmore decompression strategy will be introduced to get
>     better performance for much bigger pclusters and LZMA, which needs
>     the specific algorithm in advance as well.
> 
> Let's look up compression algorithms in z_erofs_map_blocks_iter()
> directly instead.
> 
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>

Reviewed-by: Chao Yu <chao@kernel.org>

Thanks,
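
As a reading aid, the division of labour described in the quoted commit
message can be sketched as follows: the mapping step fills in the resolved
algorithm format, and the frontend merely dispatches on it. This is a
minimal userspace illustration under stated assumptions. struct demo_map,
the DEMO_FORMAT_* values and demo_frontend_handler() are hypothetical
names and do not reproduce the kernel's structures.

```c
#include <stddef.h>

/* Hypothetical algorithm formats; the values are assumptions. */
enum demo_format {
	DEMO_FORMAT_LZ4 = 0,
	DEMO_FORMAT_SHIFTED,	/* stored uncompressed */
	DEMO_FORMAT_MAX
};

/* What the mapping step hands back for one extent. */
struct demo_map {
	unsigned long long la;		/* logical offset */
	unsigned long long llen;	/* logical length */
	unsigned int algorithmformat;	/* resolved during mapping */
};

/* One handler name per format, indexed directly by the mapped value. */
static const char *const demo_handlers[DEMO_FORMAT_MAX] = {
	[DEMO_FORMAT_LZ4]     = "lz4",
	[DEMO_FORMAT_SHIFTED] = "copy",
};

/* The frontend only dispatches; it never inspects on-disk head types. */
static const char *demo_frontend_handler(const struct demo_map *map)
{
	if (map->algorithmformat >= DEMO_FORMAT_MAX)
		return NULL;	/* unknown format: caller must bail out */
	return demo_handlers[map->algorithmformat];
}
```

With this shape, adding another algorithm means one more table entry on
the frontend side, while all knowledge of the on-disk layout stays in the
mapping code.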

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 2/3] erofs: introduce the secondary compression head
  2021-10-09 18:12     ` Gao Xiang
@ 2021-10-17 15:27       ` Chao Yu
  -1 siblings, 0 replies; 33+ messages in thread
From: Chao Yu @ 2021-10-17 15:27 UTC (permalink / raw)
  To: Gao Xiang, linux-erofs; +Cc: LKML, Yue Hu, Gao Xiang

On 2021/10/10 2:12, Gao Xiang wrote:
> From: Gao Xiang <hsiangkao@linux.alibaba.com>
> 
> Previously, each HEAD lcluster could be either a HEAD or a PLAIN
> lcluster, indicating whether the whole pcluster is compressed or not.
> 
> In this patch, a new HEAD2 head type is introduced to specify a
> compression algorithm other than the primary one for each compressed
> file, which can be used for the upcoming LZMA compression and LZ4
> range dictionary compression for various data patterns.
> 
> It has been on the EROFS roadmap for years. Complete it now!
> 
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
> v2: https://lore.kernel.org/r/20211008200839.24541-3-xiang@kernel.org
> changes since v2:
>   - simplify z_algorithmtype check suggested by Yue.
> 
>   fs/erofs/erofs_fs.h |  8 +++++---
>   fs/erofs/zmap.c     | 38 ++++++++++++++++++++++++++------------
>   2 files changed, 31 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
> index b0b23f41abc3..f579c8c78fff 100644
> --- a/fs/erofs/erofs_fs.h
> +++ b/fs/erofs/erofs_fs.h
> @@ -21,11 +21,13 @@
>   #define EROFS_FEATURE_INCOMPAT_COMPR_CFGS	0x00000002
>   #define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER	0x00000002
>   #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE	0x00000004
> +#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2	0x00000008
>   #define EROFS_ALL_FEATURE_INCOMPAT		\
>   	(EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
>   	 EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
>   	 EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
> -	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE)
> +	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
> +	 EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)
>   
>   #define EROFS_SB_EXTSLOT_SIZE	16
>   
> @@ -314,9 +316,9 @@ struct z_erofs_map_header {
>    */
>   enum {
>   	Z_EROFS_VLE_CLUSTER_TYPE_PLAIN		= 0,
> -	Z_EROFS_VLE_CLUSTER_TYPE_HEAD		= 1,
> +	Z_EROFS_VLE_CLUSTER_TYPE_HEAD1		= 1,
>   	Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD	= 2,
> -	Z_EROFS_VLE_CLUSTER_TYPE_RESERVED	= 3,
> +	Z_EROFS_VLE_CLUSTER_TYPE_HEAD2		= 3,

The comments above need to be updated as well.

Thanks,

>   	Z_EROFS_VLE_CLUSTER_TYPE_MAX
>   };
>   
> diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
> index 9d9c26343dab..864d9d5474d5 100644
> --- a/fs/erofs/zmap.c
> +++ b/fs/erofs/zmap.c
> @@ -28,7 +28,7 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
>   {
>   	struct erofs_inode *const vi = EROFS_I(inode);
>   	struct super_block *const sb = inode->i_sb;
> -	int err;
> +	int err, headnr;
>   	erofs_off_t pos;
>   	struct page *page;
>   	void *kaddr;
> @@ -68,9 +68,11 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
>   	vi->z_algorithmtype[0] = h->h_algorithmtype & 15;
>   	vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;
>   
> -	if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
> -		erofs_err(sb, "unknown compression format %u for nid %llu, please upgrade kernel",
> -			  vi->z_algorithmtype[0], vi->nid);
> +	headnr = 0;
> +	if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX ||
> +	    vi->z_algorithmtype[++headnr] >= Z_EROFS_COMPRESSION_MAX) {
> +		erofs_err(sb, "unknown HEAD%u format %u for nid %llu, please upgrade kernel",
> +			  headnr + 1, vi->z_algorithmtype[headnr], vi->nid);
>   		err = -EOPNOTSUPP;
>   		goto unmap_done;
>   	}
> @@ -189,7 +191,8 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
>   		m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
>   		break;
>   	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> -	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
>   		m->clusterofs = le16_to_cpu(di->di_clusterofs);
>   		m->pblk = le32_to_cpu(di->di_u.blkaddr);
>   		break;
> @@ -446,7 +449,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
>   		}
>   		return z_erofs_extent_lookback(m, m->delta[0]);
>   	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> -	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
>   		m->headtype = m->type;
>   		map->m_la = (lcn << lclusterbits) | m->clusterofs;
>   		break;
> @@ -470,13 +474,18 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
>   	int err;
>   
>   	DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
> -		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
> +		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 &&
> +		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD2);
> +	DBG_BUGON(m->type != m->headtype);
> +
>   	if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
> -	    !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
> +	    ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1) &&
> +	     !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) ||
> +	    ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) &&
> +	     !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_2))) {
>   		map->m_plen = 1 << lclusterbits;
>   		return 0;
>   	}
> -
>   	lcn = m->lcn + 1;
>   	if (m->compressedlcs)
>   		goto out;
> @@ -498,7 +507,8 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
>   
>   	switch (m->type) {
>   	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> -	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
>   		/*
>   		 * if the 1st NONHEAD lcluster is actually PLAIN or HEAD type
>   		 * rather than CBLKCNT, it's a 1 lcluster-sized pcluster.
> @@ -553,7 +563,8 @@ static int z_erofs_get_extent_decompressedlen(struct z_erofs_maprecorder *m)
>   			DBG_BUGON(!m->delta[1] &&
>   				  m->clusterofs != 1 << lclusterbits);
>   		} else if (m->type == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
> -			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD) {
> +			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 ||
> +			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) {
>   			/* go on until the next HEAD lcluster */
>   			if (lcn != headlcn)
>   				break;
> @@ -612,7 +623,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
>   
>   	switch (m.type) {
>   	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
> -	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
> +	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
>   		if (endoff >= m.clusterofs) {
>   			m.headtype = m.type;
>   			map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
> @@ -654,6 +666,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
>   
>   	if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
>   		map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
> +	else if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2)
> +		map->m_algorithmformat = vi->z_algorithmtype[1];
>   	else
>   		map->m_algorithmformat = vi->z_algorithmtype[0];
>   
> 
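
The z_erofs_fill_inode_lazy() hunk quoted above packs two ideas into a
few lines: both algorithm types come from one byte (low nibble primary,
high nibble secondary), and the "++headnr" inside the condition relies on
short-circuit evaluation so that headnr ends up indexing whichever entry
failed. A minimal standalone sketch follows. DEMO_COMPRESSION_MAX and
demo_check_algorithms() are hypothetical, and the limit of 1 is an
assumption chosen only for the demonstration.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed limit: formats >= DEMO_COMPRESSION_MAX are unknown. */
#define DEMO_COMPRESSION_MAX 1

/*
 * Split the packed byte into two 4-bit algorithm types and validate them.
 * Returns 0 on success, or the 1-based number of the first bad head.
 */
static int demo_check_algorithms(uint8_t packed, uint8_t algs[2])
{
	int headnr = 0;

	algs[0] = packed & 15;	/* low nibble: primary (HEAD1) */
	algs[1] = packed >> 4;	/* high nibble: secondary (HEAD2) */

	/*
	 * || short-circuits: if algs[0] is bad, ++headnr never runs and
	 * headnr stays 0; otherwise headnr becomes 1 before algs[1] is
	 * tested. On failure, headnr indexes the offending entry.
	 */
	if (algs[0] >= DEMO_COMPRESSION_MAX ||
	    algs[++headnr] >= DEMO_COMPRESSION_MAX) {
		fprintf(stderr, "unknown HEAD%d format %u\n",
			headnr + 1, (unsigned int)algs[headnr]);
		return headnr + 1;
	}
	return 0;
}
```

For example, a packed byte of 0x10 passes the primary check and fails the
secondary one, so the message names HEAD2, while 0x01 fails immediately
and names HEAD1.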

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 2/3] erofs: introduce the secondary compression head
  2021-10-17 15:27       ` Chao Yu
@ 2021-10-17 15:32         ` Gao Xiang
  -1 siblings, 0 replies; 33+ messages in thread
From: Gao Xiang @ 2021-10-17 15:32 UTC (permalink / raw)
  To: Chao Yu; +Cc: Gao Xiang, linux-erofs, LKML, Yue Hu, Gao Xiang

Hi Chao,

On Sun, Oct 17, 2021 at 11:27:54PM +0800, Chao Yu wrote:
> On 2021/10/10 2:12, Gao Xiang wrote:
> > From: Gao Xiang <hsiangkao@linux.alibaba.com>
> > 
> > Previously, each HEAD lcluster could be either a HEAD or a PLAIN
> > lcluster, indicating whether the whole pcluster is compressed or not.
> > 
> > In this patch, a new HEAD2 head type is introduced to specify a
> > compression algorithm other than the primary one for each compressed
> > file, which can be used for the upcoming LZMA compression and LZ4
> > range dictionary compression for various data patterns.
> > 
> > It has been on the EROFS roadmap for years. Complete it now!
> > 
> > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > ---
> > v2: https://lore.kernel.org/r/20211008200839.24541-3-xiang@kernel.org
> > changes since v2:
> >   - simplify z_algorithmtype check suggested by Yue.
> > 
> >   fs/erofs/erofs_fs.h |  8 +++++---
> >   fs/erofs/zmap.c     | 38 ++++++++++++++++++++++++++------------
> >   2 files changed, 31 insertions(+), 15 deletions(-)
> > 
> > diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
> > index b0b23f41abc3..f579c8c78fff 100644
> > --- a/fs/erofs/erofs_fs.h
> > +++ b/fs/erofs/erofs_fs.h
> > @@ -21,11 +21,13 @@
> >   #define EROFS_FEATURE_INCOMPAT_COMPR_CFGS	0x00000002
> >   #define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER	0x00000002
> >   #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE	0x00000004
> > +#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2	0x00000008
> >   #define EROFS_ALL_FEATURE_INCOMPAT		\
> >   	(EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
> >   	 EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
> >   	 EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
> > -	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE)
> > +	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
> > +	 EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)
> >   #define EROFS_SB_EXTSLOT_SIZE	16
> > @@ -314,9 +316,9 @@ struct z_erofs_map_header {
> >    */
> >   enum {
> >   	Z_EROFS_VLE_CLUSTER_TYPE_PLAIN		= 0,
> > -	Z_EROFS_VLE_CLUSTER_TYPE_HEAD		= 1,
> > +	Z_EROFS_VLE_CLUSTER_TYPE_HEAD1		= 1,
> >   	Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD	= 2,
> > -	Z_EROFS_VLE_CLUSTER_TYPE_RESERVED	= 3,
> > +	Z_EROFS_VLE_CLUSTER_TYPE_HEAD2		= 3,
> 
> It needs to update comments above as well.

Okay, let me revise them now.

Thanks,
Gao Xiang

> 
> Thanks,
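
As a reading aid for the enum under discussion, here is a hedged sketch
of the four lcluster types with one-line descriptions, plus two helpers
showing how the head types are treated elsewhere in the quoted patch. It
is not the revised comment from any later version of the series. The
DEMO_* names, both helper functions and the advise bit values are
assumptions for illustration.

```c
#include <stdbool.h>

/*
 * Logical cluster (lcluster) types after this patch, with hypothetical
 * one-line descriptions; the real header comment may be worded differently.
 */
enum demo_cluster_type {
	DEMO_TYPE_PLAIN   = 0,	/* head of an uncompressed pcluster */
	DEMO_TYPE_HEAD1   = 1,	/* head, compressed with the primary algorithm */
	DEMO_TYPE_NONHEAD = 2,	/* continuation of the preceding head */
	DEMO_TYPE_HEAD2   = 3,	/* head, compressed with the secondary algorithm */
	DEMO_TYPE_MAX
};

/* Assumed per-head advise bits, standing in for BIG_PCLUSTER_1 / _2. */
#define DEMO_ADVISE_BIG_PCLUSTER_1 0x0002
#define DEMO_ADVISE_BIG_PCLUSTER_2 0x0004

/* PLAIN, HEAD1 and HEAD2 all start a new extent; only NONHEAD does not. */
static bool demo_is_head(enum demo_cluster_type type)
{
	return type == DEMO_TYPE_PLAIN || type == DEMO_TYPE_HEAD1 ||
	       type == DEMO_TYPE_HEAD2;
}

/*
 * True when the pcluster is known to occupy exactly one lcluster:
 * PLAIN always does, and each compressed head type does unless its own
 * big-pcluster advise bit is set.
 */
static bool demo_single_lcluster(enum demo_cluster_type headtype,
				 unsigned int advise)
{
	if (headtype == DEMO_TYPE_PLAIN)
		return true;
	if (headtype == DEMO_TYPE_HEAD1)
		return !(advise & DEMO_ADVISE_BIG_PCLUSTER_1);
	if (headtype == DEMO_TYPE_HEAD2)
		return !(advise & DEMO_ADVISE_BIG_PCLUSTER_2);
	return false;
}
```

Keeping a separate advise bit per head type lets a file use big pclusters
for one algorithm and single-lcluster pclusters for the other.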

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 3/3] erofs: introduce readmore decompression strategy
  2021-10-08 20:08   ` Gao Xiang
@ 2021-10-17 15:34     ` Chao Yu
  -1 siblings, 0 replies; 33+ messages in thread
From: Chao Yu @ 2021-10-17 15:34 UTC (permalink / raw)
  To: Gao Xiang, linux-erofs; +Cc: LKML, Yue Hu, Gao Xiang

On 2021/10/9 4:08, Gao Xiang wrote:
> From: Gao Xiang <hsiangkao@linux.alibaba.com>
> 
> Previously, the readahead window was strictly followed by EROFS
> decompression strategy in order to minimize extra memory footprint.
> However, it could become inefficient if just reading the partial
> requested data for much big LZ4 pclusters and the upcoming LZMA
> implementation.
> 
> Let's try to request the leading data in a pcluster without
> triggering memory reclaiming instead for the LZ4 approach first
> to boost up 100% randread of large big pclusters, and it has no real
> impact on low memory scenarios.
> 
> It also introduces a way to expand read lengths in order to decompress
> the whole pcluster, which is useful for LZMA since the algorithm
> itself is relatively slow and causes CPU bound, but LZ4 is not.
> 
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
>   fs/erofs/internal.h | 13 ++++++
>   fs/erofs/zdata.c    | 99 ++++++++++++++++++++++++++++++++++++---------
>   2 files changed, 93 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 48bfc6eb2b02..7f96265ccbdb 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -307,6 +307,19 @@ static inline unsigned int erofs_inode_datalayout(unsigned int value)
>   			      EROFS_I_DATALAYOUT_BITS);
>   }
>   
> +/*
> + * Different from grab_cache_page_nowait(), reclaiming is never triggered
> + * when allocating new pages.
> + */
> +static inline
> +struct page *erofs_grab_cache_page_nowait(struct address_space *mapping,
> +					  pgoff_t index)
> +{
> +	return pagecache_get_page(mapping, index,
> +			FGP_LOCK|FGP_CREAT|FGP_NOFS|FGP_NOWAIT,
> +			readahead_gfp_mask(mapping) & ~__GFP_RECLAIM);
> +}
> +
>   extern const struct super_operations erofs_sops;
>   
>   extern const struct address_space_operations erofs_raw_access_aops;
> diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> index 5c34ef66677f..febb018e10a7 100644
> --- a/fs/erofs/zdata.c
> +++ b/fs/erofs/zdata.c
> @@ -1377,6 +1377,72 @@ static void z_erofs_runqueue(struct super_block *sb,
>   	z_erofs_decompress_queue(&io[JQ_SUBMIT], pagepool);
>   }
>   
> +/*
> + * Since partial uptodate is still unimplemented for now, we have to use
> + * approximate readmore strategies as a start.
> + */
> +static void z_erofs_pcluster_readmore(struct z_erofs_decompress_frontend *f,
> +				      struct readahead_control *rac,
> +				      erofs_off_t end,
> +				      struct list_head *pagepool,
> +				      bool backmost)
> +{
> +	struct inode *inode = f->inode;
> +	struct erofs_map_blocks *map = &f->map;
> +	erofs_off_t cur;
> +	int err;
> +
> +	if (backmost) {
> +		map->m_la = end;
> +		/* TODO: pass in EROFS_GET_BLOCKS_READMORE for LZMA later */
> +		err = z_erofs_map_blocks_iter(inode, map, 0);
> +		if (err)
> +			return;
> +
> +		/* expend ra for the trailing edge if readahead */
> +		if (rac) {
> +			loff_t newstart = readahead_pos(rac);
> +
> +			cur = round_up(map->m_la + map->m_llen, PAGE_SIZE);
> +			readahead_expand(rac, newstart, cur - newstart);
> +			return;
> +		}
> +		end = round_up(end, PAGE_SIZE);
> +	} else {
> +		end = round_up(map->m_la, PAGE_SIZE);
> +
> +		if (!map->m_llen)
> +			return;
> +	}
> +
> +	cur = map->m_la + map->m_llen - 1;
> +	while (cur >= end) {
> +		pgoff_t index = cur >> PAGE_SHIFT;
> +		struct page *page;
> +
> +		page = erofs_grab_cache_page_nowait(inode->i_mapping, index);
> +		if (!page)
> +			goto skip;
> +
> +		if (PageUptodate(page)) {
> +			unlock_page(page);
> +			put_page(page);
> +			goto skip;
> +		}
> +
> +		err = z_erofs_do_read_page(f, page, pagepool);
> +		if (err)
> +			erofs_err(inode->i_sb,
> +				  "readmore error at page %lu @ nid %llu",
> +				  index, EROFS_I(inode)->nid);
> +		put_page(page);
> +skip:
> +		if (cur < PAGE_SIZE)
> +			break;
> +		cur = (index << PAGE_SHIFT) - 1;

It looks a little odd to read ahead backward; is there a special reason for that?

Thanks,

> +	}
> +}
> +
>   static int z_erofs_readpage(struct file *file, struct page *page)
>   {
>   	struct inode *const inode = page->mapping->host;
> @@ -1385,10 +1451,13 @@ static int z_erofs_readpage(struct file *file, struct page *page)
>   	LIST_HEAD(pagepool);
>   
>   	trace_erofs_readpage(page, false);
> -
>   	f.headoffset = (erofs_off_t)page->index << PAGE_SHIFT;
>   
> +	z_erofs_pcluster_readmore(&f, NULL, f.headoffset + PAGE_SIZE - 1,
> +				  &pagepool, true);
>   	err = z_erofs_do_read_page(&f, page, &pagepool);
> +	z_erofs_pcluster_readmore(&f, NULL, 0, &pagepool, false);
> +
>   	(void)z_erofs_collector_end(&f.clt);
>   
>   	/* if some compressed cluster ready, need submit them anyway */
> @@ -1409,29 +1478,20 @@ static void z_erofs_readahead(struct readahead_control *rac)
>   {
>   	struct inode *const inode = rac->mapping->host;
>   	struct erofs_sb_info *const sbi = EROFS_I_SB(inode);
> -
> -	unsigned int nr_pages = readahead_count(rac);
> -	bool sync = (sbi->ctx.readahead_sync_decompress &&
> -			nr_pages <= sbi->ctx.max_sync_decompress_pages);
>   	struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
>   	struct page *page, *head = NULL;
> +	unsigned int nr_pages;
>   	LIST_HEAD(pagepool);
>   
> -	trace_erofs_readpages(inode, readahead_index(rac), nr_pages, false);
> -
>   	f.readahead = true;
>   	f.headoffset = readahead_pos(rac);
>   
> -	while ((page = readahead_page(rac))) {
> -		prefetchw(&page->flags);
> -
> -		/*
> -		 * A pure asynchronous readahead is indicated if
> -		 * a PG_readahead marked page is hitted at first.
> -		 * Let's also do asynchronous decompression for this case.
> -		 */
> -		sync &= !(PageReadahead(page) && !head);
> +	z_erofs_pcluster_readmore(&f, rac, f.headoffset +
> +				  readahead_length(rac) - 1, &pagepool, true);
> +	nr_pages = readahead_count(rac);
> +	trace_erofs_readpages(inode, readahead_index(rac), nr_pages, false);
>   
> +	while ((page = readahead_page(rac))) {
>   		set_page_private(page, (unsigned long)head);
>   		head = page;
>   	}
> @@ -1450,11 +1510,12 @@ static void z_erofs_readahead(struct readahead_control *rac)
>   				  page->index, EROFS_I(inode)->nid);
>   		put_page(page);
>   	}
> -
> +	z_erofs_pcluster_readmore(&f, rac, 0, &pagepool, false);
>   	(void)z_erofs_collector_end(&f.clt);
>   
> -	z_erofs_runqueue(inode->i_sb, &f, &pagepool, sync);
> -
> +	z_erofs_runqueue(inode->i_sb, &f, &pagepool,
> +			 sbi->ctx.readahead_sync_decompress &&
> +			 nr_pages <= sbi->ctx.max_sync_decompress_pages);
>   	if (f.map.mpage)
>   		put_page(f.map.mpage);
>   
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 3/3] erofs: introduce readmore decompression strategy
  2021-10-17 15:34     ` Chao Yu
@ 2021-10-17 15:42       ` Gao Xiang
  -1 siblings, 0 replies; 33+ messages in thread
From: Gao Xiang @ 2021-10-17 15:42 UTC (permalink / raw)
  To: Chao Yu; +Cc: Gao Xiang, linux-erofs, LKML, Yue Hu, Gao Xiang

On Sun, Oct 17, 2021 at 11:34:22PM +0800, Chao Yu wrote:
> On 2021/10/9 4:08, Gao Xiang wrote:
> > From: Gao Xiang <hsiangkao@linux.alibaba.com>
> > 
> > Previously, the readahead window was strictly followed by EROFS
> > decompression strategy in order to minimize extra memory footprint.
> > However, it could become inefficient if just reading the partial
> > requested data for much big LZ4 pclusters and the upcoming LZMA
> > implementation.
> > 
> > Let's try to request the leading data in a pcluster without
> > triggering memory reclaiming instead for the LZ4 approach first
> > to boost up 100% randread of large big pclusters, and it has no real
> > impact on low memory scenarios.
> > 
> > It also introduces a way to expand read lengths in order to decompress
> > the whole pcluster, which is useful for LZMA since the algorithm
> > itself is relatively slow and causes CPU bound, but LZ4 is not.
> > 
> > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > ---
> >   fs/erofs/internal.h | 13 ++++++
> >   fs/erofs/zdata.c    | 99 ++++++++++++++++++++++++++++++++++++---------
> >   2 files changed, 93 insertions(+), 19 deletions(-)
> > 
> > diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> > index 48bfc6eb2b02..7f96265ccbdb 100644
> > --- a/fs/erofs/internal.h
> > +++ b/fs/erofs/internal.h
> > @@ -307,6 +307,19 @@ static inline unsigned int erofs_inode_datalayout(unsigned int value)
> >   			      EROFS_I_DATALAYOUT_BITS);
> >   }
> > +/*
> > + * Different from grab_cache_page_nowait(), reclaiming is never triggered
> > + * when allocating new pages.
> > + */
> > +static inline
> > +struct page *erofs_grab_cache_page_nowait(struct address_space *mapping,
> > +					  pgoff_t index)
> > +{
> > +	return pagecache_get_page(mapping, index,
> > +			FGP_LOCK|FGP_CREAT|FGP_NOFS|FGP_NOWAIT,
> > +			readahead_gfp_mask(mapping) & ~__GFP_RECLAIM);
> > +}
> > +
> >   extern const struct super_operations erofs_sops;
> >   extern const struct address_space_operations erofs_raw_access_aops;
> > diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> > index 5c34ef66677f..febb018e10a7 100644
> > --- a/fs/erofs/zdata.c
> > +++ b/fs/erofs/zdata.c
> > @@ -1377,6 +1377,72 @@ static void z_erofs_runqueue(struct super_block *sb,
> >   	z_erofs_decompress_queue(&io[JQ_SUBMIT], pagepool);
> >   }
> > +/*
> > + * Since partial uptodate is still unimplemented for now, we have to use
> > + * approximate readmore strategies as a start.
> > + */
> > +static void z_erofs_pcluster_readmore(struct z_erofs_decompress_frontend *f,
> > +				      struct readahead_control *rac,
> > +				      erofs_off_t end,
> > +				      struct list_head *pagepool,
> > +				      bool backmost)
> > +{
> > +	struct inode *inode = f->inode;
> > +	struct erofs_map_blocks *map = &f->map;
> > +	erofs_off_t cur;
> > +	int err;
> > +
> > +	if (backmost) {
> > +		map->m_la = end;
> > +		/* TODO: pass in EROFS_GET_BLOCKS_READMORE for LZMA later */
> > +		err = z_erofs_map_blocks_iter(inode, map, 0);
> > +		if (err)
> > +			return;
> > +
> > +		/* expend ra for the trailing edge if readahead */
> > +		if (rac) {
> > +			loff_t newstart = readahead_pos(rac);
> > +
> > +			cur = round_up(map->m_la + map->m_llen, PAGE_SIZE);
> > +			readahead_expand(rac, newstart, cur - newstart);
> > +			return;
> > +		}
> > +		end = round_up(end, PAGE_SIZE);
> > +	} else {
> > +		end = round_up(map->m_la, PAGE_SIZE);
> > +
> > +		if (!map->m_llen)
> > +			return;
> > +	}
> > +
> > +	cur = map->m_la + map->m_llen - 1;
> > +	while (cur >= end) {
> > +		pgoff_t index = cur >> PAGE_SHIFT;
> > +		struct page *page;
> > +
> > +		page = erofs_grab_cache_page_nowait(inode->i_mapping, index);
> > +		if (!page)
> > +			goto skip;
> > +
> > +		if (PageUptodate(page)) {
> > +			unlock_page(page);
> > +			put_page(page);
> > +			goto skip;
> > +		}
> > +
> > +		err = z_erofs_do_read_page(f, page, pagepool);
> > +		if (err)
> > +			erofs_err(inode->i_sb,
> > +				  "readmore error at page %lu @ nid %llu",
> > +				  index, EROFS_I(inode)->nid);
> > +		put_page(page);
> > +skip:
> > +		if (cur < PAGE_SIZE)
> > +			break;
> > +		cur = (index << PAGE_SHIFT) - 1;
> 
> It looks a little odd to read ahead backward; is there a special reason for that?

It's due to the do_read_page implementation: I'd like to avoid getting
the exact full extent length (FIEMAP-like) inside do_read_page and only
request the needed range, so everything has to be walked backward. That
also lets the submission chain be built in forward order.
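
To make the backward walk concrete, here is a minimal user-space sketch
of the loop from the quoted z_erofs_pcluster_readmore(). It only records
the page indices that would be visited; walk_backward() and its output
array are hypothetical names for illustration, and a 4KiB page size is
assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/*
 * Hypothetical helper: walk the extent [m_la, m_la + m_llen) from its last
 * byte down to `end`, one page at a time, recording each page index.
 * Returns the number of indices written to out[] (at most max).
 */
static size_t walk_backward(uint64_t m_la, uint64_t m_llen, uint64_t end,
			    uint64_t *out, size_t max)
{
	size_t n = 0;
	uint64_t cur;

	if (!m_llen)
		return 0;

	cur = m_la + m_llen - 1;		/* last byte of the extent */
	while (cur >= end && n < max) {
		uint64_t index = cur >> PAGE_SHIFT;

		out[n++] = index;	/* the patch reads this page here */
		if (cur < PAGE_SIZE)
			break;		/* page 0 reached, avoid underflow */
		cur = (index << PAGE_SHIFT) - 1; /* last byte of previous page */
	}
	return n;
}
```

For an extent covering bytes 0x3000..0x7fff with end = 0x4000, this
visits pages 7, 6, 5 and 4, in that order.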

If the question is why we should read backward at all: as I said in the
commit message, it matters for big pclusters, since we can read in more
leading data at once.
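
For the readahead case, the quoted patch also grows the window at its
trailing edge up to the page-aligned end of the extent covering the last
requested byte. The arithmetic can be sketched as below; this is only an
illustration, expanded_ra_len() and round_up_pow2() are hypothetical
helpers (the latter mimics the kernel's round_up() for power-of-two
alignments), and a 4KiB page size is assumed.

```c
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* Round x up to a multiple of a, where a is a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: ra_start is the readahead window start and
 * [m_la, m_la + m_llen) is the extent mapped at the window's last byte.
 * Returns the window length after expanding it to the extent end.
 */
static uint64_t expanded_ra_len(uint64_t ra_start, uint64_t m_la,
				uint64_t m_llen)
{
	uint64_t new_end = round_up_pow2(m_la + m_llen, PAGE_SIZE);

	return new_end - ra_start;
}
```

With a window starting at 0x2000 and an extent of 0xf800 bytes at 0x4000,
the expanded length is 0x12000 bytes.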

Thanks,
Gao Xiang

> 
> Thanks,

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v4 2/3] erofs: introduce the secondary compression head
  2021-10-09 18:12     ` Gao Xiang
@ 2021-10-17 16:57       ` Gao Xiang
  -1 siblings, 0 replies; 33+ messages in thread
From: Gao Xiang @ 2021-10-17 16:57 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: LKML, Gao Xiang, Yue Hu

From: Gao Xiang <hsiangkao@linux.alibaba.com>

Previously, each HEAD lcluster could be marked as either HEAD or PLAIN
to indicate whether the whole pcluster is compressed or not.

In this patch, a new HEAD2 head type is introduced to specify a
secondary compression algorithm, in addition to the primary one, for
each compressed file. It can be used for the upcoming LZMA compression
and LZ4 range dictionary compression for various data patterns.

It has been on the EROFS roadmap for years. Complete it now!

Reviewed-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
changes since v3:
 - update comments about on-disk lclusters suggested by Chao.
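
As a rough user-space illustration of the two heads (not the kernel
code): the nibble split and the per-head validity check follow the zmap.c
hunk below, while the enum names, COMPRESSION_MAX and algorithm_for_head()
are hypothetical stand-ins. In particular, the HEAD1 -> slot 0 and
HEAD2 -> slot 1 mapping is an assumption based on the commit message.

```c
#include <stdint.h>

enum {
	CLUSTER_TYPE_PLAIN	= 0,
	CLUSTER_TYPE_HEAD1	= 1,
	CLUSTER_TYPE_NONHEAD	= 2,
	CLUSTER_TYPE_HEAD2	= 3,
};

/* Assumption: stand-in for Z_EROFS_COMPRESSION_MAX. */
#define COMPRESSION_MAX	2

/* Low nibble is the HEAD1 algorithm, high nibble the HEAD2 algorithm. */
static void split_algorithmtype(uint8_t h_algorithmtype, uint8_t algo[2])
{
	algo[0] = h_algorithmtype & 15;
	algo[1] = h_algorithmtype >> 4;
}

/* Return 0 if both algorithms are known, else the 1-based HEAD number. */
static int first_unknown_head(const uint8_t algo[2])
{
	for (int i = 0; i < 2; i++)
		if (algo[i] >= COMPRESSION_MAX)
			return i + 1;
	return 0;
}

/* Algorithm used by a head lcluster; -1 for PLAIN and NONHEAD types. */
static int algorithm_for_head(int headtype, const uint8_t algo[2])
{
	switch (headtype) {
	case CLUSTER_TYPE_HEAD1:
		return algo[0];
	case CLUSTER_TYPE_HEAD2:
		return algo[1];
	default:
		return -1;
	}
}
```

For example, h_algorithmtype = 0x10 gives algorithm 0 for HEAD1 and
algorithm 1 for HEAD2, and both pass the check above.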

 fs/erofs/erofs_fs.h | 39 ++++++++++++++++++++-------------------
 fs/erofs/zmap.c     | 41 ++++++++++++++++++++++++++++-------------
 2 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
index e480b3854d88..87736cbf18cc 100644
--- a/fs/erofs/erofs_fs.h
+++ b/fs/erofs/erofs_fs.h
@@ -22,12 +22,14 @@
 #define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER	0x00000002
 #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE	0x00000004
 #define EROFS_FEATURE_INCOMPAT_DEVICE_TABLE	0x00000008
+#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2	0x00000008
 #define EROFS_ALL_FEATURE_INCOMPAT		\
 	(EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
 	 EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
 	 EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
 	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
-	 EROFS_FEATURE_INCOMPAT_DEVICE_TABLE)
+	 EROFS_FEATURE_INCOMPAT_DEVICE_TABLE | \
+	 EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)
 
 #define EROFS_SB_EXTSLOT_SIZE	16
 
@@ -303,35 +305,34 @@ struct z_erofs_map_header {
 #define Z_EROFS_VLE_LEGACY_HEADER_PADDING       8
 
 /*
- * Fixed-sized output compression ondisk Logical Extent cluster type:
- *    0 - literal (uncompressed) cluster
- *    1 - compressed cluster (for the head logical cluster)
- *    2 - compressed cluster (for the other logical clusters)
+ * Fixed-sized output compression on-disk logical cluster type:
+ *    0   - literal (uncompressed) lcluster
+ *    1,3 - compressed lcluster (for HEAD lclusters)
+ *    2   - compressed lcluster (for NONHEAD lclusters)
  *
  * In detail,
- *    0 - literal (uncompressed) cluster,
+ *    0 - literal (uncompressed) lcluster,
  *        di_advise = 0
- *        di_clusterofs = the literal data offset of the cluster
- *        di_blkaddr = the blkaddr of the literal cluster
+ *        di_clusterofs = the literal data offset of the lcluster
+ *        di_blkaddr = the blkaddr of the literal pcluster
  *
- *    1 - compressed cluster (for the head logical cluster)
- *        di_advise = 1
- *        di_clusterofs = the decompressed data offset of the cluster
- *        di_blkaddr = the blkaddr of the compressed cluster
+ *    1,3 - compressed lcluster (for HEAD lclusters)
+ *        di_advise = 1 or 3
+ *        di_clusterofs = the decompressed data offset of the lcluster
+ *        di_blkaddr = the blkaddr of the compressed pcluster
  *
- *    2 - compressed cluster (for the other logical clusters)
+ *    2 - compressed cluster (for NONHEAD lclusters)
  *        di_advise = 2
  *        di_clusterofs =
- *           the decompressed data offset in its own head cluster
- *        di_u.delta[0] = distance to its corresponding head cluster
- *        di_u.delta[1] = distance to its corresponding tail cluster
- *                (di_advise could be 0, 1 or 2)
+ *           the decompressed data offset in its own HEAD lcluster
+ *        di_u.delta[0] = distance to this HEAD lcluster
+ *        di_u.delta[1] = distance to the next HEAD lcluster
  */
 enum {
 	Z_EROFS_VLE_CLUSTER_TYPE_PLAIN		= 0,
-	Z_EROFS_VLE_CLUSTER_TYPE_HEAD		= 1,
+	Z_EROFS_VLE_CLUSTER_TYPE_HEAD1		= 1,
 	Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD	= 2,
-	Z_EROFS_VLE_CLUSTER_TYPE_RESERVED	= 3,
+	Z_EROFS_VLE_CLUSTER_TYPE_HEAD2		= 3,
 	Z_EROFS_VLE_CLUSTER_TYPE_MAX
 };
 
diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index 1c3b068e5a42..85d0289429b3 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -28,7 +28,7 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
 {
 	struct erofs_inode *const vi = EROFS_I(inode);
 	struct super_block *const sb = inode->i_sb;
-	int err;
+	int err, headnr;
 	erofs_off_t pos;
 	struct page *page;
 	void *kaddr;
@@ -68,9 +68,11 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
 	vi->z_algorithmtype[0] = h->h_algorithmtype & 15;
 	vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;
 
-	if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
-		erofs_err(sb, "unknown compression format %u for nid %llu, please upgrade kernel",
-			  vi->z_algorithmtype[0], vi->nid);
+	headnr = 0;
+	if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX ||
+	    vi->z_algorithmtype[++headnr] >= Z_EROFS_COMPRESSION_MAX) {
+		erofs_err(sb, "unknown HEAD%u format %u for nid %llu, please upgrade kernel",
+			  headnr + 1, vi->z_algorithmtype[headnr], vi->nid);
 		err = -EOPNOTSUPP;
 		goto unmap_done;
 	}
@@ -178,7 +180,8 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
 		m->clusterofs = 1 << vi->z_logical_clusterbits;
 		m->delta[0] = le16_to_cpu(di->di_u.delta[0]);
 		if (m->delta[0] & Z_EROFS_VLE_DI_D0_CBLKCNT) {
-			if (!(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
+			if (!(vi->z_advise & (Z_EROFS_ADVISE_BIG_PCLUSTER_1 |
+					Z_EROFS_ADVISE_BIG_PCLUSTER_2))) {
 				DBG_BUGON(1);
 				return -EFSCORRUPTED;
 			}
@@ -189,7 +192,8 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
 		m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
 		break;
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
 		m->clusterofs = le16_to_cpu(di->di_clusterofs);
 		m->pblk = le32_to_cpu(di->di_u.blkaddr);
 		break;
@@ -446,7 +450,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
 		}
 		return z_erofs_extent_lookback(m, m->delta[0]);
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
 		m->headtype = m->type;
 		map->m_la = (lcn << lclusterbits) | m->clusterofs;
 		break;
@@ -470,13 +475,18 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
 	int err;
 
 	DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
-		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
+		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 &&
+		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD2);
+	DBG_BUGON(m->type != m->headtype);
+
 	if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
-	    !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
+	    ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1) &&
+	     !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) ||
+	    ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) &&
+	     !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_2))) {
 		map->m_plen = 1 << lclusterbits;
 		return 0;
 	}
-
 	lcn = m->lcn + 1;
 	if (m->compressedlcs)
 		goto out;
@@ -498,7 +508,8 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
 
 	switch (m->type) {
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
 		/*
 		 * if the 1st NONHEAD lcluster is actually PLAIN or HEAD type
 		 * rather than CBLKCNT, it's a 1 lcluster-sized pcluster.
@@ -553,7 +564,8 @@ static int z_erofs_get_extent_decompressedlen(struct z_erofs_maprecorder *m)
 			DBG_BUGON(!m->delta[1] &&
 				  m->clusterofs != 1 << lclusterbits);
 		} else if (m->type == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
-			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD) {
+			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 ||
+			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) {
 			/* go on until the next HEAD lcluster */
 			if (lcn != headlcn)
 				break;
@@ -613,7 +625,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
 
 	switch (m.type) {
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
 		if (endoff >= m.clusterofs) {
 			m.headtype = m.type;
 			map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
@@ -654,6 +667,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
 
 	if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
 		map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
+	else if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2)
+		map->m_algorithmformat = vi->z_algorithmtype[1];
 	else
 		map->m_algorithmformat = vi->z_algorithmtype[0];
 
-- 
2.20.1



* [PATCH v4 2/3] erofs: introduce the secondary compression head
@ 2021-10-17 16:57       ` Gao Xiang
From: Gao Xiang @ 2021-10-17 16:57 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, Yue Hu, LKML

From: Gao Xiang <hsiangkao@linux.alibaba.com>

Previously, each HEAD lcluster could be either a HEAD or a PLAIN
lcluster, indicating whether the whole pcluster is compressed or not.

This patch introduces a new HEAD2 head type to specify a second
compression algorithm, in addition to the primary one, for each
compressed file. It can be used for the upcoming LZMA compression and
LZ4 range dictionary compression for various data patterns.

It has been on the EROFS roadmap for years. Complete it now!

Reviewed-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
changes since v3:
 - update comments about on-disk lclusters suggested by Chao.

 fs/erofs/erofs_fs.h | 39 ++++++++++++++++++++-------------------
 fs/erofs/zmap.c     | 41 ++++++++++++++++++++++++++++-------------
 2 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
index e480b3854d88..87736cbf18cc 100644
--- a/fs/erofs/erofs_fs.h
+++ b/fs/erofs/erofs_fs.h
@@ -22,12 +22,14 @@
 #define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER	0x00000002
 #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE	0x00000004
 #define EROFS_FEATURE_INCOMPAT_DEVICE_TABLE	0x00000008
+#define EROFS_FEATURE_INCOMPAT_COMPR_HEAD2	0x00000008
 #define EROFS_ALL_FEATURE_INCOMPAT		\
 	(EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
 	 EROFS_FEATURE_INCOMPAT_COMPR_CFGS | \
 	 EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER | \
 	 EROFS_FEATURE_INCOMPAT_CHUNKED_FILE | \
-	 EROFS_FEATURE_INCOMPAT_DEVICE_TABLE)
+	 EROFS_FEATURE_INCOMPAT_DEVICE_TABLE | \
+	 EROFS_FEATURE_INCOMPAT_COMPR_HEAD2)
 
 #define EROFS_SB_EXTSLOT_SIZE	16
 
@@ -303,35 +305,34 @@ struct z_erofs_map_header {
 #define Z_EROFS_VLE_LEGACY_HEADER_PADDING       8
 
 /*
- * Fixed-sized output compression ondisk Logical Extent cluster type:
- *    0 - literal (uncompressed) cluster
- *    1 - compressed cluster (for the head logical cluster)
- *    2 - compressed cluster (for the other logical clusters)
+ * Fixed-sized output compression on-disk logical cluster type:
+ *    0   - literal (uncompressed) lcluster
+ *    1,3 - compressed lcluster (for HEAD lclusters)
+ *    2   - compressed lcluster (for NONHEAD lclusters)
  *
  * In detail,
- *    0 - literal (uncompressed) cluster,
+ *    0 - literal (uncompressed) lcluster,
  *        di_advise = 0
- *        di_clusterofs = the literal data offset of the cluster
- *        di_blkaddr = the blkaddr of the literal cluster
+ *        di_clusterofs = the literal data offset of the lcluster
+ *        di_blkaddr = the blkaddr of the literal pcluster
  *
- *    1 - compressed cluster (for the head logical cluster)
- *        di_advise = 1
- *        di_clusterofs = the decompressed data offset of the cluster
- *        di_blkaddr = the blkaddr of the compressed cluster
+ *    1,3 - compressed lcluster (for HEAD lclusters)
+ *        di_advise = 1 or 3
+ *        di_clusterofs = the decompressed data offset of the lcluster
+ *        di_blkaddr = the blkaddr of the compressed pcluster
  *
- *    2 - compressed cluster (for the other logical clusters)
+ *    2 - compressed lcluster (for NONHEAD lclusters)
  *        di_advise = 2
  *        di_clusterofs =
- *           the decompressed data offset in its own head cluster
- *        di_u.delta[0] = distance to its corresponding head cluster
- *        di_u.delta[1] = distance to its corresponding tail cluster
- *                (di_advise could be 0, 1 or 2)
+ *           the decompressed data offset in its own HEAD lcluster
+ *        di_u.delta[0] = distance to this HEAD lcluster
+ *        di_u.delta[1] = distance to the next HEAD lcluster
  */
 enum {
 	Z_EROFS_VLE_CLUSTER_TYPE_PLAIN		= 0,
-	Z_EROFS_VLE_CLUSTER_TYPE_HEAD		= 1,
+	Z_EROFS_VLE_CLUSTER_TYPE_HEAD1		= 1,
 	Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD	= 2,
-	Z_EROFS_VLE_CLUSTER_TYPE_RESERVED	= 3,
+	Z_EROFS_VLE_CLUSTER_TYPE_HEAD2		= 3,
 	Z_EROFS_VLE_CLUSTER_TYPE_MAX
 };
 
diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index 1c3b068e5a42..85d0289429b3 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -28,7 +28,7 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
 {
 	struct erofs_inode *const vi = EROFS_I(inode);
 	struct super_block *const sb = inode->i_sb;
-	int err;
+	int err, headnr;
 	erofs_off_t pos;
 	struct page *page;
 	void *kaddr;
@@ -68,9 +68,11 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
 	vi->z_algorithmtype[0] = h->h_algorithmtype & 15;
 	vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;
 
-	if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
-		erofs_err(sb, "unknown compression format %u for nid %llu, please upgrade kernel",
-			  vi->z_algorithmtype[0], vi->nid);
+	headnr = 0;
+	if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX ||
+	    vi->z_algorithmtype[++headnr] >= Z_EROFS_COMPRESSION_MAX) {
+		erofs_err(sb, "unknown HEAD%u format %u for nid %llu, please upgrade kernel",
+			  headnr + 1, vi->z_algorithmtype[headnr], vi->nid);
 		err = -EOPNOTSUPP;
 		goto unmap_done;
 	}
@@ -178,7 +180,8 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
 		m->clusterofs = 1 << vi->z_logical_clusterbits;
 		m->delta[0] = le16_to_cpu(di->di_u.delta[0]);
 		if (m->delta[0] & Z_EROFS_VLE_DI_D0_CBLKCNT) {
-			if (!(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
+			if (!(vi->z_advise & (Z_EROFS_ADVISE_BIG_PCLUSTER_1 |
+					Z_EROFS_ADVISE_BIG_PCLUSTER_2))) {
 				DBG_BUGON(1);
 				return -EFSCORRUPTED;
 			}
@@ -189,7 +192,8 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
 		m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
 		break;
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
 		m->clusterofs = le16_to_cpu(di->di_clusterofs);
 		m->pblk = le32_to_cpu(di->di_u.blkaddr);
 		break;
@@ -446,7 +450,8 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
 		}
 		return z_erofs_extent_lookback(m, m->delta[0]);
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
 		m->headtype = m->type;
 		map->m_la = (lcn << lclusterbits) | m->clusterofs;
 		break;
@@ -470,13 +475,18 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
 	int err;
 
 	DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
-		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
+		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 &&
+		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD2);
+	DBG_BUGON(m->type != m->headtype);
+
 	if (m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
-	    !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
+	    ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1) &&
+	     !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) ||
+	    ((m->headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) &&
+	     !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_2))) {
 		map->m_plen = 1 << lclusterbits;
 		return 0;
 	}
-
 	lcn = m->lcn + 1;
 	if (m->compressedlcs)
 		goto out;
@@ -498,7 +508,8 @@ static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
 
 	switch (m->type) {
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
 		/*
 		 * if the 1st NONHEAD lcluster is actually PLAIN or HEAD type
 		 * rather than CBLKCNT, it's a 1 lcluster-sized pcluster.
@@ -553,7 +564,8 @@ static int z_erofs_get_extent_decompressedlen(struct z_erofs_maprecorder *m)
 			DBG_BUGON(!m->delta[1] &&
 				  m->clusterofs != 1 << lclusterbits);
 		} else if (m->type == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN ||
-			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD) {
+			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD1 ||
+			   m->type == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2) {
 			/* go on until the next HEAD lcluster */
 			if (lcn != headlcn)
 				break;
@@ -613,7 +625,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
 
 	switch (m.type) {
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD1:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD2:
 		if (endoff >= m.clusterofs) {
 			m.headtype = m.type;
 			map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
@@ -654,6 +667,8 @@ int z_erofs_map_blocks_iter(struct inode *inode,
 
 	if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN)
 		map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED;
+	else if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_HEAD2)
+		map->m_algorithmformat = vi->z_algorithmtype[1];
 	else
 		map->m_algorithmformat = vi->z_algorithmtype[0];
 
-- 
2.20.1



* Re: [PATCH v4 2/3] erofs: introduce the secondary compression head
  2021-10-17 16:57       ` Gao Xiang
@ 2021-10-19 12:56         ` Chao Yu
From: Chao Yu @ 2021-10-19 12:56 UTC (permalink / raw)
  To: Gao Xiang, linux-erofs; +Cc: LKML, Gao Xiang, Yue Hu

On 2021/10/18 0:57, Gao Xiang wrote:
> From: Gao Xiang <hsiangkao@linux.alibaba.com>
> 
> Previously, each HEAD lcluster could be either a HEAD or a PLAIN
> lcluster, indicating whether the whole pcluster is compressed or not.
> 
> This patch introduces a new HEAD2 head type to specify a second
> compression algorithm, in addition to the primary one, for each
> compressed file. It can be used for the upcoming LZMA compression and
> LZ4 range dictionary compression for various data patterns.
> 
> It has been on the EROFS roadmap for years. Complete it now!
> 
> Reviewed-by: Yue Hu <huyue2@yulong.com>
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>

Reviewed-by: Chao Yu <chao@kernel.org>

Thanks,


* Re: [PATCH v2 3/3] erofs: introduce readmore decompression strategy
  2021-10-17 15:42       ` Gao Xiang
@ 2021-10-19 12:58       ` Chao Yu
From: Chao Yu @ 2021-10-19 12:58 UTC (permalink / raw)
  To: Gao Xiang, linux-erofs, LKML, Yue Hu, Gao Xiang

On 2021/10/17 23:42, Gao Xiang wrote:
> On Sun, Oct 17, 2021 at 11:34:22PM +0800, Chao Yu wrote:
>> On 2021/10/9 4:08, Gao Xiang wrote:
>>> From: Gao Xiang <hsiangkao@linux.alibaba.com>
>>>
>>> Previously, the EROFS decompression strategy strictly followed the
>>> readahead window in order to minimize extra memory footprint.
>>> However, this can be inefficient when only part of the data in a
>>> very large LZ4 pcluster is requested, and likewise for the upcoming
>>> LZMA implementation.
>>>
>>> Let's try to request the leading data in a pcluster without
>>> triggering memory reclaim instead, for the LZ4 approach first,
>>> to boost 100% randread of large big pclusters; it has no real
>>> impact on low-memory scenarios.
>>>
>>> It also introduces a way to expand read lengths in order to decompress
>>> the whole pcluster, which is useful for LZMA since the algorithm
>>> itself is relatively slow and CPU-bound, unlike LZ4.
>>>
>>> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>

Looks fine to me now.

Reviewed-by: Chao Yu <chao@kernel.org>

Thanks,


end of thread, other threads:[~2021-10-19 12:58 UTC | newest]

Thread overview: 33+ messages
2021-10-08 20:08 [PATCH v2 0/3] erofs: some decompression improvements Gao Xiang
2021-10-08 20:08 ` Gao Xiang
2021-10-08 20:08 ` [PATCH v2 1/3] erofs: get compression algorithms directly on mapping Gao Xiang
2021-10-08 20:08   ` Gao Xiang
2021-10-09  1:52   ` Yue Hu
2021-10-09  1:52     ` Yue Hu
2021-10-17 15:25   ` Chao Yu
2021-10-17 15:25     ` Chao Yu
2021-10-08 20:08 ` [PATCH v2 2/3] erofs: introduce the secondary compression head Gao Xiang
2021-10-08 20:08   ` Gao Xiang
2021-10-09  3:50   ` Yue Hu
2021-10-09  3:50     ` Yue Hu
2021-10-09  4:47     ` Gao Xiang
2021-10-09  4:47       ` Gao Xiang
2021-10-09 18:12   ` [PATCH v3 " Gao Xiang
2021-10-09 18:12     ` Gao Xiang
2021-10-10  0:53     ` Yue Hu
2021-10-10  0:53       ` Yue Hu
2021-10-17 15:27     ` Chao Yu
2021-10-17 15:27       ` Chao Yu
2021-10-17 15:32       ` Gao Xiang
2021-10-17 15:32         ` Gao Xiang
2021-10-17 16:57     ` [PATCH v4 " Gao Xiang
2021-10-17 16:57       ` Gao Xiang
2021-10-19 12:56       ` Chao Yu
2021-10-19 12:56         ` Chao Yu
2021-10-08 20:08 ` [PATCH v2 3/3] erofs: introduce readmore decompression strategy Gao Xiang
2021-10-08 20:08   ` Gao Xiang
2021-10-17 15:34   ` Chao Yu
2021-10-17 15:34     ` Chao Yu
2021-10-17 15:42     ` Gao Xiang
2021-10-17 15:42       ` Gao Xiang
2021-10-19 12:58       ` Chao Yu
