From: Gao Xiang <hsiangkao@aol.com>

Hi folks,

This is the first RFC patch of multiple block compression (including
erofsfuse) after I carefully think over the on-disk design to support
multiblock in-place decompression.

Compression ratio results (POC, lz4hc, lz4-1.9.3, not final result):
1000000000 enwik9
621211648 enwik9_4k.squashfs.img
557858816 enwik9_4k.erofs.img
556191744 enwik9_8k.squashfs.img
502661120 enwik9_16k.squashfs.img
500723712 enwik9_8k.erofs.img
458784768 enwik9_32k.squashfs.img
453971968 enwik9_16k.erofs.img
422318080 enwik9_64k.squashfs.img
416686080 enwik9_32k.erofs.img
398204928 enwik9_128k.squashfs.img
395276288 enwik9_64k.erofs.img

TODO:
 - support compact indexes for multiple block compression **;
 - support multithread compression (keep compressed data in memory);
 - carefully design kernel optimized paths to maximize runtime
   performance;
 - widely testing.

If you think that'd be useful for your products and you also have
interest in development, feel free to follow that as well since
I don't have abundant free time so the progress might be somewhat
slow (I tend to finish them all before the next LTS).

Thanks,
Gao Xiang

Gao Xiang (3):
  erofs-utils: add -C# for the maximum size of pclusters
  erofs-utils: mkfs: support multiple block compression
  erofs-utils: fuse: support multiple block compression

 include/erofs/config.h   |  2 ++
 include/erofs/internal.h |  1 +
 include/erofs_fs.h       | 19 ++++++++---
 lib/compress.c           | 70 ++++++++++++++++++++++++--------------
 lib/config.c             |  1 +
 lib/data.c               |  4 +--
 lib/zmap.c               | 72 ++++++++++++++++++++++++++++++++++++----
 mkfs/main.c              | 14 +++++++-
 8 files changed, 146 insertions(+), 37 deletions(-)

-- 
2.24.0

From: Gao Xiang <hsiangkao@aol.com>

Set up -C >= EROFS_BLKSIZ (more specifically, >= lclustersize)
to enable big pcluster feature.

Signed-off-by: Gao Xiang <hsiangkao@aol.com>
---
 include/erofs/config.h |  2 ++
 lib/config.c           |  1 +
 mkfs/main.c            | 14 +++++++++++++-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/erofs/config.h b/include/erofs/config.h
index 02ddf594ca60..5f5a05a8b796 100644
--- a/include/erofs/config.h
+++ b/include/erofs/config.h
@@ -53,6 +53,8 @@ struct erofs_configure {
 	int c_force_inodeversion;
 	/* < 0, xattr disabled and INT_MAX, always use inline xattrs */
 	int c_inline_xattr_tolerance;
+
+	u32 c_physical_clusterblks;
 	u64 c_unix_timestamp;
 #ifdef WITH_ANDROID
 	char *mount_point;
diff --git a/lib/config.c b/lib/config.c
index 3ecd48140cfd..352a77c8d639 100644
--- a/lib/config.c
+++ b/lib/config.c
@@ -24,6 +24,7 @@ void erofs_init_configure(void)
 	cfg.c_force_inodeversion = 0;
 	cfg.c_inline_xattr_tolerance = 2;
 	cfg.c_unix_timestamp = -1;
+	cfg.c_physical_clusterblks = 1;
 }
 
 void erofs_show_config(void)
diff --git a/mkfs/main.c b/mkfs/main.c
index abd48be0fa4f..c4c67c962919 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -62,6 +62,7 @@ static void usage(void)
 	fputs("usage: [options] FILE DIRECTORY\n\n"
 	      "Generate erofs image from DIRECTORY to FILE, and [options] are:\n"
 	      " -zX[,Y] X=compressor (Y=compression level, optional)\n"
+	      " -C# specify the size of compress physical cluster in bytes\n"
 	      " -d# set output message level to # (maximum 9)\n"
 	      " -x# set xattr tolerance to # (< 0, disable xattrs; default 2)\n"
 	      " -EX[,...] X=extended options\n"
@@ -152,7 +153,7 @@ static int mkfs_parse_options_cfg(int argc, char *argv[])
 	char *endptr;
 	int opt, i;
 
-	while((opt = getopt_long(argc, argv, "d:x:z:E:T:U:",
+	while((opt = getopt_long(argc, argv, "d:x:z:E:T:U:C:",
 				 long_options, NULL)) != -1) {
 		switch (opt) {
 		case 'z':
@@ -248,6 +249,17 @@ static int mkfs_parse_options_cfg(int argc, char *argv[])
 			cfg.fs_config_file = optarg;
 			break;
 #endif
+		case 'C':
+			i = strtoull(optarg, &endptr, 0);
+			if (*endptr != '\0' ||
+			    i < EROFS_BLKSIZ || i % EROFS_BLKSIZ) {
+				erofs_err("invalid physical clustersize %s",
+					  optarg);
+				return -EINVAL;
+			}
+			cfg.c_physical_clusterblks = i / EROFS_BLKSIZ;
+			break;
+
 		case 1:
 			usage();
 			exit(0);
-- 
2.24.0

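For reference, the -C validation in the patch above could be factored into a small helper. The following is only a sketch under assumptions: sketch_parse_pclusterblks(), SKETCH_BLKSIZ and SKETCH_MAX_PCLUSTERBLKS are hypothetical names that do not exist in erofs-utils, the block size is assumed to be 4096 bytes, and the 1024-block cap is borrowed from the raw[] buffer in the erofsfuse patch rather than from any on-disk limit. It keeps the parsed value in an unsigned long long so that large or negative inputs are rejected by the range check instead of being truncated into an int first.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_BLKSIZ		4096ULL	/* assumption: 4 KiB blocks */
#define SKETCH_MAX_PCLUSTERBLKS	1024ULL	/* assumption: mirrors raw[] in erofsfuse */

/*
 * Hypothetical helper (not part of the patch): parse a -C argument given
 * in bytes and return the pcluster size in blocks, or -EINVAL if the
 * value is malformed, not block-aligned, or out of the assumed range.
 */
static long sketch_parse_pclusterblks(const char *arg)
{
	char *endptr;
	unsigned long long bytes;

	assert(arg != NULL);

	errno = 0;
	bytes = strtoull(arg, &endptr, 0);
	/* reject overflow, empty input and trailing garbage */
	if (errno || endptr == arg || *endptr != '\0')
		return -EINVAL;

	/* must be a whole number of blocks within the assumed cap */
	if (bytes < SKETCH_BLKSIZ || bytes % SKETCH_BLKSIZ ||
	    bytes / SKETCH_BLKSIZ > SKETCH_MAX_PCLUSTERBLKS)
		return -EINVAL;

	return (long)(bytes / SKETCH_BLKSIZ);
}
```

With these assumptions, "-C65536" would map to 16 blocks, while "-C4097" or "-C8192k" would be rejected.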
From: Gao Xiang <hsiangkao@aol.com>

Store compressed block count to the compressed index so that EROFS
can compress from variable-sized input to variable-sized compressed
blocks and make the in-place decompression possible as well.

TODO: support storing compressed block count for compact indexes.

Signed-off-by: Gao Xiang <hsiangkao@aol.com>
---
 include/erofs/internal.h |  1 +
 include/erofs_fs.h       | 19 ++++++++---
 lib/compress.c           | 70 ++++++++++++++++++++++++++--------------
 3 files changed, 62 insertions(+), 28 deletions(-)

diff --git a/include/erofs/internal.h b/include/erofs/internal.h
index ac5b270329e2..de307e7f3d8f 100644
--- a/include/erofs/internal.h
+++ b/include/erofs/internal.h
@@ -104,6 +104,7 @@ static inline void erofs_sb_clear_##name(void) \
 }
 
 EROFS_FEATURE_FUNCS(lz4_0padding, incompat, INCOMPAT_LZ4_0PADDING)
+EROFS_FEATURE_FUNCS(big_pcluster, incompat, INCOMPAT_BIG_PCLUSTER)
 EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM)
 
 #define EROFS_I_EA_INITED	(1 << 0)
diff --git a/include/erofs_fs.h b/include/erofs_fs.h
index a69f179a51a5..fa9467a2608c 100644
--- a/include/erofs_fs.h
+++ b/include/erofs_fs.h
@@ -20,7 +20,10 @@
  * be incompatible with this kernel version.
  */
 #define EROFS_FEATURE_INCOMPAT_LZ4_0PADDING	0x00000001
-#define EROFS_ALL_FEATURE_INCOMPAT		EROFS_FEATURE_INCOMPAT_LZ4_0PADDING
+#define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER	0x00000002
+#define EROFS_ALL_FEATURE_INCOMPAT \
+	(EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \
+	 EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER)
 
 /* 128-byte erofs on-disk super block */
 struct erofs_super_block {
@@ -201,10 +204,11 @@ enum {
  * bit 0 : COMPACTED_2B indexes (0 - off; 1 - on)
  * e.g. for 4k logical cluster size, 4B if compacted 2B is off;
  * (4B) + 2B + (4B) if compacted 2B is on.
+ * bit 1 : HEAD1 big pcluster (0 - off; 1 - on)
+ * bit 2 : (reserved now) HEAD2 big pcluster (0 - off; 1 - on)
  */
-#define Z_EROFS_ADVISE_COMPACTED_2B_BIT 0
-
-#define Z_EROFS_ADVISE_COMPACTED_2B (1 << Z_EROFS_ADVISE_COMPACTED_2B_BIT)
+#define Z_EROFS_ADVISE_COMPACTED_2B		0x0001
+#define Z_EROFS_ADVISE_BIG_PCLUSTER_1		0x0002
 
 struct z_erofs_map_header {
 	__le32 h_reserved1;
@@ -261,6 +265,13 @@ enum {
 #define Z_EROFS_VLE_DI_CLUSTER_TYPE_BITS	2
 #define Z_EROFS_VLE_DI_CLUSTER_TYPE_BIT		0
 
+/*
+ * D0_CBLKCNT will be marked _only_ for the 1st non-head lcluster to
+ * store the compressed block count of a compressed extent (aka. block
+ * count of a pcluster).
+ */
+#define Z_EROFS_VLE_DI_D0_CBLKCNT		0x8000
+
 struct z_erofs_vle_decompressed_index {
 	__le16 di_advise;
 	/* where to decompress in the head cluster */
diff --git a/lib/compress.c b/lib/compress.c
index 86db940b6edd..f340f432c6b7 100644
--- a/lib/compress.c
+++ b/lib/compress.c
@@ -29,8 +29,8 @@ struct z_erofs_vle_compress_ctx {
 
 	u8 queue[EROFS_CONFIG_COMPR_MAX_SZ * 2];
 	unsigned int head, tail;
-
-	erofs_blk_t blkaddr;	/* pointing to the next blkaddr */
+	unsigned int compressedblks;
+	erofs_blk_t blkaddr;		/* pointing to the next blkaddr */
 	u16 clusterofs;
 };
 
@@ -89,7 +89,13 @@ static void vle_write_indexes(struct z_erofs_vle_compress_ctx *ctx,
 	}
 
 	do {
-		if (d0) {
+		/* XXX: big pcluster feature should be per-inode */
+		if (d0 == 1 && cfg.c_physical_clusterblks > 1) {
+			type = Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD;
+			di.di_u.delta[0] = cpu_to_le16(ctx->compressedblks |
+					Z_EROFS_VLE_DI_D0_CBLKCNT);
+			di.di_u.delta[1] = cpu_to_le16(d1);
+		} else if (d0) {
 			type = Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD;
 
 			di.di_u.delta[0] = cpu_to_le16(d0);
@@ -115,9 +121,8 @@ static void vle_write_indexes(struct z_erofs_vle_compress_ctx *ctx,
 	ctx->clusterofs = clusterofs + count;
 }
 
-static int write_uncompressed_block(struct z_erofs_vle_compress_ctx *ctx,
-				    unsigned int *len,
-				    char *dst)
+static int write_uncompressed_extent(struct z_erofs_vle_compress_ctx *ctx,
+				     unsigned int *len, char *dst)
 {
 	int ret;
 	unsigned int count;
@@ -148,17 +153,19 @@ static int vle_compress_one(struct erofs_inode *inode,
 			    struct z_erofs_vle_compress_ctx *ctx,
 			    bool final)
 {
+	const unsigned int pclusterblks = cfg.c_physical_clusterblks;
+	const unsigned int pclustersize = pclusterblks * EROFS_BLKSIZ;
 	struct erofs_compress *const h = &compresshandle;
 	unsigned int len = ctx->tail - ctx->head;
 	unsigned int count;
 	int ret;
-	static char dstbuf[EROFS_BLKSIZ * 2];
+	static char dstbuf[EROFS_CONFIG_COMPR_MAX_SZ + EROFS_BLKSIZ];
 	char *const dst = dstbuf + EROFS_BLKSIZ;
 
 	while (len) {
 		bool raw;
 
-		if (len <= EROFS_BLKSIZ) {
+		if (len <= pclustersize) {
 			if (final)
 				goto nocompression;
 			break;
@@ -167,7 +174,7 @@ static int vle_compress_one(struct erofs_inode *inode,
 		count = len;
 		ret = erofs_compress_destsize(h, compressionlevel,
 					      ctx->queue + ctx->head,
-					      &count, dst, EROFS_BLKSIZ);
+					      &count, dst, pclustersize);
 		if (ret <= 0) {
 			if (ret != -EAGAIN) {
 				erofs_err("failed to compress %s: %s",
@@ -175,32 +182,36 @@ static int vle_compress_one(struct erofs_inode *inode,
 					  erofs_strerror(ret));
 			}
 nocompression:
-			ret = write_uncompressed_block(ctx, &len, dst);
+			ret = write_uncompressed_extent(ctx, &len, dst);
 			if (ret < 0)
 				return ret;
 			count = ret;
+			ctx->compressedblks = 1;
 			raw = true;
 		} else {
-			/* write compressed data */
-			erofs_dbg("Writing %u compressed data to block %u",
-				  count, ctx->blkaddr);
+			const unsigned int used = ret & (EROFS_BLKSIZ - 1);
+			const unsigned int margin =
+				erofs_sb_has_lz4_0padding() && used ?
+					EROFS_BLKSIZ - used : 0;
 
-			if (erofs_sb_has_lz4_0padding())
-				ret = blk_write(dst - (EROFS_BLKSIZ - ret),
-						ctx->blkaddr, 1);
-			else
-				ret = blk_write(dst, ctx->blkaddr, 1);
+			ctx->compressedblks = DIV_ROUND_UP(ret, EROFS_BLKSIZ);
 
+			/* write compressed data */
+			erofs_dbg("Writing %u compressed data to %u of %u blocks",
+				  count, ctx->blkaddr, ctx->compressedblks);
+
+			ret = blk_write(dst - margin, ctx->blkaddr,
+					ctx->compressedblks);
 			if (ret)
 				return ret;
 			raw = false;
 		}
 
 		ctx->head += count;
-		/* write compression indexes for this blkaddr */
+		/* write compression indexes for this pcluster */
 		vle_write_indexes(ctx, count, raw);
 
-		++ctx->blkaddr;
+		ctx->blkaddr += ctx->compressedblks;
 		len -= count;
 
 		if (!final && ctx->head >= EROFS_CONFIG_COMPR_MAX_SZ) {
@@ -345,8 +356,6 @@ int z_erofs_convert_to_compacted_format(struct erofs_inode *inode,
 
 	out = in = inode->compressmeta;
 
-	/* write out compacted header */
-	memcpy(out, &mapheader, sizeof(mapheader));
 	out += sizeof(mapheader);
 	in += Z_EROFS_LEGACY_MAP_HEADER_SIZE;
 
@@ -415,6 +424,8 @@ int erofs_write_compressed_file(struct erofs_inode *inode)
 	}
 
 	memset(compressmeta, 0, Z_EROFS_LEGACY_MAP_HEADER_SIZE);
+	/* write out compressed header */
+	memcpy(compressmeta, &mapheader, sizeof(mapheader));
 
 	blkaddr = erofs_mapbh(bh->block, true);	/* start_blkaddr */
 	ctx.blkaddr = blkaddr;
@@ -473,7 +484,8 @@ int erofs_write_compressed_file(struct erofs_inode *inode)
 	inode->u.i_blocks = compressed_blocks;
 
 	legacymetasize = ctx.metacur - compressmeta;
-	if (cfg.c_legacy_compress) {
+	/* XXX: temporarily use legacy index instead for mbpcluster */
+	if (cfg.c_legacy_compress || cfg.c_physical_clusterblks > 1) {
 		inode->extent_isize = legacymetasize;
 		inode->datalayout = EROFS_INODE_FLAT_COMPRESSION_LEGACY;
 	} else {
@@ -531,7 +543,17 @@ int z_erofs_compress_init(void)
 
 	algorithmtype[0] = ret;	/* primary algorithm (head 0) */
 	algorithmtype[1] = 0;	/* secondary algorithm (head 1) */
-	mapheader.h_advise |= Z_EROFS_ADVISE_COMPACTED_2B;
+	mapheader.h_advise = 0;
+	if (!cfg.c_legacy_compress)
+		mapheader.h_advise |= Z_EROFS_ADVISE_COMPACTED_2B;
+	/*
+	 * if big pcluster is enabled, an extra CBLKCNT lcluster index needs
+	 * to be loaded in order to get those compressed block counts.
+	 */
+	if (cfg.c_physical_clusterblks > 1) {
+		erofs_sb_set_big_pcluster();
+		mapheader.h_advise |= Z_EROFS_ADVISE_BIG_PCLUSTER_1;
+	}
 	mapheader.h_algorithmtype = algorithmtype[1] << 4 |
 				    algorithmtype[0];
 	mapheader.h_clusterbits = LOG_BLOCK_SIZE - 12;
-- 
2.24.0

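To make the index encoding in the patch above easier to follow, here is a standalone sketch of the three pieces of arithmetic it relies on: rounding the compressed size up to whole blocks, the leading margin used when the data is right-aligned for lz4 0padding, and packing the block count into delta[0] together with the CBLKCNT flag. All sketch_* names are hypothetical and not part of erofs-utils; the 4096-byte block size is an assumption, and only the 0x8000 flag value is taken from the patch. Note that the patch applies the margin only when the lz4_0padding feature is set, which this sketch leaves to the caller.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BLKSIZ		4096u	/* assumption: 4 KiB blocks */
#define SKETCH_D0_CBLKCNT	0x8000u	/* same value as Z_EROFS_VLE_DI_D0_CBLKCNT */

/* Number of blocks needed to hold `bytes` of compressed data. */
static unsigned int sketch_compressedblks(unsigned int bytes)
{
	return (bytes + SKETCH_BLKSIZ - 1) / SKETCH_BLKSIZ;
}

/* Leading gap so that the data ends exactly on a block boundary. */
static unsigned int sketch_margin(unsigned int bytes)
{
	const unsigned int used = bytes & (SKETCH_BLKSIZ - 1);

	return used ? SKETCH_BLKSIZ - used : 0;
}

/* Pack a block count into delta[0] of the 1st non-head lcluster. */
static uint16_t sketch_pack_cblkcnt(unsigned int blks)
{
	/* the count has to fit below the flag bit */
	assert(blks > 0 && blks < SKETCH_D0_CBLKCNT);
	return (uint16_t)(blks | SKETCH_D0_CBLKCNT);
}

/*
 * Return the block count if the flag is set; 0 means delta[0] is an
 * ordinary lookback distance instead.
 */
static unsigned int sketch_unpack_cblkcnt(uint16_t d0)
{
	return (d0 & SKETCH_D0_CBLKCNT) ? (d0 & ~SKETCH_D0_CBLKCNT) : 0;
}
```

Since the flag occupies the top bit of a 16-bit field, a single pcluster can describe at most 0x7fff blocks in this layout.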
From: Gao Xiang <hsiangkao@aol.com>

Add multiple block compression runtime support for erofsfuse.

Signed-off-by: Gao Xiang <hsiangkao@aol.com>
---
 lib/data.c |  4 +--
 lib/zmap.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/lib/data.c b/lib/data.c
index 3781846743aa..b330a91e4e34 100644
--- a/lib/data.c
+++ b/lib/data.c
@@ -127,7 +127,7 @@ static int z_erofs_read_data(struct erofs_inode *inode, char *buffer,
 	};
 	bool partial;
 	unsigned int algorithmformat;
-	char raw[EROFS_BLKSIZ];
+	char raw[1024 * EROFS_BLKSIZ];
 
 	end = offset + size;
 	while (end > offset) {
@@ -142,7 +142,7 @@ static int z_erofs_read_data(struct erofs_inode *inode, char *buffer,
 			continue;
 		}
 
-		ret = dev_read(raw, map.m_pa, EROFS_BLKSIZ);
+		ret = dev_read(raw, map.m_pa, map.m_plen);
 		if (ret < 0)
 			return -EIO;
 
diff --git a/lib/zmap.c b/lib/zmap.c
index ee63de74cab2..096fd35cdeb3 100644
--- a/lib/zmap.c
+++ b/lib/zmap.c
@@ -14,7 +14,8 @@
 
 int z_erofs_fill_inode(struct erofs_inode *vi)
 {
-	if (vi->datalayout == EROFS_INODE_FLAT_COMPRESSION_LEGACY) {
+	if (!erofs_sb_has_big_pcluster() &&
+	    vi->datalayout == EROFS_INODE_FLAT_COMPRESSION_LEGACY) {
 		vi->z_advise = 0;
 		vi->z_algorithmtype[0] = 0;
 		vi->z_algorithmtype[1] = 0;
@@ -37,7 +38,8 @@ static int z_erofs_fill_inode_lazy(struct erofs_inode *vi)
 	if (vi->flags & EROFS_I_Z_INITED)
 		return 0;
 
-	DBG_BUGON(vi->datalayout == EROFS_INODE_FLAT_COMPRESSION_LEGACY);
+	DBG_BUGON(!erofs_sb_has_big_pcluster() &&
+		  vi->datalayout == EROFS_INODE_FLAT_COMPRESSION_LEGACY);
 	pos = round_up(iloc(vi->nid) + vi->inode_isize + vi->xattr_isize, 8);
 
 	ret = dev_read(buf, pos, sizeof(buf));
@@ -81,7 +83,7 @@ struct z_erofs_maprecorder {
 	u8 type;
 	u16 clusterofs;
 	u16 delta[2];
-	erofs_blk_t pblk;
+	erofs_blk_t pblk, compressedlcs;
 };
 
 static int z_erofs_reload_indexes(struct z_erofs_maprecorder *m,
@@ -130,6 +132,15 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
 	case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
 		m->clusterofs = 1 << vi->z_logical_clusterbits;
 		m->delta[0] = le16_to_cpu(di->di_u.delta[0]);
+		if (m->delta[0] & Z_EROFS_VLE_DI_D0_CBLKCNT) {
+			if (!(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1)) {
+				DBG_BUGON(1);
+				return -EFSCORRUPTED;
+			}
+			m->compressedlcs = m->delta[0] &
+				~Z_EROFS_VLE_DI_D0_CBLKCNT;
+			m->delta[0] = 1;
+		}
 		m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
 		break;
 	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
@@ -333,6 +344,51 @@ static int z_erofs_extent_lookback(struct z_erofs_maprecorder *m,
 	return 0;
 }
 
+static int z_erofs_get_extent_compressedlen(struct z_erofs_maprecorder *m,
+					    unsigned int initial_lcn)
+{
+	struct erofs_inode *const vi = m->inode;
+	struct erofs_map_blocks *const map = m->map;
+	const unsigned int lclusterbits = vi->z_logical_clusterbits;
+	unsigned long lcn;
+	int err;
+
+	DBG_BUGON(m->type != Z_EROFS_VLE_CLUSTER_TYPE_PLAIN &&
+		  m->type != Z_EROFS_VLE_CLUSTER_TYPE_HEAD);
+	if (!((map->m_flags & EROFS_MAP_ZIPPED) &&
+	      (vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1))) {
+		map->m_plen = 1 << lclusterbits;
+		return 0;
+	}
+
+	lcn = m->lcn + 1;
+	if (lcn == initial_lcn && !m->compressedlcs)
+		m->compressedlcs = 2;
+
+	if (m->compressedlcs)
+		goto out;
+
+	err = z_erofs_load_cluster_from_disk(m, lcn);
+	if (err)
+		return err;
+
+	switch (m->type) {
+	case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
+		DBG_BUGON(m->delta[0] != 1);
+		if (m->compressedlcs) {
+			break;
+		}
+	default:
+		erofs_err("cannot found CBLKCNT @ lcn %lu of nid %llu",
+			  lcn, (unsigned long long)vi->nid);
+		DBG_BUGON(1);
+		return -EFSCORRUPTED;
+	}
+out:
+	map->m_plen = m->compressedlcs << lclusterbits;
+	return 0;
+}
+
 int z_erofs_map_blocks_iter(struct erofs_inode *vi,
 			    struct erofs_map_blocks *map)
 {
@@ -343,6 +399,7 @@ int z_erofs_map_blocks_iter(struct erofs_inode *vi,
 	};
 	int err = 0;
 	unsigned int lclusterbits, endoff;
+	unsigned long initial_lcn;
 	unsigned long long ofs, end;
 
 	/* when trying to read beyond EOF, leave it unmapped */
@@ -359,10 +416,10 @@ int z_erofs_map_blocks_iter(struct erofs_inode *vi,
 
 	lclusterbits = vi->z_logical_clusterbits;
 	ofs = map->m_la;
-	m.lcn = ofs >> lclusterbits;
+	initial_lcn = ofs >> lclusterbits;
 	endoff = ofs & ((1 << lclusterbits) - 1);
 
-	err = z_erofs_load_cluster_from_disk(&m, m.lcn);
+	err = z_erofs_load_cluster_from_disk(&m, initial_lcn);
 	if (err)
 		goto out;
 
@@ -401,8 +458,11 @@ int z_erofs_map_blocks_iter(struct erofs_inode *vi,
 	}
 
 	map->m_llen = end - map->m_la;
-	map->m_plen = 1 << lclusterbits;
 	map->m_pa = blknr_to_addr(m.pblk);
+
+	err = z_erofs_get_extent_compressedlen(&m, initial_lcn);
+	if (err)
+		goto out;
 	map->m_flags |= EROFS_MAP_MAPPED;
 
 out:
-- 
2.24.0

On Wed, Dec 30, 2020 at 04:47:25PM +0800, Gao Xiang via Linux-erofs wrote:
> From: Gao Xiang <hsiangkao@aol.com>
>
> Hi folks,
>
> This is the first RFC patch of multiple block compression (including
> erofsfuse) after I carefully think over the on-disk design to support
> multiblock in-place decompression.
>
> Compression ratio results (POC, lz4hc, lz4-1.9.3, not final result):
> 1000000000 enwik9
> 621211648 enwik9_4k.squashfs.img
> 557858816 enwik9_4k.erofs.img
> 556191744 enwik9_8k.squashfs.img
> 502661120 enwik9_16k.squashfs.img
> 500723712 enwik9_8k.erofs.img
> 458784768 enwik9_32k.squashfs.img
> 453971968 enwik9_16k.erofs.img
> 422318080 enwik9_64k.squashfs.img
> 416686080 enwik9_32k.erofs.img
> 398204928 enwik9_128k.squashfs.img
> 395276288 enwik9_64k.erofs.img

I can also think of several compression strategies that control read
amplification while maintaining a given C/R, since EROFS can compress
variable-sized input data into an arbitrary number of compressed blocks
for each pcluster, FYI.

Thanks,
Gao Xiang