From: Gao Xiang <xiang@kernel.org>
To: linux-erofs@lists.ozlabs.org, Chao Yu <yuchao0@huawei.com>, Chao Yu <chao@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>, Gao Xiang <xiang@kernel.org>
Subject: [PATCH v3 00/10] erofs: add big pcluster compression support
Date: Wed, 7 Apr 2021 12:39:17 +0800
Message-ID: <20210407043927.10623-1-xiang@kernel.org>

Hi folks,

This is the formal version of EROFS big pcluster support, which means
EROFS can compress data into more than 1 fs block after this patchset.

{l,p}cluster are EROFS-specific concepts, standing for `logical cluster'
and `physical cluster' respectively. A logical cluster is the basic unit
of compress indexes in the file logical mapping, e.g. it can build
compress indexes in 2 blocks rather than 1 block (currently only 1-block
lclusters are supported). A physical cluster is a container of physical
compressed blocks which contain compressed data, the size of which is a
multiple of lclustersize.

Different from previous thoughts, which had fixed-sized pclusterblks
recorded in the on-disk compress index header, our on-disk design allows
variable-sized pclusterblks now. The main reasons are
 - user data varies in compression ratio locally, so a fixed-sized
   clustersize approach wastes space and causes extra read
   amplification for high CR cases;

 - inplace decompression needs zero padding to guarantee its safe margin,
   but we don't want to pad more than 1 fs block for big pcluster;

 - end users can now customize the pcluster size according to data type
   since various pclustersizes can exist in a file, for example, using
   different pcluster sizes for executable code and one-shot data. Such a
   design should be more flexible than many other public compression fses
   (Btw, each file in EROFS can have a maximum of 2 algorithms at the same
   time by using HEAD1/2, which will be formally added with LZMA support.)

In brief, EROFS can now compress from variable-sized input to
variable-sized pcluster blocks, as illustrated below:

  |<-_lcluster_->|________________________|<-_lcluster_->|
  |____._________|_________ .. ___________|_______.______|
        .                                        .
         .                                     .
          .__________________________________.
          |______________| .. |______________|
          |<-          pcluster            ->|

The next step would be how to record the compressed block count in
lclusters. In compress indexes, there are 2 concepts called HEAD and
NONHEAD lclusters. The difference is that a HEAD lcluster starts a new
pcluster in the lcluster, but a NONHEAD lcluster does not. It's easy to
understand that a big pcluster has at least 2 blocks, and thus covers at
least 2 lclusters as well.

Therefore, let the delta0 (distance to its HEAD lcluster) of the first
NONHEAD compress index store the compressed block count with a special
flag, as a new kind of compress index called CBLKCNT. It's also easy to
know that its actual delta0 is always 1, as illustrated below:
   ________________________________________________________
  |_HEAD_|_CBLKCNT_|_NONHEAD_|_..._|_NONHEAD_|_HEAD | HEAD |
  |<------ a pcluster with CBLKCNT --------->|<-- -->|
                                                 ^ a pcluster with 1

If another HEAD follows a HEAD lcluster, there is no room to record
CBLKCNT, but it's easy to know the size of the pcluster will be 1.

More implementation details about this and compact indexes are in the
commit message.
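
To make the HEAD / CBLKCNT rule above concrete, here is a minimal
userspace sketch in C. It is not the on-disk format nor the code in
fs/erofs/zmap.c: the struct, the is_cblkcnt field and both function names
are hypothetical and exist only for illustration. It also assumes that a
plain NONHEAD index right after a HEAD (no CBLKCNT flag) means a 1-block
pcluster, as in the layout before this series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory view of one lcluster index (not the on-disk layout). */
enum lcluster_type { LC_HEAD, LC_NONHEAD };

struct lcluster_index {
	enum lcluster_type type;
	int is_cblkcnt;		/* hypothetical flag: delta0 holds a block count */
	unsigned int delta0;	/* distance to HEAD, or block count if is_cblkcnt */
};

/* Compressed block count of the pcluster that starts at idx[head]. */
unsigned int pcluster_blocks(const struct lcluster_index *idx, size_t n,
			     size_t head)
{
	assert(head < n && idx[head].type == LC_HEAD);

	/* HEAD followed by another HEAD (or nothing): no room for CBLKCNT */
	if (head + 1 >= n || idx[head + 1].type == LC_HEAD)
		return 1;

	/* first NONHEAD carries the block count when flagged as CBLKCNT */
	if (idx[head + 1].is_cblkcnt)
		return idx[head + 1].delta0;

	/* assumption: unflagged NONHEAD means a 1-block pcluster */
	return 1;
}

/* Distance from a NONHEAD index back to its HEAD lcluster. */
unsigned int distance_to_head(const struct lcluster_index *e)
{
	assert(e->type == LC_NONHEAD);

	/* a CBLKCNT index always sits right after its HEAD */
	return e->is_cblkcnt ? 1 : e->delta0;
}
```

For an index sequence HEAD, CBLKCNT(4), NONHEAD, HEAD, HEAD, this gives 4
blocks for the first pcluster and 1 block for each of the last two.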

On the runtime performance side, the current EROFS test results are:
 ________________________________________________________________
|  file system  |   size    | seq read | rand read | rand9m read |
|_______________|___________|_ MiB/s __|__ MiB/s __|___ MiB/s ___|
|___erofs_4k____|_556879872_|_ 781.4 __|__ 55.3 ___|___ 25.3 ___|
|___erofs_16k___|_452509696_|_ 864.8 __|_ 123.2 ___|___ 20.8 ___|
|___erofs_32k___|_415223808_|_ 899.8 __|_ 105.8 _*_|___ 16.8 ____|
|___erofs_64k___|_393814016_|_ 906.6 __|__ 66.6 _*_|___ 11.8 ____|
|__squashfs_8k__|_556191744_|_ 64.9 __|__ 19.3 ___|____ 9.1 ____|
|__squashfs_16k_|_502661120_|_ 98.9 __|__ 38.0 ___|____ 9.8 ____|
|__squashfs_32k_|_458784768_|_ 115.4 __|__ 71.6 _*_|___ 10.0 ____|
|_squashfs_128k_|_398204928_|_ 257.2 __|_ 253.8 _*_|___ 10.9 ____|
|____ext4_4k____|____()_____|_ 786.6 __|__ 28.6 ___|___ 27.8 ____|

* Squashfs grabs more page cache to keep all decompressed data with
  grab_cache_page_nowait() than the normal requested readahead (see
  squashfs_copy_cache and squashfs_readpage_block). In principle, EROFS
  can also cache all such decompressed data if necessary, yet it's low
  priority for now and has little use (rand9m is actually a better rand
  read workload, since the amount of I/O is 9m rather than full-sized
  1000m).

More details are in
https://lore.kernel.org/r/20210329053654.GA3281654@xiangao.remote.csb

Also, since EROFS is not a fixed pcluster design, users can apply
several optimized strategies according to data type at mkfs time. And
there is still room to optimize runtime performance for big pcluster
even further.

Finally, it passes ro_fsstress and can also successfully boot buildroot
& Android system with android-mainline repo.

current mkfs repo for big pcluster:
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git -b experimental-bigpcluster-compact

Thanks for your time on reading this!

Thanks,
Gao Xiang

changes since v2:
 - introduce a new erofs_vm_ram_map() helper to reduce duplicated logic
   and fix uninitialized variable pointed out by Colin & Joe;
 - add a new EXPERIMENTAL warning for new big pcluster feature to end
   users.

changes since v1:
 - add a missing vunmap in erofs_pcpubuf_exit();
 - refine comments and commit messages.

Gao Xiang (10):
  erofs: reserve physical_clusterbits[]
  erofs: introduce multipage per-CPU buffers
  erofs: introduce physical cluster slab pools
  erofs: fix up inplace I/O pointer for big pcluster
  erofs: add big physical cluster definition
  erofs: adjust per-CPU buffers according to max_pclusterblks
  erofs: support parsing big pcluster compress indexes
  erofs: support parsing big pcluster compact indexes
  erofs: support decompress big pcluster for lz4 backend
  erofs: enable big pcluster feature

 fs/erofs/Kconfig        |  14 ---
 fs/erofs/Makefile       |   2 +-
 fs/erofs/decompressor.c | 235 ++++++++++++++++++++++++----------------
 fs/erofs/erofs_fs.h     |  31 ++++--
 fs/erofs/internal.h     |  44 ++++----
 fs/erofs/pcpubuf.c      | 134 +++++++++++++++++++++++
 fs/erofs/super.c        |   1 +
 fs/erofs/utils.c        |  12 --
 fs/erofs/zdata.c        | 193 +++++++++++++++++++++------------
 fs/erofs/zdata.h        |  14 +--
 fs/erofs/zmap.c         | 162 ++++++++++++++++++++++-----
 11 files changed, 587 insertions(+), 255 deletions(-)
 create mode 100644 fs/erofs/pcpubuf.c

-- 
2.20.1