From: Richard Weinberger <richard.weinberger@gmail.com>
To: Gao Xiang <gaoxiang25@huawei.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
miaoxie@huawei.com, yuchao0@huawei.com, sunqiuyang@huawei.com,
fangwei1@huawei.com, liguifu2@huawei.com, weidu.du@huawei.com,
chen.chun.yen@huawei.com, brooke.wangzhigang@hisilicon.com,
dongjinguang@huawei.com
Subject: Re: [NOMERGE] [RFC PATCH 00/12] erofs: introduce erofs file system
Date: Fri, 1 Jun 2018 09:48:12 +0200
Message-ID: <CAFLxGvw5PXBLKuaaK5xipiwTOXohtdeenD0XQHgX6+r4rS=GqQ@mail.gmail.com>
In-Reply-To: <1527764767-22190-1-git-send-email-gaoxiang25@huawei.com>

On Thu, May 31, 2018 at 1:06 PM, Gao Xiang <gaoxiang25@huawei.com> wrote:
> Hi all,
>
> Read-only file systems are used in many cases, such as read-only storage media.
> We are now focusing on Android devices, on which several read-only partitions
> exist. Because the available read-only solutions are limited, we introduce a new
> read-only file system, EROFS (Extendable Read-Only File System).

In which sense is it extendable?

> As in other read-only file systems, several metadata regions found in generic
> file systems, such as the free space bitmap, are omitted. The difference is that
> EROFS focuses more on performance than on saving as much storage space as possible.
>
> Furthermore, we also add compression support, called z_erofs.
>
> Traditional file systems with compression support use fixed-size input compression,
> so the compressed output units can have arbitrary lengths. However, block devices
> are accessed in units of blocks, which means that
> (A) if the accessed compressed data is not buffered, some of the data read from
> the physical blocks cannot be used, as illustrated below:
>
>    ++-----------++-----------++         ++-----------++-----------++
> ...||           ||           ||   ...   ||           ||           ||  ... original data
>    ++-----------++-----------++         ++-----------++-----------++
>       \               /                      \                  /
>        \             /                        \                /
>         \           /                          \              /
>    ++---|-------++--|--------++         ++-----|----++--------|--++
>    ||xxx|       ||  |xxxxxxxx||   ...   ||xxxxx|    ||        |xx||  compressed data
>    ++---|-------++--|--------++         ++-----|----++--------|--++
>
> The shadowed regions (marked x) are read from the block device but cannot be used
> for decompression.
>
> (B) If the compressed data is buffered as well, memory overhead increases. Since
> compressed data cannot be used directly, and we cannot tell when the corresponding
> compressed blocks will be accessed again, this is not friendly to random reads.
>
> To reduce the proportion of data that cannot be decompressed directly, larger
> compressed sizes tend to be chosen, which is also not friendly to random reads.
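To put rough numbers on point (A) and on the trade-off above, here is a small Python sketch. It is not from the patch set: the 4 KiB block size, the byte offsets and the helper names are illustrative assumptions. It counts how many bytes of the whole blocks read from the device fall outside one compressed extent stored at an arbitrary byte offset:

```python
BLOCK_SIZE = 4096  # assumed block size (4 KiB, matching the numbers quoted below)


def blocks_read(start: int, length: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of whole blocks a block device must read to fetch
    `length` bytes (length > 0) starting at byte offset `start`."""
    first = start // block_size
    last = (start + length - 1) // block_size
    return last - first + 1


def wasted_bytes(start: int, length: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes read from the device that lie outside the compressed extent,
    i.e. the shadowed regions that cannot be used for decompression."""
    return blocks_read(start, length, block_size) * block_size - length


# A small 3000-byte compressed extent straddling a block boundary ...
print(wasted_bytes(3000, 3000), blocks_read(3000, 3000))    # 5192 2
# ... versus a much larger extent starting at the same offset.
print(wasted_bytes(3000, 60000), blocks_read(3000, 60000))  # 5536 16
```

With these made-up offsets, about 63% of what is read for the 3000-byte extent is shadow data, versus about 8% for the 60000-byte extent. That is why larger compressed sizes get picked, even though each random read then has to fetch and decompress more.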
>
> EROFS implements compression with a different approach, the details of which are
> discussed in the next section.
>
> In brief, the following points summarize our design at a high level:
>
> 1) Use page-sized blocks so that there are no buffer heads.
>
> 2) By introducing more general inline data / xattrs, metadata and small data can be
> read together with the inode metadata.
>
> 3) Introduce a separate shared xattr region to store common xattrs (e.g. SELinux
> labels) or xattrs too large to be inlined with the metadata.
>
> 4) Metadata and data can be mixed by design, giving mkfs more flexibility in how
> it organizes files and data.
>
> 5) Instead of fixed-size input compression, we put forward a new fixed-output
> compression that makes full use of IO (all data read can be decompressed), reduces
> read amplification, improves random read performance and keeps compression ratios
> relatively low, as illustrated below:
>
>         |--- variable-length extent ----|------ VLE ------|----- VLE ----|
>         /> clusterofs                   /> clusterofs     /> clusterofs  /> clusterofs
>    ++---|-------++-----------++---------|-++-----------++-|---------++---|
> ...||   |       ||           ||         | ||           || |         ||   | ... original data
>    ++---|-------++-----------++---------|-++-----------++-|---------++---|
>    ++->cluster<-++->cluster<-++->cluster<-++->cluster<-++->cluster<-++
>         size         size         size         size         size
>          \                           /                /            /
>           \                      /              /           /
>            \                 /             /         /
>             ++-----------++-----------++-----------++
>         ... ||           ||           ||           || ... compressed clusters
>             ++-----------++-----------++-----------++
>             ++->cluster<-++->cluster<-++->cluster<-++
>                  size         size         size
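Point 5) can be sketched with a toy compressor. This is an illustration, not the z_erofs code: run-length encoding stands in for LZ4, and the function names and the 4096-byte cluster size are assumptions. The encoder keeps consuming input until the next encoded unit would no longer fit, so every cluster has the same size while the extent it decodes to varies:

```python
CLUSTER_SIZE = 4096  # assumed page-sized compressed cluster


def compress_fixed_output(data: bytes, cluster_size: int = CLUSTER_SIZE):
    """Toy fixed-output compressor. Run-length encoding stands in for a real
    compressor such as LZ4: (count, byte) pairs are appended until the next
    pair would not fit, then the cluster is zero-padded. Returns a list of
    (cluster, extent_length) tuples: every cluster is exactly `cluster_size`
    bytes, while the extent of original data it covers varies."""
    if cluster_size < 2:
        raise ValueError("a cluster must hold at least one (count, byte) pair")
    clusters = []
    pos = 0
    while pos < len(data):
        out = bytearray()
        start = pos
        while pos < len(data) and len(out) + 2 <= cluster_size:
            run = 1
            while (pos + run < len(data) and run < 255
                   and data[pos + run] == data[pos]):
                run += 1
            out += bytes((run, data[pos]))
            pos += run
        out += bytes(cluster_size - len(out))  # padding: count 0 decodes to nothing
        clusters.append((bytes(out), pos - start))
    return clusters


def decompress_cluster(cluster: bytes) -> bytes:
    """Decode one cluster; zero-count padding pairs produce no output."""
    out = bytearray()
    for i in range(0, len(cluster) - 1, 2):
        out += bytes((cluster[i + 1],)) * cluster[i]
    return bytes(out)


clusters = compress_fixed_output(b"aaaaabbbcd", cluster_size=4)
print([len(c) for c, _ in clusters])  # [4, 4]
print([n for _, n in clusters])       # [8, 2]
```

Similarly, b"x" * 1000 with 4-byte clusters yields extents of 510 and 490 bytes. Because each cluster is consumed entirely by the decompressor, nothing read from the device is wasted apart from the tail padding, which is the point of fixing the output size rather than the input size.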
>
> By design, a cluster can span more than one block, but currently only page-sized
> clusters are implemented (page-sized fixed-output compression can also achieve a
> better compression ratio than fixed-input compression).
>
> All compressed clusters have a fixed size but can be decompressed into extents of
> arbitrary length.
>
> In addition, if a buffered IO reads the shadowed region (x) below, we could add a
> customized path (replacing generic_file_buffered_read) that reads only one compressed
> cluster and makes the partial page available.
>         /> clusterofs
>    ++---|-------++
> ...||   | xxxx  || ...
>    ++---|-------++
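The lookup behind such a path can be sketched as follows. This is a simplified illustration, not how EROFS stores its index: it keeps a sorted list of extent start offsets and bisects it, and all helper names and sizes are hypothetical. It also derives, for each extent start, the logical cluster number and the "clusterofs" value that the diagrams mark:

```python
from bisect import bisect_right
from itertools import accumulate

CLUSTER_SIZE = 4096  # assumed logical cluster size (= block size here)


def extent_starts(extent_lengths):
    """Logical (decompressed) start offset of each extent; in this sketch
    extent i is produced by compressed cluster i."""
    return [0] + list(accumulate(extent_lengths))[:-1]


def locate(pos, starts, total_size):
    """Map logical byte `pos` to (compressed cluster index, offset inside its
    decompressed extent). Only that one compressed cluster must be read."""
    if not 0 <= pos < total_size:
        raise ValueError("offset beyond end of file")
    i = bisect_right(starts, pos) - 1
    return i, pos - starts[i]


def clusterofs_table(starts, cluster_size=CLUSTER_SIZE):
    """For each extent start: (logical cluster number, clusterofs), i.e. the
    fixed-size logical cluster the extent begins in and the offset inside it."""
    return [(s // cluster_size, s % cluster_size) for s in starts]


starts = extent_starts([5000, 12000, 3000])   # [0, 5000, 17000]
print(locate(6000, starts, 20000))            # (1, 1000)
print(clusterofs_table(starts))               # [(0, 0), (1, 904), (4, 616)]
```

A read at logical offset 6000 then needs only the second compressed cluster, decompressed up to 1000 bytes into its extent, rather than every cluster preceding it.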
>
> Some numbers using fixed-output compression (VLE, cluster size = block size = 4k) on
> a server and an Android phone (Kirin970 platform):
>
> Server (magnetic disk):
>
> compression  EROFS seq read  EXT4 seq read  EROFS random read  EXT4 random read
> ratio        bw[MB/s]        bw[MB/s]       bw[MB/s] (20%)     bw[MB/s] (20%)
>
> 4            480.3           502.5          69.8               11.1
> 10           472.3           503.3          56.4               10.0
> 15           457.6           495.3          47.0               10.9
> 26           401.5           511.2          34.7               11.1
> 35           389.1           512.5          28.0               11.0
> 48           375.4           496.5          23.2               10.6
> 53           370.2           512.0          21.8               11.0
> 66           349.2           512.0          19.0               11.4
> 76           310.5           497.3          17.3               11.6
> 85           301.2           512.0          16.0               11.0
> 94           292.7           496.5          14.6               11.1
> 100          538.9           512.0          11.4               10.8
>
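For a quick reading of the server table, the snippet below divides EROFS bandwidth by EXT4 bandwidth for a few rows. The values are copied from the table above; the row selection and the helper name are only for illustration:

```python
# (compression ratio, EROFS seq, EXT4 seq, EROFS random, EXT4 random) in MB/s,
# copied from three rows of the server (magnetic disk) table above.
SERVER = [
    (4, 480.3, 502.5, 69.8, 11.1),
    (48, 375.4, 496.5, 23.2, 10.6),
    (100, 538.9, 512.0, 11.4, 10.8),
]


def ratios(row):
    """Return (compression ratio, EROFS/EXT4 seq read, EROFS/EXT4 random read)."""
    ratio, erofs_seq, ext4_seq, erofs_rand, ext4_rand = row
    return ratio, round(erofs_seq / ext4_seq, 2), round(erofs_rand / ext4_rand, 2)


for row in SERVER:
    print(ratios(row))
# (4, 0.96, 6.29)
# (48, 0.76, 2.19)
# (100, 1.05, 1.06)
```

On this hardware, at the highest compression in the table, EROFS random reads run about 6.3 times faster than EXT4, while sequential reads are about 4% slower.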
> Kirin970 (A73 big cores at 2361 MHz, A53 little cores at 0 MHz, DDR at 1866 MHz):

What storage was used? An eMMC?

> compression  EROFS seq read  EXT4 seq read  EROFS random read  EXT4 random read
> ratio        bw[MB/s]        bw[MB/s]       bw[MB/s] (20%)     bw[MB/s] (20%)
>
> 4            546.7           544.3          157.7              57.9
> 10           535.7           521.0          152.7              62.0
> 15           529.0           520.3          125.0              65.0
> 26           418.0           526.3          97.6               63.7
> 35           367.7           511.7          89.0               63.7
> 48           415.7           500.7          78.2               61.2
> 53           423.0           566.7          72.8               62.9
> 66           334.3           537.3          69.8               58.3
> 76           387.3           546.0          65.2               56.0
> 85           306.3           546.0          63.8               57.7
> 94           345.0           589.7          59.2               49.9
> 100          579.7           556.7          62.1               57.7

How does it compare to existing read-only file systems, such as squashfs?
--
Thanks,
//richard