linux-erofs.lists.ozlabs.org archive mirror
From: Gao Xiang <xiang@kernel.org>
To: Chao Yu <chao@kernel.org>
Cc: Yan Song <imeoer@linux.alibaba.com>,
	Peng Tao <tao.peng@linux.alibaba.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Joseph Qi <joseph.qi@linux.alibaba.com>,
	Liu Bo <bo.liu@linux.alibaba.com>,
	Changwei Ge <chge@linux.alibaba.com>,
	Gao Xiang <hsiangkao@linux.alibaba.com>,
	Liu Jiang <gerry@linux.alibaba.com>,
	linux-erofs@lists.ozlabs.org
Subject: Re: [PATCH v6 2/2] erofs: add multiple device support
Date: Sun, 17 Oct 2021 12:15:24 +0800
Message-ID: <20211017041523.GA15116@hsiangkao-HP-ZHAN-66-Pro-G1>
In-Reply-To: <b5f8c41f-d781-a9d2-6ee1-77f2692f9461@kernel.org>

Hi Chao,

On Sun, Oct 17, 2021 at 10:10:15AM +0800, Chao Yu wrote:
> On 2021/10/14 16:10, Gao Xiang wrote:
> > In order to support multi-layer container images, add a multiple
> > device feature to EROFS. Two ways are available for now:
> > 
> >   - Devices can be mapped into a 32-bit global block address space;
> >   - A device ID can be specified with the chunk indexes format.
> > 
> > Note that this assumes no extent crosses a device boundary; mkfs
> > must strictly guarantee this.
> > 
> > In the future, a dedicated device manager could be introduced so
> > that extra devices can also be scanned automatically by UUID.
> > 
> > Cc: Chao Yu <chao@kernel.org>
> > Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
> > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > ---
> > changes since v5:
> >   - update the outdated comment on the on-disk device id;
> >   - describe device_id_mask, which is calculated from the valid bits
> >     of extra_devices + 1, so that the remaining bits can be used to
> >     record extra user data.
> > 
> >   Documentation/filesystems/erofs.rst |  12 ++-
> >   fs/erofs/Kconfig                    |  24 +++--
> >   fs/erofs/data.c                     |  73 ++++++++++---
> >   fs/erofs/erofs_fs.h                 |  22 +++-
> >   fs/erofs/internal.h                 |  35 ++++++-
> >   fs/erofs/super.c                    | 156 ++++++++++++++++++++++++++--
> >   fs/erofs/zdata.c                    |  20 +++-
> >   7 files changed, 296 insertions(+), 46 deletions(-)
> > 
> > diff --git a/Documentation/filesystems/erofs.rst b/Documentation/filesystems/erofs.rst
> > index b97579b7d8fb..01df283c7d04 100644
> > --- a/Documentation/filesystems/erofs.rst
> > +++ b/Documentation/filesystems/erofs.rst
> > @@ -19,9 +19,10 @@ It is designed as a better filesystem solution for the following scenarios:
> >      immutable and bit-for-bit identical to the official golden image for
> >      their releases due to security and other considerations and
> > - - hope to save some extra storage space with guaranteed end-to-end performance
> > -   by using reduced metadata and transparent file compression, especially
> > -   for those embedded devices with limited memory (ex, smartphone);
> > + - hope to minimize extra storage space with guaranteed end-to-end performance
> > +   by using compact layout, transparent file compression and direct access,
> > +   especially for those embedded devices with limited memory and high-density
> > +   hosts with numerous containers;
> >   Here is the main features of EROFS:
> > @@ -51,7 +52,9 @@ Here is the main features of EROFS:
> >    - Support POSIX.1e ACLs by using xattrs;
> >    - Support transparent data compression as an option:
> > -   LZ4 algorithm with the fixed-sized output compression for high performance.
> > +   LZ4 algorithm with the fixed-sized output compression for high performance;
> > +
> > + - Multiple device support for multi-layer container images.
> >   The following git tree provides the file system user-space tools under
> >   development (ex, formatting tool mkfs.erofs):
> > @@ -87,6 +90,7 @@ cache_strategy=%s      Select a strategy for cached decompression from now on:
> >   dax={always,never}     Use direct access (no page cache).  See
> >                          Documentation/filesystems/dax.rst.
> >   dax                    A legacy option which is an alias for ``dax=always``.
> > +device=%s              Specify a path to an extra device to be used together.
> >   ===================    =========================================================
> >   On-disk details
> > diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
> > index 14b747026742..addfe608d08e 100644
> > --- a/fs/erofs/Kconfig
> > +++ b/fs/erofs/Kconfig
> > @@ -6,16 +6,22 @@ config EROFS_FS
> >   	select FS_IOMAP
> >   	select LIBCRC32C
> >   	help
> > -	  EROFS (Enhanced Read-Only File System) is a lightweight
> > -	  read-only file system with modern designs (eg. page-sized
> > -	  blocks, inline xattrs/data, etc.) for scenarios which need
> > -	  high-performance read-only requirements, e.g. Android OS
> > -	  for mobile phones and LIVECDs.
> > +	  EROFS (Enhanced Read-Only File System) is a lightweight read-only
> > +	  file system with modern designs (e.g. no buffer heads, inline
> > +	  xattrs/data, chunk-based deduplication, multiple devices, etc.) for
> > +	  scenarios which need high-performance read-only solutions, e.g.
> > +	  smartphones with Android OS, LiveCDs and high-density hosts with
> > +	  numerous containers;
> > -	  It also provides fixed-sized output compression support,
> > -	  which improves storage density, keeps relatively higher
> > -	  compression ratios, which is more useful to achieve high
> > -	  performance for embedded devices with limited memory.
> > +	  It also provides fixed-sized output compression support in order to
> > +	  improve storage density as well as keep relatively higher compression
> > +	  ratios and implements in-place decompression to reuse the file page
> > +	  for compressed data temporarily with proper strategies, which is
> > +	  quite useful to ensure guaranteed end-to-end runtime decompression
> > +	  performance under extremely memory pressure without extra cost.
> > +
> > +	  See the documentation at <file:Documentation/filesystems/erofs.rst>
> > +	  for more details.
> >   	  If unsure, say N.
> > diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> > index 9db829715652..808234d9190c 100644
> > --- a/fs/erofs/data.c
> > +++ b/fs/erofs/data.c
> > @@ -89,6 +89,7 @@ static int erofs_map_blocks(struct inode *inode,
> >   	erofs_off_t pos;
> >   	int err = 0;
> > +	map->m_deviceid = 0;
> >   	if (map->m_la >= inode->i_size) {
> >   		/* leave out-of-bound access unmapped */
> >   		map->m_flags = 0;
> > @@ -135,14 +136,8 @@ static int erofs_map_blocks(struct inode *inode,
> >   		map->m_flags = 0;
> >   		break;
> >   	default:
> > -		/* only one device is supported for now */
> > -		if (idx->device_id) {
> > -			erofs_err(sb, "invalid device id %u @ %llu for nid %llu",
> > -				  le16_to_cpu(idx->device_id),
> > -				  chunknr, vi->nid);
> > -			err = -EFSCORRUPTED;
> > -			goto out_unlock;
> > -		}
> > +		map->m_deviceid = le16_to_cpu(idx->device_id) &
> > +			EROFS_SB(sb)->device_id_mask;
> >   		map->m_pa = blknr_to_addr(le32_to_cpu(idx->blkaddr));
> >   		map->m_flags = EROFS_MAP_MAPPED;
> >   		break;
> > @@ -155,11 +150,55 @@ static int erofs_map_blocks(struct inode *inode,
> >   	return err;
> >   }
> > +int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
> > +{
> > +	struct erofs_dev_context *devs = EROFS_SB(sb)->devs;
> > +	struct erofs_device_info *dif;
> > +	int id;
> > +
> > +	/* primary device by default */
> > +	map->m_bdev = sb->s_bdev;
> > +	map->m_daxdev = EROFS_SB(sb)->dax_dev;
> > +
> > +	if (map->m_deviceid) {
> > +		down_read(&devs->rwsem);
> > +		dif = idr_find(&devs->tree, map->m_deviceid - 1);
> > +		if (!dif) {
> > +			up_read(&devs->rwsem);
> > +			return -ENODEV;
> > +		}
> > +		map->m_bdev = dif->bdev;
> > +		map->m_daxdev = dif->dax_dev;
> > +		up_read(&devs->rwsem);
> > +	} else if (devs->extra_devices) {
> > +		down_read(&devs->rwsem);
> > +		idr_for_each_entry(&devs->tree, dif, id) {
> > +			erofs_off_t startoff, length;
> > +
> > +			if (!dif->mapped_blkaddr)
> > +				continue;
> > +			startoff = blknr_to_addr(dif->mapped_blkaddr);
> > +			length = blknr_to_addr(dif->blocks);
> > +
> > +			if (map->m_pa >= startoff &&
> > +			    map->m_pa < startoff + length) {
> > +				map->m_pa -= startoff;
> > +				map->m_bdev = dif->bdev;
> > +				map->m_daxdev = dif->dax_dev;
> > +				break;
> 
> A file won't be located across multiple devices, right? Otherwise the mapped
> length needs to be shrunk as well.

Thanks for your review.

A file can be located across multiple devices. That is intended: as I mentioned in the
commit message, no single extent crosses a device boundary, and mkfs is responsible for
strictly guaranteeing that. Otherwise, it would be much more complicated to handle
(especially on the compression side), with no real benefit.

Thanks,
Gao Xiang

> 
> Thanks,


Thread overview: 9+ messages
2021-10-07  7:02 [PATCH v4 1/2] erofs: decouple basic mount options from fs_context Gao Xiang
2021-10-07  7:02 ` [PATCH v4 2/2] erofs: add multiple device support Gao Xiang
2021-10-10  6:33   ` [PATCH v5 " Gao Xiang
2021-10-14  8:10     ` [PATCH v6 " Gao Xiang
2021-10-17  2:10       ` Chao Yu
2021-10-17  4:15         ` Gao Xiang [this message]
2021-10-17 15:00           ` Chao Yu
2021-10-07 17:47 ` [PATCH v4 1/2] erofs: decouple basic mount options from fs_context Liu Bo
2021-10-17  1:18 ` Chao Yu
