From: Gao Xiang <xiang@kernel.org>
To: Chao Yu <chao@kernel.org>
Cc: Gao Xiang <xiang@kernel.org>,
	linux-erofs@lists.ozlabs.org, LKML <linux-kernel@vger.kernel.org>,
	Yue Hu <zbestahu@gmail.com>,
	Gao Xiang <hsiangkao@linux.alibaba.com>
Subject: Re: [PATCH v2 3/3] erofs: introduce readmore decompression strategy
Date: Sun, 17 Oct 2021 23:42:55 +0800	[thread overview]
Message-ID: <20211017154253.GB4054@hsiangkao-HP-ZHAN-66-Pro-G1> (raw)
In-Reply-To: <8e39e5d1-285d-52b6-8fea-8bb9ff10bf5a@kernel.org>

On Sun, Oct 17, 2021 at 11:34:22PM +0800, Chao Yu wrote:
> On 2021/10/9 4:08, Gao Xiang wrote:
> > From: Gao Xiang <hsiangkao@linux.alibaba.com>
> > 
> > Previously, the EROFS decompression strategy strictly followed the
> > readahead window in order to minimize extra memory footprint. However,
> > reading only the partially requested data becomes inefficient for big
> > LZ4 pclusters and the upcoming LZMA implementation.
> > 
> > Let's try to request the leading data of a pcluster without triggering
> > memory reclaim for the LZ4 approach first, in order to boost 100%
> > randread of large pclusters; it has no real impact on low-memory
> > scenarios.
> > 
> > It also introduces a way to expand read lengths in order to decompress
> > the whole pcluster, which is useful for LZMA since that algorithm is
> > relatively slow and CPU-bound, unlike LZ4.
> > 
> > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > ---
> >   fs/erofs/internal.h | 13 ++++++
> >   fs/erofs/zdata.c    | 99 ++++++++++++++++++++++++++++++++++++---------
> >   2 files changed, 93 insertions(+), 19 deletions(-)
> > 
> > diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> > index 48bfc6eb2b02..7f96265ccbdb 100644
> > --- a/fs/erofs/internal.h
> > +++ b/fs/erofs/internal.h
> > @@ -307,6 +307,19 @@ static inline unsigned int erofs_inode_datalayout(unsigned int value)
> >   			      EROFS_I_DATALAYOUT_BITS);
> >   }
> > +/*
> > + * Different from grab_cache_page_nowait(), reclaiming is never triggered
> > + * when allocating new pages.
> > + */
> > +static inline
> > +struct page *erofs_grab_cache_page_nowait(struct address_space *mapping,
> > +					  pgoff_t index)
> > +{
> > +	return pagecache_get_page(mapping, index,
> > +			FGP_LOCK|FGP_CREAT|FGP_NOFS|FGP_NOWAIT,
> > +			readahead_gfp_mask(mapping) & ~__GFP_RECLAIM);
> > +}
> > +
> >   extern const struct super_operations erofs_sops;
> >   extern const struct address_space_operations erofs_raw_access_aops;
> > diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> > index 5c34ef66677f..febb018e10a7 100644
> > --- a/fs/erofs/zdata.c
> > +++ b/fs/erofs/zdata.c
> > @@ -1377,6 +1377,72 @@ static void z_erofs_runqueue(struct super_block *sb,
> >   	z_erofs_decompress_queue(&io[JQ_SUBMIT], pagepool);
> >   }
> > +/*
> > + * Since partial uptodate is still unimplemented for now, we have to use
> > + * approximate readmore strategies as a start.
> > + */
> > +static void z_erofs_pcluster_readmore(struct z_erofs_decompress_frontend *f,
> > +				      struct readahead_control *rac,
> > +				      erofs_off_t end,
> > +				      struct list_head *pagepool,
> > +				      bool backmost)
> > +{
> > +	struct inode *inode = f->inode;
> > +	struct erofs_map_blocks *map = &f->map;
> > +	erofs_off_t cur;
> > +	int err;
> > +
> > +	if (backmost) {
> > +		map->m_la = end;
> > +		/* TODO: pass in EROFS_GET_BLOCKS_READMORE for LZMA later */
> > +		err = z_erofs_map_blocks_iter(inode, map, 0);
> > +		if (err)
> > +			return;
> > +
> > +		/* expand ra for the trailing edge if readahead */
> > +		if (rac) {
> > +			loff_t newstart = readahead_pos(rac);
> > +
> > +			cur = round_up(map->m_la + map->m_llen, PAGE_SIZE);
> > +			readahead_expand(rac, newstart, cur - newstart);
> > +			return;
> > +		}
> > +		end = round_up(end, PAGE_SIZE);
> > +	} else {
> > +		end = round_up(map->m_la, PAGE_SIZE);
> > +
> > +		if (!map->m_llen)
> > +			return;
> > +	}
> > +
> > +	cur = map->m_la + map->m_llen - 1;
> > +	while (cur >= end) {
> > +		pgoff_t index = cur >> PAGE_SHIFT;
> > +		struct page *page;
> > +
> > +		page = erofs_grab_cache_page_nowait(inode->i_mapping, index);
> > +		if (!page)
> > +			goto skip;
> > +
> > +		if (PageUptodate(page)) {
> > +			unlock_page(page);
> > +			put_page(page);
> > +			goto skip;
> > +		}
> > +
> > +		err = z_erofs_do_read_page(f, page, pagepool);
> > +		if (err)
> > +			erofs_err(inode->i_sb,
> > +				  "readmore error at page %lu @ nid %llu",
> > +				  index, EROFS_I(inode)->nid);
> > +		put_page(page);
> > +skip:
> > +		if (cur < PAGE_SIZE)
> > +			break;
> > +		cur = (index << PAGE_SHIFT) - 1;
> 
> Looks a little bit weird to do readahead backward; is there any special reason here?

It's due to the do_read_page implementation: I'd like to avoid getting
the exact full extent length (FIEMAP-like) inside do_read_page and only
request the needed range instead, so the extra pages have to be added
in a backward way. The submission chain can then still be built in a
forward way.

If the question was why we should read backward at all: as I said in
the commit message, big pclusters matter since we can read in more
leading data at once.
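
To make the backward walk more concrete, here is a minimal standalone
userspace sketch (not kernel code and not part of the patch; it only
mirrors the index arithmetic of the loop above, assuming 4KiB pages and
made-up extent values):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
/* power-of-two alignment, matching how round_up() is used in the patch */
#define round_up(x, a)	(((x) + (a) - 1) & ~((uint64_t)(a) - 1))

int main(void)
{
	/* hypothetical extent: starts at 0x3800, spans 0x6800 bytes */
	uint64_t m_la = 0x3800, m_llen = 0x6800;
	uint64_t end = round_up(m_la, PAGE_SIZE);	/* non-backmost case */
	uint64_t cur = m_la + m_llen - 1;		/* last byte of extent */

	while (cur >= end) {
		uint64_t index = cur >> PAGE_SHIFT;

		/* the kernel loop grabs the page cache page at this index */
		printf("grab page index %llu\n", (unsigned long long)index);
		if (cur < PAGE_SIZE)
			break;
		cur = (index << PAGE_SHIFT) - 1;	/* step to previous page */
	}
	return 0;
}

It prints the page indexes from the trailing edge of the extent down to
the requested end, which matches the order the loop above adds pages in,
while the actual bios are still chained and submitted in a forward way.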

Thanks,
Gao Xiang

> 
> Thanks,

Thread overview: 33+ messages
2021-10-08 20:08 [PATCH v2 0/3] erofs: some decompression improvements Gao Xiang
2021-10-08 20:08 ` [PATCH v2 1/3] erofs: get compression algorithms directly on mapping Gao Xiang
2021-10-09  1:52   ` Yue Hu
2021-10-17 15:25   ` Chao Yu
2021-10-08 20:08 ` [PATCH v2 2/3] erofs: introduce the secondary compression head Gao Xiang
2021-10-09  3:50   ` Yue Hu
2021-10-09  4:47     ` Gao Xiang
2021-10-09 18:12   ` [PATCH v3 " Gao Xiang
2021-10-10  0:53     ` Yue Hu
2021-10-17 15:27     ` Chao Yu
2021-10-17 15:32       ` Gao Xiang
2021-10-17 16:57     ` [PATCH v4 " Gao Xiang
2021-10-19 12:56       ` Chao Yu
2021-10-08 20:08 ` [PATCH v2 3/3] erofs: introduce readmore decompression strategy Gao Xiang
2021-10-17 15:34   ` Chao Yu
2021-10-17 15:42     ` Gao Xiang [this message]
2021-10-19 12:58       ` Chao Yu
