From: Gao Xiang <xiang@kernel.org>
To: Chao Yu <chao@kernel.org>
Cc: Gao Xiang <xiang@kernel.org>, linux-erofs@lists.ozlabs.org,
	LKML <linux-kernel@vger.kernel.org>, Yue Hu <zbestahu@gmail.com>,
	Gao Xiang <hsiangkao@linux.alibaba.com>
Subject: Re: [PATCH v2 3/3] erofs: introduce readmore decompression strategy
Date: Sun, 17 Oct 2021 23:42:55 +0800
Message-ID: <20211017154253.GB4054@hsiangkao-HP-ZHAN-66-Pro-G1>
In-Reply-To: <8e39e5d1-285d-52b6-8fea-8bb9ff10bf5a@kernel.org>

On Sun, Oct 17, 2021 at 11:34:22PM +0800, Chao Yu wrote:
> On 2021/10/9 4:08, Gao Xiang wrote:
> > From: Gao Xiang <hsiangkao@linux.alibaba.com>
> >
> > Previously, the readahead window was strictly followed by EROFS
> > decompression strategy in order to minimize extra memory footprint.
> > However, it could become inefficient if just reading the partial
> > requested data for much big LZ4 pclusters and the upcoming LZMA
> > implementation.
> >
> > Let's try to request the leading data in a pcluster without
> > triggering memory reclaiming instead for the LZ4 approach first
> > to boost up 100% randread of large big pclusters, and it has no real
> > impact on low memory scenarios.
> >
> > It also introduces a way to expand read lengths in order to decompress
> > the whole pcluster, which is useful for LZMA since the algorithm
> > itself is relatively slow and causes CPU bound, but LZ4 is not.
> >
> > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > ---
> >  fs/erofs/internal.h | 13 ++++++
> >  fs/erofs/zdata.c    | 99 ++++++++++++++++++++++++++++++++++++---------
> >  2 files changed, 93 insertions(+), 19 deletions(-)
> >
> > diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> > index 48bfc6eb2b02..7f96265ccbdb 100644
> > --- a/fs/erofs/internal.h
> > +++ b/fs/erofs/internal.h
> > @@ -307,6 +307,19 @@ static inline unsigned int erofs_inode_datalayout(unsigned int value)
> >  			      EROFS_I_DATALAYOUT_BITS);
> >  }
> > +/*
> > + * Different from grab_cache_page_nowait(), reclaiming is never triggered
> > + * when allocating new pages.
> > + */
> > +static inline
> > +struct page *erofs_grab_cache_page_nowait(struct address_space *mapping,
> > +					  pgoff_t index)
> > +{
> > +	return pagecache_get_page(mapping, index,
> > +			FGP_LOCK|FGP_CREAT|FGP_NOFS|FGP_NOWAIT,
> > +			readahead_gfp_mask(mapping) & ~__GFP_RECLAIM);
> > +}
> > +
> >  extern const struct super_operations erofs_sops;
> >  extern const struct address_space_operations erofs_raw_access_aops;
> > diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> > index 5c34ef66677f..febb018e10a7 100644
> > --- a/fs/erofs/zdata.c
> > +++ b/fs/erofs/zdata.c
> > @@ -1377,6 +1377,72 @@ static void z_erofs_runqueue(struct super_block *sb,
> >  	z_erofs_decompress_queue(&io[JQ_SUBMIT], pagepool);
> >  }
> > +/*
> > + * Since partial uptodate is still unimplemented for now, we have to use
> > + * approximate readmore strategies as a start.
> > + */
> > +static void z_erofs_pcluster_readmore(struct z_erofs_decompress_frontend *f,
> > +				      struct readahead_control *rac,
> > +				      erofs_off_t end,
> > +				      struct list_head *pagepool,
> > +				      bool backmost)
> > +{
> > +	struct inode *inode = f->inode;
> > +	struct erofs_map_blocks *map = &f->map;
> > +	erofs_off_t cur;
> > +	int err;
> > +
> > +	if (backmost) {
> > +		map->m_la = end;
> > +		/* TODO: pass in EROFS_GET_BLOCKS_READMORE for LZMA later */
> > +		err = z_erofs_map_blocks_iter(inode, map, 0);
> > +		if (err)
> > +			return;
> > +
> > +		/* expend ra for the trailing edge if readahead */
> > +		if (rac) {
> > +			loff_t newstart = readahead_pos(rac);
> > +
> > +			cur = round_up(map->m_la + map->m_llen, PAGE_SIZE);
> > +			readahead_expand(rac, newstart, cur - newstart);
> > +			return;
> > +		}
> > +		end = round_up(end, PAGE_SIZE);
> > +	} else {
> > +		end = round_up(map->m_la, PAGE_SIZE);
> > +
> > +		if (!map->m_llen)
> > +			return;
> > +	}
> > +
> > +	cur = map->m_la + map->m_llen - 1;
> > +	while (cur >= end) {
> > +		pgoff_t index = cur >> PAGE_SHIFT;
> > +		struct page *page;
> > +
> > +		page = erofs_grab_cache_page_nowait(inode->i_mapping, index);
> > +		if (!page)
> > +			goto skip;
> > +
> > +		if (PageUptodate(page)) {
> > +			unlock_page(page);
> > +			put_page(page);
> > +			goto skip;
> > +		}
> > +
> > +		err = z_erofs_do_read_page(f, page, pagepool);
> > +		if (err)
> > +			erofs_err(inode->i_sb,
> > +				  "readmore error at page %lu @ nid %llu",
> > +				  index, EROFS_I(inode)->nid);
> > +		put_page(page);
> > +skip:
> > +		if (cur < PAGE_SIZE)
> > +			break;
> > +		cur = (index << PAGE_SHIFT) - 1;
>
> Looks a little bit weird to readahead backward, any special reason here?

It is due to the do_read_page implementation: I'd like to avoid getting
the exact full extent length (FIEMAP-like) inside do_read_page and only
request the needed range, so everything has to be done in a backward way.
The submission chain can then be in a forward way.

If the question is why we should read backward at all, as I said in the
commit message, it matters for big pclusters since we can read in more
leading data at once.

Thanks,
Gao Xiang

>
> Thanks,
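To make the backward walk in the quoted loop easier to follow, here is a minimal userspace sketch of the same index arithmetic. It is only an illustration under stated assumptions: 4 KiB pages, and hypothetical sketch_* names that are not part of EROFS. Instead of grabbing page cache pages and calling z_erofs_do_read_page(), it just records which page indices would be visited, from the last page of the extent down to the page-aligned lower bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch only: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)

/* Round a byte offset up to the next page boundary. */
static uint64_t sketch_round_up_page(uint64_t x)
{
	return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Hypothetical helper: record the page indices that a backward walk over
 * the extent [m_la, m_la + m_llen) visits, stopping once the cursor drops
 * below `end` rounded up to a page boundary. Returns the number of indices
 * stored in `out` (at most `cap`).
 */
static size_t sketch_readmore_walk(uint64_t m_la, uint64_t m_llen,
				   uint64_t end, uint64_t *out, size_t cap)
{
	size_t n = 0;
	uint64_t cur;

	assert(out != NULL);
	if (!m_llen)		/* empty extent: nothing to walk */
		return 0;

	end = sketch_round_up_page(end);
	cur = m_la + m_llen - 1;	/* last byte of the extent */
	while (cur >= end && n < cap) {
		uint64_t index = cur >> SKETCH_PAGE_SHIFT;

		out[n++] = index;
		if (cur < SKETCH_PAGE_SIZE)	/* page 0 reached, avoid underflow */
			break;
		cur = (index << SKETCH_PAGE_SHIFT) - 1;	/* last byte of previous page */
	}
	return n;
}
```

For example, with an extent starting at 0x3000 and 0x5800 bytes long, and a lower bound of 0x5001, the walk visits pages 8, 7 and 6 in that order: the cursor starts at the last byte of the extent and steps to the last byte of each preceding page until it falls below 0x6000.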