linux-erofs.lists.ozlabs.org archive mirror
* [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error
@ 2022-07-14 13:20 Gao Xiang
  2022-07-14 13:20 ` [PATCH 01/16] erofs: get rid of unneeded `inode', `map' and `sb' Gao Xiang
                   ` (16 more replies)
  0 siblings, 17 replies; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:20 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, LKML

Hi folks,

I've been working on this for almost two months; the main point of it
is to support large folios and rolling hash deduplication for
compressed data.

This patchset is the start of that work, targeting the next 5.20 merge
window.  It introduces a flexible range representation for
(de)compressed buffers instead of relying on page(s) themselves
directly, so that large folio support can later build on this work.
Also, this patchset gets rid of all PG_error flags in the
decompression code, so it serves as a cleanup as well.

In addition, this patchset kicks off rolling hash deduplication for
compressed data by first introducing fully-referenced multi-reference
pclusters, instead of reporting fs corruption if one pcluster is
referenced by several different extents.  The full implementation is
expected to be finished in the merge window after the next.  One of my
colleagues is actively working on the userspace part of this feature.
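(For context only -- this is not code from the series, and the window
size, base constant and function names below are made up for
illustration.  A minimal Rabin-Karp-style sketch of how a rolling hash
can spot duplicate windows in the uncompressed data:)

```c
#include <stdint.h>

/* Illustration only -- not EROFS code.  A polynomial rolling hash:
 * identical windows always hash identically, so duplicate candidates
 * can be found in O(n) and then confirmed byte-by-byte (memcmp). */
#define WINDOW 8
#define BASE   1000003ULL

static uint64_t hash_window(const uint8_t *p)
{
	uint64_t h = 0;
	int i;

	for (i = 0; i < WINDOW; i++)
		h = h * BASE + p[i];	/* h = sum p[i] * BASE^(WINDOW-1-i) */
	return h;
}

/* slide the window right by one byte in O(1) */
static uint64_t hash_roll(uint64_t h, uint8_t out, uint8_t in)
{
	uint64_t msb = 1;
	int i;

	for (i = 0; i < WINDOW - 1; i++)
		msb *= BASE;		/* BASE^(WINDOW-1), weight of `out' */
	return (h - out * msb) * BASE + in;
}
```

Rolling window-by-window, any window whose hash matches an
already-seen pcluster is a dedup candidate; the on-disk side then
expresses the match as several extents referencing one (multi-reference)
pcluster.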

However, it's still easy to verify fully-referenced multi-reference
pclusters by constructing an image by hand (see attachment):

Dataset: 300M
seq-read (data-duplicated, read_ahead_kb 8192): 1095MiB/s
seq-read (data-duplicated, read_ahead_kb 4096): 771MiB/s
seq-read (data-duplicated, read_ahead_kb 512):  577MiB/s
seq-read (vanilla, read_ahead_kb 8192):         364MiB/s

Finally, this patchset survives ro-fsstress on my side.

Thanks,
Gao Xiang

Gao Xiang (16):
  erofs: get rid of unneeded `inode', `map' and `sb'
  erofs: clean up z_erofs_collector_begin()
  erofs: introduce `z_erofs_parse_out_bvecs()'
  erofs: introduce bufvec to store decompressed buffers
  erofs: drop the old pagevec approach
  erofs: introduce `z_erofs_parse_in_bvecs'
  erofs: switch compressed_pages[] to bufvec
  erofs: rework online page handling
  erofs: get rid of `enum z_erofs_page_type'
  erofs: clean up `enum z_erofs_collectmode'
  erofs: get rid of `z_pagemap_global'
  erofs: introduce struct z_erofs_decompress_backend
  erofs: try to leave (de)compressed_pages on stack if possible
  erofs: introduce z_erofs_do_decompressed_bvec()
  erofs: record the longest decompressed size in this round
  erofs: introduce multi-reference pclusters (fully-referenced)

 fs/erofs/compress.h     |   2 +-
 fs/erofs/decompressor.c |   2 +-
 fs/erofs/zdata.c        | 777 ++++++++++++++++++++++------------------
 fs/erofs/zdata.h        | 119 +++---
 fs/erofs/zpvec.h        | 159 --------
 5 files changed, 490 insertions(+), 569 deletions(-)
 delete mode 100644 fs/erofs/zpvec.h

-- 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 01/16] erofs: get rid of unneeded `inode', `map' and `sb'
  2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
@ 2022-07-14 13:20 ` Gao Xiang
  2022-07-15  6:20   ` Yue Hu
  2022-07-14 13:20 ` [PATCH 02/16] erofs: clean up z_erofs_collector_begin() Gao Xiang
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:20 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, LKML

Since commit 5c6dcc57e2e5 ("erofs: get rid of
`struct z_erofs_collector'"), these arguments can be dropped as well.

No logic changes.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/zdata.c | 42 +++++++++++++++++++-----------------------
 1 file changed, 19 insertions(+), 23 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 724bb57075f6..1b6816dd235f 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -404,10 +404,9 @@ static void z_erofs_try_to_claim_pcluster(struct z_erofs_decompress_frontend *f)
 	f->mode = COLLECT_PRIMARY;
 }
 
-static int z_erofs_lookup_pcluster(struct z_erofs_decompress_frontend *fe,
-				   struct inode *inode,
-				   struct erofs_map_blocks *map)
+static int z_erofs_lookup_pcluster(struct z_erofs_decompress_frontend *fe)
 {
+	struct erofs_map_blocks *map = &fe->map;
 	struct z_erofs_pcluster *pcl = fe->pcl;
 	unsigned int length;
 
@@ -449,10 +448,9 @@ static int z_erofs_lookup_pcluster(struct z_erofs_decompress_frontend *fe,
 	return 0;
 }
 
-static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe,
-				     struct inode *inode,
-				     struct erofs_map_blocks *map)
+static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe)
 {
+	struct erofs_map_blocks *map = &fe->map;
 	bool ztailpacking = map->m_flags & EROFS_MAP_META;
 	struct z_erofs_pcluster *pcl;
 	struct erofs_workgroup *grp;
@@ -494,7 +492,7 @@ static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe,
 	} else {
 		pcl->obj.index = map->m_pa >> PAGE_SHIFT;
 
-		grp = erofs_insert_workgroup(inode->i_sb, &pcl->obj);
+		grp = erofs_insert_workgroup(fe->inode->i_sb, &pcl->obj);
 		if (IS_ERR(grp)) {
 			err = PTR_ERR(grp);
 			goto err_out;
@@ -520,10 +518,9 @@ static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe,
 	return err;
 }
 
-static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe,
-				   struct inode *inode,
-				   struct erofs_map_blocks *map)
+static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe)
 {
+	struct erofs_map_blocks *map = &fe->map;
 	struct erofs_workgroup *grp;
 	int ret;
 
@@ -541,19 +538,19 @@ static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe,
 		goto tailpacking;
 	}
 
-	grp = erofs_find_workgroup(inode->i_sb, map->m_pa >> PAGE_SHIFT);
+	grp = erofs_find_workgroup(fe->inode->i_sb, map->m_pa >> PAGE_SHIFT);
 	if (grp) {
 		fe->pcl = container_of(grp, struct z_erofs_pcluster, obj);
 	} else {
 tailpacking:
-		ret = z_erofs_register_pcluster(fe, inode, map);
+		ret = z_erofs_register_pcluster(fe);
 		if (!ret)
 			goto out;
 		if (ret != -EEXIST)
 			return ret;
 	}
 
-	ret = z_erofs_lookup_pcluster(fe, inode, map);
+	ret = z_erofs_lookup_pcluster(fe);
 	if (ret) {
 		erofs_workgroup_put(&fe->pcl->obj);
 		return ret;
@@ -663,7 +660,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 	if (!(map->m_flags & EROFS_MAP_MAPPED))
 		goto hitted;
 
-	err = z_erofs_collector_begin(fe, inode, map);
+	err = z_erofs_collector_begin(fe);
 	if (err)
 		goto err_out;
 
@@ -1259,13 +1256,13 @@ static void z_erofs_decompressqueue_endio(struct bio *bio)
 	bio_put(bio);
 }
 
-static void z_erofs_submit_queue(struct super_block *sb,
-				 struct z_erofs_decompress_frontend *f,
+static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
 				 struct page **pagepool,
 				 struct z_erofs_decompressqueue *fgq,
 				 bool *force_fg)
 {
-	struct erofs_sb_info *const sbi = EROFS_SB(sb);
+	struct super_block *sb = f->inode->i_sb;
+	struct address_space *mc = MNGD_MAPPING(EROFS_SB(sb));
 	z_erofs_next_pcluster_t qtail[NR_JOBQUEUES];
 	struct z_erofs_decompressqueue *q[NR_JOBQUEUES];
 	void *bi_private;
@@ -1317,7 +1314,7 @@ static void z_erofs_submit_queue(struct super_block *sb,
 			struct page *page;
 
 			page = pickup_page_for_submission(pcl, i++, pagepool,
-							  MNGD_MAPPING(sbi));
+							  mc);
 			if (!page)
 				continue;
 
@@ -1369,15 +1366,14 @@ static void z_erofs_submit_queue(struct super_block *sb,
 	z_erofs_decompress_kickoff(q[JQ_SUBMIT], *force_fg, nr_bios);
 }
 
-static void z_erofs_runqueue(struct super_block *sb,
-			     struct z_erofs_decompress_frontend *f,
+static void z_erofs_runqueue(struct z_erofs_decompress_frontend *f,
 			     struct page **pagepool, bool force_fg)
 {
 	struct z_erofs_decompressqueue io[NR_JOBQUEUES];
 
 	if (f->owned_head == Z_EROFS_PCLUSTER_TAIL)
 		return;
-	z_erofs_submit_queue(sb, f, pagepool, io, &force_fg);
+	z_erofs_submit_queue(f, pagepool, io, &force_fg);
 
 	/* handle bypass queue (no i/o pclusters) immediately */
 	z_erofs_decompress_queue(&io[JQ_BYPASS], pagepool);
@@ -1475,7 +1471,7 @@ static int z_erofs_read_folio(struct file *file, struct folio *folio)
 	(void)z_erofs_collector_end(&f);
 
 	/* if some compressed cluster ready, need submit them anyway */
-	z_erofs_runqueue(inode->i_sb, &f, &pagepool,
+	z_erofs_runqueue(&f, &pagepool,
 			 z_erofs_get_sync_decompress_policy(sbi, 0));
 
 	if (err)
@@ -1524,7 +1520,7 @@ static void z_erofs_readahead(struct readahead_control *rac)
 	z_erofs_pcluster_readmore(&f, rac, 0, &pagepool, false);
 	(void)z_erofs_collector_end(&f);
 
-	z_erofs_runqueue(inode->i_sb, &f, &pagepool,
+	z_erofs_runqueue(&f, &pagepool,
 			 z_erofs_get_sync_decompress_policy(sbi, nr_pages));
 	erofs_put_metabuf(&f.map.buf);
 	erofs_release_pages(&pagepool);
-- 
2.24.4



* [PATCH 02/16] erofs: clean up z_erofs_collector_begin()
  2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
  2022-07-14 13:20 ` [PATCH 01/16] erofs: get rid of unneeded `inode', `map' and `sb' Gao Xiang
@ 2022-07-14 13:20 ` Gao Xiang
  2022-07-15  6:22   ` Yue Hu
  2022-07-14 13:20 ` [PATCH 03/16] erofs: introduce `z_erofs_parse_out_bvecs()' Gao Xiang
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:20 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, LKML

Rearrange the code and get rid of all gotos.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/zdata.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 1b6816dd235f..c7be447ac64d 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -521,7 +521,7 @@ static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe)
 static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe)
 {
 	struct erofs_map_blocks *map = &fe->map;
-	struct erofs_workgroup *grp;
+	struct erofs_workgroup *grp = NULL;
 	int ret;
 
 	DBG_BUGON(fe->pcl);
@@ -530,33 +530,31 @@ static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe)
 	DBG_BUGON(fe->owned_head == Z_EROFS_PCLUSTER_NIL);
 	DBG_BUGON(fe->owned_head == Z_EROFS_PCLUSTER_TAIL_CLOSED);
 
-	if (map->m_flags & EROFS_MAP_META) {
-		if ((map->m_pa & ~PAGE_MASK) + map->m_plen > PAGE_SIZE) {
-			DBG_BUGON(1);
-			return -EFSCORRUPTED;
-		}
-		goto tailpacking;
+	if (!(map->m_flags & EROFS_MAP_META)) {
+		grp = erofs_find_workgroup(fe->inode->i_sb,
+					   map->m_pa >> PAGE_SHIFT);
+	} else if ((map->m_pa & ~PAGE_MASK) + map->m_plen > PAGE_SIZE) {
+		DBG_BUGON(1);
+		return -EFSCORRUPTED;
 	}
 
-	grp = erofs_find_workgroup(fe->inode->i_sb, map->m_pa >> PAGE_SHIFT);
 	if (grp) {
 		fe->pcl = container_of(grp, struct z_erofs_pcluster, obj);
+		ret = -EEXIST;
 	} else {
-tailpacking:
 		ret = z_erofs_register_pcluster(fe);
-		if (!ret)
-			goto out;
-		if (ret != -EEXIST)
-			return ret;
 	}
 
-	ret = z_erofs_lookup_pcluster(fe);
-	if (ret) {
-		erofs_workgroup_put(&fe->pcl->obj);
+	if (ret == -EEXIST) {
+		ret = z_erofs_lookup_pcluster(fe);
+		if (ret) {
+			erofs_workgroup_put(&fe->pcl->obj);
+			return ret;
+		}
+	} else if (ret) {
 		return ret;
 	}
 
-out:
 	z_erofs_pagevec_ctor_init(&fe->vector, Z_EROFS_NR_INLINE_PAGEVECS,
 				  fe->pcl->pagevec, fe->pcl->vcnt);
 	/* since file-backed online pages are traversed in reverse order */
-- 
2.24.4



* [PATCH 03/16] erofs: introduce `z_erofs_parse_out_bvecs()'
  2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
  2022-07-14 13:20 ` [PATCH 01/16] erofs: get rid of unneeded `inode', `map' and `sb' Gao Xiang
  2022-07-14 13:20 ` [PATCH 02/16] erofs: clean up z_erofs_collector_begin() Gao Xiang
@ 2022-07-14 13:20 ` Gao Xiang
  2022-07-15  6:22   ` Yue Hu
  2022-07-14 13:20 ` [PATCH 04/16] erofs: introduce bufvec to store decompressed buffers Gao Xiang
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:20 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, LKML

`z_erofs_decompress_pcluster()' is too long, so it'd be better to
introduce another helper to parse decompressed pages (or later,
decompressed bvecs.)

BTW, since `decompressed_bvecs' is too long as part of the function
name, `out_bvecs' is used instead.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/zdata.c | 81 +++++++++++++++++++++++++-----------------------
 1 file changed, 43 insertions(+), 38 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index c7be447ac64d..c183cd0bc42b 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -778,18 +778,58 @@ static bool z_erofs_page_is_invalidated(struct page *page)
 	return !page->mapping && !z_erofs_is_shortlived_page(page);
 }
 
+static int z_erofs_parse_out_bvecs(struct z_erofs_pcluster *pcl,
+				   struct page **pages, struct page **pagepool)
+{
+	struct z_erofs_pagevec_ctor ctor;
+	enum z_erofs_page_type page_type;
+	int i, err = 0;
+
+	z_erofs_pagevec_ctor_init(&ctor, Z_EROFS_NR_INLINE_PAGEVECS,
+				  pcl->pagevec, 0);
+	for (i = 0; i < pcl->vcnt; ++i) {
+		struct page *page = z_erofs_pagevec_dequeue(&ctor, &page_type);
+		unsigned int pagenr;
+
+		/* all pages in pagevec ought to be valid */
+		DBG_BUGON(!page);
+		DBG_BUGON(z_erofs_page_is_invalidated(page));
+
+		if (z_erofs_put_shortlivedpage(pagepool, page))
+			continue;
+
+		if (page_type == Z_EROFS_VLE_PAGE_TYPE_HEAD)
+			pagenr = 0;
+		else
+			pagenr = z_erofs_onlinepage_index(page);
+
+		DBG_BUGON(pagenr >= pcl->nr_pages);
+		/*
+		 * currently EROFS doesn't support multiref(dedup),
+		 * so here erroring out one multiref page.
+		 */
+		if (pages[pagenr]) {
+			DBG_BUGON(1);
+			SetPageError(pages[pagenr]);
+			z_erofs_onlinepage_endio(pages[pagenr]);
+			err = -EFSCORRUPTED;
+		}
+		pages[pagenr] = page;
+	}
+	z_erofs_pagevec_ctor_exit(&ctor, true);
+	return err;
+}
+
 static int z_erofs_decompress_pcluster(struct super_block *sb,
 				       struct z_erofs_pcluster *pcl,
 				       struct page **pagepool)
 {
 	struct erofs_sb_info *const sbi = EROFS_SB(sb);
 	unsigned int pclusterpages = z_erofs_pclusterpages(pcl);
-	struct z_erofs_pagevec_ctor ctor;
 	unsigned int i, inputsize, outputsize, llen, nr_pages;
 	struct page *pages_onstack[Z_EROFS_VMAP_ONSTACK_PAGES];
 	struct page **pages, **compressed_pages, *page;
 
-	enum z_erofs_page_type page_type;
 	bool overlapped, partial;
 	int err;
 
@@ -823,42 +863,7 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
 	for (i = 0; i < nr_pages; ++i)
 		pages[i] = NULL;
 
-	err = 0;
-	z_erofs_pagevec_ctor_init(&ctor, Z_EROFS_NR_INLINE_PAGEVECS,
-				  pcl->pagevec, 0);
-
-	for (i = 0; i < pcl->vcnt; ++i) {
-		unsigned int pagenr;
-
-		page = z_erofs_pagevec_dequeue(&ctor, &page_type);
-
-		/* all pages in pagevec ought to be valid */
-		DBG_BUGON(!page);
-		DBG_BUGON(z_erofs_page_is_invalidated(page));
-
-		if (z_erofs_put_shortlivedpage(pagepool, page))
-			continue;
-
-		if (page_type == Z_EROFS_VLE_PAGE_TYPE_HEAD)
-			pagenr = 0;
-		else
-			pagenr = z_erofs_onlinepage_index(page);
-
-		DBG_BUGON(pagenr >= nr_pages);
-
-		/*
-		 * currently EROFS doesn't support multiref(dedup),
-		 * so here erroring out one multiref page.
-		 */
-		if (pages[pagenr]) {
-			DBG_BUGON(1);
-			SetPageError(pages[pagenr]);
-			z_erofs_onlinepage_endio(pages[pagenr]);
-			err = -EFSCORRUPTED;
-		}
-		pages[pagenr] = page;
-	}
-	z_erofs_pagevec_ctor_exit(&ctor, true);
+	err = z_erofs_parse_out_bvecs(pcl, pages, pagepool);
 
 	overlapped = false;
 	compressed_pages = pcl->compressed_pages;
-- 
2.24.4



* [PATCH 04/16] erofs: introduce bufvec to store decompressed buffers
  2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
                   ` (2 preceding siblings ...)
  2022-07-14 13:20 ` [PATCH 03/16] erofs: introduce `z_erofs_parse_out_bvecs()' Gao Xiang
@ 2022-07-14 13:20 ` Gao Xiang
  2022-07-15  6:29   ` Yue Hu
  2022-07-14 13:20 ` [PATCH 05/16] erofs: drop the old pagevec approach Gao Xiang
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:20 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, LKML

For each pcluster, the number of compressed buffers is determined in
advance, yet the number of decompressed buffers can actually vary.
Many decompressed pages may need to be recorded if one pcluster is
highly compressed or its pcluster size is large.  That takes an extra
memory footprint compared with uncompressed filesystems, especially
with a lot of I/O in flight on low-end devices.

Therefore, similar to in-place I/O, pagevec was introduced to reuse
page cache to store these pointers in a time-sharing way, since these
pages are actually unused before decompression.

In order to make it more flexible, a cleaner bufvec is now used to
replace the old pagevec stuff, so that:

 - Decompressed offsets can be stored inline, so they can be used
   for upcoming features like compressed data deduplication;

 - It moves towards supporting large folios for compressed inodes,
   since our final goal is to completely avoid page->private and use
   folio->private only for all page cache pages.
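(As a rough userspace model of the bvset chaining this patch
implements -- the field names mirror the patch, but the set sizes and
the malloc()-backed storage are stand-ins for illustration:)

```c
#include <stdlib.h>

/* Sketch only: models the z_erofs_bvec_enqueue()/_dequeue() chaining.
 * An inline bvset bootstraps the list; once it fills up, records spill
 * into further sets (in the kernel these live in borrowed, not-yet-used
 * page cache pages; here they are simply malloc()ed). */
struct bvec { int page_id; int offset; unsigned int end; };

#define PER_SET 3		/* bvecs per spilled set (stand-in value) */
#define INLINE_BVECS 2		/* bvecs in the bootstrap set */

struct bvset {
	struct bvset *next;	/* next set holding the following bvecs */
	struct bvec bvec[PER_SET];
};

struct bvec_iter {
	struct bvset *set;
	unsigned int nr, cur;
};

static void iter_begin(struct bvec_iter *it, struct bvset *bootstrap)
{
	*it = (struct bvec_iter){ .set = bootstrap, .nr = INLINE_BVECS };
}

static void enqueue(struct bvec_iter *it, struct bvec v)
{
	if (it->cur == it->nr) {	/* current set full: chain a new one */
		it->set->next = calloc(1, sizeof(struct bvset));
		it->set = it->set->next;
		it->nr = PER_SET;
		it->cur = 0;
	}
	it->set->bvec[it->cur++] = v;
}

static struct bvec dequeue(struct bvec_iter *it)
{
	if (it->cur == it->nr) {	/* walk the same chain back */
		it->set = it->set->next;
		it->nr = PER_SET;
		it->cur = 0;
	}
	return it->set->bvec[it->cur++];
}
```

Because `offset' travels with each record, the consumer can recompute
the output page index from it instead of tagging page pointers, which
is what later allows several bvecs to reference one physical cluster.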

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/zdata.c | 177 +++++++++++++++++++++++++++++++++++------------
 fs/erofs/zdata.h |  26 +++++--
 2 files changed, 153 insertions(+), 50 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index c183cd0bc42b..f52c54058f31 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -2,6 +2,7 @@
 /*
  * Copyright (C) 2018 HUAWEI, Inc.
  *             https://www.huawei.com/
+ * Copyright (C) 2022 Alibaba Cloud
  */
 #include "zdata.h"
 #include "compress.h"
@@ -26,6 +27,82 @@ static struct z_erofs_pcluster_slab pcluster_pool[] __read_mostly = {
 	_PCLP(Z_EROFS_PCLUSTER_MAX_PAGES)
 };
 
+struct z_erofs_bvec_iter {
+	struct page *bvpage;
+	struct z_erofs_bvset *bvset;
+	unsigned int nr, cur;
+};
+
+static struct page *z_erofs_bvec_iter_end(struct z_erofs_bvec_iter *iter)
+{
+	if (iter->bvpage)
+		kunmap_local(iter->bvset);
+	return iter->bvpage;
+}
+
+static struct page *z_erofs_bvset_flip(struct z_erofs_bvec_iter *iter)
+{
+	unsigned long base = (unsigned long)((struct z_erofs_bvset *)0)->bvec;
+	/* have to access nextpage in advance, otherwise it will be unmapped */
+	struct page *nextpage = iter->bvset->nextpage;
+	struct page *oldpage;
+
+	DBG_BUGON(!nextpage);
+	oldpage = z_erofs_bvec_iter_end(iter);
+	iter->bvpage = nextpage;
+	iter->bvset = kmap_local_page(nextpage);
+	iter->nr = (PAGE_SIZE - base) / sizeof(struct z_erofs_bvec);
+	iter->cur = 0;
+	return oldpage;
+}
+
+static void z_erofs_bvec_iter_begin(struct z_erofs_bvec_iter *iter,
+				    struct z_erofs_bvset_inline *bvset,
+				    unsigned int bootstrap_nr,
+				    unsigned int cur)
+{
+	*iter = (struct z_erofs_bvec_iter) {
+		.nr = bootstrap_nr,
+		.bvset = (struct z_erofs_bvset *)bvset,
+	};
+
+	while (cur > iter->nr) {
+		cur -= iter->nr;
+		z_erofs_bvset_flip(iter);
+	}
+	iter->cur = cur;
+}
+
+static int z_erofs_bvec_enqueue(struct z_erofs_bvec_iter *iter,
+				struct z_erofs_bvec *bvec,
+				struct page **candidate_bvpage)
+{
+	if (iter->cur == iter->nr) {
+		if (!*candidate_bvpage)
+			return -EAGAIN;
+
+		DBG_BUGON(iter->bvset->nextpage);
+		iter->bvset->nextpage = *candidate_bvpage;
+		z_erofs_bvset_flip(iter);
+
+		iter->bvset->nextpage = NULL;
+		*candidate_bvpage = NULL;
+	}
+	iter->bvset->bvec[iter->cur++] = *bvec;
+	return 0;
+}
+
+static void z_erofs_bvec_dequeue(struct z_erofs_bvec_iter *iter,
+				 struct z_erofs_bvec *bvec,
+				 struct page **old_bvpage)
+{
+	if (iter->cur == iter->nr)
+		*old_bvpage = z_erofs_bvset_flip(iter);
+	else
+		*old_bvpage = NULL;
+	*bvec = iter->bvset->bvec[iter->cur++];
+}
+
 static void z_erofs_destroy_pcluster_pool(void)
 {
 	int i;
@@ -195,9 +272,10 @@ enum z_erofs_collectmode {
 struct z_erofs_decompress_frontend {
 	struct inode *const inode;
 	struct erofs_map_blocks map;
-
+	struct z_erofs_bvec_iter biter;
 	struct z_erofs_pagevec_ctor vector;
 
+	struct page *candidate_bvpage;
 	struct z_erofs_pcluster *pcl, *tailpcl;
 	/* a pointer used to pick up inplace I/O pages */
 	struct page **icpage_ptr;
@@ -358,21 +436,24 @@ static bool z_erofs_try_inplace_io(struct z_erofs_decompress_frontend *fe,
 
 /* callers must be with pcluster lock held */
 static int z_erofs_attach_page(struct z_erofs_decompress_frontend *fe,
-			       struct page *page, enum z_erofs_page_type type,
-			       bool pvec_safereuse)
+			       struct z_erofs_bvec *bvec,
+			       enum z_erofs_page_type type)
 {
 	int ret;
 
-	/* give priority for inplaceio */
 	if (fe->mode >= COLLECT_PRIMARY &&
-	    type == Z_EROFS_PAGE_TYPE_EXCLUSIVE &&
-	    z_erofs_try_inplace_io(fe, page))
-		return 0;
-
-	ret = z_erofs_pagevec_enqueue(&fe->vector, page, type,
-				      pvec_safereuse);
-	fe->pcl->vcnt += (unsigned int)ret;
-	return ret ? 0 : -EAGAIN;
+	    type == Z_EROFS_PAGE_TYPE_EXCLUSIVE) {
+		/* give priority for inplaceio to use file pages first */
+		if (z_erofs_try_inplace_io(fe, bvec->page))
+			return 0;
+		/* otherwise, check if it can be used as a bvpage */
+		if (fe->mode >= COLLECT_PRIMARY_FOLLOWED &&
+		    !fe->candidate_bvpage)
+			fe->candidate_bvpage = bvec->page;
+	}
+	ret = z_erofs_bvec_enqueue(&fe->biter, bvec, &fe->candidate_bvpage);
+	fe->pcl->vcnt += (ret >= 0);
+	return ret;
 }
 
 static void z_erofs_try_to_claim_pcluster(struct z_erofs_decompress_frontend *f)
@@ -554,9 +635,8 @@ static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe)
 	} else if (ret) {
 		return ret;
 	}
-
-	z_erofs_pagevec_ctor_init(&fe->vector, Z_EROFS_NR_INLINE_PAGEVECS,
-				  fe->pcl->pagevec, fe->pcl->vcnt);
+	z_erofs_bvec_iter_begin(&fe->biter, &fe->pcl->bvset,
+				Z_EROFS_NR_INLINE_PAGEVECS, fe->pcl->vcnt);
 	/* since file-backed online pages are traversed in reverse order */
 	fe->icpage_ptr = fe->pcl->compressed_pages +
 			z_erofs_pclusterpages(fe->pcl);
@@ -588,9 +668,14 @@ static bool z_erofs_collector_end(struct z_erofs_decompress_frontend *fe)
 	if (!pcl)
 		return false;
 
-	z_erofs_pagevec_ctor_exit(&fe->vector, false);
+	z_erofs_bvec_iter_end(&fe->biter);
 	mutex_unlock(&pcl->lock);
 
+	if (fe->candidate_bvpage) {
+		DBG_BUGON(z_erofs_is_shortlived_page(fe->candidate_bvpage));
+		fe->candidate_bvpage = NULL;
+	}
+
 	/*
 	 * if all pending pages are added, don't hold its reference
 	 * any longer if the pcluster isn't hosted by ourselves.
@@ -712,22 +797,23 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 		tight &= (fe->mode >= COLLECT_PRIMARY_FOLLOWED);
 
 retry:
-	err = z_erofs_attach_page(fe, page, page_type,
-				  fe->mode >= COLLECT_PRIMARY_FOLLOWED);
-	/* should allocate an additional short-lived page for pagevec */
-	if (err == -EAGAIN) {
-		struct page *const newpage =
-				alloc_page(GFP_NOFS | __GFP_NOFAIL);
-
-		set_page_private(newpage, Z_EROFS_SHORTLIVED_PAGE);
-		err = z_erofs_attach_page(fe, newpage,
-					  Z_EROFS_PAGE_TYPE_EXCLUSIVE, true);
-		if (!err)
-			goto retry;
+	err = z_erofs_attach_page(fe, &((struct z_erofs_bvec) {
+					.page = page,
+					.offset = offset - map->m_la,
+					.end = end,
+				  }), page_type);
+	/* should allocate an additional short-lived page for bvset */
+	if (err == -EAGAIN && !fe->candidate_bvpage) {
+		fe->candidate_bvpage = alloc_page(GFP_NOFS | __GFP_NOFAIL);
+		set_page_private(fe->candidate_bvpage,
+				 Z_EROFS_SHORTLIVED_PAGE);
+		goto retry;
 	}
 
-	if (err)
+	if (err) {
+		DBG_BUGON(err == -EAGAIN && fe->candidate_bvpage);
 		goto err_out;
+	}
 
 	index = page->index - (map->m_la >> PAGE_SHIFT);
 
@@ -781,29 +867,24 @@ static bool z_erofs_page_is_invalidated(struct page *page)
 static int z_erofs_parse_out_bvecs(struct z_erofs_pcluster *pcl,
 				   struct page **pages, struct page **pagepool)
 {
-	struct z_erofs_pagevec_ctor ctor;
-	enum z_erofs_page_type page_type;
+	struct z_erofs_bvec_iter biter;
+	struct page *old_bvpage;
 	int i, err = 0;
 
-	z_erofs_pagevec_ctor_init(&ctor, Z_EROFS_NR_INLINE_PAGEVECS,
-				  pcl->pagevec, 0);
+	z_erofs_bvec_iter_begin(&biter, &pcl->bvset,
+				Z_EROFS_NR_INLINE_PAGEVECS, 0);
 	for (i = 0; i < pcl->vcnt; ++i) {
-		struct page *page = z_erofs_pagevec_dequeue(&ctor, &page_type);
+		struct z_erofs_bvec bvec;
 		unsigned int pagenr;
 
-		/* all pages in pagevec ought to be valid */
-		DBG_BUGON(!page);
-		DBG_BUGON(z_erofs_page_is_invalidated(page));
-
-		if (z_erofs_put_shortlivedpage(pagepool, page))
-			continue;
+		z_erofs_bvec_dequeue(&biter, &bvec, &old_bvpage);
 
-		if (page_type == Z_EROFS_VLE_PAGE_TYPE_HEAD)
-			pagenr = 0;
-		else
-			pagenr = z_erofs_onlinepage_index(page);
+		if (old_bvpage)
+			z_erofs_put_shortlivedpage(pagepool, old_bvpage);
 
+		pagenr = (bvec.offset + pcl->pageofs_out) >> PAGE_SHIFT;
 		DBG_BUGON(pagenr >= pcl->nr_pages);
+		DBG_BUGON(z_erofs_page_is_invalidated(bvec.page));
 		/*
 		 * currently EROFS doesn't support multiref(dedup),
 		 * so here erroring out one multiref page.
@@ -814,9 +895,12 @@ static int z_erofs_parse_out_bvecs(struct z_erofs_pcluster *pcl,
 			z_erofs_onlinepage_endio(pages[pagenr]);
 			err = -EFSCORRUPTED;
 		}
-		pages[pagenr] = page;
+		pages[pagenr] = bvec.page;
 	}
-	z_erofs_pagevec_ctor_exit(&ctor, true);
+
+	old_bvpage = z_erofs_bvec_iter_end(&biter);
+	if (old_bvpage)
+		z_erofs_put_shortlivedpage(pagepool, old_bvpage);
 	return err;
 }
 
@@ -986,6 +1070,7 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
 		kvfree(pages);
 
 	pcl->nr_pages = 0;
+	pcl->bvset.nextpage = NULL;
 	pcl->vcnt = 0;
 
 	/* pcluster lock MUST be taken before the following line */
diff --git a/fs/erofs/zdata.h b/fs/erofs/zdata.h
index 58053bb5066f..d03e333e4fde 100644
--- a/fs/erofs/zdata.h
+++ b/fs/erofs/zdata.h
@@ -21,6 +21,21 @@
  */
 typedef void *z_erofs_next_pcluster_t;
 
+struct z_erofs_bvec {
+	struct page *page;
+	int offset;
+	unsigned int end;
+};
+
+#define __Z_EROFS_BVSET(name, total) \
+struct name { \
+	/* point to the next page which contains the following bvecs */ \
+	struct page *nextpage; \
+	struct z_erofs_bvec bvec[total]; \
+}
+__Z_EROFS_BVSET(z_erofs_bvset,);
+__Z_EROFS_BVSET(z_erofs_bvset_inline, Z_EROFS_NR_INLINE_PAGEVECS);
+
 /*
  * Structure fields follow one of the following exclusion rules.
  *
@@ -41,22 +56,25 @@ struct z_erofs_pcluster {
 	/* A: lower limit of decompressed length and if full length or not */
 	unsigned int length;
 
+	/* L: total number of bvecs */
+	unsigned int vcnt;
+
 	/* I: page offset of start position of decompression */
 	unsigned short pageofs_out;
 
 	/* I: page offset of inline compressed data */
 	unsigned short pageofs_in;
 
-	/* L: maximum relative page index in pagevec[] */
+	/* L: maximum relative page index in bvecs */
 	unsigned short nr_pages;
 
-	/* L: total number of pages in pagevec[] */
-	unsigned int vcnt;
-
 	union {
 		/* L: inline a certain number of pagevecs for bootstrap */
 		erofs_vtptr_t pagevec[Z_EROFS_NR_INLINE_PAGEVECS];
 
+		/* L: inline a certain number of bvec for bootstrap */
+		struct z_erofs_bvset_inline bvset;
+
 		/* I: can be used to free the pcluster by RCU. */
 		struct rcu_head rcu;
 	};
-- 
2.24.4



* [PATCH 05/16] erofs: drop the old pagevec approach
  2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
                   ` (3 preceding siblings ...)
  2022-07-14 13:20 ` [PATCH 04/16] erofs: introduce bufvec to store decompressed buffers Gao Xiang
@ 2022-07-14 13:20 ` Gao Xiang
  2022-07-15  7:07   ` Yue Hu
  2022-07-14 13:20 ` [PATCH 06/16] erofs: introduce `z_erofs_parse_in_bvecs' Gao Xiang
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:20 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, LKML

Remove the old pagevec approach but keep z_erofs_page_type for now.
It will be reworked in the following commits as well.

Also rename Z_EROFS_NR_INLINE_PAGEVECS to Z_EROFS_INLINE_BVECS with a
new value of 2, since that is actually enough for bootstrapping.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/zdata.c |  17 +++--
 fs/erofs/zdata.h |   9 +--
 fs/erofs/zpvec.h | 159 -----------------------------------------------
 3 files changed, 16 insertions(+), 169 deletions(-)
 delete mode 100644 fs/erofs/zpvec.h

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index f52c54058f31..e96704db106e 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -27,6 +27,17 @@ static struct z_erofs_pcluster_slab pcluster_pool[] __read_mostly = {
 	_PCLP(Z_EROFS_PCLUSTER_MAX_PAGES)
 };
 
+/* page type in pagevec for decompress subsystem */
+enum z_erofs_page_type {
+	/* including Z_EROFS_VLE_PAGE_TAIL_EXCLUSIVE */
+	Z_EROFS_PAGE_TYPE_EXCLUSIVE,
+
+	Z_EROFS_VLE_PAGE_TYPE_TAIL_SHARED,
+
+	Z_EROFS_VLE_PAGE_TYPE_HEAD,
+	Z_EROFS_VLE_PAGE_TYPE_MAX
+};
+
 struct z_erofs_bvec_iter {
 	struct page *bvpage;
 	struct z_erofs_bvset *bvset;
@@ -273,7 +284,6 @@ struct z_erofs_decompress_frontend {
 	struct inode *const inode;
 	struct erofs_map_blocks map;
 	struct z_erofs_bvec_iter biter;
-	struct z_erofs_pagevec_ctor vector;
 
 	struct page *candidate_bvpage;
 	struct z_erofs_pcluster *pcl, *tailpcl;
@@ -636,7 +646,7 @@ static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe)
 		return ret;
 	}
 	z_erofs_bvec_iter_begin(&fe->biter, &fe->pcl->bvset,
-				Z_EROFS_NR_INLINE_PAGEVECS, fe->pcl->vcnt);
+				Z_EROFS_INLINE_BVECS, fe->pcl->vcnt);
 	/* since file-backed online pages are traversed in reverse order */
 	fe->icpage_ptr = fe->pcl->compressed_pages +
 			z_erofs_pclusterpages(fe->pcl);
@@ -871,8 +881,7 @@ static int z_erofs_parse_out_bvecs(struct z_erofs_pcluster *pcl,
 	struct page *old_bvpage;
 	int i, err = 0;
 
-	z_erofs_bvec_iter_begin(&biter, &pcl->bvset,
-				Z_EROFS_NR_INLINE_PAGEVECS, 0);
+	z_erofs_bvec_iter_begin(&biter, &pcl->bvset, Z_EROFS_INLINE_BVECS, 0);
 	for (i = 0; i < pcl->vcnt; ++i) {
 		struct z_erofs_bvec bvec;
 		unsigned int pagenr;
diff --git a/fs/erofs/zdata.h b/fs/erofs/zdata.h
index d03e333e4fde..a755c5a44d87 100644
--- a/fs/erofs/zdata.h
+++ b/fs/erofs/zdata.h
@@ -7,10 +7,10 @@
 #define __EROFS_FS_ZDATA_H
 
 #include "internal.h"
-#include "zpvec.h"
+#include "tagptr.h"
 
 #define Z_EROFS_PCLUSTER_MAX_PAGES	(Z_EROFS_PCLUSTER_MAX_SIZE / PAGE_SIZE)
-#define Z_EROFS_NR_INLINE_PAGEVECS      3
+#define Z_EROFS_INLINE_BVECS		2
 
 #define Z_EROFS_PCLUSTER_FULL_LENGTH    0x00000001
 #define Z_EROFS_PCLUSTER_LENGTH_BIT     1
@@ -34,7 +34,7 @@ struct name { \
 	struct z_erofs_bvec bvec[total]; \
 };
 __Z_EROFS_BVSET(z_erofs_bvset,)
-__Z_EROFS_BVSET(z_erofs_bvset_inline, Z_EROFS_NR_INLINE_PAGEVECS)
+__Z_EROFS_BVSET(z_erofs_bvset_inline, Z_EROFS_INLINE_BVECS)
 
 /*
  * Structure fields follow one of the following exclusion rules.
@@ -69,9 +69,6 @@ struct z_erofs_pcluster {
 	unsigned short nr_pages;
 
 	union {
-		/* L: inline a certain number of pagevecs for bootstrap */
-		erofs_vtptr_t pagevec[Z_EROFS_NR_INLINE_PAGEVECS];
-
 		/* L: inline a certain number of bvec for bootstrap */
 		struct z_erofs_bvset_inline bvset;
 
diff --git a/fs/erofs/zpvec.h b/fs/erofs/zpvec.h
deleted file mode 100644
index b05464f4a808..000000000000
--- a/fs/erofs/zpvec.h
+++ /dev/null
@@ -1,159 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2018 HUAWEI, Inc.
- *             https://www.huawei.com/
- */
-#ifndef __EROFS_FS_ZPVEC_H
-#define __EROFS_FS_ZPVEC_H
-
-#include "tagptr.h"
-
-/* page type in pagevec for decompress subsystem */
-enum z_erofs_page_type {
-	/* including Z_EROFS_VLE_PAGE_TAIL_EXCLUSIVE */
-	Z_EROFS_PAGE_TYPE_EXCLUSIVE,
-
-	Z_EROFS_VLE_PAGE_TYPE_TAIL_SHARED,
-
-	Z_EROFS_VLE_PAGE_TYPE_HEAD,
-	Z_EROFS_VLE_PAGE_TYPE_MAX
-};
-
-extern void __compiletime_error("Z_EROFS_PAGE_TYPE_EXCLUSIVE != 0")
-	__bad_page_type_exclusive(void);
-
-/* pagevec tagged pointer */
-typedef tagptr2_t	erofs_vtptr_t;
-
-/* pagevec collector */
-struct z_erofs_pagevec_ctor {
-	struct page *curr, *next;
-	erofs_vtptr_t *pages;
-
-	unsigned int nr, index;
-};
-
-static inline void z_erofs_pagevec_ctor_exit(struct z_erofs_pagevec_ctor *ctor,
-					     bool atomic)
-{
-	if (!ctor->curr)
-		return;
-
-	if (atomic)
-		kunmap_atomic(ctor->pages);
-	else
-		kunmap(ctor->curr);
-}
-
-static inline struct page *
-z_erofs_pagevec_ctor_next_page(struct z_erofs_pagevec_ctor *ctor,
-			       unsigned int nr)
-{
-	unsigned int index;
-
-	/* keep away from occupied pages */
-	if (ctor->next)
-		return ctor->next;
-
-	for (index = 0; index < nr; ++index) {
-		const erofs_vtptr_t t = ctor->pages[index];
-		const unsigned int tags = tagptr_unfold_tags(t);
-
-		if (tags == Z_EROFS_PAGE_TYPE_EXCLUSIVE)
-			return tagptr_unfold_ptr(t);
-	}
-	DBG_BUGON(nr >= ctor->nr);
-	return NULL;
-}
-
-static inline void
-z_erofs_pagevec_ctor_pagedown(struct z_erofs_pagevec_ctor *ctor,
-			      bool atomic)
-{
-	struct page *next = z_erofs_pagevec_ctor_next_page(ctor, ctor->nr);
-
-	z_erofs_pagevec_ctor_exit(ctor, atomic);
-
-	ctor->curr = next;
-	ctor->next = NULL;
-	ctor->pages = atomic ?
-		kmap_atomic(ctor->curr) : kmap(ctor->curr);
-
-	ctor->nr = PAGE_SIZE / sizeof(struct page *);
-	ctor->index = 0;
-}
-
-static inline void z_erofs_pagevec_ctor_init(struct z_erofs_pagevec_ctor *ctor,
-					     unsigned int nr,
-					     erofs_vtptr_t *pages,
-					     unsigned int i)
-{
-	ctor->nr = nr;
-	ctor->curr = ctor->next = NULL;
-	ctor->pages = pages;
-
-	if (i >= nr) {
-		i -= nr;
-		z_erofs_pagevec_ctor_pagedown(ctor, false);
-		while (i > ctor->nr) {
-			i -= ctor->nr;
-			z_erofs_pagevec_ctor_pagedown(ctor, false);
-		}
-	}
-	ctor->next = z_erofs_pagevec_ctor_next_page(ctor, i);
-	ctor->index = i;
-}
-
-static inline bool z_erofs_pagevec_enqueue(struct z_erofs_pagevec_ctor *ctor,
-					   struct page *page,
-					   enum z_erofs_page_type type,
-					   bool pvec_safereuse)
-{
-	if (!ctor->next) {
-		/* some pages cannot be reused as pvec safely without I/O */
-		if (type == Z_EROFS_PAGE_TYPE_EXCLUSIVE && !pvec_safereuse)
-			type = Z_EROFS_VLE_PAGE_TYPE_TAIL_SHARED;
-
-		if (type != Z_EROFS_PAGE_TYPE_EXCLUSIVE &&
-		    ctor->index + 1 == ctor->nr)
-			return false;
-	}
-
-	if (ctor->index >= ctor->nr)
-		z_erofs_pagevec_ctor_pagedown(ctor, false);
-
-	/* exclusive page type must be 0 */
-	if (Z_EROFS_PAGE_TYPE_EXCLUSIVE != (uintptr_t)NULL)
-		__bad_page_type_exclusive();
-
-	/* should remind that collector->next never equal to 1, 2 */
-	if (type == (uintptr_t)ctor->next) {
-		ctor->next = page;
-	}
-	ctor->pages[ctor->index++] = tagptr_fold(erofs_vtptr_t, page, type);
-	return true;
-}
-
-static inline struct page *
-z_erofs_pagevec_dequeue(struct z_erofs_pagevec_ctor *ctor,
-			enum z_erofs_page_type *type)
-{
-	erofs_vtptr_t t;
-
-	if (ctor->index >= ctor->nr) {
-		DBG_BUGON(!ctor->next);
-		z_erofs_pagevec_ctor_pagedown(ctor, true);
-	}
-
-	t = ctor->pages[ctor->index];
-
-	*type = tagptr_unfold_tags(t);
-
-	/* should remind that collector->next never equal to 1, 2 */
-	if (*type == (uintptr_t)ctor->next)
-		ctor->next = tagptr_unfold_ptr(t);
-
-	ctor->pages[ctor->index++] = tagptr_fold(erofs_vtptr_t, NULL, 0);
-	return tagptr_unfold_ptr(t);
-}
-#endif
-- 
2.24.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 06/16] erofs: introduce `z_erofs_parse_in_bvecs'
  2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
                   ` (4 preceding siblings ...)
  2022-07-14 13:20 ` [PATCH 05/16] erofs: drop the old pagevec approach Gao Xiang
@ 2022-07-14 13:20 ` Gao Xiang
  2022-07-14 13:20 ` [PATCH 07/16] erofs: switch compressed_pages[] to bufvec Gao Xiang
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:20 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, LKML

`z_erofs_decompress_pcluster()' is too long, therefore it'd be better
to introduce another helper to parse compressed pages (or later,
compressed bvecs.)

BTW, since `compressed_bvecs' is too long as a part of the function
name, `in_bvecs' is used here instead.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/zdata.c | 132 ++++++++++++++++++++++++++++-------------------
 1 file changed, 80 insertions(+), 52 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index e96704db106e..757d352bc2c7 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -913,6 +913,76 @@ static int z_erofs_parse_out_bvecs(struct z_erofs_pcluster *pcl,
 	return err;
 }
 
+static struct page **z_erofs_parse_in_bvecs(struct erofs_sb_info *sbi,
+			struct z_erofs_pcluster *pcl, struct page **pages,
+			struct page **pagepool, bool *overlapped)
+{
+	unsigned int pclusterpages = z_erofs_pclusterpages(pcl);
+	struct page **compressed_pages;
+	int i, err = 0;
+
+	/* XXX: will have a better approach in the following commits */
+	compressed_pages = kmalloc_array(pclusterpages, sizeof(struct page *),
+					 GFP_KERNEL | __GFP_NOFAIL);
+	*overlapped = false;
+
+	for (i = 0; i < pclusterpages; ++i) {
+		unsigned int pagenr;
+		struct page *page = pcl->compressed_pages[i];
+
+		/* compressed pages ought to be present before decompressing */
+		if (!page) {
+			DBG_BUGON(1);
+			continue;
+		}
+		compressed_pages[i] = page;
+
+		if (z_erofs_is_inline_pcluster(pcl)) {
+			if (!PageUptodate(page))
+				err = -EIO;
+			continue;
+		}
+
+		DBG_BUGON(z_erofs_page_is_invalidated(page));
+		if (!z_erofs_is_shortlived_page(page)) {
+			if (erofs_page_is_managed(sbi, page)) {
+				if (!PageUptodate(page))
+					err = -EIO;
+				continue;
+			}
+
+			/*
+			 * only if non-head page can be selected
+			 * for inplace decompression
+			 */
+			pagenr = z_erofs_onlinepage_index(page);
+
+			DBG_BUGON(pagenr >= pcl->nr_pages);
+			if (pages[pagenr]) {
+				DBG_BUGON(1);
+				SetPageError(pages[pagenr]);
+				z_erofs_onlinepage_endio(pages[pagenr]);
+				err = -EFSCORRUPTED;
+			}
+			pages[pagenr] = page;
+
+			*overlapped = true;
+		}
+
+		/* PG_error needs checking for all non-managed pages */
+		if (PageError(page)) {
+			DBG_BUGON(PageUptodate(page));
+			err = -EIO;
+		}
+	}
+
+	if (err) {
+		kfree(compressed_pages);
+		return ERR_PTR(err);
+	}
+	return compressed_pages;
+}
+
 static int z_erofs_decompress_pcluster(struct super_block *sb,
 				       struct z_erofs_pcluster *pcl,
 				       struct page **pagepool)
@@ -957,54 +1027,11 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
 		pages[i] = NULL;
 
 	err = z_erofs_parse_out_bvecs(pcl, pages, pagepool);
-
-	overlapped = false;
-	compressed_pages = pcl->compressed_pages;
-
-	for (i = 0; i < pclusterpages; ++i) {
-		unsigned int pagenr;
-
-		page = compressed_pages[i];
-		/* all compressed pages ought to be valid */
-		DBG_BUGON(!page);
-
-		if (z_erofs_is_inline_pcluster(pcl)) {
-			if (!PageUptodate(page))
-				err = -EIO;
-			continue;
-		}
-
-		DBG_BUGON(z_erofs_page_is_invalidated(page));
-		if (!z_erofs_is_shortlived_page(page)) {
-			if (erofs_page_is_managed(sbi, page)) {
-				if (!PageUptodate(page))
-					err = -EIO;
-				continue;
-			}
-
-			/*
-			 * only if non-head page can be selected
-			 * for inplace decompression
-			 */
-			pagenr = z_erofs_onlinepage_index(page);
-
-			DBG_BUGON(pagenr >= nr_pages);
-			if (pages[pagenr]) {
-				DBG_BUGON(1);
-				SetPageError(pages[pagenr]);
-				z_erofs_onlinepage_endio(pages[pagenr]);
-				err = -EFSCORRUPTED;
-			}
-			pages[pagenr] = page;
-
-			overlapped = true;
-		}
-
-		/* PG_error needs checking for all non-managed pages */
-		if (PageError(page)) {
-			DBG_BUGON(PageUptodate(page));
-			err = -EIO;
-		}
+	compressed_pages = z_erofs_parse_in_bvecs(sbi, pcl, pages,
+						pagepool, &overlapped);
+	if (IS_ERR(compressed_pages)) {
+		err = PTR_ERR(compressed_pages);
+		compressed_pages = NULL;
 	}
 
 	if (err)
@@ -1040,21 +1067,22 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
 out:
 	/* must handle all compressed pages before actual file pages */
 	if (z_erofs_is_inline_pcluster(pcl)) {
-		page = compressed_pages[0];
-		WRITE_ONCE(compressed_pages[0], NULL);
+		page = pcl->compressed_pages[0];
+		WRITE_ONCE(pcl->compressed_pages[0], NULL);
 		put_page(page);
 	} else {
 		for (i = 0; i < pclusterpages; ++i) {
-			page = compressed_pages[i];
+			page = pcl->compressed_pages[i];
 
 			if (erofs_page_is_managed(sbi, page))
 				continue;
 
 			/* recycle all individual short-lived pages */
 			(void)z_erofs_put_shortlivedpage(pagepool, page);
-			WRITE_ONCE(compressed_pages[i], NULL);
+			WRITE_ONCE(pcl->compressed_pages[i], NULL);
 		}
 	}
+	kfree(compressed_pages);
 
 	for (i = 0; i < nr_pages; ++i) {
 		page = pages[i];
-- 
2.24.4



* [PATCH 07/16] erofs: switch compressed_pages[] to bufvec
  2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
                   ` (5 preceding siblings ...)
  2022-07-14 13:20 ` [PATCH 06/16] erofs: introduce `z_erofs_parse_in_bvecs' Gao Xiang
@ 2022-07-14 13:20 ` Gao Xiang
  2022-07-15  7:53   ` Yue Hu
  2022-07-14 13:20 ` [PATCH 08/16] erofs: rework online page handling Gao Xiang
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:20 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, LKML

Convert compressed_pages[] to bufvec in order to avoid using
page->private to keep onlinepage_index (decompressed offset)
for inplace I/O pages.

In the future, we will only rely on folio->private to keep a countdown
to unlock folios and to set folio_uptodate.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/zdata.c | 113 +++++++++++++++++++++++------------------------
 fs/erofs/zdata.h |   4 +-
 2 files changed, 57 insertions(+), 60 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 757d352bc2c7..f2e3f07baad7 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -134,7 +134,7 @@ static int z_erofs_create_pcluster_pool(void)
 
 	for (pcs = pcluster_pool;
 	     pcs < pcluster_pool + ARRAY_SIZE(pcluster_pool); ++pcs) {
-		size = struct_size(a, compressed_pages, pcs->maxpages);
+		size = struct_size(a, compressed_bvecs, pcs->maxpages);
 
 		sprintf(pcs->name, "erofs_pcluster-%u", pcs->maxpages);
 		pcs->slab = kmem_cache_create(pcs->name, size, 0,
@@ -287,16 +287,16 @@ struct z_erofs_decompress_frontend {
 
 	struct page *candidate_bvpage;
 	struct z_erofs_pcluster *pcl, *tailpcl;
-	/* a pointer used to pick up inplace I/O pages */
-	struct page **icpage_ptr;
 	z_erofs_next_pcluster_t owned_head;
-
 	enum z_erofs_collectmode mode;
 
 	bool readahead;
 	/* used for applying cache strategy on the fly */
 	bool backmost;
 	erofs_off_t headoffset;
+
+	/* a pointer used to pick up inplace I/O pages */
+	unsigned int icur;
 };
 
 #define DECOMPRESS_FRONTEND_INIT(__i) { \
@@ -319,24 +319,21 @@ static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe,
 	 */
 	gfp_t gfp = (mapping_gfp_mask(mc) & ~__GFP_DIRECT_RECLAIM) |
 			__GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN;
-	struct page **pages;
-	pgoff_t index;
+	unsigned int i;
 
 	if (fe->mode < COLLECT_PRIMARY_FOLLOWED)
 		return;
 
-	pages = pcl->compressed_pages;
-	index = pcl->obj.index;
-	for (; index < pcl->obj.index + pcl->pclusterpages; ++index, ++pages) {
+	for (i = 0; i < pcl->pclusterpages; ++i) {
 		struct page *page;
 		compressed_page_t t;
 		struct page *newpage = NULL;
 
 		/* the compressed page was loaded before */
-		if (READ_ONCE(*pages))
+		if (READ_ONCE(pcl->compressed_bvecs[i].page))
 			continue;
 
-		page = find_get_page(mc, index);
+		page = find_get_page(mc, pcl->obj.index + i);
 
 		if (page) {
 			t = tag_compressed_page_justfound(page);
@@ -357,7 +354,8 @@ static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe,
 			}
 		}
 
-		if (!cmpxchg_relaxed(pages, NULL, tagptr_cast_ptr(t)))
+		if (!cmpxchg_relaxed(&pcl->compressed_bvecs[i].page, NULL,
+				     tagptr_cast_ptr(t)))
 			continue;
 
 		if (page)
@@ -388,7 +386,7 @@ int erofs_try_to_free_all_cached_pages(struct erofs_sb_info *sbi,
 	 * therefore no need to worry about available decompression users.
 	 */
 	for (i = 0; i < pcl->pclusterpages; ++i) {
-		struct page *page = pcl->compressed_pages[i];
+		struct page *page = pcl->compressed_bvecs[i].page;
 
 		if (!page)
 			continue;
@@ -401,7 +399,7 @@ int erofs_try_to_free_all_cached_pages(struct erofs_sb_info *sbi,
 			continue;
 
 		/* barrier is implied in the following 'unlock_page' */
-		WRITE_ONCE(pcl->compressed_pages[i], NULL);
+		WRITE_ONCE(pcl->compressed_bvecs[i].page, NULL);
 		detach_page_private(page);
 		unlock_page(page);
 	}
@@ -411,36 +409,39 @@ int erofs_try_to_free_all_cached_pages(struct erofs_sb_info *sbi,
 int erofs_try_to_free_cached_page(struct page *page)
 {
 	struct z_erofs_pcluster *const pcl = (void *)page_private(page);
-	int ret = 0;	/* 0 - busy */
+	int ret, i;
 
-	if (erofs_workgroup_try_to_freeze(&pcl->obj, 1)) {
-		unsigned int i;
+	if (!erofs_workgroup_try_to_freeze(&pcl->obj, 1))
+		return 0;
 
-		DBG_BUGON(z_erofs_is_inline_pcluster(pcl));
-		for (i = 0; i < pcl->pclusterpages; ++i) {
-			if (pcl->compressed_pages[i] == page) {
-				WRITE_ONCE(pcl->compressed_pages[i], NULL);
-				ret = 1;
-				break;
-			}
+	ret = 0;
+	DBG_BUGON(z_erofs_is_inline_pcluster(pcl));
+	for (i = 0; i < pcl->pclusterpages; ++i) {
+		if (pcl->compressed_bvecs[i].page == page) {
+			WRITE_ONCE(pcl->compressed_bvecs[i].page, NULL);
+			ret = 1;
+			break;
 		}
-		erofs_workgroup_unfreeze(&pcl->obj, 1);
-
-		if (ret)
-			detach_page_private(page);
 	}
+	erofs_workgroup_unfreeze(&pcl->obj, 1);
+	if (ret)
+		detach_page_private(page);
 	return ret;
 }
 
 /* page_type must be Z_EROFS_PAGE_TYPE_EXCLUSIVE */
 static bool z_erofs_try_inplace_io(struct z_erofs_decompress_frontend *fe,
-				   struct page *page)
+				   struct z_erofs_bvec *bvec)
 {
 	struct z_erofs_pcluster *const pcl = fe->pcl;
 
-	while (fe->icpage_ptr > pcl->compressed_pages)
-		if (!cmpxchg(--fe->icpage_ptr, NULL, page))
+	while (fe->icur > 0) {
+		if (!cmpxchg(&pcl->compressed_bvecs[--fe->icur].page,
+			     NULL, bvec->page)) {
+			pcl->compressed_bvecs[fe->icur] = *bvec;
 			return true;
+		}
+	}
 	return false;
 }
 
@@ -454,7 +455,7 @@ static int z_erofs_attach_page(struct z_erofs_decompress_frontend *fe,
 	if (fe->mode >= COLLECT_PRIMARY &&
 	    type == Z_EROFS_PAGE_TYPE_EXCLUSIVE) {
 		/* give priority for inplaceio to use file pages first */
-		if (z_erofs_try_inplace_io(fe, bvec->page))
+		if (z_erofs_try_inplace_io(fe, bvec))
 			return 0;
 		/* otherwise, check if it can be used as a bvpage */
 		if (fe->mode >= COLLECT_PRIMARY_FOLLOWED &&
@@ -648,8 +649,7 @@ static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe)
 	z_erofs_bvec_iter_begin(&fe->biter, &fe->pcl->bvset,
 				Z_EROFS_INLINE_BVECS, fe->pcl->vcnt);
 	/* since file-backed online pages are traversed in reverse order */
-	fe->icpage_ptr = fe->pcl->compressed_pages +
-			z_erofs_pclusterpages(fe->pcl);
+	fe->icur = z_erofs_pclusterpages(fe->pcl);
 	return 0;
 }
 
@@ -769,7 +769,8 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 			goto err_out;
 		}
 		get_page(fe->map.buf.page);
-		WRITE_ONCE(fe->pcl->compressed_pages[0], fe->map.buf.page);
+		WRITE_ONCE(fe->pcl->compressed_bvecs[0].page,
+			   fe->map.buf.page);
 		fe->mode = COLLECT_PRIMARY_FOLLOWED_NOINPLACE;
 	} else {
 		/* bind cache first when cached decompression is preferred */
@@ -927,8 +928,9 @@ static struct page **z_erofs_parse_in_bvecs(struct erofs_sb_info *sbi,
 	*overlapped = false;
 
 	for (i = 0; i < pclusterpages; ++i) {
-		unsigned int pagenr;
-		struct page *page = pcl->compressed_pages[i];
+		struct z_erofs_bvec *bvec = &pcl->compressed_bvecs[i];
+		struct page *page = bvec->page;
+		unsigned int pgnr;
 
 		/* compressed pages ought to be present before decompressing */
 		if (!page) {
@@ -951,21 +953,15 @@ static struct page **z_erofs_parse_in_bvecs(struct erofs_sb_info *sbi,
 				continue;
 			}
 
-			/*
-			 * only if non-head page can be selected
-			 * for inplace decompression
-			 */
-			pagenr = z_erofs_onlinepage_index(page);
-
-			DBG_BUGON(pagenr >= pcl->nr_pages);
-			if (pages[pagenr]) {
+			pgnr = (bvec->offset + pcl->pageofs_out) >> PAGE_SHIFT;
+			DBG_BUGON(pgnr >= pcl->nr_pages);
+			if (pages[pgnr]) {
 				DBG_BUGON(1);
-				SetPageError(pages[pagenr]);
-				z_erofs_onlinepage_endio(pages[pagenr]);
+				SetPageError(pages[pgnr]);
+				z_erofs_onlinepage_endio(pages[pgnr]);
 				err = -EFSCORRUPTED;
 			}
-			pages[pagenr] = page;
-
+			pages[pgnr] = page;
 			*overlapped = true;
 		}
 
@@ -1067,19 +1063,19 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
 out:
 	/* must handle all compressed pages before actual file pages */
 	if (z_erofs_is_inline_pcluster(pcl)) {
-		page = pcl->compressed_pages[0];
-		WRITE_ONCE(pcl->compressed_pages[0], NULL);
+		page = pcl->compressed_bvecs[0].page;
+		WRITE_ONCE(pcl->compressed_bvecs[0].page, NULL);
 		put_page(page);
 	} else {
 		for (i = 0; i < pclusterpages; ++i) {
-			page = pcl->compressed_pages[i];
+			page = pcl->compressed_bvecs[i].page;
 
 			if (erofs_page_is_managed(sbi, page))
 				continue;
 
 			/* recycle all individual short-lived pages */
 			(void)z_erofs_put_shortlivedpage(pagepool, page);
-			WRITE_ONCE(pcl->compressed_pages[i], NULL);
+			WRITE_ONCE(pcl->compressed_bvecs[i].page, NULL);
 		}
 	}
 	kfree(compressed_pages);
@@ -1193,7 +1189,7 @@ static struct page *pickup_page_for_submission(struct z_erofs_pcluster *pcl,
 	int justfound;
 
 repeat:
-	page = READ_ONCE(pcl->compressed_pages[nr]);
+	page = READ_ONCE(pcl->compressed_bvecs[nr].page);
 	oldpage = page;
 
 	if (!page)
@@ -1209,7 +1205,7 @@ static struct page *pickup_page_for_submission(struct z_erofs_pcluster *pcl,
 	 * otherwise, it will go inplace I/O path instead.
 	 */
 	if (page->private == Z_EROFS_PREALLOCATED_PAGE) {
-		WRITE_ONCE(pcl->compressed_pages[nr], page);
+		WRITE_ONCE(pcl->compressed_bvecs[nr].page, page);
 		set_page_private(page, 0);
 		tocache = true;
 		goto out_tocache;
@@ -1235,14 +1231,14 @@ static struct page *pickup_page_for_submission(struct z_erofs_pcluster *pcl,
 
 	/* the page is still in manage cache */
 	if (page->mapping == mc) {
-		WRITE_ONCE(pcl->compressed_pages[nr], page);
+		WRITE_ONCE(pcl->compressed_bvecs[nr].page, page);
 
 		ClearPageError(page);
 		if (!PagePrivate(page)) {
 			/*
 			 * impossible to be !PagePrivate(page) for
 			 * the current restriction as well if
-			 * the page is already in compressed_pages[].
+			 * the page is already in compressed_bvecs[].
 			 */
 			DBG_BUGON(!justfound);
 
@@ -1271,7 +1267,8 @@ static struct page *pickup_page_for_submission(struct z_erofs_pcluster *pcl,
 	put_page(page);
 out_allocpage:
 	page = erofs_allocpage(pagepool, gfp | __GFP_NOFAIL);
-	if (oldpage != cmpxchg(&pcl->compressed_pages[nr], oldpage, page)) {
+	if (oldpage != cmpxchg(&pcl->compressed_bvecs[nr].page,
+			       oldpage, page)) {
 		erofs_pagepool_add(pagepool, page);
 		cond_resched();
 		goto repeat;
diff --git a/fs/erofs/zdata.h b/fs/erofs/zdata.h
index a755c5a44d87..a70f1b73e901 100644
--- a/fs/erofs/zdata.h
+++ b/fs/erofs/zdata.h
@@ -87,8 +87,8 @@ struct z_erofs_pcluster {
 	/* I: compression algorithm format */
 	unsigned char algorithmformat;
 
-	/* A: compressed pages (can be cached or inplaced pages) */
-	struct page *compressed_pages[];
+	/* A: compressed bvecs (can be cached or inplaced pages) */
+	struct z_erofs_bvec compressed_bvecs[];
 };
 
 /* let's avoid the valid 32-bit kernel addresses */
-- 
2.24.4



* [PATCH 08/16] erofs: rework online page handling
  2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
                   ` (6 preceding siblings ...)
  2022-07-14 13:20 ` [PATCH 07/16] erofs: switch compressed_pages[] to bufvec Gao Xiang
@ 2022-07-14 13:20 ` Gao Xiang
  2022-07-14 13:20 ` [PATCH 09/16] erofs: get rid of `enum z_erofs_page_type' Gao Xiang
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:20 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, LKML

Since all decompressed offsets have been integrated into bvecs[], this
patch avoids all sub-indexes so that page->private only includes a
part count and an eio flag; thus, in the future, folio->private can
have the same meaning.

In addition, PG_error will not be used anymore after this patch, and
we're heading towards using page->private (later folio->private) and
page->mapping (later folio->mapping) only.
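The new page->private encoding (bit 31 as an EIO flag, bits 0-30 as a
part countdown) can be sketched in plain C. This is a standalone
illustration with made-up names, not the kernel code; in particular it
uses atomic_fetch_or() where the patch uses a cmpxchg loop:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* bit 31: I/O error occurred; bits 0-30: remaining parts of the page */
#define PAGE_EIO (1u << 31)

/* stand-in for page->private in this sketch */
typedef _Atomic uint32_t pgpriv_t;

static void onlinepage_init(pgpriv_t *p)  { atomic_store(p, 1); }
static void onlinepage_split(pgpriv_t *p) { atomic_fetch_add(p, 1); }

static void page_mark_eio(pgpriv_t *p)
{
	/* set the error bit without disturbing the part count */
	atomic_fetch_or(p, PAGE_EIO);
}

/* returns true iff the last part completed and no error was recorded */
static bool onlinepage_endio(pgpriv_t *p)
{
	uint32_t v = atomic_fetch_sub(p, 1) - 1;

	if (v & ~PAGE_EIO)		/* parts still pending */
		return false;
	return !(v & PAGE_EIO);		/* "uptodate" only without EIO */
}
```

Because the sub-index is gone, completion only needs the countdown to
hit zero and the single error bit, which is exactly what a folio's
private field can later carry unchanged.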

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/zdata.c | 51 ++++++++++++++----------------------
 fs/erofs/zdata.h | 68 ++++++++++++++----------------------------------
 2 files changed, 38 insertions(+), 81 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index f2e3f07baad7..9065e160d6a6 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -743,7 +743,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 		map->m_llen = 0;
 		err = z_erofs_map_blocks_iter(inode, map, 0);
 		if (err)
-			goto err_out;
+			goto out;
 	} else {
 		if (fe->pcl)
 			goto hitted;
@@ -755,7 +755,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 
 	err = z_erofs_collector_begin(fe);
 	if (err)
-		goto err_out;
+		goto out;
 
 	if (z_erofs_is_inline_pcluster(fe->pcl)) {
 		void *mp;
@@ -766,7 +766,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 			err = PTR_ERR(mp);
 			erofs_err(inode->i_sb,
 				  "failed to get inline page, err %d", err);
-			goto err_out;
+			goto out;
 		}
 		get_page(fe->map.buf.page);
 		WRITE_ONCE(fe->pcl->compressed_bvecs[0].page,
@@ -823,16 +823,15 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 
 	if (err) {
 		DBG_BUGON(err == -EAGAIN && fe->candidate_bvpage);
-		goto err_out;
+		goto out;
 	}
 
-	index = page->index - (map->m_la >> PAGE_SHIFT);
-
-	z_erofs_onlinepage_fixup(page, index, true);
-
+	z_erofs_onlinepage_split(page);
 	/* bump up the number of spiltted parts of a page */
 	++spiltted;
+
 	/* also update nr_pages */
+	index = page->index - (map->m_la >> PAGE_SHIFT);
 	fe->pcl->nr_pages = max_t(pgoff_t, fe->pcl->nr_pages, index + 1);
 next_part:
 	/* can be used for verification */
@@ -843,16 +842,13 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 		goto repeat;
 
 out:
+	if (err)
+		z_erofs_page_mark_eio(page);
 	z_erofs_onlinepage_endio(page);
 
 	erofs_dbg("%s, finish page: %pK spiltted: %u map->m_llen %llu",
 		  __func__, page, spiltted, map->m_llen);
 	return err;
-
-	/* if some error occurred while processing this page */
-err_out:
-	SetPageError(page);
-	goto out;
 }
 
 static bool z_erofs_get_sync_decompress_policy(struct erofs_sb_info *sbi,
@@ -901,7 +897,7 @@ static int z_erofs_parse_out_bvecs(struct z_erofs_pcluster *pcl,
 		 */
 		if (pages[pagenr]) {
 			DBG_BUGON(1);
-			SetPageError(pages[pagenr]);
+			z_erofs_page_mark_eio(pages[pagenr]);
 			z_erofs_onlinepage_endio(pages[pagenr]);
 			err = -EFSCORRUPTED;
 		}
@@ -957,19 +953,13 @@ static struct page **z_erofs_parse_in_bvecs(struct erofs_sb_info *sbi,
 			DBG_BUGON(pgnr >= pcl->nr_pages);
 			if (pages[pgnr]) {
 				DBG_BUGON(1);
-				SetPageError(pages[pgnr]);
+				z_erofs_page_mark_eio(pages[pgnr]);
 				z_erofs_onlinepage_endio(pages[pgnr]);
 				err = -EFSCORRUPTED;
 			}
 			pages[pgnr] = page;
 			*overlapped = true;
 		}
-
-		/* PG_error needs checking for all non-managed pages */
-		if (PageError(page)) {
-			DBG_BUGON(PageUptodate(page));
-			err = -EIO;
-		}
 	}
 
 	if (err) {
@@ -981,7 +971,7 @@ static struct page **z_erofs_parse_in_bvecs(struct erofs_sb_info *sbi,
 
 static int z_erofs_decompress_pcluster(struct super_block *sb,
 				       struct z_erofs_pcluster *pcl,
-				       struct page **pagepool)
+				       struct page **pagepool, int err)
 {
 	struct erofs_sb_info *const sbi = EROFS_SB(sb);
 	unsigned int pclusterpages = z_erofs_pclusterpages(pcl);
@@ -990,7 +980,6 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
 	struct page **pages, **compressed_pages, *page;
 
 	bool overlapped, partial;
-	int err;
 
 	might_sleep();
 	DBG_BUGON(!READ_ONCE(pcl->nr_pages));
@@ -1090,10 +1079,8 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
 		/* recycle all individual short-lived pages */
 		if (z_erofs_put_shortlivedpage(pagepool, page))
 			continue;
-
-		if (err < 0)
-			SetPageError(page);
-
+		if (err)
+			z_erofs_page_mark_eio(page);
 		z_erofs_onlinepage_endio(page);
 	}
 
@@ -1129,7 +1116,8 @@ static void z_erofs_decompress_queue(const struct z_erofs_decompressqueue *io,
 		pcl = container_of(owned, struct z_erofs_pcluster, next);
 		owned = READ_ONCE(pcl->next);
 
-		z_erofs_decompress_pcluster(io->sb, pcl, pagepool);
+		z_erofs_decompress_pcluster(io->sb, pcl, pagepool,
+					    io->eio ? -EIO : 0);
 		erofs_workgroup_put(&pcl->obj);
 	}
 }
@@ -1233,7 +1221,6 @@ static struct page *pickup_page_for_submission(struct z_erofs_pcluster *pcl,
 	if (page->mapping == mc) {
 		WRITE_ONCE(pcl->compressed_bvecs[nr].page, page);
 
-		ClearPageError(page);
 		if (!PagePrivate(page)) {
 			/*
 			 * impossible to be !PagePrivate(page) for
@@ -1305,6 +1292,7 @@ jobqueue_init(struct super_block *sb,
 		q = fgq;
 		init_completion(&fgq->u.done);
 		atomic_set(&fgq->pending_bios, 0);
+		q->eio = true;
 	}
 	q->sb = sb;
 	q->head = Z_EROFS_PCLUSTER_TAIL_CLOSED;
@@ -1365,15 +1353,14 @@ static void z_erofs_decompressqueue_endio(struct bio *bio)
 		DBG_BUGON(PageUptodate(page));
 		DBG_BUGON(z_erofs_page_is_invalidated(page));
 
-		if (err)
-			SetPageError(page);
-
 		if (erofs_page_is_managed(EROFS_SB(q->sb), page)) {
 			if (!err)
 				SetPageUptodate(page);
 			unlock_page(page);
 		}
 	}
+	if (err)
+		q->eio = true;
 	z_erofs_decompress_kickoff(q, tagptr_unfold_tags(t), -1);
 	bio_put(bio);
 }
diff --git a/fs/erofs/zdata.h b/fs/erofs/zdata.h
index a70f1b73e901..852da31e2e91 100644
--- a/fs/erofs/zdata.h
+++ b/fs/erofs/zdata.h
@@ -109,6 +109,8 @@ struct z_erofs_decompressqueue {
 		struct completion done;
 		struct work_struct work;
 	} u;
+
+	bool eio;
 };
 
 static inline bool z_erofs_is_inline_pcluster(struct z_erofs_pcluster *pcl)
@@ -123,38 +125,17 @@ static inline unsigned int z_erofs_pclusterpages(struct z_erofs_pcluster *pcl)
 	return pcl->pclusterpages;
 }
 
-#define Z_EROFS_ONLINEPAGE_COUNT_BITS   2
-#define Z_EROFS_ONLINEPAGE_COUNT_MASK   ((1 << Z_EROFS_ONLINEPAGE_COUNT_BITS) - 1)
-#define Z_EROFS_ONLINEPAGE_INDEX_SHIFT  (Z_EROFS_ONLINEPAGE_COUNT_BITS)
-
 /*
- * waiters (aka. ongoing_packs): # to unlock the page
- * sub-index: 0 - for partial page, >= 1 full page sub-index
+ * bit 31: I/O error occurred on this page
+ * bit 0 - 30: remaining parts to complete this page
  */
-typedef atomic_t z_erofs_onlinepage_t;
-
-/* type punning */
-union z_erofs_onlinepage_converter {
-	z_erofs_onlinepage_t *o;
-	unsigned long *v;
-};
-
-static inline unsigned int z_erofs_onlinepage_index(struct page *page)
-{
-	union z_erofs_onlinepage_converter u;
-
-	DBG_BUGON(!PagePrivate(page));
-	u.v = &page_private(page);
-
-	return atomic_read(u.o) >> Z_EROFS_ONLINEPAGE_INDEX_SHIFT;
-}
+#define Z_EROFS_PAGE_EIO			(1 << 31)
 
 static inline void z_erofs_onlinepage_init(struct page *page)
 {
 	union {
-		z_erofs_onlinepage_t o;
+		atomic_t o;
 		unsigned long v;
-	/* keep from being unlocked in advance */
 	} u = { .o = ATOMIC_INIT(1) };
 
 	set_page_private(page, u.v);
@@ -162,45 +143,34 @@ static inline void z_erofs_onlinepage_init(struct page *page)
 	SetPagePrivate(page);
 }
 
-static inline void z_erofs_onlinepage_fixup(struct page *page,
-	uintptr_t index, bool down)
+static inline void z_erofs_onlinepage_split(struct page *page)
 {
-	union z_erofs_onlinepage_converter u = { .v = &page_private(page) };
-	int orig, orig_index, val;
-
-repeat:
-	orig = atomic_read(u.o);
-	orig_index = orig >> Z_EROFS_ONLINEPAGE_INDEX_SHIFT;
-	if (orig_index) {
-		if (!index)
-			return;
+	atomic_inc((atomic_t *)&page->private);
+}
 
-		DBG_BUGON(orig_index != index);
-	}
+static inline void z_erofs_page_mark_eio(struct page *page)
+{
+	int orig;
 
-	val = (index << Z_EROFS_ONLINEPAGE_INDEX_SHIFT) |
-		((orig & Z_EROFS_ONLINEPAGE_COUNT_MASK) + (unsigned int)down);
-	if (atomic_cmpxchg(u.o, orig, val) != orig)
-		goto repeat;
+	do {
+		orig = atomic_read((atomic_t *)&page->private);
+	} while (atomic_cmpxchg((atomic_t *)&page->private, orig,
+				orig | Z_EROFS_PAGE_EIO) != orig);
 }
 
 static inline void z_erofs_onlinepage_endio(struct page *page)
 {
-	union z_erofs_onlinepage_converter u;
 	unsigned int v;
 
 	DBG_BUGON(!PagePrivate(page));
-	u.v = &page_private(page);
-
-	v = atomic_dec_return(u.o);
-	if (!(v & Z_EROFS_ONLINEPAGE_COUNT_MASK)) {
+	v = atomic_dec_return((atomic_t *)&page->private);
+	if (!(v & ~Z_EROFS_PAGE_EIO)) {
 		set_page_private(page, 0);
 		ClearPagePrivate(page);
-		if (!PageError(page))
+		if (!(v & Z_EROFS_PAGE_EIO))
 			SetPageUptodate(page);
 		unlock_page(page);
 	}
-	erofs_dbg("%s, page %p value %x", __func__, page, atomic_read(u.o));
 }
 
 #define Z_EROFS_VMAP_ONSTACK_PAGES	\
-- 
2.24.4



* [PATCH 09/16] erofs: get rid of `enum z_erofs_page_type'
  2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
                   ` (7 preceding siblings ...)
  2022-07-14 13:20 ` [PATCH 08/16] erofs: rework online page handling Gao Xiang
@ 2022-07-14 13:20 ` Gao Xiang
  2022-07-14 13:20 ` [PATCH 10/16] erofs: clean up `enum z_erofs_collectmode' Gao Xiang
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:20 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, LKML

Remove it since pagevec[] is no longer used.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/zdata.c | 30 +++++-------------------------
 1 file changed, 5 insertions(+), 25 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 9065e160d6a6..cdfb2706e4ae 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -27,17 +27,6 @@ static struct z_erofs_pcluster_slab pcluster_pool[] __read_mostly = {
 	_PCLP(Z_EROFS_PCLUSTER_MAX_PAGES)
 };
 
-/* page type in pagevec for decompress subsystem */
-enum z_erofs_page_type {
-	/* including Z_EROFS_VLE_PAGE_TAIL_EXCLUSIVE */
-	Z_EROFS_PAGE_TYPE_EXCLUSIVE,
-
-	Z_EROFS_VLE_PAGE_TYPE_TAIL_SHARED,
-
-	Z_EROFS_VLE_PAGE_TYPE_HEAD,
-	Z_EROFS_VLE_PAGE_TYPE_MAX
-};
-
 struct z_erofs_bvec_iter {
 	struct page *bvpage;
 	struct z_erofs_bvset *bvset;
@@ -429,7 +418,6 @@ int erofs_try_to_free_cached_page(struct page *page)
 	return ret;
 }
 
-/* page_type must be Z_EROFS_PAGE_TYPE_EXCLUSIVE */
 static bool z_erofs_try_inplace_io(struct z_erofs_decompress_frontend *fe,
 				   struct z_erofs_bvec *bvec)
 {
@@ -447,13 +435,11 @@ static bool z_erofs_try_inplace_io(struct z_erofs_decompress_frontend *fe,
 
 /* callers must be with pcluster lock held */
 static int z_erofs_attach_page(struct z_erofs_decompress_frontend *fe,
-			       struct z_erofs_bvec *bvec,
-			       enum z_erofs_page_type type)
+			       struct z_erofs_bvec *bvec, bool exclusive)
 {
 	int ret;
 
-	if (fe->mode >= COLLECT_PRIMARY &&
-	    type == Z_EROFS_PAGE_TYPE_EXCLUSIVE) {
+	if (fe->mode >= COLLECT_PRIMARY && exclusive) {
 		/* give priority for inplaceio to use file pages first */
 		if (z_erofs_try_inplace_io(fe, bvec))
 			return 0;
@@ -718,10 +704,9 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 	struct erofs_sb_info *const sbi = EROFS_I_SB(inode);
 	struct erofs_map_blocks *const map = &fe->map;
 	const loff_t offset = page_offset(page);
-	bool tight = true;
+	bool tight = true, exclusive;
 
 	enum z_erofs_cache_alloctype cache_strategy;
-	enum z_erofs_page_type page_type;
 	unsigned int cur, end, spiltted, index;
 	int err = 0;
 
@@ -798,12 +783,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 		goto next_part;
 	}
 
-	/* let's derive page type */
-	page_type = cur ? Z_EROFS_VLE_PAGE_TYPE_HEAD :
-		(!spiltted ? Z_EROFS_PAGE_TYPE_EXCLUSIVE :
-			(tight ? Z_EROFS_PAGE_TYPE_EXCLUSIVE :
-				Z_EROFS_VLE_PAGE_TYPE_TAIL_SHARED));
-
+	exclusive = (!cur && (!spiltted || tight));
 	if (cur)
 		tight &= (fe->mode >= COLLECT_PRIMARY_FOLLOWED);
 
@@ -812,7 +792,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 					.page = page,
 					.offset = offset - map->m_la,
 					.end = end,
-				  }), page_type);
+				  }), exclusive);
 	/* should allocate an additional short-lived page for bvset */
 	if (err == -EAGAIN && !fe->candidate_bvpage) {
 		fe->candidate_bvpage = alloc_page(GFP_NOFS | __GFP_NOFAIL);
-- 
2.24.4



* [PATCH 10/16] erofs: clean up `enum z_erofs_collectmode'
  2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
                   ` (8 preceding siblings ...)
  2022-07-14 13:20 ` [PATCH 09/16] erofs: get rid of `enum z_erofs_page_type' Gao Xiang
@ 2022-07-14 13:20 ` Gao Xiang
  2022-07-14 13:20 ` [PATCH 11/16] erofs: get rid of `z_pagemap_global' Gao Xiang
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:20 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, LKML

`enum z_erofs_collectmode' is really ambiguous, but I'm not quite
sure if there is a better name; basically, it's used to judge whether
inplace I/O can be used depending on the current status of pclusters
in the chain.

Rename it as `enum z_erofs_pclustermode' instead.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/zdata.c | 63 ++++++++++++++++++++++++------------------------
 1 file changed, 31 insertions(+), 32 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index cdfb2706e4ae..55bcd6e5ae9a 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -227,30 +227,29 @@ int __init z_erofs_init_zip_subsystem(void)
 	return err;
 }
 
-enum z_erofs_collectmode {
-	COLLECT_SECONDARY,
-	COLLECT_PRIMARY,
+enum z_erofs_pclustermode {
+	Z_EROFS_PCLUSTER_INFLIGHT,
 	/*
-	 * The current collection was the tail of an exist chain, in addition
-	 * that the previous processed chained collections are all decided to
+	 * The current pclusters was the tail of an exist chain, in addition
+	 * that the previous processed chained pclusters are all decided to
 	 * be hooked up to it.
-	 * A new chain will be created for the remaining collections which are
-	 * not processed yet, therefore different from COLLECT_PRIMARY_FOLLOWED,
-	 * the next collection cannot reuse the whole page safely in
-	 * the following scenario:
+	 * A new chain will be created for the remaining pclusters which are
+	 * not processed yet, so different from Z_EROFS_PCLUSTER_FOLLOWED,
+	 * the next pcluster cannot reuse the whole page safely for inplace I/O
+	 * in the following scenario:
 	 *  ________________________________________________________________
 	 * |      tail (partial) page     |       head (partial) page       |
-	 * |   (belongs to the next cl)   |   (belongs to the current cl)   |
-	 * |_______PRIMARY_FOLLOWED_______|________PRIMARY_HOOKED___________|
+	 * |   (belongs to the next pcl)  |   (belongs to the current pcl)  |
+	 * |_______PCLUSTER_FOLLOWED______|________PCLUSTER_HOOKED__________|
 	 */
-	COLLECT_PRIMARY_HOOKED,
+	Z_EROFS_PCLUSTER_HOOKED,
 	/*
-	 * a weak form of COLLECT_PRIMARY_FOLLOWED, the difference is that it
+	 * a weak form of Z_EROFS_PCLUSTER_FOLLOWED, the difference is that it
 	 * could be dispatched into bypass queue later due to uptodated managed
 	 * pages. All related online pages cannot be reused for inplace I/O (or
 	 * pagevec) since it can be directly decoded without I/O submission.
 	 */
-	COLLECT_PRIMARY_FOLLOWED_NOINPLACE,
+	Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE,
 	/*
 	 * The current collection has been linked with the owned chain, and
 	 * could also be linked with the remaining collections, which means
@@ -261,12 +260,12 @@ enum z_erofs_collectmode {
 	 *  ________________________________________________________________
 	 * |  tail (partial) page |          head (partial) page           |
 	 * |  (of the current cl) |      (of the previous collection)      |
-	 * |  PRIMARY_FOLLOWED or |                                        |
-	 * |_____PRIMARY_HOOKED___|____________PRIMARY_FOLLOWED____________|
+	 * | PCLUSTER_FOLLOWED or |                                        |
+	 * |_____PCLUSTER_HOOKED__|___________PCLUSTER_FOLLOWED____________|
 	 *
 	 * [  (*) the above page can be used as inplace I/O.               ]
 	 */
-	COLLECT_PRIMARY_FOLLOWED,
+	Z_EROFS_PCLUSTER_FOLLOWED,
 };
 
 struct z_erofs_decompress_frontend {
@@ -277,7 +276,7 @@ struct z_erofs_decompress_frontend {
 	struct page *candidate_bvpage;
 	struct z_erofs_pcluster *pcl, *tailpcl;
 	z_erofs_next_pcluster_t owned_head;
-	enum z_erofs_collectmode mode;
+	enum z_erofs_pclustermode mode;
 
 	bool readahead;
 	/* used for applying cache strategy on the fly */
@@ -290,7 +289,7 @@ struct z_erofs_decompress_frontend {
 
 #define DECOMPRESS_FRONTEND_INIT(__i) { \
 	.inode = __i, .owned_head = Z_EROFS_PCLUSTER_TAIL, \
-	.mode = COLLECT_PRIMARY_FOLLOWED, .backmost = true }
+	.mode = Z_EROFS_PCLUSTER_FOLLOWED, .backmost = true }
 
 static struct page *z_pagemap_global[Z_EROFS_VMAP_GLOBAL_PAGES];
 static DEFINE_MUTEX(z_pagemap_global_lock);
@@ -310,7 +309,7 @@ static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe,
 			__GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN;
 	unsigned int i;
 
-	if (fe->mode < COLLECT_PRIMARY_FOLLOWED)
+	if (fe->mode < Z_EROFS_PCLUSTER_FOLLOWED)
 		return;
 
 	for (i = 0; i < pcl->pclusterpages; ++i) {
@@ -358,7 +357,7 @@ static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe,
 	 * managed cache since it can be moved to the bypass queue instead.
 	 */
 	if (standalone)
-		fe->mode = COLLECT_PRIMARY_FOLLOWED_NOINPLACE;
+		fe->mode = Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE;
 }
 
 /* called by erofs_shrinker to get rid of all compressed_pages */
@@ -439,12 +438,12 @@ static int z_erofs_attach_page(struct z_erofs_decompress_frontend *fe,
 {
 	int ret;
 
-	if (fe->mode >= COLLECT_PRIMARY && exclusive) {
+	if (exclusive) {
 		/* give priority for inplaceio to use file pages first */
 		if (z_erofs_try_inplace_io(fe, bvec))
 			return 0;
 		/* otherwise, check if it can be used as a bvpage */
-		if (fe->mode >= COLLECT_PRIMARY_FOLLOWED &&
+		if (fe->mode >= Z_EROFS_PCLUSTER_FOLLOWED &&
 		    !fe->candidate_bvpage)
 			fe->candidate_bvpage = bvec->page;
 	}
@@ -463,7 +462,7 @@ static void z_erofs_try_to_claim_pcluster(struct z_erofs_decompress_frontend *f)
 		    *owned_head) == Z_EROFS_PCLUSTER_NIL) {
 		*owned_head = &pcl->next;
 		/* so we can attach this pcluster to our submission chain. */
-		f->mode = COLLECT_PRIMARY_FOLLOWED;
+		f->mode = Z_EROFS_PCLUSTER_FOLLOWED;
 		return;
 	}
 
@@ -474,12 +473,12 @@ static void z_erofs_try_to_claim_pcluster(struct z_erofs_decompress_frontend *f)
 	if (cmpxchg(&pcl->next, Z_EROFS_PCLUSTER_TAIL,
 		    *owned_head) == Z_EROFS_PCLUSTER_TAIL) {
 		*owned_head = Z_EROFS_PCLUSTER_TAIL;
-		f->mode = COLLECT_PRIMARY_HOOKED;
+		f->mode = Z_EROFS_PCLUSTER_HOOKED;
 		f->tailpcl = NULL;
 		return;
 	}
 	/* type 3, it belongs to a chain, but it isn't the end of the chain */
-	f->mode = COLLECT_PRIMARY;
+	f->mode = Z_EROFS_PCLUSTER_INFLIGHT;
 }
 
 static int z_erofs_lookup_pcluster(struct z_erofs_decompress_frontend *fe)
@@ -554,7 +553,7 @@ static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe)
 	/* new pclusters should be claimed as type 1, primary and followed */
 	pcl->next = fe->owned_head;
 	pcl->pageofs_out = map->m_la & ~PAGE_MASK;
-	fe->mode = COLLECT_PRIMARY_FOLLOWED;
+	fe->mode = Z_EROFS_PCLUSTER_FOLLOWED;
 
 	/*
 	 * lock all primary followed works before visible to others
@@ -676,7 +675,7 @@ static bool z_erofs_collector_end(struct z_erofs_decompress_frontend *fe)
 	 * if all pending pages are added, don't hold its reference
 	 * any longer if the pcluster isn't hosted by ourselves.
 	 */
-	if (fe->mode < COLLECT_PRIMARY_FOLLOWED_NOINPLACE)
+	if (fe->mode < Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE)
 		erofs_workgroup_put(&pcl->obj);
 
 	fe->pcl = NULL;
@@ -756,7 +755,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 		get_page(fe->map.buf.page);
 		WRITE_ONCE(fe->pcl->compressed_bvecs[0].page,
 			   fe->map.buf.page);
-		fe->mode = COLLECT_PRIMARY_FOLLOWED_NOINPLACE;
+		fe->mode = Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE;
 	} else {
 		/* bind cache first when cached decompression is preferred */
 		if (should_alloc_managed_pages(fe, sbi->opt.cache_strategy,
@@ -774,8 +773,8 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 	 * those chains are handled asynchronously thus the page cannot be used
 	 * for inplace I/O or pagevec (should be processed in strict order.)
 	 */
-	tight &= (fe->mode >= COLLECT_PRIMARY_HOOKED &&
-		  fe->mode != COLLECT_PRIMARY_FOLLOWED_NOINPLACE);
+	tight &= (fe->mode >= Z_EROFS_PCLUSTER_HOOKED &&
+		  fe->mode != Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE);
 
 	cur = end - min_t(unsigned int, offset + end - map->m_la, end);
 	if (!(map->m_flags & EROFS_MAP_MAPPED)) {
@@ -785,7 +784,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 
 	exclusive = (!cur && (!spiltted || tight));
 	if (cur)
-		tight &= (fe->mode >= COLLECT_PRIMARY_FOLLOWED);
+		tight &= (fe->mode >= Z_EROFS_PCLUSTER_FOLLOWED);
 
 retry:
 	err = z_erofs_attach_page(fe, &((struct z_erofs_bvec) {
-- 
2.24.4



* [PATCH 11/16] erofs: get rid of `z_pagemap_global'
  2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
                   ` (9 preceding siblings ...)
  2022-07-14 13:20 ` [PATCH 10/16] erofs: clean up `enum z_erofs_collectmode' Gao Xiang
@ 2022-07-14 13:20 ` Gao Xiang
  2022-07-14 13:20 ` [PATCH 12/16] erofs: introduce struct z_erofs_decompress_backend Gao Xiang
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:20 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, LKML

In order to introduce multi-reference pclusters for compressed data
deduplication, let's get rid of the global page array for now, since
it would need to be redesigned at that point anyway.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/zdata.c | 28 ++++------------------------
 fs/erofs/zdata.h |  1 -
 2 files changed, 4 insertions(+), 25 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 55bcd6e5ae9a..f24b866bc975 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -291,9 +291,6 @@ struct z_erofs_decompress_frontend {
 	.inode = __i, .owned_head = Z_EROFS_PCLUSTER_TAIL, \
 	.mode = Z_EROFS_PCLUSTER_FOLLOWED, .backmost = true }
 
-static struct page *z_pagemap_global[Z_EROFS_VMAP_GLOBAL_PAGES];
-static DEFINE_MUTEX(z_pagemap_global_lock);
-
 static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe,
 			       enum z_erofs_cache_alloctype type,
 			       struct page **pagepool)
@@ -966,26 +963,11 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
 	mutex_lock(&pcl->lock);
 	nr_pages = pcl->nr_pages;
 
-	if (nr_pages <= Z_EROFS_VMAP_ONSTACK_PAGES) {
+	if (nr_pages <= Z_EROFS_VMAP_ONSTACK_PAGES)
 		pages = pages_onstack;
-	} else if (nr_pages <= Z_EROFS_VMAP_GLOBAL_PAGES &&
-		   mutex_trylock(&z_pagemap_global_lock)) {
-		pages = z_pagemap_global;
-	} else {
-		gfp_t gfp_flags = GFP_KERNEL;
-
-		if (nr_pages > Z_EROFS_VMAP_GLOBAL_PAGES)
-			gfp_flags |= __GFP_NOFAIL;
-
+	else
 		pages = kvmalloc_array(nr_pages, sizeof(struct page *),
-				       gfp_flags);
-
-		/* fallback to global pagemap for the lowmem scenario */
-		if (!pages) {
-			mutex_lock(&z_pagemap_global_lock);
-			pages = z_pagemap_global;
-		}
-	}
+				       GFP_KERNEL | __GFP_NOFAIL);
 
 	for (i = 0; i < nr_pages; ++i)
 		pages[i] = NULL;
@@ -1063,9 +1045,7 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
 		z_erofs_onlinepage_endio(page);
 	}
 
-	if (pages == z_pagemap_global)
-		mutex_unlock(&z_pagemap_global_lock);
-	else if (pages != pages_onstack)
+	if (pages != pages_onstack)
 		kvfree(pages);
 
 	pcl->nr_pages = 0;
diff --git a/fs/erofs/zdata.h b/fs/erofs/zdata.h
index 852da31e2e91..5964c942799e 100644
--- a/fs/erofs/zdata.h
+++ b/fs/erofs/zdata.h
@@ -175,6 +175,5 @@ static inline void z_erofs_onlinepage_endio(struct page *page)
 
 #define Z_EROFS_VMAP_ONSTACK_PAGES	\
 	min_t(unsigned int, THREAD_SIZE / 8 / sizeof(struct page *), 96U)
-#define Z_EROFS_VMAP_GLOBAL_PAGES	2048
 
 #endif
-- 
2.24.4



* [PATCH 12/16] erofs: introduce struct z_erofs_decompress_backend
  2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
                   ` (10 preceding siblings ...)
  2022-07-14 13:20 ` [PATCH 11/16] erofs: get rid of `z_pagemap_global' Gao Xiang
@ 2022-07-14 13:20 ` Gao Xiang
  2022-07-14 13:20 ` [PATCH 13/16] erofs: try to leave (de)compressed_pages on stack if possible Gao Xiang
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:20 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, LKML

Let's introduce struct z_erofs_decompress_backend in order to pass
the decompression backend context between helper functions more
easily and to avoid too many arguments.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/zdata.c | 142 +++++++++++++++++++++++++----------------------
 fs/erofs/zdata.h |   3 +-
 2 files changed, 76 insertions(+), 69 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index f24b866bc975..7aea6bb1e018 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -847,9 +847,22 @@ static bool z_erofs_page_is_invalidated(struct page *page)
 	return !page->mapping && !z_erofs_is_shortlived_page(page);
 }
 
-static int z_erofs_parse_out_bvecs(struct z_erofs_pcluster *pcl,
-				   struct page **pages, struct page **pagepool)
+struct z_erofs_decompress_backend {
+	struct page *onstack_pages[Z_EROFS_ONSTACK_PAGES];
+	struct super_block *sb;
+	struct z_erofs_pcluster *pcl;
+
+	/* pages with the longest decompressed length for deduplication */
+	struct page **decompressed_pages;
+	/* pages to keep the compressed data */
+	struct page **compressed_pages;
+
+	struct page **pagepool;
+};
+
+static int z_erofs_parse_out_bvecs(struct z_erofs_decompress_backend *be)
 {
+	struct z_erofs_pcluster *pcl = be->pcl;
 	struct z_erofs_bvec_iter biter;
 	struct page *old_bvpage;
 	int i, err = 0;
@@ -857,39 +870,39 @@ static int z_erofs_parse_out_bvecs(struct z_erofs_pcluster *pcl,
 	z_erofs_bvec_iter_begin(&biter, &pcl->bvset, Z_EROFS_INLINE_BVECS, 0);
 	for (i = 0; i < pcl->vcnt; ++i) {
 		struct z_erofs_bvec bvec;
-		unsigned int pagenr;
+		unsigned int pgnr;
 
 		z_erofs_bvec_dequeue(&biter, &bvec, &old_bvpage);
 
 		if (old_bvpage)
-			z_erofs_put_shortlivedpage(pagepool, old_bvpage);
+			z_erofs_put_shortlivedpage(be->pagepool, old_bvpage);
 
-		pagenr = (bvec.offset + pcl->pageofs_out) >> PAGE_SHIFT;
-		DBG_BUGON(pagenr >= pcl->nr_pages);
+		pgnr = (bvec.offset + pcl->pageofs_out) >> PAGE_SHIFT;
+		DBG_BUGON(pgnr >= pcl->nr_pages);
 		DBG_BUGON(z_erofs_page_is_invalidated(bvec.page));
 		/*
 		 * currently EROFS doesn't support multiref(dedup),
 		 * so here erroring out one multiref page.
 		 */
-		if (pages[pagenr]) {
+		if (be->decompressed_pages[pgnr]) {
 			DBG_BUGON(1);
-			z_erofs_page_mark_eio(pages[pagenr]);
-			z_erofs_onlinepage_endio(pages[pagenr]);
+			z_erofs_page_mark_eio(be->decompressed_pages[pgnr]);
+			z_erofs_onlinepage_endio(be->decompressed_pages[pgnr]);
 			err = -EFSCORRUPTED;
 		}
-		pages[pagenr] = bvec.page;
+		be->decompressed_pages[pgnr] = bvec.page;
 	}
 
 	old_bvpage = z_erofs_bvec_iter_end(&biter);
 	if (old_bvpage)
-		z_erofs_put_shortlivedpage(pagepool, old_bvpage);
+		z_erofs_put_shortlivedpage(be->pagepool, old_bvpage);
 	return err;
 }
 
-static struct page **z_erofs_parse_in_bvecs(struct erofs_sb_info *sbi,
-			struct z_erofs_pcluster *pcl, struct page **pages,
-			struct page **pagepool, bool *overlapped)
+static int z_erofs_parse_in_bvecs(struct z_erofs_decompress_backend *be,
+				  bool *overlapped)
 {
+	struct z_erofs_pcluster *pcl = be->pcl;
 	unsigned int pclusterpages = z_erofs_pclusterpages(pcl);
 	struct page **compressed_pages;
 	int i, err = 0;
@@ -919,7 +932,7 @@ static struct page **z_erofs_parse_in_bvecs(struct erofs_sb_info *sbi,
 
 		DBG_BUGON(z_erofs_page_is_invalidated(page));
 		if (!z_erofs_is_shortlived_page(page)) {
-			if (erofs_page_is_managed(sbi, page)) {
+			if (erofs_page_is_managed(EROFS_SB(be->sb), page)) {
 				if (!PageUptodate(page))
 					err = -EIO;
 				continue;
@@ -927,59 +940,55 @@ static struct page **z_erofs_parse_in_bvecs(struct erofs_sb_info *sbi,
 
 			pgnr = (bvec->offset + pcl->pageofs_out) >> PAGE_SHIFT;
 			DBG_BUGON(pgnr >= pcl->nr_pages);
-			if (pages[pgnr]) {
+			if (be->decompressed_pages[pgnr]) {
 				DBG_BUGON(1);
-				z_erofs_page_mark_eio(pages[pgnr]);
-				z_erofs_onlinepage_endio(pages[pgnr]);
+				z_erofs_page_mark_eio(
+						be->decompressed_pages[pgnr]);
+				z_erofs_onlinepage_endio(
+						be->decompressed_pages[pgnr]);
 				err = -EFSCORRUPTED;
 			}
-			pages[pgnr] = page;
+			be->decompressed_pages[pgnr] = page;
 			*overlapped = true;
 		}
 	}
 
 	if (err) {
 		kfree(compressed_pages);
-		return ERR_PTR(err);
+		return err;
 	}
-	return compressed_pages;
+	be->compressed_pages = compressed_pages;
+	return 0;
 }
 
-static int z_erofs_decompress_pcluster(struct super_block *sb,
-				       struct z_erofs_pcluster *pcl,
-				       struct page **pagepool, int err)
+static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
+				       int err)
 {
-	struct erofs_sb_info *const sbi = EROFS_SB(sb);
+	struct erofs_sb_info *const sbi = EROFS_SB(be->sb);
+	struct z_erofs_pcluster *pcl = be->pcl;
 	unsigned int pclusterpages = z_erofs_pclusterpages(pcl);
-	unsigned int i, inputsize, outputsize, llen, nr_pages;
-	struct page *pages_onstack[Z_EROFS_VMAP_ONSTACK_PAGES];
-	struct page **pages, **compressed_pages, *page;
-
+	unsigned int i, inputsize, outputsize, llen, nr_pages, err2;
+	struct page *page;
 	bool overlapped, partial;
 
-	might_sleep();
 	DBG_BUGON(!READ_ONCE(pcl->nr_pages));
-
 	mutex_lock(&pcl->lock);
 	nr_pages = pcl->nr_pages;
 
-	if (nr_pages <= Z_EROFS_VMAP_ONSTACK_PAGES)
-		pages = pages_onstack;
-	else
-		pages = kvmalloc_array(nr_pages, sizeof(struct page *),
-				       GFP_KERNEL | __GFP_NOFAIL);
-
-	for (i = 0; i < nr_pages; ++i)
-		pages[i] = NULL;
-
-	err = z_erofs_parse_out_bvecs(pcl, pages, pagepool);
-	compressed_pages = z_erofs_parse_in_bvecs(sbi, pcl, pages,
-						pagepool, &overlapped);
-	if (IS_ERR(compressed_pages)) {
-		err = PTR_ERR(compressed_pages);
-		compressed_pages = NULL;
+	if (nr_pages <= Z_EROFS_ONSTACK_PAGES) {
+		be->decompressed_pages = be->onstack_pages;
+		memset(be->decompressed_pages, 0,
+		       sizeof(struct page *) * nr_pages);
+	} else {
+		be->decompressed_pages =
+			kvcalloc(nr_pages, sizeof(struct page *),
+				 GFP_KERNEL | __GFP_NOFAIL);
 	}
 
+	err = z_erofs_parse_out_bvecs(be);
+	err2 = z_erofs_parse_in_bvecs(be, &overlapped);
+	if (err2)
+		err = err2;
 	if (err)
 		goto out;
 
@@ -998,9 +1007,9 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
 		inputsize = pclusterpages * PAGE_SIZE;
 
 	err = z_erofs_decompress(&(struct z_erofs_decompress_req) {
-					.sb = sb,
-					.in = compressed_pages,
-					.out = pages,
+					.sb = be->sb,
+					.in = be->compressed_pages,
+					.out = be->decompressed_pages,
 					.pageofs_in = pcl->pageofs_in,
 					.pageofs_out = pcl->pageofs_out,
 					.inputsize = inputsize,
@@ -1008,7 +1017,7 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
 					.alg = pcl->algorithmformat,
 					.inplace_io = overlapped,
 					.partial_decoding = partial
-				 }, pagepool);
+				 }, be->pagepool);
 
 out:
 	/* must handle all compressed pages before actual file pages */
@@ -1024,29 +1033,29 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
 				continue;
 
 			/* recycle all individual short-lived pages */
-			(void)z_erofs_put_shortlivedpage(pagepool, page);
+			(void)z_erofs_put_shortlivedpage(be->pagepool, page);
 			WRITE_ONCE(pcl->compressed_bvecs[i].page, NULL);
 		}
 	}
-	kfree(compressed_pages);
+	kfree(be->compressed_pages);
 
 	for (i = 0; i < nr_pages; ++i) {
-		page = pages[i];
+		page = be->decompressed_pages[i];
 		if (!page)
 			continue;
 
 		DBG_BUGON(z_erofs_page_is_invalidated(page));
 
 		/* recycle all individual short-lived pages */
-		if (z_erofs_put_shortlivedpage(pagepool, page))
+		if (z_erofs_put_shortlivedpage(be->pagepool, page))
 			continue;
 		if (err)
 			z_erofs_page_mark_eio(page);
 		z_erofs_onlinepage_endio(page);
 	}
 
-	if (pages != pages_onstack)
-		kvfree(pages);
+	if (be->decompressed_pages != be->onstack_pages)
+		kvfree(be->decompressed_pages);
 
 	pcl->nr_pages = 0;
 	pcl->bvset.nextpage = NULL;
@@ -1061,23 +1070,23 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
 static void z_erofs_decompress_queue(const struct z_erofs_decompressqueue *io,
 				     struct page **pagepool)
 {
+	struct z_erofs_decompress_backend be = {
+		.sb = io->sb,
+		.pagepool = pagepool,
+	};
 	z_erofs_next_pcluster_t owned = io->head;
 
 	while (owned != Z_EROFS_PCLUSTER_TAIL_CLOSED) {
-		struct z_erofs_pcluster *pcl;
-
-		/* no possible that 'owned' equals Z_EROFS_WORK_TPTR_TAIL */
+		/* impossible that 'owned' equals Z_EROFS_WORK_TPTR_TAIL */
 		DBG_BUGON(owned == Z_EROFS_PCLUSTER_TAIL);
-
-		/* no possible that 'owned' equals NULL */
+		/* impossible that 'owned' equals Z_EROFS_PCLUSTER_NIL */
 		DBG_BUGON(owned == Z_EROFS_PCLUSTER_NIL);
 
-		pcl = container_of(owned, struct z_erofs_pcluster, next);
-		owned = READ_ONCE(pcl->next);
+		be.pcl = container_of(owned, struct z_erofs_pcluster, next);
+		owned = READ_ONCE(be.pcl->next);
 
-		z_erofs_decompress_pcluster(io->sb, pcl, pagepool,
-					    io->eio ? -EIO : 0);
-		erofs_workgroup_put(&pcl->obj);
+		z_erofs_decompress_pcluster(&be, io->eio ? -EIO : 0);
+		erofs_workgroup_put(&be.pcl->obj);
 	}
 }
 
@@ -1103,7 +1112,6 @@ static void z_erofs_decompress_kickoff(struct z_erofs_decompressqueue *io,
 	if (sync) {
 		if (!atomic_add_return(bios, &io->pending_bios))
 			complete(&io->u.done);
-
 		return;
 	}
 
diff --git a/fs/erofs/zdata.h b/fs/erofs/zdata.h
index 5964c942799e..ec09ca035fbb 100644
--- a/fs/erofs/zdata.h
+++ b/fs/erofs/zdata.h
@@ -173,7 +173,6 @@ static inline void z_erofs_onlinepage_endio(struct page *page)
 	}
 }
 
-#define Z_EROFS_VMAP_ONSTACK_PAGES	\
-	min_t(unsigned int, THREAD_SIZE / 8 / sizeof(struct page *), 96U)
+#define Z_EROFS_ONSTACK_PAGES		32
 
 #endif
-- 
2.24.4



* [PATCH 13/16] erofs: try to leave (de)compressed_pages on stack if possible
  2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
                   ` (11 preceding siblings ...)
  2022-07-14 13:20 ` [PATCH 12/16] erofs: introduce struct z_erofs_decompress_backend Gao Xiang
@ 2022-07-14 13:20 ` Gao Xiang
  2022-07-14 13:20 ` [PATCH 14/16] erofs: introduce z_erofs_do_decompressed_bvec() Gao Xiang
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:20 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, LKML

In most cases, small pclusters can be decompressed with page
arrays kept on the stack.

Try to keep both (de)compressed_pages on the stack if possible, as before.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/zdata.c | 34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 7aea6bb1e018..4093d8a4ce93 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -858,6 +858,7 @@ struct z_erofs_decompress_backend {
 	struct page **compressed_pages;
 
 	struct page **pagepool;
+	unsigned int onstack_used;
 };
 
 static int z_erofs_parse_out_bvecs(struct z_erofs_decompress_backend *be)
@@ -904,14 +905,9 @@ static int z_erofs_parse_in_bvecs(struct z_erofs_decompress_backend *be,
 {
 	struct z_erofs_pcluster *pcl = be->pcl;
 	unsigned int pclusterpages = z_erofs_pclusterpages(pcl);
-	struct page **compressed_pages;
 	int i, err = 0;
 
-	/* XXX: will have a better approach in the following commits */
-	compressed_pages = kmalloc_array(pclusterpages, sizeof(struct page *),
-					 GFP_KERNEL | __GFP_NOFAIL);
 	*overlapped = false;
-
 	for (i = 0; i < pclusterpages; ++i) {
 		struct z_erofs_bvec *bvec = &pcl->compressed_bvecs[i];
 		struct page *page = bvec->page;
@@ -922,7 +918,7 @@ static int z_erofs_parse_in_bvecs(struct z_erofs_decompress_backend *be,
 			DBG_BUGON(1);
 			continue;
 		}
-		compressed_pages[i] = page;
+		be->compressed_pages[i] = page;
 
 		if (z_erofs_is_inline_pcluster(pcl)) {
 			if (!PageUptodate(page))
@@ -953,11 +949,8 @@ static int z_erofs_parse_in_bvecs(struct z_erofs_decompress_backend *be,
 		}
 	}
 
-	if (err) {
-		kfree(compressed_pages);
+	if (err)
 		return err;
-	}
-	be->compressed_pages = compressed_pages;
 	return 0;
 }
 
@@ -975,15 +968,28 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 	mutex_lock(&pcl->lock);
 	nr_pages = pcl->nr_pages;
 
+	/* allocate (de)compressed page arrays if cannot be kept on stack */
+	be->decompressed_pages = NULL;
+	be->compressed_pages = NULL;
+	be->onstack_used = 0;
 	if (nr_pages <= Z_EROFS_ONSTACK_PAGES) {
 		be->decompressed_pages = be->onstack_pages;
+		be->onstack_used = nr_pages;
 		memset(be->decompressed_pages, 0,
 		       sizeof(struct page *) * nr_pages);
-	} else {
+	}
+
+	if (pclusterpages + be->onstack_used <= Z_EROFS_ONSTACK_PAGES)
+		be->compressed_pages = be->onstack_pages + be->onstack_used;
+
+	if (!be->decompressed_pages)
 		be->decompressed_pages =
 			kvcalloc(nr_pages, sizeof(struct page *),
 				 GFP_KERNEL | __GFP_NOFAIL);
-	}
+	if (!be->compressed_pages)
+		be->compressed_pages =
+			kvcalloc(pclusterpages, sizeof(struct page *),
+				 GFP_KERNEL | __GFP_NOFAIL);
 
 	err = z_erofs_parse_out_bvecs(be);
 	err2 = z_erofs_parse_in_bvecs(be, &overlapped);
@@ -1037,7 +1043,9 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 			WRITE_ONCE(pcl->compressed_bvecs[i].page, NULL);
 		}
 	}
-	kfree(be->compressed_pages);
+	if (be->compressed_pages < be->onstack_pages ||
+	    be->compressed_pages >= be->onstack_pages + Z_EROFS_ONSTACK_PAGES)
+		kvfree(be->compressed_pages);
 
 	for (i = 0; i < nr_pages; ++i) {
 		page = be->decompressed_pages[i];
-- 
2.24.4



* [PATCH 14/16] erofs: introduce z_erofs_do_decompressed_bvec()
  2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
                   ` (12 preceding siblings ...)
  2022-07-14 13:20 ` [PATCH 13/16] erofs: try to leave (de)compressed_pages on stack if possible Gao Xiang
@ 2022-07-14 13:20 ` Gao Xiang
  2022-07-14 13:20 ` [PATCH 15/16] erofs: record the longest decompressed size in this round Gao Xiang
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:20 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, LKML

Both out_bvecs and in_bvecs share common logic for handling
decompressed buffers, so let's factor it out into a helper.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/zdata.c | 49 ++++++++++++++++++++++--------------------------
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 4093d8a4ce93..391755dafecd 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -861,6 +861,26 @@ struct z_erofs_decompress_backend {
 	unsigned int onstack_used;
 };
 
+static int z_erofs_do_decompressed_bvec(struct z_erofs_decompress_backend *be,
+					struct z_erofs_bvec *bvec)
+{
+	unsigned int pgnr = (bvec->offset + be->pcl->pageofs_out) >> PAGE_SHIFT;
+	struct page *oldpage;
+
+	DBG_BUGON(pgnr >= be->pcl->nr_pages);
+	oldpage = be->decompressed_pages[pgnr];
+	be->decompressed_pages[pgnr] = bvec->page;
+
+	/* error out if one pcluster is refenenced multiple times. */
+	if (oldpage) {
+		DBG_BUGON(1);
+		z_erofs_page_mark_eio(oldpage);
+		z_erofs_onlinepage_endio(oldpage);
+		return -EFSCORRUPTED;
+	}
+	return 0;
+}
+
 static int z_erofs_parse_out_bvecs(struct z_erofs_decompress_backend *be)
 {
 	struct z_erofs_pcluster *pcl = be->pcl;
@@ -871,27 +891,14 @@ static int z_erofs_parse_out_bvecs(struct z_erofs_decompress_backend *be)
 	z_erofs_bvec_iter_begin(&biter, &pcl->bvset, Z_EROFS_INLINE_BVECS, 0);
 	for (i = 0; i < pcl->vcnt; ++i) {
 		struct z_erofs_bvec bvec;
-		unsigned int pgnr;
 
 		z_erofs_bvec_dequeue(&biter, &bvec, &old_bvpage);
 
 		if (old_bvpage)
 			z_erofs_put_shortlivedpage(be->pagepool, old_bvpage);
 
-		pgnr = (bvec.offset + pcl->pageofs_out) >> PAGE_SHIFT;
-		DBG_BUGON(pgnr >= pcl->nr_pages);
 		DBG_BUGON(z_erofs_page_is_invalidated(bvec.page));
-		/*
-		 * currently EROFS doesn't support multiref(dedup),
-		 * so here erroring out one multiref page.
-		 */
-		if (be->decompressed_pages[pgnr]) {
-			DBG_BUGON(1);
-			z_erofs_page_mark_eio(be->decompressed_pages[pgnr]);
-			z_erofs_onlinepage_endio(be->decompressed_pages[pgnr]);
-			err = -EFSCORRUPTED;
-		}
-		be->decompressed_pages[pgnr] = bvec.page;
+		err = z_erofs_do_decompressed_bvec(be, &bvec);
 	}
 
 	old_bvpage = z_erofs_bvec_iter_end(&biter);
@@ -911,7 +918,6 @@ static int z_erofs_parse_in_bvecs(struct z_erofs_decompress_backend *be,
 	for (i = 0; i < pclusterpages; ++i) {
 		struct z_erofs_bvec *bvec = &pcl->compressed_bvecs[i];
 		struct page *page = bvec->page;
-		unsigned int pgnr;
 
 		/* compressed pages ought to be present before decompressing */
 		if (!page) {
@@ -933,18 +939,7 @@ static int z_erofs_parse_in_bvecs(struct z_erofs_decompress_backend *be,
 					err = -EIO;
 				continue;
 			}
-
-			pgnr = (bvec->offset + pcl->pageofs_out) >> PAGE_SHIFT;
-			DBG_BUGON(pgnr >= pcl->nr_pages);
-			if (be->decompressed_pages[pgnr]) {
-				DBG_BUGON(1);
-				z_erofs_page_mark_eio(
-						be->decompressed_pages[pgnr]);
-				z_erofs_onlinepage_endio(
-						be->decompressed_pages[pgnr]);
-				err = -EFSCORRUPTED;
-			}
-			be->decompressed_pages[pgnr] = page;
+			err = z_erofs_do_decompressed_bvec(be, bvec);
 			*overlapped = true;
 		}
 	}
-- 
2.24.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 15/16] erofs: record the longest decompressed size in this round
  2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
                   ` (13 preceding siblings ...)
  2022-07-14 13:20 ` [PATCH 14/16] erofs: introduce z_erofs_do_decompressed_bvec() Gao Xiang
@ 2022-07-14 13:20 ` Gao Xiang
  2022-07-14 13:20 ` [PATCH 16/16] erofs: introduce multi-reference pclusters (fully-referenced) Gao Xiang
  2022-07-14 13:38 ` [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
  16 siblings, 0 replies; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:20 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, LKML

Currently, `pcl->length' records the longest decompressed length
as long as the pcluster itself isn't reclaimed.  However, such a
number is unneeded in the general case since it doesn't indicate
the exact decompressed size in this round.

Let's record the decompressed size for this round instead, so that
`pcl->nr_pages' can be dropped completely and pageofs_out is also
kept in sync with `pcl->length'.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/zdata.c | 76 +++++++++++++++++-------------------------------
 fs/erofs/zdata.h | 11 +++----
 2 files changed, 30 insertions(+), 57 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 391755dafecd..8dcfc2a9704e 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -482,7 +482,6 @@ static int z_erofs_lookup_pcluster(struct z_erofs_decompress_frontend *fe)
 {
 	struct erofs_map_blocks *map = &fe->map;
 	struct z_erofs_pcluster *pcl = fe->pcl;
-	unsigned int length;
 
 	/* to avoid unexpected loop formed by corrupted images */
 	if (fe->owned_head == &pcl->next || pcl == fe->tailpcl) {
@@ -495,24 +494,6 @@ static int z_erofs_lookup_pcluster(struct z_erofs_decompress_frontend *fe)
 		return -EFSCORRUPTED;
 	}
 
-	length = READ_ONCE(pcl->length);
-	if (length & Z_EROFS_PCLUSTER_FULL_LENGTH) {
-		if ((map->m_llen << Z_EROFS_PCLUSTER_LENGTH_BIT) > length) {
-			DBG_BUGON(1);
-			return -EFSCORRUPTED;
-		}
-	} else {
-		unsigned int llen = map->m_llen << Z_EROFS_PCLUSTER_LENGTH_BIT;
-
-		if (map->m_flags & EROFS_MAP_FULL_MAPPED)
-			llen |= Z_EROFS_PCLUSTER_FULL_LENGTH;
-
-		while (llen > length &&
-		       length != cmpxchg_relaxed(&pcl->length, length, llen)) {
-			cpu_relax();
-			length = READ_ONCE(pcl->length);
-		}
-	}
 	mutex_lock(&pcl->lock);
 	/* used to check tail merging loop due to corrupted images */
 	if (fe->owned_head == Z_EROFS_PCLUSTER_TAIL)
@@ -543,9 +524,8 @@ static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe)
 
 	atomic_set(&pcl->obj.refcount, 1);
 	pcl->algorithmformat = map->m_algorithmformat;
-	pcl->length = (map->m_llen << Z_EROFS_PCLUSTER_LENGTH_BIT) |
-		(map->m_flags & EROFS_MAP_FULL_MAPPED ?
-			Z_EROFS_PCLUSTER_FULL_LENGTH : 0);
+	pcl->length = 0;
+	pcl->partial = true;
 
 	/* new pclusters should be claimed as type 1, primary and followed */
 	pcl->next = fe->owned_head;
@@ -703,7 +683,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 	bool tight = true, exclusive;
 
 	enum z_erofs_cache_alloctype cache_strategy;
-	unsigned int cur, end, spiltted, index;
+	unsigned int cur, end, spiltted;
 	int err = 0;
 
 	/* register locked file pages as online pages in pack */
@@ -806,12 +786,17 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 	/* bump up the number of spiltted parts of a page */
 	++spiltted;
 
-	/* also update nr_pages */
-	index = page->index - (map->m_la >> PAGE_SHIFT);
-	fe->pcl->nr_pages = max_t(pgoff_t, fe->pcl->nr_pages, index + 1);
+	if (fe->pcl->length < offset + end - map->m_la) {
+		fe->pcl->length = offset + end - map->m_la;
+		fe->pcl->pageofs_out = map->m_la & ~PAGE_MASK;
+	}
+	if ((map->m_flags & EROFS_MAP_FULL_MAPPED) &&
+	    fe->pcl->length == map->m_llen)
+		fe->pcl->partial = false;
 next_part:
-	/* can be used for verification */
+	/* shorten the remaining extent to update progress */
 	map->m_llen = offset + cur - map->m_la;
+	map->m_flags &= ~EROFS_MAP_FULL_MAPPED;
 
 	end = cur;
 	if (end > 0)
@@ -858,7 +843,7 @@ struct z_erofs_decompress_backend {
 	struct page **compressed_pages;
 
 	struct page **pagepool;
-	unsigned int onstack_used;
+	unsigned int onstack_used, nr_pages;
 };
 
 static int z_erofs_do_decompressed_bvec(struct z_erofs_decompress_backend *be,
@@ -867,7 +852,7 @@ static int z_erofs_do_decompressed_bvec(struct z_erofs_decompress_backend *be,
 	unsigned int pgnr = (bvec->offset + be->pcl->pageofs_out) >> PAGE_SHIFT;
 	struct page *oldpage;
 
-	DBG_BUGON(pgnr >= be->pcl->nr_pages);
+	DBG_BUGON(pgnr >= be->nr_pages);
 	oldpage = be->decompressed_pages[pgnr];
 	be->decompressed_pages[pgnr] = bvec->page;
 
@@ -955,23 +940,22 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 	struct erofs_sb_info *const sbi = EROFS_SB(be->sb);
 	struct z_erofs_pcluster *pcl = be->pcl;
 	unsigned int pclusterpages = z_erofs_pclusterpages(pcl);
-	unsigned int i, inputsize, outputsize, llen, nr_pages, err2;
+	unsigned int i, inputsize, err2;
 	struct page *page;
-	bool overlapped, partial;
+	bool overlapped;
 
-	DBG_BUGON(!READ_ONCE(pcl->nr_pages));
 	mutex_lock(&pcl->lock);
-	nr_pages = pcl->nr_pages;
+	be->nr_pages = PAGE_ALIGN(pcl->length + pcl->pageofs_out) >> PAGE_SHIFT;
 
 	/* allocate (de)compressed page arrays if cannot be kept on stack */
 	be->decompressed_pages = NULL;
 	be->compressed_pages = NULL;
 	be->onstack_used = 0;
-	if (nr_pages <= Z_EROFS_ONSTACK_PAGES) {
+	if (be->nr_pages <= Z_EROFS_ONSTACK_PAGES) {
 		be->decompressed_pages = be->onstack_pages;
-		be->onstack_used = nr_pages;
+		be->onstack_used = be->nr_pages;
 		memset(be->decompressed_pages, 0,
-		       sizeof(struct page *) * nr_pages);
+		       sizeof(struct page *) * be->nr_pages);
 	}
 
 	if (pclusterpages + be->onstack_used <= Z_EROFS_ONSTACK_PAGES)
@@ -979,7 +963,7 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 
 	if (!be->decompressed_pages)
 		be->decompressed_pages =
-			kvcalloc(nr_pages, sizeof(struct page *),
+			kvcalloc(be->nr_pages, sizeof(struct page *),
 				 GFP_KERNEL | __GFP_NOFAIL);
 	if (!be->compressed_pages)
 		be->compressed_pages =
@@ -993,15 +977,6 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 	if (err)
 		goto out;
 
-	llen = pcl->length >> Z_EROFS_PCLUSTER_LENGTH_BIT;
-	if (nr_pages << PAGE_SHIFT >= pcl->pageofs_out + llen) {
-		outputsize = llen;
-		partial = !(pcl->length & Z_EROFS_PCLUSTER_FULL_LENGTH);
-	} else {
-		outputsize = (nr_pages << PAGE_SHIFT) - pcl->pageofs_out;
-		partial = true;
-	}
-
 	if (z_erofs_is_inline_pcluster(pcl))
 		inputsize = pcl->tailpacking_size;
 	else
@@ -1014,10 +989,10 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 					.pageofs_in = pcl->pageofs_in,
 					.pageofs_out = pcl->pageofs_out,
 					.inputsize = inputsize,
-					.outputsize = outputsize,
+					.outputsize = pcl->length,
 					.alg = pcl->algorithmformat,
 					.inplace_io = overlapped,
-					.partial_decoding = partial
+					.partial_decoding = pcl->partial,
 				 }, be->pagepool);
 
 out:
@@ -1042,7 +1017,7 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 	    be->compressed_pages >= be->onstack_pages + Z_EROFS_ONSTACK_PAGES)
 		kvfree(be->compressed_pages);
 
-	for (i = 0; i < nr_pages; ++i) {
+	for (i = 0; i < be->nr_pages; ++i) {
 		page = be->decompressed_pages[i];
 		if (!page)
 			continue;
@@ -1060,7 +1035,8 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 	if (be->decompressed_pages != be->onstack_pages)
 		kvfree(be->decompressed_pages);
 
-	pcl->nr_pages = 0;
+	pcl->length = 0;
+	pcl->partial = true;
 	pcl->bvset.nextpage = NULL;
 	pcl->vcnt = 0;
 
diff --git a/fs/erofs/zdata.h b/fs/erofs/zdata.h
index ec09ca035fbb..a7fd44d21d9e 100644
--- a/fs/erofs/zdata.h
+++ b/fs/erofs/zdata.h
@@ -12,9 +12,6 @@
 #define Z_EROFS_PCLUSTER_MAX_PAGES	(Z_EROFS_PCLUSTER_MAX_SIZE / PAGE_SIZE)
 #define Z_EROFS_INLINE_BVECS		2
 
-#define Z_EROFS_PCLUSTER_FULL_LENGTH    0x00000001
-#define Z_EROFS_PCLUSTER_LENGTH_BIT     1
-
 /*
  * let's leave a type here in case of introducing
  * another tagged pointer later.
@@ -53,7 +50,7 @@ struct z_erofs_pcluster {
 	/* A: point to next chained pcluster or TAILs */
 	z_erofs_next_pcluster_t next;
 
-	/* A: lower limit of decompressed length and if full length or not */
+	/* L: the maximum decompression size of this round */
 	unsigned int length;
 
 	/* L: total number of bvecs */
@@ -65,9 +62,6 @@ struct z_erofs_pcluster {
 	/* I: page offset of inline compressed data */
 	unsigned short pageofs_in;
 
-	/* L: maximum relative page index in bvecs */
-	unsigned short nr_pages;
-
 	union {
 		/* L: inline a certain number of bvec for bootstrap */
 		struct z_erofs_bvset_inline bvset;
@@ -87,6 +81,9 @@ struct z_erofs_pcluster {
 	/* I: compression algorithm format */
 	unsigned char algorithmformat;
 
+	/* L: whether partial decompression or not */
+	bool partial;
+
 	/* A: compressed bvecs (can be cached or inplaced pages) */
 	struct z_erofs_bvec compressed_bvecs[];
 };
-- 
2.24.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 16/16] erofs: introduce multi-reference pclusters (fully-referenced)
  2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
                   ` (14 preceding siblings ...)
  2022-07-14 13:20 ` [PATCH 15/16] erofs: record the longest decompressed size in this round Gao Xiang
@ 2022-07-14 13:20 ` Gao Xiang
  2022-07-14 13:38 ` [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
  16 siblings, 0 replies; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:20 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: Gao Xiang, LKML

Let's introduce multi-reference pclusters at runtime. In detail,
if one pcluster is requested by multiple extents at almost the same
time (even ones belonging to different files), the longest extent
will be decompressed as the representative and the other extents
are copied from the longest one.

After this patch, fully-referenced extents can be correctly handled
and the full decoding check needs to be bypassed for
partially-referenced extents.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/compress.h     |   2 +-
 fs/erofs/decompressor.c |   2 +-
 fs/erofs/zdata.c        | 120 +++++++++++++++++++++++++++-------------
 fs/erofs/zdata.h        |   3 +
 4 files changed, 88 insertions(+), 39 deletions(-)

diff --git a/fs/erofs/compress.h b/fs/erofs/compress.h
index 19e6c56a9f47..26fa170090b8 100644
--- a/fs/erofs/compress.h
+++ b/fs/erofs/compress.h
@@ -17,7 +17,7 @@ struct z_erofs_decompress_req {
 
 	/* indicate the algorithm will be used for decompression */
 	unsigned int alg;
-	bool inplace_io, partial_decoding;
+	bool inplace_io, partial_decoding, fillgaps;
 };
 
 struct z_erofs_decompressor {
diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index 6dca1900c733..91b9bff10198 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -83,7 +83,7 @@ static int z_erofs_lz4_prepare_dstpages(struct z_erofs_lz4_decompress_ctx *ctx,
 			j = 0;
 
 		/* 'valid' bounced can only be tested after a complete round */
-		if (test_bit(j, bounced)) {
+		if (!rq->fillgaps && test_bit(j, bounced)) {
 			DBG_BUGON(i < lz4_max_distance_pages);
 			DBG_BUGON(top >= lz4_max_distance_pages);
 			availables[top++] = rq->out[i - lz4_max_distance_pages];
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 8dcfc2a9704e..601cfcb07c50 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -467,7 +467,8 @@ static void z_erofs_try_to_claim_pcluster(struct z_erofs_decompress_frontend *f)
 	 * type 2, link to the end of an existing open chain, be careful
 	 * that its submission is controlled by the original attached chain.
 	 */
-	if (cmpxchg(&pcl->next, Z_EROFS_PCLUSTER_TAIL,
+	if (*owned_head != &pcl->next && pcl != f->tailpcl &&
+	    cmpxchg(&pcl->next, Z_EROFS_PCLUSTER_TAIL,
 		    *owned_head) == Z_EROFS_PCLUSTER_TAIL) {
 		*owned_head = Z_EROFS_PCLUSTER_TAIL;
 		f->mode = Z_EROFS_PCLUSTER_HOOKED;
@@ -480,20 +481,8 @@ static void z_erofs_try_to_claim_pcluster(struct z_erofs_decompress_frontend *f)
 
 static int z_erofs_lookup_pcluster(struct z_erofs_decompress_frontend *fe)
 {
-	struct erofs_map_blocks *map = &fe->map;
 	struct z_erofs_pcluster *pcl = fe->pcl;
 
-	/* to avoid unexpected loop formed by corrupted images */
-	if (fe->owned_head == &pcl->next || pcl == fe->tailpcl) {
-		DBG_BUGON(1);
-		return -EFSCORRUPTED;
-	}
-
-	if (pcl->pageofs_out != (map->m_la & ~PAGE_MASK)) {
-		DBG_BUGON(1);
-		return -EFSCORRUPTED;
-	}
-
 	mutex_lock(&pcl->lock);
 	/* used to check tail merging loop due to corrupted images */
 	if (fe->owned_head == Z_EROFS_PCLUSTER_TAIL)
@@ -785,6 +774,8 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 	z_erofs_onlinepage_split(page);
 	/* bump up the number of spiltted parts of a page */
 	++spiltted;
+	fe->pcl->multibases =
+		(fe->pcl->pageofs_out != (map->m_la & ~PAGE_MASK));
 
 	if (fe->pcl->length < offset + end - map->m_la) {
 		fe->pcl->length = offset + end - map->m_la;
@@ -842,36 +833,90 @@ struct z_erofs_decompress_backend {
 	/* pages to keep the compressed data */
 	struct page **compressed_pages;
 
+	struct list_head decompressed_secondary_bvecs;
 	struct page **pagepool;
 	unsigned int onstack_used, nr_pages;
 };
 
-static int z_erofs_do_decompressed_bvec(struct z_erofs_decompress_backend *be,
-					struct z_erofs_bvec *bvec)
+struct z_erofs_bvec_item {
+	struct z_erofs_bvec bvec;
+	struct list_head list;
+};
+
+static void z_erofs_do_decompressed_bvec(struct z_erofs_decompress_backend *be,
+					 struct z_erofs_bvec *bvec)
 {
-	unsigned int pgnr = (bvec->offset + be->pcl->pageofs_out) >> PAGE_SHIFT;
-	struct page *oldpage;
+	struct z_erofs_bvec_item *item;
 
-	DBG_BUGON(pgnr >= be->nr_pages);
-	oldpage = be->decompressed_pages[pgnr];
-	be->decompressed_pages[pgnr] = bvec->page;
+	if (!((bvec->offset + be->pcl->pageofs_out) & ~PAGE_MASK)) {
+		unsigned int pgnr;
+		struct page *oldpage;
 
-	/* error out if one pcluster is refenenced multiple times. */
-	if (oldpage) {
-		DBG_BUGON(1);
-		z_erofs_page_mark_eio(oldpage);
-		z_erofs_onlinepage_endio(oldpage);
-		return -EFSCORRUPTED;
+		pgnr = (bvec->offset + be->pcl->pageofs_out) >> PAGE_SHIFT;
+		DBG_BUGON(pgnr >= be->nr_pages);
+		oldpage = be->decompressed_pages[pgnr];
+		be->decompressed_pages[pgnr] = bvec->page;
+
+		if (!oldpage)
+			return;
+	}
+
+	/* (cold path) one pcluster is requested multiple times */
+	item = kmalloc(sizeof(*item), GFP_KERNEL | __GFP_NOFAIL);
+	item->bvec = *bvec;
+	list_add(&item->list, &be->decompressed_secondary_bvecs);
+}
+
+static void z_erofs_fill_other_copies(struct z_erofs_decompress_backend *be,
+				      int err)
+{
+	unsigned int off0 = be->pcl->pageofs_out;
+	struct list_head *p, *n;
+
+	list_for_each_safe(p, n, &be->decompressed_secondary_bvecs) {
+		struct z_erofs_bvec_item *bvi;
+		unsigned int end, cur;
+		void *dst, *src;
+
+		bvi = container_of(p, struct z_erofs_bvec_item, list);
+		cur = bvi->bvec.offset < 0 ? -bvi->bvec.offset : 0;
+		end = min_t(unsigned int, be->pcl->length - bvi->bvec.offset,
+			    bvi->bvec.end);
+		dst = kmap_local_page(bvi->bvec.page);
+		while (cur < end) {
+			unsigned int pgnr, scur, len;
+
+			pgnr = (bvi->bvec.offset + cur + off0) >> PAGE_SHIFT;
+			DBG_BUGON(pgnr >= be->nr_pages);
+
+			scur = bvi->bvec.offset + cur -
+					((pgnr << PAGE_SHIFT) - off0);
+			len = min_t(unsigned int, end - cur, PAGE_SIZE - scur);
+			if (!be->decompressed_pages[pgnr]) {
+				err = -EFSCORRUPTED;
+				cur += len;
+				continue;
+			}
+			src = kmap_local_page(be->decompressed_pages[pgnr]);
+			memcpy(dst + cur, src + scur, len);
+			kunmap_local(src);
+			cur += len;
+		}
+		kunmap_local(dst);
+		if (err)
+			z_erofs_page_mark_eio(bvi->bvec.page);
+		z_erofs_onlinepage_endio(bvi->bvec.page);
+		list_del(p);
+		kfree(bvi);
 	}
-	return 0;
 }
 
-static int z_erofs_parse_out_bvecs(struct z_erofs_decompress_backend *be)
+static void z_erofs_parse_out_bvecs(struct z_erofs_decompress_backend *be)
 {
 	struct z_erofs_pcluster *pcl = be->pcl;
 	struct z_erofs_bvec_iter biter;
 	struct page *old_bvpage;
-	int i, err = 0;
+	int i;
 
 	z_erofs_bvec_iter_begin(&biter, &pcl->bvset, Z_EROFS_INLINE_BVECS, 0);
 	for (i = 0; i < pcl->vcnt; ++i) {
@@ -883,13 +928,12 @@ static int z_erofs_parse_out_bvecs(struct z_erofs_decompress_backend *be)
 			z_erofs_put_shortlivedpage(be->pagepool, old_bvpage);
 
 		DBG_BUGON(z_erofs_page_is_invalidated(bvec.page));
-		err = z_erofs_do_decompressed_bvec(be, &bvec);
+		z_erofs_do_decompressed_bvec(be, &bvec);
 	}
 
 	old_bvpage = z_erofs_bvec_iter_end(&biter);
 	if (old_bvpage)
 		z_erofs_put_shortlivedpage(be->pagepool, old_bvpage);
-	return err;
 }
 
 static int z_erofs_parse_in_bvecs(struct z_erofs_decompress_backend *be,
@@ -924,7 +968,7 @@ static int z_erofs_parse_in_bvecs(struct z_erofs_decompress_backend *be,
 					err = -EIO;
 				continue;
 			}
-			err = z_erofs_do_decompressed_bvec(be, bvec);
+			z_erofs_do_decompressed_bvec(be, bvec);
 			*overlapped = true;
 		}
 	}
@@ -940,7 +984,7 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 	struct erofs_sb_info *const sbi = EROFS_SB(be->sb);
 	struct z_erofs_pcluster *pcl = be->pcl;
 	unsigned int pclusterpages = z_erofs_pclusterpages(pcl);
-	unsigned int i, inputsize, err2;
+	unsigned int i, inputsize;
 	struct page *page;
 	bool overlapped;
 
@@ -970,10 +1014,8 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 			kvcalloc(pclusterpages, sizeof(struct page *),
 				 GFP_KERNEL | __GFP_NOFAIL);
 
-	err = z_erofs_parse_out_bvecs(be);
-	err2 = z_erofs_parse_in_bvecs(be, &overlapped);
-	if (err2)
-		err = err2;
+	z_erofs_parse_out_bvecs(be);
+	err = z_erofs_parse_in_bvecs(be, &overlapped);
 	if (err)
 		goto out;
 
@@ -993,6 +1035,7 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 					.alg = pcl->algorithmformat,
 					.inplace_io = overlapped,
 					.partial_decoding = pcl->partial,
+					.fillgaps = pcl->multibases,
 				 }, be->pagepool);
 
 out:
@@ -1016,6 +1059,7 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 	if (be->compressed_pages < be->onstack_pages ||
 	    be->compressed_pages >= be->onstack_pages + Z_EROFS_ONSTACK_PAGES)
 		kvfree(be->compressed_pages);
+	z_erofs_fill_other_copies(be, err);
 
 	for (i = 0; i < be->nr_pages; ++i) {
 		page = be->decompressed_pages[i];
@@ -1052,6 +1096,8 @@ static void z_erofs_decompress_queue(const struct z_erofs_decompressqueue *io,
 	struct z_erofs_decompress_backend be = {
 		.sb = io->sb,
 		.pagepool = pagepool,
+		.decompressed_secondary_bvecs =
+			LIST_HEAD_INIT(be.decompressed_secondary_bvecs),
 	};
 	z_erofs_next_pcluster_t owned = io->head;
 
diff --git a/fs/erofs/zdata.h b/fs/erofs/zdata.h
index a7fd44d21d9e..515fa2b28b97 100644
--- a/fs/erofs/zdata.h
+++ b/fs/erofs/zdata.h
@@ -84,6 +84,9 @@ struct z_erofs_pcluster {
 	/* L: whether partial decompression or not */
 	bool partial;
 
+	/* L: indicate several pageofs_outs or not */
+	bool multibases;
+
 	/* A: compressed bvecs (can be cached or inplaced pages) */
 	struct z_erofs_bvec compressed_bvecs[];
 };
-- 
2.24.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error
  2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
                   ` (15 preceding siblings ...)
  2022-07-14 13:20 ` [PATCH 16/16] erofs: introduce multi-reference pclusters (fully-referenced) Gao Xiang
@ 2022-07-14 13:38 ` Gao Xiang
  16 siblings, 0 replies; 28+ messages in thread
From: Gao Xiang @ 2022-07-14 13:38 UTC (permalink / raw)
  To: linux-erofs, Chao Yu; +Cc: LKML

[-- Attachment #1: Type: text/plain, Size: 1474 bytes --]

On Thu, Jul 14, 2022 at 09:20:35PM +0800, Gao Xiang wrote:
> Hi folks,
> 
> I've been doing this for almost 2 months; the main point of it is
> to support large folios and rolling hash deduplication for compressed
> data.
> 
> This patchset is a start of this work targeting the next 5.20. It
> introduces a flexible range representation for (de)compressed buffers
> instead of relying too heavily on page(s) themselves, so large folios
> can later build on this work.  Also, this patchset gets rid of all
> PG_error flags in the decompression code, which is a cleanup as a
> result as well.
> 
> In addition, this patchset kicks off rolling hash deduplication for
> compressed data by first introducing fully-referenced multi-reference
> pclusters instead of reporting fs corruption if one pcluster is
> referenced by several different extents.  The full implementation
> is expected to be finished in the merge window after the next.  One
> of my colleagues is actively working on the userspace part of this
> feature.
> 
> However, it's still easy to verify fully-referenced multi-reference
> pclusters by constructing an image by hand (see attachment):
> 
> Dataset: 300M
> seq-read (data-deduplicated, read_ahead_kb 8192): 1095MiB/s
> seq-read (data-deduplicated, read_ahead_kb 4096): 771MiB/s
> seq-read (data-deduplicated, read_ahead_kb 512):  577MiB/s
> seq-read (vanilla, read_ahead_kb 8192):         364MiB/s
> 

The test data above is attached for reference.

[-- Attachment #2: pat.erofs.xz --]
[-- Type: application/octet-stream, Size: 12212 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 01/16] erofs: get rid of unneeded `inode', `map' and `sb'
  2022-07-14 13:20 ` [PATCH 01/16] erofs: get rid of unneeded `inode', `map' and `sb' Gao Xiang
@ 2022-07-15  6:20   ` Yue Hu
  0 siblings, 0 replies; 28+ messages in thread
From: Yue Hu @ 2022-07-15  6:20 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-erofs, LKML, zhangwen

On Thu, 14 Jul 2022 21:20:36 +0800
Gao Xiang <hsiangkao@linux.alibaba.com> wrote:

> Since commit 5c6dcc57e2e5 ("erofs: get rid of
> `struct z_erofs_collector'"), these arguments can be dropped as well.
> 
> No logic changes.
> 
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
>  fs/erofs/zdata.c | 42 +++++++++++++++++++-----------------------
>  1 file changed, 19 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> index 724bb57075f6..1b6816dd235f 100644
> --- a/fs/erofs/zdata.c
> +++ b/fs/erofs/zdata.c
> @@ -404,10 +404,9 @@ static void z_erofs_try_to_claim_pcluster(struct z_erofs_decompress_frontend *f)
>  	f->mode = COLLECT_PRIMARY;
>  }
>  
> -static int z_erofs_lookup_pcluster(struct z_erofs_decompress_frontend *fe,
> -				   struct inode *inode,
> -				   struct erofs_map_blocks *map)
> +static int z_erofs_lookup_pcluster(struct z_erofs_decompress_frontend *fe)
>  {
> +	struct erofs_map_blocks *map = &fe->map;
>  	struct z_erofs_pcluster *pcl = fe->pcl;
>  	unsigned int length;
>  
> @@ -449,10 +448,9 @@ static int z_erofs_lookup_pcluster(struct z_erofs_decompress_frontend *fe,
>  	return 0;
>  }
>  
> -static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe,
> -				     struct inode *inode,
> -				     struct erofs_map_blocks *map)
> +static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe)
>  {
> +	struct erofs_map_blocks *map = &fe->map;
>  	bool ztailpacking = map->m_flags & EROFS_MAP_META;
>  	struct z_erofs_pcluster *pcl;
>  	struct erofs_workgroup *grp;
> @@ -494,7 +492,7 @@ static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe,
>  	} else {
>  		pcl->obj.index = map->m_pa >> PAGE_SHIFT;
>  
> -		grp = erofs_insert_workgroup(inode->i_sb, &pcl->obj);
> +		grp = erofs_insert_workgroup(fe->inode->i_sb, &pcl->obj);
>  		if (IS_ERR(grp)) {
>  			err = PTR_ERR(grp);
>  			goto err_out;
> @@ -520,10 +518,9 @@ static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe,
>  	return err;
>  }
>  
> -static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe,
> -				   struct inode *inode,
> -				   struct erofs_map_blocks *map)
> +static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe)
>  {
> +	struct erofs_map_blocks *map = &fe->map;
>  	struct erofs_workgroup *grp;
>  	int ret;
>  
> @@ -541,19 +538,19 @@ static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe,
>  		goto tailpacking;
>  	}
>  
> -	grp = erofs_find_workgroup(inode->i_sb, map->m_pa >> PAGE_SHIFT);
> +	grp = erofs_find_workgroup(fe->inode->i_sb, map->m_pa >> PAGE_SHIFT);
>  	if (grp) {
>  		fe->pcl = container_of(grp, struct z_erofs_pcluster, obj);
>  	} else {
>  tailpacking:
> -		ret = z_erofs_register_pcluster(fe, inode, map);
> +		ret = z_erofs_register_pcluster(fe);
>  		if (!ret)
>  			goto out;
>  		if (ret != -EEXIST)
>  			return ret;
>  	}
>  
> -	ret = z_erofs_lookup_pcluster(fe, inode, map);
> +	ret = z_erofs_lookup_pcluster(fe);
>  	if (ret) {
>  		erofs_workgroup_put(&fe->pcl->obj);
>  		return ret;
> @@ -663,7 +660,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
>  	if (!(map->m_flags & EROFS_MAP_MAPPED))
>  		goto hitted;
>  
> -	err = z_erofs_collector_begin(fe, inode, map);
> +	err = z_erofs_collector_begin(fe);
>  	if (err)
>  		goto err_out;
>  
> @@ -1259,13 +1256,13 @@ static void z_erofs_decompressqueue_endio(struct bio *bio)
>  	bio_put(bio);
>  }
>  
> -static void z_erofs_submit_queue(struct super_block *sb,
> -				 struct z_erofs_decompress_frontend *f,
> +static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
>  				 struct page **pagepool,
>  				 struct z_erofs_decompressqueue *fgq,
>  				 bool *force_fg)
>  {
> -	struct erofs_sb_info *const sbi = EROFS_SB(sb);
> +	struct super_block *sb = f->inode->i_sb;
> +	struct address_space *mc = MNGD_MAPPING(EROFS_SB(sb));
>  	z_erofs_next_pcluster_t qtail[NR_JOBQUEUES];
>  	struct z_erofs_decompressqueue *q[NR_JOBQUEUES];
>  	void *bi_private;
> @@ -1317,7 +1314,7 @@ static void z_erofs_submit_queue(struct super_block *sb,
>  			struct page *page;
>  
>  			page = pickup_page_for_submission(pcl, i++, pagepool,
> -							  MNGD_MAPPING(sbi));
> +							  mc);
>  			if (!page)
>  				continue;
>  
> @@ -1369,15 +1366,14 @@ static void z_erofs_submit_queue(struct super_block *sb,
>  	z_erofs_decompress_kickoff(q[JQ_SUBMIT], *force_fg, nr_bios);
>  }
>  
> -static void z_erofs_runqueue(struct super_block *sb,
> -			     struct z_erofs_decompress_frontend *f,
> +static void z_erofs_runqueue(struct z_erofs_decompress_frontend *f,
>  			     struct page **pagepool, bool force_fg)
>  {
>  	struct z_erofs_decompressqueue io[NR_JOBQUEUES];
>  
>  	if (f->owned_head == Z_EROFS_PCLUSTER_TAIL)
>  		return;
> -	z_erofs_submit_queue(sb, f, pagepool, io, &force_fg);
> +	z_erofs_submit_queue(f, pagepool, io, &force_fg);
>  
>  	/* handle bypass queue (no i/o pclusters) immediately */
>  	z_erofs_decompress_queue(&io[JQ_BYPASS], pagepool);
> @@ -1475,7 +1471,7 @@ static int z_erofs_read_folio(struct file *file, struct folio *folio)
>  	(void)z_erofs_collector_end(&f);
>  
>  	/* if some compressed cluster ready, need submit them anyway */
> -	z_erofs_runqueue(inode->i_sb, &f, &pagepool,
> +	z_erofs_runqueue(&f, &pagepool,
>  			 z_erofs_get_sync_decompress_policy(sbi, 0));
>  
>  	if (err)
> @@ -1524,7 +1520,7 @@ static void z_erofs_readahead(struct readahead_control *rac)
>  	z_erofs_pcluster_readmore(&f, rac, 0, &pagepool, false);
>  	(void)z_erofs_collector_end(&f);
>  
> -	z_erofs_runqueue(inode->i_sb, &f, &pagepool,
> +	z_erofs_runqueue(&f, &pagepool,
>  			 z_erofs_get_sync_decompress_policy(sbi, nr_pages));
>  	erofs_put_metabuf(&f.map.buf);
>  	erofs_release_pages(&pagepool);

Reviewed-by: Yue Hu <huyue2@coolpad.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 02/16] erofs: clean up z_erofs_collector_begin()
  2022-07-14 13:20 ` [PATCH 02/16] erofs: clean up z_erofs_collector_begin() Gao Xiang
@ 2022-07-15  6:22   ` Yue Hu
  0 siblings, 0 replies; 28+ messages in thread
From: Yue Hu @ 2022-07-15  6:22 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-erofs, LKML, zhangwen

On Thu, 14 Jul 2022 21:20:37 +0800
Gao Xiang <hsiangkao@linux.alibaba.com> wrote:

> Rearrange the code and get rid of all gotos.
> 
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
>  fs/erofs/zdata.c | 32 +++++++++++++++-----------------
>  1 file changed, 15 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> index 1b6816dd235f..c7be447ac64d 100644
> --- a/fs/erofs/zdata.c
> +++ b/fs/erofs/zdata.c
> @@ -521,7 +521,7 @@ static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe)
>  static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe)
>  {
>  	struct erofs_map_blocks *map = &fe->map;
> -	struct erofs_workgroup *grp;
> +	struct erofs_workgroup *grp = NULL;
>  	int ret;
>  
>  	DBG_BUGON(fe->pcl);
> @@ -530,33 +530,31 @@ static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe)
>  	DBG_BUGON(fe->owned_head == Z_EROFS_PCLUSTER_NIL);
>  	DBG_BUGON(fe->owned_head == Z_EROFS_PCLUSTER_TAIL_CLOSED);
>  
> -	if (map->m_flags & EROFS_MAP_META) {
> -		if ((map->m_pa & ~PAGE_MASK) + map->m_plen > PAGE_SIZE) {
> -			DBG_BUGON(1);
> -			return -EFSCORRUPTED;
> -		}
> -		goto tailpacking;
> +	if (!(map->m_flags & EROFS_MAP_META)) {
> +		grp = erofs_find_workgroup(fe->inode->i_sb,
> +					   map->m_pa >> PAGE_SHIFT);
> +	} else if ((map->m_pa & ~PAGE_MASK) + map->m_plen > PAGE_SIZE) {
> +		DBG_BUGON(1);
> +		return -EFSCORRUPTED;
>  	}
>  
> -	grp = erofs_find_workgroup(fe->inode->i_sb, map->m_pa >> PAGE_SHIFT);
>  	if (grp) {
>  		fe->pcl = container_of(grp, struct z_erofs_pcluster, obj);
> +		ret = -EEXIST;
>  	} else {
> -tailpacking:
>  		ret = z_erofs_register_pcluster(fe);
> -		if (!ret)
> -			goto out;
> -		if (ret != -EEXIST)
> -			return ret;
>  	}
>  
> -	ret = z_erofs_lookup_pcluster(fe);
> -	if (ret) {
> -		erofs_workgroup_put(&fe->pcl->obj);
> +	if (ret == -EEXIST) {
> +		ret = z_erofs_lookup_pcluster(fe);
> +		if (ret) {
> +			erofs_workgroup_put(&fe->pcl->obj);
> +			return ret;
> +		}
> +	} else if (ret) {
>  		return ret;
>  	}
>  
> -out:
>  	z_erofs_pagevec_ctor_init(&fe->vector, Z_EROFS_NR_INLINE_PAGEVECS,
>  				  fe->pcl->pagevec, fe->pcl->vcnt);
>  	/* since file-backed online pages are traversed in reverse order */

Reviewed-by: Yue Hu <huyue2@coolpad.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 03/16] erofs: introduce `z_erofs_parse_out_bvecs()'
  2022-07-14 13:20 ` [PATCH 03/16] erofs: introduce `z_erofs_parse_out_bvecs()' Gao Xiang
@ 2022-07-15  6:22   ` Yue Hu
  0 siblings, 0 replies; 28+ messages in thread
From: Yue Hu @ 2022-07-15  6:22 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-erofs, LKML, zhangwen

On Thu, 14 Jul 2022 21:20:38 +0800
Gao Xiang <hsiangkao@linux.alibaba.com> wrote:

> `z_erofs_decompress_pcluster()' is too long, therefore it'd be better
> to introduce another helper to parse decompressed pages (or later,
> decompressed bvecs.)
> 
> BTW, since `decompressed_bvecs' is too long as part of the function
> name, `out_bvecs' is used instead.
> 
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
>  fs/erofs/zdata.c | 81 +++++++++++++++++++++++++-----------------------
>  1 file changed, 43 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> index c7be447ac64d..c183cd0bc42b 100644
> --- a/fs/erofs/zdata.c
> +++ b/fs/erofs/zdata.c
> @@ -778,18 +778,58 @@ static bool z_erofs_page_is_invalidated(struct page *page)
>  	return !page->mapping && !z_erofs_is_shortlived_page(page);
>  }
>  
> +static int z_erofs_parse_out_bvecs(struct z_erofs_pcluster *pcl,
> +				   struct page **pages, struct page **pagepool)
> +{
> +	struct z_erofs_pagevec_ctor ctor;
> +	enum z_erofs_page_type page_type;
> +	int i, err = 0;
> +
> +	z_erofs_pagevec_ctor_init(&ctor, Z_EROFS_NR_INLINE_PAGEVECS,
> +				  pcl->pagevec, 0);
> +	for (i = 0; i < pcl->vcnt; ++i) {
> +		struct page *page = z_erofs_pagevec_dequeue(&ctor, &page_type);
> +		unsigned int pagenr;
> +
> +		/* all pages in pagevec ought to be valid */
> +		DBG_BUGON(!page);
> +		DBG_BUGON(z_erofs_page_is_invalidated(page));
> +
> +		if (z_erofs_put_shortlivedpage(pagepool, page))
> +			continue;
> +
> +		if (page_type == Z_EROFS_VLE_PAGE_TYPE_HEAD)
> +			pagenr = 0;
> +		else
> +			pagenr = z_erofs_onlinepage_index(page);
> +
> +		DBG_BUGON(pagenr >= pcl->nr_pages);
> +		/*
> +		 * currently EROFS doesn't support multiref(dedup),
> +		 * so here erroring out one multiref page.
> +		 */
> +		if (pages[pagenr]) {
> +			DBG_BUGON(1);
> +			SetPageError(pages[pagenr]);
> +			z_erofs_onlinepage_endio(pages[pagenr]);
> +			err = -EFSCORRUPTED;
> +		}
> +		pages[pagenr] = page;
> +	}
> +	z_erofs_pagevec_ctor_exit(&ctor, true);
> +	return err;
> +}
> +
>  static int z_erofs_decompress_pcluster(struct super_block *sb,
>  				       struct z_erofs_pcluster *pcl,
>  				       struct page **pagepool)
>  {
>  	struct erofs_sb_info *const sbi = EROFS_SB(sb);
>  	unsigned int pclusterpages = z_erofs_pclusterpages(pcl);
> -	struct z_erofs_pagevec_ctor ctor;
>  	unsigned int i, inputsize, outputsize, llen, nr_pages;
>  	struct page *pages_onstack[Z_EROFS_VMAP_ONSTACK_PAGES];
>  	struct page **pages, **compressed_pages, *page;
>  
> -	enum z_erofs_page_type page_type;
>  	bool overlapped, partial;
>  	int err;
>  
> @@ -823,42 +863,7 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
>  	for (i = 0; i < nr_pages; ++i)
>  		pages[i] = NULL;
>  
> -	err = 0;
> -	z_erofs_pagevec_ctor_init(&ctor, Z_EROFS_NR_INLINE_PAGEVECS,
> -				  pcl->pagevec, 0);
> -
> -	for (i = 0; i < pcl->vcnt; ++i) {
> -		unsigned int pagenr;
> -
> -		page = z_erofs_pagevec_dequeue(&ctor, &page_type);
> -
> -		/* all pages in pagevec ought to be valid */
> -		DBG_BUGON(!page);
> -		DBG_BUGON(z_erofs_page_is_invalidated(page));
> -
> -		if (z_erofs_put_shortlivedpage(pagepool, page))
> -			continue;
> -
> -		if (page_type == Z_EROFS_VLE_PAGE_TYPE_HEAD)
> -			pagenr = 0;
> -		else
> -			pagenr = z_erofs_onlinepage_index(page);
> -
> -		DBG_BUGON(pagenr >= nr_pages);
> -
> -		/*
> -		 * currently EROFS doesn't support multiref(dedup),
> -		 * so here erroring out one multiref page.
> -		 */
> -		if (pages[pagenr]) {
> -			DBG_BUGON(1);
> -			SetPageError(pages[pagenr]);
> -			z_erofs_onlinepage_endio(pages[pagenr]);
> -			err = -EFSCORRUPTED;
> -		}
> -		pages[pagenr] = page;
> -	}
> -	z_erofs_pagevec_ctor_exit(&ctor, true);
> +	err = z_erofs_parse_out_bvecs(pcl, pages, pagepool);
>  
>  	overlapped = false;
>  	compressed_pages = pcl->compressed_pages;

Reviewed-by: Yue Hu <huyue2@coolpad.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 04/16] erofs: introduce bufvec to store decompressed buffers
  2022-07-14 13:20 ` [PATCH 04/16] erofs: introduce bufvec to store decompressed buffers Gao Xiang
@ 2022-07-15  6:29   ` Yue Hu
  2022-07-15  6:36     ` Gao Xiang
  0 siblings, 1 reply; 28+ messages in thread
From: Yue Hu @ 2022-07-15  6:29 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-erofs, LKML, zhangwen

On Thu, 14 Jul 2022 21:20:39 +0800
Gao Xiang <hsiangkao@linux.alibaba.com> wrote:

> For each pcluster, the total number of compressed buffers is determined
> in advance, yet the number of decompressed buffers actually varies.
> Too many decompressed pages can be recorded if one pcluster is highly
> compressed or its pcluster size is large.  That takes an extra memory
> footprint compared to uncompressed filesystems, especially with a lot
> of I/O in flight on low-end devices.
> 
> Therefore, similar to inplace I/O, pagevec was introduced to reuse
> page cache to store these pointers in a time-sharing way since
> these pages are actually unused before decompressing.
> 
> In order to make it more flexible, a cleaner bufvec is used to
> replace the old pagevec stuff so that
> 
>  - Decompressed offsets can be stored inline, thus it can be used
>    for the upcoming feature like compressed data deduplication;
> 
>  - Towards supporting large folios for compressed inodes since
>    our final goal is to completely avoid page->private but use
>    folio->private only for all page cache pages.
> 
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
>  fs/erofs/zdata.c | 177 +++++++++++++++++++++++++++++++++++------------
>  fs/erofs/zdata.h |  26 +++++--
>  2 files changed, 153 insertions(+), 50 deletions(-)
> 
> diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> index c183cd0bc42b..f52c54058f31 100644
> --- a/fs/erofs/zdata.c
> +++ b/fs/erofs/zdata.c
> @@ -2,6 +2,7 @@
>  /*
>   * Copyright (C) 2018 HUAWEI, Inc.
>   *             https://www.huawei.com/
> + * Copyright (C) 2022 Alibaba Cloud
>   */
>  #include "zdata.h"
>  #include "compress.h"
> @@ -26,6 +27,82 @@ static struct z_erofs_pcluster_slab pcluster_pool[] __read_mostly = {
>  	_PCLP(Z_EROFS_PCLUSTER_MAX_PAGES)
>  };
>  
> +struct z_erofs_bvec_iter {
> +	struct page *bvpage;
> +	struct z_erofs_bvset *bvset;
> +	unsigned int nr, cur;
> +};
> +
> +static struct page *z_erofs_bvec_iter_end(struct z_erofs_bvec_iter *iter)
> +{
> +	if (iter->bvpage)
> +		kunmap_local(iter->bvset);
> +	return iter->bvpage;
> +}
> +
> +static struct page *z_erofs_bvset_flip(struct z_erofs_bvec_iter *iter)
> +{
> +	unsigned long base = (unsigned long)((struct z_erofs_bvset *)0)->bvec;
> +	/* have to access nextpage in advance, otherwise it will be unmapped */
> +	struct page *nextpage = iter->bvset->nextpage;
> +	struct page *oldpage;
> +
> +	DBG_BUGON(!nextpage);
> +	oldpage = z_erofs_bvec_iter_end(iter);
> +	iter->bvpage = nextpage;
> +	iter->bvset = kmap_local_page(nextpage);
> +	iter->nr = (PAGE_SIZE - base) / sizeof(struct z_erofs_bvec);
> +	iter->cur = 0;
> +	return oldpage;
> +}
> +
> +static void z_erofs_bvec_iter_begin(struct z_erofs_bvec_iter *iter,
> +				    struct z_erofs_bvset_inline *bvset,
> +				    unsigned int bootstrap_nr,
> +				    unsigned int cur)
> +{
> +	*iter = (struct z_erofs_bvec_iter) {
> +		.nr = bootstrap_nr,
> +		.bvset = (struct z_erofs_bvset *)bvset,
> +	};
> +
> +	while (cur > iter->nr) {
> +		cur -= iter->nr;
> +		z_erofs_bvset_flip(iter);
> +	}
> +	iter->cur = cur;
> +}
> +
> +static int z_erofs_bvec_enqueue(struct z_erofs_bvec_iter *iter,
> +				struct z_erofs_bvec *bvec,
> +				struct page **candidate_bvpage)
> +{
> +	if (iter->cur == iter->nr) {
> +		if (!*candidate_bvpage)
> +			return -EAGAIN;
> +
> +		DBG_BUGON(iter->bvset->nextpage);
> +		iter->bvset->nextpage = *candidate_bvpage;
> +		z_erofs_bvset_flip(iter);
> +
> +		iter->bvset->nextpage = NULL;
> +		*candidate_bvpage = NULL;
> +	}
> +	iter->bvset->bvec[iter->cur++] = *bvec;
> +	return 0;
> +}
> +
> +static void z_erofs_bvec_dequeue(struct z_erofs_bvec_iter *iter,
> +				 struct z_erofs_bvec *bvec,
> +				 struct page **old_bvpage)
> +{
> +	if (iter->cur == iter->nr)
> +		*old_bvpage = z_erofs_bvset_flip(iter);
> +	else
> +		*old_bvpage = NULL;
> +	*bvec = iter->bvset->bvec[iter->cur++];
> +}
> +

Should we create a new file for the bufvec-related code? Maybe call it zbvec.c/h?

>  static void z_erofs_destroy_pcluster_pool(void)
>  {
>  	int i;
> @@ -195,9 +272,10 @@ enum z_erofs_collectmode {
>  struct z_erofs_decompress_frontend {
>  	struct inode *const inode;
>  	struct erofs_map_blocks map;
> -
> +	struct z_erofs_bvec_iter biter;
>  	struct z_erofs_pagevec_ctor vector;
>  
> +	struct page *candidate_bvpage;
>  	struct z_erofs_pcluster *pcl, *tailpcl;
>  	/* a pointer used to pick up inplace I/O pages */
>  	struct page **icpage_ptr;
> @@ -358,21 +436,24 @@ static bool z_erofs_try_inplace_io(struct z_erofs_decompress_frontend *fe,
>  
>  /* callers must be with pcluster lock held */
>  static int z_erofs_attach_page(struct z_erofs_decompress_frontend *fe,
> -			       struct page *page, enum z_erofs_page_type type,
> -			       bool pvec_safereuse)
> +			       struct z_erofs_bvec *bvec,
> +			       enum z_erofs_page_type type)
>  {
>  	int ret;
>  
> -	/* give priority for inplaceio */
>  	if (fe->mode >= COLLECT_PRIMARY &&
> -	    type == Z_EROFS_PAGE_TYPE_EXCLUSIVE &&
> -	    z_erofs_try_inplace_io(fe, page))
> -		return 0;
> -
> -	ret = z_erofs_pagevec_enqueue(&fe->vector, page, type,
> -				      pvec_safereuse);
> -	fe->pcl->vcnt += (unsigned int)ret;
> -	return ret ? 0 : -EAGAIN;
> +	    type == Z_EROFS_PAGE_TYPE_EXCLUSIVE) {
> +		/* give priority for inplaceio to use file pages first */
> +		if (z_erofs_try_inplace_io(fe, bvec->page))
> +			return 0;
> +		/* otherwise, check if it can be used as a bvpage */
> +		if (fe->mode >= COLLECT_PRIMARY_FOLLOWED &&
> +		    !fe->candidate_bvpage)
> +			fe->candidate_bvpage = bvec->page;
> +	}
> +	ret = z_erofs_bvec_enqueue(&fe->biter, bvec, &fe->candidate_bvpage);
> +	fe->pcl->vcnt += (ret >= 0);
> +	return ret;
>  }
>  
>  static void z_erofs_try_to_claim_pcluster(struct z_erofs_decompress_frontend *f)
> @@ -554,9 +635,8 @@ static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe)
>  	} else if (ret) {
>  		return ret;
>  	}
> -
> -	z_erofs_pagevec_ctor_init(&fe->vector, Z_EROFS_NR_INLINE_PAGEVECS,
> -				  fe->pcl->pagevec, fe->pcl->vcnt);
> +	z_erofs_bvec_iter_begin(&fe->biter, &fe->pcl->bvset,
> +				Z_EROFS_NR_INLINE_PAGEVECS, fe->pcl->vcnt);
>  	/* since file-backed online pages are traversed in reverse order */
>  	fe->icpage_ptr = fe->pcl->compressed_pages +
>  			z_erofs_pclusterpages(fe->pcl);
> @@ -588,9 +668,14 @@ static bool z_erofs_collector_end(struct z_erofs_decompress_frontend *fe)
>  	if (!pcl)
>  		return false;
>  
> -	z_erofs_pagevec_ctor_exit(&fe->vector, false);
> +	z_erofs_bvec_iter_end(&fe->biter);
>  	mutex_unlock(&pcl->lock);
>  
> +	if (fe->candidate_bvpage) {
> +		DBG_BUGON(z_erofs_is_shortlived_page(fe->candidate_bvpage));
> +		fe->candidate_bvpage = NULL;
> +	}
> +
>  	/*
>  	 * if all pending pages are added, don't hold its reference
>  	 * any longer if the pcluster isn't hosted by ourselves.
> @@ -712,22 +797,23 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
>  		tight &= (fe->mode >= COLLECT_PRIMARY_FOLLOWED);
>  
>  retry:
> -	err = z_erofs_attach_page(fe, page, page_type,
> -				  fe->mode >= COLLECT_PRIMARY_FOLLOWED);
> -	/* should allocate an additional short-lived page for pagevec */
> -	if (err == -EAGAIN) {
> -		struct page *const newpage =
> -				alloc_page(GFP_NOFS | __GFP_NOFAIL);
> -
> -		set_page_private(newpage, Z_EROFS_SHORTLIVED_PAGE);
> -		err = z_erofs_attach_page(fe, newpage,
> -					  Z_EROFS_PAGE_TYPE_EXCLUSIVE, true);
> -		if (!err)
> -			goto retry;
> +	err = z_erofs_attach_page(fe, &((struct z_erofs_bvec) {
> +					.page = page,
> +					.offset = offset - map->m_la,
> +					.end = end,
> +				  }), page_type);
> +	/* should allocate an additional short-lived page for bvset */
> +	if (err == -EAGAIN && !fe->candidate_bvpage) {
> +		fe->candidate_bvpage = alloc_page(GFP_NOFS | __GFP_NOFAIL);
> +		set_page_private(fe->candidate_bvpage,
> +				 Z_EROFS_SHORTLIVED_PAGE);
> +		goto retry;
>  	}
>  
> -	if (err)
> +	if (err) {
> +		DBG_BUGON(err == -EAGAIN && fe->candidate_bvpage);
>  		goto err_out;
> +	}
>  
>  	index = page->index - (map->m_la >> PAGE_SHIFT);
>  
> @@ -781,29 +867,24 @@ static bool z_erofs_page_is_invalidated(struct page *page)
>  static int z_erofs_parse_out_bvecs(struct z_erofs_pcluster *pcl,
>  				   struct page **pages, struct page **pagepool)
>  {
> -	struct z_erofs_pagevec_ctor ctor;
> -	enum z_erofs_page_type page_type;
> +	struct z_erofs_bvec_iter biter;
> +	struct page *old_bvpage;
>  	int i, err = 0;
>  
> -	z_erofs_pagevec_ctor_init(&ctor, Z_EROFS_NR_INLINE_PAGEVECS,
> -				  pcl->pagevec, 0);
> +	z_erofs_bvec_iter_begin(&biter, &pcl->bvset,
> +				Z_EROFS_NR_INLINE_PAGEVECS, 0);
>  	for (i = 0; i < pcl->vcnt; ++i) {
> -		struct page *page = z_erofs_pagevec_dequeue(&ctor, &page_type);
> +		struct z_erofs_bvec bvec;
>  		unsigned int pagenr;
>  
> -		/* all pages in pagevec ought to be valid */
> -		DBG_BUGON(!page);
> -		DBG_BUGON(z_erofs_page_is_invalidated(page));
> -
> -		if (z_erofs_put_shortlivedpage(pagepool, page))
> -			continue;
> +		z_erofs_bvec_dequeue(&biter, &bvec, &old_bvpage);
>  
> -		if (page_type == Z_EROFS_VLE_PAGE_TYPE_HEAD)
> -			pagenr = 0;
> -		else
> -			pagenr = z_erofs_onlinepage_index(page);
> +		if (old_bvpage)
> +			z_erofs_put_shortlivedpage(pagepool, old_bvpage);
>  
> +		pagenr = (bvec.offset + pcl->pageofs_out) >> PAGE_SHIFT;
>  		DBG_BUGON(pagenr >= pcl->nr_pages);
> +		DBG_BUGON(z_erofs_page_is_invalidated(bvec.page));
>  		/*
>  		 * currently EROFS doesn't support multiref(dedup),
>  		 * so here erroring out one multiref page.
> @@ -814,9 +895,12 @@ static int z_erofs_parse_out_bvecs(struct z_erofs_pcluster *pcl,
>  			z_erofs_onlinepage_endio(pages[pagenr]);
>  			err = -EFSCORRUPTED;
>  		}
> -		pages[pagenr] = page;
> +		pages[pagenr] = bvec.page;
>  	}
> -	z_erofs_pagevec_ctor_exit(&ctor, true);
> +
> +	old_bvpage = z_erofs_bvec_iter_end(&biter);
> +	if (old_bvpage)
> +		z_erofs_put_shortlivedpage(pagepool, old_bvpage);
>  	return err;
>  }
>  
> @@ -986,6 +1070,7 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
>  		kvfree(pages);
>  
>  	pcl->nr_pages = 0;
> +	pcl->bvset.nextpage = NULL;
>  	pcl->vcnt = 0;
>  
>  	/* pcluster lock MUST be taken before the following line */
> diff --git a/fs/erofs/zdata.h b/fs/erofs/zdata.h
> index 58053bb5066f..d03e333e4fde 100644
> --- a/fs/erofs/zdata.h
> +++ b/fs/erofs/zdata.h
> @@ -21,6 +21,21 @@
>   */
>  typedef void *z_erofs_next_pcluster_t;
>  
> +struct z_erofs_bvec {
> +	struct page *page;
> +	int offset;
> +	unsigned int end;
> +};
> +
> +#define __Z_EROFS_BVSET(name, total) \
> +struct name { \
> +	/* point to the next page which contains the following bvecs */ \
> +	struct page *nextpage; \
> +	struct z_erofs_bvec bvec[total]; \
> +}
> +__Z_EROFS_BVSET(z_erofs_bvset,);
> +__Z_EROFS_BVSET(z_erofs_bvset_inline, Z_EROFS_NR_INLINE_PAGEVECS);
> +
>  /*
>   * Structure fields follow one of the following exclusion rules.
>   *
> @@ -41,22 +56,25 @@ struct z_erofs_pcluster {
>  	/* A: lower limit of decompressed length and if full length or not */
>  	unsigned int length;
>  
> +	/* L: total number of bvecs */
> +	unsigned int vcnt;
> +
>  	/* I: page offset of start position of decompression */
>  	unsigned short pageofs_out;
>  
>  	/* I: page offset of inline compressed data */
>  	unsigned short pageofs_in;
>  
> -	/* L: maximum relative page index in pagevec[] */
> +	/* L: maximum relative page index in bvecs */
>  	unsigned short nr_pages;
>  
> -	/* L: total number of pages in pagevec[] */
> -	unsigned int vcnt;
> -
>  	union {
>  		/* L: inline a certain number of pagevecs for bootstrap */
>  		erofs_vtptr_t pagevec[Z_EROFS_NR_INLINE_PAGEVECS];
>  
> +		/* L: inline a certain number of bvec for bootstrap */
> +		struct z_erofs_bvset_inline bvset;
> +
>  		/* I: can be used to free the pcluster by RCU. */
>  		struct rcu_head rcu;
>  	};


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 04/16] erofs: introduce bufvec to store decompressed buffers
  2022-07-15  6:29   ` Yue Hu
@ 2022-07-15  6:36     ` Gao Xiang
  0 siblings, 0 replies; 28+ messages in thread
From: Gao Xiang @ 2022-07-15  6:36 UTC (permalink / raw)
  To: Yue Hu; +Cc: linux-erofs, LKML, zhangwen

Hi Yue,

On Fri, Jul 15, 2022 at 02:29:30PM +0800, Yue Hu wrote:
> On Thu, 14 Jul 2022 21:20:39 +0800
> Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> 
> > For each pcluster, the total number of compressed buffers is determined
> > in advance, yet the number of decompressed buffers actually varies.
> > Too many decompressed pages can be recorded if one pcluster is highly
> > compressed or its pcluster size is large.  That takes an extra memory
> > footprint compared to uncompressed filesystems, especially with a lot
> > of I/O in flight on low-end devices.
> > 
> > Therefore, similar to inplace I/O, pagevec was introduced to reuse
> > page cache to store these pointers in a time-sharing way since
> > these pages are actually unused before decompressing.
> > 
> > In order to make it more flexible, a cleaner bufvec is used to
> > replace the old pagevec stuff so that
> > 
> >  - Decompressed offsets can be stored inline, thus it can be used
> >    for the upcoming feature like compressed data deduplication;
> > 
> >  - Towards supporting large folios for compressed inodes since
> >    our final goal is to completely avoid page->private but use
> >    folio->private only for all page cache pages.
> > 
> > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > ---
> >  fs/erofs/zdata.c | 177 +++++++++++++++++++++++++++++++++++------------
> >  fs/erofs/zdata.h |  26 +++++--
> >  2 files changed, 153 insertions(+), 50 deletions(-)
> > 
> > diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> > index c183cd0bc42b..f52c54058f31 100644
> > --- a/fs/erofs/zdata.c
> > +++ b/fs/erofs/zdata.c
> > @@ -2,6 +2,7 @@
> >  /*
> >   * Copyright (C) 2018 HUAWEI, Inc.
> >   *             https://www.huawei.com/
> > + * Copyright (C) 2022 Alibaba Cloud
> >   */
> >  #include "zdata.h"
> >  #include "compress.h"
> > @@ -26,6 +27,82 @@ static struct z_erofs_pcluster_slab pcluster_pool[] __read_mostly = {
> >  	_PCLP(Z_EROFS_PCLUSTER_MAX_PAGES)
> >  };
> >  
> > +struct z_erofs_bvec_iter {
> > +	struct page *bvpage;
> > +	struct z_erofs_bvset *bvset;
> > +	unsigned int nr, cur;
> > +};
> > +
> > +static struct page *z_erofs_bvec_iter_end(struct z_erofs_bvec_iter *iter)
> > +{
> > +	if (iter->bvpage)
> > +		kunmap_local(iter->bvset);
> > +	return iter->bvpage;
> > +}
> > +
> > +static struct page *z_erofs_bvset_flip(struct z_erofs_bvec_iter *iter)
> > +{
> > +	unsigned long base = (unsigned long)((struct z_erofs_bvset *)0)->bvec;
> > +	/* have to access nextpage in advance, otherwise it will be unmapped */
> > +	struct page *nextpage = iter->bvset->nextpage;
> > +	struct page *oldpage;
> > +
> > +	DBG_BUGON(!nextpage);
> > +	oldpage = z_erofs_bvec_iter_end(iter);
> > +	iter->bvpage = nextpage;
> > +	iter->bvset = kmap_local_page(nextpage);
> > +	iter->nr = (PAGE_SIZE - base) / sizeof(struct z_erofs_bvec);
> > +	iter->cur = 0;
> > +	return oldpage;
> > +}
> > +
> > +static void z_erofs_bvec_iter_begin(struct z_erofs_bvec_iter *iter,
> > +				    struct z_erofs_bvset_inline *bvset,
> > +				    unsigned int bootstrap_nr,
> > +				    unsigned int cur)
> > +{
> > +	*iter = (struct z_erofs_bvec_iter) {
> > +		.nr = bootstrap_nr,
> > +		.bvset = (struct z_erofs_bvset *)bvset,
> > +	};
> > +
> > +	while (cur > iter->nr) {
> > +		cur -= iter->nr;
> > +		z_erofs_bvset_flip(iter);
> > +	}
> > +	iter->cur = cur;
> > +}
> > +
> > +static int z_erofs_bvec_enqueue(struct z_erofs_bvec_iter *iter,
> > +				struct z_erofs_bvec *bvec,
> > +				struct page **candidate_bvpage)
> > +{
> > +	if (iter->cur == iter->nr) {
> > +		if (!*candidate_bvpage)
> > +			return -EAGAIN;
> > +
> > +		DBG_BUGON(iter->bvset->nextpage);
> > +		iter->bvset->nextpage = *candidate_bvpage;
> > +		z_erofs_bvset_flip(iter);
> > +
> > +		iter->bvset->nextpage = NULL;
> > +		*candidate_bvpage = NULL;
> > +	}
> > +	iter->bvset->bvec[iter->cur++] = *bvec;
> > +	return 0;
> > +}
> > +
> > +static void z_erofs_bvec_dequeue(struct z_erofs_bvec_iter *iter,
> > +				 struct z_erofs_bvec *bvec,
> > +				 struct page **old_bvpage)
> > +{
> > +	if (iter->cur == iter->nr)
> > +		*old_bvpage = z_erofs_bvset_flip(iter);
> > +	else
> > +		*old_bvpage = NULL;
> > +	*bvec = iter->bvset->bvec[iter->cur++];
> > +}
> > +
> 
> Should we create a new file for the bufvec-related code? Maybe call it zbvec.c/h?

Thanks for the suggestion.  The new implementation is simple enough,
so I tend to leave it directly in zdata.c instead of adding more new
files and causing extra churn; zpvec.h will be completely removed in
a following patch anyway.

Thanks,
Gao Xiang


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 05/16] erofs: drop the old pagevec approach
  2022-07-14 13:20 ` [PATCH 05/16] erofs: drop the old pagevec approach Gao Xiang
@ 2022-07-15  7:07   ` Yue Hu
  2022-07-15  7:19     ` Gao Xiang
  0 siblings, 1 reply; 28+ messages in thread
From: Yue Hu @ 2022-07-15  7:07 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-erofs, LKML, zhangwen

On Thu, 14 Jul 2022 21:20:40 +0800
Gao Xiang <hsiangkao@linux.alibaba.com> wrote:

> Remove the old pagevec approach but keep z_erofs_page_type for now.
> It will be reworked in the following commits as well.
> 
> Also rename Z_EROFS_NR_INLINE_PAGEVECS to Z_EROFS_INLINE_BVECS with
> the new value 2, since it's actually enough to bootstrap.

I notice there are two comments below which still use pagevec; should we
update them as well?

[1] 
* pagevec) since it can be directly decoded without I/O submission.
[2]
* for inplace I/O or pagevec (should be processed in strict order.)

BTW, utils.c includes the needless <pagevec.h>; we can remove it along
with this patchset or remove it later.

> 
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
>  fs/erofs/zdata.c |  17 +++--
>  fs/erofs/zdata.h |   9 +--
>  fs/erofs/zpvec.h | 159 -----------------------------------------------
>  3 files changed, 16 insertions(+), 169 deletions(-)
>  delete mode 100644 fs/erofs/zpvec.h
> 
> diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> index f52c54058f31..e96704db106e 100644
> --- a/fs/erofs/zdata.c
> +++ b/fs/erofs/zdata.c
> @@ -27,6 +27,17 @@ static struct z_erofs_pcluster_slab pcluster_pool[] __read_mostly = {
>  	_PCLP(Z_EROFS_PCLUSTER_MAX_PAGES)
>  };
>  
> +/* page type in pagevec for decompress subsystem */
> +enum z_erofs_page_type {
> +	/* including Z_EROFS_VLE_PAGE_TAIL_EXCLUSIVE */
> +	Z_EROFS_PAGE_TYPE_EXCLUSIVE,
> +
> +	Z_EROFS_VLE_PAGE_TYPE_TAIL_SHARED,
> +
> +	Z_EROFS_VLE_PAGE_TYPE_HEAD,
> +	Z_EROFS_VLE_PAGE_TYPE_MAX
> +};
> +
>  struct z_erofs_bvec_iter {
>  	struct page *bvpage;
>  	struct z_erofs_bvset *bvset;
> @@ -273,7 +284,6 @@ struct z_erofs_decompress_frontend {
>  	struct inode *const inode;
>  	struct erofs_map_blocks map;
>  	struct z_erofs_bvec_iter biter;
> -	struct z_erofs_pagevec_ctor vector;
>  
>  	struct page *candidate_bvpage;
>  	struct z_erofs_pcluster *pcl, *tailpcl;
> @@ -636,7 +646,7 @@ static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe)
>  		return ret;
>  	}
>  	z_erofs_bvec_iter_begin(&fe->biter, &fe->pcl->bvset,
> -				Z_EROFS_NR_INLINE_PAGEVECS, fe->pcl->vcnt);
> +				Z_EROFS_INLINE_BVECS, fe->pcl->vcnt);
>  	/* since file-backed online pages are traversed in reverse order */
>  	fe->icpage_ptr = fe->pcl->compressed_pages +
>  			z_erofs_pclusterpages(fe->pcl);
> @@ -871,8 +881,7 @@ static int z_erofs_parse_out_bvecs(struct z_erofs_pcluster *pcl,
>  	struct page *old_bvpage;
>  	int i, err = 0;
>  
> -	z_erofs_bvec_iter_begin(&biter, &pcl->bvset,
> -				Z_EROFS_NR_INLINE_PAGEVECS, 0);
> +	z_erofs_bvec_iter_begin(&biter, &pcl->bvset, Z_EROFS_INLINE_BVECS, 0);
>  	for (i = 0; i < pcl->vcnt; ++i) {
>  		struct z_erofs_bvec bvec;
>  		unsigned int pagenr;
> diff --git a/fs/erofs/zdata.h b/fs/erofs/zdata.h
> index d03e333e4fde..a755c5a44d87 100644
> --- a/fs/erofs/zdata.h
> +++ b/fs/erofs/zdata.h
> @@ -7,10 +7,10 @@
>  #define __EROFS_FS_ZDATA_H
>  
>  #include "internal.h"
> -#include "zpvec.h"
> +#include "tagptr.h"
>  
>  #define Z_EROFS_PCLUSTER_MAX_PAGES	(Z_EROFS_PCLUSTER_MAX_SIZE / PAGE_SIZE)
> -#define Z_EROFS_NR_INLINE_PAGEVECS      3
> +#define Z_EROFS_INLINE_BVECS		2
>  
>  #define Z_EROFS_PCLUSTER_FULL_LENGTH    0x00000001
>  #define Z_EROFS_PCLUSTER_LENGTH_BIT     1
> @@ -34,7 +34,7 @@ struct name { \
>  	struct z_erofs_bvec bvec[total]; \
>  };
>  __Z_EROFS_BVSET(z_erofs_bvset,)
> -__Z_EROFS_BVSET(z_erofs_bvset_inline, Z_EROFS_NR_INLINE_PAGEVECS)
> +__Z_EROFS_BVSET(z_erofs_bvset_inline, Z_EROFS_INLINE_BVECS)
>  
>  /*
>   * Structure fields follow one of the following exclusion rules.
> @@ -69,9 +69,6 @@ struct z_erofs_pcluster {
>  	unsigned short nr_pages;
>  
>  	union {
> -		/* L: inline a certain number of pagevecs for bootstrap */
> -		erofs_vtptr_t pagevec[Z_EROFS_NR_INLINE_PAGEVECS];
> -
>  		/* L: inline a certain number of bvec for bootstrap */
>  		struct z_erofs_bvset_inline bvset;
>  
> diff --git a/fs/erofs/zpvec.h b/fs/erofs/zpvec.h
> deleted file mode 100644
> index b05464f4a808..000000000000
> --- a/fs/erofs/zpvec.h
> +++ /dev/null
> @@ -1,159 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -/*
> - * Copyright (C) 2018 HUAWEI, Inc.
> - *             https://www.huawei.com/
> - */
> -#ifndef __EROFS_FS_ZPVEC_H
> -#define __EROFS_FS_ZPVEC_H
> -
> -#include "tagptr.h"
> -
> -/* page type in pagevec for decompress subsystem */
> -enum z_erofs_page_type {
> -	/* including Z_EROFS_VLE_PAGE_TAIL_EXCLUSIVE */
> -	Z_EROFS_PAGE_TYPE_EXCLUSIVE,
> -
> -	Z_EROFS_VLE_PAGE_TYPE_TAIL_SHARED,
> -
> -	Z_EROFS_VLE_PAGE_TYPE_HEAD,
> -	Z_EROFS_VLE_PAGE_TYPE_MAX
> -};
> -
> -extern void __compiletime_error("Z_EROFS_PAGE_TYPE_EXCLUSIVE != 0")
> -	__bad_page_type_exclusive(void);
> -
> -/* pagevec tagged pointer */
> -typedef tagptr2_t	erofs_vtptr_t;
> -
> -/* pagevec collector */
> -struct z_erofs_pagevec_ctor {
> -	struct page *curr, *next;
> -	erofs_vtptr_t *pages;
> -
> -	unsigned int nr, index;
> -};
> -
> -static inline void z_erofs_pagevec_ctor_exit(struct z_erofs_pagevec_ctor *ctor,
> -					     bool atomic)
> -{
> -	if (!ctor->curr)
> -		return;
> -
> -	if (atomic)
> -		kunmap_atomic(ctor->pages);
> -	else
> -		kunmap(ctor->curr);
> -}
> -
> -static inline struct page *
> -z_erofs_pagevec_ctor_next_page(struct z_erofs_pagevec_ctor *ctor,
> -			       unsigned int nr)
> -{
> -	unsigned int index;
> -
> -	/* keep away from occupied pages */
> -	if (ctor->next)
> -		return ctor->next;
> -
> -	for (index = 0; index < nr; ++index) {
> -		const erofs_vtptr_t t = ctor->pages[index];
> -		const unsigned int tags = tagptr_unfold_tags(t);
> -
> -		if (tags == Z_EROFS_PAGE_TYPE_EXCLUSIVE)
> -			return tagptr_unfold_ptr(t);
> -	}
> -	DBG_BUGON(nr >= ctor->nr);
> -	return NULL;
> -}
> -
> -static inline void
> -z_erofs_pagevec_ctor_pagedown(struct z_erofs_pagevec_ctor *ctor,
> -			      bool atomic)
> -{
> -	struct page *next = z_erofs_pagevec_ctor_next_page(ctor, ctor->nr);
> -
> -	z_erofs_pagevec_ctor_exit(ctor, atomic);
> -
> -	ctor->curr = next;
> -	ctor->next = NULL;
> -	ctor->pages = atomic ?
> -		kmap_atomic(ctor->curr) : kmap(ctor->curr);
> -
> -	ctor->nr = PAGE_SIZE / sizeof(struct page *);
> -	ctor->index = 0;
> -}
> -
> -static inline void z_erofs_pagevec_ctor_init(struct z_erofs_pagevec_ctor *ctor,
> -					     unsigned int nr,
> -					     erofs_vtptr_t *pages,
> -					     unsigned int i)
> -{
> -	ctor->nr = nr;
> -	ctor->curr = ctor->next = NULL;
> -	ctor->pages = pages;
> -
> -	if (i >= nr) {
> -		i -= nr;
> -		z_erofs_pagevec_ctor_pagedown(ctor, false);
> -		while (i > ctor->nr) {
> -			i -= ctor->nr;
> -			z_erofs_pagevec_ctor_pagedown(ctor, false);
> -		}
> -	}
> -	ctor->next = z_erofs_pagevec_ctor_next_page(ctor, i);
> -	ctor->index = i;
> -}
> -
> -static inline bool z_erofs_pagevec_enqueue(struct z_erofs_pagevec_ctor *ctor,
> -					   struct page *page,
> -					   enum z_erofs_page_type type,
> -					   bool pvec_safereuse)
> -{
> -	if (!ctor->next) {
> -		/* some pages cannot be reused as pvec safely without I/O */
> -		if (type == Z_EROFS_PAGE_TYPE_EXCLUSIVE && !pvec_safereuse)
> -			type = Z_EROFS_VLE_PAGE_TYPE_TAIL_SHARED;
> -
> -		if (type != Z_EROFS_PAGE_TYPE_EXCLUSIVE &&
> -		    ctor->index + 1 == ctor->nr)
> -			return false;
> -	}
> -
> -	if (ctor->index >= ctor->nr)
> -		z_erofs_pagevec_ctor_pagedown(ctor, false);
> -
> -	/* exclusive page type must be 0 */
> -	if (Z_EROFS_PAGE_TYPE_EXCLUSIVE != (uintptr_t)NULL)
> -		__bad_page_type_exclusive();
> -
> -	/* should remind that collector->next never equal to 1, 2 */
> -	if (type == (uintptr_t)ctor->next) {
> -		ctor->next = page;
> -	}
> -	ctor->pages[ctor->index++] = tagptr_fold(erofs_vtptr_t, page, type);
> -	return true;
> -}
> -
> -static inline struct page *
> -z_erofs_pagevec_dequeue(struct z_erofs_pagevec_ctor *ctor,
> -			enum z_erofs_page_type *type)
> -{
> -	erofs_vtptr_t t;
> -
> -	if (ctor->index >= ctor->nr) {
> -		DBG_BUGON(!ctor->next);
> -		z_erofs_pagevec_ctor_pagedown(ctor, true);
> -	}
> -
> -	t = ctor->pages[ctor->index];
> -
> -	*type = tagptr_unfold_tags(t);
> -
> -	/* should remind that collector->next never equal to 1, 2 */
> -	if (*type == (uintptr_t)ctor->next)
> -		ctor->next = tagptr_unfold_ptr(t);
> -
> -	ctor->pages[ctor->index++] = tagptr_fold(erofs_vtptr_t, NULL, 0);
> -	return tagptr_unfold_ptr(t);
> -}
> -#endif


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 05/16] erofs: drop the old pagevec approach
  2022-07-15  7:07   ` Yue Hu
@ 2022-07-15  7:19     ` Gao Xiang
  2022-07-15  7:45       ` Yue Hu
  0 siblings, 1 reply; 28+ messages in thread
From: Gao Xiang @ 2022-07-15  7:19 UTC (permalink / raw)
  To: Yue Hu; +Cc: linux-erofs, LKML, zhangwen

On Fri, Jul 15, 2022 at 03:07:37PM +0800, Yue Hu wrote:
> On Thu, 14 Jul 2022 21:20:40 +0800
> Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> 
> > Remove the old pagevec approach but keep z_erofs_page_type for now.
> > It will be reworked in the following commits as well.
> > 
> > Also rename Z_EROFS_NR_INLINE_PAGEVECS as Z_EROFS_INLINE_BVECS with
> > the new value 2 since it's actually enough to bootstrap.
> 
> I notice there are 2 comments as below which still use pagevec, should we
> update it as well?
> 
> [1] 
> * pagevec) since it can be directly decoded without I/O submission.
> [2]
> * for inplace I/O or pagevec (should be processed in strict order.)

Yeah, thanks for the reminder... I will update it in this patch in the next
version.

> 
> > BTW, utils.c includes the needless <pagevec.h>; we can remove it along
> > with this patchset or remove it later.

That is completely different stuff; would you mind submitting a patch
to remove it if needed?

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 05/16] erofs: drop the old pagevec approach
  2022-07-15  7:19     ` Gao Xiang
@ 2022-07-15  7:45       ` Yue Hu
  0 siblings, 0 replies; 28+ messages in thread
From: Yue Hu @ 2022-07-15  7:45 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-erofs, LKML, zhangwen

On Fri, 15 Jul 2022 15:19:46 +0800
Gao Xiang <hsiangkao@linux.alibaba.com> wrote:

> On Fri, Jul 15, 2022 at 03:07:37PM +0800, Yue Hu wrote:
> > On Thu, 14 Jul 2022 21:20:40 +0800
> > Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> >   
> > > Remove the old pagevec approach but keep z_erofs_page_type for now.
> > > It will be reworked in the following commits as well.
> > > 
> > > Also rename Z_EROFS_NR_INLINE_PAGEVECS as Z_EROFS_INLINE_BVECS with
> > > the new value 2 since it's actually enough to bootstrap.  
> > 
> > I notice there are 2 comments as below which still use pagevec, should we
> > update it as well?
> > 
> > [1] 
> > * pagevec) since it can be directly decoded without I/O submission.
> > [2]
> > * for inplace I/O or pagevec (should be processed in strict order.)  
> 
> Yeah, thanks for the reminder... I will update it in this patch in the next
> version.
> 
> > 
> > BTW, utils.c includes the needless <pagevec.h>; we can remove it along
> > with this patchset or remove it later.
> 
> That is completely different stuff; would you mind submitting a patch
> to remove it if needed?

OK, I may submit one later.

> 
> Thanks,
> Gao Xiang



* Re: [PATCH 07/16] erofs: switch compressed_pages[] to bufvec
  2022-07-14 13:20 ` [PATCH 07/16] erofs: switch compressed_pages[] to bufvec Gao Xiang
@ 2022-07-15  7:53   ` Yue Hu
  2022-07-15  7:59     ` Gao Xiang
  0 siblings, 1 reply; 28+ messages in thread
From: Yue Hu @ 2022-07-15  7:53 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-erofs, LKML

On Thu, 14 Jul 2022 21:20:42 +0800
Gao Xiang <hsiangkao@linux.alibaba.com> wrote:

> Convert compressed_pages[] to bufvec in order to avoid using
> page->private to keep onlinepage_index (decompressed offset)
> for inplace I/O pages.
> 
> In the future, we only rely on folio->private to keep a countdown
> to unlock folios and set folio_uptodate.
> 
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
>  fs/erofs/zdata.c | 113 +++++++++++++++++++++++------------------------
>  fs/erofs/zdata.h |   4 +-
>  2 files changed, 57 insertions(+), 60 deletions(-)
> 
> diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> index 757d352bc2c7..f2e3f07baad7 100644
> --- a/fs/erofs/zdata.c
> +++ b/fs/erofs/zdata.c
> @@ -134,7 +134,7 @@ static int z_erofs_create_pcluster_pool(void)
>  
>  	for (pcs = pcluster_pool;
>  	     pcs < pcluster_pool + ARRAY_SIZE(pcluster_pool); ++pcs) {
> -		size = struct_size(a, compressed_pages, pcs->maxpages);
> +		size = struct_size(a, compressed_bvecs, pcs->maxpages);
>  
>  		sprintf(pcs->name, "erofs_pcluster-%u", pcs->maxpages);
>  		pcs->slab = kmem_cache_create(pcs->name, size, 0,
> @@ -287,16 +287,16 @@ struct z_erofs_decompress_frontend {
>  
>  	struct page *candidate_bvpage;
>  	struct z_erofs_pcluster *pcl, *tailpcl;
> -	/* a pointer used to pick up inplace I/O pages */
> -	struct page **icpage_ptr;
>  	z_erofs_next_pcluster_t owned_head;
> -
>  	enum z_erofs_collectmode mode;
>  
>  	bool readahead;
>  	/* used for applying cache strategy on the fly */
>  	bool backmost;
>  	erofs_off_t headoffset;
> +
> +	/* a pointer used to pick up inplace I/O pages */
> +	unsigned int icur;

not a pointer?

>  };
>  
>  #define DECOMPRESS_FRONTEND_INIT(__i) { \
> @@ -319,24 +319,21 @@ static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe,
>  	 */
>  	gfp_t gfp = (mapping_gfp_mask(mc) & ~__GFP_DIRECT_RECLAIM) |
>  			__GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN;
> -	struct page **pages;
> -	pgoff_t index;
> +	unsigned int i;
>  
>  	if (fe->mode < COLLECT_PRIMARY_FOLLOWED)
>  		return;
>  
> -	pages = pcl->compressed_pages;
> -	index = pcl->obj.index;
> -	for (; index < pcl->obj.index + pcl->pclusterpages; ++index, ++pages) {
> +	for (i = 0; i < pcl->pclusterpages; ++i) {
>  		struct page *page;
>  		compressed_page_t t;
>  		struct page *newpage = NULL;
>  
>  		/* the compressed page was loaded before */
> -		if (READ_ONCE(*pages))
> +		if (READ_ONCE(pcl->compressed_bvecs[i].page))
>  			continue;
>  
> -		page = find_get_page(mc, index);
> +		page = find_get_page(mc, pcl->obj.index + i);
>  
>  		if (page) {
>  			t = tag_compressed_page_justfound(page);
> @@ -357,7 +354,8 @@ static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe,
>  			}
>  		}
>  
> -		if (!cmpxchg_relaxed(pages, NULL, tagptr_cast_ptr(t)))
> +		if (!cmpxchg_relaxed(&pcl->compressed_bvecs[i].page, NULL,
> +				     tagptr_cast_ptr(t)))
>  			continue;
>  
>  		if (page)
> @@ -388,7 +386,7 @@ int erofs_try_to_free_all_cached_pages(struct erofs_sb_info *sbi,
>  	 * therefore no need to worry about available decompression users.
>  	 */
>  	for (i = 0; i < pcl->pclusterpages; ++i) {
> -		struct page *page = pcl->compressed_pages[i];
> +		struct page *page = pcl->compressed_bvecs[i].page;
>  
>  		if (!page)
>  			continue;
> @@ -401,7 +399,7 @@ int erofs_try_to_free_all_cached_pages(struct erofs_sb_info *sbi,
>  			continue;
>  
>  		/* barrier is implied in the following 'unlock_page' */
> -		WRITE_ONCE(pcl->compressed_pages[i], NULL);
> +		WRITE_ONCE(pcl->compressed_bvecs[i].page, NULL);
>  		detach_page_private(page);
>  		unlock_page(page);
>  	}
> @@ -411,36 +409,39 @@ int erofs_try_to_free_all_cached_pages(struct erofs_sb_info *sbi,
>  int erofs_try_to_free_cached_page(struct page *page)
>  {
>  	struct z_erofs_pcluster *const pcl = (void *)page_private(page);
> -	int ret = 0;	/* 0 - busy */
> +	int ret, i;
>  
> -	if (erofs_workgroup_try_to_freeze(&pcl->obj, 1)) {
> -		unsigned int i;
> +	if (!erofs_workgroup_try_to_freeze(&pcl->obj, 1))
> +		return 0;
>  
> -		DBG_BUGON(z_erofs_is_inline_pcluster(pcl));
> -		for (i = 0; i < pcl->pclusterpages; ++i) {
> -			if (pcl->compressed_pages[i] == page) {
> -				WRITE_ONCE(pcl->compressed_pages[i], NULL);
> -				ret = 1;
> -				break;
> -			}
> +	ret = 0;
> +	DBG_BUGON(z_erofs_is_inline_pcluster(pcl));
> +	for (i = 0; i < pcl->pclusterpages; ++i) {
> +		if (pcl->compressed_bvecs[i].page == page) {
> +			WRITE_ONCE(pcl->compressed_bvecs[i].page, NULL);
> +			ret = 1;
> +			break;
>  		}
> -		erofs_workgroup_unfreeze(&pcl->obj, 1);
> -
> -		if (ret)
> -			detach_page_private(page);
>  	}
> +	erofs_workgroup_unfreeze(&pcl->obj, 1);
> +	if (ret)
> +		detach_page_private(page);
>  	return ret;
>  }
>  
>  /* page_type must be Z_EROFS_PAGE_TYPE_EXCLUSIVE */
>  static bool z_erofs_try_inplace_io(struct z_erofs_decompress_frontend *fe,
> -				   struct page *page)
> +				   struct z_erofs_bvec *bvec)
>  {
>  	struct z_erofs_pcluster *const pcl = fe->pcl;
>  
> -	while (fe->icpage_ptr > pcl->compressed_pages)
> -		if (!cmpxchg(--fe->icpage_ptr, NULL, page))
> +	while (fe->icur > 0) {
> +		if (!cmpxchg(&pcl->compressed_bvecs[--fe->icur].page,
> +			     NULL, bvec->page)) {
> +			pcl->compressed_bvecs[fe->icur] = *bvec;
>  			return true;
> +		}
> +	}
>  	return false;
>  }
>  
> @@ -454,7 +455,7 @@ static int z_erofs_attach_page(struct z_erofs_decompress_frontend *fe,
>  	if (fe->mode >= COLLECT_PRIMARY &&
>  	    type == Z_EROFS_PAGE_TYPE_EXCLUSIVE) {
>  		/* give priority for inplaceio to use file pages first */
> -		if (z_erofs_try_inplace_io(fe, bvec->page))
> +		if (z_erofs_try_inplace_io(fe, bvec))
>  			return 0;
>  		/* otherwise, check if it can be used as a bvpage */
>  		if (fe->mode >= COLLECT_PRIMARY_FOLLOWED &&
> @@ -648,8 +649,7 @@ static int z_erofs_collector_begin(struct z_erofs_decompress_frontend *fe)
>  	z_erofs_bvec_iter_begin(&fe->biter, &fe->pcl->bvset,
>  				Z_EROFS_INLINE_BVECS, fe->pcl->vcnt);
>  	/* since file-backed online pages are traversed in reverse order */
> -	fe->icpage_ptr = fe->pcl->compressed_pages +
> -			z_erofs_pclusterpages(fe->pcl);
> +	fe->icur = z_erofs_pclusterpages(fe->pcl);
>  	return 0;
>  }
>  
> @@ -769,7 +769,8 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
>  			goto err_out;
>  		}
>  		get_page(fe->map.buf.page);
> -		WRITE_ONCE(fe->pcl->compressed_pages[0], fe->map.buf.page);
> +		WRITE_ONCE(fe->pcl->compressed_bvecs[0].page,
> +			   fe->map.buf.page);
>  		fe->mode = COLLECT_PRIMARY_FOLLOWED_NOINPLACE;
>  	} else {
>  		/* bind cache first when cached decompression is preferred */
> @@ -927,8 +928,9 @@ static struct page **z_erofs_parse_in_bvecs(struct erofs_sb_info *sbi,
>  	*overlapped = false;
>  
>  	for (i = 0; i < pclusterpages; ++i) {
> -		unsigned int pagenr;
> -		struct page *page = pcl->compressed_pages[i];
> +		struct z_erofs_bvec *bvec = &pcl->compressed_bvecs[i];
> +		struct page *page = bvec->page;
> +		unsigned int pgnr;
>  
>  		/* compressed pages ought to be present before decompressing */
>  		if (!page) {
> @@ -951,21 +953,15 @@ static struct page **z_erofs_parse_in_bvecs(struct erofs_sb_info *sbi,
>  				continue;
>  			}
>  
> -			/*
> -			 * only if non-head page can be selected
> -			 * for inplace decompression
> -			 */
> -			pagenr = z_erofs_onlinepage_index(page);
> -
> -			DBG_BUGON(pagenr >= pcl->nr_pages);
> -			if (pages[pagenr]) {
> +			pgnr = (bvec->offset + pcl->pageofs_out) >> PAGE_SHIFT;
> +			DBG_BUGON(pgnr >= pcl->nr_pages);
> +			if (pages[pgnr]) {
>  				DBG_BUGON(1);
> -				SetPageError(pages[pagenr]);
> -				z_erofs_onlinepage_endio(pages[pagenr]);
> +				SetPageError(pages[pgnr]);
> +				z_erofs_onlinepage_endio(pages[pgnr]);
>  				err = -EFSCORRUPTED;
>  			}
> -			pages[pagenr] = page;
> -
> +			pages[pgnr] = page;
>  			*overlapped = true;
>  		}
>  
> @@ -1067,19 +1063,19 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
>  out:
>  	/* must handle all compressed pages before actual file pages */
>  	if (z_erofs_is_inline_pcluster(pcl)) {
> -		page = pcl->compressed_pages[0];
> -		WRITE_ONCE(pcl->compressed_pages[0], NULL);
> +		page = pcl->compressed_bvecs[0].page;
> +		WRITE_ONCE(pcl->compressed_bvecs[0].page, NULL);
>  		put_page(page);
>  	} else {
>  		for (i = 0; i < pclusterpages; ++i) {
> -			page = pcl->compressed_pages[i];
> +			page = pcl->compressed_bvecs[i].page;
>  
>  			if (erofs_page_is_managed(sbi, page))
>  				continue;
>  
>  			/* recycle all individual short-lived pages */
>  			(void)z_erofs_put_shortlivedpage(pagepool, page);
> -			WRITE_ONCE(pcl->compressed_pages[i], NULL);
> +			WRITE_ONCE(pcl->compressed_bvecs[i].page, NULL);
>  		}
>  	}
>  	kfree(compressed_pages);
> @@ -1193,7 +1189,7 @@ static struct page *pickup_page_for_submission(struct z_erofs_pcluster *pcl,
>  	int justfound;
>  
>  repeat:
> -	page = READ_ONCE(pcl->compressed_pages[nr]);
> +	page = READ_ONCE(pcl->compressed_bvecs[nr].page);
>  	oldpage = page;
>  
>  	if (!page)
> @@ -1209,7 +1205,7 @@ static struct page *pickup_page_for_submission(struct z_erofs_pcluster *pcl,
>  	 * otherwise, it will go inplace I/O path instead.
>  	 */
>  	if (page->private == Z_EROFS_PREALLOCATED_PAGE) {
> -		WRITE_ONCE(pcl->compressed_pages[nr], page);
> +		WRITE_ONCE(pcl->compressed_bvecs[nr].page, page);
>  		set_page_private(page, 0);
>  		tocache = true;
>  		goto out_tocache;
> @@ -1235,14 +1231,14 @@ static struct page *pickup_page_for_submission(struct z_erofs_pcluster *pcl,
>  
>  	/* the page is still in manage cache */
>  	if (page->mapping == mc) {
> -		WRITE_ONCE(pcl->compressed_pages[nr], page);
> +		WRITE_ONCE(pcl->compressed_bvecs[nr].page, page);
>  
>  		ClearPageError(page);
>  		if (!PagePrivate(page)) {
>  			/*
>  			 * impossible to be !PagePrivate(page) for
>  			 * the current restriction as well if
> -			 * the page is already in compressed_pages[].
> +			 * the page is already in compressed_bvecs[].
>  			 */
>  			DBG_BUGON(!justfound);
>  
> @@ -1271,7 +1267,8 @@ static struct page *pickup_page_for_submission(struct z_erofs_pcluster *pcl,
>  	put_page(page);
>  out_allocpage:
>  	page = erofs_allocpage(pagepool, gfp | __GFP_NOFAIL);
> -	if (oldpage != cmpxchg(&pcl->compressed_pages[nr], oldpage, page)) {
> +	if (oldpage != cmpxchg(&pcl->compressed_bvecs[nr].page,
> +			       oldpage, page)) {
>  		erofs_pagepool_add(pagepool, page);
>  		cond_resched();
>  		goto repeat;
> diff --git a/fs/erofs/zdata.h b/fs/erofs/zdata.h
> index a755c5a44d87..a70f1b73e901 100644
> --- a/fs/erofs/zdata.h
> +++ b/fs/erofs/zdata.h
> @@ -87,8 +87,8 @@ struct z_erofs_pcluster {
>  	/* I: compression algorithm format */
>  	unsigned char algorithmformat;
>  
> -	/* A: compressed pages (can be cached or inplaced pages) */
> -	struct page *compressed_pages[];
> +	/* A: compressed bvecs (can be cached or inplaced pages) */
> +	struct z_erofs_bvec compressed_bvecs[];
>  };
>  
>  /* let's avoid the valid 32-bit kernel addresses */



* Re: [PATCH 07/16] erofs: switch compressed_pages[] to bufvec
  2022-07-15  7:53   ` Yue Hu
@ 2022-07-15  7:59     ` Gao Xiang
  0 siblings, 0 replies; 28+ messages in thread
From: Gao Xiang @ 2022-07-15  7:59 UTC (permalink / raw)
  To: Yue Hu; +Cc: linux-erofs, LKML

On Fri, Jul 15, 2022 at 03:53:23PM +0800, Yue Hu wrote:
> On Thu, 14 Jul 2022 21:20:42 +0800
> Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> 
> > Convert compressed_pages[] to bufvec in order to avoid using
> > page->private to keep onlinepage_index (decompressed offset)
> > for inplace I/O pages.
> > 
> > In the future, we only rely on folio->private to keep a countdown
> > to unlock folios and set folio_uptodate.
> > 
> > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > ---
> >  fs/erofs/zdata.c | 113 +++++++++++++++++++++++------------------------
> >  fs/erofs/zdata.h |   4 +-
> >  2 files changed, 57 insertions(+), 60 deletions(-)
> > 
> > diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> > index 757d352bc2c7..f2e3f07baad7 100644
> > --- a/fs/erofs/zdata.c
> > +++ b/fs/erofs/zdata.c
> > @@ -134,7 +134,7 @@ static int z_erofs_create_pcluster_pool(void)
> >  
> >  	for (pcs = pcluster_pool;
> >  	     pcs < pcluster_pool + ARRAY_SIZE(pcluster_pool); ++pcs) {
> > -		size = struct_size(a, compressed_pages, pcs->maxpages);
> > +		size = struct_size(a, compressed_bvecs, pcs->maxpages);
> >  
> >  		sprintf(pcs->name, "erofs_pcluster-%u", pcs->maxpages);
> >  		pcs->slab = kmem_cache_create(pcs->name, size, 0,
> > @@ -287,16 +287,16 @@ struct z_erofs_decompress_frontend {
> >  
> >  	struct page *candidate_bvpage;
> >  	struct z_erofs_pcluster *pcl, *tailpcl;
> > -	/* a pointer used to pick up inplace I/O pages */
> > -	struct page **icpage_ptr;
> >  	z_erofs_next_pcluster_t owned_head;
> > -
> >  	enum z_erofs_collectmode mode;
> >  
> >  	bool readahead;
> >  	/* used for applying cache strategy on the fly */
> >  	bool backmost;
> >  	erofs_off_t headoffset;
> > +
> > +	/* a pointer used to pick up inplace I/O pages */
> > +	unsigned int icur;
> 
> not a pointer?

Here `pointer' means a cursor, or a so-called sub-index.

Thanks,
Gao Xiang


end of thread, other threads:[~2022-07-15  7:59 UTC | newest]

Thread overview: 28+ messages
2022-07-14 13:20 [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
2022-07-14 13:20 ` [PATCH 01/16] erofs: get rid of unneeded `inode', `map' and `sb' Gao Xiang
2022-07-15  6:20   ` Yue Hu
2022-07-14 13:20 ` [PATCH 02/16] erofs: clean up z_erofs_collector_begin() Gao Xiang
2022-07-15  6:22   ` Yue Hu
2022-07-14 13:20 ` [PATCH 03/16] erofs: introduce `z_erofs_parse_out_bvecs()' Gao Xiang
2022-07-15  6:22   ` Yue Hu
2022-07-14 13:20 ` [PATCH 04/16] erofs: introduce bufvec to store decompressed buffers Gao Xiang
2022-07-15  6:29   ` Yue Hu
2022-07-15  6:36     ` Gao Xiang
2022-07-14 13:20 ` [PATCH 05/16] erofs: drop the old pagevec approach Gao Xiang
2022-07-15  7:07   ` Yue Hu
2022-07-15  7:19     ` Gao Xiang
2022-07-15  7:45       ` Yue Hu
2022-07-14 13:20 ` [PATCH 06/16] erofs: introduce `z_erofs_parse_in_bvecs' Gao Xiang
2022-07-14 13:20 ` [PATCH 07/16] erofs: switch compressed_pages[] to bufvec Gao Xiang
2022-07-15  7:53   ` Yue Hu
2022-07-15  7:59     ` Gao Xiang
2022-07-14 13:20 ` [PATCH 08/16] erofs: rework online page handling Gao Xiang
2022-07-14 13:20 ` [PATCH 09/16] erofs: get rid of `enum z_erofs_page_type' Gao Xiang
2022-07-14 13:20 ` [PATCH 10/16] erofs: clean up `enum z_erofs_collectmode' Gao Xiang
2022-07-14 13:20 ` [PATCH 11/16] erofs: get rid of `z_pagemap_global' Gao Xiang
2022-07-14 13:20 ` [PATCH 12/16] erofs: introduce struct z_erofs_decompress_backend Gao Xiang
2022-07-14 13:20 ` [PATCH 13/16] erofs: try to leave (de)compressed_pages on stack if possible Gao Xiang
2022-07-14 13:20 ` [PATCH 14/16] erofs: introduce z_erofs_do_decompressed_bvec() Gao Xiang
2022-07-14 13:20 ` [PATCH 15/16] erofs: record the longest decompressed size in this round Gao Xiang
2022-07-14 13:20 ` [PATCH 16/16] erofs: introduce multi-reference pclusters (fully-referenced) Gao Xiang
2022-07-14 13:38 ` [PATCH 00/16] erofs: prepare for folios, duplication and kill PG_error Gao Xiang
