From: Gao Xiang <xiang@kernel.org> To: linux-erofs@lists.ozlabs.org, Chao Yu <chao@kernel.org> Cc: LKML <linux-kernel@vger.kernel.org>, Gao Xiang <xiang@kernel.org>, stable@vger.kernel.org Subject: [PATCH] erofs: fix unsafe pagevec reuse of hooked pclusters Date: Thu, 4 Nov 2021 01:49:53 +0800 [thread overview] Message-ID: <20211103174953.3209-1-xiang@kernel.org> (raw) There are pclusters in runtime marked with Z_EROFS_PCLUSTER_TAIL before actual I/O submission. Thus, the submission chain can be extended if the following pcluster chain hook such tail pcluster. As the related comment mentioned, if some page is made of a hooked pcluster and another followed pcluster, it can be reused for in-place I/O (since I/O should be submitted anyway): _______________________________________________________________ | tail (partial) page | head (partial) page | |_____PRIMARY_HOOKED___|____________PRIMARY_FOLLOWED____________| However, it's by no means safe to reuse as pagevec since if such PRIMARY_HOOKED pclusters finally move into bypass chain without I/O submission. It's somewhat hard to reproduce with LZ4 and I just found it by ro_fsstress a LZMA image for long time. I'm going to clean up related code together with multi-page folio adaption in the next few months. Let's address it directly for easier backporting for now. Call trace for reference: z_erofs_decompress_pcluster+0x10a/0x8a0 [erofs] z_erofs_decompress_queue.isra.36+0x3c/0x60 [erofs] z_erofs_runqueue+0x5f3/0x840 [erofs] z_erofs_readahead+0x1e8/0x320 [erofs] read_pages+0x91/0x270 page_cache_ra_unbounded+0x18b/0x240 filemap_get_pages+0x10a/0x5f0 filemap_read+0xa9/0x330 new_sync_read+0x11b/0x1a0 vfs_read+0xf1/0x190 Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Cc: <stable@vger.kernel.org> # 4.19+ Signed-off-by: Gao Xiang <xiang@kernel.org> --- fs/erofs/zdata.c | 13 +++++++------ fs/erofs/zpvec.h | 13 ++++++++++--- 2 files changed, 17 insertions(+), 9 deletions(-) diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c index 11c7a1aaebad..eb51df4a9f77 100644 --- a/fs/erofs/zdata.c +++ b/fs/erofs/zdata.c @@ -373,8 +373,8 @@ static bool z_erofs_try_inplace_io(struct z_erofs_collector *clt, /* callers must be with collection lock held */ static int z_erofs_attach_page(struct z_erofs_collector *clt, - struct page *page, - enum z_erofs_page_type type) + struct page *page, enum z_erofs_page_type type, + bool pvec_safereuse) { int ret; @@ -384,9 +384,9 @@ static int z_erofs_attach_page(struct z_erofs_collector *clt, z_erofs_try_inplace_io(clt, page)) return 0; - ret = z_erofs_pagevec_enqueue(&clt->vector, page, type); + ret = z_erofs_pagevec_enqueue(&clt->vector, page, type, + pvec_safereuse); clt->cl->vcnt += (unsigned int)ret; - return ret ? 0 : -EAGAIN; } @@ -729,7 +729,8 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe, tight &= (clt->mode >= COLLECT_PRIMARY_FOLLOWED); retry: - err = z_erofs_attach_page(clt, page, page_type); + err = z_erofs_attach_page(clt, page, page_type, + clt->mode >= COLLECT_PRIMARY_FOLLOWED); /* should allocate an additional short-lived page for pagevec */ if (err == -EAGAIN) { struct page *const newpage = @@ -737,7 +738,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe, set_page_private(newpage, Z_EROFS_SHORTLIVED_PAGE); err = z_erofs_attach_page(clt, newpage, - Z_EROFS_PAGE_TYPE_EXCLUSIVE); + Z_EROFS_PAGE_TYPE_EXCLUSIVE, true); if (!err) goto retry; } diff --git a/fs/erofs/zpvec.h b/fs/erofs/zpvec.h index dfd7fe0503bb..b05464f4a808 100644 --- a/fs/erofs/zpvec.h +++ b/fs/erofs/zpvec.h @@ -106,11 +106,18 @@ static inline void z_erofs_pagevec_ctor_init(struct z_erofs_pagevec_ctor *ctor, static inline bool z_erofs_pagevec_enqueue(struct z_erofs_pagevec_ctor *ctor, struct page *page, - enum z_erofs_page_type type) + enum z_erofs_page_type type, + bool pvec_safereuse) { - if (!ctor->next && type) - if (ctor->index + 1 == ctor->nr) + if (!ctor->next) { + /* some pages cannot be reused as pvec safely without I/O */ + if (type == Z_EROFS_PAGE_TYPE_EXCLUSIVE && !pvec_safereuse) + type = Z_EROFS_VLE_PAGE_TYPE_TAIL_SHARED; + + if (type != Z_EROFS_PAGE_TYPE_EXCLUSIVE && + ctor->index + 1 == ctor->nr) return false; + } if (ctor->index >= ctor->nr) z_erofs_pagevec_ctor_pagedown(ctor, false); -- 2.20.1
WARNING: multiple messages have this Message-ID (diff)
From: Gao Xiang <xiang@kernel.org> To: linux-erofs@lists.ozlabs.org, Chao Yu <chao@kernel.org> Cc: LKML <linux-kernel@vger.kernel.org>, stable@vger.kernel.org Subject: [PATCH] erofs: fix unsafe pagevec reuse of hooked pclusters Date: Thu, 4 Nov 2021 01:49:53 +0800 [thread overview] Message-ID: <20211103174953.3209-1-xiang@kernel.org> (raw) There are pclusters in runtime marked with Z_EROFS_PCLUSTER_TAIL before actual I/O submission. Thus, the submission chain can be extended if the following pcluster chain hook such tail pcluster. As the related comment mentioned, if some page is made of a hooked pcluster and another followed pcluster, it can be reused for in-place I/O (since I/O should be submitted anyway): _______________________________________________________________ | tail (partial) page | head (partial) page | |_____PRIMARY_HOOKED___|____________PRIMARY_FOLLOWED____________| However, it's by no means safe to reuse as pagevec since if such PRIMARY_HOOKED pclusters finally move into bypass chain without I/O submission. It's somewhat hard to reproduce with LZ4 and I just found it by ro_fsstress a LZMA image for long time. I'm going to clean up related code together with multi-page folio adaption in the next few months. Let's address it directly for easier backporting for now. Call trace for reference: z_erofs_decompress_pcluster+0x10a/0x8a0 [erofs] z_erofs_decompress_queue.isra.36+0x3c/0x60 [erofs] z_erofs_runqueue+0x5f3/0x840 [erofs] z_erofs_readahead+0x1e8/0x320 [erofs] read_pages+0x91/0x270 page_cache_ra_unbounded+0x18b/0x240 filemap_get_pages+0x10a/0x5f0 filemap_read+0xa9/0x330 new_sync_read+0x11b/0x1a0 vfs_read+0xf1/0x190 Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Cc: <stable@vger.kernel.org> # 4.19+ Signed-off-by: Gao Xiang <xiang@kernel.org> --- fs/erofs/zdata.c | 13 +++++++------ fs/erofs/zpvec.h | 13 ++++++++++--- 2 files changed, 17 insertions(+), 9 deletions(-) diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c index 11c7a1aaebad..eb51df4a9f77 100644 --- a/fs/erofs/zdata.c +++ b/fs/erofs/zdata.c @@ -373,8 +373,8 @@ static bool z_erofs_try_inplace_io(struct z_erofs_collector *clt, /* callers must be with collection lock held */ static int z_erofs_attach_page(struct z_erofs_collector *clt, - struct page *page, - enum z_erofs_page_type type) + struct page *page, enum z_erofs_page_type type, + bool pvec_safereuse) { int ret; @@ -384,9 +384,9 @@ static int z_erofs_attach_page(struct z_erofs_collector *clt, z_erofs_try_inplace_io(clt, page)) return 0; - ret = z_erofs_pagevec_enqueue(&clt->vector, page, type); + ret = z_erofs_pagevec_enqueue(&clt->vector, page, type, + pvec_safereuse); clt->cl->vcnt += (unsigned int)ret; - return ret ? 0 : -EAGAIN; } @@ -729,7 +729,8 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe, tight &= (clt->mode >= COLLECT_PRIMARY_FOLLOWED); retry: - err = z_erofs_attach_page(clt, page, page_type); + err = z_erofs_attach_page(clt, page, page_type, + clt->mode >= COLLECT_PRIMARY_FOLLOWED); /* should allocate an additional short-lived page for pagevec */ if (err == -EAGAIN) { struct page *const newpage = @@ -737,7 +738,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe, set_page_private(newpage, Z_EROFS_SHORTLIVED_PAGE); err = z_erofs_attach_page(clt, newpage, - Z_EROFS_PAGE_TYPE_EXCLUSIVE); + Z_EROFS_PAGE_TYPE_EXCLUSIVE, true); if (!err) goto retry; } diff --git a/fs/erofs/zpvec.h b/fs/erofs/zpvec.h index dfd7fe0503bb..b05464f4a808 100644 --- a/fs/erofs/zpvec.h +++ b/fs/erofs/zpvec.h @@ -106,11 +106,18 @@ static inline void z_erofs_pagevec_ctor_init(struct z_erofs_pagevec_ctor *ctor, static inline bool z_erofs_pagevec_enqueue(struct z_erofs_pagevec_ctor *ctor, struct page *page, - enum z_erofs_page_type type) + enum z_erofs_page_type type, + bool pvec_safereuse) { - if (!ctor->next && type) - if (ctor->index + 1 == ctor->nr) + if (!ctor->next) { + /* some pages cannot be reused as pvec safely without I/O */ + if (type == Z_EROFS_PAGE_TYPE_EXCLUSIVE && !pvec_safereuse) + type = Z_EROFS_VLE_PAGE_TYPE_TAIL_SHARED; + + if (type != Z_EROFS_PAGE_TYPE_EXCLUSIVE && + ctor->index + 1 == ctor->nr) return false; + } if (ctor->index >= ctor->nr) z_erofs_pagevec_ctor_pagedown(ctor, false); -- 2.20.1
next reply other threads:[~2021-11-03 17:50 UTC|newest] Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-11-03 17:49 Gao Xiang [this message] 2021-11-03 17:49 ` [PATCH] erofs: fix unsafe pagevec reuse of hooked pclusters Gao Xiang 2021-11-03 18:20 ` [PATCH v2] " Gao Xiang 2021-11-03 18:20 ` Gao Xiang 2021-11-05 4:28 ` Chao Yu
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20211103174953.3209-1-xiang@kernel.org \ --to=xiang@kernel.org \ --cc=chao@kernel.org \ --cc=linux-erofs@lists.ozlabs.org \ --cc=linux-kernel@vger.kernel.org \ --cc=stable@vger.kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.