linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] erofs: better comment z_erofs_readahead()
@ 2021-07-05 18:32 Gao Xiang
  2021-07-05 18:32 ` [RFC PATCH 2/2] erofs: directly traverse pages in z_erofs_readahead() Gao Xiang
  0 siblings, 1 reply; 4+ messages in thread
From: Gao Xiang @ 2021-07-05 18:32 UTC (permalink / raw)
  To: linux-erofs; +Cc: Matthew Wilcox, LKML, Gao Xiang

Add some words about the traversal order and its pagepool usage.

Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Gao Xiang <xiang@kernel.org>
---
 fs/erofs/zdata.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index cb4d0889eca9..054b9839e9db 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1424,6 +1424,16 @@ static void z_erofs_readahead(struct readahead_control *rac)
 	f.readahead = true;
 	f.headoffset = readahead_pos(rac);
 
+	/*
+	 * All pages are locked in the forward order in advance, so directly
+	 * traverse pages in the reverse order since:
+	 *  1) more effective to get each extent start offset, calculate partial
+	 *     decompressed length w/o knowing the full extent length (which is
+	 *     more metadata costly). If traversing in the normal order, it's
+	 *     mandatory to get full extent length one-by-one.
+	 *  2) submission chain can be then in the forward order since
+	 *     pclusters are all inserted at head.
+	 */
 	while ((page = readahead_page(rac))) {
 		prefetchw(&page->flags);
 
@@ -1460,7 +1470,7 @@ static void z_erofs_readahead(struct readahead_control *rac)
 	if (f.map.mpage)
 		put_page(f.map.mpage);
 
-	/* clean up the remaining free pages */
+	/* drain the on-stack pagepool with unused non-LRU temporary pages */
 	put_pages_list(&pagepool);
 }
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [RFC PATCH 2/2] erofs: directly traverse pages in z_erofs_readahead()
  2021-07-05 18:32 [PATCH 1/2] erofs: better comment z_erofs_readahead() Gao Xiang
@ 2021-07-05 18:32 ` Gao Xiang
  2021-07-05 20:15   ` Matthew Wilcox
  0 siblings, 1 reply; 4+ messages in thread
From: Gao Xiang @ 2021-07-05 18:32 UTC (permalink / raw)
  To: linux-erofs; +Cc: Matthew Wilcox, LKML, Gao Xiang

In that way, pages can be accessed directly with xarray.

Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Gao Xiang <xiang@kernel.org>
---
Just an open-coded PoC one, since that is what EROFS actually needs but
without the iteration API (see [PATCH 1/2] as well.)

 fs/erofs/zdata.c | 32 +++++++++++++-------------------
 1 file changed, 13 insertions(+), 19 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 054b9839e9db..210b2965ecc4 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1416,7 +1416,8 @@ static void z_erofs_readahead(struct readahead_control *rac)
 	bool sync = (sbi->ctx.readahead_sync_decompress &&
 			nr_pages <= sbi->ctx.max_sync_decompress_pages);
 	struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
-	struct page *page, *head = NULL;
+	struct page *page;
+	pgoff_t index;
 	LIST_HEAD(pagepool);
 
 	trace_erofs_readpages(inode, readahead_index(rac), nr_pages, false);
@@ -1434,26 +1435,19 @@ static void z_erofs_readahead(struct readahead_control *rac)
 	 *  2) submission chain can be then in the forward order since
 	 *     pclusters are all inserted at head.
 	 */
-	while ((page = readahead_page(rac))) {
-		prefetchw(&page->flags);
-
-		/*
-		 * A pure asynchronous readahead is indicated if
-		 * a PG_readahead marked page is hitted at first.
-		 * Let's also do asynchronous decompression for this case.
-		 */
-		sync &= !(PageReadahead(page) && !head);
-
-		set_page_private(page, (unsigned long)head);
-		head = page;
-	}
-
-	while (head) {
-		struct page *page = head;
+	index = rac->_index + rac->_nr_pages;
+	while (rac->_nr_pages) {
+		struct page *head;
 		int err;
 
-		/* traversal in reverse order */
-		head = (void *)page_private(page);
+		--rac->_nr_pages;
+		page = xa_load(&rac->mapping->i_pages, --index);
+		/* XXX: very incomplete thp support */
+		head = thp_head(page);
+		if (head != page) {
+			index -= page->index - head->index;
+			page = head;
+		}
 
 		err = z_erofs_do_read_page(&f, page, &pagepool);
 		if (err)
-- 
2.20.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH 2/2] erofs: directly traverse pages in z_erofs_readahead()
  2021-07-05 18:32 ` [RFC PATCH 2/2] erofs: directly traverse pages in z_erofs_readahead() Gao Xiang
@ 2021-07-05 20:15   ` Matthew Wilcox
  2021-07-05 22:48     ` Gao Xiang
  0 siblings, 1 reply; 4+ messages in thread
From: Matthew Wilcox @ 2021-07-05 20:15 UTC (permalink / raw)
  To: Gao Xiang; +Cc: linux-erofs, LKML

On Tue, Jul 06, 2021 at 02:32:53AM +0800, Gao Xiang wrote:
> In that way, pages can be accessed directly with xarray.

I didn't mean "open code readahead_page()".  I meant "Wouldn't it be
great if z_erofs_do_read_page() used readahead_expand() in order to
allocate the extra pages in the extents that cover the start and end of
the requested chunk".  That way the pages would already be in the page
cache for subsequent reads.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH 2/2] erofs: directly traverse pages in z_erofs_readahead()
  2021-07-05 20:15   ` Matthew Wilcox
@ 2021-07-05 22:48     ` Gao Xiang
  0 siblings, 0 replies; 4+ messages in thread
From: Gao Xiang @ 2021-07-05 22:48 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-erofs, LKML

Hi Matthew,

On Mon, Jul 05, 2021 at 09:15:26PM +0100, Matthew Wilcox wrote:
> On Tue, Jul 06, 2021 at 02:32:53AM +0800, Gao Xiang wrote:
> > In that way, pages can be accessed directly with xarray.
> 
> I didn't mean "open code readahead_page()".  I meant "Wouldn't it be
> great if z_erofs_do_read_page() used readahead_expand() in order to
> allocate the extra pages in the extents that cover the start and end of
> the requested chunk".  That way the pages would already be in the page
> cache for subsequent reads.

Yes, I understand that idea. readahead_expand() can be used to
cache the whole requested decompressed chunk. Currently, EROFS tends
to cache compressed data for incomplete requested chunks instead in the
managed inode if cache mode is enabled since LZ4 compressed data is
more effective for caching to save I/O than uncompressed data
(considering LZ4 decompression speed, which is somewhat like zcache.)

For now, EROFS LZMA support is also on the way (but recently I
have to work on the DAX support in advance.) LZMA is somewhat slow
algorithm but with much better compression ratio. I think it should
decompress all requested chunks to save the overall decompression
time. So readahead_expand() is quite useful with EROFS LZMA cases,
I will investigate it for LZMA-specific cases...

BTW, from the patch [2/2] itself, it there some chance to extend
readahead_page() or some new API like this? So the original dance
can be avoided.

Some comments for the iteration behavior itself is always useful,
anyway...

Thanks,
Gao Xiang

> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-07-05 22:52 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-05 18:32 [PATCH 1/2] erofs: better comment z_erofs_readahead() Gao Xiang
2021-07-05 18:32 ` [RFC PATCH 2/2] erofs: directly traverse pages in z_erofs_readahead() Gao Xiang
2021-07-05 20:15   ` Matthew Wilcox
2021-07-05 22:48     ` Gao Xiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).