* [PATCH v5 0/3] Implement readahead for squashfs
@ 2022-06-06 15:03 Hsin-Yi Wang
  2022-06-06 15:03 ` [PATCH v5 1/3] Revert "squashfs: provide backing_dev_info in order to disable read-ahead" Hsin-Yi Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Hsin-Yi Wang @ 2022-06-06 15:03 UTC (permalink / raw)
  To: Phillip Lougher, Matthew Wilcox, Xiongwei Song, Marek Szyprowski,
	Andrew Morton
  Cc: Zheng Liang, Zhang Yi, Hou Tao, Miao Xie, linux-mm @ kvack . org,
	squashfs-devel @ lists . sourceforge . net, linux-kernel

Commit c1f6925e1091 ("mm: put readahead pages in cache earlier") requires
filesystems to implement the readahead callback; otherwise there will be a
performance regression.

Commit 9eec1d897139 ("squashfs: provide backing_dev_info in order to
disable read-ahead") mitigates the performance drop for squashfs by
disabling readahead for it.

This series implements the readahead callback for squashfs. The previous
discussions are in [1] and [2].

[1] https://lore.kernel.org/all/CAJMQK-g9G6KQmH-V=BRGX0swZji9Wxe_2c7ht-MMAapdFy2pXw@mail.gmail.com/T/
[2] https://lore.kernel.org/linux-mm/Yn5Yij9pRPCzDozt@casper.infradead.org/t/#m4af4473b94f98a4996cb11756b633a07e5e059d1

Hsin-Yi Wang (2):
  Revert "squashfs: provide backing_dev_info in order to disable
    read-ahead"
  squashfs: implement readahead

Phillip Lougher (1):
  squashfs: always build "file direct" version of page actor

 fs/squashfs/Makefile     |   4 +-
 fs/squashfs/file.c       | 124 ++++++++++++++++++++++++++++++++++++++-
 fs/squashfs/page_actor.h |  41 -------------
 fs/squashfs/super.c      |  33 -----------
 4 files changed, 125 insertions(+), 77 deletions(-)

-- 
2.36.1.255.ge46751e96f-goog



* [PATCH v5 1/3] Revert "squashfs: provide backing_dev_info in order to disable read-ahead"
  2022-06-06 15:03 [PATCH v5 0/3] Implement readahead for squashfs Hsin-Yi Wang
@ 2022-06-06 15:03 ` Hsin-Yi Wang
  2022-06-06 15:03 ` [PATCH v5 2/3] squashfs: always build "file direct" version of page actor Hsin-Yi Wang
  2022-06-06 15:03 ` [PATCH v5 3/3] squashfs: implement readahead Hsin-Yi Wang
  2 siblings, 0 replies; 16+ messages in thread
From: Hsin-Yi Wang @ 2022-06-06 15:03 UTC (permalink / raw)
  To: Phillip Lougher, Matthew Wilcox, Xiongwei Song, Marek Szyprowski,
	Andrew Morton
  Cc: Zheng Liang, Zhang Yi, Hou Tao, Miao Xie, linux-mm @ kvack . org,
	squashfs-devel @ lists . sourceforge . net, linux-kernel

This reverts commit 9eec1d897139e5de287af5d559a02b811b844d82.

Revert the change that disabled readahead for squashfs, since the
readahead callback for squashfs is now implemented.

Suggested-by: Xiongwei Song <Xiongwei.Song@windriver.com>
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
---
 fs/squashfs/super.c | 33 ---------------------------------
 1 file changed, 33 deletions(-)

diff --git a/fs/squashfs/super.c b/fs/squashfs/super.c
index 6d594ba2ed28..32565dafa7f3 100644
--- a/fs/squashfs/super.c
+++ b/fs/squashfs/super.c
@@ -29,7 +29,6 @@
 #include <linux/module.h>
 #include <linux/magic.h>
 #include <linux/xattr.h>
-#include <linux/backing-dev.h>
 
 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
@@ -113,24 +112,6 @@ static const struct squashfs_decompressor *supported_squashfs_filesystem(
 	return decompressor;
 }
 
-static int squashfs_bdi_init(struct super_block *sb)
-{
-	int err;
-	unsigned int major = MAJOR(sb->s_dev);
-	unsigned int minor = MINOR(sb->s_dev);
-
-	bdi_put(sb->s_bdi);
-	sb->s_bdi = &noop_backing_dev_info;
-
-	err = super_setup_bdi_name(sb, "squashfs_%u_%u", major, minor);
-	if (err)
-		return err;
-
-	sb->s_bdi->ra_pages = 0;
-	sb->s_bdi->io_pages = 0;
-
-	return 0;
-}
 
 static int squashfs_fill_super(struct super_block *sb, struct fs_context *fc)
 {
@@ -146,20 +127,6 @@ static int squashfs_fill_super(struct super_block *sb, struct fs_context *fc)
 
 	TRACE("Entered squashfs_fill_superblock\n");
 
-	/*
-	 * squashfs provides 'backing_dev_info' in order to disable read-ahead. For
-	 * squashfs, I/O is not deferred, it is done immediately in read_folio,
-	 * which means the user would always have to wait their own I/O. So the effect
-	 * of readahead is very weak for squashfs. squashfs_bdi_init will set
-	 * sb->s_bdi->ra_pages and sb->s_bdi->io_pages to 0 and close readahead for
-	 * squashfs.
-	 */
-	err = squashfs_bdi_init(sb);
-	if (err) {
-		errorf(fc, "squashfs init bdi failed");
-		return err;
-	}
-
 	sb->s_fs_info = kzalloc(sizeof(*msblk), GFP_KERNEL);
 	if (sb->s_fs_info == NULL) {
 		ERROR("Failed to allocate squashfs_sb_info\n");
-- 
2.36.1.255.ge46751e96f-goog



* [PATCH v5 2/3] squashfs: always build "file direct" version of page actor
  2022-06-06 15:03 [PATCH v5 0/3] Implement readahead for squashfs Hsin-Yi Wang
  2022-06-06 15:03 ` [PATCH v5 1/3] Revert "squashfs: provide backing_dev_info in order to disable read-ahead" Hsin-Yi Wang
@ 2022-06-06 15:03 ` Hsin-Yi Wang
  2022-06-06 15:03 ` [PATCH v5 3/3] squashfs: implement readahead Hsin-Yi Wang
  2 siblings, 0 replies; 16+ messages in thread
From: Hsin-Yi Wang @ 2022-06-06 15:03 UTC (permalink / raw)
  To: Phillip Lougher, Matthew Wilcox, Xiongwei Song, Marek Szyprowski,
	Andrew Morton
  Cc: Zheng Liang, Zhang Yi, Hou Tao, Miao Xie, linux-mm @ kvack . org,
	squashfs-devel @ lists . sourceforge . net, linux-kernel

From: Phillip Lougher <phillip@squashfs.org.uk>

squashfs_readahead() uses the "file direct" version of the page
actor, so build it unconditionally.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
---
 fs/squashfs/Makefile     |  4 ++--
 fs/squashfs/page_actor.h | 41 ----------------------------------------
 2 files changed, 2 insertions(+), 43 deletions(-)

diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile
index 7bd9b8b856d0..477c89a519ee 100644
--- a/fs/squashfs/Makefile
+++ b/fs/squashfs/Makefile
@@ -5,9 +5,9 @@
 
 obj-$(CONFIG_SQUASHFS) += squashfs.o
 squashfs-y += block.o cache.o dir.o export.o file.o fragment.o id.o inode.o
-squashfs-y += namei.o super.o symlink.o decompressor.o
+squashfs-y += namei.o super.o symlink.o decompressor.o page_actor.o
 squashfs-$(CONFIG_SQUASHFS_FILE_CACHE) += file_cache.o
-squashfs-$(CONFIG_SQUASHFS_FILE_DIRECT) += file_direct.o page_actor.o
+squashfs-$(CONFIG_SQUASHFS_FILE_DIRECT) += file_direct.o
 squashfs-$(CONFIG_SQUASHFS_DECOMP_SINGLE) += decompressor_single.o
 squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI) += decompressor_multi.o
 squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU) += decompressor_multi_percpu.o
diff --git a/fs/squashfs/page_actor.h b/fs/squashfs/page_actor.h
index 2e3073ace009..26e07373af8a 100644
--- a/fs/squashfs/page_actor.h
+++ b/fs/squashfs/page_actor.h
@@ -6,46 +6,6 @@
  * Phillip Lougher <phillip@squashfs.org.uk>
  */
 
-#ifndef CONFIG_SQUASHFS_FILE_DIRECT
-struct squashfs_page_actor {
-	void	**page;
-	int	pages;
-	int	length;
-	int	next_page;
-};
-
-static inline struct squashfs_page_actor *squashfs_page_actor_init(void **page,
-	int pages, int length)
-{
-	struct squashfs_page_actor *actor = kmalloc(sizeof(*actor), GFP_KERNEL);
-
-	if (actor == NULL)
-		return NULL;
-
-	actor->length = length ? : pages * PAGE_SIZE;
-	actor->page = page;
-	actor->pages = pages;
-	actor->next_page = 0;
-	return actor;
-}
-
-static inline void *squashfs_first_page(struct squashfs_page_actor *actor)
-{
-	actor->next_page = 1;
-	return actor->page[0];
-}
-
-static inline void *squashfs_next_page(struct squashfs_page_actor *actor)
-{
-	return actor->next_page == actor->pages ? NULL :
-		actor->page[actor->next_page++];
-}
-
-static inline void squashfs_finish_page(struct squashfs_page_actor *actor)
-{
-	/* empty */
-}
-#else
 struct squashfs_page_actor {
 	union {
 		void		**buffer;
@@ -76,4 +36,3 @@ static inline void squashfs_finish_page(struct squashfs_page_actor *actor)
 	actor->squashfs_finish_page(actor);
 }
 #endif
-#endif
-- 
2.36.1.255.ge46751e96f-goog



* [PATCH v5 3/3] squashfs: implement readahead
  2022-06-06 15:03 [PATCH v5 0/3] Implement readahead for squashfs Hsin-Yi Wang
  2022-06-06 15:03 ` [PATCH v5 1/3] Revert "squashfs: provide backing_dev_info in order to disable read-ahead" Hsin-Yi Wang
  2022-06-06 15:03 ` [PATCH v5 2/3] squashfs: always build "file direct" version of page actor Hsin-Yi Wang
@ 2022-06-06 15:03 ` Hsin-Yi Wang
  2022-06-07  3:59   ` Phillip Lougher
                     ` (3 more replies)
  2 siblings, 4 replies; 16+ messages in thread
From: Hsin-Yi Wang @ 2022-06-06 15:03 UTC (permalink / raw)
  To: Phillip Lougher, Matthew Wilcox, Xiongwei Song, Marek Szyprowski,
	Andrew Morton
  Cc: Zheng Liang, Zhang Yi, Hou Tao, Miao Xie, linux-mm @ kvack . org,
	squashfs-devel @ lists . sourceforge . net, linux-kernel

Implement the readahead callback for squashfs. It reads the datablocks
which cover the pages in the readahead request. In a few cases the pages
are not marked uptodate, including:
- the file end is 0.
- zero-filled blocks.
- the current batch of pages isn't in the same datablock.
- a decompressor error.
Otherwise the pages are marked uptodate. The unhandled pages will be
read by the read_folio callback later.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Reported-by: Matthew Wilcox <willy@infradead.org>
Reported-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: Xiongwei Song <Xiongwei.Song@windriver.com>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
---
v4->v5:
- Handle short file cases reported by Marek and Matthew.
- Fix checkpatch error reported by Andrew.

v4: https://lore.kernel.org/lkml/20220601103922.1338320-4-hsinyi@chromium.org/
v3: https://lore.kernel.org/lkml/20220523065909.883444-4-hsinyi@chromium.org/
v2: https://lore.kernel.org/lkml/20220517082650.2005840-4-hsinyi@chromium.org/
v1: https://lore.kernel.org/lkml/20220516105100.1412740-3-hsinyi@chromium.org/
---
 fs/squashfs/file.c | 124 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 123 insertions(+), 1 deletion(-)

diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index a8e495d8eb86..fbd096cd15f4 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -39,6 +39,7 @@
 #include "squashfs_fs_sb.h"
 #include "squashfs_fs_i.h"
 #include "squashfs.h"
+#include "page_actor.h"
 
 /*
  * Locate cache slot in range [offset, index] for specified inode.  If
@@ -495,7 +496,128 @@ static int squashfs_read_folio(struct file *file, struct folio *folio)
 	return 0;
 }
 
+static void squashfs_readahead(struct readahead_control *ractl)
+{
+	struct inode *inode = ractl->mapping->host;
+	struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
+	size_t mask = (1UL << msblk->block_log) - 1;
+	unsigned short shift = msblk->block_log - PAGE_SHIFT;
+	loff_t start = readahead_pos(ractl) & ~mask;
+	size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
+	struct squashfs_page_actor *actor;
+	unsigned int nr_pages = 0;
+	struct page **pages;
+	int i, file_end = i_size_read(inode) >> msblk->block_log;
+	unsigned int max_pages = 1UL << shift;
+
+	readahead_expand(ractl, start, (len | mask) + 1);
+
+	if (file_end == 0)
+		return;
+
+	pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
+	if (!pages)
+		return;
+
+	actor = squashfs_page_actor_init_special(pages, max_pages, 0);
+	if (!actor)
+		goto out;
+
+	for (;;) {
+		pgoff_t index;
+		int res, bsize;
+		u64 block = 0;
+		unsigned int expected;
+
+		nr_pages = __readahead_batch(ractl, pages, max_pages);
+		if (!nr_pages)
+			break;
+
+		if (readahead_pos(ractl) >= i_size_read(inode))
+			goto skip_pages;
+
+		index = pages[0]->index >> shift;
+		if ((pages[nr_pages - 1]->index >> shift) != index)
+			goto skip_pages;
+
+		expected = index == file_end ?
+			   (i_size_read(inode) & (msblk->block_size - 1)) :
+			    msblk->block_size;
+
+		bsize = read_blocklist(inode, index, &block);
+		if (bsize == 0)
+			goto skip_pages;
+
+		if (nr_pages < max_pages) {
+			struct squashfs_cache_entry *buffer;
+			unsigned int block_mask = max_pages - 1;
+			int offset = pages[0]->index - (pages[0]->index & ~block_mask);
+
+			buffer = squashfs_get_datablock(inode->i_sb, block,
+							bsize);
+			if (buffer->error) {
+				squashfs_cache_put(buffer);
+				goto skip_pages;
+			}
+
+			expected -= offset * PAGE_SIZE;
+			for (i = 0; i < nr_pages && expected > 0; i++,
+						expected -= PAGE_SIZE, offset++) {
+				int avail = min_t(int, expected, PAGE_SIZE);
+
+				squashfs_fill_page(pages[i], buffer,
+						offset * PAGE_SIZE, avail);
+				unlock_page(pages[i]);
+			}
+
+			squashfs_cache_put(buffer);
+			continue;
+		}
+
+		res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
+					 actor);
+
+		if (res == expected) {
+			int bytes;
+
+			/* Last page may have trailing bytes not filled */
+			bytes = res % PAGE_SIZE;
+			if (bytes) {
+				void *pageaddr;
+
+				pageaddr = kmap_atomic(pages[nr_pages - 1]);
+				memset(pageaddr + bytes, 0, PAGE_SIZE - bytes);
+				kunmap_atomic(pageaddr);
+			}
+
+			for (i = 0; i < nr_pages; i++) {
+				flush_dcache_page(pages[i]);
+				SetPageUptodate(pages[i]);
+			}
+		}
+
+		for (i = 0; i < nr_pages; i++) {
+			unlock_page(pages[i]);
+			put_page(pages[i]);
+		}
+	}
+
+	kfree(actor);
+	kfree(pages);
+	return;
+
+skip_pages:
+	for (i = 0; i < nr_pages; i++) {
+		unlock_page(pages[i]);
+		put_page(pages[i]);
+	}
+
+	kfree(actor);
+out:
+	kfree(pages);
+}
 
 const struct address_space_operations squashfs_aops = {
-	.read_folio = squashfs_read_folio
+	.read_folio = squashfs_read_folio,
+	.readahead = squashfs_readahead
 };
-- 
2.36.1.255.ge46751e96f-goog



* Re: [PATCH v5 3/3] squashfs: implement readahead
  2022-06-06 15:03 ` [PATCH v5 3/3] squashfs: implement readahead Hsin-Yi Wang
@ 2022-06-07  3:59   ` Phillip Lougher
  2022-06-07 19:29   ` Fabio M. De Francesco
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Phillip Lougher @ 2022-06-07  3:59 UTC (permalink / raw)
  To: Hsin-Yi Wang, Matthew Wilcox, Xiongwei Song, Marek Szyprowski,
	Andrew Morton
  Cc: Zheng Liang, Zhang Yi, Hou Tao, Miao Xie, linux-mm @ kvack . org,
	squashfs-devel @ lists . sourceforge . net, linux-kernel

On 06/06/2022 16:03, Hsin-Yi Wang wrote:
> Implement readahead callback for squashfs. It will read datablocks
> which cover pages in readahead request. For a few cases it will
> not mark page as uptodate, including:
> - file end is 0.
> - zero filled blocks.
> - current batch of pages isn't in the same datablock.
> - decompressor error.
> Otherwise pages will be marked as uptodate. The unhandled pages will be
> updated by readpage later.
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
> Reported-by: Matthew Wilcox <willy@infradead.org>
> Reported-by: Phillip Lougher <phillip@squashfs.org.uk>
> Reported-by: Xiongwei Song <Xiongwei.Song@windriver.com>
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Reported-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> v4->v5:
> - Handle short file cases reported by Marek and Matthew.
> - Fix checkpatch error reported by Andrew.

Thanks for the updated patch.  I'm currently testing and
reviewing it, and this may take a couple of days.

Phillip

> 
> v4: https://lore.kernel.org/lkml/20220601103922.1338320-4-hsinyi@chromium.org/
> v3: https://lore.kernel.org/lkml/20220523065909.883444-4-hsinyi@chromium.org/
> v2: https://lore.kernel.org/lkml/20220517082650.2005840-4-hsinyi@chromium.org/
> v1: https://lore.kernel.org/lkml/20220516105100.1412740-3-hsinyi@chromium.org/
> ---
>   fs/squashfs/file.c | 124 ++++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 123 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> index a8e495d8eb86..fbd096cd15f4 100644
> --- a/fs/squashfs/file.c
> +++ b/fs/squashfs/file.c
> @@ -39,6 +39,7 @@
>   #include "squashfs_fs_sb.h"
>   #include "squashfs_fs_i.h"
>   #include "squashfs.h"
> +#include "page_actor.h"
>   
>   /*
>    * Locate cache slot in range [offset, index] for specified inode.  If
> @@ -495,7 +496,128 @@ static int squashfs_read_folio(struct file *file, struct folio *folio)
>   	return 0;
>   }
>   
> +static void squashfs_readahead(struct readahead_control *ractl)
> +{
> +	struct inode *inode = ractl->mapping->host;
> +	struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> +	size_t mask = (1UL << msblk->block_log) - 1;
> +	unsigned short shift = msblk->block_log - PAGE_SHIFT;
> +	loff_t start = readahead_pos(ractl) & ~mask;
> +	size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
> +	struct squashfs_page_actor *actor;
> +	unsigned int nr_pages = 0;
> +	struct page **pages;
> +	int i, file_end = i_size_read(inode) >> msblk->block_log;
> +	unsigned int max_pages = 1UL << shift;
> +
> +	readahead_expand(ractl, start, (len | mask) + 1);
> +
> +	if (file_end == 0)
> +		return;
> +
> +	pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
> +	if (!pages)
> +		return;
> +
> +	actor = squashfs_page_actor_init_special(pages, max_pages, 0);
> +	if (!actor)
> +		goto out;
> +
> +	for (;;) {
> +		pgoff_t index;
> +		int res, bsize;
> +		u64 block = 0;
> +		unsigned int expected;
> +
> +		nr_pages = __readahead_batch(ractl, pages, max_pages);
> +		if (!nr_pages)
> +			break;
> +
> +		if (readahead_pos(ractl) >= i_size_read(inode))
> +			goto skip_pages;
> +
> +		index = pages[0]->index >> shift;
> +		if ((pages[nr_pages - 1]->index >> shift) != index)
> +			goto skip_pages;
> +
> +		expected = index == file_end ?
> +			   (i_size_read(inode) & (msblk->block_size - 1)) :
> +			    msblk->block_size;
> +
> +		bsize = read_blocklist(inode, index, &block);
> +		if (bsize == 0)
> +			goto skip_pages;
> +
> +		if (nr_pages < max_pages) {
> +			struct squashfs_cache_entry *buffer;
> +			unsigned int block_mask = max_pages - 1;
> +			int offset = pages[0]->index - (pages[0]->index & ~block_mask);
> +
> +			buffer = squashfs_get_datablock(inode->i_sb, block,
> +							bsize);
> +			if (buffer->error) {
> +				squashfs_cache_put(buffer);
> +				goto skip_pages;
> +			}
> +
> +			expected -= offset * PAGE_SIZE;
> +			for (i = 0; i < nr_pages && expected > 0; i++,
> +						expected -= PAGE_SIZE, offset++) {
> +				int avail = min_t(int, expected, PAGE_SIZE);
> +
> +				squashfs_fill_page(pages[i], buffer,
> +						offset * PAGE_SIZE, avail);
> +				unlock_page(pages[i]);
> +			}
> +
> +			squashfs_cache_put(buffer);
> +			continue;
> +		}
> +
> +		res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
> +					 actor);
> +
> +		if (res == expected) {
> +			int bytes;
> +
> +			/* Last page may have trailing bytes not filled */
> +			bytes = res % PAGE_SIZE;
> +			if (bytes) {
> +				void *pageaddr;
> +
> +				pageaddr = kmap_atomic(pages[nr_pages - 1]);
> +				memset(pageaddr + bytes, 0, PAGE_SIZE - bytes);
> +				kunmap_atomic(pageaddr);
> +			}
> +
> +			for (i = 0; i < nr_pages; i++) {
> +				flush_dcache_page(pages[i]);
> +				SetPageUptodate(pages[i]);
> +			}
> +		}
> +
> +		for (i = 0; i < nr_pages; i++) {
> +			unlock_page(pages[i]);
> +			put_page(pages[i]);
> +		}
> +	}
> +
> +	kfree(actor);
> +	kfree(pages);
> +	return;
> +
> +skip_pages:
> +	for (i = 0; i < nr_pages; i++) {
> +		unlock_page(pages[i]);
> +		put_page(pages[i]);
> +	}
> +
> +	kfree(actor);
> +out:
> +	kfree(pages);
> +}
>   
>   const struct address_space_operations squashfs_aops = {
> -	.read_folio = squashfs_read_folio
> +	.read_folio = squashfs_read_folio,
> +	.readahead = squashfs_readahead
>   };



* Re: [PATCH v5 3/3] squashfs: implement readahead
  2022-06-06 15:03 ` [PATCH v5 3/3] squashfs: implement readahead Hsin-Yi Wang
  2022-06-07  3:59   ` Phillip Lougher
@ 2022-06-07 19:29   ` Fabio M. De Francesco
  2022-06-08 10:20     ` Hsin-Yi Wang
  2022-06-09 14:46   ` Xiongwei Song
  2022-06-11  5:23   ` Phillip Lougher
  3 siblings, 1 reply; 16+ messages in thread
From: Fabio M. De Francesco @ 2022-06-07 19:29 UTC (permalink / raw)
  To: Phillip Lougher, Matthew Wilcox, Xiongwei Song, Marek Szyprowski,
	Andrew Morton, Hsin-Yi Wang
  Cc: Zheng Liang, Zhang Yi, Hou Tao, Miao Xie, linux-mm @ kvack . org,
	squashfs-devel @ lists . sourceforge . net, linux-kernel,
	ira.weiny

On Monday, 6 June 2022 17:03:05 CEST Hsin-Yi Wang wrote:
> Implement readahead callback for squashfs. It will read datablocks
> which cover pages in readahead request. For a few cases it will
> not mark page as uptodate, including:
> - file end is 0.
> - zero filled blocks.
> - current batch of pages isn't in the same datablock.
> - decompressor error.
> Otherwise pages will be marked as uptodate. The unhandled pages will be
> updated by readpage later.
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
> Reported-by: Matthew Wilcox <willy@infradead.org>
> Reported-by: Phillip Lougher <phillip@squashfs.org.uk>
> Reported-by: Xiongwei Song <Xiongwei.Song@windriver.com>
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Reported-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> v4->v5:
> - Handle short file cases reported by Marek and Matthew.
> - Fix checkpatch error reported by Andrew.
> 
> v4: https://lore.kernel.org/lkml/20220601103922.1338320-4-hsinyi@chromium.org/
> v3: https://lore.kernel.org/lkml/20220523065909.883444-4-hsinyi@chromium.org/
> v2: https://lore.kernel.org/lkml/20220517082650.2005840-4-hsinyi@chromium.org/
> v1: https://lore.kernel.org/lkml/20220516105100.1412740-3-hsinyi@chromium.org/
> ---
>  fs/squashfs/file.c | 124 ++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 123 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> index a8e495d8eb86..fbd096cd15f4 100644
> --- a/fs/squashfs/file.c
> +++ b/fs/squashfs/file.c
> @@ -39,6 +39,7 @@
>  #include "squashfs_fs_sb.h"
>  #include "squashfs_fs_i.h"
>  #include "squashfs.h"
> +#include "page_actor.h"
>  
>  /*
>   * Locate cache slot in range [offset, index] for specified inode.  If
> @@ -495,7 +496,128 @@ static int squashfs_read_folio(struct file *file, struct folio *folio)
>  	return 0;
>  }
>  
> +static void squashfs_readahead(struct readahead_control *ractl)
> +{
> +	struct inode *inode = ractl->mapping->host;
> +	struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> +	size_t mask = (1UL << msblk->block_log) - 1;
> +	unsigned short shift = msblk->block_log - PAGE_SHIFT;
> +	loff_t start = readahead_pos(ractl) & ~mask;
> +	size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
> +	struct squashfs_page_actor *actor;
> +	unsigned int nr_pages = 0;
> +	struct page **pages;
> +	int i, file_end = i_size_read(inode) >> msblk->block_log;
> +	unsigned int max_pages = 1UL << shift;
> +
> +	readahead_expand(ractl, start, (len | mask) + 1);
> +
> +	if (file_end == 0)
> +		return;
> +
> +	pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
> +	if (!pages)
> +		return;
> +
> +	actor = squashfs_page_actor_init_special(pages, max_pages, 0);
> +	if (!actor)
> +		goto out;
> +
> +	for (;;) {
> +		pgoff_t index;
> +		int res, bsize;
> +		u64 block = 0;
> +		unsigned int expected;
> +
> +		nr_pages = __readahead_batch(ractl, pages, max_pages);
> +		if (!nr_pages)
> +			break;
> +
> +		if (readahead_pos(ractl) >= i_size_read(inode))
> +			goto skip_pages;
> +
> +		index = pages[0]->index >> shift;
> +		if ((pages[nr_pages - 1]->index >> shift) != index)
> +			goto skip_pages;
> +
> +		expected = index == file_end ?
> +			   (i_size_read(inode) & (msblk->block_size - 1)) :
> +			    msblk->block_size;
> +
> +		bsize = read_blocklist(inode, index, &block);
> +		if (bsize == 0)
> +			goto skip_pages;
> +
> +		if (nr_pages < max_pages) {
> +			struct squashfs_cache_entry *buffer;
> +			unsigned int block_mask = max_pages - 1;
> +			int offset = pages[0]->index - (pages[0]->index & ~block_mask);
> +
> +			buffer = squashfs_get_datablock(inode->i_sb, block,
> +							bsize);
> +			if (buffer->error) {
> +				squashfs_cache_put(buffer);
> +				goto skip_pages;
> +			}
> +
> +			expected -= offset * PAGE_SIZE;
> +			for (i = 0; i < nr_pages && expected > 0; i++,
> +						expected -= PAGE_SIZE, offset++) {
> +				int avail = min_t(int, expected, PAGE_SIZE);
> +
> +				squashfs_fill_page(pages[i], buffer,
> +						offset * PAGE_SIZE, avail);
> +				unlock_page(pages[i]);
> +			}
> +
> +			squashfs_cache_put(buffer);
> +			continue;
> +		}
> +
> +		res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
> +					 actor);
> +
> +		if (res == expected) {
> +			int bytes;
> +
> +			/* Last page may have trailing bytes not filled */
> +			bytes = res % PAGE_SIZE;
> +			if (bytes) {
> +				void *pageaddr;
> +
> +				pageaddr = kmap_atomic(pages[nr_pages - 1]);
> +				memset(pageaddr + bytes, 0, PAGE_SIZE - bytes);
> +				kunmap_atomic(pageaddr);
> +			}

Hi Hsin-Yi,

kmap_atomic() shouldn't be used in new code, unless there are special 
reasons that I am not able to spot here.

Why not use kmap_local_page(), preferably via memzero_page?

Thanks,

Fabio

> +
> +			for (i = 0; i < nr_pages; i++) {
> +				flush_dcache_page(pages[i]);
> +				SetPageUptodate(pages[i]);
> +			}
> +		}
> +
> +		for (i = 0; i < nr_pages; i++) {
> +			unlock_page(pages[i]);
> +			put_page(pages[i]);
> +		}
> +	}
> +
> +	kfree(actor);
> +	kfree(pages);
> +	return;
> +
> +skip_pages:
> +	for (i = 0; i < nr_pages; i++) {
> +		unlock_page(pages[i]);
> +		put_page(pages[i]);
> +	}
> +
> +	kfree(actor);
> +out:
> +	kfree(pages);
> +}
>  
>  const struct address_space_operations squashfs_aops = {
> -	.read_folio = squashfs_read_folio
> +	.read_folio = squashfs_read_folio,
> +	.readahead = squashfs_readahead
>  };
> -- 
> 2.36.1.255.ge46751e96f-goog
> 
> 
> 






* Re: [PATCH v5 3/3] squashfs: implement readahead
  2022-06-07 19:29   ` Fabio M. De Francesco
@ 2022-06-08 10:20     ` Hsin-Yi Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Hsin-Yi Wang @ 2022-06-08 10:20 UTC (permalink / raw)
  To: Fabio M. De Francesco
  Cc: Phillip Lougher, Matthew Wilcox, Xiongwei Song, Marek Szyprowski,
	Andrew Morton, Zheng Liang, Zhang Yi, Hou Tao, Miao Xie,
	linux-mm @ kvack . org,
	squashfs-devel @ lists . sourceforge . net, linux-kernel,
	ira.weiny

On Wed, Jun 8, 2022 at 3:29 AM Fabio M. De Francesco
<fmdefrancesco@gmail.com> wrote:
>
> On lunedì 6 giugno 2022 17:03:05 CEST Hsin-Yi Wang wrote:
> > Implement readahead callback for squashfs. It will read datablocks
> > which cover pages in readahead request. For a few cases it will
> > not mark page as uptodate, including:
> > - file end is 0.
> > - zero filled blocks.
> > - current batch of pages isn't in the same datablock.
> > - decompressor error.
> > Otherwise pages will be marked as uptodate. The unhandled pages will be
> > updated by readpage later.
> >
> > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
> > Reported-by: Matthew Wilcox <willy@infradead.org>
> > Reported-by: Phillip Lougher <phillip@squashfs.org.uk>
> > Reported-by: Xiongwei Song <Xiongwei.Song@windriver.com>
> > Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > Reported-by: Andrew Morton <akpm@linux-foundation.org>
> > ---
> > v4->v5:
> > - Handle short file cases reported by Marek and Matthew.
> > - Fix checkpatch error reported by Andrew.
> >
> > v4: https://lore.kernel.org/lkml/20220601103922.1338320-4-hsinyi@chromium.org/
> > v3: https://lore.kernel.org/lkml/20220523065909.883444-4-hsinyi@chromium.org/
> > v2: https://lore.kernel.org/lkml/20220517082650.2005840-4-hsinyi@chromium.org/
> > v1: https://lore.kernel.org/lkml/20220516105100.1412740-3-hsinyi@chromium.org/
> > ---
> >  fs/squashfs/file.c | 124 ++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 123 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> > index a8e495d8eb86..fbd096cd15f4 100644
> > --- a/fs/squashfs/file.c
> > +++ b/fs/squashfs/file.c
> > @@ -39,6 +39,7 @@
> >  #include "squashfs_fs_sb.h"
> >  #include "squashfs_fs_i.h"
> >  #include "squashfs.h"
> > +#include "page_actor.h"
> >
> >  /*
> >   * Locate cache slot in range [offset, index] for specified inode.  If
> > @@ -495,7 +496,128 @@ static int squashfs_read_folio(struct file *file, struct folio *folio)
> >       return 0;
> >  }
> >
> > +static void squashfs_readahead(struct readahead_control *ractl)
> > +{
> > +     struct inode *inode = ractl->mapping->host;
> > +     struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> > +     size_t mask = (1UL << msblk->block_log) - 1;
> > +     unsigned short shift = msblk->block_log - PAGE_SHIFT;
> > +     loff_t start = readahead_pos(ractl) & ~mask;
> > +     size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
> > +     struct squashfs_page_actor *actor;
> > +     unsigned int nr_pages = 0;
> > +     struct page **pages;
> > +     int i, file_end = i_size_read(inode) >> msblk->block_log;
> > +     unsigned int max_pages = 1UL << shift;
> > +
> > +     readahead_expand(ractl, start, (len | mask) + 1);
> > +
> > +     if (file_end == 0)
> > +             return;
> > +
> > +     pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
> > +     if (!pages)
> > +             return;
> > +
> > +     actor = squashfs_page_actor_init_special(pages, max_pages, 0);
> > +     if (!actor)
> > +             goto out;
> > +
> > +     for (;;) {
> > +             pgoff_t index;
> > +             int res, bsize;
> > +             u64 block = 0;
> > +             unsigned int expected;
> > +
> > +             nr_pages = __readahead_batch(ractl, pages, max_pages);
> > +             if (!nr_pages)
> > +                     break;
> > +
> > +             if (readahead_pos(ractl) >= i_size_read(inode))
> > +                     goto skip_pages;
> > +
> > +             index = pages[0]->index >> shift;
> > +             if ((pages[nr_pages - 1]->index >> shift) != index)
> > +                     goto skip_pages;
> > +
> > +             expected = index == file_end ?
> > +                        (i_size_read(inode) & (msblk->block_size - 1)) :
> > +                         msblk->block_size;
> > +
> > +             bsize = read_blocklist(inode, index, &block);
> > +             if (bsize == 0)
> > +                     goto skip_pages;
> > +
> > +             if (nr_pages < max_pages) {
> > +                     struct squashfs_cache_entry *buffer;
> > +                     unsigned int block_mask = max_pages - 1;
> > +                     int offset = pages[0]->index - (pages[0]->index & ~block_mask);
> > +
> > +                     buffer = squashfs_get_datablock(inode->i_sb, block,
> > +                                                     bsize);
> > +                     if (buffer->error) {
> > +                             squashfs_cache_put(buffer);
> > +                             goto skip_pages;
> > +                     }
> > +
> > +                     expected -= offset * PAGE_SIZE;
> > +                     for (i = 0; i < nr_pages && expected > 0; i++,
> > +                                             expected -= PAGE_SIZE, offset++) {
> > +                             int avail = min_t(int, expected, PAGE_SIZE);
> > +
> > +                             squashfs_fill_page(pages[i], buffer,
> > +                                             offset * PAGE_SIZE, avail);
> > +                             unlock_page(pages[i]);
> > +                     }
> > +
> > +                     squashfs_cache_put(buffer);
> > +                     continue;
> > +             }
> > +
> > +             res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
> > +                                      actor);
> > +
> > +             if (res == expected) {
> > +                     int bytes;
> > +
> > +                     /* Last page may have trailing bytes not filled */
> > +                     bytes = res % PAGE_SIZE;
> > +                     if (bytes) {
> > +                             void *pageaddr;
> > +
> > +                             pageaddr = kmap_atomic(pages[nr_pages - 1]);
> > +                             memset(pageaddr + bytes, 0, PAGE_SIZE - bytes);
> > +                             kunmap_atomic(pageaddr);
> > +                     }
>
> Hi Hsin-Yi,
>
> kmap_atomic() shouldn't be used in new code, unless there are special
> reasons that I am not able to spot here.
>
> Why not use kmap_local_page(), preferably via memzero_page?

Right, these can be replaced with memzero_page(pages[nr_pages - 1],
bytes, PAGE_SIZE - bytes);

Thanks for the suggestion.
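
A minimal sketch of that change, assuming memzero_page() from
<linux/highmem.h> is available in this tree (it maps the page with
kmap_local_page() internally):

	if (bytes) {
		/* Sketch: zero the trailing, unfilled part of the last page. */
		memzero_page(pages[nr_pages - 1], bytes, PAGE_SIZE - bytes);
	}
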
>
> Thanks,
>
> Fabio
>
> > +
> > +                     for (i = 0; i < nr_pages; i++) {
> > +                             flush_dcache_page(pages[i]);
> > +                             SetPageUptodate(pages[i]);
> > +                     }
> > +             }
> > +
> > +             for (i = 0; i < nr_pages; i++) {
> > +                     unlock_page(pages[i]);
> > +                     put_page(pages[i]);
> > +             }
> > +     }
> > +
> > +     kfree(actor);
> > +     kfree(pages);
> > +     return;
> > +
> > +skip_pages:
> > +     for (i = 0; i < nr_pages; i++) {
> > +             unlock_page(pages[i]);
> > +             put_page(pages[i]);
> > +     }
> > +
> > +     kfree(actor);
> > +out:
> > +     kfree(pages);
> > +}
> >
> >  const struct address_space_operations squashfs_aops = {
> > -     .read_folio = squashfs_read_folio
> > +     .read_folio = squashfs_read_folio,
> > +     .readahead = squashfs_readahead
> >  };
> > --
> > 2.36.1.255.ge46751e96f-goog
> >
> >
> >
>
>
>
>


* Re: [PATCH v5 3/3] squashfs: implement readahead
  2022-06-06 15:03 ` [PATCH v5 3/3] squashfs: implement readahead Hsin-Yi Wang
  2022-06-07  3:59   ` Phillip Lougher
  2022-06-07 19:29   ` Fabio M. De Francesco
@ 2022-06-09 14:46   ` Xiongwei Song
  2022-06-10  1:32     ` Xiongwei Song
  2022-06-10  7:42     ` Phillip Lougher
  2022-06-11  5:23   ` Phillip Lougher
  3 siblings, 2 replies; 16+ messages in thread
From: Xiongwei Song @ 2022-06-09 14:46 UTC (permalink / raw)
  To: Hsin-Yi Wang
  Cc: Phillip Lougher, Matthew Wilcox, Xiongwei Song, Marek Szyprowski,
	Andrew Morton, Zheng Liang, Zhang Yi, Hou Tao, Miao Xie,
	linux-mm @ kvack . org,
	squashfs-devel @ lists . sourceforge . net,
	Linux Kernel Mailing List

This version performs badly in my test. I ran the test below
"for cnt in $(seq 0 9); do echo 3 > /proc/sys/vm/drop_caches; echo
"Loop ${cnt}:"; time -v find /software/test[0-9][0-9] | xargs -P 24 -i
cat {} > /dev/null 2>/dev/null; echo ""; done"
in 90 partitions.

With 9eec1d897139 reverted:
1:06.18 (1m + 6.18s)
1:05.65
1:06.34
1:06.88
1:06.52
1:06.78
1:06.61
1:06.99
1:06.60
1:06.79

With this version:
2:36.85 (2m + 36.85s)
2:28.89
1:43.46
1:41.50
1:42.75
1:43.46
1:43.67
1:44.41
1:44.91
1:45.44

Any thoughts?

Regards,
Xiongwei

On Mon, Jun 6, 2022 at 11:03 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
>
> Implement readahead callback for squashfs. It will read datablocks
> which cover pages in readahead request. For a few cases it will
> not mark page as uptodate, including:
> - file end is 0.
> - zero filled blocks.
> - current batch of pages isn't in the same datablock.
> - decompressor error.
> Otherwise pages will be marked as uptodate. The unhandled pages will be
> updated by readpage later.
>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
> Reported-by: Matthew Wilcox <willy@infradead.org>
> Reported-by: Phillip Lougher <phillip@squashfs.org.uk>
> Reported-by: Xiongwei Song <Xiongwei.Song@windriver.com>
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Reported-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> v4->v5:
> - Handle short file cases reported by Marek and Matthew.
> - Fix checkpatch error reported by Andrew.
>
> v4: https://lore.kernel.org/lkml/20220601103922.1338320-4-hsinyi@chromium.org/
> v3: https://lore.kernel.org/lkml/20220523065909.883444-4-hsinyi@chromium.org/
> v2: https://lore.kernel.org/lkml/20220517082650.2005840-4-hsinyi@chromium.org/
> v1: https://lore.kernel.org/lkml/20220516105100.1412740-3-hsinyi@chromium.org/
> ---
>  fs/squashfs/file.c | 124 ++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 123 insertions(+), 1 deletion(-)
>
> diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> index a8e495d8eb86..fbd096cd15f4 100644
> --- a/fs/squashfs/file.c
> +++ b/fs/squashfs/file.c
> @@ -39,6 +39,7 @@
>  #include "squashfs_fs_sb.h"
>  #include "squashfs_fs_i.h"
>  #include "squashfs.h"
> +#include "page_actor.h"
>
>  /*
>   * Locate cache slot in range [offset, index] for specified inode.  If
> @@ -495,7 +496,128 @@ static int squashfs_read_folio(struct file *file, struct folio *folio)
>         return 0;
>  }
>
> +static void squashfs_readahead(struct readahead_control *ractl)
> +{
> +       struct inode *inode = ractl->mapping->host;
> +       struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> +       size_t mask = (1UL << msblk->block_log) - 1;
> +       unsigned short shift = msblk->block_log - PAGE_SHIFT;
> +       loff_t start = readahead_pos(ractl) & ~mask;
> +       size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
> +       struct squashfs_page_actor *actor;
> +       unsigned int nr_pages = 0;
> +       struct page **pages;
> +       int i, file_end = i_size_read(inode) >> msblk->block_log;
> +       unsigned int max_pages = 1UL << shift;
> +
> +       readahead_expand(ractl, start, (len | mask) + 1);
> +
> +       if (file_end == 0)
> +               return;
> +
> +       pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
> +       if (!pages)
> +               return;
> +
> +       actor = squashfs_page_actor_init_special(pages, max_pages, 0);
> +       if (!actor)
> +               goto out;
> +
> +       for (;;) {
> +               pgoff_t index;
> +               int res, bsize;
> +               u64 block = 0;
> +               unsigned int expected;
> +
> +               nr_pages = __readahead_batch(ractl, pages, max_pages);
> +               if (!nr_pages)
> +                       break;
> +
> +               if (readahead_pos(ractl) >= i_size_read(inode))
> +                       goto skip_pages;
> +
> +               index = pages[0]->index >> shift;
> +               if ((pages[nr_pages - 1]->index >> shift) != index)
> +                       goto skip_pages;
> +
> +               expected = index == file_end ?
> +                          (i_size_read(inode) & (msblk->block_size - 1)) :
> +                           msblk->block_size;
> +
> +               bsize = read_blocklist(inode, index, &block);
> +               if (bsize == 0)
> +                       goto skip_pages;
> +
> +               if (nr_pages < max_pages) {
> +                       struct squashfs_cache_entry *buffer;
> +                       unsigned int block_mask = max_pages - 1;
> +                       int offset = pages[0]->index - (pages[0]->index & ~block_mask);
> +
> +                       buffer = squashfs_get_datablock(inode->i_sb, block,
> +                                                       bsize);
> +                       if (buffer->error) {
> +                               squashfs_cache_put(buffer);
> +                               goto skip_pages;
> +                       }
> +
> +                       expected -= offset * PAGE_SIZE;
> +                       for (i = 0; i < nr_pages && expected > 0; i++,
> +                                               expected -= PAGE_SIZE, offset++) {
> +                               int avail = min_t(int, expected, PAGE_SIZE);
> +
> +                               squashfs_fill_page(pages[i], buffer,
> +                                               offset * PAGE_SIZE, avail);
> +                               unlock_page(pages[i]);
> +                       }
> +
> +                       squashfs_cache_put(buffer);
> +                       continue;
> +               }
> +
> +               res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
> +                                        actor);
> +
> +               if (res == expected) {
> +                       int bytes;
> +
> +                       /* Last page may have trailing bytes not filled */
> +                       bytes = res % PAGE_SIZE;
> +                       if (bytes) {
> +                               void *pageaddr;
> +
> +                               pageaddr = kmap_atomic(pages[nr_pages - 1]);
> +                               memset(pageaddr + bytes, 0, PAGE_SIZE - bytes);
> +                               kunmap_atomic(pageaddr);
> +                       }
> +
> +                       for (i = 0; i < nr_pages; i++) {
> +                               flush_dcache_page(pages[i]);
> +                               SetPageUptodate(pages[i]);
> +                       }
> +               }
> +
> +               for (i = 0; i < nr_pages; i++) {
> +                       unlock_page(pages[i]);
> +                       put_page(pages[i]);
> +               }
> +       }
> +
> +       kfree(actor);
> +       kfree(pages);
> +       return;
> +
> +skip_pages:
> +       for (i = 0; i < nr_pages; i++) {
> +               unlock_page(pages[i]);
> +               put_page(pages[i]);
> +       }
> +
> +       kfree(actor);
> +out:
> +       kfree(pages);
> +}
>
>  const struct address_space_operations squashfs_aops = {
> -       .read_folio = squashfs_read_folio
> +       .read_folio = squashfs_read_folio,
> +       .readahead = squashfs_readahead
>  };
> --
> 2.36.1.255.ge46751e96f-goog
>
>


* Re: [PATCH v5 3/3] squashfs: implement readahead
  2022-06-09 14:46   ` Xiongwei Song
@ 2022-06-10  1:32     ` Xiongwei Song
  2022-06-10  7:42     ` Phillip Lougher
  1 sibling, 0 replies; 16+ messages in thread
From: Xiongwei Song @ 2022-06-10  1:32 UTC (permalink / raw)
  To: Hsin-Yi Wang
  Cc: Phillip Lougher, Matthew Wilcox, Xiongwei Song, Marek Szyprowski,
	Andrew Morton, Zheng Liang, Zhang Yi, Hou Tao, Miao Xie,
	linux-mm @ kvack . org,
	squashfs-devel @ lists . sourceforge . net,
	Linux Kernel Mailing List

On Thu, Jun 9, 2022 at 10:46 PM Xiongwei Song <sxwjean@gmail.com> wrote:
>
> This version is bad for my test. I ran the test below
> "for cnt in $(seq 0 9); do echo 3 > /proc/sys/vm/drop_caches; echo
> "Loop ${cnt}:"; time -v find /software/test[0-9][0-9] | xargs -P 24 -i
> cat {} > /dev/null 2>/dev/null; echo ""; done"
> in 90 partitions.
>
> With 9eec1d897139 reverted:

Sorry, that set of numbers is with readahead enabled on Linux 5.10.

Regards,
Xiongwei

> 1:06.18 (1m + 6.18s)
> 1:05.65
> 1:06.34
> 1:06.88
> 1:06.52
> 1:06.78
> 1:06.61
> 1:06.99
> 1:06.60
> 1:06.79
>
> With this version:
> 2:36.85 (2m + 36.85s)
> 2:28.89
> 1:43.46
> 1:41.50
> 1:42.75
> 1:43.46
> 1:43.67
> 1:44.41
> 1:44.91
> 1:45.44
>
> Any thoughts?
>
> Regards,
> Xiongwei
>
> On Mon, Jun 6, 2022 at 11:03 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> >
> > Implement readahead callback for squashfs. It will read datablocks
> > which cover pages in readahead request. For a few cases it will
> > not mark page as uptodate, including:
> > - file end is 0.
> > - zero filled blocks.
> > - current batch of pages isn't in the same datablock.
> > - decompressor error.
> > Otherwise pages will be marked as uptodate. The unhandled pages will be
> > updated by readpage later.
> >
> > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
> > Reported-by: Matthew Wilcox <willy@infradead.org>
> > Reported-by: Phillip Lougher <phillip@squashfs.org.uk>
> > Reported-by: Xiongwei Song <Xiongwei.Song@windriver.com>
> > Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > Reported-by: Andrew Morton <akpm@linux-foundation.org>
> > ---
> > v4->v5:
> > - Handle short file cases reported by Marek and Matthew.
> > - Fix checkpatch error reported by Andrew.
> >
> > v4: https://lore.kernel.org/lkml/20220601103922.1338320-4-hsinyi@chromium.org/
> > v3: https://lore.kernel.org/lkml/20220523065909.883444-4-hsinyi@chromium.org/
> > v2: https://lore.kernel.org/lkml/20220517082650.2005840-4-hsinyi@chromium.org/
> > v1: https://lore.kernel.org/lkml/20220516105100.1412740-3-hsinyi@chromium.org/
> > ---
> >  fs/squashfs/file.c | 124 ++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 123 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> > index a8e495d8eb86..fbd096cd15f4 100644
> > --- a/fs/squashfs/file.c
> > +++ b/fs/squashfs/file.c
> > @@ -39,6 +39,7 @@
> >  #include "squashfs_fs_sb.h"
> >  #include "squashfs_fs_i.h"
> >  #include "squashfs.h"
> > +#include "page_actor.h"
> >
> >  /*
> >   * Locate cache slot in range [offset, index] for specified inode.  If
> > @@ -495,7 +496,128 @@ static int squashfs_read_folio(struct file *file, struct folio *folio)
> >         return 0;
> >  }
> >
> > +static void squashfs_readahead(struct readahead_control *ractl)
> > +{
> > +       struct inode *inode = ractl->mapping->host;
> > +       struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> > +       size_t mask = (1UL << msblk->block_log) - 1;
> > +       unsigned short shift = msblk->block_log - PAGE_SHIFT;
> > +       loff_t start = readahead_pos(ractl) & ~mask;
> > +       size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
> > +       struct squashfs_page_actor *actor;
> > +       unsigned int nr_pages = 0;
> > +       struct page **pages;
> > +       int i, file_end = i_size_read(inode) >> msblk->block_log;
> > +       unsigned int max_pages = 1UL << shift;
> > +
> > +       readahead_expand(ractl, start, (len | mask) + 1);
> > +
> > +       if (file_end == 0)
> > +               return;
> > +
> > +       pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
> > +       if (!pages)
> > +               return;
> > +
> > +       actor = squashfs_page_actor_init_special(pages, max_pages, 0);
> > +       if (!actor)
> > +               goto out;
> > +
> > +       for (;;) {
> > +               pgoff_t index;
> > +               int res, bsize;
> > +               u64 block = 0;
> > +               unsigned int expected;
> > +
> > +               nr_pages = __readahead_batch(ractl, pages, max_pages);
> > +               if (!nr_pages)
> > +                       break;
> > +
> > +               if (readahead_pos(ractl) >= i_size_read(inode))
> > +                       goto skip_pages;
> > +
> > +               index = pages[0]->index >> shift;
> > +               if ((pages[nr_pages - 1]->index >> shift) != index)
> > +                       goto skip_pages;
> > +
> > +               expected = index == file_end ?
> > +                          (i_size_read(inode) & (msblk->block_size - 1)) :
> > +                           msblk->block_size;
> > +
> > +               bsize = read_blocklist(inode, index, &block);
> > +               if (bsize == 0)
> > +                       goto skip_pages;
> > +
> > +               if (nr_pages < max_pages) {
> > +                       struct squashfs_cache_entry *buffer;
> > +                       unsigned int block_mask = max_pages - 1;
> > +                       int offset = pages[0]->index - (pages[0]->index & ~block_mask);
> > +
> > +                       buffer = squashfs_get_datablock(inode->i_sb, block,
> > +                                                       bsize);
> > +                       if (buffer->error) {
> > +                               squashfs_cache_put(buffer);
> > +                               goto skip_pages;
> > +                       }
> > +
> > +                       expected -= offset * PAGE_SIZE;
> > +                       for (i = 0; i < nr_pages && expected > 0; i++,
> > +                                               expected -= PAGE_SIZE, offset++) {
> > +                               int avail = min_t(int, expected, PAGE_SIZE);
> > +
> > +                               squashfs_fill_page(pages[i], buffer,
> > +                                               offset * PAGE_SIZE, avail);
> > +                               unlock_page(pages[i]);
> > +                       }
> > +
> > +                       squashfs_cache_put(buffer);
> > +                       continue;
> > +               }
> > +
> > +               res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
> > +                                        actor);
> > +
> > +               if (res == expected) {
> > +                       int bytes;
> > +
> > +                       /* Last page may have trailing bytes not filled */
> > +                       bytes = res % PAGE_SIZE;
> > +                       if (bytes) {
> > +                               void *pageaddr;
> > +
> > +                               pageaddr = kmap_atomic(pages[nr_pages - 1]);
> > +                               memset(pageaddr + bytes, 0, PAGE_SIZE - bytes);
> > +                               kunmap_atomic(pageaddr);
> > +                       }
> > +
> > +                       for (i = 0; i < nr_pages; i++) {
> > +                               flush_dcache_page(pages[i]);
> > +                               SetPageUptodate(pages[i]);
> > +                       }
> > +               }
> > +
> > +               for (i = 0; i < nr_pages; i++) {
> > +                       unlock_page(pages[i]);
> > +                       put_page(pages[i]);
> > +               }
> > +       }
> > +
> > +       kfree(actor);
> > +       kfree(pages);
> > +       return;
> > +
> > +skip_pages:
> > +       for (i = 0; i < nr_pages; i++) {
> > +               unlock_page(pages[i]);
> > +               put_page(pages[i]);
> > +       }
> > +
> > +       kfree(actor);
> > +out:
> > +       kfree(pages);
> > +}
> >
> >  const struct address_space_operations squashfs_aops = {
> > -       .read_folio = squashfs_read_folio
> > +       .read_folio = squashfs_read_folio,
> > +       .readahead = squashfs_readahead
> >  };
> > --
> > 2.36.1.255.ge46751e96f-goog
> >
> >


* Re: [PATCH v5 3/3] squashfs: implement readahead
  2022-06-09 14:46   ` Xiongwei Song
  2022-06-10  1:32     ` Xiongwei Song
@ 2022-06-10  7:42     ` Phillip Lougher
  2022-06-13  1:36       ` Xiongwei Song
  2022-07-15  1:45       ` Xiongwei Song
  1 sibling, 2 replies; 16+ messages in thread
From: Phillip Lougher @ 2022-06-10  7:42 UTC (permalink / raw)
  To: Xiongwei Song, Hsin-Yi Wang
  Cc: Matthew Wilcox, Xiongwei Song, Marek Szyprowski, Andrew Morton,
	Zheng Liang, Zhang Yi, Hou Tao, Miao Xie, linux-mm @ kvack . org,
	squashfs-devel @ lists . sourceforge . net,
	Linux Kernel Mailing List

On 09/06/2022 15:46, Xiongwei Song wrote:
> This version is bad for my test. I ran the test below
> "for cnt in $(seq 0 9); do echo 3 > /proc/sys/vm/drop_caches; echo
> "Loop ${cnt}:"; time -v find /software/test[0-9][0-9] | xargs -P 24 -i
> cat {} > /dev/null 2>/dev/null; echo ""; done"
> in 90 partitions.
> 
> With 9eec1d897139 reverted:
> 1:06.18 (1m + 6.18s)
> 1:05.65
> 1:06.34
> 1:06.88
> 1:06.52
> 1:06.78
> 1:06.61
> 1:06.99
> 1:06.60
> 1:06.79
> 
> With this version:
> 2:36.85 (2m + 36.85s)
> 2:28.89
> 1:43.46
> 1:41.50
> 1:42.75
> 1:43.46
> 1:43.67
> 1:44.41
> 1:44.91
> 1:45.44
> 
> Any thoughts?

Thank you for your latest test results; they tend to
imply that the latest version of the patch hasn't improved
performance in your use case.

One thing which is becoming clear here is that the devil is in
the detail, and your results being summaries are not capturing
enough detail to understand what is happening.  They show
something is wrong, but don't give any guidance as to what
is happening.

I think it will be difficult to capture more details from
your test case.  But, detail can be captured from summaries, by
varying the input and extrapolating from the results.

By that I mean have you tried changing anything, and observed any
changed results?

For instance have you tried any of the following

1.  Changing the parallelism of your test from 24 read threads.
     Does 1, 2, 4 etc parallel read threads change the observed
     behaviour?  In other words is the slow-down observed across
     all degrees of parallelism, or is there a critical point.

2. Do the Squashfs parallelism options in the kernel configuration
    change the behaviour?  Knowing whether the number of "decompressors"
    available changes the difference in performance could be important.

3. Are your Squashfs filesystems built using fragments, or without
    fragments?  Rebuilding the filesystems without fragments, and
    observing any different performance, would help to pinpoint
    where the issue lies.

4. What is the block size used in your Squashfs filesystems.  Have
    you tried changing the block size, and seen what effect
    it has on the difference in performance between the patches?

5. You don't mention where your Squashfs filesystems are stored.
    Is this slow media or fast media?  Have you tried moving
    the Squashfs filesystems onto different media and observed
    any difference in performance between the patches?

The fact of the matter is there are many overlapping factors
which affect the performance of Squashfs filesystems (like any
reasonably complex code), and the cause may lie elsewhere.  It can only
take a small change somewhere to have a dramatic effect on
performance.

This is particularly the case with embedded systems, which
may be short on CPU performance, short on RAM, and have low
performance media, and be effectively operating on the "edge".
It can only take a small change, an update for instance, to
change from performing well to badly.

I speak from experience, having spent over ten years in embedded
Linux as a senior engineer and then as a consultant.  I have
my own horror tales as a consultant, dealing with systems pushed
beyond the edge (with hacks), and the customer insisting they
didn't do anything to cause the system to finally break.

Maybe it is off topic here.  But, I remember one instance where
a customer had a system out in the field, which "inexplicably"
started to lock up every 6 months or so.  This system had regular
updates "over the air", and I discovered the "lock up" only
started happening after the latest update.  It turns out the new version
of the application had grown a new feature which needed more
RAM than normal.  This feature wasn't used very often, but,
if it coincided with an infrequent "house-keeping" background task,
the system ran out of memory and locked up (they had disabled the OOM
killer).  This was so rare it might only coincide after six months.  No
bug, but a slow growth in working set RAM over a number of versions.

In other words we may be looking at a knock-on side effect of
readahead, which is either caused by issues elsewhere or is
causing issues elsewhere.

Dealing with it in isolation, as a bug in the readahead code, is going
to get us nowhere, looking for something that isn't there.

I'm not saying that this is the case here.  But the more detail
you can provide, and the more test variants you can try, the easier
it will be to determine what the problem is.

Thanks

Phillip


> 
> Regards,
> Xiongwei
> 
> On Mon, Jun 6, 2022 at 11:03 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
>>
>> Implement readahead callback for squashfs. It will read datablocks
>> which cover pages in readahead request. For a few cases it will
>> not mark page as uptodate, including:
>> - file end is 0.
>> - zero filled blocks.
>> - current batch of pages isn't in the same datablock.
>> - decompressor error.
>> Otherwise pages will be marked as uptodate. The unhandled pages will be
>> updated by readpage later.
>>
>> Suggested-by: Matthew Wilcox <willy@infradead.org>
>> Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
>> Reported-by: Matthew Wilcox <willy@infradead.org>
>> Reported-by: Phillip Lougher <phillip@squashfs.org.uk>
>> Reported-by: Xiongwei Song <Xiongwei.Song@windriver.com>
>> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> Reported-by: Andrew Morton <akpm@linux-foundation.org>
>> ---
>> v4->v5:
>> - Handle short file cases reported by Marek and Matthew.
>> - Fix checkpatch error reported by Andrew.
>>
>> v4: https://lore.kernel.org/lkml/20220601103922.1338320-4-hsinyi@chromium.org/
>> v3: https://lore.kernel.org/lkml/20220523065909.883444-4-hsinyi@chromium.org/
>> v2: https://lore.kernel.org/lkml/20220517082650.2005840-4-hsinyi@chromium.org/
>> v1: https://lore.kernel.org/lkml/20220516105100.1412740-3-hsinyi@chromium.org/
>> ---
>>   fs/squashfs/file.c | 124 ++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 123 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
>> index a8e495d8eb86..fbd096cd15f4 100644
>> --- a/fs/squashfs/file.c
>> +++ b/fs/squashfs/file.c
>> @@ -39,6 +39,7 @@
>>   #include "squashfs_fs_sb.h"
>>   #include "squashfs_fs_i.h"
>>   #include "squashfs.h"
>> +#include "page_actor.h"
>>
>>   /*
>>    * Locate cache slot in range [offset, index] for specified inode.  If
>> @@ -495,7 +496,128 @@ static int squashfs_read_folio(struct file *file, struct folio *folio)
>>          return 0;
>>   }
>>
>> +static void squashfs_readahead(struct readahead_control *ractl)
>> +{
>> +       struct inode *inode = ractl->mapping->host;
>> +       struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
>> +       size_t mask = (1UL << msblk->block_log) - 1;
>> +       unsigned short shift = msblk->block_log - PAGE_SHIFT;
>> +       loff_t start = readahead_pos(ractl) & ~mask;
>> +       size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
>> +       struct squashfs_page_actor *actor;
>> +       unsigned int nr_pages = 0;
>> +       struct page **pages;
>> +       int i, file_end = i_size_read(inode) >> msblk->block_log;
>> +       unsigned int max_pages = 1UL << shift;
>> +
>> +       readahead_expand(ractl, start, (len | mask) + 1);
>> +
>> +       if (file_end == 0)
>> +               return;
>> +
>> +       pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
>> +       if (!pages)
>> +               return;
>> +
>> +       actor = squashfs_page_actor_init_special(pages, max_pages, 0);
>> +       if (!actor)
>> +               goto out;
>> +
>> +       for (;;) {
>> +               pgoff_t index;
>> +               int res, bsize;
>> +               u64 block = 0;
>> +               unsigned int expected;
>> +
>> +               nr_pages = __readahead_batch(ractl, pages, max_pages);
>> +               if (!nr_pages)
>> +                       break;
>> +
>> +               if (readahead_pos(ractl) >= i_size_read(inode))
>> +                       goto skip_pages;
>> +
>> +               index = pages[0]->index >> shift;
>> +               if ((pages[nr_pages - 1]->index >> shift) != index)
>> +                       goto skip_pages;
>> +
>> +               expected = index == file_end ?
>> +                          (i_size_read(inode) & (msblk->block_size - 1)) :
>> +                           msblk->block_size;
>> +
>> +               bsize = read_blocklist(inode, index, &block);
>> +               if (bsize == 0)
>> +                       goto skip_pages;
>> +
>> +               if (nr_pages < max_pages) {
>> +                       struct squashfs_cache_entry *buffer;
>> +                       unsigned int block_mask = max_pages - 1;
>> +                       int offset = pages[0]->index - (pages[0]->index & ~block_mask);
>> +
>> +                       buffer = squashfs_get_datablock(inode->i_sb, block,
>> +                                                       bsize);
>> +                       if (buffer->error) {
>> +                               squashfs_cache_put(buffer);
>> +                               goto skip_pages;
>> +                       }
>> +
>> +                       expected -= offset * PAGE_SIZE;
>> +                       for (i = 0; i < nr_pages && expected > 0; i++,
>> +                                               expected -= PAGE_SIZE, offset++) {
>> +                               int avail = min_t(int, expected, PAGE_SIZE);
>> +
>> +                               squashfs_fill_page(pages[i], buffer,
>> +                                               offset * PAGE_SIZE, avail);
>> +                               unlock_page(pages[i]);
>> +                       }
>> +
>> +                       squashfs_cache_put(buffer);
>> +                       continue;
>> +               }
>> +
>> +               res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
>> +                                        actor);
>> +
>> +               if (res == expected) {
>> +                       int bytes;
>> +
>> +                       /* Last page may have trailing bytes not filled */
>> +                       bytes = res % PAGE_SIZE;
>> +                       if (bytes) {
>> +                               void *pageaddr;
>> +
>> +                               pageaddr = kmap_atomic(pages[nr_pages - 1]);
>> +                               memset(pageaddr + bytes, 0, PAGE_SIZE - bytes);
>> +                               kunmap_atomic(pageaddr);
>> +                       }
>> +
>> +                       for (i = 0; i < nr_pages; i++) {
>> +                               flush_dcache_page(pages[i]);
>> +                               SetPageUptodate(pages[i]);
>> +                       }
>> +               }
>> +
>> +               for (i = 0; i < nr_pages; i++) {
>> +                       unlock_page(pages[i]);
>> +                       put_page(pages[i]);
>> +               }
>> +       }
>> +
>> +       kfree(actor);
>> +       kfree(pages);
>> +       return;
>> +
>> +skip_pages:
>> +       for (i = 0; i < nr_pages; i++) {
>> +               unlock_page(pages[i]);
>> +               put_page(pages[i]);
>> +       }
>> +
>> +       kfree(actor);
>> +out:
>> +       kfree(pages);
>> +}
>>
>>   const struct address_space_operations squashfs_aops = {
>> -       .read_folio = squashfs_read_folio
>> +       .read_folio = squashfs_read_folio,
>> +       .readahead = squashfs_readahead
>>   };
>> --
>> 2.36.1.255.ge46751e96f-goog
>>
>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 3/3] squashfs: implement readahead
  2022-06-06 15:03 ` [PATCH v5 3/3] squashfs: implement readahead Hsin-Yi Wang
                     ` (2 preceding siblings ...)
  2022-06-09 14:46   ` Xiongwei Song
@ 2022-06-11  5:23   ` Phillip Lougher
  3 siblings, 0 replies; 16+ messages in thread
From: Phillip Lougher @ 2022-06-11  5:23 UTC (permalink / raw)
  To: Hsin-Yi Wang, Matthew Wilcox, Xiongwei Song, Marek Szyprowski,
	Andrew Morton
  Cc: Zheng Liang, Zhang Yi, Hou Tao, Miao Xie, linux-mm @ kvack . org,
	squashfs-devel @ lists . sourceforge . net, linux-kernel

On 06/06/2022 16:03, Hsin-Yi Wang wrote:
> Implement readahead callback for squashfs. It will read datablocks
> which cover pages in readahead request. For a few cases it will
> not mark page as uptodate, including:
> - file end is 0.
> - zero filled blocks.
> - current batch of pages isn't in the same datablock.
> - decompressor error.
> Otherwise pages will be marked as uptodate. The unhandled pages will be
> updated by readpage later.
> 

Hi Hsin-Yi,

I have reviewed, tested and instrumented the following patch.

There are a number of problems with the patch including
performance, unhandled issues, and bugs.

In this email I'll concentrate on the performance aspects.

The major change between this V5 patch and the previous patches
(V4 etc) is that it now handles the case where

+ nr_pages = __readahead_batch(ractl, pages, max_pages);

returns an "nr_pages" less than "max_pages".

What this means is that the readahead code has returned a set
of page cache pages which does not fully map the datablock to
be decompressed.

If this is passed to squashfs_read_data() using the current
"page actor" code, the decompression will fail on the missing
pages.

In recognition of that fact, your V5 patch falls back to using
the earlier intermediate buffer method, with
squashfs_get_datablock() returning a buffer, which is then memcopied
into the page cache pages.

This is currently what is also done in the existing
squashfs_readpage_block() function if the entire set of pages cannot
be obtained.

The problem with this fallback intermediate buffer is that it is slow,
both because of the additional memcopies and, more importantly,
because it introduces contention on a single shared buffer.

I have long had the intention to fix this performance issue in
squashfs_readpage_block(), but, due to it being a rare issue there, the
additional work has seemed nice but not essential.

The problem is we don't want the readahead code to be using this
slow method, because the scenario will probably happen much more
often, and for a performance improvement patch, falling back to
an old slow method isn't very useful.

So I have finally done the work to make the "page actor" code handle
missing pages.

This I have sent out in the following patch-set updating the
squashfs_readpage_block() function to use it.

https://lore.kernel.org/lkml/20220611032133.5743-1-phillip@squashfs.org.uk/

You can use this updated "page actor" code to eliminate the
"nr_pages < max_pages" special case in your patch, with the benefit
that decompression is done directly into the page cache.

I have updated your patch to use the new functionality.  The diff
including a bug fix I have appended to this email.

Phillip

diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index b86b2f9d9ae6..721d35ecfca9 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -519,10 +519,6 @@ static void squashfs_readahead(struct readahead_control *ractl)
  	if (!pages)
  		return;

-	actor = squashfs_page_actor_init_special(pages, max_pages, 0);
-	if (!actor)
-		goto out;
-
  	for (;;) {
  		pgoff_t index;
  		int res, bsize;
@@ -548,41 +544,21 @@ static void squashfs_readahead(struct readahead_control *ractl)
  		if (bsize == 0)
  			goto skip_pages;

-		if (nr_pages < max_pages) {
-			struct squashfs_cache_entry *buffer;
-			unsigned int block_mask = max_pages - 1;
-			int offset = pages[0]->index - (pages[0]->index & ~block_mask);
-
-			buffer = squashfs_get_datablock(inode->i_sb, block,
-							bsize);
-			if (buffer->error) {
-				squashfs_cache_put(buffer);
-				goto skip_pages;
-			}
-
-			expected -= offset * PAGE_SIZE;
-			for (i = 0; i < nr_pages && expected > 0; i++,
-						expected -= PAGE_SIZE, offset++) {
-				int avail = min_t(int, expected, PAGE_SIZE);
-
-				squashfs_fill_page(pages[i], buffer,
-						offset * PAGE_SIZE, avail);
-				unlock_page(pages[i]);
-			}
-
-			squashfs_cache_put(buffer);
-			continue;
-		}
+		actor = squashfs_page_actor_init_special(msblk, pages, nr_pages, expected);
+		if (!actor)
+			goto out;

  		res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
  					 actor);

+		kfree(actor);
+
  		if (res == expected) {
  			int bytes;

-			/* Last page may have trailing bytes not filled */
+			/* Last page (if present) may have trailing bytes not filled */
  			bytes = res % PAGE_SIZE;
-			if (bytes) {
+			if (pages[nr_pages - 1]->index == file_end && bytes) {
  				void *pageaddr;

  				pageaddr = kmap_atomic(pages[nr_pages - 1]);
@@ -602,7 +578,6 @@ static void squashfs_readahead(struct readahead_control *ractl)
  		}
  	}

-	kfree(actor);
  	kfree(pages);
  	return;

@@ -612,7 +587,6 @@ static void squashfs_readahead(struct readahead_control *ractl)
  		put_page(pages[i]);
  	}

-	kfree(actor);
  out:
  	kfree(pages);
  }
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 3/3] squashfs: implement readahead
  2022-06-10  7:42     ` Phillip Lougher
@ 2022-06-13  1:36       ` Xiongwei Song
  2022-06-13  8:35         ` Hsin-Yi Wang
  2022-07-15  1:45       ` Xiongwei Song
  1 sibling, 1 reply; 16+ messages in thread
From: Xiongwei Song @ 2022-06-13  1:36 UTC (permalink / raw)
  To: Phillip Lougher
  Cc: Hsin-Yi Wang, Matthew Wilcox, Xiongwei Song, Marek Szyprowski,
	Andrew Morton, Zheng Liang, Zhang Yi, Hou Tao, Miao Xie,
	linux-mm @ kvack . org,
	squashfs-devel @ lists . sourceforge . net,
	Linux Kernel Mailing List

Hi,

On Fri, Jun 10, 2022 at 3:42 PM Phillip Lougher <phillip@squashfs.org.uk> wrote:
>
> On 09/06/2022 15:46, Xiongwei Song wrote:
> > This version is bad for my test. I ran the test below
> > "for cnt in $(seq 0 9); do echo 3 > /proc/sys/vm/drop_caches; echo
> > "Loop ${cnt}:"; time -v find /software/test[0-9][0-9] | xargs -P 24 -i
> > cat {} > /dev/null 2>/dev/null; echo ""; done"
> > in 90 partitions.
> >
> > With 9eec1d897139 reverted:
> > 1:06.18 (1m + 6.18s)
> > 1:05.65
> > 1:06.34
> > 1:06.88
> > 1:06.52
> > 1:06.78
> > 1:06.61
> > 1:06.99
> > 1:06.60
> > 1:06.79
> >
> > With this version:
> > 2:36.85 (2m + 36.85s)
> > 2:28.89
> > 1:43.46
> > 1:41.50
> > 1:42.75
> > 1:43.46
> > 1:43.67
> > 1:44.41
> > 1:44.91
> > 1:45.44
> >
> > Any thoughts?
>
> Thank-you for your latest test results, and they tend to
> imply that the latest version of the patch hasn't improved
> performance in your use-case.
>
> One thing which is becoming clear here is that the devil is in
> the detail, and your results being summaries are not capturing
> enough detail to understand what is happening.  They show
> something is wrong, but, don't give any guidance as to what
> is happening.
>
> I think it will be difficult to capture more details from
> your test case.  But, detail can be captured from summaries, by
> varying the input and extrapolating from the results.
>
> By that I mean have you tried changing anything, and observed any
> changed results?
>
> For instance have you tried any of the following
>
> 1.  Changing the parallelism of your test from 24 read threads.
>      Does 1, 2, 4 etc parallel read threads change the observed
>      behaviour?  In other words is the slow-down observed across
>      all degrees of parallelism, or is there a critical point.
>
> 2. Does the Squashfs parallelism options in the kernel configuration
>     change the behaviour?  Knowing if the number of "decompressors"
>     available changes the difference in performance could be important.
>
> 3. Are your Squashfs filesystems built using fragments, or without
>     fragments?  Rebuilding the filesystems without fragments, and
>     observing any different performance, would help to pinpoint
>     where the issue lies.
>
> 4. What is the block size used in your Squashfs filesystems.  Have
>     you tried changing the block size, and seen what effect
>     it has on the difference in performance between the patches?
>
> 5. You don't mention where your Squashfs filesystems are stored.
>     Is this slow media or fast media?  Have you tried moving
>     the Squashfs filesystems onto different media and observed
>     any difference in performance between the patches?
>

Thanks for your response and inputs.  I really appreciate your help.
I can try these things, but I can't provide detailed results right
now because I'm busy with a few other things, so it's hard for me to
focus on this one issue.

> The fact of the matter is there are many over-lapping factors
> which affect the performance of Squashfs filesystems (like any
> reasonably complex code), which may be elsewhere.  It can only
> take a small change somewhere to have a dramatic affect on
> performance.
>
> This is particularly the case with embedded systems, which
> may be short on CPU performance, short on RAM, and have low
> performance media, and be effectively operating on the "edge".
> It can only take a small change, an update for instance, to
> change from performing well to badly.

Totally agree.

>
> I speak from experience, having spent over ten years in embedded
> Linux as a senior engineer and then as a consultant.  I have
> my own horror tales as a consultant, dealing with systems pushed
> beyond the edge (with hacks), and the customer insisting they
> didn't do anything to cause the system to finally break.
>
> Maybe it is off topic here.  But, I remember one instance where
> a customer had a system out in the field, which "inexplicably"
> started to lock up every 6 months or so.  This system had regular
> updates "over the air", and I discovered the "lock up" only
> started happening after the latest update.  It turns out the new version
> of the application had grown a new feature which needed more
> RAM than normal.  This feature wasn't used very often, but,
> if it coincided with an infrequent "house-keeping" background task,
> the system ran out of memory and locked up (they had disabled the OOM
> killer).  This was so rare it might only coincide after six months.  No
> bug, but a slow growth in working set RAM over a number of versions.
>
> In other words we may be looking at a knock-on side effect of
> readahead, which is either caused by issues elsewhere or is
> causing issues elsewhere.
>
> Dealing with it in isolation, as bug in the readahead code is going
> to get us nowhere, looking for something that isn't there.
>
> I'm not saying that this is the case here.  But, the more detail
> you can provide, and the more test variants you can provide will
> help to determine what is the problem.

Thanks for your sharing. I will provide detail later.

Regards,
Xiongwei

>
> Thanks
>
> Phillip
>
>
> >
> > Regards,
> > Xiongwei
> >
> > On Mon, Jun 6, 2022 at 11:03 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> >>
> >> Implement readahead callback for squashfs. It will read datablocks
> >> which cover pages in readahead request. For a few cases it will
> >> not mark page as uptodate, including:
> >> - file end is 0.
> >> - zero filled blocks.
> >> - current batch of pages isn't in the same datablock.
> >> - decompressor error.
> >> Otherwise pages will be marked as uptodate. The unhandled pages will be
> >> updated by readpage later.
> >>
> >> Suggested-by: Matthew Wilcox <willy@infradead.org>
> >> Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
> >> Reported-by: Matthew Wilcox <willy@infradead.org>
> >> Reported-by: Phillip Lougher <phillip@squashfs.org.uk>
> >> Reported-by: Xiongwei Song <Xiongwei.Song@windriver.com>
> >> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> >> Reported-by: Andrew Morton <akpm@linux-foundation.org>
> >> ---
> >> v4->v5:
> >> - Handle short file cases reported by Marek and Matthew.
> >> - Fix checkpatch error reported by Andrew.
> >>
> >> v4: https://lore.kernel.org/lkml/20220601103922.1338320-4-hsinyi@chromium.org/
> >> v3: https://lore.kernel.org/lkml/20220523065909.883444-4-hsinyi@chromium.org/
> >> v2: https://lore.kernel.org/lkml/20220517082650.2005840-4-hsinyi@chromium.org/
> >> v1: https://lore.kernel.org/lkml/20220516105100.1412740-3-hsinyi@chromium.org/
> >> ---
> >>   fs/squashfs/file.c | 124 ++++++++++++++++++++++++++++++++++++++++++++-
> >>   1 file changed, 123 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> >> index a8e495d8eb86..fbd096cd15f4 100644
> >> --- a/fs/squashfs/file.c
> >> +++ b/fs/squashfs/file.c
> >> @@ -39,6 +39,7 @@
> >>   #include "squashfs_fs_sb.h"
> >>   #include "squashfs_fs_i.h"
> >>   #include "squashfs.h"
> >> +#include "page_actor.h"
> >>
> >>   /*
> >>    * Locate cache slot in range [offset, index] for specified inode.  If
> >> @@ -495,7 +496,128 @@ static int squashfs_read_folio(struct file *file, struct folio *folio)
> >>          return 0;
> >>   }
> >>
> >> +static void squashfs_readahead(struct readahead_control *ractl)
> >> +{
> >> +       struct inode *inode = ractl->mapping->host;
> >> +       struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> >> +       size_t mask = (1UL << msblk->block_log) - 1;
> >> +       unsigned short shift = msblk->block_log - PAGE_SHIFT;
> >> +       loff_t start = readahead_pos(ractl) & ~mask;
> >> +       size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
> >> +       struct squashfs_page_actor *actor;
> >> +       unsigned int nr_pages = 0;
> >> +       struct page **pages;
> >> +       int i, file_end = i_size_read(inode) >> msblk->block_log;
> >> +       unsigned int max_pages = 1UL << shift;
> >> +
> >> +       readahead_expand(ractl, start, (len | mask) + 1);
> >> +
> >> +       if (file_end == 0)
> >> +               return;
> >> +
> >> +       pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
> >> +       if (!pages)
> >> +               return;
> >> +
> >> +       actor = squashfs_page_actor_init_special(pages, max_pages, 0);
> >> +       if (!actor)
> >> +               goto out;
> >> +
> >> +       for (;;) {
> >> +               pgoff_t index;
> >> +               int res, bsize;
> >> +               u64 block = 0;
> >> +               unsigned int expected;
> >> +
> >> +               nr_pages = __readahead_batch(ractl, pages, max_pages);
> >> +               if (!nr_pages)
> >> +                       break;
> >> +
> >> +               if (readahead_pos(ractl) >= i_size_read(inode))
> >> +                       goto skip_pages;
> >> +
> >> +               index = pages[0]->index >> shift;
> >> +               if ((pages[nr_pages - 1]->index >> shift) != index)
> >> +                       goto skip_pages;
> >> +
> >> +               expected = index == file_end ?
> >> +                          (i_size_read(inode) & (msblk->block_size - 1)) :
> >> +                           msblk->block_size;
> >> +
> >> +               bsize = read_blocklist(inode, index, &block);
> >> +               if (bsize == 0)
> >> +                       goto skip_pages;
> >> +
> >> +               if (nr_pages < max_pages) {
> >> +                       struct squashfs_cache_entry *buffer;
> >> +                       unsigned int block_mask = max_pages - 1;
> >> +                       int offset = pages[0]->index - (pages[0]->index & ~block_mask);
> >> +
> >> +                       buffer = squashfs_get_datablock(inode->i_sb, block,
> >> +                                                       bsize);
> >> +                       if (buffer->error) {
> >> +                               squashfs_cache_put(buffer);
> >> +                               goto skip_pages;
> >> +                       }
> >> +
> >> +                       expected -= offset * PAGE_SIZE;
> >> +                       for (i = 0; i < nr_pages && expected > 0; i++,
> >> +                                               expected -= PAGE_SIZE, offset++) {
> >> +                               int avail = min_t(int, expected, PAGE_SIZE);
> >> +
> >> +                               squashfs_fill_page(pages[i], buffer,
> >> +                                               offset * PAGE_SIZE, avail);
> >> +                               unlock_page(pages[i]);
> >> +                       }
> >> +
> >> +                       squashfs_cache_put(buffer);
> >> +                       continue;
> >> +               }
> >> +
> >> +               res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
> >> +                                        actor);
> >> +
> >> +               if (res == expected) {
> >> +                       int bytes;
> >> +
> >> +                       /* Last page may have trailing bytes not filled */
> >> +                       bytes = res % PAGE_SIZE;
> >> +                       if (bytes) {
> >> +                               void *pageaddr;
> >> +
> >> +                               pageaddr = kmap_atomic(pages[nr_pages - 1]);
> >> +                               memset(pageaddr + bytes, 0, PAGE_SIZE - bytes);
> >> +                               kunmap_atomic(pageaddr);
> >> +                       }
> >> +
> >> +                       for (i = 0; i < nr_pages; i++) {
> >> +                               flush_dcache_page(pages[i]);
> >> +                               SetPageUptodate(pages[i]);
> >> +                       }
> >> +               }
> >> +
> >> +               for (i = 0; i < nr_pages; i++) {
> >> +                       unlock_page(pages[i]);
> >> +                       put_page(pages[i]);
> >> +               }
> >> +       }
> >> +
> >> +       kfree(actor);
> >> +       kfree(pages);
> >> +       return;
> >> +
> >> +skip_pages:
> >> +       for (i = 0; i < nr_pages; i++) {
> >> +               unlock_page(pages[i]);
> >> +               put_page(pages[i]);
> >> +       }
> >> +
> >> +       kfree(actor);
> >> +out:
> >> +       kfree(pages);
> >> +}
> >>
> >>   const struct address_space_operations squashfs_aops = {
> >> -       .read_folio = squashfs_read_folio
> >> +       .read_folio = squashfs_read_folio,
> >> +       .readahead = squashfs_readahead
> >>   };
> >> --
> >> 2.36.1.255.ge46751e96f-goog
> >>
> >>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 3/3] squashfs: implement readahead
  2022-06-13  1:36       ` Xiongwei Song
@ 2022-06-13  8:35         ` Hsin-Yi Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Hsin-Yi Wang @ 2022-06-13  8:35 UTC (permalink / raw)
  To: Xiongwei Song
  Cc: Phillip Lougher, Matthew Wilcox, Xiongwei Song, Marek Szyprowski,
	Andrew Morton, Zheng Liang, Zhang Yi, Hou Tao, Miao Xie,
	linux-mm @ kvack . org,
	squashfs-devel @ lists . sourceforge . net,
	Linux Kernel Mailing List

On Mon, Jun 13, 2022 at 9:36 AM Xiongwei Song <sxwjean@gmail.com> wrote:
>
> Hi,
>
> On Fri, Jun 10, 2022 at 3:42 PM Phillip Lougher <phillip@squashfs.org.uk> wrote:
> >
> > On 09/06/2022 15:46, Xiongwei Song wrote:
> > > This version is bad for my test. I ran the test below
> > > "for cnt in $(seq 0 9); do echo 3 > /proc/sys/vm/drop_caches; echo
> > > "Loop ${cnt}:"; time -v find /software/test[0-9][0-9] | xargs -P 24 -i
> > > cat {} > /dev/null 2>/dev/null; echo ""; done"
> > > in 90 partitions.
> > >
> > > With 9eec1d897139 reverted:
> > > 1:06.18 (1m + 6.18s)
> > > 1:05.65
> > > 1:06.34
> > > 1:06.88
> > > 1:06.52
> > > 1:06.78
> > > 1:06.61
> > > 1:06.99
> > > 1:06.60
> > > 1:06.79
> > >
> > > With this version:
> > > 2:36.85 (2m + 36.85s)
> > > 2:28.89
> > > 1:43.46
> > > 1:41.50
> > > 1:42.75
> > > 1:43.46
> > > 1:43.67
> > > 1:44.41
> > > 1:44.91
> > > 1:45.44
> > >
> > > Any thoughts?
> >
> > Thank-you for your latest test results, and they tend to
> > imply that the latest version of the patch hasn't improved
> > performance in your use-case.
> >
> > One thing which is becoming clear here is that the devil is in
> > the detail, and your results being summaries are not capturing
> > enough detail to understand what is happening.  They show
> > something is wrong, but, don't give any guidance as to what
> > is happening.
> >
> > I think it will be difficult to capture more details from
> > your test case.  But, detail can be captured from summaries, by
> > varying the input and extrapolating from the results.
> >
> > By that I mean have you tried changing anything, and observed any
> > changed results?
> >
> > For instance have you tried any of the following
> >
> > 1.  Changing the parallelism of your test from 24 read threads.
> >      Does 1, 2, 4 etc parallel read threads change the observed
> >      behaviour?  In other words is the slow-down observed across
> >      all degrees of parallelism, or is there a critical point.
> >
> > 2. Does the Squashfs parallelism options in the kernel configuration
> >     change the behaviour?  Knowing if the number of "decompressors"
> >     available changes the difference in performance could be important.
> >
> > 3. Are your Squashfs filesystems built using fragments, or without
> >     fragments?  Rebuilding the filesystems without fragments, and
> >     observing any different performance, would help to pinpoint
> >     where the issue lies.
> >
> > 4. What is the block size used in your Squashfs filesystems.  Have
> >     you tried changing the block size, and seen what effect
> >     it has on the difference in performance between the patches?
> >
> > 5. You don't mention where your Squashfs filesystems are stored.
> >     Is this slow media or fast media?  Have you tried moving
> >     the Squashfs filesystems onto different media and observed
> >     any difference in performance between the patches?
> >
>
> Thanks for your response and inputs.  I really appreciated your help.
>  I can try these things  but can't provide the detailed results for
> now because I'm busy with a few things, hence It's hard to focus
> on this one thing for me.
>
> > The fact of the matter is there are many over-lapping factors
> > which affect the performance of Squashfs filesystems (like any
> > reasonably complex code), which may be elsewhere.  It can only
> > take a small change somewhere to have a dramatic affect on
> > performance.
> >
> > This is particularly the case with embedded systems, which
> > may be short on CPU performance, short on RAM, and have low
> > performance media, and be effectively operating on the "edge".
> > It can only take a small change, an update for instance, to
> > change from performing well to badly.
>
> Totally agree.
>
> >
> > I speak from experience, having spent over ten years in embedded
> > Linux as a senior engineer and then as a consultant.  I have
> > my own horror tales as a consultant, dealing with systems pushed
> > beyond the edge (with hacks), and the customer insisting they
> > didn't do anything to cause the system to finally break.
> >
> > Maybe it is off topic here.  But, I remember one instance where
> > a customer had a system out in the field, which "inexplicably"
> > started to lock up every 6 months or so.  This system had regular
> > updates "over the air", and I discovered the "lock up" only
> > started happening after the latest update.  It turns out the new version
> > of the application had grown a new feature which needed more
> > RAM than normal.  This feature wasn't used very often, but,
> > if it coincided with an infrequent "house-keeping" background task,
> > the system ran out of memory and locked up (they had disabled the OOM
> > killer).  This was so rare it might only coincide after six months.  No
> > bug, but a slow growth in working set RAM over a number of versions.
> >
> > In other words we may be looking at a knock-on side effect of
> > readahead, which is either caused by issues elsewhere or is
> > causing issues elsewhere.
> >
> > Dealing with it in isolation, as bug in the readahead code is going
> > to get us nowhere, looking for something that isn't there.
> >
> > I'm not saying that this is the case here.  But, the more detail
> > you can provide, and the more test variants you can provide will
> > help to determine what is the problem.
>
> Thanks for your sharing. I will provide detail later.
>
Hi,

Thanks for testing on v5. I've sent v6 which is based on the series:
https://patchwork.kernel.org/project/linux-mm/cover/20220611032133.5743-1-phillip@squashfs.org.uk/.

v6: https://patchwork.kernel.org/project/linux-mm/cover/20220613082802.1301238-1-hsinyi@chromium.org/

To apply the patch on linux-next, one might have to revert
ca1505bf4805 ("squashfs: implement readahead") and 9d58b94aa73a
("squashfs: always build "file direct" version of page actor") first.


> Regards,
> Xiongwei
>
> >
> > Thanks
> >
> > Phillip
> >
> >
> > >
> > > Regards,
> > > Xiongwei
> > >
> > > On Mon, Jun 6, 2022 at 11:03 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> > >>
> > >> Implement readahead callback for squashfs. It will read datablocks
> > >> which cover pages in readahead request. For a few cases it will
> > >> not mark page as uptodate, including:
> > >> - file end is 0.
> > >> - zero filled blocks.
> > >> - current batch of pages isn't in the same datablock.
> > >> - decompressor error.
> > >> Otherwise pages will be marked as uptodate. The unhandled pages will be
> > >> updated by readpage later.
> > >>
> > >> Suggested-by: Matthew Wilcox <willy@infradead.org>
> > >> Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
> > >> Reported-by: Matthew Wilcox <willy@infradead.org>
> > >> Reported-by: Phillip Lougher <phillip@squashfs.org.uk>
> > >> Reported-by: Xiongwei Song <Xiongwei.Song@windriver.com>
> > >> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > >> Reported-by: Andrew Morton <akpm@linux-foundation.org>
> > >> ---
> > >> v4->v5:
> > >> - Handle short file cases reported by Marek and Matthew.
> > >> - Fix checkpatch error reported by Andrew.
> > >>
> > >> v4: https://lore.kernel.org/lkml/20220601103922.1338320-4-hsinyi@chromium.org/
> > >> v3: https://lore.kernel.org/lkml/20220523065909.883444-4-hsinyi@chromium.org/
> > >> v2: https://lore.kernel.org/lkml/20220517082650.2005840-4-hsinyi@chromium.org/
> > >> v1: https://lore.kernel.org/lkml/20220516105100.1412740-3-hsinyi@chromium.org/
> > >> ---
> > >>   fs/squashfs/file.c | 124 ++++++++++++++++++++++++++++++++++++++++++++-
> > >>   1 file changed, 123 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> > >> index a8e495d8eb86..fbd096cd15f4 100644
> > >> --- a/fs/squashfs/file.c
> > >> +++ b/fs/squashfs/file.c
> > >> @@ -39,6 +39,7 @@
> > >>   #include "squashfs_fs_sb.h"
> > >>   #include "squashfs_fs_i.h"
> > >>   #include "squashfs.h"
> > >> +#include "page_actor.h"
> > >>
> > >>   /*
> > >>    * Locate cache slot in range [offset, index] for specified inode.  If
> > >> @@ -495,7 +496,128 @@ static int squashfs_read_folio(struct file *file, struct folio *folio)
> > >>          return 0;
> > >>   }
> > >>
> > >> +static void squashfs_readahead(struct readahead_control *ractl)
> > >> +{
> > >> +       struct inode *inode = ractl->mapping->host;
> > >> +       struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> > >> +       size_t mask = (1UL << msblk->block_log) - 1;
> > >> +       unsigned short shift = msblk->block_log - PAGE_SHIFT;
> > >> +       loff_t start = readahead_pos(ractl) & ~mask;
> > >> +       size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
> > >> +       struct squashfs_page_actor *actor;
> > >> +       unsigned int nr_pages = 0;
> > >> +       struct page **pages;
> > >> +       int i, file_end = i_size_read(inode) >> msblk->block_log;
> > >> +       unsigned int max_pages = 1UL << shift;
> > >> +
> > >> +       readahead_expand(ractl, start, (len | mask) + 1);
> > >> +
> > >> +       if (file_end == 0)
> > >> +               return;
> > >> +
> > >> +       pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
> > >> +       if (!pages)
> > >> +               return;
> > >> +
> > >> +       actor = squashfs_page_actor_init_special(pages, max_pages, 0);
> > >> +       if (!actor)
> > >> +               goto out;
> > >> +
> > >> +       for (;;) {
> > >> +               pgoff_t index;
> > >> +               int res, bsize;
> > >> +               u64 block = 0;
> > >> +               unsigned int expected;
> > >> +
> > >> +               nr_pages = __readahead_batch(ractl, pages, max_pages);
> > >> +               if (!nr_pages)
> > >> +                       break;
> > >> +
> > >> +               if (readahead_pos(ractl) >= i_size_read(inode))
> > >> +                       goto skip_pages;
> > >> +
> > >> +               index = pages[0]->index >> shift;
> > >> +               if ((pages[nr_pages - 1]->index >> shift) != index)
> > >> +                       goto skip_pages;
> > >> +
> > >> +               expected = index == file_end ?
> > >> +                          (i_size_read(inode) & (msblk->block_size - 1)) :
> > >> +                           msblk->block_size;
> > >> +
> > >> +               bsize = read_blocklist(inode, index, &block);
> > >> +               if (bsize == 0)
> > >> +                       goto skip_pages;
> > >> +
> > >> +               if (nr_pages < max_pages) {
> > >> +                       struct squashfs_cache_entry *buffer;
> > >> +                       unsigned int block_mask = max_pages - 1;
> > >> +                       int offset = pages[0]->index - (pages[0]->index & ~block_mask);
> > >> +
> > >> +                       buffer = squashfs_get_datablock(inode->i_sb, block,
> > >> +                                                       bsize);
> > >> +                       if (buffer->error) {
> > >> +                               squashfs_cache_put(buffer);
> > >> +                               goto skip_pages;
> > >> +                       }
> > >> +
> > >> +                       expected -= offset * PAGE_SIZE;
> > >> +                       for (i = 0; i < nr_pages && expected > 0; i++,
> > >> +                                               expected -= PAGE_SIZE, offset++) {
> > >> +                               int avail = min_t(int, expected, PAGE_SIZE);
> > >> +
> > >> +                               squashfs_fill_page(pages[i], buffer,
> > >> +                                               offset * PAGE_SIZE, avail);
> > >> +                               unlock_page(pages[i]);
> > >> +                       }
> > >> +
> > >> +                       squashfs_cache_put(buffer);
> > >> +                       continue;
> > >> +               }
> > >> +
> > >> +               res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
> > >> +                                        actor);
> > >> +
> > >> +               if (res == expected) {
> > >> +                       int bytes;
> > >> +
> > >> +                       /* Last page may have trailing bytes not filled */
> > >> +                       bytes = res % PAGE_SIZE;
> > >> +                       if (bytes) {
> > >> +                               void *pageaddr;
> > >> +
> > >> +                               pageaddr = kmap_atomic(pages[nr_pages - 1]);
> > >> +                               memset(pageaddr + bytes, 0, PAGE_SIZE - bytes);
> > >> +                               kunmap_atomic(pageaddr);
> > >> +                       }
> > >> +
> > >> +                       for (i = 0; i < nr_pages; i++) {
> > >> +                               flush_dcache_page(pages[i]);
> > >> +                               SetPageUptodate(pages[i]);
> > >> +                       }
> > >> +               }
> > >> +
> > >> +               for (i = 0; i < nr_pages; i++) {
> > >> +                       unlock_page(pages[i]);
> > >> +                       put_page(pages[i]);
> > >> +               }
> > >> +       }
> > >> +
> > >> +       kfree(actor);
> > >> +       kfree(pages);
> > >> +       return;
> > >> +
> > >> +skip_pages:
> > >> +       for (i = 0; i < nr_pages; i++) {
> > >> +               unlock_page(pages[i]);
> > >> +               put_page(pages[i]);
> > >> +       }
> > >> +
> > >> +       kfree(actor);
> > >> +out:
> > >> +       kfree(pages);
> > >> +}
> > >>
> > >>   const struct address_space_operations squashfs_aops = {
> > >> -       .read_folio = squashfs_read_folio
> > >> +       .read_folio = squashfs_read_folio,
> > >> +       .readahead = squashfs_readahead
> > >>   };
> > >> --
> > >> 2.36.1.255.ge46751e96f-goog
> > >>
> > >>
> >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 3/3] squashfs: implement readahead
  2022-06-10  7:42     ` Phillip Lougher
  2022-06-13  1:36       ` Xiongwei Song
@ 2022-07-15  1:45       ` Xiongwei Song
  2022-07-29  5:22         ` Xiongwei Song
  1 sibling, 1 reply; 16+ messages in thread
From: Xiongwei Song @ 2022-07-15  1:45 UTC (permalink / raw)
  To: Phillip Lougher
  Cc: Hsin-Yi Wang, Matthew Wilcox, Xiongwei Song, Marek Szyprowski,
	Andrew Morton, Zheng Liang, Zhang Yi, Hou Tao, Miao Xie,
	linux-mm @ kvack . org,
	squashfs-devel @ lists . sourceforge . net,
	Linux Kernel Mailing List, xiaohong.qi

Hi Phillip,

Sorry for providing my test info so late.

On Fri, Jun 10, 2022 at 3:42 PM Phillip Lougher <phillip@squashfs.org.uk> wrote:
>
> On 09/06/2022 15:46, Xiongwei Song wrote:
> > This version is bad for my test. I ran the test below
> > "for cnt in $(seq 0 9); do echo 3 > /proc/sys/vm/drop_caches; echo
> > "Loop ${cnt}:"; time -v find /software/test[0-9][0-9] | xargs -P 24 -i
> > cat {} > /dev/null 2>/dev/null; echo ""; done"
> > in 90 partitions.
> >
> > With 9eec1d897139 reverted:
> > 1:06.18 (1m + 6.18s)
> > 1:05.65
> > 1:06.34
> > 1:06.88
> > 1:06.52
> > 1:06.78
> > 1:06.61
> > 1:06.99
> > 1:06.60
> > 1:06.79
> >
> > With this version:
> > 2:36.85 (2m + 36.85s)
> > 2:28.89
> > 1:43.46
> > 1:41.50
> > 1:42.75
> > 1:43.46
> > 1:43.67
> > 1:44.41
> > 1:44.91
> > 1:45.44
> >
> > Any thoughts?
>
> Thank-you for your latest test results, and they tend to
> imply that the latest version of the patch hasn't improved
> performance in your use-case.
>
> One thing which is becoming clear here is that the devil is in
> the detail, and your results being summaries are not capturing
> enough detail to understand what is happening.  They show
> something is wrong, but, don't give any guidance as to what
> is happening.
>
> I think it will be difficult to capture more details from
> your test case.  But, detail can be captured from summaries, by
> varying the input and extrapolating from the results.
>
> By that I mean have you tried changing anything, and observed any
> changed results?
>
> For instance have you tried any of the following
>
> 1.  Changing the parallelism of your test from 24 read threads.
>      Does 1, 2, 4 etc parallel read threads change the observed
>      behaviour?  In other words is the slow-down observed across
>      all degrees of parallelism, or is there a critical point.

Please see the test results below, which are from my colleague Xiaohong Qi:

I tested file sizes from 256KB to 5120KB with thread counts of
1, 2, 4, 8, 16, 24 and 32 (each run ten times, taking the average value).
The read performance is shown below. The difference in read performance
between the 4.18 kernel and 5.10 (with the squashfs_readahead() patch v7)
seems to be caused by the files whose size is smaller than 256KB.

            T1        T2        T4        T8        T16       T24       T32
All File Size
  4.18      136.8642  100.479   96.5523   96.1569   96.204    96.0587   96.0519
  5.10-v7   138.474   103.1351  99.9192   99.7091   99.7894   100.2034  100.4447
  Delta     1.6098    2.6561    3.3669    3.5522    3.5854    4.1447    4.3928

Fsize < 256KB
  4.18      21.7949   14.6959   11.639    10.5154   10.14     10.1092   10.1425
  5.10-v7   23.8629   16.2483   13.1475   12.3697   12.1985   12.8799   13.3292
  Delta     2.068     1.5524    1.5085    1.8543    2.0585    2.7707    3.1867

256KB < Fsize < 512KB
  4.18      11.8042   7.9228    7.6891    7.7924    7.8181    7.8548    7.8496
  5.10-v7   12.0505   8.2506    8.1557    8.156     8.16      8.1577    8.1611
  Delta     0.2463    0.3278    0.4666    0.3636    0.3419    0.3029    0.3115

512KB < Fsize < 1024KB
  4.18      7.7806    5.5496    5.496     5.4912    5.4897    5.4883    5.6602
  5.10-v7   8.1283    5.8784    5.8486    5.8505    5.8523    5.8511    5.856
  Delta     0.3477    0.3288    0.3526    0.3593    0.3626    0.3628    0.1958

1024KB < Fsize < 1536KB
  4.18      10.2686   7.5294    7.5012    7.4902    7.4855    7.4858    7.4851
  5.10-v7   10.5289   7.8486    7.8502    7.8477    7.849     7.8482    7.8542
  Delta     0.2603    0.3192    0.349     0.3575    0.3635    0.3624    0.3691

1536KB < Fsize < 2048KB
  4.18      5.6439    4.0588    3.9974    3.9946    3.9949    3.9942    3.9925
  5.10-v7   6.2263    4.6009    4.6062    4.6069    4.6078    4.6074    4.6099
  Delta     0.5824    0.5421    0.6088    0.6123    0.6129    0.6132    0.6174

2048KB < Fsize < 5120KB
  4.18      34.9166   28.7944   28.7355   28.7192   28.7046   28.6976   28.69
  5.10-v7   33.8689   27.9726   27.9747   27.9801   27.9849   27.9855   27.9915
  Delta     -1.0477   -0.8218   -0.7608   -0.7391   -0.7197   -0.7121   -0.6985

Fsize > 5120KB
  4.18      45.6575   33.8609   33.7512   33.7349   33.7196   33.7166   33.708
  5.10-v7   45.3494   34.0473   34.0443   34.0692   34.0635   34.0622   34.0599
  Delta     -0.3081   0.1864    0.2931    0.3343    0.3439    0.3456    0.3519

(T1 means testing with 1 thread; file size unit: KB; time unit: seconds;
5.10-v7 means we backported the squashfs_readahead() v7 patchset onto
Linux 5.10.)

The command to test is like:
echo 3 > /proc/sys/vm/drop_caches; sleep 3; time -v find /test/ -type f -size -256k | xargs -P 32 -i cat {} > /dev/null 2>/dev/null
echo 3 > /proc/sys/vm/drop_caches; sleep 3; time -v find /test/ -type f -size +256k -size -512k | xargs -P 32 -i cat {} > /dev/null 2>/dev/null

>
> 2. Does the Squashfs parallelism options in the kernel configuration
>     change the behaviour?  Knowing if the number of "decompressors"
>     available changes the difference in performance could be important.

In our environment, the config SQUASHFS_DECOMP_MULTI_PERCPU is enabled.
There are 12 CPUs on our board. We tried enabling CONFIG_SQUASHFS_DECOMP_MULTI
and reading files with 2/4/6/8/12/16/24/32 threads; the performance was not
improved and was even a bit worse.

>
> 3. Are your Squashfs filesystems built using fragments, or without
>     fragments?  Rebuilding the filesystems without fragments, and
>     observing any different performance, would help to pinpoint
>     where the issue lies.

We didn't use the "-no-fragments" option when building the squashfs image.
The steps to build the squashfs partition are:
        a. mksquashfs /lib64/ test.squash
        b. lvcreate -L 24M /dev/vg0 -n test -y
        c. dd if=/root/test.squash of=/dev/vg0/test
        d. mount -t squashfs /dev/vg0/test xxx

When using "-no-fragments", the performance is much worse than with
fragments. As you can see, the test files are from /lib64, and most of
them are small files.
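
The no-fragments image for that comparison was built with something
along the lines of the following (the output file name here is only
illustrative):

        mksquashfs /lib64/ test-nofrag.squash -no-fragments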

>
> 4. What is the block size used in your Squashfs filesystems.  Have
>     you tried changing the block size, and seen what effect
>     it has on the difference in performance between the patches?

We configured CONFIG_SQUASHFS_4K_DEVBLK_SIZE to "y", so the blk size
should be 4k. We didn't try other block sizes because we have identical squashfs
configs on 4.18 and 5.10.

>
> 5. You don't mention where your Squashfs filesystems are stored.
>     Is this slow media or fast media?

Please see the disk info we are testing on:
    """
    $ hdparm -I /dev/sda1

    /dev/sda1:
    SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

    ATA device, with non-removable media
    Standards:
    Likely used: 1
    Configuration:
    Logical max current
    cylinders 0 0
    heads 0 0
    sectors/track 0 0
    –
    Logical/Physical Sector size: 512 bytes
    device size with M = 1024*1024: 0 MBytes
    device size with M = 1000*1000: 0 MBytes
    cache/buffer size = unknown
    Capabilities:
    IORDY not likely
    Cannot perform double-word IO
    R/W multiple sector transfer: not supported
    DMA: not supported
    PIO: pio0
    """

>    Have you tried moving
>     the Squashfs filesystems onto different media and observed
>     any difference in performance between the patches?

Sorry, I still didn't get a chance to test on other medias.

>
> The fact of the matter is there are many over-lapping factors
> which affect the performance of squashfs filesystems (like any
> reasonably complex code), which may be elsewhere.  It can only
> take a small change somewhere to have a dramatic affect on
> performance.

We found that performance improved when running our test after remaking
the partitions with my steps in item 3 above. The following data shows the
elapsed times of squashfs_readahead() when reading files before (in this
state we had already run the test command many times) and after remaking
the partitions. I captured the data below with ftrace:

For a 14k file:
Before partition remade      After partition remade
4352.306 us                      3943.846 us
4321.176 us                      3929.255 us

For a 1.8M file:
Before partition remade      After partition remade
17446.73 us                     16506.58 us
17446.73 us                     16201.32 us
18465.38 us                     17548.96 us
12269.78 us                     11939.09 us
9627.990 us                     9167.052 us

As you can see, the elapsed times of squashfs_readahead() were significantly
reduced after the partitions were freshly remade. We hit the same problem on
Linux 4.18.
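
The capture was along the lines of the function_graph sequence below
(paths and the file read here are illustrative):

  cd /sys/kernel/tracing
  echo squashfs_readahead > set_graph_function
  echo function_graph > current_tracer
  echo 1 > tracing_on
  cat /software/test01/some-file > /dev/null
  echo 0 > tracing_on
  grep squashfs_readahead trace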

By the way, I think the test results I previously sent out in the v5 thread
are related to whether the partitions had been remade:
https://lore.kernel.org/lkml/20220606150305.1883410-1-hsinyi@chromium.org/T/#m5f3f8386eb8b72a1f63b60be37ea2cc6d03c5f84

>
> This is particularly the case with embedded systems, which
> may be short on CPU performance, short on RAM, and have low
> performance media, and be effectively operating on the "edge".
> It can only take a small change, an update for instance, to
> change from performing well to badly.

We checked the CPU usage; it's not over 11%. The RAM is also sufficient:
              total        used        free      shared  buff/cache   available
Mem:    15837684   531420  11051344      262080     4254920    14858224
Swap:             0

Regards,
Xiongwei


>
> I speak from experience, having spent over ten years in embedded
> Linux as a senior engineer and then as a consultant.  I have
> my own horror tales as a consultant, dealing with systems pushed
> beyond the edge (with hacks), and the customer insisting they
> didn't do anything to cause the system to finally break.
>
> Maybe it is off topic here.  But, I remember one instance where
> a customer had a system out in the field, which "inexplicably"
> started to lock up every 6 months or so.  This system had regular
> updates "over the air", and I discovered the "lock up" only
> started happening after the latest update.  It turns out the new version
> of the application had grown a new feature which needed more
> RAM than normal.  This feature wasn't used very often, but,
> if it coincided with an infrequent "house-keeping" background task,
> the system ran out of memory and locked up (they had disabled the OOM
> killer).  This was so rare it might only coincide after six months.  No
> bug, but a slow growth in working set RAM over a number of versions.
>
> In other words we may be looking at a knock-on side effect of
> readahead, which is either caused by issues elsewhere or is
> causing issues elsewhere.
>
> Dealing with it in isolation, as bug in the readahead code is going
> to get us nowhere, looking for something that isn't there.
>
> I'm not saying that this is the case here.  But, the more detail
> you can provide, and the more test variants you can provide will
> help to determine what is the problem.
>
> Thanks
>
> Phillip
>
>
> >
> > Regards,
> > Xiongwei
> >
> > On Mon, Jun 6, 2022 at 11:03 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> >>
> >> Implement readahead callback for squashfs. It will read datablocks
> >> which cover pages in readahead request. For a few cases it will
> >> not mark page as uptodate, including:
> >> - file end is 0.
> >> - zero filled blocks.
> >> - current batch of pages isn't in the same datablock.
> >> - decompressor error.
> >> Otherwise pages will be marked as uptodate. The unhandled pages will be
> >> updated by readpage later.
> >>
> >> Suggested-by: Matthew Wilcox <willy@infradead.org>
> >> Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
> >> Reported-by: Matthew Wilcox <willy@infradead.org>
> >> Reported-by: Phillip Lougher <phillip@squashfs.org.uk>
> >> Reported-by: Xiongwei Song <Xiongwei.Song@windriver.com>
> >> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> >> Reported-by: Andrew Morton <akpm@linux-foundation.org>
> >> ---
> >> v4->v5:
> >> - Handle short file cases reported by Marek and Matthew.
> >> - Fix checkpatch error reported by Andrew.
> >>
> >> v4: https://lore.kernel.org/lkml/20220601103922.1338320-4-hsinyi@chromium.org/
> >> v3: https://lore.kernel.org/lkml/20220523065909.883444-4-hsinyi@chromium.org/
> >> v2: https://lore.kernel.org/lkml/20220517082650.2005840-4-hsinyi@chromium.org/
> >> v1: https://lore.kernel.org/lkml/20220516105100.1412740-3-hsinyi@chromium.org/
> >> ---
> >>   fs/squashfs/file.c | 124 ++++++++++++++++++++++++++++++++++++++++++++-
> >>   1 file changed, 123 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> >> index a8e495d8eb86..fbd096cd15f4 100644
> >> --- a/fs/squashfs/file.c
> >> +++ b/fs/squashfs/file.c
> >> @@ -39,6 +39,7 @@
> >>   #include "squashfs_fs_sb.h"
> >>   #include "squashfs_fs_i.h"
> >>   #include "squashfs.h"
> >> +#include "page_actor.h"
> >>
> >>   /*
> >>    * Locate cache slot in range [offset, index] for specified inode.  If
> >> @@ -495,7 +496,128 @@ static int squashfs_read_folio(struct file *file, struct folio *folio)
> >>          return 0;
> >>   }
> >>
> >> +static void squashfs_readahead(struct readahead_control *ractl)
> >> +{
> >> +       struct inode *inode = ractl->mapping->host;
> >> +       struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> >> +       size_t mask = (1UL << msblk->block_log) - 1;
> >> +       unsigned short shift = msblk->block_log - PAGE_SHIFT;
> >> +       loff_t start = readahead_pos(ractl) & ~mask;
> >> +       size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
> >> +       struct squashfs_page_actor *actor;
> >> +       unsigned int nr_pages = 0;
> >> +       struct page **pages;
> >> +       int i, file_end = i_size_read(inode) >> msblk->block_log;
> >> +       unsigned int max_pages = 1UL << shift;
> >> +
> >> +       readahead_expand(ractl, start, (len | mask) + 1);
> >> +
> >> +       if (file_end == 0)
> >> +               return;
> >> +
> >> +       pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
> >> +       if (!pages)
> >> +               return;
> >> +
> >> +       actor = squashfs_page_actor_init_special(pages, max_pages, 0);
> >> +       if (!actor)
> >> +               goto out;
> >> +
> >> +       for (;;) {
> >> +               pgoff_t index;
> >> +               int res, bsize;
> >> +               u64 block = 0;
> >> +               unsigned int expected;
> >> +
> >> +               nr_pages = __readahead_batch(ractl, pages, max_pages);
> >> +               if (!nr_pages)
> >> +                       break;
> >> +
> >> +               if (readahead_pos(ractl) >= i_size_read(inode))
> >> +                       goto skip_pages;
> >> +
> >> +               index = pages[0]->index >> shift;
> >> +               if ((pages[nr_pages - 1]->index >> shift) != index)
> >> +                       goto skip_pages;
> >> +
> >> +               expected = index == file_end ?
> >> +                          (i_size_read(inode) & (msblk->block_size - 1)) :
> >> +                           msblk->block_size;
> >> +
> >> +               bsize = read_blocklist(inode, index, &block);
> >> +               if (bsize == 0)
> >> +                       goto skip_pages;
> >> +
> >> +               if (nr_pages < max_pages) {
> >> +                       struct squashfs_cache_entry *buffer;
> >> +                       unsigned int block_mask = max_pages - 1;
> >> +                       int offset = pages[0]->index - (pages[0]->index & ~block_mask);
> >> +
> >> +                       buffer = squashfs_get_datablock(inode->i_sb, block,
> >> +                                                       bsize);
> >> +                       if (buffer->error) {
> >> +                               squashfs_cache_put(buffer);
> >> +                               goto skip_pages;
> >> +                       }
> >> +
> >> +                       expected -= offset * PAGE_SIZE;
> >> +                       for (i = 0; i < nr_pages && expected > 0; i++,
> >> +                                               expected -= PAGE_SIZE, offset++) {
> >> +                               int avail = min_t(int, expected, PAGE_SIZE);
> >> +
> >> +                               squashfs_fill_page(pages[i], buffer,
> >> +                                               offset * PAGE_SIZE, avail);
> >> +                               unlock_page(pages[i]);
> >> +                       }
> >> +
> >> +                       squashfs_cache_put(buffer);
> >> +                       continue;
> >> +               }
> >> +
> >> +               res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
> >> +                                        actor);
> >> +
> >> +               if (res == expected) {
> >> +                       int bytes;
> >> +
> >> +                       /* Last page may have trailing bytes not filled */
> >> +                       bytes = res % PAGE_SIZE;
> >> +                       if (bytes) {
> >> +                               void *pageaddr;
> >> +
> >> +                               pageaddr = kmap_atomic(pages[nr_pages - 1]);
> >> +                               memset(pageaddr + bytes, 0, PAGE_SIZE - bytes);
> >> +                               kunmap_atomic(pageaddr);
> >> +                       }
> >> +
> >> +                       for (i = 0; i < nr_pages; i++) {
> >> +                               flush_dcache_page(pages[i]);
> >> +                               SetPageUptodate(pages[i]);
> >> +                       }
> >> +               }
> >> +
> >> +               for (i = 0; i < nr_pages; i++) {
> >> +                       unlock_page(pages[i]);
> >> +                       put_page(pages[i]);
> >> +               }
> >> +       }
> >> +
> >> +       kfree(actor);
> >> +       kfree(pages);
> >> +       return;
> >> +
> >> +skip_pages:
> >> +       for (i = 0; i < nr_pages; i++) {
> >> +               unlock_page(pages[i]);
> >> +               put_page(pages[i]);
> >> +       }
> >> +
> >> +       kfree(actor);
> >> +out:
> >> +       kfree(pages);
> >> +}
> >>
> >>   const struct address_space_operations squashfs_aops = {
> >> -       .read_folio = squashfs_read_folio
> >> +       .read_folio = squashfs_read_folio,
> >> +       .readahead = squashfs_readahead
> >>   };
> >> --
> >> 2.36.1.255.ge46751e96f-goog
> >>
> >>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 3/3] squashfs: implement readahead
  2022-07-15  1:45       ` Xiongwei Song
@ 2022-07-29  5:22         ` Xiongwei Song
  2022-08-01  4:53           ` Phillip Lougher
  0 siblings, 1 reply; 16+ messages in thread
From: Xiongwei Song @ 2022-07-29  5:22 UTC (permalink / raw)
  To: Phillip Lougher
  Cc: Hsin-Yi Wang, Matthew Wilcox, Xiongwei Song, Marek Szyprowski,
	Andrew Morton, Zheng Liang, Zhang Yi, Hou Tao, Miao Xie,
	linux-mm @ kvack . org,
	squashfs-devel @ lists . sourceforge . net,
	Linux Kernel Mailing List, xiaohong.qi

Hi Phillip,

Gentle ping.

Regards,
Xiongwei

On Fri, Jul 15, 2022 at 9:45 AM Xiongwei Song <sxwjean@gmail.com> wrote:
>
> Hi Phillip,
>
> Sorry for providing my test info so late.
>
> On Fri, Jun 10, 2022 at 3:42 PM Phillip Lougher <phillip@squashfs.org.uk> wrote:
> >
> > On 09/06/2022 15:46, Xiongwei Song wrote:
> > > This version is bad for my test. I ran the test below
> > > "for cnt in $(seq 0 9); do echo 3 > /proc/sys/vm/drop_caches; echo
> > > "Loop ${cnt}:"; time -v find /software/test[0-9][0-9] | xargs -P 24 -i
> > > cat {} > /dev/null 2>/dev/null; echo ""; done"
> > > in 90 partitions.
> > >
> > > With 9eec1d897139 reverted:
> > > 1:06.18 (1m + 6.18s)
> > > 1:05.65
> > > 1:06.34
> > > 1:06.88
> > > 1:06.52
> > > 1:06.78
> > > 1:06.61
> > > 1:06.99
> > > 1:06.60
> > > 1:06.79
> > >
> > > With this version:
> > > 2:36.85 (2m + 36.85s)
> > > 2:28.89
> > > 1:43.46
> > > 1:41.50
> > > 1:42.75
> > > 1:43.46
> > > 1:43.67
> > > 1:44.41
> > > 1:44.91
> > > 1:45.44
> > >
> > > Any thoughts?
> >
> > Thank-you for your latest test results, and they tend to
> > imply that the latest version of the patch hasn't improved
> > performance in your use-case.
> >
> > One thing which is becoming clear here is that the devil is in
> > the detail, and your results being summaries are not capturing
> > enough detail to understand what is happening.  They show
> > something is wrong, but, don't give any guidance as to what
> > is happening.
> >
> > I think it will be difficult to capture more details from
> > your test case.  But, detail can be captured from summaries, by
> > varying the input and extrapolating from the results.
> >
> > By that I mean have you tried changing anything, and observed any
> > changed results?
> >
> > For instance have you tried any of the following
> >
> > 1.  Changing the parallelism of your test from 24 read threads.
> >      Does 1, 2, 4 etc parallel read threads change the observed
> >      behaviour?  In other words is the slow-down observed across
> >      all degrees of parallelism, or is there a critical point.
>
> Please see the test results below, which are from my colleague Xiaohong Qi:
>
> I tested file sizes from 256KB to 5120KB with thread counts of
> 1, 2, 4, 8, 16, 24 and 32 (each run ten times and averaged). The read
> performance is shown below. The difference in read performance between the
> 4.18 kernel and 5.10 (with the squashfs_readahead() patch v7) seems to be
> caused by the files whose size is smaller than 256KB.
>
>                        T1         T2         T4         T8        T16        T24        T32
> All File Size
>     4.18         136.8642   100.479     96.5523    96.1569    96.204     96.0587    96.0519
>     5.10-v7      138.474    103.1351    99.9192    99.7091    99.7894   100.2034   100.4447
>     Delta          1.6098     2.6561     3.3669     3.5522     3.5854     4.1447     4.3928
>
> Fsize < 256KB
>     4.18          21.7949    14.6959    11.639     10.5154    10.14      10.1092    10.1425
>     5.10-v7       23.8629    16.2483    13.1475    12.3697    12.1985    12.8799    13.3292
>     Delta          2.068      1.5524     1.5085     1.8543     2.0585     2.7707     3.1867
>
> 256KB < Fsize < 512KB
>     4.18          11.8042     7.9228     7.6891     7.7924     7.8181     7.8548     7.8496
>     5.10-v7       12.0505     8.2506     8.1557     8.156      8.16       8.1577     8.1611
>     Delta          0.2463     0.3278     0.4666     0.3636     0.3419     0.3029     0.3115
>
> 512KB < Fsize < 1024KB
>     4.18           7.7806     5.5496     5.496      5.4912     5.4897     5.4883     5.6602
>     5.10-v7        8.1283     5.8784     5.8486     5.8505     5.8523     5.8511     5.856
>     Delta          0.3477     0.3288     0.3526     0.3593     0.3626     0.3628     0.1958
>
> 1024KB < Fsize < 1536KB
>     4.18          10.2686     7.5294     7.5012     7.4902     7.4855     7.4858     7.4851
>     5.10-v7       10.5289     7.8486     7.8502     7.8477     7.849      7.8482     7.8542
>     Delta          0.2603     0.3192     0.349      0.3575     0.3635     0.3624     0.3691
>
> 1536KB < Fsize < 2048KB
>     4.18           5.6439     4.0588     3.9974     3.9946     3.9949     3.9942     3.9925
>     5.10-v7        6.2263     4.6009     4.6062     4.6069     4.6078     4.6074     4.6099
>     Delta          0.5824     0.5421     0.6088     0.6123     0.6129     0.6132     0.6174
>
> 2048KB < Fsize < 5120KB
>     4.18          34.9166    28.7944    28.7355    28.7192    28.7046    28.6976    28.69
>     5.10-v7       33.8689    27.9726    27.9747    27.9801    27.9849    27.9855    27.9915
>     Delta         -1.0477    -0.8218    -0.7608    -0.7391    -0.7197    -0.7121    -0.6985
>
> Fsize > 5120KB
>     4.18          45.6575    33.8609    33.7512    33.7349    33.7196    33.7166    33.708
>     5.10-v7       45.3494    34.0473    34.0443    34.0692    34.0635    34.0622    34.0599
>     Delta         -0.3081     0.1864     0.2931     0.3343     0.3439     0.3456     0.3519
>
> (T1 means test with 1 thread, file size unit: KB, time unit: second,
> 5.10-v7 means we backported squashfs_readahead() v7 patchset on linux 5.10)
>
> The command to test is like:
> echo 3 > /proc/sys/vm/drop_caches; sleep 3; time -v find /test/ -type f -size -256k | xargs -P 32 -i cat {} > /dev/null 2>/dev/null
> echo 3 > /proc/sys/vm/drop_caches; sleep 3; time -v find /test/ -type f -size +256k -size -512k | xargs -P 32 -i cat {} > /dev/null 2>/dev/null
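>
> (For reference, a minimal sketch of driving one of these buckets across the
> different thread counts we tested -- the /test mount point and the size
> filter are the values above; the surrounding loop is only illustrative:)
>
>     #!/bin/bash
>     # Drop caches, then read every file under 256KB with N parallel readers.
>     # Needs root for drop_caches; bash's "time" keyword times the whole pipeline.
>     for threads in 1 2 4 8 16 24 32; do
>             echo 3 > /proc/sys/vm/drop_caches; sleep 3
>             echo "=== $threads threads ==="
>             time find /test/ -type f -size -256k | \
>                     xargs -P "$threads" -i cat {} > /dev/null 2>/dev/null
>     done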
>
> >
> > 2. Do the Squashfs parallelism options in the kernel configuration
> >     change the behaviour?  Knowing if the number of "decompressors"
> >     available changes the difference in performance could be important.
>
> In our environment, CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU is enabled. There are
> 12 CPUs on our board. We also tried enabling CONFIG_SQUASHFS_DECOMP_MULTI and
> reading files with 2/4/6/8/12/16/24/32 threads; the performance was not
> improved and was even a bit worse.
>
> >
> > 3. Are your Squashfs filesystems built using fragments, or without
> >     fragments?  Rebuilding the filesystems without fragments, and
> >     observing any different performance, would help to pinpoint
> >     where the issue lies.
>
> We didn't use the "-no-fragments" option when building the squashfs image.
> The steps to build the squashfs partition are:
>         a. mksquashfs /lib64/ test.squash
>         b. lvcreate -L 24M /dev/vg0 -n test -y
>         c. dd if=/root/test.squash of=/dev/vg0/test
>         d. mount -t squashfs /dev/vg0/test xxx
>
> When using "-no-fragments", the performance is much worse than with
> fragments. As you can see, the test files are from /lib64, and most of
> them are small files.
>
> >
> > 4. What is the block size used in your Squashfs filesystems.  Have
> >     you tried changing the block size, and seen what effect
> >     it has on the difference in performance between the patches?
>
> We configured CONFIG_SQUASHFS_4K_DEVBLK_SIZE to "y", so the block size
> should be 4k. We didn't try other block sizes because we have identical
> squashfs configs on 4.18 and 5.10.
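>
> (If we do try other block sizes or fragment settings later, only the
> image-creation step in item 3 above would need to change; the block size and
> output names below are just illustrative:)
>
>         mksquashfs /lib64/ test-128k.squash -b 128K
>         mksquashfs /lib64/ test-nofrag.squash -no-fragments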
>
> >
> > 5. You don't mention where your Squashfs filesystems are stored.
> >     Is this slow media or fast media?
>
> Please see the disk info we are testing on:
>     """
>     $ hdparm -I /dev/sda1
>
>     /dev/sda1:
>     SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0a 00 00
> 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
>     ATA device, with non-removable media
>     Standards:
>     Likely used: 1
>     Configuration:
>     Logical max current
>     cylinders 0 0
>     heads 0 0
>     sectors/track 0 0
>     –
>     Logical/Physical Sector size: 512 bytes
>     device size with M = 1024*1024: 0 MBytes
>     device size with M = 1000*1000: 0 MBytes
>     cache/buffer size = unknown
>     Capabilities:
>     IORDY not likely
>     Cannot perform double-word IO
>     R/W multiple sector transfer: not supported
>     DMA: not supported
>     PIO: pio0
>     """
>
> >    Have you tried moving
> >     the Squashfs filesystems onto different media and observed
> >     any difference in performance between the patches?
>
> Sorry, I still didn't get a chance to test on other media.
>
> >
> > The fact of the matter is there are many over-lapping factors
> > which affect the performance of squashfs filesystems (like any
> > reasonably complex code), which may be elsewhere.  It can only
> > take a small change somewhere to have a dramatic effect on
> > performance.
>
> We found that the performance improves when running our test after remaking
> the partitions with my steps in item 3 above. The following data shows the
> elapsed times of squashfs_readahead() when reading files before (i.e. on
> partitions that had already been through the test command many times) and
> after remaking the partitions. I captured the data below with ftrace:
>
> For a 14k file:
> Before partitions remade     After partitions remade:
> 4352.306 us                  3943.846 us
> 4321.176 us                  3929.255 us
>
> For a 1.8M file:
> Before partitions remade     After partitions remade:
> 17446.73 us                  16506.58 us
> 17446.73 us                  16201.32 us
> 18465.38 us                  17548.96 us
> 12269.78 us                  11939.09 us
> 9627.990 us                  9167.052 us
>
> As you can see, the elapsed times of squashfs_readahead() are significantly
> reduced after the partitions are freshly remade. We hit the same problem on
> Linux 4.18.
>
> By the way, I think the test results I sent out earlier in the v5 thread are
> also related to whether the partitions had been remade:
> https://lore.kernel.org/lkml/20220606150305.1883410-1-hsinyi@chromium.org/T/#m5f3f8386eb8b72a1f63b60be37ea2cc6d03c5f84
>
> >
> > This is particularly the case with embedded systems, which
> > may be short on CPU performance, short on RAM, and have low
> > performance media, and be effectively operating on the "edge".
> > It can only take a small change, an update for instance, to
> > change from performing well to badly.
>
> We checked CPU usage; it's not over 11%. The RAM is also sufficient:
>               total        used        free      shared  buff/cache   available
> Mem:    15837684   531420  11051344      262080     4254920    14858224
> Swap:             0
>
> Regards,
> Xiongwei
>
>
> >
> > I speak from experience, having spent over ten years in embedded
> > Linux as a senior engineer and then as a consultant.  I have
> > my own horror tales as a consultant, dealing with systems pushed
> > beyond the edge (with hacks), and the customer insisting they
> > didn't do anything to cause the system to finally break.
> >
> > Maybe it is off topic here.  But, I remember one instance where
> > a customer had a system out in the field, which "inexplicably"
> > started to lock up every 6 months or so.  This system had regular
> > updates "over the air", and I discovered the "lock up" only
> > started happening after the latest update.  It turns out the new version
> > of the application had grown a new feature which needed more
> > RAM than normal.  This feature wasn't used very often, but,
> > if it coincided with an infrequent "house-keeping" background task,
> > the system ran out of memory and locked up (they had disabled the OOM
> > killer).  This was so rare it might only coincide after six months.  No
> > bug, but a slow growth in working set RAM over a number of versions.
> >
> > In other words we may be looking at a knock-on side effect of
> > readahead, which is either caused by issues elsewhere or is
> > causing issues elsewhere.
> >
> > Dealing with it in isolation, as a bug in the readahead code, is going
> > to get us nowhere, looking for something that isn't there.
> >
> > I'm not saying that this is the case here.  But, the more detail
> > and the more test variants you can provide, the easier it will be
> > to determine what the problem is.
> >
> > Thanks
> >
> > Phillip
> >
> >
> > >
> > > Regards,
> > > Xiongwei
> > >
> > > On Mon, Jun 6, 2022 at 11:03 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> > >>
> > >> Implement readahead callback for squashfs. It will read datablocks
> > >> which cover pages in readahead request. For a few cases it will
> > >> not mark page as uptodate, including:
> > >> - file end is 0.
> > >> - zero filled blocks.
> > >> - current batch of pages isn't in the same datablock.
> > >> - decompressor error.
> > >> Otherwise pages will be marked as uptodate. The unhandled pages will be
> > >> updated by readpage later.
> > >>
> > >> Suggested-by: Matthew Wilcox <willy@infradead.org>
> > >> Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
> > >> Reported-by: Matthew Wilcox <willy@infradead.org>
> > >> Reported-by: Phillip Lougher <phillip@squashfs.org.uk>
> > >> Reported-by: Xiongwei Song <Xiongwei.Song@windriver.com>
> > >> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > >> Reported-by: Andrew Morton <akpm@linux-foundation.org>
> > >> ---
> > >> v4->v5:
> > >> - Handle short file cases reported by Marek and Matthew.
> > >> - Fix checkpatch error reported by Andrew.
> > >>
> > >> v4: https://lore.kernel.org/lkml/20220601103922.1338320-4-hsinyi@chromium.org/
> > >> v3: https://lore.kernel.org/lkml/20220523065909.883444-4-hsinyi@chromium.org/
> > >> v2: https://lore.kernel.org/lkml/20220517082650.2005840-4-hsinyi@chromium.org/
> > >> v1: https://lore.kernel.org/lkml/20220516105100.1412740-3-hsinyi@chromium.org/
> > >> ---
> > >>   fs/squashfs/file.c | 124 ++++++++++++++++++++++++++++++++++++++++++++-
> > >>   1 file changed, 123 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> > >> index a8e495d8eb86..fbd096cd15f4 100644
> > >> --- a/fs/squashfs/file.c
> > >> +++ b/fs/squashfs/file.c
> > >> @@ -39,6 +39,7 @@
> > >>   #include "squashfs_fs_sb.h"
> > >>   #include "squashfs_fs_i.h"
> > >>   #include "squashfs.h"
> > >> +#include "page_actor.h"
> > >>
> > >>   /*
> > >>    * Locate cache slot in range [offset, index] for specified inode.  If
> > >> @@ -495,7 +496,128 @@ static int squashfs_read_folio(struct file *file, struct folio *folio)
> > >>          return 0;
> > >>   }
> > >>
> > >> +static void squashfs_readahead(struct readahead_control *ractl)
> > >> +{
> > >> +       struct inode *inode = ractl->mapping->host;
> > >> +       struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> > >> +       size_t mask = (1UL << msblk->block_log) - 1;
> > >> +       unsigned short shift = msblk->block_log - PAGE_SHIFT;
> > >> +       loff_t start = readahead_pos(ractl) & ~mask;
> > >> +       size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
> > >> +       struct squashfs_page_actor *actor;
> > >> +       unsigned int nr_pages = 0;
> > >> +       struct page **pages;
> > >> +       int i, file_end = i_size_read(inode) >> msblk->block_log;
> > >> +       unsigned int max_pages = 1UL << shift;
> > >> +
> > >> +       readahead_expand(ractl, start, (len | mask) + 1);
> > >> +
> > >> +       if (file_end == 0)
> > >> +               return;
> > >> +
> > >> +       pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
> > >> +       if (!pages)
> > >> +               return;
> > >> +
> > >> +       actor = squashfs_page_actor_init_special(pages, max_pages, 0);
> > >> +       if (!actor)
> > >> +               goto out;
> > >> +
> > >> +       for (;;) {
> > >> +               pgoff_t index;
> > >> +               int res, bsize;
> > >> +               u64 block = 0;
> > >> +               unsigned int expected;
> > >> +
> > >> +               nr_pages = __readahead_batch(ractl, pages, max_pages);
> > >> +               if (!nr_pages)
> > >> +                       break;
> > >> +
> > >> +               if (readahead_pos(ractl) >= i_size_read(inode))
> > >> +                       goto skip_pages;
> > >> +
> > >> +               index = pages[0]->index >> shift;
> > >> +               if ((pages[nr_pages - 1]->index >> shift) != index)
> > >> +                       goto skip_pages;
> > >> +
> > >> +               expected = index == file_end ?
> > >> +                          (i_size_read(inode) & (msblk->block_size - 1)) :
> > >> +                           msblk->block_size;
> > >> +
> > >> +               bsize = read_blocklist(inode, index, &block);
> > >> +               if (bsize == 0)
> > >> +                       goto skip_pages;
> > >> +
> > >> +               if (nr_pages < max_pages) {
> > >> +                       struct squashfs_cache_entry *buffer;
> > >> +                       unsigned int block_mask = max_pages - 1;
> > >> +                       int offset = pages[0]->index - (pages[0]->index & ~block_mask);
> > >> +
> > >> +                       buffer = squashfs_get_datablock(inode->i_sb, block,
> > >> +                                                       bsize);
> > >> +                       if (buffer->error) {
> > >> +                               squashfs_cache_put(buffer);
> > >> +                               goto skip_pages;
> > >> +                       }
> > >> +
> > >> +                       expected -= offset * PAGE_SIZE;
> > >> +                       for (i = 0; i < nr_pages && expected > 0; i++,
> > >> +                                               expected -= PAGE_SIZE, offset++) {
> > >> +                               int avail = min_t(int, expected, PAGE_SIZE);
> > >> +
> > >> +                               squashfs_fill_page(pages[i], buffer,
> > >> +                                               offset * PAGE_SIZE, avail);
> > >> +                               unlock_page(pages[i]);
> > >> +                       }
> > >> +
> > >> +                       squashfs_cache_put(buffer);
> > >> +                       continue;
> > >> +               }
> > >> +
> > >> +               res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
> > >> +                                        actor);
> > >> +
> > >> +               if (res == expected) {
> > >> +                       int bytes;
> > >> +
> > >> +                       /* Last page may have trailing bytes not filled */
> > >> +                       bytes = res % PAGE_SIZE;
> > >> +                       if (bytes) {
> > >> +                               void *pageaddr;
> > >> +
> > >> +                               pageaddr = kmap_atomic(pages[nr_pages - 1]);
> > >> +                               memset(pageaddr + bytes, 0, PAGE_SIZE - bytes);
> > >> +                               kunmap_atomic(pageaddr);
> > >> +                       }
> > >> +
> > >> +                       for (i = 0; i < nr_pages; i++) {
> > >> +                               flush_dcache_page(pages[i]);
> > >> +                               SetPageUptodate(pages[i]);
> > >> +                       }
> > >> +               }
> > >> +
> > >> +               for (i = 0; i < nr_pages; i++) {
> > >> +                       unlock_page(pages[i]);
> > >> +                       put_page(pages[i]);
> > >> +               }
> > >> +       }
> > >> +
> > >> +       kfree(actor);
> > >> +       kfree(pages);
> > >> +       return;
> > >> +
> > >> +skip_pages:
> > >> +       for (i = 0; i < nr_pages; i++) {
> > >> +               unlock_page(pages[i]);
> > >> +               put_page(pages[i]);
> > >> +       }
> > >> +
> > >> +       kfree(actor);
> > >> +out:
> > >> +       kfree(pages);
> > >> +}
> > >>
> > >>   const struct address_space_operations squashfs_aops = {
> > >> -       .read_folio = squashfs_read_folio
> > >> +       .read_folio = squashfs_read_folio,
> > >> +       .readahead = squashfs_readahead
> > >>   };
> > >> --
> > >> 2.36.1.255.ge46751e96f-goog
> > >>
> > >>
> >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 3/3] squashfs: implement readahead
  2022-07-29  5:22         ` Xiongwei Song
@ 2022-08-01  4:53           ` Phillip Lougher
  0 siblings, 0 replies; 16+ messages in thread
From: Phillip Lougher @ 2022-08-01  4:53 UTC (permalink / raw)
  To: Xiongwei Song
  Cc: Hsin-Yi Wang, Matthew Wilcox, Xiongwei Song, Marek Szyprowski,
	Andrew Morton, Zheng Liang, Zhang Yi, Hou Tao, Miao Xie,
	linux-mm @ kvack . org,
	squashfs-devel @ lists . sourceforge . net,
	Linux Kernel Mailing List, xiaohong.qi

On 29/07/2022 06:22, Xiongwei Song wrote:
 > Hi Phillip,
 >
 > Gentle ping.
 >
 > Regards,
 > Xiongwei
 >
 > On Fri, Jul 15, 2022 at 9:45 AM Xiongwei Song <sxwjean@gmail.com> wrote:
 >>
 >> Please see the test results below, which are from my colleague Xiaohong Qi:
 >>
 >> I tested file sizes from 256KB to 5120KB with thread counts of
 >> 1, 2, 4, 8, 16, 24 and 32 (each run ten times and averaged). The read
 >> performance is shown below. The difference in read performance between the
 >> 4.18 kernel and 5.10 (with the squashfs_readahead() patch v7) seems to be
 >> caused by the files whose size is smaller than 256KB.
 >>
 >>                        T1         T2         T4         T8        T16        T24        T32
 >> All File Size
 >>     4.18         136.8642   100.479     96.5523    96.1569    96.204     96.0587    96.0519
 >>     5.10-v7      138.474    103.1351    99.9192    99.7091    99.7894   100.2034   100.4447
 >>     Delta          1.6098     2.6561     3.3669     3.5522     3.5854     4.1447     4.3928

To clarify what was mentioned later in the email - these results were
obtained using CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU, on a 12-core system?

If so, these results are unexpected.  There is very little extra
parallelism shown when increasing the threads.  There is about
a 36% increase in performance moving from 1 thread to 2 threads, which
is about what I expected, but from there on there is almost no
parallelism improvement, even though you should have 12 available
Squashfs decompressors.

These are the results I get on a rather old 4-core X86_64 system using
virtualisation, off SSD, with a Squashfs filesystem created from a set of
Linux kernel repositories and distro root filesystems.  So a lot of
small files and some larger files.

************************
1 Thread

real    8m4.435s
user    4m1.401s
sys     2m57.680s

2 Threads

real    5m16.647s
user    3m16.984s
sys     2m35.655s

4 Threads

real    3m46.047s
user    2m58.669s
sys     2m20.193s

8 Threads

real    3m0.239s
user    2m41.253s
sys     2m27.935s

16 Threads

real    2m38.329s
user    2m34.478s
sys     2m26.303s
***************************

This is the behaviour I would expect to see, a steadily decreasing
overall clock time, as more threads in parallel mean more Squashfs
decompressors are used.  Due to user-space overheads and context
switching, you will generally expect to see a decreasing clock
time even after the number of threads is more than the number of cores
available.  The rule of thumb is always to use at least double the number
of real cores.
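
(For anyone wanting to reproduce this kind of scaling run, here is a minimal
sketch -- the mount point, file set and thread counts below are illustrative,
not the exact setup used above:)

    #!/bin/bash
    # Read every file in a squashfs mount with N parallel readers and time it.
    # Needs root for drop_caches; bash's "time" keyword times the whole pipeline.
    MNT=/mnt/squashfs               # illustrative mount point
    for threads in 1 2 4 8 16; do
            echo 3 > /proc/sys/vm/drop_caches
            echo "=== $threads threads ==="
            time find "$MNT" -type f -print0 | \
                    xargs -0 -P "$threads" -n 64 cat > /dev/null 2>/dev/null
    done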

As such your results are confusing, because they max out after only 2
parallel threads.

This may indicate there is something wrong somewhere in your system,
where I/O is bottlenecking early, or it cannot accommodate multiple
parallel reads and is locking reads out.

These results remind me of the old days using rotating media, where
there was an expensive disk head SEEK to data blocks.  Trying to
read multiple files simultaneously was often self-defeating because the
extra SEEK time swallowed up any parallelism improvements, leading to
negligible, flat and decreasing performance improvement as more threads
were added.

Of course I doubt seek time is involved here, but, a lot of things
can emulate seek time, such as a constant unexpected cost.

As this effect is observed with the "original" Squashfs, this is going
to be external to Squashfs, and unrelated to the readahead patches.

 >>
 >> Fsize < 256KB
 >>     4.18          21.7949    14.6959    11.639     10.5154    10.14      10.1092    10.1425
 >>     5.10-v7       23.8629    16.2483    13.1475    12.3697    12.1985    12.8799    13.3292
 >>     Delta          2.068      1.5524     1.5085     1.8543     2.0585     2.7707     3.1867
 >>

This appears to show the readahead patch performing much worse with
files smaller than 256KB than with larger files, which would indicate a
problem with the readahead patch.

But, this may be a symptom of whatever is causing your general
lack of parallelism, i.e. something external to Squashfs.  When read sizes
are small, any extra fixed costs loom large in the result because
they are a significant proportion of the overall cost.  When
read sizes are large, any extra fixed costs are a small proportion
of the overall cost and show up marginally or not at all in the results.

In other words, there is already a suspicion that there are some unexpected
fixed costs to doing I/O, which result in poor parallel performance.
These fixed costs, if they are worse on the later kernel, will show
up here where read sizes are small, and may not show up elsewhere.

I have instrumented and profiled the readahead patches on a large
number of workloads, with various degrees of parallelism, and I have
not experienced any unexpected regressions in performance as reported
here on small files.

This is not to say there isn't an undiscovered issue with the
readahead patch, but I have to say the evidence points more to an
issue with your system than to the readahead patch.

What I would do here is first investigate why you appear to have
poor parallel I/O scaling.

Phillip



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-08-01  4:56 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-06 15:03 [PATCH v5 0/3] Implement readahead for squashfs Hsin-Yi Wang
2022-06-06 15:03 ` [PATCH v5 1/3] Revert "squashfs: provide backing_dev_info in order to disable read-ahead" Hsin-Yi Wang
2022-06-06 15:03 ` [PATCH v5 2/3] squashfs: always build "file direct" version of page actor Hsin-Yi Wang
2022-06-06 15:03 ` [PATCH v5 3/3] squashfs: implement readahead Hsin-Yi Wang
2022-06-07  3:59   ` Phillip Lougher
2022-06-07 19:29   ` Fabio M. De Francesco
2022-06-08 10:20     ` Hsin-Yi Wang
2022-06-09 14:46   ` Xiongwei Song
2022-06-10  1:32     ` Xiongwei Song
2022-06-10  7:42     ` Phillip Lougher
2022-06-13  1:36       ` Xiongwei Song
2022-06-13  8:35         ` Hsin-Yi Wang
2022-07-15  1:45       ` Xiongwei Song
2022-07-29  5:22         ` Xiongwei Song
2022-08-01  4:53           ` Phillip Lougher
2022-06-11  5:23   ` Phillip Lougher

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).