* squashfs performance regression and readahead
@ 2022-05-08 14:46 Song, Xiongwei
  2022-05-08 16:44 ` Matthew Wilcox
  0 siblings, 1 reply; 34+ messages in thread
From: Song, Xiongwei @ 2022-05-08 14:46 UTC (permalink / raw)
  To: phillip, willy; +Cc: linux-mm, squashfs-devel

Hello,

I am facing a performance regression on squashfs.  There are many squashfs 
partitions on our board. I am doing the operations below on 90 squashfs
partitions:

"for cnt in $(seq 0 9); do echo 3 > /proc/sys/vm/drop_caches; echo "Loop ${cnt}:"; time -v find /squashfs/part[0-9][0-9] | xargs -P 24 -i cat {} > /dev/null 2>/dev/null; echo ""; done"

On Linux 4.18, I got the elapsed-time statistics below with the command above
(the find/xargs/cat sequence is run 10 times):
1:22.80 (1m + 22.80s)
0:59.76
1:01.43
1:02.48
1:03.03
1:02.92
1:03.19
1:03.22
1:03.26
1:03.14

On Linux 5.10, there is a huge performance regression:
5:48.69 (5m + 48.69s)
5:52.99
6:06.30
6:01.43
5:50.08
6:26.59
6:09.98
6:04.72
6:05.21
6:21.49

By "git bisect", I found this regression is related to readahead.  After reverting 
c1f6925e1091 ("mm: put readahead pages in cache earlier")  and
8151b4c8bee4 ("mm: add readahead address space operation") on linux 5.10,
the performance is improved:
1:37.16 (1m + 37.16s)
1:04.18
1:05.28
1:06.07
1:06.31
1:06.58
1:06.80
1:06.79
1:06.95
1:06.61
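
For reference, 8151b4c8bee4 introduced the ->readahead address space
operation, and c1f6925e1091 made the core add the pages to the page cache
(locked) before calling into the filesystem. A minimal implementation of
that contract looks roughly like this (a sketch against the 5.10-era API,
not squashfs code; example_readahead() and example_fill_page() are
hypothetical names):

#include <linux/pagemap.h>

static void example_readahead(struct readahead_control *ractl)
{
	struct page *page;

	while ((page = readahead_page(ractl))) {
		/*
		 * The page is already in the page cache and locked.
		 * Fill it, mark it uptodate on success, then unlock it
		 * and drop the extra reference held for readahead.
		 */
		if (example_fill_page(page) == 0)	/* hypothetical I/O helper */
			SetPageUptodate(page);
		unlock_page(page);
		put_page(page);
	}
}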

Also, I found that disabling readahead helps, via 9eec1d897139 ("squashfs: provide
backing_dev_info in order to disable read-ahead"); a sketch of that workaround
follows these timings:
1:06.18 (1m + 6.18s)
1:05.65
1:06.34
1:06.88
1:06.52
1:06.78
1:06.61
1:06.99
1:06.60
1:06.79
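
The workaround in 9eec1d897139 boils down to giving squashfs its own
per-superblock backing_dev_info with a zero readahead window, roughly like
the following in the fill_super path (a sketch of the idea, not the literal
commit; error handling trimmed):

#include <linux/backing-dev.h>
#include <linux/fs.h>

static int example_fill_super(struct super_block *sb)
{
	int err = super_setup_bdi(sb);	/* allocate a private BDI for this sb */

	if (err)
		return err;
	sb->s_bdi->ra_pages = 0;	/* zero window: the VM issues no readahead */
	return 0;
}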

I have also tried the upstream Linux 5.18; see the results below:
1:12.82 (1m + 12.82s)
1:07.68
1:08.94
1:09.65
1:09.87
1:10.32
1:10.47
1:10.34
1:10.24
1:10.34
 
As we can see, even with readahead disabled there is still an extra 2~3s of
overhead compared to Linux 4.18. BTW, the two reverted commits above are from
the "Change readahead API" series; see the following link:
https://lore.kernel.org/all/20200414150233.24495-11-willy@infradead.org/T/#m22d6de881c24057b776758ae8e7f5d54e2db8026

I would appreciate your comments and inputs.

Regards,
Xiongwei



* Re: squashfs performance regression and readahead
  2022-05-08 14:46 squashfs performance regression and readahead Song, Xiongwei
@ 2022-05-08 16:44 ` Matthew Wilcox
  2022-05-09  7:05   ` Hsin-Yi Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Matthew Wilcox @ 2022-05-08 16:44 UTC (permalink / raw)
  To: Song, Xiongwei; +Cc: phillip, linux-mm, squashfs-devel, Hsin-Yi Wang

On Sun, May 08, 2022 at 02:46:40PM +0000, Song, Xiongwei wrote:
> I am facing a performance regression on squashfs.  There are many squashfs 
> partitions on our board. I am doing the operations below on 90 squashfs
> partitions:

Hi Xiongwei.  Thanks for the report.  Hsin-Yi and I have been working on
this problem off-list.  Hsin-Yi, could you send the latest version?



* Re: squashfs performance regression and readahead
  2022-05-08 16:44 ` Matthew Wilcox
@ 2022-05-09  7:05   ` Hsin-Yi Wang
  2022-05-09  7:10     ` Hsin-Yi Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Hsin-Yi Wang @ 2022-05-09  7:05 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Song, Xiongwei, phillip, linux-mm, squashfs-devel

On Mon, May 9, 2022 at 12:45 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sun, May 08, 2022 at 02:46:40PM +0000, Song, Xiongwei wrote:
> > I am facing a performance regression on squashfs.  There are many squashfs
> > partitions on our board. I am doing the operations below on 90 squashfs
> > partitions:
>
> Hi Xiongwei.  Thanks for the report.  Hsin-Yi and I have been working on
> this problem off-list.  Hsin-Yi, could you send the latest version?

Hi Matthew,

Here is the latest version of the patch. Since the review is only
halfway done, I have posted the patch here; or should I send it to
the list directly?

Thanks

From 03558dcaab93ed3c32498eb70c7f2b1382b218d6 Mon Sep 17 00:00:00 2001
From: Hsin-Yi Wang <hsinyi@chromium.org>
Date: Sun, 10 Oct 2021 21:22:25 +0800
Subject: [PATCH] squashfs: implement readahead

Implement the readahead callback for squashfs. It reads the datablocks
which cover the pages in the readahead request. In a few cases it will
not mark pages as uptodate:
- the file end is 0.
- the current batch of pages isn't in the same datablock, or there
  aren't enough pages to fill a datablock.
Otherwise the pages are marked uptodate. The unhandled pages will be
read by readpage later.

Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
---
 fs/squashfs/file.c | 72 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 71 insertions(+), 1 deletion(-)

diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index 89d492916dea..20ec48cf97c5 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -39,6 +39,7 @@
 #include "squashfs_fs_sb.h"
 #include "squashfs_fs_i.h"
 #include "squashfs.h"
+#include "page_actor.h"

 /*
  * Locate cache slot in range [offset, index] for specified inode.  If
@@ -494,7 +495,76 @@ static int squashfs_readpage(struct file *file, struct page *page)
 	return 0;
 }
 
+static void squashfs_readahead(struct readahead_control *ractl)
+{
+	struct inode *inode = ractl->mapping->host;
+	struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
+	size_t mask = (1UL << msblk->block_log) - 1;
+	size_t shift = msblk->block_log - PAGE_SHIFT;
+	loff_t req_end = readahead_pos(ractl) + readahead_length(ractl);
+	loff_t start = readahead_pos(ractl) & ~mask;
+	size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
+	struct squashfs_page_actor *actor;
+	unsigned int nr_pages = 0;
+	struct page **pages;
+	u64 block = 0;
+	int bsize, res, i, index;
+	int file_end = i_size_read(inode) >> msblk->block_log;
+	int max_pages = 1UL << shift;
+
+	readahead_expand(ractl, start, (len | mask) + 1);
+
+	if (readahead_pos(ractl) + readahead_length(ractl) < req_end)
+		return;
+
+	nr_pages = readahead_count(ractl);
+	pages = kmalloc_array(nr_pages, sizeof(void *), GFP_KERNEL);
+	if (!pages)
+		return;
+
+	actor = squashfs_page_actor_init_special(pages, nr_pages, 0);
+	if (!actor)
+		goto out;
+
+	for (;;) {
+		nr_pages = __readahead_batch(ractl, pages, max_pages);
+		if (!nr_pages)
+			break;
+
+		if (readahead_pos(ractl) >= i_size_read(inode) ||
+		    file_end == 0 || nr_pages < max_pages)
+			goto skip_pages;
+
+		index = pages[0]->index >> shift;
+		bsize = read_blocklist(inode, index, &block);
+		res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
+					 actor);
+
+		if (res >= 0 && (pages[nr_pages - 1]->index >> shift) == index)
+			for (i = 0; i < nr_pages; i++)
+				SetPageUptodate(pages[i]);
+
+		for (i = 0; i < nr_pages; i++) {
+			unlock_page(pages[i]);
+			put_page(pages[i]);
+		}
+	}
+
+	kfree(actor);
+	return;
+
+skip_pages:
+	for (i = 0; i < nr_pages; i++) {
+		unlock_page(pages[i]);
+		put_page(pages[i]);
+	}
+
+	kfree(actor);
+out:
+	kfree(pages);
+}
 
 const struct address_space_operations squashfs_aops = {
-	.readpage = squashfs_readpage
+	.readpage = squashfs_readpage,
+	.readahead = squashfs_readahead
 };
-- 
2.31.0
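
To make the shift/mask arithmetic above concrete: with a 128K squashfs
datablock and 4K pages, the derived values work out as below (a standalone
userspace sketch of the same arithmetic; the 128K block size is an assumed
example for illustration, not taken from the report):

#include <stdio.h>

#define PAGE_SHIFT 12	/* 4K pages */

int main(void)
{
	unsigned int block_log = 17;                 /* assumed 128K datablocks */
	size_t mask = (1UL << block_log) - 1;        /* byte offset within a block */
	size_t shift = block_log - PAGE_SHIFT;       /* log2(pages per block) */
	int max_pages = 1UL << shift;                /* pages in one datablock */
	long long pos = 0x25000;                     /* an example readahead_pos() */

	printf("mask      = %#zx\n", mask);          /* 0x1ffff */
	printf("shift     = %zu\n", shift);          /* 5 */
	printf("max_pages = %d\n", max_pages);       /* 32 */
	printf("aligned start   = %#llx\n", pos & ~(long long)mask);      /* 0x20000 */
	printf("datablock index = %lld\n", (pos >> PAGE_SHIFT) >> shift); /* 1 */
	return 0;
}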



* Re: squashfs performance regression and readahead
  2022-05-09  7:05   ` Hsin-Yi Wang
@ 2022-05-09  7:10     ` Hsin-Yi Wang
  2022-05-09  9:42       ` Song, Xiongwei
  2022-05-09 12:43       ` Xiongwei Song
  0 siblings, 2 replies; 34+ messages in thread
From: Hsin-Yi Wang @ 2022-05-09  7:10 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Song, Xiongwei, phillip, linux-mm, squashfs-devel


On Mon, May 9, 2022 at 3:05 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
>
> On Mon, May 9, 2022 at 12:45 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Sun, May 08, 2022 at 02:46:40PM +0000, Song, Xiongwei wrote:
> > > I am facing a performance regression on squashfs.  There are many squashfs
> > > partitions on our board. I am doing the operations below on 90 squashfs
> > > partitions:
> >
> > Hi Xiongwei.  Thanks for the report.  Hsin-Yi and I have been working on
> > this problem off-list.  Hsin-Yi, could you send the latest version?
>
> Hi Matthew,
>
> This is the patch of the latest version. Since it's just being
> reviewed halfway done, I posted the patch here, or should I send it to
> the list directly?
>
The indentation got broken when I pasted the patch here; resending it as an attachment.

> Thanks
>
> [PATCH squashfs: implement readahead, quoted in full; snipped. The
> resent copy follows as the attachment.]

[-- Attachment #2: 0001-WIP-squashfs-implement-readahead.patch --]
[-- Type: text/x-patch, Size: 3190 bytes --]

From 03558dcaab93ed3c32498eb70c7f2b1382b218d6 Mon Sep 17 00:00:00 2001
From: Hsin-Yi Wang <hsinyi@chromium.org>
Date: Sun, 10 Oct 2021 21:22:25 +0800
Subject: [PATCH] squashfs: implement readahead

Implement the readahead callback for squashfs. It reads the datablocks
which cover the pages in the readahead request. In a few cases it will
not mark pages as uptodate:
- the file end is 0.
- the current batch of pages isn't in the same datablock, or there
  aren't enough pages to fill a datablock.
Otherwise the pages are marked uptodate. The unhandled pages will be
read by readpage later.

Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
---
 fs/squashfs/file.c | 72 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 71 insertions(+), 1 deletion(-)

diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index 89d492916dea..20ec48cf97c5 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -39,6 +39,7 @@
 #include "squashfs_fs_sb.h"
 #include "squashfs_fs_i.h"
 #include "squashfs.h"
+#include "page_actor.h"
 
 /*
  * Locate cache slot in range [offset, index] for specified inode.  If
@@ -494,7 +495,76 @@ static int squashfs_readpage(struct file *file, struct page *page)
 	return 0;
 }
 
+static void squashfs_readahead(struct readahead_control *ractl)
+{
+	struct inode *inode = ractl->mapping->host;
+	struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
+	size_t mask = (1UL << msblk->block_log) - 1;
+	size_t shift = msblk->block_log - PAGE_SHIFT;
+	loff_t req_end = readahead_pos(ractl) + readahead_length(ractl);
+	loff_t start = readahead_pos(ractl) & ~mask;
+	size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
+	struct squashfs_page_actor *actor;
+	unsigned int nr_pages = 0;
+	struct page **pages;
+	u64 block = 0;
+	int bsize, res, i, index;
+	int file_end = i_size_read(inode) >> msblk->block_log;
+	int max_pages = 1UL << shift;
+
+	readahead_expand(ractl, start, (len | mask) + 1);
+
+	if (readahead_pos(ractl) + readahead_length(ractl) < req_end)
+		return;
+
+	nr_pages = readahead_count(ractl);
+	pages = kmalloc_array(nr_pages, sizeof(void *), GFP_KERNEL);
+	if (!pages)
+		return;
+
+	actor = squashfs_page_actor_init_special(pages, nr_pages, 0);
+	if (!actor)
+		goto out;
+
+	for (;;) {
+		nr_pages = __readahead_batch(ractl, pages, max_pages);
+		if (!nr_pages)
+			break;
+
+		if (readahead_pos(ractl) >= i_size_read(inode) ||
+		    file_end == 0 || nr_pages < max_pages)
+			goto skip_pages;
+
+		index = pages[0]->index >> shift;
+		bsize = read_blocklist(inode, index, &block);
+		res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
+					 actor);
+
+		if (res >= 0 && (pages[nr_pages - 1]->index >> shift) == index)
+			for (i = 0; i < nr_pages; i++)
+				SetPageUptodate(pages[i]);
+
+		for (i = 0; i < nr_pages; i++) {
+			unlock_page(pages[i]);
+			put_page(pages[i]);
+		}
+	}
+
+	kfree(actor);
+	return;
+
+skip_pages:
+	for (i = 0; i < nr_pages; i++) {
+		unlock_page(pages[i]);
+		put_page(pages[i]);
+	}
+
+	kfree(actor);
+out:
+	kfree(pages);
+}
 
 const struct address_space_operations squashfs_aops = {
-	.readpage = squashfs_readpage
+	.readpage = squashfs_readpage,
+	.readahead = squashfs_readahead
 };
-- 
2.31.0



* RE: squashfs performance regression and readahead
  2022-05-09  7:10     ` Hsin-Yi Wang
@ 2022-05-09  9:42       ` Song, Xiongwei
  2022-05-09 12:43       ` Xiongwei Song
  1 sibling, 0 replies; 34+ messages in thread
From: Song, Xiongwei @ 2022-05-09  9:42 UTC (permalink / raw)
  To: Hsin-Yi Wang, Matthew Wilcox; +Cc: phillip, linux-mm, squashfs-devel

Thanks Matthew and Hsin-Yi.  I will test the patch by EOD.

Regards,
Xiongwei

-----Original Message-----
From: Hsin-Yi Wang <hsinyi@chromium.org> 
Sent: Monday, May 9, 2022 3:11 PM
To: Matthew Wilcox <willy@infradead.org>
Cc: Song, Xiongwei <Xiongwei.Song@windriver.com>; phillip@squashfs.org.uk; linux-mm@kvack.org; squashfs-devel@lists.sourceforge.net
Subject: Re: squashfs performance regression and readahead


On Mon, May 9, 2022 at 3:05 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
>
> On Mon, May 9, 2022 at 12:45 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Sun, May 08, 2022 at 02:46:40PM +0000, Song, Xiongwei wrote:
> > > I am facing a performance regression on squashfs.  There are many 
> > > squashfs partitions on our board. I am doing the operations below 
> > > on 90 squashfs
> > > partitions:
> >
> > Hi Xiongwei.  Thanks for the report.  Hsin-Yi and I have been 
> > working on this problem off-list.  Hsin-Yi, could you send the latest version?
>
> Hi Matthew,
>
> This is the patch of the latest version. Since it's just being 
> reviewed halfway done, I posted the patch here, or should I send it to 
> the list directly?
>
The indentation got broken when I pasted the patch here; resending it as an attachment.

> Thanks
>
> [0001-WIP-squashfs-implement-readahead.patch quoted in full; snipped]


* Re: squashfs performance regression and readahead
  2022-05-09  7:10     ` Hsin-Yi Wang
  2022-05-09  9:42       ` Song, Xiongwei
@ 2022-05-09 12:43       ` Xiongwei Song
  2022-05-09 13:21         ` Matthew Wilcox
  1 sibling, 1 reply; 34+ messages in thread
From: Xiongwei Song @ 2022-05-09 12:43 UTC (permalink / raw)
  To: Hsin-Yi Wang
  Cc: Matthew Wilcox, Song, Xiongwei, phillip, linux-mm, squashfs-devel

Hi Hsin-Yi and Matthew,

With the patch from the attachment applied on Linux 5.10, I ran the
command I mentioned earlier and got the results below:
1:40.65 (1m + 40.65s)
1:10.12
1:11.10
1:11.47
1:11.59
1:11.94
1:11.86
1:12.04
1:12.21
1:12.06

The performance has improved noticeably, but it is still not as good as
on Linux 4.18.

Moreover, I wanted to test on Linux 5.18, but I think I should first revert
9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
read-ahead"), right?  Otherwise the patch doesn't take effect?

Regards,
Xiongwei

On Mon, May 9, 2022 at 3:11 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
>
> On Mon, May 9, 2022 at 3:05 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> >
> > On Mon, May 9, 2022 at 12:45 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Sun, May 08, 2022 at 02:46:40PM +0000, Song, Xiongwei wrote:
> > > > I am facing a performance regression on squashfs.  There are many squashfs
> > > > partitions on our board. I am doing the operations below on 90 squashfs
> > > > partitions:
> > >
> > > Hi Xiongwei.  Thanks for the report.  Hsin-Yi and I have been working on
> > > this problem off-list.  Hsin-Yi, could you send the latest version?
> >
> > Hi Matthew,
> >
> > This is the patch of the latest version. Since it's just being
> > reviewed halfway done, I posted the patch here, or should I send it to
> > the list directly?
> >
> The indent seems broken when I pasted it here. Resend in the attachment.
>
> > Thanks
> >
> > [0001-WIP-squashfs-implement-readahead.patch quoted in full; snipped]



* Re: squashfs performance regression and readahead
  2022-05-09 12:43       ` Xiongwei Song
@ 2022-05-09 13:21         ` Matthew Wilcox
  2022-05-09 14:29           ` Hsin-Yi Wang
  2022-05-10  1:11           ` Phillip Lougher
  0 siblings, 2 replies; 34+ messages in thread
From: Matthew Wilcox @ 2022-05-09 13:21 UTC (permalink / raw)
  To: Xiongwei Song, Zheng Liang, Phillip Lougher, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds
  Cc: Hsin-Yi Wang, Song, Xiongwei, phillip, linux-mm, squashfs-devel

On Mon, May 09, 2022 at 08:43:45PM +0800, Xiongwei Song wrote:
> Hi Hsin-Yi and Matthew,
> 
> With the patch from the attachment on linux 5.10, ran the command as I
> mentioned earlier,
> got the results below:
> 1:40.65 (1m + 40.65s)
> 1:10.12
> 1:11.10
> 1:11.47
> 1:11.59
> 1:11.94
> 1:11.86
> 1:12.04
> 1:12.21
> 1:12.06
> 
> The performance has improved obviously, but compared to linux 4.18, the
> performance is not so good.
> 
> Moreover, I wanted to test on linux 5.18. But I think I should revert
> 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
> read-ahead"),
> right?  Otherwise, the patch doesn't work?

I've never seen patch 9eec1d897139 before.  If you're going to point
out bugs in my code, at least have the decency to cc me on it.  It
should never have gone in, and should be reverted so the problem can
be fixed properly.



* Re: squashfs performance regression and readahead
  2022-05-09 13:21         ` Matthew Wilcox
@ 2022-05-09 14:29           ` Hsin-Yi Wang
  2022-05-10 12:30             ` Xiongwei Song
  2022-05-10  1:11           ` Phillip Lougher
  1 sibling, 1 reply; 34+ messages in thread
From: Hsin-Yi Wang @ 2022-05-09 14:29 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Xiongwei Song, Zheng Liang, Phillip Lougher, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

On Mon, May 9, 2022 at 9:21 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, May 09, 2022 at 08:43:45PM +0800, Xiongwei Song wrote:
> > Hi Hsin-Yi and Matthew,
> >
> > With the patch from the attachment on linux 5.10, ran the command as I
> > mentioned earlier,
> > got the results below:
> > 1:40.65 (1m + 40.65s)
> > 1:10.12
> > 1:11.10
> > 1:11.47
> > 1:11.59
> > 1:11.94
> > 1:11.86
> > 1:12.04
> > 1:12.21
> > 1:12.06
> >
> > The performance has improved obviously, but compared to linux 4.18, the
> > performance is not so good.
> >
I think you shouldn't compare the performance with 4.18 directly,
since there might be other factors that impact the performance. I'd
suggest comparing the same kernel version:
a) with this patch applied;
b) with c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted.

According to https://lore.kernel.org/linux-mm/Ynfzh2ifG85MZEoN@casper.infradead.org/t/
it seems to be a ~3 second difference?

1:37.16 (1m + 37.16s)
1:04.18
1:05.28
1:06.07
1:06.31
1:06.58
1:06.80
1:06.79
1:06.95
1:06.61

> > Moreover, I wanted to test on linux 5.18. But I think I should revert
> > 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
> > read-ahead"),
> > right?  Otherwise, the patch doesn't work?
>
> I've never seen patch 9eec1d897139 before.  If you're going to point
> out bugs in my code, at least have the decency to cc me on it.  It
> should never have gone in, and should be reverted so the problem can
> be fixed properly.



* Re: squashfs performance regression and readahead
  2022-05-09 13:21         ` Matthew Wilcox
  2022-05-09 14:29           ` Hsin-Yi Wang
@ 2022-05-10  1:11           ` Phillip Lougher
  2022-05-10  2:35             ` Matthew Wilcox
  1 sibling, 1 reply; 34+ messages in thread
From: Phillip Lougher @ 2022-05-10  1:11 UTC (permalink / raw)
  To: Matthew Wilcox, Xiongwei Song, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds
  Cc: Hsin-Yi Wang, Song, Xiongwei, linux-mm, squashfs-devel

On 09/05/2022 14:21, Matthew Wilcox wrote:
> On Mon, May 09, 2022 at 08:43:45PM +0800, Xiongwei Song wrote:
>> Hi Hsin-Yi and Matthew,
>>
>> With the patch from the attachment on linux 5.10, ran the command as I
>> mentioned earlier,
>> got the results below:
>> 1:40.65 (1m + 40.65s)
>> 1:10.12
>> 1:11.10
>> 1:11.47
>> 1:11.59
>> 1:11.94
>> 1:11.86
>> 1:12.04
>> 1:12.21
>> 1:12.06
>>
>> The performance has improved obviously, but compared to linux 4.18, the
>> performance is not so good.
>>
>> Moreover, I wanted to test on linux 5.18. But I think I should revert
>> 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
>> read-ahead"),
>> right?  Otherwise, the patch doesn't work?
> 
> I've never seen patch 9eec1d897139 before.  If you're going to point
> out bugs in my code, at least have the decency to cc me on it.  It
> should never have gone in, and should be reverted so the problem can
> be fixed properly.

You are not in charge of what patches go into Squashfs, that is my
perogative as maintainer of Squashfs.

That patch (by Huawei) fixes the performance regression in Squashfs
by disabling readahead, and it is a good workaround until something
better arrives.

If the patch being worked on now is acceptable once reviewed, it can
replace the current workaround, which will then be reverted.

Cheers

Dr. Phillip Lougher
--
Squashfs author and maintainer.






* Re: squashfs performance regression and readahead
  2022-05-10  1:11           ` Phillip Lougher
@ 2022-05-10  2:35             ` Matthew Wilcox
  2022-05-10  3:20               ` Phillip Lougher
  0 siblings, 1 reply; 34+ messages in thread
From: Matthew Wilcox @ 2022-05-10  2:35 UTC (permalink / raw)
  To: Phillip Lougher
  Cc: Xiongwei Song, Zheng Liang, Zhang Yi, Hou Tao, Miao Xie,
	Andrew Morton, Linus Torvalds, Hsin-Yi Wang, Song, Xiongwei,
	linux-mm, squashfs-devel

On Tue, May 10, 2022 at 02:11:41AM +0100, Phillip Lougher wrote:
> On 09/05/2022 14:21, Matthew Wilcox wrote:
> > On Mon, May 09, 2022 at 08:43:45PM +0800, Xiongwei Song wrote:
> > > Hi Hsin-Yi and Matthew,
> > > 
> > > With the patch from the attachment on linux 5.10, ran the command as I
> > > mentioned earlier,
> > > got the results below:
> > > 1:40.65 (1m + 40.65s)
> > > 1:10.12
> > > 1:11.10
> > > 1:11.47
> > > 1:11.59
> > > 1:11.94
> > > 1:11.86
> > > 1:12.04
> > > 1:12.21
> > > 1:12.06
> > > 
> > > The performance has improved obviously, but compared to linux 4.18, the
> > > performance is not so good.
> > > 
> > > Moreover, I wanted to test on linux 5.18. But I think I should revert
> > > 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
> > > read-ahead"),
> > > right?  Otherwise, the patch doesn't work?
> > 
> > I've never seen patch 9eec1d897139 before.  If you're going to point
> > out bugs in my code, at least have the decency to cc me on it.  It
> > should never have gone in, and should be reverted so the problem can
> > be fixed properly.
> 
> You are not in charge of what patches go into Squashfs, that is my
> perogative as maintainer of Squashfs.

I think you mean 'prerogative'.  And, no, your filesystem is not your
little fiefdom, it's part of a collaborative effort.

> That patch (by Huawei) fixes the performance regression in Squashfs
> by disabling readahead, and it is a good workaround until something
> better arrives.

You *didn't even report the problem to me*.  How can it be fixed if I'm
not aware of it?




* Re: squashfs performance regression and readahead
  2022-05-10  2:35             ` Matthew Wilcox
@ 2022-05-10  3:20               ` Phillip Lougher
  2022-05-10  3:41                 ` Phillip Lougher
  0 siblings, 1 reply; 34+ messages in thread
From: Phillip Lougher @ 2022-05-10  3:20 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Xiongwei Song, Zheng Liang, Zhang Yi, Hou Tao, Miao Xie,
	Andrew Morton, Linus Torvalds, Hsin-Yi Wang, Song, Xiongwei,
	linux-mm, squashfs-devel, open list

On 10/05/2022 03:35, Matthew Wilcox wrote:
> On Tue, May 10, 2022 at 02:11:41AM +0100, Phillip Lougher wrote:
>> On 09/05/2022 14:21, Matthew Wilcox wrote:
>>> On Mon, May 09, 2022 at 08:43:45PM +0800, Xiongwei Song wrote:
>>>> Hi Hsin-Yi and Matthew,
>>>>
>>>> With the patch from the attachment on linux 5.10, ran the command as I
>>>> mentioned earlier,
>>>> got the results below:
>>>> 1:40.65 (1m + 40.65s)
>>>> 1:10.12
>>>> 1:11.10
>>>> 1:11.47
>>>> 1:11.59
>>>> 1:11.94
>>>> 1:11.86
>>>> 1:12.04
>>>> 1:12.21
>>>> 1:12.06
>>>>
>>>> The performance has improved obviously, but compared to linux 4.18, the
>>>> performance is not so good.
>>>>
>>>> Moreover, I wanted to test on linux 5.18. But I think I should revert
>>>> 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
>>>> read-ahead"),
>>>> right?  Otherwise, the patch doesn't work?
>>>
>>> I've never seen patch 9eec1d897139 before.  If you're going to point
>>> out bugs in my code, at least have the decency to cc me on it.  It
>>> should never have gone in, and should be reverted so the problem can
>>> be fixed properly.
>>
>> You are not in charge of what patches go into Squashfs, that is my
>> perogative as maintainer of Squashfs.
> 
> I think you mean 'prerogative'.  And, no, your filesystem is not your
> little fiefdom, it's part of a collaborative effort.
> 

This isn't a spelling contest, and if that's the best you can do you
have already failed.

Be careful here also, I have been maintainer of Squashfs for 20 years,
and was kernel maintainer for both Ubuntu and Redhat for 10 years, and
so I am an experienced member of the community.

Your reply is bordering on offensive and arrogant, especially considering
it is unwarranted.  I did not set out to offend you, and I don't
appreciate it.

About 8 years ago I decided to refrain from active involvement in the
kernel community, because I decided the level of discourse was
disgusting, and I had enough of it.

I popped up now to defend my approval of the Huawei patch.  I am *quite*
happy not to have any more involvement until necessary.

So having said what I want to say, I will leave it at that. You have
just proved why I have minimised my involvement.

No doubt you'll throw your toys out of the pram, but I'm no
longer listening, so don't bother.


>> That patch (by Huawei) fixes the performance regression in Squashfs
>> by disabling readahead, and it is a good workaround until something
>> better arrives.
> 
> You *didn't even report the problem to me*.  How can it be fixed if I'm
> not aware of it?
> 

There was an email discussion last year, which I responded to, and got
ignored.  I will dig it out tomorrow, perhaps.  But I will probably
not bother, because life is too short.

Cheers

Phillip


* Re: squashfs performance regression and readahead
  2022-05-10  3:20               ` Phillip Lougher
@ 2022-05-10  3:41                 ` Phillip Lougher
  0 siblings, 0 replies; 34+ messages in thread
From: Phillip Lougher @ 2022-05-10  3:41 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Xiongwei Song, Zheng Liang, Zhang Yi, Hou Tao, Miao Xie,
	Andrew Morton, Linus Torvalds, Hsin-Yi Wang, Song, Xiongwei,
	linux-mm, squashfs-devel, open list

On 10/05/2022 04:20, Phillip Lougher wrote:
> On 10/05/2022 03:35, Matthew Wilcox wrote:
>> On Tue, May 10, 2022 at 02:11:41AM +0100, Phillip Lougher wrote:
>>> On 09/05/2022 14:21, Matthew Wilcox wrote:
>>>> On Mon, May 09, 2022 at 08:43:45PM +0800, Xiongwei Song wrote:
>>>>> Hi Hsin-Yi and Matthew,
>>>>>
>>>>> With the patch from the attachment on linux 5.10, ran the command as I
>>>>> mentioned earlier,
>>>>> got the results below:
>>>>> 1:40.65 (1m + 40.65s)
>>>>> 1:10.12
>>>>> 1:11.10
>>>>> 1:11.47
>>>>> 1:11.59
>>>>> 1:11.94
>>>>> 1:11.86
>>>>> 1:12.04
>>>>> 1:12.21
>>>>> 1:12.06
>>>>>
>>>>> The performance has improved obviously, but compared to linux 4.18, 
>>>>> the
>>>>> performance is not so good.
>>>>>
>>>>> Moreover, I wanted to test on linux 5.18. But I think I should revert
>>>>> 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
>>>>> read-ahead"),
>>>>> right?  Otherwise, the patch doesn't work?
>>>>
>>>> I've never seen patch 9eec1d897139 before.  If you're going to point
>>>> out bugs in my code, at least have the decency to cc me on it.  It
>>>> should never have gone in, and should be reverted so the problem can
>>>> be fixed properly.
>>>
>>> You are not in charge of what patches go into Squashfs, that is my
>>> perogative as maintainer of Squashfs.
>>
>> I think you mean 'prerogative'.  And, no, your filesystem is not your
>> little fiefdom, it's part of a collaborative effort.
>>
> 
> This isn't a spelling contest, and if that's the best you can do you
> have already failed.
> 
> Be careful here also, I have been maintainer of Squashfs for 20 years,
> and was kernel maintainer for both Ubuntu and Redhat for 10 years, and
> so I am an experienced member of the community.
> 
> Your reply is bordering on offensive and arrogant, especially considering
> it is unwarranted.  I did not set out to offend you, and I don't
> appreciate it.
> 
> About 8 years ago I decided to refrain from active involvement in the
> kernel community, because I decided the level of discourse was
> disgusting, and I had enough of it.
> 
> I popped up now to defend my approval of the Huawei patch.  I am *quite*
> happy not to have any more involvement until necessary.
> 
> So having said what I want to say, I will leave it at that. You have
> just proved why I have minimised my involvement.
> 
> No doubt you'll throw your toys out of the pram, but I'm no
> longer listening, so don't bother.
> 
> 
>>> That patch (by Huawei) fixes the performance regression in Squashfs
>>> by disabling readahead, and it is a good workaround until something
>>> better arrives.
>>
>> You *didn't even report the problem to me*.  How can it be fixed if I'm
>> not aware of it?

Despite having been insulted, I have done your homework for you.

This is where the problem was raised last year, with you emailed
directly:

https://lore.kernel.org/all/CAJMQK-g9G6KQmH-V=BRGX0swZji9Wxe_2c7ht-MMAapdFy2pXw@mail.gmail.com/T/

>>
> 
> There was an email discussion last year, which I responded to, and got
> ignored.  I will dig it out tomorrow, perhaps.  But I will probably
> not bother, because life is too short.
> 

Afterwards you started a thread on "Readahead for compressed data",
which I responded to.

https://lore.kernel.org/all/YXHK5HrQpJu9oy8w@casper.infradead.org/T/


> Cheers
> 
> Phillip



* Re: squashfs performance regression and readahead
  2022-05-09 14:29           ` Hsin-Yi Wang
@ 2022-05-10 12:30             ` Xiongwei Song
  2022-05-10 12:47               ` Hsin-Yi Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Xiongwei Song @ 2022-05-10 12:30 UTC (permalink / raw)
  To: Hsin-Yi Wang
  Cc: Matthew Wilcox, Zheng Liang, Phillip Lougher, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

Hi Hsin-Yi,

On Mon, May 9, 2022 at 10:29 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
>
> On Mon, May 9, 2022 at 9:21 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Mon, May 09, 2022 at 08:43:45PM +0800, Xiongwei Song wrote:
> > > Hi Hsin-Yi and Matthew,
> > >
> > > With the patch from the attachment on linux 5.10, ran the command as I
> > > mentioned earlier,
> > > got the results below:
> > > 1:40.65 (1m + 40.65s)
> > > 1:10.12
> > > 1:11.10
> > > 1:11.47
> > > 1:11.59
> > > 1:11.94
> > > 1:11.86
> > > 1:12.04
> > > 1:12.21
> > > 1:12.06
> > >
> > > The performance has improved obviously, but compared to linux 4.18, the
> > > performance is not so good.
> > >
> I think you shouldn't compare the performance with 4.18 directly,
> since there might be other factors that impact the performance.

Make sense.

>I'd suggest comparing the same kernel version with:
> a) with this patch
> b) with c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted.

With 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
read-ahead") reverted and 0001-WIP-squashfs-implement-readahead.patch applied,
the test results on Linux 5.18 are:
1:41.51 (1m + 41.51s)
1:08.11
1:10.37
1:11.17
1:11.32
1:11.59
1:12.23
1:12.08
1:12.76
1:12.51

The performance is 1~2s worse than vanilla Linux 5.18.

>
> According to https://lore.kernel.org/linux-mm/Ynfzh2ifG85MZEoN@casper.infradead.org/t/
> It seems to be a 3 sec difference?

It's a 5~6s difference.

Regards,
Xiongwei

>
> 1:37.16 (1m + 37.16s)
> 1:04.18
> 1:05.28
> 1:06.07
> 1:06.31
> 1:06.58
> 1:06.80
> 1:06.79
> 1:06.95
> 1:06.61
>
> > > Moreover, I wanted to test on linux 5.18. But I think I should revert
> > > 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
> > > read-ahead"),
> > > right?  Otherwise, the patch doesn't work?
> >
> > I've never seen patch 9eec1d897139 before.  If you're going to point
> > out bugs in my code, at least have the decency to cc me on it.  It
> > should never have gone in, and should be reverted so the problem can
> > be fixed properly.



* Re: squashfs performance regression and readahead
  2022-05-10 12:30             ` Xiongwei Song
@ 2022-05-10 12:47               ` Hsin-Yi Wang
  2022-05-10 13:18                 ` Xiongwei Song
  0 siblings, 1 reply; 34+ messages in thread
From: Hsin-Yi Wang @ 2022-05-10 12:47 UTC (permalink / raw)
  To: Xiongwei Song
  Cc: Matthew Wilcox, Zheng Liang, Phillip Lougher, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

On Tue, May 10, 2022 at 8:31 PM Xiongwei Song <sxwjean@gmail.com> wrote:
>
> Hi Hsin-Yi,
>
> On Mon, May 9, 2022 at 10:29 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> >
> > On Mon, May 9, 2022 at 9:21 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Mon, May 09, 2022 at 08:43:45PM +0800, Xiongwei Song wrote:
> > > > Hi Hsin-Yi and Matthew,
> > > >
> > > > With the patch from the attachment on linux 5.10, ran the command as I
> > > > mentioned earlier,
> > > > got the results below:
> > > > 1:40.65 (1m + 40.65s)
> > > > 1:10.12
> > > > 1:11.10
> > > > 1:11.47
> > > > 1:11.59
> > > > 1:11.94
> > > > 1:11.86
> > > > 1:12.04
> > > > 1:12.21
> > > > 1:12.06
> > > >
> > > > The performance has improved obviously, but compared to linux 4.18, the
> > > > performance is not so good.
> > > >
> > I think you shouldn't compare the performance with 4.18 directly,
> > since there might be other factors that impact the performance.
>
> Make sense.
>
> >I'd suggest comparing the same kernel version with:
> > a) with this patch
> > b) with c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted.
>
> With 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
> read-ahead") reverted and applied 0001-WIP-squashfs-implement-readahead.patch,
> test result on linux 5.18:
> 1:41.51 (1m + 41.51s)
> 1:08.11
> 1:10.37
> 1:11.17
> 1:11.32
> 1:11.59
> 1:12.23
> 1:12.08
> 1:12.76
> 1:12.51
>
> performance worse 1 ~ 2s than linux 5.18 vanilla.
>

Can you share the pack file you used for testing? Thanks
> >
> > According to https://lore.kernel.org/linux-mm/Ynfzh2ifG85MZEoN@casper.infradead.org/t/
> > It seems to be a 3 sec difference?
>
> 5 ~ 6s difference.
>
> Regards,
> Xiongwei
>
> >
> > 1:37.16 (1m + 37.16s)
> > 1:04.18
> > 1:05.28
> > 1:06.07
> > 1:06.31
> > 1:06.58
> > 1:06.80
> > 1:06.79
> > 1:06.95
> > 1:06.61
> >
> > > > Moreover, I wanted to test on linux 5.18. But I think I should revert
> > > > 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
> > > > read-ahead"),
> > > > right?  Otherwise, the patch doesn't work?
> > >
> > > I've never seen patch 9eec1d897139 before.  If you're going to point
> > > out bugs in my code, at least have the decency to cc me on it.  It
> > > should never have gone in, and should be reverted so the problem can
> > > be fixed properly.



* Re: squashfs performance regression and readahead
  2022-05-10 12:47               ` Hsin-Yi Wang
@ 2022-05-10 13:18                 ` Xiongwei Song
  2022-05-11 15:12                   ` Hsin-Yi Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Xiongwei Song @ 2022-05-10 13:18 UTC (permalink / raw)
  To: Hsin-Yi Wang
  Cc: Matthew Wilcox, Zheng Liang, Phillip Lougher, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

On Tue, May 10, 2022 at 8:47 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
>
> On Tue, May 10, 2022 at 8:31 PM Xiongwei Song <sxwjean@gmail.com> wrote:
> >
> > Hi Hsin-Yi,
> >
> > On Mon, May 9, 2022 at 10:29 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> > >
> > > On Mon, May 9, 2022 at 9:21 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Mon, May 09, 2022 at 08:43:45PM +0800, Xiongwei Song wrote:
> > > > > Hi Hsin-Yi and Matthew,
> > > > >
> > > > > With the patch from the attachment on linux 5.10, ran the command as I
> > > > > mentioned earlier,
> > > > > got the results below:
> > > > > 1:40.65 (1m + 40.65s)
> > > > > 1:10.12
> > > > > 1:11.10
> > > > > 1:11.47
> > > > > 1:11.59
> > > > > 1:11.94
> > > > > 1:11.86
> > > > > 1:12.04
> > > > > 1:12.21
> > > > > 1:12.06
> > > > >
> > > > > The performance has improved obviously, but compared to linux 4.18, the
> > > > > performance is not so good.
> > > > >
> > > I think you shouldn't compare the performance with 4.18 directly,
> > > since there might be other factors that impact the performance.
> >
> > Make sense.
> >
> > >I'd suggest comparing the same kernel version with:
> > > a) with this patch
> > > b) with c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted.
> >
> > With 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
> > read-ahead") reverted and applied 0001-WIP-squashfs-implement-readahead.patch,
> > test result on linux 5.18:
> > 1:41.51 (1m + 41.51s)
> > 1:08.11
> > 1:10.37
> > 1:11.17
> > 1:11.32
> > 1:11.59
> > 1:12.23
> > 1:12.08
> > 1:12.76
> > 1:12.51
> >
> > performance worse 1 ~ 2s than linux 5.18 vanilla.
> >
>
> Can you share the pack file you used for testing? Thanks

Do you mean the files that are put in the squashfs partitions? If yes, I just
put some dynamic libraries in the partitions:
-rwxr-xr-x 1 root root  200680 Apr 20 03:57 ld-2.33.so
lrwxrwxrwx 1 root root      10 Apr 20 03:57 ld-linux-x86-64.so.2 -> ld-2.33.so
-rwxr-xr-x 1 root root   18776 Apr 20 03:57 libanl-2.33.so
lrwxrwxrwx 1 root root      14 Apr 20 03:57 libanl.so.1 -> libanl-2.33.so
lrwxrwxrwx 1 root root      17 Apr 20 04:08 libblkid.so.1 -> libblkid.so.1.1.0
-rwxr-xr-x 1 root root  330776 Apr 20 04:08 libblkid.so.1.1.0
-rwxr-xr-x 1 root root 1823192 Apr 20 03:57 libc-2.33.so
...... snip ......

The number of files is 110 (55 libs + 55 soft links to libs).  I have 90 squashfs
partitions which contain identical files. Each partition is 11M,
nothing special.

Thanks.

> > >
> > > According to https://lore.kernel.org/linux-mm/Ynfzh2ifG85MZEoN@casper.infradead.org/t/
> > > It seems to be a 3 sec difference?
> >
> > 5 ~ 6s difference.
> >
> > Regards,
> > Xiongwei
> >
> > >
> > > 1:37.16 (1m + 37.16s)
> > > 1:04.18
> > > 1:05.28
> > > 1:06.07
> > > 1:06.31
> > > 1:06.58
> > > 1:06.80
> > > 1:06.79
> > > 1:06.95
> > > 1:06.61
> > >
> > > > > Moreover, I wanted to test on linux 5.18. But I think I should revert
> > > > > 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
> > > > > read-ahead"),
> > > > > right?  Otherwise, the patch doesn't work?
> > > >
> > > > I've never seen patch 9eec1d897139 before.  If you're going to point
> > > > out bugs in my code, at least have the decency to cc me on it.  It
> > > > should never have gone in, and should be reverted so the problem can
> > > > be fixed properly.



* Re: squashfs performance regression and readahead
  2022-05-10 13:18                 ` Xiongwei Song
@ 2022-05-11 15:12                   ` Hsin-Yi Wang
  2022-05-11 19:13                     ` Phillip Lougher
  0 siblings, 1 reply; 34+ messages in thread
From: Hsin-Yi Wang @ 2022-05-11 15:12 UTC (permalink / raw)
  To: Xiongwei Song
  Cc: Matthew Wilcox, Zheng Liang, Phillip Lougher, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

On Tue, May 10, 2022 at 9:19 PM Xiongwei Song <sxwjean@gmail.com> wrote:
>
> On Tue, May 10, 2022 at 8:47 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> >
> > On Tue, May 10, 2022 at 8:31 PM Xiongwei Song <sxwjean@gmail.com> wrote:
> > >
> > > Hi Hsin-Yi,
> > >
> > > On Mon, May 9, 2022 at 10:29 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> > > >
> > > > On Mon, May 9, 2022 at 9:21 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > >
> > > > > On Mon, May 09, 2022 at 08:43:45PM +0800, Xiongwei Song wrote:
> > > > > > Hi Hsin-Yi and Matthew,
> > > > > >
> > > > > > With the patch from the attachment on linux 5.10, ran the command as I
> > > > > > mentioned earlier,
> > > > > > got the results below:
> > > > > > 1:40.65 (1m + 40.65s)
> > > > > > 1:10.12
> > > > > > 1:11.10
> > > > > > 1:11.47
> > > > > > 1:11.59
> > > > > > 1:11.94
> > > > > > 1:11.86
> > > > > > 1:12.04
> > > > > > 1:12.21
> > > > > > 1:12.06
> > > > > >
> > > > > > The performance has improved obviously, but compared to linux 4.18, the
> > > > > > performance is not so good.
> > > > > >
> > > > I think you shouldn't compare the performance with 4.18 directly,
> > > > since there might be other factors that impact the performance.
> > >
> > > Make sense.
> > >
> > > >I'd suggest comparing the same kernel version with:
> > > > a) with this patch
> > > > b) with c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted.
> > >
> > > With 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
> > > read-ahead") reverted and applied 0001-WIP-squashfs-implement-readahead.patch,
> > > test result on linux 5.18:
> > > 1:41.51 (1m + 41.51s)
> > > 1:08.11
> > > 1:10.37
> > > 1:11.17
> > > 1:11.32
> > > 1:11.59
> > > 1:12.23
> > > 1:12.08
> > > 1:12.76
> > > 1:12.51
> > >
> > > performance worse 1 ~ 2s than linux 5.18 vanilla.
> > >
> >
> > Can you share the pack file you used for testing? Thanks
>
> You are saying the files that are put in squashfs partitions? If yes, I can tell
> I just put some dynamic libraries in partitions:
> -rwxr-xr-x 1 root root  200680 Apr 20 03:57 ld-2.33.so
> lrwxrwxrwx 1 root root      10 Apr 20 03:57 ld-linux-x86-64.so.2 -> ld-2.33.so
> -rwxr-xr-x 1 root root   18776 Apr 20 03:57 libanl-2.33.so
> lrwxrwxrwx 1 root root      14 Apr 20 03:57 libanl.so.1 -> libanl-2.33.so
> lrwxrwxrwx 1 root root      17 Apr 20 04:08 libblkid.so.1 -> libblkid.so.1.1.0
> -rwxr-xr-x 1 root root  330776 Apr 20 04:08 libblkid.so.1.1.0
> -rwxr-xr-x 1 root root 1823192 Apr 20 03:57 libc-2.33.so
> ...... snip ......
>
> The number of files is 110(55 libs + 55 soft links to libs).  I have 90 squashfs
> partitions which save the identical files. The space of each partition is 11M,
> nothing special.
>
> Thanks.
>

I noticed that there's a crash at
https://elixir.bootlin.com/linux/latest/source/lib/lzo/lzo1x_decompress_safe.c#L218
when testing on my system.
(I have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS enabled)

Full logs:
[  119.062420] Unable to handle kernel paging request at virtual
address ffffffc017337000
[  119.062437] Mem abort info:
[  119.062442]   ESR = 0x96000047
[  119.062447]   EC = 0x25: DABT (current EL), IL = 32 bits
[  119.062451]   SET = 0, FnV = 0
[  119.062454]   EA = 0, S1PTW = 0
[  119.062457] Data abort info:
[  119.062460]   ISV = 0, ISS = 0x00000047
[  119.062464]   CM = 0, WnR = 1
[  119.062469] swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000041099000
[  119.062473] [ffffffc017337000] pgd=000000010014a003,
p4d=000000010014a003, pud=000000010014a003, pmd=000000010ba59003,
pte=0000000000000000
[  119.062489] Internal error: Oops: 96000047 [#1] PREEMPT SMP
[  119.062494] Modules linked in: vhost_vsock vhost vhost_iotlb
vmw_vsock_virtio_transport_common vsock rfcomm algif_hash
algif_skcipher af_alg veth uinput xt_cgroup mtk_dip mtk_cam_isp
mtk_vcodec_enc mtk_vcodec_dec hci_uart mtk_fd mtk_mdp3 v4l2_h264
mtk_vcodec_common mtk_vpu xt_MASQUERADE mtk_jpeg cros_ec_rpmsg btqca
videobuf2_dma_contig v4l2_fwnode v4l2_mem2mem btrtl elants_i2c mtk_scp
mtk_rpmsg rpmsg_core mtk_scp_ipi mt8183_cci_devfreq ip6table_nat fuse
8021q bluetooth ecdh_generic ecc iio_trig_sysfs cros_ec_lid_angle
cros_ec_sensors cros_ec_sensors_core industrialio_triggered_buffer
kfifo_buf cros_ec_sensorhub cros_ec_typec typec hid_google_hammer
ath10k_sdio lzo_rle lzo_compress ath10k_core ath mac80211 zram
cfg80211 uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
videobuf2_common cdc_ether usbnet r8152 mii joydev
[  119.062613] CPU: 4 PID: 4161 Comm: chrome Tainted: G        W
  5.10.112 #105 39f11bffda227eaae4c704733b9bf01db22d8b4d
[  119.062617] Hardware name: Google burnet board (DT)
[  119.062623] pstate: 20400005 (nzCv daif +PAN -UAO -TCO BTYPE=--)
[  119.062636] pc : lzo1x_decompress_safe+0x1dc/0x564
[  119.062643] lr : lzo_uncompress+0x134/0x1f0
[  119.062647] sp : ffffffc01837b860
[  119.062650] x29: ffffffc01837b860 x28: 0000000000000000
[  119.062656] x27: 0000000000005451 x26: ffffffc0171c9445
[  119.062662] x25: 0000000000000000 x24: ffffffc017437000
[  119.062668] x23: ffffffc0171c944f x22: ffffffc017136000
[  119.062673] x21: ffffffc017336ff1 x20: ffffffc017237000
[  119.062679] x19: ffffffc01837b8d0 x18: 0000000000000000
[  119.062684] x17: 00000000000001eb x16: 0000000000000012
[  119.062689] x15: 000000000010000f x14: d600120202000001
[  119.062695] x13: ffffffc017336ff1 x12: ffffffc017336ff4
[  119.062700] x11: 0000000000000002 x10: 01010101010100ff
[  119.062705] x9 : ffffffffffffffff x8 : ffffffc0171c944d
[  119.062710] x7 : d15d3aaabd294330 x6 : 0206397115fe28ab
[  119.062715] x5 : ffffffc0171c944f x4 : 000000000009344f
[  119.062721] x3 : ffffffc01837b8d0 x2 : ffffffc017237000
[  119.062726] x1 : 000000000009344f x0 : ffffffc0171c9447
[  119.062731] Call trace:
[  119.062738]  lzo1x_decompress_safe+0x1dc/0x564
[  119.062742]  lzo_uncompress+0x134/0x1f0
[  119.062746]  squashfs_decompress+0x6c/0xb4
[  119.062753]  squashfs_read_data+0x1a8/0x298
[  119.062758]  squashfs_readahead+0x308/0x474
[  119.062765]  read_pages+0x74/0x280
[  119.062769]  page_cache_ra_unbounded+0x1d0/0x228
[  119.062773]  do_page_cache_ra+0x44/0x50
[  119.062779]  do_sync_mmap_readahead+0x188/0x1a0
[  119.062783]  filemap_fault+0x100/0x350
[  119.062789]  __do_fault+0x48/0x10c
[  119.062793]  do_cow_fault+0x58/0x12c
[  119.062797]  handle_mm_fault+0x544/0x904
[  119.062804]  do_page_fault+0x260/0x384
[  119.062809]  do_translation_fault+0x44/0x5c
[  119.062813]  do_mem_abort+0x48/0xb4
[  119.062819]  el0_da+0x28/0x34
[  119.062824]  el0_sync_compat_handler+0xb8/0xcc
[  119.062829]  el0_sync_compat+0x188/0x1c0
[  119.062837] Code: f94001ae f90002ae f94005ae 910041ad (f90006ae)
[  119.062842] ---[ end trace 3e9828c7360fd7be ]---
[  119.090436] Kernel panic - not syncing: Oops: Fatal exception
[  119.090455] SMP: stopping secondary CPUs
[  119.090467] Kernel Offset: 0x2729c00000 from 0xffffffc010000000
[  119.090471] PHYS_OFFSET: 0xffffffd880000000
[  119.090477] CPU features: 0x08240002,2188200c

1) Traces near when the crash happened:
[   79.495580] Block @ 0x60eea9c, compressed size 65744, src size 1048576
[   80.363573] Block @ 0x1f9f000, compressed size 200598, src size 1048576
[   80.371256] Block @ 0x1fcff96, compressed size 80772, src size 1048576
[   80.428388] Block @ 0x1fe3b1a, compressed size 83941, src size 1048576
[   80.435319] Block @ 0x1ff82ff, compressed size 77936, src size 1048576
[   80.724331] Block @ 0x4501000, compressed size 364069, src size 1048576
[   80.738683] Block @ 0x4dccf28, compressed size 603215, src size 2097152

I also noticed that when the crash happened, nr_pages obtained by
readahead_count() was 512:
nr_pages = readahead_count(ractl); // this line

2) Normal cases that won't crash:
[   22.651750] Block @ 0xb3bbca6, compressed size 42172, src size 262144
[   22.653580] Block @ 0xb3c6162, compressed size 29815, src size 262144
[   22.656692] Block @ 0xb4a293f, compressed size 17484, src size 131072
[   22.666099] Block @ 0xb593881, compressed size 39742, src size 262144
[   22.668699] Block @ 0xb59d3bf, compressed size 37841, src size 262144
[   22.695739] Block @ 0x13698673, compressed size 65907, src size 131072
[   22.698619] Block @ 0x136a87e6, compressed size 3155, src size 131072
[   22.703400] Block @ 0xb1babe8, compressed size 99391, src size 131072
[   22.706288] Block @ 0x1514abc6, compressed size 4627, src size 131072

nr_pages is observed to be 32, 64, 256... These won't cause a crash.
Other values (max_pages, bsize, block...) look normal.

I'm not sure why the crash happened, but I tried tweaking the mask
a bit. After modifying the mask value as below, the crash is gone
(nr_pages values are <= 256).
Based on my testing on a 300K pack file, there's no performance change.

diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index 20ec48cf97c5..f6d9b6f88ed9 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -499,8 +499,8 @@ static void squashfs_readahead(struct
readahead_control *ractl)
 {
        struct inode *inode = ractl->mapping->host;
        struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
-       size_t mask = (1UL << msblk->block_log) - 1;
        size_t shift = msblk->block_log - PAGE_SHIFT;
+       size_t mask = (1UL << shift) - 1;
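
To spell out the arithmetic behind the change (my own illustration,
assuming block_log = 20, i.e. 1 MiB blocks, and 4 KiB pages):

  shift     = block_log - PAGE_SHIFT  = 20 - 12 = 8
  max_pages = 1 << shift              = 256 pages per block

  old mask  = (1UL << block_log) - 1  = 0xfffff (byte offset in a block)
  new mask  = (1UL << shift) - 1      = 0xff

With the old mask, readahead_expand() rounds the window up to whole-block
boundaries, so a request touching two blocks makes readahead_count()
return 512 pages (2 MiB of data). With the new mask the window is barely
expanded, which keeps nr_pages <= 256, but it no longer block-aligns the
window, so this may just hide the underlying problem rather than fix it.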


Any pointers are appreciated. Thanks!


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-11 15:12                   ` Hsin-Yi Wang
@ 2022-05-11 19:13                     ` Phillip Lougher
  2022-05-12  6:23                       ` Hsin-Yi Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Phillip Lougher @ 2022-05-11 19:13 UTC (permalink / raw)
  To: Hsin-Yi Wang, Xiongwei Song
  Cc: Matthew Wilcox, Zheng Liang, Zhang Yi, Hou Tao, Miao Xie,
	Andrew Morton, Linus Torvalds, Song, Xiongwei, linux-mm,
	squashfs-devel

On 11/05/2022 16:12, Hsin-Yi Wang wrote:
> On Tue, May 10, 2022 at 9:19 PM Xiongwei Song <sxwjean@gmail.com> wrote:
>>
>> On Tue, May 10, 2022 at 8:47 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
>>>
>>> On Tue, May 10, 2022 at 8:31 PM Xiongwei Song <sxwjean@gmail.com> wrote:
>>>>
>>>> Hi Hsin-Yi,
>>>>
>>>> On Mon, May 9, 2022 at 10:29 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
>>>>>
>>>>> On Mon, May 9, 2022 at 9:21 PM Matthew Wilcox <willy@infradead.org> wrote:
>>>>>>
>>>>>> On Mon, May 09, 2022 at 08:43:45PM +0800, Xiongwei Song wrote:
>>>>>>> Hi Hsin-Yi and Matthew,
>>>>>>>
>>>>>>> With the patch from the attachment on linux 5.10, ran the command as I
>>>>>>> mentioned earlier,
>>>>>>> got the results below:
>>>>>>> 1:40.65 (1m + 40.65s)
>>>>>>> 1:10.12
>>>>>>> 1:11.10
>>>>>>> 1:11.47
>>>>>>> 1:11.59
>>>>>>> 1:11.94
>>>>>>> 1:11.86
>>>>>>> 1:12.04
>>>>>>> 1:12.21
>>>>>>> 1:12.06
>>>>>>>
>>>>>>> The performance has improved obviously, but compared to linux 4.18, the
>>>>>>> performance is not so good.
>>>>>>>
>>>>> I think you shouldn't compare the performance with 4.18 directly,
>>>>> since there might be other factors that impact the performance.
>>>>
>>>> Make sense.
>>>>
>>>>> I'd suggest comparing the same kernel version with:
>>>>> a) with this patch
>>>>> b) with c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted.
>>>>
>>>> With 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
>>>> read-ahead") reverted and applied 0001-WIP-squashfs-implement-readahead.patch,
>>>> test result on linux 5.18:
>>>> 1:41.51 (1m + 41.51s)
>>>> 1:08.11
>>>> 1:10.37
>>>> 1:11.17
>>>> 1:11.32
>>>> 1:11.59
>>>> 1:12.23
>>>> 1:12.08
>>>> 1:12.76
>>>> 1:12.51
>>>>
>>>> performance worse 1 ~ 2s than linux 5.18 vanilla.
>>>>
>>>
>>> Can you share the pack file you used for testing? Thanks
>>
>> You are saying the files that are put in squashfs partitions? If yes, I can tell
>> I just put some dynamic libraries in partitions:
>> -rwxr-xr-x 1 root root  200680 Apr 20 03:57 ld-2.33.so
>> lrwxrwxrwx 1 root root      10 Apr 20 03:57 ld-linux-x86-64.so.2 -> ld-2.33.so
>> -rwxr-xr-x 1 root root   18776 Apr 20 03:57 libanl-2.33.so
>> lrwxrwxrwx 1 root root      14 Apr 20 03:57 libanl.so.1 -> libanl-2.33.so
>> lrwxrwxrwx 1 root root      17 Apr 20 04:08 libblkid.so.1 -> libblkid.so.1.1.0
>> -rwxr-xr-x 1 root root  330776 Apr 20 04:08 libblkid.so.1.1.0
>> -rwxr-xr-x 1 root root 1823192 Apr 20 03:57 libc-2.33.so
>> ...... snip ......
>>
>> The number of files is 110(55 libs + 55 soft links to libs).  I have 90 squashfs
>> partitions which save the identical files. The space of each partition is 11M,
>> nothing special.
>>
>> Thanks.
>>
> 
> I noticed that there's a crash at
> https://elixir.bootlin.com/linux/latest/source/lib/lzo/lzo1x_decompress_safe.c#L218
> when testing on my system.
> (I have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS enabled)
> 
> Full logs: ...... snip ......
> 
> 1) Traces near when the crash happened:
> [   79.495580] Block @ 0x60eea9c, compressed size 65744, src size 1048576
> [   80.363573] Block @ 0x1f9f000, compressed size 200598, src size 1048576
> [   80.371256] Block @ 0x1fcff96, compressed size 80772, src size 1048576
> [   80.428388] Block @ 0x1fe3b1a, compressed size 83941, src size 1048576
> [   80.435319] Block @ 0x1ff82ff, compressed size 77936, src size 1048576
> [   80.724331] Block @ 0x4501000, compressed size 364069, src size 1048576
> [   80.738683] Block @ 0x4dccf28, compressed size 603215, src size 2097152

Src size 2097152 is clearly wrong, as the maximum data block size is 1 
Mbyte or 1048576.

That debug line comes from

https://elixir.bootlin.com/linux/latest/source/fs/squashfs/block.c#L156

----
TRACE("Block @ 0x%llx, %scompressed size %d, src size %d\n",
		index, compressed ? "" : "un", length, output->length);
----

Which indicates your code has created a page_actor of 2 Mbytes in size
(output->length).

This is completely incorrect, as the page_actor should never be larger
than the size of the block to be read in question.  In most cases that
will be msblk->block_size, but it may be less at the end of the file.

You appear to be trying to read the amount of readahead requested.  But
you should always be reading the lesser of the readahead amount and the
size of the block in question.
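
As a rough sketch of that cap (mine, untested, using the usual squashfs
fields; "index" is the number of the block being read):

----
int file_end = i_size_read(inode) >> msblk->block_log;
int bytes = (index == file_end) ?
	(i_size_read(inode) & (msblk->block_size - 1)) :
	msblk->block_size;
unsigned int block_pages = DIV_ROUND_UP(bytes, PAGE_SIZE);

nr_pages = min(nr_pages, block_pages);
----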

Hope that helps.

Phillip

> 
> > ...... snip ......



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-11 19:13                     ` Phillip Lougher
@ 2022-05-12  6:23                       ` Hsin-Yi Wang
  2022-05-13  5:33                         ` Phillip Lougher
  2022-05-13 12:16                         ` Xiongwei Song
  0 siblings, 2 replies; 34+ messages in thread
From: Hsin-Yi Wang @ 2022-05-12  6:23 UTC (permalink / raw)
  To: Phillip Lougher
  Cc: Xiongwei Song, Matthew Wilcox, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

[-- Attachment #1: Type: text/plain, Size: 11745 bytes --]

On Thu, May 12, 2022 at 3:13 AM Phillip Lougher <phillip@squashfs.org.uk> wrote:
>
> On 11/05/2022 16:12, Hsin-Yi Wang wrote:
> > ...... snip ......
> >
> > I noticed that there's a crash at
> > https://elixir.bootlin.com/linux/latest/source/lib/lzo/lzo1x_decompress_safe.c#L218
> > when testing on my system.
> > (I have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS enabled)
> >
> > Full logs:
> > ...... snip ......
> >
> > 1) Traces near when the crash happened:
> > [   79.495580] Block @ 0x60eea9c, compressed size 65744, src size 1048576
> > [   80.363573] Block @ 0x1f9f000, compressed size 200598, src size 1048576
> > [   80.371256] Block @ 0x1fcff96, compressed size 80772, src size 1048576
> > [   80.428388] Block @ 0x1fe3b1a, compressed size 83941, src size 1048576
> > [   80.435319] Block @ 0x1ff82ff, compressed size 77936, src size 1048576
> > [   80.724331] Block @ 0x4501000, compressed size 364069, src size 1048576
> > [   80.738683] Block @ 0x4dccf28, compressed size 603215, src size 2097152
>
> Src size 2097152 is clearly wrong, as the maximum data block size is 1
> Mbyte or 1048576.
>
> That debug line comes from
>
> https://elixir.bootlin.com/linux/latest/source/fs/squashfs/block.c#L156
>
> ----
> TRACE("Block @ 0x%llx, %scompressed size %d, src size %d\n",
>                 index, compressed ? "" : "un", length, output->length);
> ----
>
> Which indicates your code has created a page_actor of 2 Mbytes in size
> (output->length).
>
> This is completely incorrect, as the page_actor should never be larger
> than the size of the block to be read in question.  In most cases that
> will be msblk->block_size, but it may be less at the end of the file.
>
> You appear to be trying to read the amount of readahead requested.  But
> you should always be reading the lesser of the readahead amount and the
> size of the block in question.
>
> Hope that helps.
>
> Phillip
>
Hi Phillip,
Thanks for the explanation. After restricting the size fed to the
page_actor, the crash no longer happens.

Hi Xiongwei,
Can you test this version (sent as attachment) again? I've tested on
my platform:
- arm64
- kernel 5.10
- pack_data size ~ 300K
- time ureadahead pack_data
1. with c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted:
0.633s
0.755s
0.804s

2. apply the patch:
0.625s
0.656s
0.768s

Hi Matthew,
Thanks for reviewing the patch previously. Does this version look good
to you? If so, I can send it to the list.


Thanks for all of your help.

> >
> > ...... snip ......
>

[-- Attachment #2: 0001-WIP-squashfs-implement-readahead.patch --]
[-- Type: text/x-patch, Size: 3224 bytes --]

From d50220684beed2b6d370d5e63a7dfb31a2b0788b Mon Sep 17 00:00:00 2001
From: Hsin-Yi Wang <hsinyi@chromium.org>
Date: Sun, 10 Oct 2021 21:22:25 +0800
Subject: [PATCH] squashfs: implement readahead

Implement the readahead callback for squashfs. It will read the
datablocks which cover the pages in the readahead request. In a few
cases it will not mark pages as uptodate, including when:
- the file end is 0.
- the current batch of pages isn't in the same datablock, or there
  aren't enough pages in the datablock.
Otherwise pages will be marked as uptodate. The unhandled pages will be
updated by readpage later.

Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
---
 fs/squashfs/file.c | 74 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 73 insertions(+), 1 deletion(-)

diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index 89d492916dea..7cd57e0d88de 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -39,6 +39,7 @@
 #include "squashfs_fs_sb.h"
 #include "squashfs_fs_i.h"
 #include "squashfs.h"
+#include "page_actor.h"
 
 /*
  * Locate cache slot in range [offset, index] for specified inode.  If
@@ -494,7 +495,78 @@ static int squashfs_readpage(struct file *file, struct page *page)
 	return 0;
 }
 
+static void squashfs_readahead(struct readahead_control *ractl)
+{
+	struct inode *inode = ractl->mapping->host;
+	struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
+	size_t mask = (1UL << msblk->block_log) - 1;
+	size_t shift = msblk->block_log - PAGE_SHIFT;
+	loff_t req_end = readahead_pos(ractl) + readahead_length(ractl);
+	loff_t start = readahead_pos(ractl) &~ mask;
+	size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
+	struct squashfs_page_actor *actor;
+	unsigned int nr_pages = 0;
+	struct page **pages;
+	u64 block = 0;
+	int bsize, res, i, index;
+	int file_end = i_size_read(inode) >> msblk->block_log;
+	unsigned int max_pages = 1UL << shift;
+
+	readahead_expand(ractl, start, (len | mask) + 1);
+
+	if (readahead_pos(ractl) + readahead_length(ractl) < req_end ||
+	    file_end == 0)
+		return;
+
+	nr_pages = min(readahead_count(ractl), max_pages);
+
+	pages = kmalloc_array(nr_pages, sizeof(void *), GFP_KERNEL);
+	if (!pages)
+		return;
+
+	actor = squashfs_page_actor_init_special(pages, nr_pages, 0);
+	if (!actor)
+		goto out;
+
+	for (;;) {
+		nr_pages = __readahead_batch(ractl, pages, max_pages);
+		if (!nr_pages)
+			break;
+
+		if (readahead_pos(ractl) >= i_size_read(inode) ||
+		    nr_pages < max_pages)
+			goto skip_pages;
+
+		index = pages[0]->index >> shift;
+		bsize = read_blocklist(inode, index, &block);
+		res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
+					 actor);
+
+		if (res >= 0 && (pages[nr_pages - 1]->index >> shift) == index)
+			for (i = 0; i < nr_pages; i++)
+				SetPageUptodate(pages[i]);
+
+		for (i = 0; i < nr_pages; i++) {
+			unlock_page(pages[i]);
+			put_page(pages[i]);
+		}
+	}
+
+	kfree(actor);
+	return;
+
+skip_pages:
+	for (i = 0; i < nr_pages; i++) {
+		unlock_page(pages[i]);
+		put_page(pages[i]);
+	}
+
+	kfree(actor);
+out:
+	kfree(pages);
+}
 
 const struct address_space_operations squashfs_aops = {
-	.readpage = squashfs_readpage
+	.readpage = squashfs_readpage,
+	.readahead = squashfs_readahead
 };
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-12  6:23                       ` Hsin-Yi Wang
@ 2022-05-13  5:33                         ` Phillip Lougher
  2022-05-13  6:35                           ` Hsin-Yi Wang
  2022-05-13 13:09                           ` Matthew Wilcox
  2022-05-13 12:16                         ` Xiongwei Song
  1 sibling, 2 replies; 34+ messages in thread
From: Phillip Lougher @ 2022-05-13  5:33 UTC (permalink / raw)
  To: Hsin-Yi Wang
  Cc: Xiongwei Song, Matthew Wilcox, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

On 12/05/2022 07:23, Hsin-Yi Wang wrote:

> 
> Hi Matthew,
> Thanks for reviewing the patch previously. Does this version look good
> to you? If so, I can send it to the list.
> 
> 
> Thanks for all of your help.

Hi Hsin-Yi Wang,

Thanks for updating the patch to minimise the pages used when
creating the page actor.

Looking at the new patch, I have a couple of questions which are worth
clarifying to get a fuller understanding of the readahead behaviour.
In other words, I'm deducing the behaviour of the readahead calls
from context, and I want to make sure they're doing what I think
they're doing.

+	nr_pages = min(readahead_count(ractl), max_pages);

As I understand it, this will always produce nr_pages which will
cover the entirety of the block to be decompressed?  That is if
a block is block_size, it will return the number of pages necessary
to decompress the entire block?  It will never return less than the
necessary pages, i.e. if the block_size was 128K, it will never
return less than 32 pages?

Similarly, if at the end of the file, where the last block may not
be a full block (i.e. less than block_size) it will only return
the pages covered by the tail end block, not a full block.  For
example, if the last block is 16 Kbytes (and block_size is
128 Kbytes), it will return 4 pages and not 32 pages ...

Obviously this behaviour is important for decompression, because
you must always have enough pages to decompress the entire block
into.

You also shouldn't pass in more pages than expected (i.e. if the
block is only expected to decompress to 4 pages, that's what you
pass, rather than the full block size).  This helps to trap
corrupted blocks, where they are prevented to decompress to larger
than expected.

+ nr_pages = __readahead_batch(ractl, pages, max_pages)

My understanding is that this call will fully populate the
pages array with page references without any holes.  That is,
none of the pages array entries will be NULL, which would mean
there isn't a page for that entry.  In other words, if the
pages array has 32 pages, each of the 32 entries will
reference a page.

This is important for the decompression code, because it
expects each pages array entry to reference a page, which
can be kmapped to an address.  If an entry in the pages
array is NULL, this will break.
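
Schematically (only to illustrate the concern, not the actual code),
the decompressor side does something like:

----
for (i = 0; i < pages; i++) {
	void *addr = kmap(page[i]);	/* oopses if page[i] is NULL */
	/* ... decompress the next PAGE_SIZE bytes into addr ... */
	kunmap(page[i]);
}
----

so a NULL entry needs either a bounce buffer or an early bail-out.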

If the pages array can have holes (NULL pointers), I have
written an update patch which allows the decompression code
to handle these NULL pointers.

If the pages array can have NULL pointers, I can send you
the patch which will deal with this.

Thanks

Phillip



> 
>>> ...... snip ......



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-13  5:33                         ` Phillip Lougher
@ 2022-05-13  6:35                           ` Hsin-Yi Wang
  2022-05-15  0:54                             ` Phillip Lougher
  2022-05-13 13:09                           ` Matthew Wilcox
  1 sibling, 1 reply; 34+ messages in thread
From: Hsin-Yi Wang @ 2022-05-13  6:35 UTC (permalink / raw)
  To: Phillip Lougher
  Cc: Xiongwei Song, Matthew Wilcox, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

On Fri, May 13, 2022 at 1:33 PM Phillip Lougher <phillip@squashfs.org.uk> wrote:
>
> On 12/05/2022 07:23, Hsin-Yi Wang wrote:
>
> >
> > Hi Matthew,
> > Thanks for reviewing the patch previously. Does this version look good
> > to you? If so, I can send it to the list.
> >
> >
> > Thanks for all of your help.
>
> Hi Hsin-Yi Wang,
>
Hi Phillip,

> Thanks for updating the patch to minimise the pages used when
> creating the page actor.
>
> Looking at the new patch, I have a couple of questions which is worth
> clarifying to have a fuller understanding of the readahead behaviour.
> In otherwords I'm deducing the behaviour of the readahead calls
> from context, and I want to make sure they're doing what I think
> they're doing.
>
> +       nr_pages = min(readahead_count(ractl), max_pages);
>
> As I understand it, this will always produce nr_pages which will
> cover the entirety of the block to be decompressed?  That is if
> a block is block_size, it will return the number of pages necessary
> to decompress the entire block?  It will never return less than the
> necessary pages, i.e. if the block_size was 128K, it will never
> return less than 32 pages?
>
In most cases the size is max_pages (readahead_count(ractl) ==
max_pages).

> Similarly, if at the end of the file, where the last block may not
> be a full block (i.e. less than block_size) it will only return
> the pages covered by the tail end block, not a full block.  For
> example, if the last block is 16 Kbytes (and block_size is
> 128 Kbytes), it will return 4 pages and not 32 pages ...
>

But it's possible that readahead_count(ractl) < max_pages at the end of the file.

> Obviously this behaviour is important for decompression, because
> you must always have enough pages to decompress the entire block
> into.
>
> You also shouldn't pass in more pages than expected (i.e. if the
> block is only expected to decompress to 4 pages, that's what you
> pass, rather than the full block size).  This helps to trap
> corrupted blocks, where they are prevented to decompress to larger
> than expected.
>
> + nr_pages = __readahead_batch(ractl, pages, max_pages)
>
> My understanding is that this call will fully populate the
> pages array with page references without any holes.  That
> is none of the pages array entries will be NULL, meaning
> there isn't a page for that entry.  In other words, if the
> pages array has 32 pages, each of the 32 entries will
> reference a page.
>
I noticed that if nr_pages < max_pages, calling read_blocklist() leads
to SQUASHFS errors:

SQUASHFS error: Failed to read block 0x125ef7d: -5
SQUASHFS error: zlib decompression failed, data probably corrupt

so I added a check for nr_pages < max_pages before squashfs_read_data(),
which just skips the remaining pages and lets them be handled by readpage.

> This is important for the decompression code, because it
> expects each pages array entry to reference a page, which
> can be kmapped to an address.  If an entry in the pages
> array is NULL, this will break.
>
> If the pages array can have holes (NULL pointers), I have
> written an update patch which allows the decompression code
> to handle these NULL pointers.
>
> If the pages array can have NULL pointers, I can send you
> the patch which will deal with this.

Sure, if there are better ways to deal with this.

Thanks.

>
> Thanks
>
> Phillip
>
>
>
> >
> >>> ...... snip ......
>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-12  6:23                       ` Hsin-Yi Wang
  2022-05-13  5:33                         ` Phillip Lougher
@ 2022-05-13 12:16                         ` Xiongwei Song
  2022-05-13 12:29                           ` Xiongwei Song
  2022-05-13 16:43                           ` Hsin-Yi Wang
  1 sibling, 2 replies; 34+ messages in thread
From: Xiongwei Song @ 2022-05-13 12:16 UTC (permalink / raw)
  To: Hsin-Yi Wang
  Cc: Phillip Lougher, Matthew Wilcox, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

Hello,

On Thu, May 12, 2022 at 2:23 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
>
> On Thu, May 12, 2022 at 3:13 AM Phillip Lougher <phillip@squashfs.org.uk> wrote:
> >
> > On 11/05/2022 16:12, Hsin-Yi Wang wrote:
> > ...... snip ......
> Hi Phillip,
> Thanks for the explanation. After restricting the size feed to
> page_actor, the crash no longer happened.
>
> Hi Xiongwei,
> Can you test this version (sent as attachment) again? I've tested on
> my platform:
> - arm64
> - kernel 5.10
> - pack_data size ~ 300K
> - time ureadahead pack_data
> 1. with c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted:
> 0.633s
> 0.755s
> 0.804s
>
> 2. apply the patch:
> 0.625s
> 0.656s
> 0.768s
>
Thanks for sharing. I have run the tests on 5.10 and 5.18. The results are
a little worse than with patch v1 in my test env.

On linux 5.10:
With c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted:
1:37.16 (1m +37.16s)
1:04.18
1:05.28
1:06.07
1:06.31
1:06.58
1:06.80
1:06.79
1:06.95
1:06.61

With your patch v2:
2:04.27 (2m + 4.27s)
1:14.95
1:14.56
1:15.75
1:16.55
1:16.87
1:16.74
1:17.36
1:17.50
1:17.32

On linux 5.18:
With readahead disabled by default (vanilla):
1:12.82
1:07.68
1:08.94
1:09.65
1:09.87
1:10.32
1:10.47
1:10.34
1:10.24
1:10.34

With your patch v2:
2:00.14 (2m + 0.14s)
1:13.46
1:14.62
1:15.02
1:15.78
1:16.01
1:16.03
1:16.24
1:16.44
1:16.16

As you can see, there is an extra ~6s increase on both 5.10 and 5.18.
I don't know if the change in the number of pages causes the overhead.

One stupid question; see the code below from your patch:

 +       }
 +
 +       kfree(actor);
 +       return;
 +
 +skip_pages:

when is the page pointer array released after the pages are cached? I
don't see anywhere that does it.
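
If nothing does, I guess the success path needs to free it as well,
something like this (my untested guess):

----
	kfree(actor);
	kfree(pages);	/* also free the page pointer array */
	return;
----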

Regards,
Xiongwei



> Hi Matthew,
> Thanks for reviewing the patch previously. Does this version look good
> to you? If so, I can send it to the list.
>
>
> Thanks for all of your help.
>
> > > ...... snip ......


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-13 12:16                         ` Xiongwei Song
@ 2022-05-13 12:29                           ` Xiongwei Song
  2022-05-13 16:43                           ` Hsin-Yi Wang
  1 sibling, 0 replies; 34+ messages in thread
From: Xiongwei Song @ 2022-05-13 12:29 UTC (permalink / raw)
  To: Hsin-Yi Wang
  Cc: Phillip Lougher, Matthew Wilcox, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

Hi Hsin-Yi,

One more thing: should we revert 9eec1d897139 ("squashfs: provide
backing_dev_info in order to disable read-ahead") and put the two
patches together in one review?

Regards,
Xiongwei

On Fri, May 13, 2022 at 8:16 PM Xiongwei Song <sxwjean@gmail.com> wrote:
>
> Hello,
>
> On Thu, May 12, 2022 at 2:23 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> >
> > On Thu, May 12, 2022 at 3:13 AM Phillip Lougher <phillip@squashfs.org.uk> wrote:
> > >
> > > On 11/05/2022 16:12, Hsin-Yi Wang wrote:
> > > > On Tue, May 10, 2022 at 9:19 PM Xiongwei Song <sxwjean@gmail.com> wrote:
> > > >>
> > > >> On Tue, May 10, 2022 at 8:47 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> > > >>>
> > > >>> On Tue, May 10, 2022 at 8:31 PM Xiongwei Song <sxwjean@gmail.com> wrote:
> > > >>>>
> > > >>>> Hi Hsin-Yi,
> > > >>>>
> > > >>>> On Mon, May 9, 2022 at 10:29 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> > > >>>>>
> > > >>>>> On Mon, May 9, 2022 at 9:21 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > >>>>>>
> > > >>>>>> On Mon, May 09, 2022 at 08:43:45PM +0800, Xiongwei Song wrote:
> > > >>>>>>> Hi Hsin-Yi and Matthew,
> > > >>>>>>>
> > > >>>>>>> With the patch from the attachment on linux 5.10, ran the command as I
> > > >>>>>>> mentioned earlier,
> > > >>>>>>> got the results below:
> > > >>>>>>> 1:40.65 (1m + 40.65s)
> > > >>>>>>> 1:10.12
> > > >>>>>>> 1:11.10
> > > >>>>>>> 1:11.47
> > > >>>>>>> 1:11.59
> > > >>>>>>> 1:11.94
> > > >>>>>>> 1:11.86
> > > >>>>>>> 1:12.04
> > > >>>>>>> 1:12.21
> > > >>>>>>> 1:12.06
> > > >>>>>>>
> > > >>>>>>> The performance has improved obviously, but compared to linux 4.18, the
> > > >>>>>>> performance is not so good.
> > > >>>>>>>
> > > >>>>> I think you shouldn't compare the performance with 4.18 directly,
> > > >>>>> since there might be other factors that impact the performance.
> > > >>>>
> > > >>>> Make sense.
> > > >>>>
> > > >>>>> I'd suggest comparing the same kernel version with:
> > > >>>>> a) with this patch
> > > >>>>> b) with c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted.
> > > >>>>
> > > >>>> With 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
> > > >>>> read-ahead") reverted and applied 0001-WIP-squashfs-implement-readahead.patch,
> > > >>>> test result on linux 5.18:
> > > >>>> 1:41.51 (1m + 41.51s)
> > > >>>> 1:08.11
> > > >>>> 1:10.37
> > > >>>> 1:11.17
> > > >>>> 1:11.32
> > > >>>> 1:11.59
> > > >>>> 1:12.23
> > > >>>> 1:12.08
> > > >>>> 1:12.76
> > > >>>> 1:12.51
> > > >>>>
> > > >>>> performance is 1 ~ 2s worse than vanilla linux 5.18.
> > > >>>>
> > > >>>
> > > >>> Can you share the pack file you used for testing? Thanks
> > > >>
> > > >> You are saying the files that are put in squashfs partitions? If yes, I can tell
> > > >> I just put some dynamic libraries in partitions:
> > > >> -rwxr-xr-x 1 root root  200680 Apr 20 03:57 ld-2.33.so
> > > >> lrwxrwxrwx 1 root root      10 Apr 20 03:57 ld-linux-x86-64.so.2 -> ld-2.33.so
> > > >> -rwxr-xr-x 1 root root   18776 Apr 20 03:57 libanl-2.33.so
> > > >> lrwxrwxrwx 1 root root      14 Apr 20 03:57 libanl.so.1 -> libanl-2.33.so
> > > >> lrwxrwxrwx 1 root root      17 Apr 20 04:08 libblkid.so.1 -> libblkid.so.1.1.0
> > > >> -rwxr-xr-x 1 root root  330776 Apr 20 04:08 libblkid.so.1.1.0
> > > >> -rwxr-xr-x 1 root root 1823192 Apr 20 03:57 libc-2.33.so
> > > >> ...... snip ......
> > > >>
> > > >> The number of files is 110 (55 libs + 55 soft links to libs).  I have 90 squashfs
> > > >> partitions which save the identical files. The space of each partition is 11M,
> > > >> nothing special.
> > > >>
> > > >> Thanks.
> > > >>
> > > >
> > > > I noticed that there's a crash at
> > > > https://elixir.bootlin.com/linux/latest/source/lib/lzo/lzo1x_decompress_safe.c#L218
> > > > when testing on my system.
> > > > (I have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS enabled)
> > > >
> > > > Full logs:
> > > > [  119.062420] Unable to handle kernel paging request at virtual
> > > > address ffffffc017337000
> > > > [  119.062437] Mem abort info:
> > > > [  119.062442]   ESR = 0x96000047
> > > > [  119.062447]   EC = 0x25: DABT (current EL), IL = 32 bits
> > > > [  119.062451]   SET = 0, FnV = 0
> > > > [  119.062454]   EA = 0, S1PTW = 0
> > > > [  119.062457] Data abort info:
> > > > [  119.062460]   ISV = 0, ISS = 0x00000047
> > > > [  119.062464]   CM = 0, WnR = 1
> > > > [  119.062469] swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000041099000
> > > > [  119.062473] [ffffffc017337000] pgd=000000010014a003,
> > > > p4d=000000010014a003, pud=000000010014a003, pmd=000000010ba59003,
> > > > pte=0000000000000000
> > > > [  119.062489] Internal error: Oops: 96000047 [#1] PREEMPT SMP
> > > > [  119.062494] Modules linked in: vhost_vsock vhost vhost_iotlb
> > > > vmw_vsock_virtio_transport_common vsock rfcomm algif_hash
> > > > algif_skcipher af_alg veth uinput xt_cgroup mtk_dip mtk_cam_isp
> > > > mtk_vcodec_enc mtk_vcodec_dec hci_uart mtk_fd mtk_mdp3 v4l2_h264
> > > > mtk_vcodec_common mtk_vpu xt_MASQUERADE mtk_jpeg cros_ec_rpmsg btqca
> > > > videobuf2_dma_contig v4l2_fwnode v4l2_mem2mem btrtl elants_i2c mtk_scp
> > > > mtk_rpmsg rpmsg_core mtk_scp_ipi mt8183_cci_devfreq ip6table_nat fuse
> > > > 8021q bluetooth ecdh_generic ecc iio_trig_sysfs cros_ec_lid_angle
> > > > cros_ec_sensors cros_ec_sensors_core industrialio_triggered_buffer
> > > > kfifo_buf cros_ec_sensorhub cros_ec_typec typec hid_google_hammer
> > > > ath10k_sdio lzo_rle lzo_compress ath10k_core ath mac80211 zram
> > > > cfg80211 uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
> > > > videobuf2_common cdc_ether usbnet r8152 mii joydev
> > > > [  119.062613] CPU: 4 PID: 4161 Comm: chrome Tainted: G        W
> > > >    5.10.112 #105 39f11bffda227eaae4c704733b9bf01db22d8b4d
> > > > [  119.062617] Hardware name: Google burnet board (DT)
> > > > [  119.062623] pstate: 20400005 (nzCv daif +PAN -UAO -TCO BTYPE=--)
> > > > [  119.062636] pc : lzo1x_decompress_safe+0x1dc/0x564
> > > > [  119.062643] lr : lzo_uncompress+0x134/0x1f0
> > > > [  119.062647] sp : ffffffc01837b860
> > > > [  119.062650] x29: ffffffc01837b860 x28: 0000000000000000
> > > > [  119.062656] x27: 0000000000005451 x26: ffffffc0171c9445
> > > > [  119.062662] x25: 0000000000000000 x24: ffffffc017437000
> > > > [  119.062668] x23: ffffffc0171c944f x22: ffffffc017136000
> > > > [  119.062673] x21: ffffffc017336ff1 x20: ffffffc017237000
> > > > [  119.062679] x19: ffffffc01837b8d0 x18: 0000000000000000
> > > > [  119.062684] x17: 00000000000001eb x16: 0000000000000012
> > > > [  119.062689] x15: 000000000010000f x14: d600120202000001
> > > > [  119.062695] x13: ffffffc017336ff1 x12: ffffffc017336ff4
> > > > [  119.062700] x11: 0000000000000002 x10: 01010101010100ff
> > > > [  119.062705] x9 : ffffffffffffffff x8 : ffffffc0171c944d
> > > > [  119.062710] x7 : d15d3aaabd294330 x6 : 0206397115fe28ab
> > > > [  119.062715] x5 : ffffffc0171c944f x4 : 000000000009344f
> > > > [  119.062721] x3 : ffffffc01837b8d0 x2 : ffffffc017237000
> > > > [  119.062726] x1 : 000000000009344f x0 : ffffffc0171c9447
> > > > [  119.062731] Call trace:
> > > > [  119.062738]  lzo1x_decompress_safe+0x1dc/0x564
> > > > [  119.062742]  lzo_uncompress+0x134/0x1f0
> > > > [  119.062746]  squashfs_decompress+0x6c/0xb4
> > > > [  119.062753]  squashfs_read_data+0x1a8/0x298
> > > > [  119.062758]  squashfs_readahead+0x308/0x474
> > > > [  119.062765]  read_pages+0x74/0x280
> > > > [  119.062769]  page_cache_ra_unbounded+0x1d0/0x228
> > > > [  119.062773]  do_page_cache_ra+0x44/0x50
> > > > [  119.062779]  do_sync_mmap_readahead+0x188/0x1a0
> > > > [  119.062783]  filemap_fault+0x100/0x350
> > > > [  119.062789]  __do_fault+0x48/0x10c
> > > > [  119.062793]  do_cow_fault+0x58/0x12c
> > > > [  119.062797]  handle_mm_fault+0x544/0x904
> > > > [  119.062804]  do_page_fault+0x260/0x384
> > > > [  119.062809]  do_translation_fault+0x44/0x5c
> > > > [  119.062813]  do_mem_abort+0x48/0xb4
> > > > [  119.062819]  el0_da+0x28/0x34
> > > > [  119.062824]  el0_sync_compat_handler+0xb8/0xcc
> > > > [  119.062829]  el0_sync_compat+0x188/0x1c0
> > > > [  119.062837] Code: f94001ae f90002ae f94005ae 910041ad (f90006ae)
> > > > [  119.062842] ---[ end trace 3e9828c7360fd7be ]---
> > > > [  119.090436] Kernel panic - not syncing: Oops: Fatal exception
> > > > [  119.090455] SMP: stopping secondary CPUs
> > > > [  119.090467] Kernel Offset: 0x2729c00000 from 0xffffffc010000000
> > > > [  119.090471] PHYS_OFFSET: 0xffffffd880000000
> > > > [  119.090477] CPU features: 0x08240002,2188200c
> > > >
> > > > 1) Traces near when the crash happened:
> > > > [   79.495580] Block @ 0x60eea9c, compressed size 65744, src size 1048576
> > > > [   80.363573] Block @ 0x1f9f000, compressed size 200598, src size 1048576
> > > > [   80.371256] Block @ 0x1fcff96, compressed size 80772, src size 1048576
> > > > [   80.428388] Block @ 0x1fe3b1a, compressed size 83941, src size 1048576
> > > > [   80.435319] Block @ 0x1ff82ff, compressed size 77936, src size 1048576
> > > > [   80.724331] Block @ 0x4501000, compressed size 364069, src size 1048576
> > > > [   80.738683] Block @ 0x4dccf28, compressed size 603215, src size 2097152
> > >
> > > Src size 2097152 is clearly wrong, as the maximum data block size is 1
> > > Mbyte or 1048576.
> > >
> > > That debug line comes from
> > >
> > > https://elixir.bootlin.com/linux/latest/source/fs/squashfs/block.c#L156
> > >
> > > ----
> > > TRACE("Block @ 0x%llx, %scompressed size %d, src size %d\n",
> > >                 index, compressed ? "" : "un", length, output->length);
> > > ----
> > >
> > > Which indicates your code has created a page_actor of 2 Mbytes in size
> > > (output->length).
> > >
> > > This is completely incorrect, as the page_actor should never be larger
> > > than the size of the block to be read in question.  In most cases that
> > > will be msblk->block_size, but it may be less at the end of the file.
> > >
> > > You appear to be trying to read the amount of readahead requested.  But,
> > > you should always be trying to read the lesser of readahead, and the
> > > size of the block in question.
> > >
> > > Hope that helps.
> > >
> > > Phillip
> > >
> > Hi Phillip,
> > Thanks for the explanation. After restricting the size fed to
> > page_actor, the crash no longer happened.
> >
> > Hi Xiongwei,
> > Can you test this version (sent as attachment) again? I've tested on
> > my platform:
> > - arm64
> > - kernel 5.10
> > - pack_data size ~ 300K
> > - time ureadahead pack_data
> > 1. with c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted:
> > 0.633s
> > 0.755s
> > 0.804s
> >
> > 2. apply the patch:
> > 0.625s
> > 0.656s
> > 0.768s
> >
> Thanks for sharing. I have done the test on 5.10 and 5.18. The results are
> a little worse than patch v1 for my test env.
>
> On linux 5.10:
> With c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted:
> 1:37.16 (1m + 37.16s)
> 1:04.18
> 1:05.28
> 1:06.07
> 1:06.31
> 1:06.58
> 1:06.80
> 1:06.79
> 1:06.95
> 1:06.61
>
> With your patch v2:
> 2:04.27 (2m + 4.27s)
> 1:14.95
> 1:14.56
> 1:15.75
> 1:16.55
> 1:16.87
> 1:16.74
> 1:17.36
> 1:17.50
> 1:17.32
>
> On linux 5.18:
> With readahead disabled by default:
> 1:12.82
> 1:07.68
> 1:08.94
> 1:09.65
> 1:09.87
> 1:10.32
> 1:10.47
> 1:10.34
> 1:10.24
> 1:10.34
>
> With your patch v2:
> 2:00.14 (2m + 0.14s)
> 1:13.46
> 1:14.62
> 1:15.02
> 1:15.78
> 1:16.01
> 1:16.03
> 1:16.24
> 1:16.44
> 1:16.16
>
> As you can see, there is an extra ~6s increase on both 5.10 and 5.18.
> I don't know whether the change in the number of pages causes the overhead.
>
> One stupid question, see below code from your patch:
>
>  +       }
>  +
>  +       kfree(actor);
>  +       return;
>  +
>  +skip_pages:
>
> when is the page pointers array released after the pages are cached? I don't
> see any place that does that.
>
> Regards,
> Xiongwei
>
>
>
> > Hi Matthew,
> > Thanks for reviewing the patch previously. Does this version look good
> > to you? If so, I can send it to the list.
> >
> >
> > Thanks for all of your help.
> >
> > > >
> > > > It's also noticed that when the crash happened, nr_pages obtained by
> > > > readahead_count() is 512.
> > > > nr_pages = readahead_count(ractl); // this line
> > > >
> > > > 2) Normal cases that won't crash:
> > > > [   22.651750] Block @ 0xb3bbca6, compressed size 42172, src size 262144
> > > > [   22.653580] Block @ 0xb3c6162, compressed size 29815, src size 262144
> > > > [   22.656692] Block @ 0xb4a293f, compressed size 17484, src size 131072
> > > > [   22.666099] Block @ 0xb593881, compressed size 39742, src size 262144
> > > > [   22.668699] Block @ 0xb59d3bf, compressed size 37841, src size 262144
> > > > [   22.695739] Block @ 0x13698673, compressed size 65907, src size 131072
> > > > [   22.698619] Block @ 0x136a87e6, compressed size 3155, src size 131072
> > > > [   22.703400] Block @ 0xb1babe8, compressed size 99391, src size 131072
> > > > [   22.706288] Block @ 0x1514abc6, compressed size 4627, src size 131072
> > > >
> > > > nr_pages are observed to be 32, 64, 256... These won't cause a crash.
> > > > Other values (max_pages, bsize, block...) look normal.
> > > >
> > > > I'm not sure why the crash happened, but I tried to modify the mask
> > > > for a bit. After modifying the mask value to below, the crash is gone
> > > > (nr_pages are <=256).
> > > > Based on my testing on a 300K pack file, there's no performance change.
> > > >
> > > > diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> > > > index 20ec48cf97c5..f6d9b6f88ed9 100644
> > > > --- a/fs/squashfs/file.c
> > > > +++ b/fs/squashfs/file.c
> > > > @@ -499,8 +499,8 @@ static void squashfs_readahead(struct
> > > > readahead_control *ractl)
> > > >   {
> > > >          struct inode *inode = ractl->mapping->host;
> > > >          struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> > > > -       size_t mask = (1UL << msblk->block_log) - 1;
> > > >          size_t shift = msblk->block_log - PAGE_SHIFT;
> > > > +       size_t mask = (1UL << shift) - 1;
> > > >
> > > >
> > > > Any pointers are appreciated. Thanks!
> > >


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-13  5:33                         ` Phillip Lougher
  2022-05-13  6:35                           ` Hsin-Yi Wang
@ 2022-05-13 13:09                           ` Matthew Wilcox
  2022-05-15  0:05                             ` Phillip Lougher
  1 sibling, 1 reply; 34+ messages in thread
From: Matthew Wilcox @ 2022-05-13 13:09 UTC (permalink / raw)
  To: Phillip Lougher
  Cc: Hsin-Yi Wang, Xiongwei Song, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

On Fri, May 13, 2022 at 06:33:21AM +0100, Phillip Lougher wrote:
> Looking at the new patch, I have a couple of questions which are worth
> clarifying to have a fuller understanding of the readahead behaviour.
> In other words I'm deducing the behaviour of the readahead calls
> from context, and I want to make sure they're doing what I think
> they're doing.

I did write quite a lot of documentation as part of the readahead
revision, and filesystem authors are the target audience, so this is
somewhat disheartening to read.  What could I have done better to make
the readahead documentation obvious for you to find?

> +	nr_pages = min(readahead_count(ractl), max_pages);
> 
> As I understand it, this will always produce nr_pages which will
> cover the entirety of the block to be decompressed?  That is if
> a block is block_size, it will return the number of pages necessary
> to decompress the entire block?  It will never return less than the
> necessary pages, i.e. if the block_size was 128K, it will never
> return less than 32 pages?

readahead_count() returns the number of pages that the page cache is
asking the filesystem for.  It may be any number from 1 to whatever
the current readahead window is.  It's possible to ask the page
cache to expand the readahead request to be aligned to a decompression
boundary, but that may not be possible.  For example, we may be in a
situation where we read pages 32-63 from the file previously, then
the page cache chose to discard pages 33, 35, 37, .., 63 under memory
pressure, and now the file is being re-read.  This isn't a likely
usage pattern, of course, but it's a situation we have to cope with.
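
For example, asking for that alignment might look like the fragment below
(a sketch based on the WIP patch in this thread, not new API;
readahead_expand() is best-effort, so the implementation still has to
check what it actually received afterwards):

	size_t mask = (1UL << msblk->block_log) - 1;
	loff_t start = readahead_pos(ractl) & ~mask;
	size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;

	/* round the window out to whole squashfs blocks, where possible */
	readahead_expand(ractl, start, (len | mask) + 1);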

> + nr_pages = __readahead_batch(ractl, pages, max_pages)
> 
> My understanding is that this call will fully populate the
> pages array with page references without any holes.  That
> is none of the pages array entries will be NULL, meaning
> there isn't a page for that entry.  In other words, if the
> pages array has 32 pages, each of the 32 entries will
> reference a page.

That is correct, a readahead request is always for a contiguous range of
the file.  The pages are allocated before calling ->readahead, so
there's no opportunity for failure; they exist and they're already in
the page cache, waiting for the FS to tell the pagecache that they're
uptodate and unlock them.
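
A minimal sketch of that contract from the filesystem side (the WIP patch
in this thread follows the same pattern; note that __readahead_batch()
also takes a reference on each page, which must be dropped):

	nr_pages = __readahead_batch(ractl, pages, max_pages);

	for (i = 0; i < nr_pages; i++) {
		if (decompression_succeeded)	/* hypothetical per-FS check */
			SetPageUptodate(pages[i]);
		unlock_page(pages[i]);
		put_page(pages[i]);
	}

Pages unlocked without being marked uptodate are simply read again later
through ->readpage.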



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-13 12:16                         ` Xiongwei Song
  2022-05-13 12:29                           ` Xiongwei Song
@ 2022-05-13 16:43                           ` Hsin-Yi Wang
  2022-05-13 18:12                             ` Matthew Wilcox
  1 sibling, 1 reply; 34+ messages in thread
From: Hsin-Yi Wang @ 2022-05-13 16:43 UTC (permalink / raw)
  To: Xiongwei Song
  Cc: Phillip Lougher, Matthew Wilcox, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

On Fri, May 13, 2022 at 8:17 PM Xiongwei Song <sxwjean@gmail.com> wrote:
>
> Hello,
>
> On Thu, May 12, 2022 at 2:23 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> >
> > On Thu, May 12, 2022 at 3:13 AM Phillip Lougher <phillip@squashfs.org.uk> wrote:
> > >
> > > On 11/05/2022 16:12, Hsin-Yi Wang wrote:
> > > > On Tue, May 10, 2022 at 9:19 PM Xiongwei Song <sxwjean@gmail.com> wrote:
> > > >>
> > > >> On Tue, May 10, 2022 at 8:47 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> > > >>>
> > > >>> On Tue, May 10, 2022 at 8:31 PM Xiongwei Song <sxwjean@gmail.com> wrote:
> > > >>>>
> > > >>>> Hi Hsin-Yi,
> > > >>>>
> > > >>>> On Mon, May 9, 2022 at 10:29 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> > > >>>>>
> > > >>>>> On Mon, May 9, 2022 at 9:21 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > >>>>>>
> > > >>>>>> On Mon, May 09, 2022 at 08:43:45PM +0800, Xiongwei Song wrote:
> > > >>>>>>> Hi Hsin-Yi and Matthew,
> > > >>>>>>>
> > > >>>>>>> With the patch from the attachment on linux 5.10, ran the command as I
> > > >>>>>>> mentioned earlier,
> > > >>>>>>> got the results below:
> > > >>>>>>> 1:40.65 (1m + 40.65s)
> > > >>>>>>> 1:10.12
> > > >>>>>>> 1:11.10
> > > >>>>>>> 1:11.47
> > > >>>>>>> 1:11.59
> > > >>>>>>> 1:11.94
> > > >>>>>>> 1:11.86
> > > >>>>>>> 1:12.04
> > > >>>>>>> 1:12.21
> > > >>>>>>> 1:12.06
> > > >>>>>>>
> > > >>>>>>> The performance has clearly improved, but compared to linux 4.18 it is
> > > >>>>>>> still not as good.
> > > >>>>>>>
> > > >>>>> I think you shouldn't compare the performance with 4.18 directly,
> > > >>>>> since there might be other factors that impact the performance.
> > > >>>>
> > > >>>> Make sense.
> > > >>>>
> > > >>>>> I'd suggest comparing the same kernel version with:
> > > >>>>> a) with this patch
> > > >>>>> b) with c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted.
> > > >>>>
> > > >>>> With 9eec1d897139 ("squashfs: provide backing_dev_info in order to disable
> > > >>>> read-ahead") reverted and applied 0001-WIP-squashfs-implement-readahead.patch,
> > > >>>> test result on linux 5.18:
> > > >>>> 1:41.51 (1m + 41.51s)
> > > >>>> 1:08.11
> > > >>>> 1:10.37
> > > >>>> 1:11.17
> > > >>>> 1:11.32
> > > >>>> 1:11.59
> > > >>>> 1:12.23
> > > >>>> 1:12.08
> > > >>>> 1:12.76
> > > >>>> 1:12.51
> > > >>>>
> > > >>>> performance is 1 ~ 2s worse than vanilla linux 5.18.
> > > >>>>
> > > >>>
> > > >>> Can you share the pack file you used for testing? Thanks
> > > >>
> > > >> You are saying the files that are put in squashfs partitions? If yes, I can tell
> > > >> I just put some dynamic libraries in partitions:
> > > >> -rwxr-xr-x 1 root root  200680 Apr 20 03:57 ld-2.33.so
> > > >> lrwxrwxrwx 1 root root      10 Apr 20 03:57 ld-linux-x86-64.so.2 -> ld-2.33.so
> > > >> -rwxr-xr-x 1 root root   18776 Apr 20 03:57 libanl-2.33.so
> > > >> lrwxrwxrwx 1 root root      14 Apr 20 03:57 libanl.so.1 -> libanl-2.33.so
> > > >> lrwxrwxrwx 1 root root      17 Apr 20 04:08 libblkid.so.1 -> libblkid.so.1.1.0
> > > >> -rwxr-xr-x 1 root root  330776 Apr 20 04:08 libblkid.so.1.1.0
> > > >> -rwxr-xr-x 1 root root 1823192 Apr 20 03:57 libc-2.33.so
> > > >> ...... snip ......
> > > >>
> > > >> The number of files is 110 (55 libs + 55 soft links to libs).  I have 90 squashfs
> > > >> partitions which save the identical files. The space of each partition is 11M,
> > > >> nothing special.
> > > >>
> > > >> Thanks.
> > > >>
> > > >
> > > > I noticed that there's a crash at
> > > > https://elixir.bootlin.com/linux/latest/source/lib/lzo/lzo1x_decompress_safe.c#L218
> > > > when testing on my system.
> > > > (I have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS enabled)
> > > >
> > > > Full logs:
> > > > [  119.062420] Unable to handle kernel paging request at virtual
> > > > address ffffffc017337000
> > > > [  119.062437] Mem abort info:
> > > > [  119.062442]   ESR = 0x96000047
> > > > [  119.062447]   EC = 0x25: DABT (current EL), IL = 32 bits
> > > > [  119.062451]   SET = 0, FnV = 0
> > > > [  119.062454]   EA = 0, S1PTW = 0
> > > > [  119.062457] Data abort info:
> > > > [  119.062460]   ISV = 0, ISS = 0x00000047
> > > > [  119.062464]   CM = 0, WnR = 1
> > > > [  119.062469] swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000041099000
> > > > [  119.062473] [ffffffc017337000] pgd=000000010014a003,
> > > > p4d=000000010014a003, pud=000000010014a003, pmd=000000010ba59003,
> > > > pte=0000000000000000
> > > > [  119.062489] Internal error: Oops: 96000047 [#1] PREEMPT SMP
> > > > [  119.062494] Modules linked in: vhost_vsock vhost vhost_iotlb
> > > > vmw_vsock_virtio_transport_common vsock rfcomm algif_hash
> > > > algif_skcipher af_alg veth uinput xt_cgroup mtk_dip mtk_cam_isp
> > > > mtk_vcodec_enc mtk_vcodec_dec hci_uart mtk_fd mtk_mdp3 v4l2_h264
> > > > mtk_vcodec_common mtk_vpu xt_MASQUERADE mtk_jpeg cros_ec_rpmsg btqca
> > > > videobuf2_dma_contig v4l2_fwnode v4l2_mem2mem btrtl elants_i2c mtk_scp
> > > > mtk_rpmsg rpmsg_core mtk_scp_ipi mt8183_cci_devfreq ip6table_nat fuse
> > > > 8021q bluetooth ecdh_generic ecc iio_trig_sysfs cros_ec_lid_angle
> > > > cros_ec_sensors cros_ec_sensors_core industrialio_triggered_buffer
> > > > kfifo_buf cros_ec_sensorhub cros_ec_typec typec hid_google_hammer
> > > > ath10k_sdio lzo_rle lzo_compress ath10k_core ath mac80211 zram
> > > > cfg80211 uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
> > > > videobuf2_common cdc_ether usbnet r8152 mii joydev
> > > > [  119.062613] CPU: 4 PID: 4161 Comm: chrome Tainted: G        W
> > > >    5.10.112 #105 39f11bffda227eaae4c704733b9bf01db22d8b4d
> > > > [  119.062617] Hardware name: Google burnet board (DT)
> > > > [  119.062623] pstate: 20400005 (nzCv daif +PAN -UAO -TCO BTYPE=--)
> > > > [  119.062636] pc : lzo1x_decompress_safe+0x1dc/0x564
> > > > [  119.062643] lr : lzo_uncompress+0x134/0x1f0
> > > > [  119.062647] sp : ffffffc01837b860
> > > > [  119.062650] x29: ffffffc01837b860 x28: 0000000000000000
> > > > [  119.062656] x27: 0000000000005451 x26: ffffffc0171c9445
> > > > [  119.062662] x25: 0000000000000000 x24: ffffffc017437000
> > > > [  119.062668] x23: ffffffc0171c944f x22: ffffffc017136000
> > > > [  119.062673] x21: ffffffc017336ff1 x20: ffffffc017237000
> > > > [  119.062679] x19: ffffffc01837b8d0 x18: 0000000000000000
> > > > [  119.062684] x17: 00000000000001eb x16: 0000000000000012
> > > > [  119.062689] x15: 000000000010000f x14: d600120202000001
> > > > [  119.062695] x13: ffffffc017336ff1 x12: ffffffc017336ff4
> > > > [  119.062700] x11: 0000000000000002 x10: 01010101010100ff
> > > > [  119.062705] x9 : ffffffffffffffff x8 : ffffffc0171c944d
> > > > [  119.062710] x7 : d15d3aaabd294330 x6 : 0206397115fe28ab
> > > > [  119.062715] x5 : ffffffc0171c944f x4 : 000000000009344f
> > > > [  119.062721] x3 : ffffffc01837b8d0 x2 : ffffffc017237000
> > > > [  119.062726] x1 : 000000000009344f x0 : ffffffc0171c9447
> > > > [  119.062731] Call trace:
> > > > [  119.062738]  lzo1x_decompress_safe+0x1dc/0x564
> > > > [  119.062742]  lzo_uncompress+0x134/0x1f0
> > > > [  119.062746]  squashfs_decompress+0x6c/0xb4
> > > > [  119.062753]  squashfs_read_data+0x1a8/0x298
> > > > [  119.062758]  squashfs_readahead+0x308/0x474
> > > > [  119.062765]  read_pages+0x74/0x280
> > > > [  119.062769]  page_cache_ra_unbounded+0x1d0/0x228
> > > > [  119.062773]  do_page_cache_ra+0x44/0x50
> > > > [  119.062779]  do_sync_mmap_readahead+0x188/0x1a0
> > > > [  119.062783]  filemap_fault+0x100/0x350
> > > > [  119.062789]  __do_fault+0x48/0x10c
> > > > [  119.062793]  do_cow_fault+0x58/0x12c
> > > > [  119.062797]  handle_mm_fault+0x544/0x904
> > > > [  119.062804]  do_page_fault+0x260/0x384
> > > > [  119.062809]  do_translation_fault+0x44/0x5c
> > > > [  119.062813]  do_mem_abort+0x48/0xb4
> > > > [  119.062819]  el0_da+0x28/0x34
> > > > [  119.062824]  el0_sync_compat_handler+0xb8/0xcc
> > > > [  119.062829]  el0_sync_compat+0x188/0x1c0
> > > > [  119.062837] Code: f94001ae f90002ae f94005ae 910041ad (f90006ae)
> > > > [  119.062842] ---[ end trace 3e9828c7360fd7be ]---
> > > > [  119.090436] Kernel panic - not syncing: Oops: Fatal exception
> > > > [  119.090455] SMP: stopping secondary CPUs
> > > > [  119.090467] Kernel Offset: 0x2729c00000 from 0xffffffc010000000
> > > > [  119.090471] PHYS_OFFSET: 0xffffffd880000000
> > > > [  119.090477] CPU features: 0x08240002,2188200c
> > > >
> > > > 1) Traces near when the crash happened:
> > > > [   79.495580] Block @ 0x60eea9c, compressed size 65744, src size 1048576
> > > > [   80.363573] Block @ 0x1f9f000, compressed size 200598, src size 1048576
> > > > [   80.371256] Block @ 0x1fcff96, compressed size 80772, src size 1048576
> > > > [   80.428388] Block @ 0x1fe3b1a, compressed size 83941, src size 1048576
> > > > [   80.435319] Block @ 0x1ff82ff, compressed size 77936, src size 1048576
> > > > [   80.724331] Block @ 0x4501000, compressed size 364069, src size 1048576
> > > > [   80.738683] Block @ 0x4dccf28, compressed size 603215, src size 2097152
> > >
> > > Src size 2097152 is clearly wrong, as the maximum data block size is 1
> > > Mbyte or 1048576.
> > >
> > > That debug line comes from
> > >
> > > https://elixir.bootlin.com/linux/latest/source/fs/squashfs/block.c#L156
> > >
> > > ----
> > > TRACE("Block @ 0x%llx, %scompressed size %d, src size %d\n",
> > >                 index, compressed ? "" : "un", length, output->length);
> > > ----
> > >
> > > Which indicates your code has created a page_actor of 2 Mbytes in size
> > > (output->length).
> > >
> > > This is completely incorrect, as the page_actor should never be larger
> > > than the size of the block to be read in question.  In most cases that
> > > will be msblk->block_size, but it may be less at the end of the file.
> > >
> > > You appear to be trying to read the amount of readahead requested.  But,
> > > you should always be trying to read the lesser of readahead, and the
> > > size of the block in question.
> > >
> > > Hope that helps.
> > >
> > > Phillip
> > >
> > Hi Phillip,
> > Thanks for the explanation. After restricting the size fed to
> > page_actor, the crash no longer happened.
> >
> > Hi Xiongwei,
> > Can you test this version (sent as attachment) again? I've tested on
> > my platform:
> > - arm64
> > - kernel 5.10
> > - pack_data size ~ 300K
> > - time ureadahead pack_data
> > 1. with c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted:
> > 0.633s
> > 0.755s
> > 0.804s
> >
> > 2. apply the patch:
> > 0.625s
> > 0.656s
> > 0.768s
> >
> Thanks for sharing. I have done the test on 5.10 and 5.18. The results are
> a little worse than patch v1 for my test env.
>
> On linux 5.10:
> With c1f6925e1091 ("mm: put readahead pages in cache earlier") reverted:
> 1:37.16 (1m + 37.16s)
> 1:04.18
> 1:05.28
> 1:06.07
> 1:06.31
> 1:06.58
> 1:06.80
> 1:06.79
> 1:06.95
> 1:06.61
>
> With your patch v2:
> 2:04.27 (2m + 4.27s)
> 1:14.95
> 1:14.56
> 1:15.75
> 1:16.55
> 1:16.87
> 1:16.74
> 1:17.36
> 1:17.50
> 1:17.32
>
> On linux 5.18:
> With readahead disabled by default:
> 1:12.82
> 1:07.68
> 1:08.94
> 1:09.65
> 1:09.87
> 1:10.32
> 1:10.47
> 1:10.34
> 1:10.24
> 1:10.34
>
> With your patch v2:
> 2:00.14 (2m + 0.14s)
> 1:13.46
> 1:14.62
> 1:15.02
> 1:15.78
> 1:16.01
> 1:16.03
> 1:16.24
> 1:16.44
> 1:16.16
>
> As you can see, there is an extra ~6s increase on both 5.10 and 5.18.
> I don't know whether the change in the number of pages causes the overhead.
>
> One stupid question, see below code from your patch:
>
>  +       }
>  +
>  +       kfree(actor);
>  +       return;
>  +
>  +skip_pages:
>
> when is the page pointers array released after the pages are cached? I don't
> see any place that does that.
>
actor is not a page pointer. It is allocated by
squashfs_page_actor_init_special() and should be freed after use. Or do you
mean skip_pages? There are some situations where we can't decompress
the whole block, so we will skip those pages.
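
To illustrate the distinction, abbreviated from the patch (error handling
omitted): pages is the array of page pointers, while the actor is a
separate allocation that gives the decompressor a view onto that array,
so each needs its own kfree():

	pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
	actor = squashfs_page_actor_init_special(pages, max_pages, 0);

	res = squashfs_read_data(inode->i_sb, block, bsize, NULL, actor);

	kfree(actor);	/* frees the actor only, not the pages array */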

> Regards,
> Xiongwei
>
>
>
> > Hi Matthew,
> > Thanks for reviewing the patch previously. Does this version look good
> > to you? If so, I can send it to the list.
> >
> >
> > Thanks for all of your help.
> >
> > > >
> > > > It's also noticed that when the crash happened, nr_pages obtained by
> > > > readahead_count() is 512.
> > > > nr_pages = readahead_count(ractl); // this line
> > > >
> > > > 2) Normal cases that won't crash:
> > > > [   22.651750] Block @ 0xb3bbca6, compressed size 42172, src size 262144
> > > > [   22.653580] Block @ 0xb3c6162, compressed size 29815, src size 262144
> > > > [   22.656692] Block @ 0xb4a293f, compressed size 17484, src size 131072
> > > > [   22.666099] Block @ 0xb593881, compressed size 39742, src size 262144
> > > > [   22.668699] Block @ 0xb59d3bf, compressed size 37841, src size 262144
> > > > [   22.695739] Block @ 0x13698673, compressed size 65907, src size 131072
> > > > [   22.698619] Block @ 0x136a87e6, compressed size 3155, src size 131072
> > > > [   22.703400] Block @ 0xb1babe8, compressed size 99391, src size 131072
> > > > [   22.706288] Block @ 0x1514abc6, compressed size 4627, src size 131072
> > > >
> > > > nr_pages are observed to be 32, 64, 256... These won't cause a crash.
> > > > Other values (max_pages, bsize, block...) look normal.
> > > >
> > > > I'm not sure why the crash happened, but I tried to modify the mask
> > > > for a bit. After modifying the mask value to below, the crash is gone
> > > > (nr_pages are <=256).
> > > > Based on my testing on a 300K pack file, there's no performance change.
> > > >
> > > > diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> > > > index 20ec48cf97c5..f6d9b6f88ed9 100644
> > > > --- a/fs/squashfs/file.c
> > > > +++ b/fs/squashfs/file.c
> > > > @@ -499,8 +499,8 @@ static void squashfs_readahead(struct
> > > > readahead_control *ractl)
> > > >   {
> > > >          struct inode *inode = ractl->mapping->host;
> > > >          struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> > > > -       size_t mask = (1UL << msblk->block_log) - 1;
> > > >          size_t shift = msblk->block_log - PAGE_SHIFT;
> > > > +       size_t mask = (1UL << shift) - 1;
> > > >
> > > >
> > > > Any pointers are appreciated. Thanks!
> > >


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-13 16:43                           ` Hsin-Yi Wang
@ 2022-05-13 18:12                             ` Matthew Wilcox
  2022-05-13 23:12                               ` Xiongwei Song
  2022-05-14 11:51                               ` Hsin-Yi Wang
  0 siblings, 2 replies; 34+ messages in thread
From: Matthew Wilcox @ 2022-05-13 18:12 UTC (permalink / raw)
  To: Hsin-Yi Wang
  Cc: Xiongwei Song, Phillip Lougher, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

On Sat, May 14, 2022 at 12:43:47AM +0800, Hsin-Yi Wang wrote:
> > One stupid question, see below code from your patch:
> >
> >  +       }
> >  +
> >  +       kfree(actor);
> >  +       return;
> >  +
> >  +skip_pages:
> >
> > when is the page pointers array released after the pages are cached? I don't
> > see any place that does that.
> >
> actor is not a page pointer. It is allocated by
> squashfs_page_actor_init_special() and should be freed after use. Or do you
> mean skip_pages? There are some situations where we can't decompress
> the whole block, so we will skip those pages.

I think the concern is that you don't seem to kfree(pages) on this
exit path.
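
Something like the following for the tail of the function would cover both
paths -- a sketch of one possible structure, reusing the labels from the
patch:

	kfree(actor);
	kfree(pages);
	return;

skip_pages:
	for (i = 0; i < nr_pages; i++) {
		unlock_page(pages[i]);
		put_page(pages[i]);
	}
	kfree(actor);
out:
	kfree(pages);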


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-13 18:12                             ` Matthew Wilcox
@ 2022-05-13 23:12                               ` Xiongwei Song
  2022-05-14 11:51                               ` Hsin-Yi Wang
  1 sibling, 0 replies; 34+ messages in thread
From: Xiongwei Song @ 2022-05-13 23:12 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Hsin-Yi Wang, Phillip Lougher, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

On Sat, May 14, 2022 at 2:12 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sat, May 14, 2022 at 12:43:47AM +0800, Hsin-Yi Wang wrote:
> > > One stupid question, see below code from your patch:
> > >
> > >  +       }
> > >  +
> > >  +       kfree(actor);
> > >  +       return;
> > >  +
> > >  +skip_pages:
> > >
> > > when is the page pointers array released after the pages are cached? I don't
> > > see any place that does that.
> > >
> > actor is not a page pointer. It is allocated by
> > squashfs_page_actor_init_special() and should be freed after use. Or do you
> > mean skip_pages? There are some situations where we can't decompress
> > the whole block, so we will skip those pages.
>
> I think the concern is that you don't seem to kfree(pages) on this
> exit path.

That's exactly what I wanted to say. Thanks Matthew.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-13 18:12                             ` Matthew Wilcox
  2022-05-13 23:12                               ` Xiongwei Song
@ 2022-05-14 11:51                               ` Hsin-Yi Wang
  1 sibling, 0 replies; 34+ messages in thread
From: Hsin-Yi Wang @ 2022-05-14 11:51 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Xiongwei Song, Phillip Lougher, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

On Sat, May 14, 2022 at 2:12 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sat, May 14, 2022 at 12:43:47AM +0800, Hsin-Yi Wang wrote:
> > > One stupid question, see below code from your patch:
> > >
> > >  +       }
> > >  +
> > >  +       kfree(actor);
> > >  +       return;
> > >  +
> > >  +skip_pages:
> > >
> > > when is the page pointers array released after the pages are cached? I don't
> > > see any place that does that.
> > >
> > actor is not a page pointer. It is allocated by
> > squashfs_page_actor_init_special() and should be freed after use. Or do you
> > mean skip_pages? There are some situations where we can't decompress
> > the whole block, so we will skip those pages.
>
> I think the concern is that you don't seem to kfree(pages) on this
> exit path.

Got it. I'll update in the next version.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-13 13:09                           ` Matthew Wilcox
@ 2022-05-15  0:05                             ` Phillip Lougher
  0 siblings, 0 replies; 34+ messages in thread
From: Phillip Lougher @ 2022-05-15  0:05 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Hsin-Yi Wang, Xiongwei Song, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

On 13/05/2022 14:09, Matthew Wilcox wrote:
> On Fri, May 13, 2022 at 06:33:21AM +0100, Phillip Lougher wrote:
>> Looking at the new patch, I have a couple of questions which are worth
>> clarifying to have a fuller understanding of the readahead behaviour.
>> In other words I'm deducing the behaviour of the readahead calls
>> from context, and I want to make sure they're doing what I think
>> they're doing.
> 
> I did write quite a lot of documentation as part of the readahead
> revision, and filesystem authors are the target audience, so this is
> somewhat disheartening to read.  What could I have done better to make
> the readahead documentation obvious for you to find?

That wasn't a criticism about your documentation or how hard
it is to find.  Please don't take it that way. It was just
quicker (for me) to understand the patch from a Squashfs
point of view.

Phillip

> 
>> +	nr_pages = min(readahead_count(ractl), max_pages);
>>
>> As I understand it, this will always produce nr_pages which will
>> cover the entirety of the block to be decompressed?  That is if
>> a block is block_size, it will return the number of pages necessary
>> to decompress the entire block?  It will never return less than the
>> necessary pages, i.e. if the block_size was 128K, it will never
>> return less than 32 pages?
> 
> readahead_count() returns the number of pages that the page cache is
> asking the filesystem for.  It may be any number from 1 to whatever
> the current readahead window is.  It's possible to ask the page
> cache to expand the readahead request to be aligned to a decompression
> boundary, but that may not be possible.  For example, we may be in a
> situation where we read pages 32-63 from the file previously, then
> the page cache chose to discard pages 33, 35, 37, .., 63 under memory
> pressure, and now the file is being re-read.  This isn't a likely
> usage pattern, of course, but it's a situation we have to cope with.
> 
>> + nr_pages = __readahead_batch(ractl, pages, max_pages)
>>
>> My understanding is that this call will fully populate the
>> pages array with page references without any holes.  That
>> is none of the pages array entries will be NULL, meaning
>> there isn't a page for that entry.  In other words, if the
>> pages array has 32 pages, each of the 32 entries will
>> reference a page.
> 
> That is correct, a readahead request is always for a contiguous range of
> the file.  The pages are allocated before calling ->readahead, so
> there's no opportunity for failure; they exist and they're already in
> the page cache, waiting for the FS to tell the pagecache that they're
> uptodate and unlock them.
> 



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-13  6:35                           ` Hsin-Yi Wang
@ 2022-05-15  0:54                             ` Phillip Lougher
  2022-05-16  8:23                               ` Hsin-Yi Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Phillip Lougher @ 2022-05-15  0:54 UTC (permalink / raw)
  To: Hsin-Yi Wang
  Cc: Xiongwei Song, Matthew Wilcox, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

[-- Attachment #1: Type: text/plain, Size: 5679 bytes --]

On 13/05/2022 07:35, Hsin-Yi Wang wrote:
> On Fri, May 13, 2022 at 1:33 PM Phillip Lougher <phillip@squashfs.org.uk> wrote:
>>
>> My understanding is that this call will fully populate the
>> pages array with page references without any holes.  That
>> is none of the pages array entries will be NULL, meaning
>> there isn't a page for that entry.  In other words, if the
>> pages array has 32 pages, each of the 32 entries will
>> reference a page.
>>
> I noticed that if nr_pages < max_pages, calling read_blocklist() will
> have SQUASHFS errors,
> 
> SQUASHFS error: Failed to read block 0x125ef7d: -5
> SQUASHFS error: zlib decompression failed, data probably corrupt
> 
> so I did a check if nr_pages < max_pages before squashfs_read_data(),
> just skip the remaining pages and let them be handled by readpage.
> 

Yes that avoids passing the decompressor code a too small page range.
As such extending the decompressor code isn't necessary.

Testing your patch I discovered a number of cases where
the decompressor still failed as above.

I traced this to "sparse blocks": zero-filled blocks, which are
indicated/stored as a block length of 0 (bsize == 0).  Skipping
such a sparse block and letting it be handled by readpage fixes this
issue.

I also noticed a potential performance improvement.  You check for
"pages[nr_pages - 1]->index >> shift) == index" after calling
squashfs_read_data.  But this information is known before
calling squashfs_read_data and moving the check to before
squashfs_read_data saves the cost of doing a redundant block
decompression.

Finally I noticed that if nr_pages grows after the __readahead_batch
call, then the pages array and the page actor will be too small, and
it will cause the decompressor to fail.  Changing the allocation to 
max_pages fixes this.

I have rolled these fixes into the patch below (also attached in
case it gets garbled).

diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index 7cd57e0d88de..14485a7af5cf 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -518,13 +518,11 @@ static void squashfs_readahead(struct readahead_control *ractl)
 	    file_end == 0)
 		return;
 
-	nr_pages = min(readahead_count(ractl), max_pages);
-
-	pages = kmalloc_array(nr_pages, sizeof(void *), GFP_KERNEL);
+	pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
 	if (!pages)
 		return;
 
-	actor = squashfs_page_actor_init_special(pages, nr_pages, 0);
+	actor = squashfs_page_actor_init_special(pages, max_pages, 0);
 	if (!actor)
 		goto out;
 
@@ -538,11 +536,18 @@ static void squashfs_readahead(struct readahead_control *ractl)
 			goto skip_pages;
 
 		index = pages[0]->index >> shift;
+
+		if ((pages[nr_pages - 1]->index >> shift) != index)
+			goto skip_pages;
+
 		bsize = read_blocklist(inode, index, &block);
+		if (bsize == 0)
+			goto skip_pages;
+
 		res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
 					 actor);
 
-		if (res >= 0 && (pages[nr_pages - 1]->index >> shift) == index)
+		if (res >= 0)
 			for (i = 0; i < nr_pages; i++)
 				SetPageUptodate(pages[i]);
 
-- 
2.34.1



Phillip


>> This is important for the decompression code, because it
>> expects each pages array entry to reference a page, which
>> can be kmapped to an address.  If an entry in the pages
>> array is NULL, this will break.
>>
>> If the pages array can have holes (NULL pointers), I have
>> written an update patch which allows the decompression code
>> to handle these NULL pointers.
>>
>> If the pages array can have NULL pointers, I can send you
>> the patch which will deal with this.
> 
> Sure, if there are better ways to deal with this.
> 
> Thanks.
> 
>>
>> Thanks
>>
>> Phillip
>>
>>
>>
>>>
>>>>>
>>>>> It's also noticed that when the crash happened, nr_pages obtained by
>>>>> readahead_count() is 512.
>>>>> nr_pages = readahead_count(ractl); // this line
>>>>>
>>>>> 2) Normal cases that won't crash:
>>>>> [   22.651750] Block @ 0xb3bbca6, compressed size 42172, src size 262144
>>>>> [   22.653580] Block @ 0xb3c6162, compressed size 29815, src size 262144
>>>>> [   22.656692] Block @ 0xb4a293f, compressed size 17484, src size 131072
>>>>> [   22.666099] Block @ 0xb593881, compressed size 39742, src size 262144
>>>>> [   22.668699] Block @ 0xb59d3bf, compressed size 37841, src size 262144
>>>>> [   22.695739] Block @ 0x13698673, compressed size 65907, src size 131072
>>>>> [   22.698619] Block @ 0x136a87e6, compressed size 3155, src size 131072
>>>>> [   22.703400] Block @ 0xb1babe8, compressed size 99391, src size 131072
>>>>> [   22.706288] Block @ 0x1514abc6, compressed size 4627, src size 131072
>>>>>
>>>>> nr_pages are observed to be 32, 64, 256... These won't cause a crash.
>>>>> Other values (max_pages, bsize, block...) look normal.
>>>>>
>>>>> I'm not sure why the crash happened, but I tried to modify the mask
>>>>> for a bit. After modifying the mask value to below, the crash is gone
>>>>> (nr_pages are <=256).
>>>>> Based on my testing on a 300K pack file, there's no performance change.
>>>>>
>>>>> diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
>>>>> index 20ec48cf97c5..f6d9b6f88ed9 100644
>>>>> --- a/fs/squashfs/file.c
>>>>> +++ b/fs/squashfs/file.c
>>>>> @@ -499,8 +499,8 @@ static void squashfs_readahead(struct
>>>>> readahead_control *ractl)
>>>>>     {
>>>>>            struct inode *inode = ractl->mapping->host;
>>>>>            struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
>>>>> -       size_t mask = (1UL << msblk->block_log) - 1;
>>>>>            size_t shift = msblk->block_log - PAGE_SHIFT;
>>>>> +       size_t mask = (1UL << shift) - 1;
>>>>>
>>>>>
>>>>> Any pointers are appreciated. Thanks!
>>>>
>>

[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 1190 bytes --]

diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index 7cd57e0d88de..14485a7af5cf 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -518,13 +518,11 @@ static void squashfs_readahead(struct readahead_control *ractl)
 	    file_end == 0)
 		return;
 
-	nr_pages = min(readahead_count(ractl), max_pages);
-
-	pages = kmalloc_array(nr_pages, sizeof(void *), GFP_KERNEL);
+	pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
 	if (!pages)
 		return;
 
-	actor = squashfs_page_actor_init_special(pages, nr_pages, 0);
+	actor = squashfs_page_actor_init_special(pages, max_pages, 0);
 	if (!actor)
 		goto out;
 
@@ -538,11 +536,18 @@ static void squashfs_readahead(struct readahead_control *ractl)
 			goto skip_pages;
 
 		index = pages[0]->index >> shift;
+
+		if ((pages[nr_pages - 1]->index >> shift) != index)
+			goto skip_pages;
+
 		bsize = read_blocklist(inode, index, &block);
+		if (bsize == 0)
+			goto skip_pages;
+
 		res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
 					 actor);
 
-		if (res >= 0 && (pages[nr_pages - 1]->index >> shift) == index)
+		if (res >= 0)
 			for (i = 0; i < nr_pages; i++)
 				SetPageUptodate(pages[i]);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-15  0:54                             ` Phillip Lougher
@ 2022-05-16  8:23                               ` Hsin-Yi Wang
  2022-05-16  9:00                                 ` Xiongwei Song
  0 siblings, 1 reply; 34+ messages in thread
From: Hsin-Yi Wang @ 2022-05-16  8:23 UTC (permalink / raw)
  To: Phillip Lougher
  Cc: Xiongwei Song, Matthew Wilcox, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

[-- Attachment #1: Type: text/plain, Size: 6617 bytes --]

On Sun, May 15, 2022 at 8:55 AM Phillip Lougher <phillip@squashfs.org.uk> wrote:
>
> On 13/05/2022 07:35, Hsin-Yi Wang wrote:
> > On Fri, May 13, 2022 at 1:33 PM Phillip Lougher <phillip@squashfs.org.uk> wrote:
> >>
> >> My understanding is that this call will fully populate the
> >> pages array with page references without any holes.  That
> >> is none of the pages array entries will be NULL, meaning
> >> there isn't a page for that entry.  In other words, if the
> >> pages array has 32 pages, each of the 32 entries will
> >> reference a page.
> >>
> > I noticed that if nr_pages < max_pages, calling read_blocklist() will
> > have SQUASHFS errors,
> >
> > SQUASHFS error: Failed to read block 0x125ef7d: -5
> > SQUASHFS error: zlib decompression failed, data probably corrupt
> >
> > so I did a check if nr_pages < max_pages before squashfs_read_data(),
> > just skip the remaining pages and let them be handled by readpage.
> >
>
> Yes that avoids passing the decompressor code a too small page range.
> As such extending the decompressor code isn't necessary.
>
> Testing your patch I discovered a number of cases where
> the decompressor still failed as above.
>
> I traced this to "sparse blocks": zero-filled blocks, which are
> indicated/stored as a block length of 0 (bsize == 0).  Skipping
> such a sparse block and letting it be handled by readpage fixes this
> issue.
>
Ack. Thanks for testing this.

> I also noticed a potential performance improvement.  You check for
> "pages[nr_pages - 1]->index >> shift) == index" after calling
> squashfs_read_data.  But this information is known before
> calling squashfs_read_data and moving the check to before
> squashfs_read_data saves the cost of doing a redundant block
> decompression.
>
After applying this, the performance becomes:
2.73s
2.76s
2.73s

Original:
2.76s
2.79s
2.77s

(The pack file is different from my previous testing in this email thread.)

> Finally I noticed that if nr_pages grows after the __readahead_batch
> call, then the pages array and the page actor will be too small, and
> it will cause the decompressor to fail.  Changing the allocation to
> max_pages fixes this.
>
Ack.

I've folded this fix and the previous fixes into the patch.
> I have rolled these fixes into the patch below (also attached in
> case it gets garbled).
>
> diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> index 7cd57e0d88de..14485a7af5cf 100644
> --- a/fs/squashfs/file.c
> +++ b/fs/squashfs/file.c
> @@ -518,13 +518,11 @@ static void squashfs_readahead(struct readahead_control *ractl)
>             file_end == 0)
>                 return;
>
> -       nr_pages = min(readahead_count(ractl), max_pages);
> -
> -       pages = kmalloc_array(nr_pages, sizeof(void *), GFP_KERNEL);
> +       pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
>         if (!pages)
>                 return;
>
> -       actor = squashfs_page_actor_init_special(pages, nr_pages, 0);
> +       actor = squashfs_page_actor_init_special(pages, max_pages, 0);
>         if (!actor)
>                 goto out;
>
> @@ -538,11 +536,18 @@ static void squashfs_readahead(struct readahead_control *ractl)
>                         goto skip_pages;
>
>                 index = pages[0]->index >> shift;
> +
> +               if ((pages[nr_pages - 1]->index >> shift) != index)
> +                       goto skip_pages;
> +
>                 bsize = read_blocklist(inode, index, &block);
> +               if (bsize == 0)
> +                       goto skip_pages;
> +
>                 res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
>                                          actor);
>
> -               if (res >= 0 && (pages[nr_pages - 1]->index >> shift) == index)
> +               if (res >= 0)
>                         for (i = 0; i < nr_pages; i++)
>                                 SetPageUptodate(pages[i]);
>
> --
> 2.34.1
>
>
>
> Phillip
>
>
> >> This is important for the decompression code, because it
> >> expects each pages array entry to reference a page, which
> >> can be kmapped to an address.  If an entry in the pages
> >> array is NULL, this will break.
> >>
> >> If the pages array can have holes (NULL pointers), I have
> >> written an update patch which allows the decompression code
> >> to handle these NULL pointers.
> >>
> >> If the pages array can have NULL pointers, I can send you
> >> the patch which will deal with this.
> >
> > Sure, if there are better ways to deal with this.
> >
> > Thanks.
> >
> >>
> >> Thanks
> >>
> >> Phillip
> >>
> >>
> >>
> >>>
> >>>>>
> >>>>> It's also noticed that when the crash happened, nr_pages obtained by
> >>>>> readahead_count() is 512.
> >>>>> nr_pages = readahead_count(ractl); // this line
> >>>>>
> >>>>> 2) Normal cases that won't crash:
> >>>>> [   22.651750] Block @ 0xb3bbca6, compressed size 42172, src size 262144
> >>>>> [   22.653580] Block @ 0xb3c6162, compressed size 29815, src size 262144
> >>>>> [   22.656692] Block @ 0xb4a293f, compressed size 17484, src size 131072
> >>>>> [   22.666099] Block @ 0xb593881, compressed size 39742, src size 262144
> >>>>> [   22.668699] Block @ 0xb59d3bf, compressed size 37841, src size 262144
> >>>>> [   22.695739] Block @ 0x13698673, compressed size 65907, src size 131072
> >>>>> [   22.698619] Block @ 0x136a87e6, compressed size 3155, src size 131072
> >>>>> [   22.703400] Block @ 0xb1babe8, compressed size 99391, src size 131072
> >>>>> [   22.706288] Block @ 0x1514abc6, compressed size 4627, src size 131072
> >>>>>
> >>>>> nr_pages are observed to be 32, 64, 256... These won't cause a crash.
> >>>>> Other values (max_pages, bsize, block...) look normal.
> >>>>>
> >>>>> I'm not sure why the crash happened, but I tried to modify the mask
> >>>>> for a bit. After modifying the mask value to below, the crash is gone
> >>>>> (nr_pages are <=256).
> >>>>> Based on my testing on a 300K pack file, there's no performance change.
> >>>>>
> >>>>> diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> >>>>> index 20ec48cf97c5..f6d9b6f88ed9 100644
> >>>>> --- a/fs/squashfs/file.c
> >>>>> +++ b/fs/squashfs/file.c
> >>>>> @@ -499,8 +499,8 @@ static void squashfs_readahead(struct
> >>>>> readahead_control *ractl)
> >>>>>     {
> >>>>>            struct inode *inode = ractl->mapping->host;
> >>>>>            struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> >>>>> -       size_t mask = (1UL << msblk->block_log) - 1;
> >>>>>            size_t shift = msblk->block_log - PAGE_SHIFT;
> >>>>> +       size_t mask = (1UL << shift) - 1;
> >>>>>
> >>>>>
> >>>>> Any pointers are appreciated. Thanks!
> >>>>
> >>

[-- Attachment #2: 0001-WIP-squashfs-implement-readahead.patch --]
[-- Type: text/x-patch, Size: 3279 bytes --]

From b24e7e6068f3e56a66b914798bbc4dd84a84b1ca Mon Sep 17 00:00:00 2001
From: Hsin-Yi Wang <hsinyi@chromium.org>
Date: Sun, 10 Oct 2021 21:22:25 +0800
Subject: [PATCH] squashfs: implement readahead

Implement the readahead callback for squashfs. It reads the datablocks
which cover the pages in the readahead request. In a few cases it will
not mark pages as uptodate, including:
- the file end is 0.
- zero-filled (sparse) blocks.
- the current batch of pages isn't in the same datablock, or there are
  not enough pages for the datablock.
Otherwise pages will be marked as uptodate. The unhandled pages will be
read by readpage later.

Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
---
 fs/squashfs/file.c | 79 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 78 insertions(+), 1 deletion(-)

diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index 89d492916dea..48b134a315b7 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -39,6 +39,7 @@
 #include "squashfs_fs_sb.h"
 #include "squashfs_fs_i.h"
 #include "squashfs.h"
+#include "page_actor.h"
 
 /*
  * Locate cache slot in range [offset, index] for specified inode.  If
@@ -494,7 +495,83 @@ static int squashfs_readpage(struct file *file, struct page *page)
 	return 0;
 }
 
+static void squashfs_readahead(struct readahead_control *ractl)
+{
+	struct inode *inode = ractl->mapping->host;
+	struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
+	size_t mask = (1UL << msblk->block_log) - 1;
+	size_t shift = msblk->block_log - PAGE_SHIFT;
+	loff_t req_end = readahead_pos(ractl) + readahead_length(ractl);
+	loff_t start = readahead_pos(ractl) &~ mask;
+	size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
+	struct squashfs_page_actor *actor;
+	unsigned int nr_pages = 0;
+	struct page **pages;
+	u64 block = 0;
+	int bsize, res, i, index;
+	int file_end = i_size_read(inode) >> msblk->block_log;
+	unsigned int max_pages = 1UL << shift;
+
+	readahead_expand(ractl, start, (len | mask) + 1);
+
+	if (readahead_pos(ractl) + readahead_length(ractl) < req_end ||
+	    file_end == 0)
+		return;
+
+	pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
+	if (!pages)
+		return;
+
+	actor = squashfs_page_actor_init_special(pages, max_pages, 0);
+	if (!actor)
+		goto out;
+
+	for (;;) {
+		nr_pages = __readahead_batch(ractl, pages, max_pages);
+		if (!nr_pages)
+			break;
+
+		if (readahead_pos(ractl) >= i_size_read(inode) ||
+		    nr_pages < max_pages)
+			goto skip_pages;
+
+		index = pages[0]->index >> shift;
+		if ((pages[nr_pages - 1]->index >> shift) != index)
+			goto skip_pages;
+
+		bsize = read_blocklist(inode, index, &block);
+		if (bsize == 0)
+			goto skip_pages;
+
+		res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
+					 actor);
+
+		if (res >= 0)
+			for (i = 0; i < nr_pages; i++)
+				SetPageUptodate(pages[i]);
+
+		for (i = 0; i < nr_pages; i++) {
+			unlock_page(pages[i]);
+			put_page(pages[i]);
+		}
+	}
+
+	kfree(actor);
+	kfree(pages);
+	return;
+
+skip_pages:
+	for (i = 0; i < nr_pages; i++) {
+		unlock_page(pages[i]);
+		put_page(pages[i]);
+	}
+
+	kfree(actor);
+out:
+	kfree(pages);
+}
 
 const struct address_space_operations squashfs_aops = {
-	.readpage = squashfs_readpage
+	.readpage = squashfs_readpage,
+	.readahead = squashfs_readahead
 };
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-16  8:23                               ` Hsin-Yi Wang
@ 2022-05-16  9:00                                 ` Xiongwei Song
  2022-05-16  9:10                                   ` Hsin-Yi Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Xiongwei Song @ 2022-05-16  9:00 UTC (permalink / raw)
  To: Hsin-Yi Wang
  Cc: Phillip Lougher, Matthew Wilcox, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

You can not just add your own signature. You ignored others' contributions.
This is unacceptable.

On Mon, May 16, 2022 at 4:23 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
>
> On Sun, May 15, 2022 at 8:55 AM Phillip Lougher <phillip@squashfs.org.uk> wrote:
> >
> > On 13/05/2022 07:35, Hsin-Yi Wang wrote:
> > > On Fri, May 13, 2022 at 1:33 PM Phillip Lougher <phillip@squashfs.org.uk> wrote:
> > >>
> > >> My understanding is that this call will fully populate the
> > >> pages array with page references without any holes.  That
> > >> is none of the pages array entries will be NULL, meaning
> > >> there isn't a page for that entry.  In other words, if the
> > >> pages array has 32 pages, each of the 32 entries will
> > >> reference a page.
> > >>
> > > I noticed that if nr_pages < max_pages, calling read_blocklist() will
> > > produce SQUASHFS errors:
> > >
> > > SQUASHFS error: Failed to read block 0x125ef7d: -5
> > > SQUASHFS error: zlib decompression failed, data probably corrupt
> > >
> > > so I added a check for nr_pages < max_pages before squashfs_read_data(),
> > > skipping the remaining pages and letting them be handled by readpage.
> > >
> >
> > Yes, that avoids passing the decompressor code a too-small page range.
> > As such extending the decompressor code isn't necessary.
> >
> > Testing your patch I discovered a number of cases where
> > the decompressor still failed as above.
> >
> > I traced this to "sparse blocks": zero-filled blocks, indicated/stored
> > as a block length of 0 (bsize == 0).  Skipping the sparse block and
> > letting it be handled by readpage fixes the issue.
> >
> Ack. Thanks for testing this.
>
> > I also noticed a potential performance improvement.  You check for
> > "(pages[nr_pages - 1]->index >> shift) == index" after calling
> > squashfs_read_data.  But this information is known before
> > calling squashfs_read_data and moving the check to before
> > squashfs_read_data saves the cost of doing a redundant block
> > decompression.
> >
> After applying this, the performance becomes:
> 2.73s
> 2.76s
> 2.73s
>
> Original:
> 2.76s
> 2.79s
> 2.77s
>
> (The pack file is different from the one used in my previous testing in this email thread.)
>
> > Finally I noticed that if nr_pages grows after the __readahead_batch
> > call, then the pages array and the page actor will be too small, and
> > it will cause the decompressor to fail.  Changing the allocation to
> > max_pages fixes this.
> >
> Ack.
>
> I've incorporated this fix and the previous fixes.
> > I have rolled these fixes into the patch below (also attached in
> > case it gets garbled).
> >
> > diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> > index 7cd57e0d88de..14485a7af5cf 100644
> > --- a/fs/squashfs/file.c
> > +++ b/fs/squashfs/file.c
> > @@ -518,13 +518,11 @@ static void squashfs_readahead(struct readahead_control *ractl)
> >             file_end == 0)
> >                 return;
> >
> > -       nr_pages = min(readahead_count(ractl), max_pages);
> > -
> > -       pages = kmalloc_array(nr_pages, sizeof(void *), GFP_KERNEL);
> > +       pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
> >         if (!pages)
> >                 return;
> >
> > -       actor = squashfs_page_actor_init_special(pages, nr_pages, 0);
> > +       actor = squashfs_page_actor_init_special(pages, max_pages, 0);
> >         if (!actor)
> >                 goto out;
> >
> > @@ -538,11 +536,18 @@ static void squashfs_readahead(struct readahead_control *ractl)
> >                         goto skip_pages;
> >
> >                 index = pages[0]->index >> shift;
> > +
> > +               if ((pages[nr_pages - 1]->index >> shift) != index)
> > +                       goto skip_pages;
> > +
> >                 bsize = read_blocklist(inode, index, &block);
> > +               if (bsize == 0)
> > +                       goto skip_pages;
> > +
> >                 res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
> >                                          actor);
> >
> > -               if (res >= 0 && (pages[nr_pages - 1]->index >> shift) == index)
> > +               if (res >= 0)
> >                         for (i = 0; i < nr_pages; i++)
> >                                 SetPageUptodate(pages[i]);
> >
> > --
> > 2.34.1
> >
> >
> >
> > Phillip
> >
> >
> > >> This is important for the decompression code, because it
> > >> expects each pages array entry to reference a page, which
> > >> can be kmapped to an address.  If an entry in the pages
> > >> array is NULL, this will break.
> > >>
> > >> If the pages array can have holes (NULL pointers), I have
> > >> written an updated patch which allows the decompression code
> > >> to handle these NULL pointers.
> > >>
> > >> If the pages array can have NULL pointers, I can send you
> > >> the patch which will deal with this.
> > >
> > > Sure, if there are better ways to deal with this.
> > >
> > > Thanks.
> > >
> > >>
> > >> Thanks
> > >>
> > >> Phillip
> > >>
> > >>
> > >>
> > >>>
> > >>>>>
> > >>>>> It's also noticed that when the crash happened, nr_pages obtained by
> > >>>>> readahead_count() is 512.
> > >>>>> nr_pages = readahead_count(ractl); // this line
> > >>>>>
> > >>>>> 2) Normal cases that won't crash:
> > >>>>> [   22.651750] Block @ 0xb3bbca6, compressed size 42172, src size 262144
> > >>>>> [   22.653580] Block @ 0xb3c6162, compressed size 29815, src size 262144
> > >>>>> [   22.656692] Block @ 0xb4a293f, compressed size 17484, src size 131072
> > >>>>> [   22.666099] Block @ 0xb593881, compressed size 39742, src size 262144
> > >>>>> [   22.668699] Block @ 0xb59d3bf, compressed size 37841, src size 262144
> > >>>>> [   22.695739] Block @ 0x13698673, compressed size 65907, src size 131072
> > >>>>> [   22.698619] Block @ 0x136a87e6, compressed size 3155, src size 131072
> > >>>>> [   22.703400] Block @ 0xb1babe8, compressed size 99391, src size 131072
> > >>>>> [   22.706288] Block @ 0x1514abc6, compressed size 4627, src size 131072
> > >>>>>
> > >>>>> nr_pages are observed to be 32, 64, 256... These won't cause a crash.
> > >>>>> Other values (max_pages, bsize, block...) look normal.
> > >>>>>
> > >>>>> I'm not sure why the crash happened, but I tried modifying the mask
> > >>>>> a bit. After changing the mask value as below, the crash is gone
> > >>>>> (nr_pages is <= 256).
> > >>>>> Based on my testing on a 300K pack file, there's no performance change.
> > >>>>>
> > >>>>> diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> > >>>>> index 20ec48cf97c5..f6d9b6f88ed9 100644
> > >>>>> --- a/fs/squashfs/file.c
> > >>>>> +++ b/fs/squashfs/file.c
> > >>>>> @@ -499,8 +499,8 @@ static void squashfs_readahead(struct readahead_control *ractl)
> > >>>>>     {
> > >>>>>            struct inode *inode = ractl->mapping->host;
> > >>>>>            struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> > >>>>> -       size_t mask = (1UL << msblk->block_log) - 1;
> > >>>>>            size_t shift = msblk->block_log - PAGE_SHIFT;
> > >>>>> +       size_t mask = (1UL << shift) - 1;
> > >>>>>
> > >>>>>
> > >>>>> Any pointers are appreciated. Thanks!
> > >>>>
> > >>
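
(A toy model of the constraint discussed above: the decompressor writes
through every entry of the pages array, so a partially populated array --
nr_pages < max_pages -- would dereference a NULL hole. The skip added in
the patch avoids that. All names below are made up; this is a standalone
user-space sketch, not squashfs code.)

#include <stdio.h>
#include <stddef.h>

/* stand-in "decompressor": touches every slot, crashes on a NULL hole */
static void decompress_into(char *slots[], size_t max_slots)
{
	size_t i;

	for (i = 0; i < max_slots; i++)
		slots[i][0] = 0;
}

int main(void)
{
	char a[4096], b[4096];
	char *slots[4] = { a, b, NULL, NULL };	/* only 2 of 4 pages present */
	size_t nr_pages = 2, max_pages = 4;

	if (nr_pages < max_pages)	/* the check the patch adds */
		printf("short batch: skip and fall back to readpage\n");
	else
		decompress_into(slots, max_pages);
	return 0;
}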


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-16  9:00                                 ` Xiongwei Song
@ 2022-05-16  9:10                                   ` Hsin-Yi Wang
  2022-05-16  9:34                                     ` Xiongwei Song
  0 siblings, 1 reply; 34+ messages in thread
From: Hsin-Yi Wang @ 2022-05-16  9:10 UTC (permalink / raw)
  To: Xiongwei Song
  Cc: Phillip Lougher, Matthew Wilcox, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

On Mon, May 16, 2022 at 5:00 PM Xiongwei Song <sxwjean@gmail.com> wrote:
>
> You cannot just add your own signature. You ignored others' contributions.
> This is unacceptable.
>

Hi,

Don't be angry. My thought is that since this is still under review and
I'm not sure the performance issue is settled, it's more important to
make sure the patch is ready.

Once it's ready, I'll send it to the list formally, so it's easier for
the maintainers (Matthew) to pick up. At that point I'll add your
Tested-by (again, I'm not sure whether the performance is now okay for
you, so I didn't add your tag; it would be incorrect to add it if the
performance is still not acceptable) and Phillip's Reviewed-by (or
Signed-off-by). I'm also not sure whether Phillip or Matthew have other
comments, so I can't add their signatures yet. Maintainers will
probably add their own Signed-off-by when they pick up the patch.
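
For reference, the trailers on the formal submission would look like this
(addresses as they appear in this thread):

  Tested-by: Xiongwei Song <sxwjean@gmail.com>
  Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk>
  Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>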

I'm sorry if not adding the tags to this WIP patch offended you.



> [snip]


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-16  9:10                                   ` Hsin-Yi Wang
@ 2022-05-16  9:34                                     ` Xiongwei Song
  2022-05-16 11:01                                       ` Hsin-Yi Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Xiongwei Song @ 2022-05-16  9:34 UTC (permalink / raw)
  To: Hsin-Yi Wang
  Cc: Phillip Lougher, Matthew Wilcox, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

On Mon, May 16, 2022 at 5:10 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
>
> On Mon, May 16, 2022 at 5:00 PM Xiongwei Song <sxwjean@gmail.com> wrote:
> >
> > You cannot just add your own signature. You ignored others' contributions.
> > This is unacceptable.
> >
>
> Hi,
>
> Don't be angry. My thought is that since this is still under review and
> I'm not sure the performance issue is settled, it's more important to
> make sure the patch is ready.
>
> Once it's ready, I'll send it to the list formally, so it's easier for
> the maintainers (Matthew) to pick up. At that point I'll add your
> Tested-by (again, I'm not sure whether the performance is now okay for
> you, so I didn't add your tag; it would be incorrect to add it if the
> performance is still not acceptable) and Phillip's Reviewed-by (or
> Signed-off-by). I'm also not sure whether Phillip or Matthew have other
> comments, so I can't add their signatures yet. Maintainers will
> probably add their own Signed-off-by when they pick up the patch.
>
> I'm sorry if not adding the tags to this WIP patch offended you.

Your apology is not sincere. I told you to release @pages in the exit
path, and you didn't even mention it. I told you patch v2 made a ~6s
difference, and you didn't respond. I told you that 9eec1d897139
("squashfs: provide backing_dev_info in order to disable read-ahead")
should be reverted, and you didn't reply. I don't think you know what
respect is.

> [snip]


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: squashfs performance regression and readahea
  2022-05-16  9:34                                     ` Xiongwei Song
@ 2022-05-16 11:01                                       ` Hsin-Yi Wang
  0 siblings, 0 replies; 34+ messages in thread
From: Hsin-Yi Wang @ 2022-05-16 11:01 UTC (permalink / raw)
  To: Xiongwei Song
  Cc: Phillip Lougher, Matthew Wilcox, Zheng Liang, Zhang Yi, Hou Tao,
	Miao Xie, Andrew Morton, Linus Torvalds, Song, Xiongwei,
	linux-mm, squashfs-devel

On Mon, May 16, 2022 at 5:35 PM Xiongwei Song <sxwjean@gmail.com> wrote:
>
> On Mon, May 16, 2022 at 5:10 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> >
> > On Mon, May 16, 2022 at 5:00 PM Xiongwei Song <sxwjean@gmail.com> wrote:
> > >
> > > You cannot just add your own signature. You ignored others' contributions.
> > > This is unacceptable.
> > >
> >
> > Hi,
> >
> > Don't be angry. My thought is that since this is still under review and
> > I'm not sure the performance issue is settled, it's more important to
> > make sure the patch is ready.
> >
> > Once it's ready, I'll send it to the list formally, so it's easier for
> > the maintainers (Matthew) to pick up. At that point I'll add your
> > Tested-by (again, I'm not sure whether the performance is now okay for
> > you, so I didn't add your tag; it would be incorrect to add it if the
> > performance is still not acceptable) and Phillip's Reviewed-by (or
> > Signed-off-by). I'm also not sure whether Phillip or Matthew have other
> > comments, so I can't add their signatures yet. Maintainers will
> > probably add their own Signed-off-by when they pick up the patch.
> >
> > I'm sorry if not adding the tags to this WIP patch offended you.
>
> Your apology is not sincere. I told you to release @pages in the exit
> path, and you didn't even mention it.

This is normally mentioned in the patch changelog when the patch is sent
to the list. See for example:
https://patchwork.kernel.org/project/dri-devel/patch/20220516085258.1227691-3-cyndis@kapsi.fi/
https://patchwork.kernel.org/project/dri-devel/patch/20220213103437.3363848-3-hsinyi@chromium.org/

The changelog records what changed in each version. I didn't include one
in the attachments in this thread.
Now I think it's better to send the patch formally to the list, to
avoid making you angry, even though it is still a work in progress:
https://lore.kernel.org/lkml/20220516105100.1412740-1-hsinyi@chromium.org/T/#md06bd638bcdd985766ee75d17fbaa548218e0038

> I told you patch v2 made a ~6s
> difference, and you didn't respond.

Currently I don't have any better ideas to improve the performance.
I'm sorry about that.

> I told you that 9eec1d897139
> ("squashfs: provide backing_dev_info in order to disable read-ahead")
> should be reverted, and you didn't reply.

Again, I didn't send the series formally in this thread since my focus
is still on getting the code right. So I didn't include the revert in
my attachment, only the readahead patch itself.

Regarding the revert: I haven't sent a revert patch before, and there
is a conflict between
124cfc154f6c ("squashfs: Convert squashfs to read_folio") and
reverting 9eec1d897139 ("squashfs: provide backing_dev_info in order
to disable read-ahead").
Currently I just removed the comments in squashfs_fill_super(), but I'm
not sure that is the correct way to do it.
Matthew, will you handle the reverts? Thanks
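
(For context, the change being reverted boils down to a per-superblock
bdi with read-ahead turned off; this is a paraphrased sketch of the idea,
not the literal hunk from 9eec1d897139:)

	err = super_setup_bdi(sb);	/* give squashfs its own bdi; exact setup call may differ */
	if (err)
		return err;
	sb->s_bdi->ra_pages = 0;	/* ra_pages = 0 disables read-ahead */

Reverting it means squashfs falls back to a bdi whose non-zero ra_pages
re-enables the generic read-ahead path.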


> I don't think you know what respect is.
> [snip]


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2022-05-16 11:01 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-08 14:46 squashfs performance regression and readahea Song, Xiongwei
2022-05-08 16:44 ` Matthew Wilcox
2022-05-09  7:05   ` Hsin-Yi Wang
2022-05-09  7:10     ` Hsin-Yi Wang
2022-05-09  9:42       ` Song, Xiongwei
2022-05-09 12:43       ` Xiongwei Song
2022-05-09 13:21         ` Matthew Wilcox
2022-05-09 14:29           ` Hsin-Yi Wang
2022-05-10 12:30             ` Xiongwei Song
2022-05-10 12:47               ` Hsin-Yi Wang
2022-05-10 13:18                 ` Xiongwei Song
2022-05-11 15:12                   ` Hsin-Yi Wang
2022-05-11 19:13                     ` Phillip Lougher
2022-05-12  6:23                       ` Hsin-Yi Wang
2022-05-13  5:33                         ` Phillip Lougher
2022-05-13  6:35                           ` Hsin-Yi Wang
2022-05-15  0:54                             ` Phillip Lougher
2022-05-16  8:23                               ` Hsin-Yi Wang
2022-05-16  9:00                                 ` Xiongwei Song
2022-05-16  9:10                                   ` Hsin-Yi Wang
2022-05-16  9:34                                     ` Xiongwei Song
2022-05-16 11:01                                       ` Hsin-Yi Wang
2022-05-13 13:09                           ` Matthew Wilcox
2022-05-15  0:05                             ` Phillip Lougher
2022-05-13 12:16                         ` Xiongwei Song
2022-05-13 12:29                           ` Xiongwei Song
2022-05-13 16:43                           ` Hsin-Yi Wang
2022-05-13 18:12                             ` Matthew Wilcox
2022-05-13 23:12                               ` Xiongwei Song
2022-05-14 11:51                               ` Hsin-Yi Wang
2022-05-10  1:11           ` Phillip Lougher
2022-05-10  2:35             ` Matthew Wilcox
2022-05-10  3:20               ` Phillip Lougher
2022-05-10  3:41                 ` Phillip Lougher
