* [PATCH] btrfs: zlib: refactor how we prepare the input buffer
@ 2022-06-18  2:39 Qu Wenruo
  2022-06-18  6:14 ` Fabio M. De Francesco
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Qu Wenruo @ 2022-06-18  2:39 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Fabio M . De Francesco

Inspired by recent kmap() change from Fabio M. De Francesco.

There is some weird behavior in zlib_compress_pages(), mostly around how
we prepare the input buffer.

[BEFORE]
- We hold a page mapped for a long time
  This makes it much harder to convert kmap() to kmap_local_page(),
  as such a long-mapped page can lead to nested page mappings.

- Different paths in the name of "optimization"
  When we run out of input buffer, we grab new input through two
  different paths:

  * If there is more than one page left, we copy the content into the
    input buffer.
    This behavior was introduced mostly for S390, as that arch needs
    multiple pages as the input buffer for hardware decompression.

  * If there is only one page left, we use that page from the page cache
    directly, without copying the content.

  This makes page map/unmap much harder, especially due to the latter
  case.

[AFTER]
This patch changes the behavior by introducing a new helper to fill
the input buffer:

- Only map one page at a time, while doing the content copy

- Unified path, by always copying the page content into the workspace
  input buffer
  Yes, we're doing extra page copying. But the original optimization
  only works for the last page of the input range.

  Thus I'd say the sacrifice is not that big.

- Use kmap_local_page() and kunmap_local() instead
  Now that the lifespan of the mapped page only covers the memcpy()
  call, we're definitely fine to use kmap_local_page()/kunmap_local().

Cc: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Only tested on x86_64 for the correctness of the new helper.

But considering how small the window is during which we need the page
mapped, I think it should also work for x86 without any problem.
---
 fs/btrfs/zlib.c | 85 ++++++++++++++++++++++++-------------------------
 1 file changed, 41 insertions(+), 44 deletions(-)

diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
index 767a0c6c9694..2cd4f6fb1537 100644
--- a/fs/btrfs/zlib.c
+++ b/fs/btrfs/zlib.c
@@ -91,20 +91,54 @@ struct list_head *zlib_alloc_workspace(unsigned int level)
 	return ERR_PTR(-ENOMEM);
 }
 
+/*
+ * Copy the content from page cache into @workspace->buf.
+ *
+ * @total_in:		The original total input length.
+ * @fileoff_ret:	The file offset.
+ *			Will be increased by the number of bytes we read.
+ */
+static void fill_input_buffer(struct workspace *workspace,
+			      struct address_space *mapping,
+			      unsigned long total_in, u64 *fileoff_ret)
+{
+	unsigned long bytes_left = total_in - workspace->strm.total_in;
+	const int input_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
+				    workspace->buf_size / PAGE_SIZE);
+	u64 file_offset = *fileoff_ret;
+	int i;
+
+	/* Copy the content of each page into the input buffer. */
+	for (i = 0; i < input_pages; i++) {
+		struct page *in_page;
+		void *addr;
+
+		in_page = find_get_page(mapping, file_offset >> PAGE_SHIFT);
+
+		addr = kmap_local_page(in_page);
+		memcpy(workspace->buf + i * PAGE_SIZE, addr, PAGE_SIZE);
+		kunmap_local(addr);
+
+		put_page(in_page);
+		file_offset += PAGE_SIZE;
+	}
+	*fileoff_ret = file_offset;
+	workspace->strm.next_in = workspace->buf;
+	workspace->strm.avail_in = min_t(unsigned long, bytes_left,
+					 workspace->buf_size);
+}
+
 int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
 		u64 start, struct page **pages, unsigned long *out_pages,
 		unsigned long *total_in, unsigned long *total_out)
 {
 	struct workspace *workspace = list_entry(ws, struct workspace, list);
+	/* Total input length. */
+	const unsigned long len = *total_out;
 	int ret;
-	char *data_in;
 	char *cpage_out;
 	int nr_pages = 0;
-	struct page *in_page = NULL;
 	struct page *out_page = NULL;
-	unsigned long bytes_left;
-	unsigned int in_buf_pages;
-	unsigned long len = *total_out;
 	unsigned long nr_dest_pages = *out_pages;
 	const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
 
@@ -140,40 +174,8 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
 		 * Get next input pages and copy the contents to
 		 * the workspace buffer if required.
 		 */
-		if (workspace->strm.avail_in == 0) {
-			bytes_left = len - workspace->strm.total_in;
-			in_buf_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
-					   workspace->buf_size / PAGE_SIZE);
-			if (in_buf_pages > 1) {
-				int i;
-
-				for (i = 0; i < in_buf_pages; i++) {
-					if (in_page) {
-						kunmap(in_page);
-						put_page(in_page);
-					}
-					in_page = find_get_page(mapping,
-								start >> PAGE_SHIFT);
-					data_in = kmap(in_page);
-					memcpy(workspace->buf + i * PAGE_SIZE,
-					       data_in, PAGE_SIZE);
-					start += PAGE_SIZE;
-				}
-				workspace->strm.next_in = workspace->buf;
-			} else {
-				if (in_page) {
-					kunmap(in_page);
-					put_page(in_page);
-				}
-				in_page = find_get_page(mapping,
-							start >> PAGE_SHIFT);
-				data_in = kmap(in_page);
-				start += PAGE_SIZE;
-				workspace->strm.next_in = data_in;
-			}
-			workspace->strm.avail_in = min(bytes_left,
-						       (unsigned long) workspace->buf_size);
-		}
+		if (workspace->strm.avail_in == 0)
+			fill_input_buffer(workspace, mapping, len, &start);
 
 		ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH);
 		if (ret != Z_OK) {
@@ -266,11 +268,6 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
 	*out_pages = nr_pages;
 	if (out_page)
 		kunmap(out_page);
-
-	if (in_page) {
-		kunmap(in_page);
-		put_page(in_page);
-	}
 	return ret;
 }
 
-- 
2.36.1



* Re: [PATCH] btrfs: zlib: refactor how we prepare the input buffer
  2022-06-18  2:39 [PATCH] btrfs: zlib: refactor how we prepare the input buffer Qu Wenruo
@ 2022-06-18  6:14 ` Fabio M. De Francesco
  2022-06-20 16:08 ` David Sterba
  2022-06-20 21:38 ` Ira Weiny
  2 siblings, 0 replies; 6+ messages in thread
From: Fabio M. De Francesco @ 2022-06-18  6:14 UTC (permalink / raw)
  To: linux-btrfs, Qu Wenruo; +Cc: ira.weiny

On Saturday, 18 June 2022 at 04:39:28 CEST, Qu Wenruo wrote:
> Inspired by recent kmap() change from Fabio M. De Francesco.
> 

Thanks!

>
> There is some weird behavior in zlib_compress_pages(), mostly around how
> we prepare the input buffer.
> 
> [BEFORE]
> - We hold a page mapped for a long time
>   This makes it much harder to convert kmap() to kmap_local_page(),
>   as such a long-mapped page can lead to nested page mappings.
> 
> - Different paths in the name of "optimization"
>   When we run out of input buffer, we grab new input through two
>   different paths:
> 
>   * If there is more than one page left, we copy the content into the
>     input buffer.
>     This behavior was introduced mostly for S390, as that arch needs
>     multiple pages as the input buffer for hardware decompression.
> 
>   * If there is only one page left, we use that page from the page cache
>     directly, without copying the content.
> 
>   This makes page map/unmap much harder, especially due to the latter
>   case.
> 
> [AFTER]
> This patch changes the behavior by introducing a new helper to fill
> the input buffer:
> 
> - Only map one page at a time, while doing the content copy
> 
> - Unified path, by always copying the page content into the workspace
>   input buffer
>   Yes, we're doing extra page copying. But the original optimization
>   only works for the last page of the input range.
> 
>   Thus I'd say the sacrifice is not that big.
> 
> - Use kmap_local_page() and kunmap_local() instead
>   Now that the lifespan of the mapped page only covers the memcpy()
>   call, we're definitely fine to use kmap_local_page()/kunmap_local().
> 
> Cc: Fabio M. De Francesco <fmdefrancesco@gmail.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Only tested on x86_64 for the correctness of the new helper.
> 
> But considering how small the window is during which we need the page
> mapped, I think it should also work for x86 without any problem.
>

This patch passed all 26 "compress" group (x)fstests tests that ran on a
32-bit QEMU + KVM VM (two further tests were skipped because they need 5
or more disks, but I don't have enough free space).

Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>

tweed32:/usr/lib/xfstests # ./check -g compress
FSTYP         -- btrfs
PLATFORM      -- Linux/i686 tweed32 5.19.0-rc2-vanilla-debug+ #46 SMP 
PREEMPT_DYNAMIC Sat Jun 18 07:30:28 CEST 2022
MKFS_OPTIONS  -- /dev/loop1
MOUNT_OPTIONS -- /dev/loop1 /mnt/scratch

btrfs/024 4s ...  3s
btrfs/026 8s ...  6s
btrfs/037 5s ...  3s
btrfs/038 3s ...  3s
btrfs/041 4s ...  2s
btrfs/062 47s ...  40s
btrfs/063 26s ...  22s
btrfs/067 44s ...  39s
btrfs/068 17s ...  13s
btrfs/070	[not run] btrfs and this test needs 5 or more disks in 
SCRATCH_DEV_POOL
btrfs/071	[not run] btrfs and this test needs 5 or more disks in 
SCRATCH_DEV_POOL
btrfs/072 46s ...  41s
btrfs/073 21s ...  22s
btrfs/074 48s ...  41s
btrfs/076 3s ...  3s
btrfs/103 3s ...  3s
btrfs/106 3s ...  3s
btrfs/109 4s ...  3s
btrfs/113 4s ...  3s
btrfs/138 63s ...  53s
btrfs/149 4s ...  3s
btrfs/183 4s ...  3s
btrfs/205 4s ...  3s
btrfs/234 5s ...  4s
btrfs/246 3s ...  3s
btrfs/251 3s ...  2s
Ran: btrfs/024 btrfs/026 btrfs/037 btrfs/038 btrfs/041 btrfs/062 btrfs/063 
btrfs/067 btrfs/068 btrfs/070 btrfs/071 btrfs/072 btrfs/073 btrfs/074 
btrfs/076 btrfs/103 btrfs/106 btrfs/109 btrfs/113 btrfs/138 btrfs/149 
btrfs/183 btrfs/205 btrfs/234 btrfs/246 btrfs/251
Not run: btrfs/070 btrfs/071
Passed all 26 tests

>
> ---
>  fs/btrfs/zlib.c | 85 ++++++++++++++++++++++++-------------------------
>  1 file changed, 41 insertions(+), 44 deletions(-)
> 
> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
> index 767a0c6c9694..2cd4f6fb1537 100644
> --- a/fs/btrfs/zlib.c
> +++ b/fs/btrfs/zlib.c
> @@ -91,20 +91,54 @@ struct list_head *zlib_alloc_workspace(unsigned int level)
>  	return ERR_PTR(-ENOMEM);
>  }
>  
> +/*
> + * Copy the content from page cache into @workspace->buf.
> + *
> + * @total_in:		The original total input length.
> + * @fileoff_ret:	The file offset.
> + *			Will be increased by the number of bytes we read.
> + */
> +static void fill_input_buffer(struct workspace *workspace,
> +			      struct address_space *mapping,
> +			      unsigned long total_in, u64 *fileoff_ret)
> +{
> +	unsigned long bytes_left = total_in - workspace->strm.total_in;
> +	const int input_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
> +				    workspace->buf_size / PAGE_SIZE);
> +	u64 file_offset = *fileoff_ret;
> +	int i;
> +
> +	/* Copy the content of each page into the input buffer. */
> +	for (i = 0; i < input_pages; i++) {
> +		struct page *in_page;
> +		void *addr;
> +
> +		in_page = find_get_page(mapping, file_offset >> PAGE_SHIFT);
> +
> +		addr = kmap_local_page(in_page);
> +		memcpy(workspace->buf + i * PAGE_SIZE, addr, PAGE_SIZE);
> +		kunmap_local(addr);
> +
> +		put_page(in_page);
> +		file_offset += PAGE_SIZE;
> +	}
> +	*fileoff_ret = file_offset;
> +	workspace->strm.next_in = workspace->buf;
> +	workspace->strm.avail_in = min_t(unsigned long, bytes_left,
> +					 workspace->buf_size);
> +}
> +
>  int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>  		u64 start, struct page **pages, unsigned long *out_pages,
>  		unsigned long *total_in, unsigned long *total_out)
>  {
>  	struct workspace *workspace = list_entry(ws, struct workspace, list);
> +	/* Total input length. */
> +	const unsigned long len = *total_out;
>  	int ret;
> -	char *data_in;
>  	char *cpage_out;
>  	int nr_pages = 0;
> -	struct page *in_page = NULL;
>  	struct page *out_page = NULL;
> -	unsigned long bytes_left;
> -	unsigned int in_buf_pages;
> -	unsigned long len = *total_out;
>  	unsigned long nr_dest_pages = *out_pages;
>  	const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
>  
> @@ -140,40 +174,8 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>  		 * Get next input pages and copy the contents to
>  		 * the workspace buffer if required.
>  		 */
> -		if (workspace->strm.avail_in == 0) {
> -			bytes_left = len - workspace->strm.total_in;
> -			in_buf_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
> -					   workspace->buf_size / PAGE_SIZE);
> -			if (in_buf_pages > 1) {
> -				int i;
> -
> -				for (i = 0; i < in_buf_pages; i++) {
> -					if (in_page) {
> -						kunmap(in_page);
> -						put_page(in_page);
> -					}
> -					in_page = find_get_page(mapping,
> -								start >> PAGE_SHIFT);
> -					data_in = kmap(in_page);
> -					memcpy(workspace->buf + i * PAGE_SIZE,
> -					       data_in, PAGE_SIZE);
> -					start += PAGE_SIZE;
> -				}
> -				workspace->strm.next_in = workspace->buf;
> -			} else {
> -				if (in_page) {
> -					kunmap(in_page);
> -					put_page(in_page);
> -				}
> -				in_page = find_get_page(mapping,
> -							start >> PAGE_SHIFT);
> -				data_in = kmap(in_page);
> -				start += PAGE_SIZE;
> -				workspace->strm.next_in = data_in;
> -			}
> -			workspace->strm.avail_in = min(bytes_left,
> -						       (unsigned long) workspace->buf_size);
> -		}
> +		if (workspace->strm.avail_in == 0)
> +			fill_input_buffer(workspace, mapping, len, &start);
>  
>  		ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH);
>  		if (ret != Z_OK) {
> @@ -266,11 +268,6 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>  	*out_pages = nr_pages;
>  	if (out_page)
>  		kunmap(out_page);
> -
> -	if (in_page) {
> -		kunmap(in_page);
> -		put_page(in_page);
> -	}
>  	return ret;
>  }
>  
> -- 
> 2.36.1
> 
Good job!

With your patch, the logic of zlib_compress_pages() is much more 
understandable for people unfamiliar with this code.

Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>

As a side effect (desired and important to me), I can now easily convert 
the remaining kmap() call sites in zlib.c.

Thanks again,

Fabio

PS: I'm adding Ira Weiny to the list of recipients.




* Re: [PATCH] btrfs: zlib: refactor how we prepare the input buffer
  2022-06-18  2:39 [PATCH] btrfs: zlib: refactor how we prepare the input buffer Qu Wenruo
  2022-06-18  6:14 ` Fabio M. De Francesco
@ 2022-06-20 16:08 ` David Sterba
  2022-06-21  0:40   ` Qu Wenruo
  2022-06-20 21:38 ` Ira Weiny
  2 siblings, 1 reply; 6+ messages in thread
From: David Sterba @ 2022-06-20 16:08 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Fabio M . De Francesco

On Sat, Jun 18, 2022 at 10:39:28AM +0800, Qu Wenruo wrote:
> Inspired by recent kmap() change from Fabio M. De Francesco.
> 
> There is some weird behavior in zlib_compress_pages(), mostly around how
> we prepare the input buffer.
> 
> [BEFORE]
> - We hold a page mapped for a long time
>   This makes it much harder to convert kmap() to kmap_local_page(),
>   as such a long-mapped page can lead to nested page mappings.
> 
> - Different paths in the name of "optimization"
>   When we run out of input buffer, we grab new input through two
>   different paths:
> 
>   * If there is more than one page left, we copy the content into the
>     input buffer.
>     This behavior was introduced mostly for S390, as that arch needs
>     multiple pages as the input buffer for hardware decompression.
> 
>   * If there is only one page left, we use that page from the page cache
>     directly, without copying the content.
> 
>   This makes page map/unmap much harder, especially due to the latter
>   case.
> 
> [AFTER]
> This patch changes the behavior by introducing a new helper to fill
> the input buffer:
> 
> - Only map one page at a time, while doing the content copy
> 
> - Unified path, by always copying the page content into the workspace
>   input buffer
>   Yes, we're doing extra page copying. But the original optimization
>   only works for the last page of the input range.
> 
>   Thus I'd say the sacrifice is not that big.

I don't like that the performance may drop and that there's extra memory
copying when not absolutely needed. OTOH it's in zlib code, and though
zlib is still in use today, zstd is a sufficient replacement, so the perf
drop should have low impact.

> - Use kmap_local_page() and kunmap_local() instead
>   Now that the lifespan of the mapped page only covers the memcpy()
>   call, we're definitely fine to use kmap_local_page()/kunmap_local().
> 
> Cc: Fabio M. De Francesco <fmdefrancesco@gmail.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Only tested on x86_64 for the correctness of the new helper.
> 
> But considering how small the window is during which we need the page
> mapped, I think it should also work for x86 without any problem.
> ---
>  fs/btrfs/zlib.c | 85 ++++++++++++++++++++++++-------------------------
>  1 file changed, 41 insertions(+), 44 deletions(-)
> 
> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
> index 767a0c6c9694..2cd4f6fb1537 100644
> --- a/fs/btrfs/zlib.c
> +++ b/fs/btrfs/zlib.c
> @@ -91,20 +91,54 @@ struct list_head *zlib_alloc_workspace(unsigned int level)
>  	return ERR_PTR(-ENOMEM);
>  }
>  
> +/*
> + * Copy the content from page cache into @workspace->buf.
> + *
> + * @total_in:		The original total input length.
> + * @fileoff_ret:	The file offset.
> + *			Will be increased by the number of bytes we read.
> + */
> +static void fill_input_buffer(struct workspace *workspace,
> +			      struct address_space *mapping,
> +			      unsigned long total_in, u64 *fileoff_ret)
> +{
> +	unsigned long bytes_left = total_in - workspace->strm.total_in;
> +	const int input_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
> +				    workspace->buf_size / PAGE_SIZE);
> +	u64 file_offset = *fileoff_ret;
> +	int i;
> +
> +	/* Copy the content of each page into the input buffer. */
> +	for (i = 0; i < input_pages; i++) {
> +		struct page *in_page;
> +		void *addr;
> +
> +		in_page = find_get_page(mapping, file_offset >> PAGE_SHIFT);
> +
> +		addr = kmap_local_page(in_page);
> +		memcpy(workspace->buf + i * PAGE_SIZE, addr, PAGE_SIZE);

The workspace->buf is 1 page, or 4 pages for s390, so here it looks
confusing that it could overflow, and no bounds are explicitly checked
as to whether the i * PAGE_SIZE offset is still OK. This would at least
deserve a comment and some runtime checks.
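
For instance, something along these lines (a minimal sketch only,
assuming btrfs's ASSERT() macro would be acceptable here; this is not
part of the posted patch):

	/* Hypothetical runtime check: the destination offset must
	 * stay within the workspace input buffer. */
	ASSERT((i + 1) * PAGE_SIZE <= workspace->buf_size);
	memcpy(workspace->buf + i * PAGE_SIZE, addr, PAGE_SIZE);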


* Re: [PATCH] btrfs: zlib: refactor how we prepare the input buffer
  2022-06-18  2:39 [PATCH] btrfs: zlib: refactor how we prepare the input buffer Qu Wenruo
  2022-06-18  6:14 ` Fabio M. De Francesco
  2022-06-20 16:08 ` David Sterba
@ 2022-06-20 21:38 ` Ira Weiny
  2 siblings, 0 replies; 6+ messages in thread
From: Ira Weiny @ 2022-06-20 21:38 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Fabio M . De Francesco

On Sat, Jun 18, 2022 at 10:39:28AM +0800, Qu Wenruo wrote:
> Inspired by recent kmap() change from Fabio M. De Francesco.
> 
> There is some weird behavior in zlib_compress_pages(), mostly around how
> we prepare the input buffer.
> 
> [BEFORE]
> - We hold a page mapped for a long time
>   This makes it much harder to convert kmap() to kmap_local_page(),
>   as such a long-mapped page can lead to nested page mappings.
> 
> - Different paths in the name of "optimization"
>   When we run out of input buffer, we grab new input through two
>   different paths:
> 
>   * If there is more than one page left, we copy the content into the
>     input buffer.
>     This behavior was introduced mostly for S390, as that arch needs
>     multiple pages as the input buffer for hardware decompression.
> 
>   * If there is only one page left, we use that page from the page cache
>     directly, without copying the content.
> 
>   This makes page map/unmap much harder, especially due to the latter
>   case.
> 
> [AFTER]
> This patch changes the behavior by introducing a new helper to fill
> the input buffer:
> 
> - Only map one page at a time, while doing the content copy
> 
> - Unified path, by always copying the page content into the workspace
>   input buffer
>   Yes, we're doing extra page copying. But the original optimization
>   only works for the last page of the input range.
> 
>   Thus I'd say the sacrifice is not that big.
> 
> - Use kmap_local_page() and kunmap_local() instead
>   Now that the lifespan of the mapped page only covers the memcpy()
>   call, we're definitely fine to use kmap_local_page()/kunmap_local().

Thanks!  This helps a lot.  Minor issue below.

[snip]

> +static void fill_input_buffer(struct workspace *workspace,
> +			      struct address_space *mapping,
> +			      unsigned long total_in, u64 *fileoff_ret)
> +{
> +	unsigned long bytes_left = total_in - workspace->strm.total_in;
> +	const int input_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
> +				    workspace->buf_size / PAGE_SIZE);
> +	u64 file_offset = *fileoff_ret;
> +	int i;
> +
> +	/* Copy the content of each page into the input buffer. */
> +	for (i = 0; i < input_pages; i++) {
> +		struct page *in_page;
> +		void *addr;
> +
> +		in_page = find_get_page(mapping, file_offset >> PAGE_SHIFT);
> +
> +		addr = kmap_local_page(in_page);
> +		memcpy(workspace->buf + i * PAGE_SIZE, addr, PAGE_SIZE);
> +		kunmap_local(addr);

This should be memcpy_from_page().
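
For illustration, the copy would then collapse to something like this
(just a sketch, not the actual follow-up patch; memcpy_from_page() from
<linux/highmem.h> does the kmap_local_page()/kunmap_local() pair
internally):

	in_page = find_get_page(mapping, file_offset >> PAGE_SHIFT);
	/* Maps the page, copies PAGE_SIZE bytes from offset 0, unmaps. */
	memcpy_from_page(workspace->buf + i * PAGE_SIZE, in_page, 0, PAGE_SIZE);
	put_page(in_page);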

Ira

> +
> +		put_page(in_page);
> +		file_offset += PAGE_SIZE;
> +	}
> +	*fileoff_ret = file_offset;
> +	workspace->strm.next_in = workspace->buf;
> +	workspace->strm.avail_in = min_t(unsigned long, bytes_left,
> +					 workspace->buf_size);
> +}
> +
>  int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>  		u64 start, struct page **pages, unsigned long *out_pages,
>  		unsigned long *total_in, unsigned long *total_out)
>  {
>  	struct workspace *workspace = list_entry(ws, struct workspace, list);
> +	/* Total input length. */
> +	const unsigned long len = *total_out;
>  	int ret;
> -	char *data_in;
>  	char *cpage_out;
>  	int nr_pages = 0;
> -	struct page *in_page = NULL;
>  	struct page *out_page = NULL;
> -	unsigned long bytes_left;
> -	unsigned int in_buf_pages;
> -	unsigned long len = *total_out;
>  	unsigned long nr_dest_pages = *out_pages;
>  	const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
>  
> @@ -140,40 +174,8 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>  		 * Get next input pages and copy the contents to
>  		 * the workspace buffer if required.
>  		 */
> -		if (workspace->strm.avail_in == 0) {
> -			bytes_left = len - workspace->strm.total_in;
> -			in_buf_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
> -					   workspace->buf_size / PAGE_SIZE);
> -			if (in_buf_pages > 1) {
> -				int i;
> -
> -				for (i = 0; i < in_buf_pages; i++) {
> -					if (in_page) {
> -						kunmap(in_page);
> -						put_page(in_page);
> -					}
> -					in_page = find_get_page(mapping,
> -								start >> PAGE_SHIFT);
> -					data_in = kmap(in_page);
> -					memcpy(workspace->buf + i * PAGE_SIZE,
> -					       data_in, PAGE_SIZE);
> -					start += PAGE_SIZE;
> -				}
> -				workspace->strm.next_in = workspace->buf;
> -			} else {
> -				if (in_page) {
> -					kunmap(in_page);
> -					put_page(in_page);
> -				}
> -				in_page = find_get_page(mapping,
> -							start >> PAGE_SHIFT);
> -				data_in = kmap(in_page);
> -				start += PAGE_SIZE;
> -				workspace->strm.next_in = data_in;
> -			}
> -			workspace->strm.avail_in = min(bytes_left,
> -						       (unsigned long) workspace->buf_size);
> -		}
> +		if (workspace->strm.avail_in == 0)
> +			fill_input_buffer(workspace, mapping, len, &start);
>  
>  		ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH);
>  		if (ret != Z_OK) {
> @@ -266,11 +268,6 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>  	*out_pages = nr_pages;
>  	if (out_page)
>  		kunmap(out_page);
> -
> -	if (in_page) {
> -		kunmap(in_page);
> -		put_page(in_page);
> -	}
>  	return ret;
>  }
>  
> -- 
> 2.36.1
> 


* Re: [PATCH] btrfs: zlib: refactor how we prepare the input buffer
  2022-06-20 16:08 ` David Sterba
@ 2022-06-21  0:40   ` Qu Wenruo
  2022-06-21  1:43     ` Qu Wenruo
  0 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2022-06-21  0:40 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, Fabio M . De Francesco



On 2022/6/21 00:08, David Sterba wrote:
> On Sat, Jun 18, 2022 at 10:39:28AM +0800, Qu Wenruo wrote:
>> Inspired by recent kmap() change from Fabio M. De Francesco.
>>
>> There is some weird behavior in zlib_compress_pages(), mostly around how
>> we prepare the input buffer.
>>
>> [BEFORE]
>> - We hold a page mapped for a long time
>>    This makes it much harder to convert kmap() to kmap_local_page(),
>>    as such a long-mapped page can lead to nested page mappings.
>>
>> - Different paths in the name of "optimization"
>>    When we run out of input buffer, we grab new input through two
>>    different paths:
>>
>>    * If there is more than one page left, we copy the content into the
>>      input buffer.
>>      This behavior was introduced mostly for S390, as that arch needs
>>      multiple pages as the input buffer for hardware decompression.
>>
>>    * If there is only one page left, we use that page from the page cache
>>      directly, without copying the content.
>>
>>    This makes page map/unmap much harder, especially due to the latter
>>    case.
>>
>> [AFTER]
>> This patch changes the behavior by introducing a new helper to fill
>> the input buffer:
>>
>> - Only map one page at a time, while doing the content copy
>>
>> - Unified path, by always copying the page content into the workspace
>>    input buffer
>>    Yes, we're doing extra page copying. But the original optimization
>>    only works for the last page of the input range.
>>
>>    Thus I'd say the sacrifice is not that big.
>
> I don't like that the performance may drop and that there's extra memory
> copying when not absolutely needed. OTOH it's in zlib code, and though
> zlib is still in use today, zstd is a sufficient replacement, so the perf
> drop should have low impact.

My bad, I didn't check buf_size, which is conditionally assigned to
PAGE_SIZE or 4 * PAGE_SIZE.
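
(For reference, the buf_size selection in zlib_alloc_workspace() looks
roughly like the following; a paraphrased sketch, not quoted verbatim
from the tree:)

	/* s390 DFLTCC hardware (de)compression wants a 4-page buffer
	 * (ZLIB_DFLTCC_BUF_SIZE), everyone else uses a single page. */
	if (zlib_deflate_dfltcc_enabled()) {
		workspace->buf = kmalloc(ZLIB_DFLTCC_BUF_SIZE,
					 __GFP_NOMEMALLOC | __GFP_NORETRY |
					 __GFP_NOWARN | GFP_KERNEL);
		workspace->buf_size = ZLIB_DFLTCC_BUF_SIZE;
	}
	if (!workspace->buf) {
		workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
		workspace->buf_size = PAGE_SIZE;
	}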

So changing it to memcpy() is going to affect all archs other than S390.

I'll change the mapping start and end part to re-enable the old behavior.

Thanks,
Qu

>
>> - Use kmap_local_page() and kunmap_local() instead
>>    Now that the lifespan of the mapped page only covers the memcpy()
>>    call, we're definitely fine to use kmap_local_page()/kunmap_local().
>>
>> Cc: Fabio M. De Francesco <fmdefrancesco@gmail.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> Only tested on x86_64 for the correctness of the new helper.
>>
>> But considering how small the window is during which we need the page
>> mapped, I think it should also work for x86 without any problem.
>> ---
>>   fs/btrfs/zlib.c | 85 ++++++++++++++++++++++++-------------------------
>>   1 file changed, 41 insertions(+), 44 deletions(-)
>>
>> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
>> index 767a0c6c9694..2cd4f6fb1537 100644
>> --- a/fs/btrfs/zlib.c
>> +++ b/fs/btrfs/zlib.c
>> @@ -91,20 +91,54 @@ struct list_head *zlib_alloc_workspace(unsigned int level)
>>   	return ERR_PTR(-ENOMEM);
>>   }
>>
>> +/*
>> + * Copy the content from page cache into @workspace->buf.
>> + *
>> + * @total_in:		The original total input length.
>> + * @fileoff_ret:	The file offset.
>> + *			Will be increased by the number of bytes we read.
>> + */
>> +static void fill_input_buffer(struct workspace *workspace,
>> +			      struct address_space *mapping,
>> +			      unsigned long total_in, u64 *fileoff_ret)
>> +{
>> +	unsigned long bytes_left = total_in - workspace->strm.total_in;
>> +	const int input_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
>> +				    workspace->buf_size / PAGE_SIZE);
>> +	u64 file_offset = *fileoff_ret;
>> +	int i;
>> +
>> +	/* Copy the content of each page into the input buffer. */
>> +	for (i = 0; i < input_pages; i++) {
>> +		struct page *in_page;
>> +		void *addr;
>> +
>> +		in_page = find_get_page(mapping, file_offset >> PAGE_SHIFT);
>> +
>> +		addr = kmap_local_page(in_page);
>> +		memcpy(workspace->buf + i * PAGE_SIZE, addr, PAGE_SIZE);
>
> The workspace->buf is 1 page, or 4 pages for s390, so here it looks
> confusing that it could overflow, and no bounds are explicitly checked
> as to whether the i * PAGE_SIZE offset is still OK. This would at least
> deserve a comment and some runtime checks.

The check is just above: @input_pages is calculated using
workspace->buf_size / PAGE_SIZE.

So it should be OK.

The real problem, though, is subpage support, where we should not just
copy the full page.

But thankfully, btrfs subpage only supports full-page compression for now.

Thanks,
Qu



* Re: [PATCH] btrfs: zlib: refactor how we prepare the input buffer
  2022-06-21  0:40   ` Qu Wenruo
@ 2022-06-21  1:43     ` Qu Wenruo
  0 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2022-06-21  1:43 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, Fabio M . De Francesco



On 2022/6/21 08:40, Qu Wenruo wrote:
>
>
> On 2022/6/21 00:08, David Sterba wrote:
>> On Sat, Jun 18, 2022 at 10:39:28AM +0800, Qu Wenruo wrote:
>>> Inspired by recent kmap() change from Fabio M. De Francesco.
>>>
>>> There is some weird behavior in zlib_compress_pages(), mostly around
>>> how we prepare the input buffer.
>>>
>>> [BEFORE]
>>> - We hold a page mapped for a long time
>>>    This makes it much harder to convert kmap() to kmap_local_page(),
>>>    as such a long-mapped page can lead to nested page mappings.
>>>
>>> - Different paths in the name of "optimization"
>>>    When we run out of input buffer, we grab new input through two
>>>    different paths:
>>>
>>>    * If there is more than one page left, we copy the content into the
>>>      input buffer.
>>>      This behavior was introduced mostly for S390, as that arch needs
>>>      multiple pages as the input buffer for hardware decompression.
>>>
>>>    * If there is only one page left, we use that page from the page cache
>>>      directly, without copying the content.
>>>
>>>    This makes page map/unmap much harder, especially due to the latter
>>>    case.
>>>
>>> [AFTER]
>>> This patch changes the behavior by introducing a new helper to fill
>>> the input buffer:
>>>
>>> - Only map one page at a time, while doing the content copy
>>>
>>> - Unified path, by always copying the page content into the workspace
>>>    input buffer
>>>    Yes, we're doing extra page copying. But the original optimization
>>>    only works for the last page of the input range.
>>>
>>>    Thus I'd say the sacrifice is not that big.
>>
>> I don't like that the performance may drop and that there's extra memory
>> copying when not absolutely needed. OTOH it's in zlib code, and though
>> zlib is still in use today, zstd is a sufficient replacement, so the perf
>> drop should have low impact.
>
> My bad, I didn't check buf_size, which is conditionally assigned to
> PAGE_SIZE or 4 * PAGE_SIZE.
>
> So changing it to memcpy() is going to affect all archs other than S390.
>
> I'll change the mapping start and end part to re-enable the old behavior.

Then things can get super complex. As with the original purpose of
Fabio's work, nested mapping is our biggest enemy.

For the input and output pages, we cannot guarantee the sequence in
which they get mapped/unmapped.

But kmap_local_page() requires unmapping them in the reverse order,
making things super complex.
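
To illustrate the ordering constraint (a minimal sketch, not btrfs code):

	void *a = kmap_local_page(page_a);
	void *b = kmap_local_page(page_b);
	/* ... use both mappings ... */
	kunmap_local(b);	/* most recent mapping must be released first */
	kunmap_local(a);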

Thus I'd say either we go the memcpy() path and sacrifice some
performance for easier-to-read code, or we always map/unmap the input
and output pages every time we call zlib_deflate().

Thanks,
Qu

>
>>
>>> - Use kmap_local_page() and kunmap_local() instead
>>>    Now that the lifespan of the mapped page only covers the memcpy()
>>>    call, we're definitely fine to use kmap_local_page()/kunmap_local().
>>>
>>> Cc: Fabio M. De Francesco <fmdefrancesco@gmail.com>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>> Only tested on x86_64 for the correctness of the new helper.
>>>
>>> But considering how small the window is during which we need the page
>>> mapped, I think it should also work for x86 without any problem.
>>> ---
>>>   fs/btrfs/zlib.c | 85 ++++++++++++++++++++++++-------------------------
>>>   1 file changed, 41 insertions(+), 44 deletions(-)
>>>
>>> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
>>> index 767a0c6c9694..2cd4f6fb1537 100644
>>> --- a/fs/btrfs/zlib.c
>>> +++ b/fs/btrfs/zlib.c
>>> @@ -91,20 +91,54 @@ struct list_head *zlib_alloc_workspace(unsigned
>>> int level)
>>>       return ERR_PTR(-ENOMEM);
>>>   }
>>>
>>> +/*
>>> + * Copy the content from page cache into @workspace->buf.
>>> + *
>>> + * @total_in:        The original total input length.
>>> + * @fileoff_ret:    The file offset.
>>> + *            Will be increased by the number of bytes we read.
>>> + */
>>> +static void fill_input_buffer(struct workspace *workspace,
>>> +                  struct address_space *mapping,
>>> +                  unsigned long total_in, u64 *fileoff_ret)
>>> +{
>>> +    unsigned long bytes_left = total_in - workspace->strm.total_in;
>>> +    const int input_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
>>> +                    workspace->buf_size / PAGE_SIZE);
>>> +    u64 file_offset = *fileoff_ret;
>>> +    int i;
>>> +
>>> +    /* Copy the content of each page into the input buffer. */
>>> +    for (i = 0; i < input_pages; i++) {
>>> +        struct page *in_page;
>>> +        void *addr;
>>> +
>>> +        in_page = find_get_page(mapping, file_offset >> PAGE_SHIFT);
>>> +
>>> +        addr = kmap_local_page(in_page);
>>> +        memcpy(workspace->buf + i * PAGE_SIZE, addr, PAGE_SIZE);
>>
>> The workspace->buf is 1 page, or 4 pages for s390, so here it looks
>> confusing that it could overflow, and no bounds are explicitly checked
>> as to whether the i * PAGE_SIZE offset is still OK. This would at least
>> deserve a comment and some runtime checks.
>
> The check is just above: @input_pages is calculated using
> workspace->buf_size / PAGE_SIZE.
>
> So it should be OK.
>
> The real problem, though, is subpage support, where we should not just
> copy the full page.
>
> But thankfully, btrfs subpage only supports full-page compression for now.
>
> Thanks,
> Qu
>

