* [PATCH 0/4] btrfs: subpage compressed read path fixes
@ 2021-06-30  9:32 Qu Wenruo
  2021-06-30  9:32 ` [PATCH 1/4] btrfs: grab correct extent map for subpage compressed extent read Qu Wenruo
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Qu Wenruo @ 2021-06-30  9:32 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Qu Wenruo

During the development of subpage write support for compressed file
extents, a strange failure showed up in btrfs/038, resulting in -EIO
during file reads on compressed extents.

It exposed a rabbit hole of problems inside the compression code,
especially for the subpage case. Those problems include:

- bv_offset is not taken into consideration
  This involves several functions:

  * btrfs_submit_compressed_read()
    Without bv_offset taken into consideration, it can grab the wrong
    extent map (which can even be an uncompressed extent) and cause
    tons of problems during decompression

  * btrfs_decompress_buf2page()
    It doesn't take bv_offset into consideration either, which means it
    can copy the wrong data into inode pages.

- PAGE_SIZE abuse inside the lzo decompress code
  This makes the zero-padding behavior differ between page sizes.

- Poor code quality
  Anything involving switching between two pages is way more complex
  than it needs to be.
  Tons of parameter comments are completely meaningless, and there are
  way too many helper variables, making it hard to tell which one is
  the main iteration cursor.
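The lzo zero-padding problem can be made concrete with a small model.
The on-disk rule (detailed in patch 4) is that a 4-byte segment header
never crosses a sector boundary; the old code applied the same rule with
PAGE_SIZE as the boundary, which gives different results on 64K page
systems. A minimal user-space sketch, with next_header_offset() as a
hypothetical helper (not btrfs code):

```c
#include <assert.h>
#include <stdint.h>

#define LZO_LEN 4	/* size of each lzo segment length header */

/*
 * Hypothetical model (not btrfs code) of the padding rule: a segment
 * header never crosses a @boundary, so when fewer than LZO_LEN bytes
 * remain before the boundary, those bytes are zero padding and the
 * next header starts at the boundary itself.
 */
uint32_t next_header_offset(uint32_t cur, uint32_t boundary)
{
	uint32_t left = boundary - (cur % boundary);

	if (left < LZO_LEN)	/* header would cross the boundary */
		return cur + left;
	return cur;
}
```

With a boundary of 4096, a header at offset 4094 must be read from
offset 4096 instead; with a 64K boundary the same header would (wrongly)
be read from inside the padding, which is exactly the mismatch between
sectorsize and PAGE_SIZE described above.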

This patchset will fix them by:

- Make btrfs_submit_compressed_read() do the proper calculation
  Just a small fix

- Rework btrfs_decompress_buf2page()
  Not only to make it subpage compatible, but also to introduce ASCII
  art explaining the parameter list, as there are several different
  offsets involved.

  A single cursor is used for the main loop, and page switching is
  separated into its own loops.

- Rework lzo_decompress_bio()
  The same style is applied to lzo_decompress_bio(), with proper
  sectorsize/PAGE_SIZE usage.

All the rework results in a smaller code size, even with the extensive
comment style and extra ASSERT()s.

Since this affects the enablement of basic subpage support (it affects
the ability to read existing compressed extents), this patchset needs
to be merged before the enablement patch.

Thankfully these patches should bring a minimal amount of conflicts, as
they are in the compression path, not a location touched by the subpage
support patches.


If possible, please merge this patchset early so that it can get more
testing, to ensure everything at least works fine for 4K page size.

Qu Wenruo (4):
  btrfs: grab correct extent map for subpage compressed extent read
  btrfs: remove the GFP_HIGHMEM flag for compression code
  btrfs: rework btrfs_decompress_buf2page()
  btrfs: rework lzo_decompress_bio() to make it subpage compatible

 fs/btrfs/compression.c | 171 +++++++++++++++++----------------
 fs/btrfs/compression.h |   5 +-
 fs/btrfs/lzo.c         | 210 +++++++++++++++++------------------------
 fs/btrfs/zlib.c        |  18 ++--
 fs/btrfs/zstd.c        |  12 +--
 5 files changed, 188 insertions(+), 228 deletions(-)

-- 
2.32.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/4] btrfs: grab correct extent map for subpage compressed extent read
  2021-06-30  9:32 [PATCH 0/4] btrfs: subpage compressed read path fixes Qu Wenruo
@ 2021-06-30  9:32 ` Qu Wenruo
  2021-06-30  9:32 ` [PATCH 2/4] btrfs: remove the GFP_HIGHMEM flag for compression code Qu Wenruo
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2021-06-30  9:32 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Qu Wenruo

[BUG]
When subpage compressed read write support is enabled, btrfs/038 always
fails with -EIO.

A simplified script can easily trigger the problem:

  mkfs.btrfs -f -s 4k $dev
  mount $dev $mnt -o compress=lzo

  xfs_io -f -c "truncate 118811" $mnt/foo
  xfs_io -c "pwrite -S 0x0d -b 39987 92267 39987" $mnt/foo > /dev/null

  sync
  btrfs subvolume snapshot -r $mnt $mnt/mysnap1

  xfs_io -c "pwrite -S 0x3e -b 80000 200000 80000" $mnt/foo > /dev/null
  sync

  xfs_io -c "pwrite -S 0xdc -b 10000 250000 10000" $mnt/foo > /dev/null
  xfs_io -c "pwrite -S 0xff -b 10000 300000 10000" $mnt/foo > /dev/null

  sync
  btrfs subvolume snapshot -r $mnt $mnt/mysnap2

  cat $mnt/mysnap2/foo
  # The above cat fails with -EIO

[CAUSE]
The problem is in btrfs_submit_compressed_read().

When it tries to grab the extent map of the read range, it uses the
following call:

	em = lookup_extent_mapping(em_tree,
  				   page_offset(bio_first_page_all(bio)),
				   fs_info->sectorsize);

The problem is in the page_offset(bio_first_page_all(bio)) part.

The offending inode has the following file extent layout

        item 10 key (257 EXTENT_DATA 131072) itemoff 15639 itemsize 53
                generation 8 type 1 (regular)
                extent data disk byte 13680640 nr 4096
                extent data offset 0 nr 4096 ram 4096
                extent compression 0 (none)
        item 11 key (257 EXTENT_DATA 135168) itemoff 15586 itemsize 53
                generation 8 type 1 (regular)
                extent data disk byte 0 nr 0
        item 12 key (257 EXTENT_DATA 196608) itemoff 15533 itemsize 53
                generation 8 type 1 (regular)
                extent data disk byte 13676544 nr 4096
                extent data offset 0 nr 53248 ram 86016
                extent compression 2 (lzo)

And the bio passed in has the following parameters:

page_offset(bio_first_page_all(bio))	= 131072
bio_first_bvec_all(bio)->bv_offset	= 65536

If we use page_offset(bio_first_page_all(bio)) without adding bv_offset,
we will get the extent map for file offset 131072, not 196608.

This means we read uncompressed data from disk, and later decompression
will definitely fail.
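The miscalculation above reduces to one line of arithmetic. A hedged
sketch (lookup_file_offset() is a hypothetical helper, not the kernel
function) of the corrected lookup offset:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model (not btrfs code): on a 64K page / 4K sector
 * system, the first bvec of the compressed read bio can start at a
 * non-zero bv_offset inside the page, so page_offset() alone points
 * at the wrong file offset.  The fixed lookup adds bv_offset.
 */
uint64_t lookup_file_offset(uint64_t page_offset, uint32_t bv_offset)
{
	return page_offset + bv_offset;
}
```

Plugging in the values from the bio above: 131072 + 65536 = 196608,
which is the compressed extent, while 131072 alone hits the
uncompressed one.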

[FIX]
Take bv_offset into consideration when trying to grab an extent map.

And add an ASSERT() to ensure we're really getting a compressed extent.

Thankfully this won't affect anything but subpage, so we only need to
ensure this patch gets merged before we enable basic subpage support.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 9a023ae0f98b..19da933c5f1c 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -673,6 +673,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	struct page *page;
 	struct bio *comp_bio;
 	u64 cur_disk_byte = bio->bi_iter.bi_sector << 9;
+	u64 file_offset;
 	u64 em_len;
 	u64 em_start;
 	struct extent_map *em;
@@ -682,15 +683,17 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 
 	em_tree = &BTRFS_I(inode)->extent_tree;
 
+	file_offset = bio_first_bvec_all(bio)->bv_offset +
+		      page_offset(bio_first_page_all(bio));
+
 	/* we need the actual starting offset of this extent in the file */
 	read_lock(&em_tree->lock);
-	em = lookup_extent_mapping(em_tree,
-				   page_offset(bio_first_page_all(bio)),
-				   fs_info->sectorsize);
+	em = lookup_extent_mapping(em_tree, file_offset, fs_info->sectorsize);
 	read_unlock(&em_tree->lock);
 	if (!em)
 		return BLK_STS_IOERR;
 
+	ASSERT(em->compress_type != BTRFS_COMPRESS_NONE);
 	compressed_len = em->block_len;
 	cb = kmalloc(compressed_bio_size(fs_info, compressed_len), GFP_NOFS);
 	if (!cb)
-- 
2.32.0



* [PATCH 2/4] btrfs: remove the GFP_HIGHMEM flag for compression code
  2021-06-30  9:32 [PATCH 0/4] btrfs: subpage compressed read path fixes Qu Wenruo
  2021-06-30  9:32 ` [PATCH 1/4] btrfs: grab correct extent map for subpage compressed extent read Qu Wenruo
@ 2021-06-30  9:32 ` Qu Wenruo
  2021-06-30 13:06   ` David Sterba
  2021-06-30  9:32 ` [PATCH 3/4] btrfs: rework btrfs_decompress_buf2page() Qu Wenruo
  2021-06-30  9:32 ` [PATCH 4/4] btrfs: rework lzo_decompress_bio() to make it subpage compatible Qu Wenruo
  3 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2021-06-30  9:32 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Qu Wenruo

This allows later decompress functions to get rid of kmap()/kunmap()
pairs.

And since all other filesystems are getting rid of HIGHMEM, this should
not be a problem for btrfs either.

Although the HIGHMEM allocation is removed, we still keep the
kmap()/kunmap() pairs.
They will be removed when the involved functions are refactored later.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 3 +--
 fs/btrfs/lzo.c         | 4 ++--
 fs/btrfs/zlib.c        | 6 +++---
 fs/btrfs/zstd.c        | 6 +++---
 4 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 19da933c5f1c..8318e56b5ab4 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -724,8 +724,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		goto fail1;
 
 	for (pg_index = 0; pg_index < nr_pages; pg_index++) {
-		cb->compressed_pages[pg_index] = alloc_page(GFP_NOFS |
-							      __GFP_HIGHMEM);
+		cb->compressed_pages[pg_index] = alloc_page(GFP_NOFS);
 		if (!cb->compressed_pages[pg_index]) {
 			faili = pg_index - 1;
 			ret = BLK_STS_RESOURCE;
diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
index cd042c7567a4..2bebb60c5830 100644
--- a/fs/btrfs/lzo.c
+++ b/fs/btrfs/lzo.c
@@ -146,7 +146,7 @@ int lzo_compress_pages(struct list_head *ws, struct address_space *mapping,
 	 * store the size of all chunks of compressed data in
 	 * the first 4 bytes
 	 */
-	out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+	out_page = alloc_page(GFP_NOFS);
 	if (out_page == NULL) {
 		ret = -ENOMEM;
 		goto out;
@@ -216,7 +216,7 @@ int lzo_compress_pages(struct list_head *ws, struct address_space *mapping,
 					goto out;
 				}
 
-				out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+				out_page = alloc_page(GFP_NOFS);
 				if (out_page == NULL) {
 					ret = -ENOMEM;
 					goto out;
diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
index c3fa7d3fa770..2c792bc5a987 100644
--- a/fs/btrfs/zlib.c
+++ b/fs/btrfs/zlib.c
@@ -121,7 +121,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
 	workspace->strm.total_in = 0;
 	workspace->strm.total_out = 0;
 
-	out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+	out_page = alloc_page(GFP_NOFS);
 	if (out_page == NULL) {
 		ret = -ENOMEM;
 		goto out;
@@ -202,7 +202,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
 				ret = -E2BIG;
 				goto out;
 			}
-			out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+			out_page = alloc_page(GFP_NOFS);
 			if (out_page == NULL) {
 				ret = -ENOMEM;
 				goto out;
@@ -240,7 +240,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
 				ret = -E2BIG;
 				goto out;
 			}
-			out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+			out_page = alloc_page(GFP_NOFS);
 			if (out_page == NULL) {
 				ret = -ENOMEM;
 				goto out;
diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c
index 3e26b466476a..9451d2bb984e 100644
--- a/fs/btrfs/zstd.c
+++ b/fs/btrfs/zstd.c
@@ -405,7 +405,7 @@ int zstd_compress_pages(struct list_head *ws, struct address_space *mapping,
 
 
 	/* Allocate and map in the output buffer */
-	out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+	out_page = alloc_page(GFP_NOFS);
 	if (out_page == NULL) {
 		ret = -ENOMEM;
 		goto out;
@@ -452,7 +452,7 @@ int zstd_compress_pages(struct list_head *ws, struct address_space *mapping,
 				ret = -E2BIG;
 				goto out;
 			}
-			out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+			out_page = alloc_page(GFP_NOFS);
 			if (out_page == NULL) {
 				ret = -ENOMEM;
 				goto out;
@@ -512,7 +512,7 @@ int zstd_compress_pages(struct list_head *ws, struct address_space *mapping,
 			ret = -E2BIG;
 			goto out;
 		}
-		out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+		out_page = alloc_page(GFP_NOFS);
 		if (out_page == NULL) {
 			ret = -ENOMEM;
 			goto out;
-- 
2.32.0



* [PATCH 3/4] btrfs: rework btrfs_decompress_buf2page()
  2021-06-30  9:32 [PATCH 0/4] btrfs: subpage compressed read path fixes Qu Wenruo
  2021-06-30  9:32 ` [PATCH 1/4] btrfs: grab correct extent map for subpage compressed extent read Qu Wenruo
  2021-06-30  9:32 ` [PATCH 2/4] btrfs: remove the GFP_HIGHMEM flag for compression code Qu Wenruo
@ 2021-06-30  9:32 ` Qu Wenruo
  2021-06-30 10:50   ` Qu Wenruo
  2021-06-30  9:32 ` [PATCH 4/4] btrfs: rework lzo_decompress_bio() to make it subpage compatible Qu Wenruo
  3 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2021-06-30  9:32 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Qu Wenruo

There are several bugs inside the function btrfs_decompress_buf2page():

- @start_byte doesn't take bvec.bv_offset into consideration
  Thus it can't handle cases where the target range is not page
  aligned.

- Too many helper variables
  There are tons of helper variables: @buf_offset, @current_buf_start,
  @start_byte, @prev_start_byte, @working_bytes, @bytes.
  This hurts anyone who wants to read the function.

- No obvious main cursor for the iteration
  This is a consequence of the previous problem.

- The comments for the parameter list make no sense
  Is @buf_start the offset into @buf, or the offset inside the full
  decompressed extent? (Spoiler alert: the latter.)
  And @total_out acts more like @buf_start + @size_of_buf.

  The worst is @disk_start.
  Its real meaning is the file offset of the full decompressed extent.

This patch will rework the whole function by:

- Add a proper comment with ASCII art to explain the parameter list

- Rework parameter list
  The old @buf_start is renamed to @decompressed, to show how many bytes
  are already decompressed inside the full decompressed extent.
  The old @total_out is replaced by @buf_len, which is the decompressed
  data size.
  For old @disk_start and @bio, just pass @compressed_bio in.

- Use a single main cursor
  The main cursor is @cur_file_offset, which tracks the current file
  offset.
  Other helper variables are declared inside the main loop, and only a
  minimal number of them are needed:
  * offset_inside_decompressed_buf:	The only real helper
  * copy_start_file_offset:		File offset where the memcpy starts
  * bvec_file_offset:			File offset of the current bvec

Even with all these extensive comments, the final function is still
smaller than the original, which is definitely a win.
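With file offsets as the only coordinate system, each copy step in the
reworked function reduces to intersecting the current bvec's file range
with the decompressed buffer's file range. A minimal user-space sketch
(copy_step() is a hypothetical helper, not the kernel code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of one copy step: intersect the bvec file range
 * [bvec_start, bvec_start + bvec_len) with the decompressed buffer
 * file range [buf_start, buf_start + buf_len).  Returns the copy
 * length, or 0 if the two ranges do not overlap; *copy_start gets the
 * file offset where the memcpy would begin.
 */
uint64_t copy_step(uint64_t bvec_start, uint32_t bvec_len,
		   uint64_t buf_start, uint32_t buf_len,
		   uint64_t *copy_start)
{
	uint64_t bvec_end = bvec_start + bvec_len;
	uint64_t buf_end = buf_start + buf_len;

	if (buf_end <= bvec_start || bvec_end <= buf_start)
		return 0;

	*copy_start = buf_start > bvec_start ? buf_start : bvec_start;
	return (bvec_end < buf_end ? bvec_end : buf_end) - *copy_start;
}
```

For example, a 4K bvec at file offset 65536 against a 16K decompressed
buffer starting at 61440 yields a 4K copy starting at 65536.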

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 159 ++++++++++++++++++++---------------------
 fs/btrfs/compression.h |   5 +-
 fs/btrfs/lzo.c         |   8 +--
 fs/btrfs/zlib.c        |  12 ++--
 fs/btrfs/zstd.c        |   6 +-
 5 files changed, 89 insertions(+), 101 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 8318e56b5ab4..28f24c8ac3c1 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1263,96 +1263,93 @@ void __cold btrfs_exit_compress(void)
 }
 
 /*
- * Copy uncompressed data from working buffer to pages.
+ * Copy decompressed data from working buffer to pages.
  *
- * buf_start is the byte offset we're of the start of our workspace buffer.
+ * @buf:		The decompressed data buffer
+ * @buf_len:		The decompressed data length
+ * @decompressed:	Number of bytes that is already decompressed inside the
+ * 			compressed extent
+ * @cb:			The compressed extent descriptor
+ * @orig_bio:		The original bio that the caller wants to read for
  *
- * total_out is the last byte of the buffer
+ * An easier to understand graph is like below:
+ *
+ * 		|<- orig_bio ->|     |<- orig_bio->|
+ * 	|<-------      full decompressed extent      ----->|
+ * 	|<-----------    @cb range   ---->|
+ * 	|			|<-- @buf_len -->|
+ * 	|<--- @decompressed --->|
+ *
+ * Note that @cb can be a subpage of the full decompressed extent, but
+ * @cb->start always has the same value as the orig_file_offset of the
+ * full decompressed extent.
+ *
+ * When reading a compressed extent, we have to read the full compressed
+ * extent, while @orig_bio may only want part of the range.
+ * Thus this function ensures that only data covered by @orig_bio is
+ * copied.
+ *
+ * Return 0 if we have copied all needed contents for @orig_bio.
+ * Return >0 if we need to continue decompressing.
  */
-int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
-			      unsigned long total_out, u64 disk_start,
-			      struct bio *bio)
+int btrfs_decompress_buf2page(const char *buf, u32 buf_len,
+			      struct compressed_bio *cb, u32 decompressed)
 {
-	unsigned long buf_offset;
-	unsigned long current_buf_start;
-	unsigned long start_byte;
-	unsigned long prev_start_byte;
-	unsigned long working_bytes = total_out - buf_start;
-	unsigned long bytes;
-	struct bio_vec bvec = bio_iter_iovec(bio, bio->bi_iter);
-
-	/*
-	 * start byte is the first byte of the page we're currently
-	 * copying into relative to the start of the compressed data.
-	 */
-	start_byte = page_offset(bvec.bv_page) - disk_start;
-
-	/* we haven't yet hit data corresponding to this page */
-	if (total_out <= start_byte)
-		return 1;
-
-	/*
-	 * the start of the data we care about is offset into
-	 * the middle of our working buffer
-	 */
-	if (total_out > start_byte && buf_start < start_byte) {
-		buf_offset = start_byte - buf_start;
-		working_bytes -= buf_offset;
-	} else {
-		buf_offset = 0;
-	}
-	current_buf_start = buf_start;
-
-	/* copy bytes from the working buffer into the pages */
-	while (working_bytes > 0) {
-		bytes = min_t(unsigned long, bvec.bv_len,
-				PAGE_SIZE - (buf_offset % PAGE_SIZE));
-		bytes = min(bytes, working_bytes);
-
-		memcpy_to_page(bvec.bv_page, bvec.bv_offset, buf + buf_offset,
-			       bytes);
-		flush_dcache_page(bvec.bv_page);
-
-		buf_offset += bytes;
-		working_bytes -= bytes;
-		current_buf_start += bytes;
-
-		/* check if we need to pick another page */
-		bio_advance(bio, bytes);
-		if (!bio->bi_iter.bi_size)
-			return 0;
-		bvec = bio_iter_iovec(bio, bio->bi_iter);
-		prev_start_byte = start_byte;
-		start_byte = page_offset(bvec.bv_page) - disk_start;
+	const u64 orig_file_offset = cb->start;
+	const u64 buf_file_offset = orig_file_offset + decompressed;
+	struct bio *orig_bio = cb->orig_bio;
+	u64 cur_file_offset = buf_file_offset;
+
+	/* The main loop to do the copy */
+	while (cur_file_offset < buf_file_offset + buf_len) {
+		struct bio_vec bvec = bio_iter_iovec(orig_bio,
+						     orig_bio->bi_iter);
+		size_t copy_len;
+		u32 offset_inside_decompressed_buf;
+		u64 copy_start_file_offset;
+		u64 bvec_file_offset;
+
+		bvec = bio_iter_iovec(orig_bio, orig_bio->bi_iter);
+		bvec_file_offset = page_offset(bvec.bv_page) + bvec.bv_offset;
+
+		/* Haven't reached the bvec range, exit */
+		if (buf_file_offset + buf_len <= bvec_file_offset)
+			return 1;
+
+		copy_start_file_offset = max(bvec_file_offset, cur_file_offset);
+		copy_len = min(bvec_file_offset + bvec.bv_len,
+			       buf_file_offset + buf_len) -
+			   copy_start_file_offset;
+		ASSERT(copy_len);
+		ASSERT(copy_len <= buf_file_offset + buf_len -
+				   copy_start_file_offset);
 
 		/*
-		 * We need to make sure we're only adjusting
-		 * our offset into compression working buffer when
-		 * we're switching pages.  Otherwise we can incorrectly
-		 * keep copying when we were actually done.
+		 * Extra range check to ensure we didn't go beyond
+		 * @buf + @buf_len.
+		 *
+		 * (copy_start_file_offset - orig_file_offset) is the offset
+		 * inside the full decompressed extent.
+		 * Then subtracting @decompressed gives the offset inside the
+		 * decompressed data @buf, which should not exceed @buf_len.
 		 */
-		if (start_byte != prev_start_byte) {
-			/*
-			 * make sure our new page is covered by this
-			 * working buffer
-			 */
-			if (total_out <= start_byte)
-				return 1;
-
-			/*
-			 * the next page in the biovec might not be adjacent
-			 * to the last page, but it might still be found
-			 * inside this working buffer. bump our offset pointer
-			 */
-			if (total_out > start_byte &&
-			    current_buf_start < start_byte) {
-				buf_offset = start_byte - buf_start;
-				working_bytes = total_out - start_byte;
-				current_buf_start = buf_start + buf_offset;
-			}
+		offset_inside_decompressed_buf = copy_start_file_offset -
+						 orig_file_offset -
+						 decompressed;
+		ASSERT(offset_inside_decompressed_buf < buf_len);
+		memcpy_to_page(bvec.bv_page, bvec.bv_offset,
+			       buf + offset_inside_decompressed_buf,
+			       copy_len);
+		cur_file_offset += copy_len;
+
+		/* Check if we need to advance to the next bvec */
+		if (cur_file_offset >= bvec_file_offset + bvec.bv_len) {
+			bio_advance(orig_bio, copy_len);
+			/* Finished the bio */
+			if (!orig_bio->bi_iter.bi_size)
+				return 0;
 		}
 	}
-
 	return 1;
 }
 
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index c359f20920d0..399be0b435bf 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -86,9 +86,8 @@ int btrfs_compress_pages(unsigned int type_level, struct address_space *mapping,
 			 unsigned long *total_out);
 int btrfs_decompress(int type, unsigned char *data_in, struct page *dest_page,
 		     unsigned long start_byte, size_t srclen, size_t destlen);
-int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
-			      unsigned long total_out, u64 disk_start,
-			      struct bio *bio);
+int btrfs_decompress_buf2page(const char *buf, u32 buf_len,
+			      struct compressed_bio *cb, u32 decompressed);
 
 blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 				  unsigned int len, u64 disk_start,
diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
index 2bebb60c5830..2dbbfd33e5a5 100644
--- a/fs/btrfs/lzo.c
+++ b/fs/btrfs/lzo.c
@@ -301,8 +301,6 @@ int lzo_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
 	char *buf;
 	bool may_late_unmap, need_unmap;
 	struct page **pages_in = cb->compressed_pages;
-	u64 disk_start = cb->start;
-	struct bio *orig_bio = cb->orig_bio;
 
 	data_in = kmap(pages_in[0]);
 	tot_len = read_compress_length(data_in);
@@ -407,15 +405,15 @@ int lzo_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
 		buf_start = tot_out;
 		tot_out += out_len;
 
-		ret2 = btrfs_decompress_buf2page(workspace->buf, buf_start,
-						 tot_out, disk_start, orig_bio);
+		ret2 = btrfs_decompress_buf2page(workspace->buf, out_len,
+						 cb, buf_start);
 		if (ret2 == 0)
 			break;
 	}
 done:
 	kunmap(pages_in[page_in_index]);
 	if (!ret)
-		zero_fill_bio(orig_bio);
+		zero_fill_bio(cb->orig_bio);
 	return ret;
 }
 
diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
index 2c792bc5a987..767a0c6c9694 100644
--- a/fs/btrfs/zlib.c
+++ b/fs/btrfs/zlib.c
@@ -286,8 +286,6 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
 	unsigned long total_pages_in = DIV_ROUND_UP(srclen, PAGE_SIZE);
 	unsigned long buf_start;
 	struct page **pages_in = cb->compressed_pages;
-	u64 disk_start = cb->start;
-	struct bio *orig_bio = cb->orig_bio;
 
 	data_in = kmap(pages_in[page_in_index]);
 	workspace->strm.next_in = data_in;
@@ -326,9 +324,8 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
 		if (buf_start == total_out)
 			break;
 
-		ret2 = btrfs_decompress_buf2page(workspace->buf, buf_start,
-						 total_out, disk_start,
-						 orig_bio);
+		ret2 = btrfs_decompress_buf2page(workspace->buf,
+				total_out - buf_start, cb, buf_start);
 		if (ret2 == 0) {
 			ret = 0;
 			goto done;
@@ -348,8 +345,7 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
 			data_in = kmap(pages_in[page_in_index]);
 			workspace->strm.next_in = data_in;
 			tmp = srclen - workspace->strm.total_in;
-			workspace->strm.avail_in = min(tmp,
-							   PAGE_SIZE);
+			workspace->strm.avail_in = min(tmp, PAGE_SIZE);
 		}
 	}
 	if (ret != Z_STREAM_END)
@@ -361,7 +357,7 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
 	if (data_in)
 		kunmap(pages_in[page_in_index]);
 	if (!ret)
-		zero_fill_bio(orig_bio);
+		zero_fill_bio(cb->orig_bio);
 	return ret;
 }
 
diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c
index 9451d2bb984e..f06b68040352 100644
--- a/fs/btrfs/zstd.c
+++ b/fs/btrfs/zstd.c
@@ -547,8 +547,6 @@ int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
 {
 	struct workspace *workspace = list_entry(ws, struct workspace, list);
 	struct page **pages_in = cb->compressed_pages;
-	u64 disk_start = cb->start;
-	struct bio *orig_bio = cb->orig_bio;
 	size_t srclen = cb->compressed_len;
 	ZSTD_DStream *stream;
 	int ret = 0;
@@ -589,7 +587,7 @@ int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
 		workspace->out_buf.pos = 0;
 
 		ret = btrfs_decompress_buf2page(workspace->out_buf.dst,
-				buf_start, total_out, disk_start, orig_bio);
+				total_out - buf_start, cb, buf_start);
 		if (ret == 0)
 			break;
 
@@ -614,7 +612,7 @@ int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
 		}
 	}
 	ret = 0;
-	zero_fill_bio(orig_bio);
+	zero_fill_bio(cb->orig_bio);
 done:
 	if (workspace->in_buf.src)
 		kunmap(pages_in[page_in_index]);
-- 
2.32.0



* [PATCH 4/4] btrfs: rework lzo_decompress_bio() to make it subpage compatible
  2021-06-30  9:32 [PATCH 0/4] btrfs: subpage compressed read path fixes Qu Wenruo
                   ` (2 preceding siblings ...)
  2021-06-30  9:32 ` [PATCH 3/4] btrfs: rework btrfs_decompress_buf2page() Qu Wenruo
@ 2021-06-30  9:32 ` Qu Wenruo
  3 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2021-06-30  9:32 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Qu Wenruo

For the initial subpage support, although we won't support compressed
write, we still need to support compressed read.

But lzo_decompress_bio() has several problems:

- The abuse of PAGE_SIZE for boundary detection
  For the subpage case, we should follow sectorsize to detect the
  padding zeros.
  Using PAGE_SIZE causes subpage compressed reads to skip certain
  bytes, causing read errors.

- Too many helper variables
  There are half a dozen helper variables, which only makes things
  harder to read.

This patch will rework lzo_decompress_bio() to make it work for subpage:

- Use sectorsize for boundary checks, while still using PAGE_SIZE for
  page switching
  This allows us to keep the same on-disk format as a 4K sectorsize fs,
  while taking advantage of a larger page size.

- Use two main cursors
  Only @cur_in and @cur_out are used as the main cursors.
  Helper variables are only declared inside the loop, and only two of
  them are needed.

- Introduce a helper function to copy the compressed segment payload
  The new helper, copy_compressed_segment(), copies a compressed
  segment into the workspace buffer.
  This function handles the page switching.
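The page switching done by the helper can be modelled in user space.
In this sketch (a hypothetical model, not the kernel code)
MODEL_PAGE_SIZE stands in for PAGE_SIZE, @pages is an array of
page-sized buffers, and *cur_in is the linear offset into the
compressed stream:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u	/* stand-in for PAGE_SIZE */

/*
 * Copy @len payload bytes starting at *cur_in into @dest, switching
 * pages whenever the linear offset crosses a page boundary.  The
 * payload itself has no padding, so only page switching is needed.
 */
void copy_segment(char **pages, char *dest, uint32_t len,
		  uint32_t *cur_in)
{
	uint32_t orig_in = *cur_in;

	while (*cur_in < orig_in + len) {
		uint32_t off = *cur_in % MODEL_PAGE_SIZE;
		uint32_t copy_len = MODEL_PAGE_SIZE - off;

		if (copy_len > orig_in + len - *cur_in)
			copy_len = orig_in + len - *cur_in;

		memcpy(dest + (*cur_in - orig_in),
		       pages[*cur_in / MODEL_PAGE_SIZE] + off, copy_len);
		*cur_in += copy_len;
	}
}
```

A payload starting 4 bytes before a page boundary is copied in two
steps, one from each page, and *cur_in ends up advanced by exactly
@len, ready for the next segment header.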

The net result is that, even with all the extensive comments and the
new helper function, the refactored code is still smaller and easier to
read.

The other decompression code has no special padding rule, so it doesn't
need to be touched for initial subpage support, but it will be
refactored to the same style later.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/lzo.c | 202 +++++++++++++++++++++----------------------------
 1 file changed, 86 insertions(+), 116 deletions(-)

diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
index 2dbbfd33e5a5..5fbbc4caaad0 100644
--- a/fs/btrfs/lzo.c
+++ b/fs/btrfs/lzo.c
@@ -14,6 +14,7 @@
 #include <linux/lzo.h>
 #include <linux/refcount.h>
 #include "compression.h"
+#include "ctree.h"
 
 #define LZO_LEN	4
 
@@ -278,140 +279,109 @@ int lzo_compress_pages(struct list_head *ws, struct address_space *mapping,
 	return ret;
 }
 
+/*
+ * Copy the compressed segment payload into @dest.
+ *
+ * For the payload there will be no padding, just need to do page switching.
+ */
+static void copy_compressed_segment(struct compressed_bio *cb,
+				    char *dest, u32 len, u32 *cur_in)
+{
+	u32 orig_in = *cur_in;
+
+	while (*cur_in < orig_in + len) {
+		struct page *cur_page;
+		u32 copy_len = min_t(u32, PAGE_SIZE - offset_in_page(*cur_in),
+					  orig_in + len - *cur_in);
+
+		ASSERT(copy_len);
+		cur_page = cb->compressed_pages[*cur_in / PAGE_SIZE];
+
+		memcpy(dest + *cur_in - orig_in,
+			page_address(cur_page) + offset_in_page(*cur_in),
+			copy_len);
+
+		*cur_in += copy_len;
+	}
+}
+
 int lzo_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
 {
 	struct workspace *workspace = list_entry(ws, struct workspace, list);
-	int ret = 0, ret2;
-	char *data_in;
-	unsigned long page_in_index = 0;
-	size_t srclen = cb->compressed_len;
-	unsigned long total_pages_in = DIV_ROUND_UP(srclen, PAGE_SIZE);
-	unsigned long buf_start;
-	unsigned long buf_offset = 0;
-	unsigned long bytes;
-	unsigned long working_bytes;
-	size_t in_len;
-	size_t out_len;
-	const size_t max_segment_len = lzo1x_worst_compress(PAGE_SIZE);
-	unsigned long in_offset;
-	unsigned long in_page_bytes_left;
-	unsigned long tot_in;
-	unsigned long tot_out;
-	unsigned long tot_len;
-	char *buf;
-	bool may_late_unmap, need_unmap;
-	struct page **pages_in = cb->compressed_pages;
+	const struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
+	const u32 sectorsize = fs_info->sectorsize;
+	int ret;
+	u32 len_in;		/* Compressed data length, can be unaligned */
+	u32 cur_in = 0;         /* Offset inside the compressed data */
+	u64 cur_out = 0;        /* Bytes decompressed so far */
+
+	len_in = read_compress_length(page_address(cb->compressed_pages[0]));
+	cur_in += LZO_LEN;
 
-	data_in = kmap(pages_in[0]);
-	tot_len = read_compress_length(data_in);
 	/*
-	 * Compressed data header check.
+	 * LZO header length check
 	 *
-	 * The real compressed size can't exceed the maximum extent length, and
-	 * all pages should be used (whole unused page with just the segment
-	 * header is not possible).  If this happens it means the compressed
-	 * extent is corrupted.
+	 * The total length should not exceed the maximum extent length,
+	 * and all sectors should be used.
+	 * If either check fails, the compressed extent is corrupted.
 	 */
-	if (tot_len > min_t(size_t, BTRFS_MAX_COMPRESSED, srclen) ||
-	    tot_len < srclen - PAGE_SIZE) {
-		ret = -EUCLEAN;
-		goto done;
+	if (len_in > min_t(size_t, BTRFS_MAX_COMPRESSED, cb->compressed_len) ||
+	    round_up(len_in, sectorsize) < cb->compressed_len) {
+		btrfs_err(fs_info,
+			"invalid lzo header, lzo len %u compressed len %u",
+			  len_in, cb->compressed_len);
+		return -EUCLEAN;
 	}
 
-	tot_in = LZO_LEN;
-	in_offset = LZO_LEN;
-	in_page_bytes_left = PAGE_SIZE - LZO_LEN;
-
-	tot_out = 0;
-
-	while (tot_in < tot_len) {
-		in_len = read_compress_length(data_in + in_offset);
-		in_page_bytes_left -= LZO_LEN;
-		in_offset += LZO_LEN;
-		tot_in += LZO_LEN;
+	/* Go through each lzo segment */
+	while (cur_in < len_in) {
+		struct page *cur_page;
+		u32 seg_len;	/* Length of the compressed segment */
+		u32 sector_bytes_left;
+		size_t out_len = lzo1x_worst_compress(sectorsize);
 
 		/*
-		 * Segment header check.
-		 *
-		 * The segment length must not exceed the maximum LZO
-		 * compression size, nor the total compressed size.
+		 * We should always have enough space for one segment header
+		 * inside current sector.
 		 */
-		if (in_len > max_segment_len || tot_in + in_len > tot_len) {
-			ret = -EUCLEAN;
-			goto done;
-		}
-
-		tot_in += in_len;
-		working_bytes = in_len;
-		may_late_unmap = need_unmap = false;
-
-		/* fast path: avoid using the working buffer */
-		if (in_page_bytes_left >= in_len) {
-			buf = data_in + in_offset;
-			bytes = in_len;
-			may_late_unmap = true;
-			goto cont;
-		}
-
-		/* copy bytes from the pages into the working buffer */
-		buf = workspace->cbuf;
-		buf_offset = 0;
-		while (working_bytes) {
-			bytes = min(working_bytes, in_page_bytes_left);
-
-			memcpy(buf + buf_offset, data_in + in_offset, bytes);
-			buf_offset += bytes;
-cont:
-			working_bytes -= bytes;
-			in_page_bytes_left -= bytes;
-			in_offset += bytes;
-
-			/* check if we need to pick another page */
-			if ((working_bytes == 0 && in_page_bytes_left < LZO_LEN)
-			    || in_page_bytes_left == 0) {
-				tot_in += in_page_bytes_left;
-
-				if (working_bytes == 0 && tot_in >= tot_len)
-					break;
-
-				if (page_in_index + 1 >= total_pages_in) {
-					ret = -EIO;
-					goto done;
-				}
-
-				if (may_late_unmap)
-					need_unmap = true;
-				else
-					kunmap(pages_in[page_in_index]);
-
-				data_in = kmap(pages_in[++page_in_index]);
-
-				in_page_bytes_left = PAGE_SIZE;
-				in_offset = 0;
-			}
-		}
-
-		out_len = max_segment_len;
-		ret = lzo1x_decompress_safe(buf, in_len, workspace->buf,
-					    &out_len);
-		if (need_unmap)
-			kunmap(pages_in[page_in_index - 1]);
+		ASSERT(cur_in / sectorsize ==
+		       (cur_in + LZO_LEN - 1) / sectorsize);
+		cur_page = cb->compressed_pages[cur_in / PAGE_SIZE];
+		ASSERT(cur_page);
+		seg_len = read_compress_length(page_address(cur_page) +
+					       offset_in_page(cur_in));
+		cur_in += LZO_LEN;
+
+		/* Copy the compressed segment payload into workspace */
+		copy_compressed_segment(cb, workspace->cbuf, seg_len, &cur_in);
+
+		/* Decompress the data */
+		ret = lzo1x_decompress_safe(workspace->cbuf, seg_len,
+					    workspace->buf, &out_len);
 		if (ret != LZO_E_OK) {
-			pr_warn("BTRFS: decompress failed\n");
+			btrfs_err(fs_info, "failed to decompress");
 			ret = -EIO;
-			break;
+			goto out;
 		}
 
-		buf_start = tot_out;
-		tot_out += out_len;
+		/* Copy the data into inode pages */
+		ret = btrfs_decompress_buf2page(workspace->buf, out_len, cb, cur_out);
+		cur_out += out_len;
 
-		ret2 = btrfs_decompress_buf2page(workspace->buf, out_len,
-						 cb, buf_start);
-		if (ret2 == 0)
-			break;
+		/* All data read, exit */
+		if (ret == 0)
+			goto out;
+		ret = 0;
+
+		/* Check if the sector has enough space for a segment header */
+		sector_bytes_left = sectorsize - cur_in % sectorsize;
+		if (sector_bytes_left >= LZO_LEN)
+			continue;
+
+		/* Skip the padding zeros */
+		cur_in += sector_bytes_left;
 	}
-done:
-	kunmap(pages_in[page_in_index]);
+out:
 	if (!ret)
 		zero_fill_bio(cb->orig_bio);
 	return ret;
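The reworked loop walks the btrfs lzo stream format: a 4-byte stream-wide length, then per-segment 4-byte headers followed by compressed payloads, with zero padding whenever a sector tail is too small to hold another header. A minimal user-space sketch of the same cursor logic (hypothetical helpers and a toy 16-byte sector size, no kernel APIs) could look like:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LZO_LEN 4	/* segment header size, as in fs/btrfs/lzo.c */

/* Hypothetical stand-in for read_compress_length(): headers are stored
 * as native 32-bit values in this sketch. */
static uint32_t read_len(const uint8_t *p)
{
	uint32_t v;

	memcpy(&v, p, LZO_LEN);
	return v;
}

/*
 * Count the segments of an lzo stream, mirroring the cursor logic of the
 * reworked loop: read a header, skip the payload, then skip the sector
 * tail whenever it is too small to hold another header.
 */
static int count_segments(const uint8_t *buf, uint32_t len_in,
			  uint32_t sectorsize)
{
	uint32_t cur_in = LZO_LEN;	/* skip the stream-wide length header */
	int nr_segs = 0;

	while (cur_in < len_in) {
		uint32_t sector_bytes_left;
		uint32_t seg_len;

		/* A segment header must never cross a sector boundary */
		assert(cur_in / sectorsize ==
		       (cur_in + LZO_LEN - 1) / sectorsize);

		seg_len = read_len(buf + cur_in);
		cur_in += LZO_LEN + seg_len;	/* header + payload */
		nr_segs++;

		/* Skip the padding zeros at the sector tail */
		sector_bytes_left = sectorsize - cur_in % sectorsize;
		if (sector_bytes_left < LZO_LEN)
			cur_in += sector_bytes_left;
	}
	return nr_segs;
}
```

With 16-byte "sectors", a 5-byte payload ends at offset 13, leaving only 3 bytes in the sector, so the next header starts at the next sector boundary, which is exactly the padding case the kernel loop handles.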
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/4] btrfs: rework btrfs_decompress_buf2page()
  2021-06-30  9:32 ` [PATCH 3/4] btrfs: rework btrfs_decompress_buf2page() Qu Wenruo
@ 2021-06-30 10:50   ` Qu Wenruo
  0 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2021-06-30 10:50 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 2021/6/30 5:32 PM, Qu Wenruo wrote:
> There are several bugs inside the function btrfs_decompress_buf2page()
>
> - @start_byte doesn't take bvec.bv_offset into consideration
>    Thus it can't handle case where the target range is not page aligned.
>
> - Too many helper variables
>    There are tons of helper variables, @buf_offset, @current_buf_start,
>    @start_byte, @prev_start_byte, @working_bytes, @bytes.
>    This hurts anyone who wants to read the function.
>
> - No obvious main cursor for the iteration
>    A new problem caused by previous problem.
>
> - Comments for parameter list makes no sense
>    Like @buf_start: is it the offset into @buf, or the offset inside the
>    full decompressed extent? (Spoiler alert: the latter.)
>    And @total_out acts more like @buf_start + @size_of_buf.
>
>    The worst is @disk_start.
>    The real meaning of it is the file offset of the full decompressed
>    extent.
>
> This patch will rework the whole function by:
>
> - Add a proper comment with ASCII art to explain the parameter list
>
> - Rework parameter list
>    The old @buf_start is renamed to @decompressed, to show how many bytes
>    are already decompressed inside the full decompressed extent.
>    The old @total_out is replaced by @buf_len, which is the decompressed
>    data size.
>    For old @disk_start and @bio, just pass @compressed_bio in.
>
> - Use single main cursor
>    The main cursor will be @cur_file_offset, to show what's the current
>    file offset.
>    Other helper variables will be declared inside the main loop, and only
>    minimal amount of helper variables:
>    * offset_inside_decompressed_buf:	The only real helper
>    * copy_start_file_offset:		File offset we start memcpy
>    * bvec_file_offset:			File offset of current bvec
>
> Even with all these extensive comments, the final function is still
> smaller than the original function, which is definitely a win.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/compression.c | 159 ++++++++++++++++++++---------------------
>   fs/btrfs/compression.h |   5 +-
>   fs/btrfs/lzo.c         |   8 +--
>   fs/btrfs/zlib.c        |  12 ++--
>   fs/btrfs/zstd.c        |   6 +-
>   5 files changed, 89 insertions(+), 101 deletions(-)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 8318e56b5ab4..28f24c8ac3c1 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -1263,96 +1263,93 @@ void __cold btrfs_exit_compress(void)
>   }
>
>   /*
> - * Copy uncompressed data from working buffer to pages.
> + * Copy decompressed data from working buffer to pages.
>    *
> - * buf_start is the byte offset we're of the start of our workspace buffer.
> + * @buf:		The decompressed data buffer
> + * @buf_len:		The decompressed data length
> + * @decompressed:	Number of bytes that is already decompressed inside the
> + * 			compressed extent
> + * @cb:			The compressed extent descriptor
> + * @orig_bio:		The original bio that the caller wants to read for
>    *
> - * total_out is the last byte of the buffer
> + * An easier-to-understand diagram is below:
> + *
> + * 		|<- orig_bio ->|     |<- orig_bio->|
> + * 	|<-------      full decompressed extent      ----->|
> + * 	|<-----------    @cb range   ---->|
> + * 	|			|<-- @buf_len -->|
> + * 	|<--- @decompressed --->|
> + *
> + * Note that @cb can be a subpage of the full decompressed extent, but
> + * @cb->start is always the same as the orig_file_offset of the full
> + * decompressed extent.
> + *
> + * When reading a compressed extent, we have to read the full compressed
> + * extent, while @orig_bio may only want part of the range.
> + * Thus this function ensures that only the data covered by @orig_bio is
> + * copied.
> + *
> + * Return 0 if we have copied all needed contents for @orig_bio.
> + * Return >0 if we need to continue decompressing.
>    */
> -int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
> -			      unsigned long total_out, u64 disk_start,
> -			      struct bio *bio)
> +int btrfs_decompress_buf2page(const char *buf, u32 buf_len,
> +			      struct compressed_bio *cb, u32 decompressed)
>   {
> -	unsigned long buf_offset;
> -	unsigned long current_buf_start;
> -	unsigned long start_byte;
> -	unsigned long prev_start_byte;
> -	unsigned long working_bytes = total_out - buf_start;
> -	unsigned long bytes;
> -	struct bio_vec bvec = bio_iter_iovec(bio, bio->bi_iter);
> -
> -	/*
> -	 * start byte is the first byte of the page we're currently
> -	 * copying into relative to the start of the compressed data.
> -	 */
> -	start_byte = page_offset(bvec.bv_page) - disk_start;
> -
> -	/* we haven't yet hit data corresponding to this page */
> -	if (total_out <= start_byte)
> -		return 1;
> -
> -	/*
> -	 * the start of the data we care about is offset into
> -	 * the middle of our working buffer
> -	 */
> -	if (total_out > start_byte && buf_start < start_byte) {
> -		buf_offset = start_byte - buf_start;
> -		working_bytes -= buf_offset;
> -	} else {
> -		buf_offset = 0;
> -	}
> -	current_buf_start = buf_start;
> -
> -	/* copy bytes from the working buffer into the pages */
> -	while (working_bytes > 0) {
> -		bytes = min_t(unsigned long, bvec.bv_len,
> -				PAGE_SIZE - (buf_offset % PAGE_SIZE));
> -		bytes = min(bytes, working_bytes);
> -
> -		memcpy_to_page(bvec.bv_page, bvec.bv_offset, buf + buf_offset,
> -			       bytes);
> -		flush_dcache_page(bvec.bv_page);
> -
> -		buf_offset += bytes;
> -		working_bytes -= bytes;
> -		current_buf_start += bytes;
> -
> -		/* check if we need to pick another page */
> -		bio_advance(bio, bytes);
> -		if (!bio->bi_iter.bi_size)
> -			return 0;
> -		bvec = bio_iter_iovec(bio, bio->bi_iter);
> -		prev_start_byte = start_byte;
> -		start_byte = page_offset(bvec.bv_page) - disk_start;
> +	const u64 orig_file_offset = cb->start;

This part can underflow when the compressed extent is reflinked to a
lower file offset.

Thus we can't use file_offset as the main cursor, as an underflowed value
can easily screw up the later max/min calculations.

This can be exposed by btrfs/097.

I'll need to find a better cursor for this case then.

Thanks,
Qu
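The wrap-around is easy to see in isolation: in u64 arithmetic, subtracting a larger value from a smaller one produces a huge positive number instead of a negative one, and that wrapped value then wins every max()/min() comparison. A stand-alone illustration with toy offsets (not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of the cursor math: the offset of a bvec inside the
 * decompressed extent.  When a reflink maps the extent to a lower file
 * offset, bvec_off can be smaller than extent_start and the u64
 * subtraction wraps instead of going negative.
 */
static uint64_t offset_in_extent(uint64_t extent_start, uint64_t bvec_off)
{
	return bvec_off - extent_start;
}

static uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}
```

Once such a wrapped value enters the max()/min() chain, it dominates every comparison, which is why a file-offset cursor cannot survive the reflink case.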

> +	const u64 buf_file_offset = orig_file_offset + decompressed;
> +	struct bio *orig_bio = cb->orig_bio;
> +	u64 cur_file_offset = buf_file_offset;
> +
> +	/* The main loop to do the copy */
> +	while (cur_file_offset < buf_file_offset + buf_len) {
> +		struct bio_vec bvec = bio_iter_iovec(orig_bio,
> +						     orig_bio->bi_iter);
> +		size_t copy_len;
> +		u32 offset_inside_decompressed_buf;
> +		u64 copy_start_file_offset;
> +		u64 bvec_file_offset;
> +
> +		bvec = bio_iter_iovec(orig_bio, orig_bio->bi_iter);
> +		bvec_file_offset = page_offset(bvec.bv_page) + bvec.bv_offset;
> +
> +		/* Haven't reached the bvec range, exit */
> +		if (buf_file_offset + buf_len <= bvec_file_offset)
> +			return 1;
> +
> +		copy_start_file_offset = max(bvec_file_offset, cur_file_offset);
> +		copy_len = min(bvec_file_offset + bvec.bv_len,
> +			       buf_file_offset + buf_len) -
> +			   copy_start_file_offset;
> +		ASSERT(copy_len);
> +		ASSERT(copy_len <= buf_file_offset + buf_len -
> +				   copy_start_file_offset);
>
>   		/*
> -		 * We need to make sure we're only adjusting
> -		 * our offset into compression working buffer when
> -		 * we're switching pages.  Otherwise we can incorrectly
> -		 * keep copying when we were actually done.
> +		 * Extra range check to ensure we didn't go beyond
> +		 * @buf + @buf_len.
> +		 *
> +		 * (copy_start_file_offset - orig_file_offset) is the offset
> +		 * inside the full decompressed extent.
> +		 * then subtracting @decompressed gives the offset inside the
> +		 * decompressed data @buf, which should not exceed @buf_len.
>   		 */
> -		if (start_byte != prev_start_byte) {
> -			/*
> -			 * make sure our new page is covered by this
> -			 * working buffer
> -			 */
> -			if (total_out <= start_byte)
> -				return 1;
> -
> -			/*
> -			 * the next page in the biovec might not be adjacent
> -			 * to the last page, but it might still be found
> -			 * inside this working buffer. bump our offset pointer
> -			 */
> -			if (total_out > start_byte &&
> -			    current_buf_start < start_byte) {
> -				buf_offset = start_byte - buf_start;
> -				working_bytes = total_out - start_byte;
> -				current_buf_start = buf_start + buf_offset;
> -			}
> +		offset_inside_decompressed_buf = copy_start_file_offset -
> +						 orig_file_offset -
> +						 decompressed;
> +		ASSERT(offset_inside_decompressed_buf < buf_len);
> +		memcpy_to_page(bvec.bv_page, bvec.bv_offset,
> +			       buf + offset_inside_decompressed_buf,
> +			       copy_len);
> +		cur_file_offset += copy_len;
> +
> +		/* Check if we need to advance to the next bvec */
> +		if (cur_file_offset >= bvec_file_offset + bvec.bv_len) {
> +			bio_advance(orig_bio, copy_len);
> +			/* Finished the bio */
> +			if (!orig_bio->bi_iter.bi_size)
> +				return 0;
>   		}
>   	}
> -
>   	return 1;
>   }
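The core of the loop above is an interval intersection in file-offset space: the copied range is the overlap between the current bvec and the decompressed buffer, starting no earlier than the running cursor. A minimal model of that calculation (plain offsets, no bio machinery; the cursor is assumed to lie inside the buffer range):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/*
 * Overlap length between the bvec range [bvec_off, bvec_off + bvec_len)
 * and the buffer range [buf_off, buf_off + buf_len), starting no earlier
 * than the running cursor @cur -- the same max()/min() dance as
 * copy_start_file_offset / copy_len in the reworked function.
 */
static uint64_t copy_len(uint64_t bvec_off, uint64_t bvec_len,
			 uint64_t buf_off, uint64_t buf_len, uint64_t cur)
{
	uint64_t copy_start = max_u64(bvec_off, cur);

	if (buf_off + buf_len <= bvec_off)	/* buffer ends before bvec */
		return 0;
	return min_u64(bvec_off + bvec_len, buf_off + buf_len) - copy_start;
}
```

This is where the underflow noted above bites: the model only works if all the offsets live in the same, non-wrapped coordinate space.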
>
> diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
> index c359f20920d0..399be0b435bf 100644
> --- a/fs/btrfs/compression.h
> +++ b/fs/btrfs/compression.h
> @@ -86,9 +86,8 @@ int btrfs_compress_pages(unsigned int type_level, struct address_space *mapping,
>   			 unsigned long *total_out);
>   int btrfs_decompress(int type, unsigned char *data_in, struct page *dest_page,
>   		     unsigned long start_byte, size_t srclen, size_t destlen);
> -int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
> -			      unsigned long total_out, u64 disk_start,
> -			      struct bio *bio);
> +int btrfs_decompress_buf2page(const char *buf, u32 buf_len,
> +			      struct compressed_bio *cb, u32 decompressed);
>
>   blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
>   				  unsigned int len, u64 disk_start,
> diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
> index 2bebb60c5830..2dbbfd33e5a5 100644
> --- a/fs/btrfs/lzo.c
> +++ b/fs/btrfs/lzo.c
> @@ -301,8 +301,6 @@ int lzo_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
>   	char *buf;
>   	bool may_late_unmap, need_unmap;
>   	struct page **pages_in = cb->compressed_pages;
> -	u64 disk_start = cb->start;
> -	struct bio *orig_bio = cb->orig_bio;
>
>   	data_in = kmap(pages_in[0]);
>   	tot_len = read_compress_length(data_in);
> @@ -407,15 +405,15 @@ int lzo_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
>   		buf_start = tot_out;
>   		tot_out += out_len;
>
> -		ret2 = btrfs_decompress_buf2page(workspace->buf, buf_start,
> -						 tot_out, disk_start, orig_bio);
> +		ret2 = btrfs_decompress_buf2page(workspace->buf, out_len,
> +						 cb, buf_start);
>   		if (ret2 == 0)
>   			break;
>   	}
>   done:
>   	kunmap(pages_in[page_in_index]);
>   	if (!ret)
> -		zero_fill_bio(orig_bio);
> +		zero_fill_bio(cb->orig_bio);
>   	return ret;
>   }
>
> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
> index 2c792bc5a987..767a0c6c9694 100644
> --- a/fs/btrfs/zlib.c
> +++ b/fs/btrfs/zlib.c
> @@ -286,8 +286,6 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
>   	unsigned long total_pages_in = DIV_ROUND_UP(srclen, PAGE_SIZE);
>   	unsigned long buf_start;
>   	struct page **pages_in = cb->compressed_pages;
> -	u64 disk_start = cb->start;
> -	struct bio *orig_bio = cb->orig_bio;
>
>   	data_in = kmap(pages_in[page_in_index]);
>   	workspace->strm.next_in = data_in;
> @@ -326,9 +324,8 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
>   		if (buf_start == total_out)
>   			break;
>
> -		ret2 = btrfs_decompress_buf2page(workspace->buf, buf_start,
> -						 total_out, disk_start,
> -						 orig_bio);
> +		ret2 = btrfs_decompress_buf2page(workspace->buf,
> +				total_out - buf_start, cb, buf_start);
>   		if (ret2 == 0) {
>   			ret = 0;
>   			goto done;
> @@ -348,8 +345,7 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
>   			data_in = kmap(pages_in[page_in_index]);
>   			workspace->strm.next_in = data_in;
>   			tmp = srclen - workspace->strm.total_in;
> -			workspace->strm.avail_in = min(tmp,
> -							   PAGE_SIZE);
> +			workspace->strm.avail_in = min(tmp, PAGE_SIZE);
>   		}
>   	}
>   	if (ret != Z_STREAM_END)
> @@ -361,7 +357,7 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
>   	if (data_in)
>   		kunmap(pages_in[page_in_index]);
>   	if (!ret)
> -		zero_fill_bio(orig_bio);
> +		zero_fill_bio(cb->orig_bio);
>   	return ret;
>   }
>
> diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c
> index 9451d2bb984e..f06b68040352 100644
> --- a/fs/btrfs/zstd.c
> +++ b/fs/btrfs/zstd.c
> @@ -547,8 +547,6 @@ int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
>   {
>   	struct workspace *workspace = list_entry(ws, struct workspace, list);
>   	struct page **pages_in = cb->compressed_pages;
> -	u64 disk_start = cb->start;
> -	struct bio *orig_bio = cb->orig_bio;
>   	size_t srclen = cb->compressed_len;
>   	ZSTD_DStream *stream;
>   	int ret = 0;
> @@ -589,7 +587,7 @@ int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
>   		workspace->out_buf.pos = 0;
>
>   		ret = btrfs_decompress_buf2page(workspace->out_buf.dst,
> -				buf_start, total_out, disk_start, orig_bio);
> +				total_out - buf_start, cb, buf_start);
>   		if (ret == 0)
>   			break;
>
> @@ -614,7 +612,7 @@ int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
>   		}
>   	}
>   	ret = 0;
> -	zero_fill_bio(orig_bio);
> +	zero_fill_bio(cb->orig_bio);
>   done:
>   	if (workspace->in_buf.src)
>   		kunmap(pages_in[page_in_index]);
>


* Re: [PATCH 2/4] btrfs: remove the GFP_HIGHMEM flag for compression code
  2021-06-30  9:32 ` [PATCH 2/4] btrfs: remove the GFP_HIGHMEM flag for compression code Qu Wenruo
@ 2021-06-30 13:06   ` David Sterba
  2021-06-30 13:13     ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: David Sterba @ 2021-06-30 13:06 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Jun 30, 2021 at 05:32:31PM +0800, Qu Wenruo wrote:
> This allows later decompress functions to get rid of kmap()/kunmap()
> pairs.
> 
> And since all other filesystems are getting rid of HIGHMEM, it should
> not be a problem for btrfs.
> 
> Although we removed the HIGHMEM allocation, we still keep the
> kmap()/kunmap() pairs.
> They will be removed when involved functions are refactored later.

Without removing the kmaps it's incomplete so I'll post the series
removing it from the compression code at least.
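The underlying point of the patch — that for a lowmem page kmap() reduces to page_address() and carries no state, so the remaining map/unmap pairs become droppable no-ops — can be sketched outside the kernel (toy structures, not the real implementation):

```c
#include <assert.h>

/* Toy page: a "highmem" page has no permanent mapping and needs a
 * temporary one; a lowmem page is always directly addressable. */
struct toy_page {
	int highmem;
	int map_count;	/* outstanding temporary mappings */
	char data[16];
};

static char *toy_kmap(struct toy_page *p)
{
	if (p->highmem)
		p->map_count++;	/* would install a temporary mapping */
	return p->data;		/* lowmem: effectively page_address() */
}

static void toy_kunmap(struct toy_page *p)
{
	if (p->highmem)
		p->map_count--;
}
```

Once no page in the pipeline is highmem, every toy_kmap()/toy_kunmap() pair is pure overhead, which is why the later refactors can delete them wholesale.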

> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/compression.c | 3 +--
>  fs/btrfs/lzo.c         | 4 ++--
>  fs/btrfs/zlib.c        | 6 +++---
>  fs/btrfs/zstd.c        | 6 +++---
>  4 files changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 19da933c5f1c..8318e56b5ab4 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -724,8 +724,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
>  		goto fail1;
>  
>  	for (pg_index = 0; pg_index < nr_pages; pg_index++) {
> -		cb->compressed_pages[pg_index] = alloc_page(GFP_NOFS |
> -							      __GFP_HIGHMEM);
> +		cb->compressed_pages[pg_index] = alloc_page(GFP_NOFS);
>  		if (!cb->compressed_pages[pg_index]) {
>  			faili = pg_index - 1;
>  			ret = BLK_STS_RESOURCE;
> diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
> index cd042c7567a4..2bebb60c5830 100644
> --- a/fs/btrfs/lzo.c
> +++ b/fs/btrfs/lzo.c
> @@ -146,7 +146,7 @@ int lzo_compress_pages(struct list_head *ws, struct address_space *mapping,
>  	 * store the size of all chunks of compressed data in
>  	 * the first 4 bytes
>  	 */
> -	out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> +	out_page = alloc_page(GFP_NOFS);
>  	if (out_page == NULL) {
>  		ret = -ENOMEM;
>  		goto out;
> @@ -216,7 +216,7 @@ int lzo_compress_pages(struct list_head *ws, struct address_space *mapping,
>  					goto out;
>  				}
>  
> -				out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> +				out_page = alloc_page(GFP_NOFS);
>  				if (out_page == NULL) {
>  					ret = -ENOMEM;
>  					goto out;
> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
> index c3fa7d3fa770..2c792bc5a987 100644
> --- a/fs/btrfs/zlib.c
> +++ b/fs/btrfs/zlib.c
> @@ -121,7 +121,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>  	workspace->strm.total_in = 0;
>  	workspace->strm.total_out = 0;
>  
> -	out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> +	out_page = alloc_page(GFP_NOFS);
>  	if (out_page == NULL) {
>  		ret = -ENOMEM;
>  		goto out;
> @@ -202,7 +202,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>  				ret = -E2BIG;
>  				goto out;
>  			}
> -			out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> +			out_page = alloc_page(GFP_NOFS);
>  			if (out_page == NULL) {
>  				ret = -ENOMEM;
>  				goto out;
> @@ -240,7 +240,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>  				ret = -E2BIG;
>  				goto out;
>  			}
> -			out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> +			out_page = alloc_page(GFP_NOFS);
>  			if (out_page == NULL) {
>  				ret = -ENOMEM;
>  				goto out;
> diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c
> index 3e26b466476a..9451d2bb984e 100644
> --- a/fs/btrfs/zstd.c
> +++ b/fs/btrfs/zstd.c
> @@ -405,7 +405,7 @@ int zstd_compress_pages(struct list_head *ws, struct address_space *mapping,
>  
>  
>  	/* Allocate and map in the output buffer */
> -	out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> +	out_page = alloc_page(GFP_NOFS);
>  	if (out_page == NULL) {
>  		ret = -ENOMEM;
>  		goto out;
> @@ -452,7 +452,7 @@ int zstd_compress_pages(struct list_head *ws, struct address_space *mapping,
>  				ret = -E2BIG;
>  				goto out;
>  			}
> -			out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> +			out_page = alloc_page(GFP_NOFS);
>  			if (out_page == NULL) {
>  				ret = -ENOMEM;
>  				goto out;
> @@ -512,7 +512,7 @@ int zstd_compress_pages(struct list_head *ws, struct address_space *mapping,
>  			ret = -E2BIG;
>  			goto out;
>  		}
> -		out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> +		out_page = alloc_page(GFP_NOFS);
>  		if (out_page == NULL) {
>  			ret = -ENOMEM;
>  			goto out;
> -- 
> 2.32.0


* Re: [PATCH 2/4] btrfs: remove the GFP_HIGHMEM flag for compression code
  2021-06-30 13:06   ` David Sterba
@ 2021-06-30 13:13     ` Qu Wenruo
  2021-06-30 13:26       ` David Sterba
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2021-06-30 13:13 UTC (permalink / raw)
  To: dsterba, linux-btrfs



On 2021/6/30 9:06 PM, David Sterba wrote:
> On Wed, Jun 30, 2021 at 05:32:31PM +0800, Qu Wenruo wrote:
>> This allows later decompress functions to get rid of kmap()/kunmap()
>> pairs.
>>
>> And since all other filesystems are getting rid of HIGHMEM, it should
>> not be a problem for btrfs.
>>
>> Although we removed the HIGHMEM allocation, we still keep the
>> kmap()/kunmap() pairs.
>> They will be removed when involved functions are refactored later.
> 
> Without removing the kmaps it's incomplete so I'll post the series
> removing it from the compression code at least.

But kmap()/kunmap() can still work on those pages, right?

This is just a preparation for the later patches which refactor 
lzo_decompress_bio() and btrfs_decompress_buf2page().

Thus I don't think we need to do that in one go.

Thanks,
Qu
> 
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/compression.c | 3 +--
>>   fs/btrfs/lzo.c         | 4 ++--
>>   fs/btrfs/zlib.c        | 6 +++---
>>   fs/btrfs/zstd.c        | 6 +++---
>>   4 files changed, 9 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
>> index 19da933c5f1c..8318e56b5ab4 100644
>> --- a/fs/btrfs/compression.c
>> +++ b/fs/btrfs/compression.c
>> @@ -724,8 +724,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
>>   		goto fail1;
>>   
>>   	for (pg_index = 0; pg_index < nr_pages; pg_index++) {
>> -		cb->compressed_pages[pg_index] = alloc_page(GFP_NOFS |
>> -							      __GFP_HIGHMEM);
>> +		cb->compressed_pages[pg_index] = alloc_page(GFP_NOFS);
>>   		if (!cb->compressed_pages[pg_index]) {
>>   			faili = pg_index - 1;
>>   			ret = BLK_STS_RESOURCE;
>> diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
>> index cd042c7567a4..2bebb60c5830 100644
>> --- a/fs/btrfs/lzo.c
>> +++ b/fs/btrfs/lzo.c
>> @@ -146,7 +146,7 @@ int lzo_compress_pages(struct list_head *ws, struct address_space *mapping,
>>   	 * store the size of all chunks of compressed data in
>>   	 * the first 4 bytes
>>   	 */
>> -	out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
>> +	out_page = alloc_page(GFP_NOFS);
>>   	if (out_page == NULL) {
>>   		ret = -ENOMEM;
>>   		goto out;
>> @@ -216,7 +216,7 @@ int lzo_compress_pages(struct list_head *ws, struct address_space *mapping,
>>   					goto out;
>>   				}
>>   
>> -				out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
>> +				out_page = alloc_page(GFP_NOFS);
>>   				if (out_page == NULL) {
>>   					ret = -ENOMEM;
>>   					goto out;
>> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
>> index c3fa7d3fa770..2c792bc5a987 100644
>> --- a/fs/btrfs/zlib.c
>> +++ b/fs/btrfs/zlib.c
>> @@ -121,7 +121,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>>   	workspace->strm.total_in = 0;
>>   	workspace->strm.total_out = 0;
>>   
>> -	out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
>> +	out_page = alloc_page(GFP_NOFS);
>>   	if (out_page == NULL) {
>>   		ret = -ENOMEM;
>>   		goto out;
>> @@ -202,7 +202,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>>   				ret = -E2BIG;
>>   				goto out;
>>   			}
>> -			out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
>> +			out_page = alloc_page(GFP_NOFS);
>>   			if (out_page == NULL) {
>>   				ret = -ENOMEM;
>>   				goto out;
>> @@ -240,7 +240,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>>   				ret = -E2BIG;
>>   				goto out;
>>   			}
>> -			out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
>> +			out_page = alloc_page(GFP_NOFS);
>>   			if (out_page == NULL) {
>>   				ret = -ENOMEM;
>>   				goto out;
>> diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c
>> index 3e26b466476a..9451d2bb984e 100644
>> --- a/fs/btrfs/zstd.c
>> +++ b/fs/btrfs/zstd.c
>> @@ -405,7 +405,7 @@ int zstd_compress_pages(struct list_head *ws, struct address_space *mapping,
>>   
>>   
>>   	/* Allocate and map in the output buffer */
>> -	out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
>> +	out_page = alloc_page(GFP_NOFS);
>>   	if (out_page == NULL) {
>>   		ret = -ENOMEM;
>>   		goto out;
>> @@ -452,7 +452,7 @@ int zstd_compress_pages(struct list_head *ws, struct address_space *mapping,
>>   				ret = -E2BIG;
>>   				goto out;
>>   			}
>> -			out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
>> +			out_page = alloc_page(GFP_NOFS);
>>   			if (out_page == NULL) {
>>   				ret = -ENOMEM;
>>   				goto out;
>> @@ -512,7 +512,7 @@ int zstd_compress_pages(struct list_head *ws, struct address_space *mapping,
>>   			ret = -E2BIG;
>>   			goto out;
>>   		}
>> -		out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
>> +		out_page = alloc_page(GFP_NOFS);
>>   		if (out_page == NULL) {
>>   			ret = -ENOMEM;
>>   			goto out;
>> -- 
>> 2.32.0
> 



* Re: [PATCH 2/4] btrfs: remove the GFP_HIGHMEM flag for compression code
  2021-06-30 13:13     ` Qu Wenruo
@ 2021-06-30 13:26       ` David Sterba
  0 siblings, 0 replies; 9+ messages in thread
From: David Sterba @ 2021-06-30 13:26 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, linux-btrfs

On Wed, Jun 30, 2021 at 09:13:03PM +0800, Qu Wenruo wrote:
> 
> 
> On 2021/6/30 9:06 PM, David Sterba wrote:
> > On Wed, Jun 30, 2021 at 05:32:31PM +0800, Qu Wenruo wrote:
> >> This allows later decompress functions to get rid of kmap()/kunmap()
> >> pairs.
> >>
> >> And since all other filesystems are getting rid of HIGHMEM, it should
> >> not be a problem for btrfs.
> >>
> >> Although we removed the HIGHMEM allocation, we still keep the
> >> kmap()/kunmap() pairs.
> >> They will be removed when involved functions are refactored later.
> > 
> > Without removing the kmaps it's incomplete so I'll post the series
> > removing it from the compression code at least.
> 
> But kmap()/kunmap() can still work on those pages, right?
> 
> This is just a preparation for the later patches which refactor 
> lzo_decompress_bio() and btrfs_decompress_buf2page().
> 
> Thus I don't think we need to do that in one go.

Either way but please slow down with the patches, you have too many in
flight. We need to get something merged first, the merge window is fine
so far so we can do that. There are preparatory patches and the real
subpage work so there are dependencies, I'd like to avoid unnecessary
conflicts and resends due to refreshes.


end of thread, other threads:[~2021-06-30 13:29 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-30  9:32 [PATCH 0/4] btrfs: subpage compressed read path fixes Qu Wenruo
2021-06-30  9:32 ` [PATCH 1/4] btrfs: grab correct extent map for subpage compressed extent read Qu Wenruo
2021-06-30  9:32 ` [PATCH 2/4] btrfs: remove the GFP_HIGHMEM flag for compression code Qu Wenruo
2021-06-30 13:06   ` David Sterba
2021-06-30 13:13     ` Qu Wenruo
2021-06-30 13:26       ` David Sterba
2021-06-30  9:32 ` [PATCH 3/4] btrfs: rework btrfs_decompress_buf2page() Qu Wenruo
2021-06-30 10:50   ` Qu Wenruo
2021-06-30  9:32 ` [PATCH 4/4] btrfs: rework lzo_decompress_bio() to make it subpage compatible Qu Wenruo
