linux-btrfs.vger.kernel.org archive mirror
* [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag
@ 2021-08-06  8:12 Qu Wenruo
  2021-08-06  8:12 ` [PATCH v5 01/11] btrfs: defrag: pass file_ra_state instead of file for btrfs_defrag_file() Qu Wenruo
                   ` (11 more replies)
  0 siblings, 12 replies; 21+ messages in thread
From: Qu Wenruo @ 2021-08-06  8:12 UTC (permalink / raw)
  To: linux-btrfs

This branch is based on David's misc-next branch, which already has the
subpage read-write support included.

[BACKGROUND]
In the current misc-next branch, defrag is disabled completely because
the current code can only work on a per-page basis.

Without proper subpage defrag support, it could lead to problems like
the btrfs/062 crash.

Thus this patchset makes defrag work for both regular and subpage
sectorsize.

[SOLUTION]
To defrag a file range, what we do is pretty much like a buffered write,
except we don't really write any new data to the page cache, but just
mark the range dirty.

Then later writeback will merge the range into a larger extent.

But the current defrag code works on a per-page basis, not per-sector,
thus we have to refactor it a little to make it work properly for
subpage.

This patchset separates the code into 4 layers (a rough call-flow
sketch follows the list):
Layer 0:	btrfs_defrag_file()
		The defrag entrance.
		Just does the proper inode locking and splits the file
		into page aligned 256K clusters to defrag.

Layer 1:	defrag_one_cluster()
		Collects the initial target file extents, and passes
		each continuous target to defrag_one_range().

Layer 2:	defrag_one_range()
		Prepares the needed page and extent locking, then
		re-checks the range for the real target list, as the
		initial target list is not consistent since it doesn't
		have page/extent locking to prevent hole punching.

Layer 3:	defrag_one_locked_target()
		The real work: defrags the extent range and updates the
		involved page status.
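
For illustration only, here is a rough user-space sketch of how the four
layers nest. The names mirror the kernel helpers, but all types and bodies
are simplified stand-ins, not the actual btrfs code:

#include <stdint.h>
#include <stdio.h>

#define CLUSTER_SIZE	(256 * 1024)

/* Layer 2 + 3 stand-in: defrag one continuous target inside a cluster */
static void defrag_one_range_sketch(uint64_t start, uint64_t len)
{
	/*
	 * In the kernel this locks the pages and extent bits, re-collects
	 * the targets, and defrag_one_locked_target() marks them dirty.
	 */
	printf("  defrag range [%llu, %llu)\n",
	       (unsigned long long)start, (unsigned long long)(start + len));
}

/* Layer 1 stand-in: collect initial targets in one cluster, defrag each */
static void defrag_one_cluster_sketch(uint64_t start, uint64_t len)
{
	/* In the kernel, defrag_collect_targets() builds the target list */
	defrag_one_range_sketch(start, len);
}

/* Layer 0 stand-in: split the file range into page aligned 256K clusters */
static void btrfs_defrag_file_sketch(uint64_t start, uint64_t end)
{
	for (uint64_t cur = start; cur < end; cur += CLUSTER_SIZE) {
		uint64_t cluster_end = cur + CLUSTER_SIZE;

		if (cluster_end > end)
			cluster_end = end;
		defrag_one_cluster_sketch(cur, cluster_end - cur);
	}
}

int main(void)
{
	btrfs_defrag_file_sketch(0, 1024 * 1024);
	return 0;
}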

[BEHAVIOR CHANGE]
In the refactor, there is one behavior change:

- Defragged sector counter is based on the initial target list
  This is mostly to avoid passing the parameters too deep into
  defrag_one_locked_target().
  Considering the accounting is not that important, we can afford some
  difference.

[PATCH STRUCTURE]
Patch 01~04:	Small independent refactor to improve readability
Patch 05~09:	Implement the more readable and subpage friendly defrag
Patch 10:	Cleanup of old infrastructure 
Patch 11:	Enable defrag for subpage case

Now both regular sectorsize and subpage sectorsize can pass the defrag
test group.
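
For reference, the defrag path can also be exercised directly from user
space with the BTRFS_IOC_DEFRAG_RANGE ioctl from <linux/btrfs.h>. Below is
a minimal, self-contained example (not part of this series, error handling
trimmed):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

int main(int argc, char **argv)
{
	struct btrfs_ioctl_defrag_range_args range;
	int fd;
	int ret;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	memset(&range, 0, sizeof(range));
	range.start = 0;
	range.len = (__u64)-1;			/* defrag the whole file */
	range.extent_thresh = 256 * 1024;	/* same default as the kernel */
	range.flags = BTRFS_DEFRAG_RANGE_START_IO;

	ret = ioctl(fd, BTRFS_IOC_DEFRAG_RANGE, &range);
	if (ret < 0)
		perror("BTRFS_IOC_DEFRAG_RANGE");
	close(fd);
	return ret < 0 ? 1 : 0;
}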

[CHANGELOG]
v2:
- Make sure we won't defrag holes
  This is done by re-collecting the target list after we have the pages
  and extents locked, so that we have a consistent view of the extent map.

- Add a new layer to avoid variable naming bugs
  Since we need to handle the real target list inside defrag_one_range(),
  and that function has parameters like "start" and "len", while inside
  the loop we need things like "entry->start" and "entry->len", this has
  already caused hard-to-debug bugs during development.

  Thus introduce a new layer, defrag_one_range(), to prepare the
  pages/extent lock, then pass each entry to defrag_one_locked_target().

v3:
- Fix extent_state leak
  Now we pass the @cached_state to defrag_one_locked_target() rather
  than allowing it to allocate a new one.

  This can be reproduced by enabling the "TEST_FS_MODULE_RELOAD"
  environment variable for fstests and running the "-g defrag" group.

- Fix a random hang in test cases like btrfs/062
  defrag_one_range() will lock the extent range first, then call
  defrag_collect_targets(), which will in turn call
  defrag_lookup_extent() and lock the extent range again.

  This causes a deadlock, and it only happens when the extent range is
  smaller than the original locked range.
  Thus sometimes the test can pass, but sometimes it can hang.

  Fix it by teaching defrag_collect_targets() and defrag_lookup_extent()
  to skip the extent lock for certain call sites.
  Reproducing the hang needs several runs of btrfs/062 in a loop; the
  hang probability is around 1/3 to 1/2.

v4:
- Fix a call site which can cause a hang due to extent locking
  The call site is defrag_lookup_extent() in defrag_collect_targets(),
  which has one call site that is called with the extent lock held.
  Thus it also needs to pass the @locked parameter.

- Fix a typo
  "defraged" -> "defragged"

v5:
- Fix the btrfs/072 test failure
  Now btrfs/072 no longer reports inode nbytes mismatch error

- Add one patch to make cluster_pages_for_defrag() subpage
  compatible

- Move the page writeback wait out of the page preparation loop
  To keep the same old behavior.

- Use pgoff_t for page index

Qu Wenruo (11):
  btrfs: defrag: pass file_ra_state instead of file for
    btrfs_defrag_file()
  btrfs: defrag: also check PagePrivate for subpage cases in
    cluster_pages_for_defrag()
  btrfs: defrag: replace hard coded PAGE_SIZE to sectorsize
  btrfs: defrag: extract the page preparation code into one helper
  btrfs: defrag: introduce a new helper to collect target file extents
  btrfs: defrag: introduce a helper to defrag a continuous prepared
    range
  btrfs: defrag: introduce a helper to defrag a range
  btrfs: defrag: introduce a new helper to defrag one cluster
  btrfs: defrag: use defrag_one_cluster() to implement
    btrfs_defrag_file()
  btrfs: defrag: remove the old infrastructure
  btrfs: defrag: enable defrag for subpage case

 fs/btrfs/ctree.h |   4 +-
 fs/btrfs/ioctl.c | 911 ++++++++++++++++++++++-------------------------
 2 files changed, 429 insertions(+), 486 deletions(-)

-- 
2.32.0


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v5 01/11] btrfs: defrag: pass file_ra_state instead of file for btrfs_defrag_file()
  2021-08-06  8:12 [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag Qu Wenruo
@ 2021-08-06  8:12 ` Qu Wenruo
  2021-08-06  8:12 ` [PATCH v5 02/11] btrfs: defrag: also check PagePrivate for subpage cases in cluster_pages_for_defrag() Qu Wenruo
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Qu Wenruo @ 2021-08-06  8:12 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Johannes Thumshirn

Currently btrfs_defrag_file() accepts both "struct inode" and "struct
file" as parameters, which can't be more confusing, as we can easily
grab "struct inode" from "struct file" using the file_inode() helper.

The only reason we need "struct file" is to reuse its f_ra.

This patch changes that by passing a "struct file_ra_state" parameter
instead, so that it's clearer what we really want.

While at it, also add some comments to btrfs_defrag_file() to make
later readers less miserable.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/ctree.h |  4 ++--
 fs/btrfs/ioctl.c | 27 ++++++++++++++++++---------
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f17be4b023cb..1723e8945ab2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3237,9 +3237,9 @@ int btrfs_fileattr_set(struct user_namespace *mnt_userns,
 int btrfs_ioctl_get_supported_features(void __user *arg);
 void btrfs_sync_inode_flags_to_i_flags(struct inode *inode);
 int __pure btrfs_is_empty_uuid(u8 *uuid);
-int btrfs_defrag_file(struct inode *inode, struct file *file,
+int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 		      struct btrfs_ioctl_defrag_range_args *range,
-		      u64 newer_than, unsigned long max_pages);
+		      u64 newer_than, unsigned long max_to_defrag);
 void btrfs_get_block_group_info(struct list_head *groups_list,
 				struct btrfs_ioctl_space_info *space);
 void btrfs_update_ioctl_balance_args(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 85c8b5a87a6a..e63961192581 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1397,13 +1397,22 @@ static int cluster_pages_for_defrag(struct inode *inode,
 
 }
 
-int btrfs_defrag_file(struct inode *inode, struct file *file,
+/*
+ * Btrfs entrance for defrag.
+ *
+ * @inode:	   Inode to be defragged
+ * @ra:		   Readahead state. If NULL, one will be allocated at runtime.
+ * @range:	   Defrag options including range and flags.
+ * @newer_than:	   Minimal transid to defrag
+ * @max_to_defrag: Max number of sectors to be defragged, if 0, the whole inode
+ *		   will be defragged.
+ */
+int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 		      struct btrfs_ioctl_defrag_range_args *range,
 		      u64 newer_than, unsigned long max_to_defrag)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct file_ra_state *ra = NULL;
 	unsigned long last_index;
 	u64 isize = i_size_read(inode);
 	u64 last_len = 0;
@@ -1421,6 +1430,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 	u64 new_align = ~((u64)SZ_128K - 1);
 	struct page **pages = NULL;
 	bool do_compress = range->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
+	bool ra_allocated = false;
 
 	if (isize == 0)
 		return 0;
@@ -1439,16 +1449,15 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		extent_thresh = SZ_256K;
 
 	/*
-	 * If we were not given a file, allocate a readahead context. As
+	 * If we were not given a ra, allocate a readahead context. As
 	 * readahead is just an optimization, defrag will work without it so
 	 * we don't error out.
 	 */
-	if (!file) {
+	if (!ra) {
+		ra_allocated = true;
 		ra = kzalloc(sizeof(*ra), GFP_KERNEL);
 		if (ra)
 			file_ra_state_init(ra, inode->i_mapping);
-	} else {
-		ra = &file->f_ra;
 	}
 
 	pages = kmalloc_array(max_cluster, sizeof(struct page *), GFP_KERNEL);
@@ -1530,7 +1539,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 			ra_index = max(i, ra_index);
 			if (ra)
 				page_cache_sync_readahead(inode->i_mapping, ra,
-						file, ra_index, cluster);
+						NULL, ra_index, cluster);
 			ra_index += cluster;
 		}
 
@@ -1601,7 +1610,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		BTRFS_I(inode)->defrag_compress = BTRFS_COMPRESS_NONE;
 		btrfs_inode_unlock(inode, 0);
 	}
-	if (!file)
+	if (ra_allocated)
 		kfree(ra);
 	kfree(pages);
 	return ret;
@@ -3170,7 +3179,7 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
 			/* the rest are all set to zero by kzalloc */
 			range->len = (u64)-1;
 		}
-		ret = btrfs_defrag_file(file_inode(file), file,
+		ret = btrfs_defrag_file(file_inode(file), &file->f_ra,
 					range, BTRFS_OLDEST_GENERATION, 0);
 		if (ret > 0)
 			ret = 0;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 02/11] btrfs: defrag: also check PagePrivate for subpage cases in cluster_pages_for_defrag()
  2021-08-06  8:12 [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag Qu Wenruo
  2021-08-06  8:12 ` [PATCH v5 01/11] btrfs: defrag: pass file_ra_state instead of file for btrfs_defrag_file() Qu Wenruo
@ 2021-08-06  8:12 ` Qu Wenruo
  2021-08-06  8:12 ` [PATCH v5 03/11] btrfs: defrag: replace hard coded PAGE_SIZE to sectorsize Qu Wenruo
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Qu Wenruo @ 2021-08-06  8:12 UTC (permalink / raw)
  To: linux-btrfs

In cluster_pages_for_defrag() we have a window where we unlock the page
to either start the ordered extent or read the content from disk.

When we re-lock the page, we need to make sure it still has the correct
page->private for subpage.

Thus add the extra PagePrivate check here to handle subpage cases
properly.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ioctl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index e63961192581..7c1ea4eed490 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1272,7 +1272,8 @@ static int cluster_pages_for_defrag(struct inode *inode,
 			 * we unlocked the page above, so we need check if
 			 * it was released or not.
 			 */
-			if (page->mapping != inode->i_mapping) {
+			if (page->mapping != inode->i_mapping ||
+			    !PagePrivate(page)) {
 				unlock_page(page);
 				put_page(page);
 				goto again;
@@ -1290,7 +1291,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 			}
 		}
 
-		if (page->mapping != inode->i_mapping) {
+		if (page->mapping != inode->i_mapping || !PagePrivate(page)) {
 			unlock_page(page);
 			put_page(page);
 			goto again;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 03/11] btrfs: defrag: replace hard coded PAGE_SIZE to sectorsize
  2021-08-06  8:12 [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag Qu Wenruo
  2021-08-06  8:12 ` [PATCH v5 01/11] btrfs: defrag: pass file_ra_state instead of file for btrfs_defrag_file() Qu Wenruo
  2021-08-06  8:12 ` [PATCH v5 02/11] btrfs: defrag: also check PagePrivate for subpage cases in cluster_pages_for_defrag() Qu Wenruo
@ 2021-08-06  8:12 ` Qu Wenruo
  2021-08-06  8:12 ` [PATCH v5 04/11] btrfs: defrag: extract the page preparation code into one helper Qu Wenruo
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Qu Wenruo @ 2021-08-06  8:12 UTC (permalink / raw)
  To: linux-btrfs

When testing subpage defrag support, I kept hitting strange inode
nbytes errors.  After a lot of debugging, it turned out that
defrag_lookup_extent() is using PAGE_SIZE as the length for
lookup_extent_mapping().

Since lookup_extent_mapping() calls __lookup_extent_mapping() with
@strict == 1, any extent map smaller than one page will be ignored,
preventing subpage defrag from grabbing the correct extent map.

There are quite a few PAGE_SIZE usages in ioctl.c, but most of them are
correct and fall into one of the following cases:

- ioctl structure size checks
  We want the ioctl structure to be contained inside one page.

- real page operations

The remaining cases, in defrag_lookup_extent() and
check_defrag_in_cache(), are addressed in this patch.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ioctl.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 7c1ea4eed490..5545f33c6a98 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -991,10 +991,11 @@ static int check_defrag_in_cache(struct inode *inode, u64 offset, u32 thresh)
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct extent_map *em = NULL;
 	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
+	const u32 sectorsize = btrfs_sb(inode->i_sb)->sectorsize;
 	u64 end;
 
 	read_lock(&em_tree->lock);
-	em = lookup_extent_mapping(em_tree, offset, PAGE_SIZE);
+	em = lookup_extent_mapping(em_tree, offset, sectorsize);
 	read_unlock(&em_tree->lock);
 
 	if (em) {
@@ -1084,23 +1085,24 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start)
 	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct extent_map *em;
-	u64 len = PAGE_SIZE;
+	const u32 sectorsize = BTRFS_I(inode)->root->fs_info->sectorsize;
 
 	/*
 	 * hopefully we have this extent in the tree already, try without
 	 * the full extent lock
 	 */
 	read_lock(&em_tree->lock);
-	em = lookup_extent_mapping(em_tree, start, len);
+	em = lookup_extent_mapping(em_tree, start, sectorsize);
 	read_unlock(&em_tree->lock);
 
 	if (!em) {
 		struct extent_state *cached = NULL;
-		u64 end = start + len - 1;
+		u64 end = start + sectorsize - 1;
 
 		/* get the big lock and read metadata off disk */
 		lock_extent_bits(io_tree, start, end, &cached);
-		em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, start, len);
+		em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, start,
+				      sectorsize);
 		unlock_extent_cached(io_tree, start, end, &cached);
 
 		if (IS_ERR(em))
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 04/11] btrfs: defrag: extract the page preparation code into one helper
  2021-08-06  8:12 [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag Qu Wenruo
                   ` (2 preceding siblings ...)
  2021-08-06  8:12 ` [PATCH v5 03/11] btrfs: defrag: replace hard coded PAGE_SIZE to sectorsize Qu Wenruo
@ 2021-08-06  8:12 ` Qu Wenruo
  2021-08-23 19:04   ` David Sterba
  2021-08-06  8:12 ` [PATCH v5 05/11] btrfs: defrag: introduce a new helper to collect target file extents Qu Wenruo
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2021-08-06  8:12 UTC (permalink / raw)
  To: linux-btrfs

In cluster_pages_for_defrag() we have a complex code block inside one
for() loop.

The code block prepares one page for defrag and ensures that:
- The page is locked and set up properly
- No ordered extent exists in the page range
- The page is uptodate

This behavior is pretty common and will be reused by later defrag
rework.

So extract the code into its own helper, defrag_prepare_one_page(), for
later use, and clean up the code a little.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ioctl.c | 149 ++++++++++++++++++++++++++++-------------------
 1 file changed, 88 insertions(+), 61 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5545f33c6a98..c0639780f99c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1192,6 +1192,89 @@ static int should_defrag_range(struct inode *inode, u64 start, u32 thresh,
 	return ret;
 }
 
+/*
+ * Prepare one page to be defragged.
+ *
+ * This will ensure:
+ * - Returned page is locked and has been set up properly
+ * - No ordered extent exists in the page
+ * - The page is uptodate
+ *
+ * NOTE: Caller should also wait for page writeback after the cluster is
+ * prepared, here we don't do writeback wait for each page.
+ */
+static struct page *defrag_prepare_one_page(struct btrfs_inode *inode,
+					    pgoff_t index)
+{
+	struct address_space *mapping = inode->vfs_inode.i_mapping;
+	gfp_t mask = btrfs_alloc_write_mask(mapping);
+	u64 page_start = index << PAGE_SHIFT;
+	u64 page_end = page_start + PAGE_SIZE - 1;
+	struct extent_state *cached_state = NULL;
+	struct page *page;
+	int ret;
+
+again:
+	page = find_or_create_page(mapping, index, mask);
+	if (!page)
+		return ERR_PTR(-ENOMEM);
+
+	ret = set_page_extent_mapped(page);
+	if (ret < 0) {
+		unlock_page(page);
+		put_page(page);
+		return ERR_PTR(ret);
+	}
+
+	/* Wait for any existing ordered extent in the range */
+	while (1) {
+		struct btrfs_ordered_extent *ordered;
+
+		lock_extent_bits(&inode->io_tree, page_start, page_end,
+				 &cached_state);
+		ordered = btrfs_lookup_ordered_range(inode, page_start,
+						     PAGE_SIZE);
+		unlock_extent_cached(&inode->io_tree, page_start, page_end,
+				     &cached_state);
+		if (!ordered)
+			break;
+
+		unlock_page(page);
+		btrfs_start_ordered_extent(ordered, 1);
+		btrfs_put_ordered_extent(ordered);
+		lock_page(page);
+		/*
+		 * we unlocked the page above, so we need check if
+		 * it was released or not.
+		 */
+		if (page->mapping != mapping || !PagePrivate(page)) {
+			unlock_page(page);
+			put_page(page);
+			goto again;
+		}
+	}
+
+	/*
+	 * Now the page range has no ordered extent any more.
+	 * Read the page to make it uptodate.
+	 */
+	if (!PageUptodate(page)) {
+		btrfs_readpage(NULL, page);
+		lock_page(page);
+		if (page->mapping != mapping || !PagePrivate(page)) {
+			unlock_page(page);
+			put_page(page);
+			goto again;
+		}
+		if (!PageUptodate(page)) {
+			unlock_page(page);
+			put_page(page);
+			return ERR_PTR(-EIO);
+		}
+	}
+	return page;
+}
+
 /*
  * it doesn't do much good to defrag one or two pages
  * at a time.  This pulls in a nice chunk of pages
@@ -1219,11 +1302,8 @@ static int cluster_pages_for_defrag(struct inode *inode,
 	int ret;
 	int i;
 	int i_done;
-	struct btrfs_ordered_extent *ordered;
 	struct extent_state *cached_state = NULL;
-	struct extent_io_tree *tree;
 	struct extent_changeset *data_reserved = NULL;
-	gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
 
 	file_end = (isize - 1) >> PAGE_SHIFT;
 	if (!isize || start_index > file_end)
@@ -1236,69 +1316,16 @@ static int cluster_pages_for_defrag(struct inode *inode,
 	if (ret)
 		return ret;
 	i_done = 0;
-	tree = &BTRFS_I(inode)->io_tree;
 
 	/* step one, lock all the pages */
 	for (i = 0; i < page_cnt; i++) {
 		struct page *page;
-again:
-		page = find_or_create_page(inode->i_mapping,
-					   start_index + i, mask);
-		if (!page)
-			break;
 
-		ret = set_page_extent_mapped(page);
-		if (ret < 0) {
-			unlock_page(page);
-			put_page(page);
+		page = defrag_prepare_one_page(BTRFS_I(inode), start_index + i);
+		if (IS_ERR(page)) {
+			ret = PTR_ERR(page);
 			break;
 		}
-
-		page_start = page_offset(page);
-		page_end = page_start + PAGE_SIZE - 1;
-		while (1) {
-			lock_extent_bits(tree, page_start, page_end,
-					 &cached_state);
-			ordered = btrfs_lookup_ordered_extent(BTRFS_I(inode),
-							      page_start);
-			unlock_extent_cached(tree, page_start, page_end,
-					     &cached_state);
-			if (!ordered)
-				break;
-
-			unlock_page(page);
-			btrfs_start_ordered_extent(ordered, 1);
-			btrfs_put_ordered_extent(ordered);
-			lock_page(page);
-			/*
-			 * we unlocked the page above, so we need check if
-			 * it was released or not.
-			 */
-			if (page->mapping != inode->i_mapping ||
-			    !PagePrivate(page)) {
-				unlock_page(page);
-				put_page(page);
-				goto again;
-			}
-		}
-
-		if (!PageUptodate(page)) {
-			btrfs_readpage(NULL, page);
-			lock_page(page);
-			if (!PageUptodate(page)) {
-				unlock_page(page);
-				put_page(page);
-				ret = -EIO;
-				break;
-			}
-		}
-
-		if (page->mapping != inode->i_mapping || !PagePrivate(page)) {
-			unlock_page(page);
-			put_page(page);
-			goto again;
-		}
-
 		pages[i] = page;
 		i_done++;
 	}
@@ -1309,8 +1336,8 @@ static int cluster_pages_for_defrag(struct inode *inode,
 		goto out;
 
 	/*
-	 * so now we have a nice long stream of locked
-	 * and up to date pages, lets wait on them
+	 * So now we have a nice long stream of locked and up to date pages,
+	 * lets wait on them.
 	 */
 	for (i = 0; i < i_done; i++)
 		wait_on_page_writeback(pages[i]);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 05/11] btrfs: defrag: introduce a new helper to collect target file extents
  2021-08-06  8:12 [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag Qu Wenruo
                   ` (3 preceding siblings ...)
  2021-08-06  8:12 ` [PATCH v5 04/11] btrfs: defrag: extract the page preparation code into one helper Qu Wenruo
@ 2021-08-06  8:12 ` Qu Wenruo
  2021-08-23 19:08   ` David Sterba
  2021-08-06  8:12 ` [PATCH v5 06/11] btrfs: defrag: introduce a helper to defrag a continuous prepared range Qu Wenruo
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2021-08-06  8:12 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new helper, defrag_collect_targets(), to collect all
possible targets to be defragged.

This function will not consider things like max_sectors_to_defrag, thus
the caller is responsible for ensuring we don't exceed the limit.

This function will be the first stage of later defrag rework.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ioctl.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 120 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index c0639780f99c..043c44daa5ae 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1427,6 +1427,126 @@ static int cluster_pages_for_defrag(struct inode *inode,
 
 }
 
+struct defrag_target_range {
+	struct list_head list;
+	u64 start;
+	u64 len;
+};
+
+/*
+ * Helper to collect all valid target extents.
+ *
+ * @start:	   The file offset to lookup
+ * @len:	   The length to lookup
+ * @extent_thresh: File extent size threshold, any extent size >= this value
+ *		   will be ignored
+ * @newer_than:    Only defrag extents newer than this value
+ * @do_compress:   Whether the defrag is doing compression
+ *		   If true, @extent_thresh will be ignored and all regular
+ *		   file extents meeting @newer_than will be targets.
+ * @target_list:   The list of targets file extents
+ */
+static int defrag_collect_targets(struct btrfs_inode *inode,
+				  u64 start, u64 len, u32 extent_thresh,
+				  u64 newer_than, bool do_compress,
+				  struct list_head *target_list)
+{
+	u64 cur = start;
+	int ret = 0;
+
+	while (cur < start + len) {
+		struct extent_map *em;
+		struct defrag_target_range *new;
+		bool next_mergeable = true;
+		u64 range_len;
+
+		em = defrag_lookup_extent(&inode->vfs_inode, cur);
+		if (!em)
+			break;
+
+		/* Skip hole/inline/preallocated extents */
+		if (em->block_start >= EXTENT_MAP_LAST_BYTE ||
+		    test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
+			goto next;
+
+		/* Skip older extent */
+		if (em->generation < newer_than)
+			goto next;
+
+		/*
+		 * For do_compress case, we want to compress all valid file
+		 * extents, thus no @extent_thresh or mergeable check.
+		 */
+		if (do_compress)
+			goto add;
+
+		/* Skip too large extent */
+		if (em->len >= extent_thresh)
+			goto next;
+
+		next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em);
+		if (!next_mergeable) {
+			struct defrag_target_range *last;
+
+			/* Empty target list, no way to merge with last entry */
+			if (list_empty(target_list))
+				goto next;
+			last = list_entry(target_list->prev,
+					struct defrag_target_range, list);
+			/* Not mergeable with last entry */
+			if (last->start + last->len != cur)
+				goto next;
+
+			/* Mergeable, fall through to add it to @target_list. */
+		}
+
+add:
+		range_len = min(extent_map_end(em), start + len) - cur;
+		/*
+		 * This one is a good target, check if it can be merged into
+		 * last range of the target list
+		 */
+		if (!list_empty(target_list)) {
+			struct defrag_target_range *last;
+
+			last = list_entry(target_list->prev,
+					struct defrag_target_range, list);
+			ASSERT(last->start + last->len <= cur);
+			if (last->start + last->len == cur) {
+				/* Mergeable, enlarge the last entry */
+				last->len += range_len;
+				goto next;
+			}
+			/* Fall through to allocate a new entry */
+		}
+
+		/* Allocate new defrag_target_range */
+		new = kmalloc(sizeof(*new), GFP_NOFS);
+		if (!new) {
+			free_extent_map(em);
+			ret = -ENOMEM;
+			break;
+		}
+		new->start = cur;
+		new->len = range_len;
+		list_add_tail(&new->list, target_list);
+
+next:
+		cur = extent_map_end(em);
+		free_extent_map(em);
+	}
+	if (ret < 0) {
+		struct defrag_target_range *entry;
+		struct defrag_target_range *tmp;
+
+		list_for_each_entry_safe(entry, tmp, target_list, list) {
+			list_del_init(&entry->list);
+			kfree(entry);
+		}
+	}
+	return ret;
+}
+
 /*
  * Btrfs entrance for defrag.
  *
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 06/11] btrfs: defrag: introduce a helper to defrag a continuous prepared range
  2021-08-06  8:12 [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag Qu Wenruo
                   ` (4 preceding siblings ...)
  2021-08-06  8:12 ` [PATCH v5 05/11] btrfs: defrag: introduce a new helper to collect target file extents Qu Wenruo
@ 2021-08-06  8:12 ` Qu Wenruo
  2021-08-06  8:12 ` [PATCH v5 07/11] btrfs: defrag: introduce a helper to defrag a range Qu Wenruo
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Qu Wenruo @ 2021-08-06  8:12 UTC (permalink / raw)
  To: linux-btrfs

A new helper, defrag_one_locked_target(), is introduced to do the real
part of defrag.

The caller needs to ensure both the pages and the extent bits are
locked, that no ordered extent exists in the range, and that all
writeback has finished.

The core defrag part is pretty straightforward:

- Reserve space
- Set extent bits to defrag
- Update involved pages to be dirty

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ioctl.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 043c44daa5ae..a21c4c09269a 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -48,6 +48,7 @@
 #include "space-info.h"
 #include "delalloc-space.h"
 #include "block-group.h"
+#include "subpage.h"
 
 #ifdef CONFIG_64BIT
 /* If we have a 32-bit userspace and 64-bit kernel, then the UAPI
@@ -1547,6 +1548,61 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 	return ret;
 }
 
+#define CLUSTER_SIZE	(SZ_256K)
+
+/*
+ * Defrag one continuous target range.
+ *
+ * @inode:	Target inode
+ * @target:	Target range to defrag
+ * @pages:	Locked pages covering the defrag range
+ * @nr_pages:	Number of locked pages
+ *
+ * Caller should ensure:
+ *
+ * - Pages are prepared
+ *   Pages should be locked, no ordered extent in the pages range,
+ *   no writeback.
+ *
+ * - Extent bits are locked
+ */
+static int defrag_one_locked_target(struct btrfs_inode *inode,
+				    struct defrag_target_range *target,
+				    struct page **pages, int nr_pages,
+				    struct extent_state **cached_state)
+{
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	struct extent_changeset *data_reserved = NULL;
+	const u64 start = target->start;
+	const u64 len = target->len;
+	unsigned long last_index = (start + len - 1) >> PAGE_SHIFT;
+	unsigned long start_index = start >> PAGE_SHIFT;
+	unsigned long first_index = page_index(pages[0]);
+	int ret = 0;
+	int i;
+
+	ASSERT(last_index - first_index + 1 <= nr_pages);
+
+	ret = btrfs_delalloc_reserve_space(inode, &data_reserved, start, len);
+	if (ret < 0)
+		return ret;
+	clear_extent_bit(&inode->io_tree, start, start + len - 1,
+			 EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING |
+			 EXTENT_DEFRAG, 0, 0, cached_state);
+	set_extent_defrag(&inode->io_tree, start, start + len - 1,
+			  cached_state);
+
+	/* Update the page status */
+	for (i = start_index - first_index; i <= last_index - first_index;
+	     i++) {
+		ClearPageChecked(pages[i]);
+		btrfs_page_clamp_set_dirty(fs_info, pages[i], start, len);
+	}
+	btrfs_delalloc_release_extents(inode, len);
+	extent_changeset_free(data_reserved);
+	return ret;
+}
+
 /*
  * Btrfs entrance for defrag.
  *
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 07/11] btrfs: defrag: introduce a helper to defrag a range
  2021-08-06  8:12 [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag Qu Wenruo
                   ` (5 preceding siblings ...)
  2021-08-06  8:12 ` [PATCH v5 06/11] btrfs: defrag: introduce a helper to defrag a continuous prepared range Qu Wenruo
@ 2021-08-06  8:12 ` Qu Wenruo
  2021-08-23 19:21   ` David Sterba
  2021-08-06  8:12 ` [PATCH v5 08/11] btrfs: defrag: introduce a new helper to defrag one cluster Qu Wenruo
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2021-08-06  8:12 UTC (permalink / raw)
  To: linux-btrfs

A new helper, defrag_one_range(), is introduced to defrag one range.

This function mostly prepares the needed pages and extent state for
defrag_one_locked_target().

As we can only have a consistent view of the extent map with the pages
and extent bits locked, we need to re-check the range passed in to get
the real target list for defrag_one_locked_target().

Since defrag_collect_targets() will call defrag_lookup_extent() and lock
the extent range, we also need to teach those two functions to skip the
extent lock.
Thus a new parameter, @locked, is introduced to skip the extent lock if
the caller has already locked the range.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ioctl.c | 105 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 95 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a21c4c09269a..2f7196f9bd65 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1081,7 +1081,8 @@ static int find_new_extents(struct btrfs_root *root,
 	return -ENOENT;
 }
 
-static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start)
+static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start,
+					       bool locked)
 {
 	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
@@ -1101,10 +1102,12 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start)
 		u64 end = start + sectorsize - 1;
 
 		/* get the big lock and read metadata off disk */
-		lock_extent_bits(io_tree, start, end, &cached);
+		if (!locked)
+			lock_extent_bits(io_tree, start, end, &cached);
 		em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, start,
 				      sectorsize);
-		unlock_extent_cached(io_tree, start, end, &cached);
+		if (!locked)
+			unlock_extent_cached(io_tree, start, end, &cached);
 
 		if (IS_ERR(em))
 			return NULL;
@@ -1113,7 +1116,8 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start)
 	return em;
 }
 
-static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em)
+static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em,
+				     bool locked)
 {
 	struct extent_map *next;
 	bool ret = true;
@@ -1122,7 +1126,7 @@ static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em)
 	if (em->start + em->len >= i_size_read(inode))
 		return false;
 
-	next = defrag_lookup_extent(inode, em->start + em->len);
+	next = defrag_lookup_extent(inode, em->start + em->len, locked);
 	if (!next || next->block_start >= EXTENT_MAP_LAST_BYTE)
 		ret = false;
 	else if ((em->block_start + em->block_len == next->block_start) &&
@@ -1151,7 +1155,7 @@ static int should_defrag_range(struct inode *inode, u64 start, u32 thresh,
 
 	*skip = 0;
 
-	em = defrag_lookup_extent(inode, start);
+	em = defrag_lookup_extent(inode, start, false);
 	if (!em)
 		return 0;
 
@@ -1164,7 +1168,7 @@ static int should_defrag_range(struct inode *inode, u64 start, u32 thresh,
 	if (!*defrag_end)
 		prev_mergeable = false;
 
-	next_mergeable = defrag_check_next_extent(inode, em);
+	next_mergeable = defrag_check_next_extent(inode, em, false);
 	/*
 	 * we hit a real extent, if it is big or the next extent is not a
 	 * real extent, don't bother defragging it
@@ -1445,12 +1449,13 @@ struct defrag_target_range {
  * @do_compress:   Whether the defrag is doing compression
  *		   If true, @extent_thresh will be ignored and all regular
  *		   file extents meeting @newer_than will be targets.
+ * @locked:	   If the range has already hold extent lock
  * @target_list:   The list of targets file extents
  */
 static int defrag_collect_targets(struct btrfs_inode *inode,
 				  u64 start, u64 len, u32 extent_thresh,
 				  u64 newer_than, bool do_compress,
-				  struct list_head *target_list)
+				  bool locked, struct list_head *target_list)
 {
 	u64 cur = start;
 	int ret = 0;
@@ -1461,7 +1466,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 		bool next_mergeable = true;
 		u64 range_len;
 
-		em = defrag_lookup_extent(&inode->vfs_inode, cur);
+		em = defrag_lookup_extent(&inode->vfs_inode, cur, locked);
 		if (!em)
 			break;
 
@@ -1485,7 +1490,8 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 		if (em->len >= extent_thresh)
 			goto next;
 
-		next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em);
+		next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em,
+							  locked);
 		if (!next_mergeable) {
 			struct defrag_target_range *last;
 
@@ -1603,6 +1609,85 @@ static int defrag_one_locked_target(struct btrfs_inode *inode,
 	return ret;
 }
 
+static int defrag_one_range(struct btrfs_inode *inode,
+			    u64 start, u32 len,
+			    u32 extent_thresh, u64 newer_than,
+			    bool do_compress)
+{
+	struct extent_state *cached_state = NULL;
+	struct defrag_target_range *entry;
+	struct defrag_target_range *tmp;
+	LIST_HEAD(target_list);
+	struct page **pages;
+	const u32 sectorsize = inode->root->fs_info->sectorsize;
+	unsigned long last_index = (start + len - 1) >> PAGE_SHIFT;
+	unsigned long start_index = start >> PAGE_SHIFT;
+	unsigned int nr_pages = last_index - start_index + 1;
+	int ret = 0;
+	int i;
+
+	ASSERT(nr_pages <= CLUSTER_SIZE / PAGE_SIZE);
+	ASSERT(IS_ALIGNED(start, sectorsize) && IS_ALIGNED(len, sectorsize));
+
+	pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
+	if (!pages)
+		return -ENOMEM;
+
+	/* Prepare all pages */
+	for (i = 0; i < nr_pages; i++) {
+		pages[i] = defrag_prepare_one_page(inode, start_index + i);
+		if (IS_ERR(pages[i])) {
+			ret = PTR_ERR(pages[i]);
+			pages[i] = NULL;
+			goto free_pages;
+		}
+	}
+	for (i = 0; i < nr_pages; i++)
+		wait_on_page_writeback(pages[i]);
+
+	/* Also lock the pages range */
+	lock_extent_bits(&inode->io_tree, start_index << PAGE_SHIFT,
+			 (last_index << PAGE_SHIFT) + PAGE_SIZE - 1,
+			 &cached_state);
+	/*
+	 * Now we have a consistent view about the extent map, re-check
+	 * which range really needs to be defragged.
+	 *
+	 * And this time we have extent locked already, pass @locked = true
+	 * so that we won't re-lock the extent range and cause deadlock.
+	 */
+	ret = defrag_collect_targets(inode, start, len, extent_thresh,
+				     newer_than, do_compress, true,
+				     &target_list);
+	if (ret < 0)
+		goto unlock_extent;
+
+	list_for_each_entry(entry, &target_list, list) {
+		ret = defrag_one_locked_target(inode, entry, pages, nr_pages,
+					       &cached_state);
+		if (ret < 0)
+			break;
+	}
+
+	list_for_each_entry_safe(entry, tmp, &target_list, list) {
+		list_del_init(&entry->list);
+		kfree(entry);
+	}
+unlock_extent:
+	unlock_extent_cached(&inode->io_tree, start_index << PAGE_SHIFT,
+			     (last_index << PAGE_SHIFT) + PAGE_SIZE - 1,
+			     &cached_state);
+free_pages:
+	for (i = 0; i < nr_pages; i++) {
+		if (pages[i]) {
+			unlock_page(pages[i]);
+			put_page(pages[i]);
+		}
+	}
+	kfree(pages);
+	return ret;
+}
+
 /*
  * Btrfs entrance for defrag.
  *
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 08/11] btrfs: defrag: introduce a new helper to defrag one cluster
  2021-08-06  8:12 [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag Qu Wenruo
                   ` (6 preceding siblings ...)
  2021-08-06  8:12 ` [PATCH v5 07/11] btrfs: defrag: introduce a helper to defrag a range Qu Wenruo
@ 2021-08-06  8:12 ` Qu Wenruo
  2021-08-23 19:27   ` David Sterba
  2021-08-06  8:12 ` [PATCH v5 09/11] btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file() Qu Wenruo
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2021-08-06  8:12 UTC (permalink / raw)
  To: linux-btrfs

This new helper, defrag_one_cluster(), will defrag one cluster (at most
256K) by:

- Collect all initial targets

- Kick in readahead when possible

- Call defrag_one_range() on each initial target
  With some extra range clamping.

- Update @sectors_defragged parameter

This involves one behavior change: the defragged sectors accounting is
no longer as accurate as the old behavior, as the initial targets are
not consistent.

We can have new holes punched inside the initial targets, and we will
skip such holes later.
But the defragged sectors accounting doesn't need to be that accurate
anyway, thus I don't want to push that extra accounting burden into
defrag_one_range().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ioctl.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2f7196f9bd65..74346fde06f6 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1688,6 +1688,62 @@ static int defrag_one_range(struct btrfs_inode *inode,
 	return ret;
 }
 
+static int defrag_one_cluster(struct btrfs_inode *inode,
+			      struct file_ra_state *ra,
+			      u64 start, u32 len, u32 extent_thresh,
+			      u64 newer_than, bool do_compress,
+			      unsigned long *sectors_defragged,
+			      unsigned long max_sectors)
+{
+	const u32 sectorsize = inode->root->fs_info->sectorsize;
+	struct defrag_target_range *entry;
+	struct defrag_target_range *tmp;
+	LIST_HEAD(target_list);
+	int ret;
+
+	BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
+	ret = defrag_collect_targets(inode, start, len, extent_thresh,
+				     newer_than, do_compress, false,
+				     &target_list);
+	if (ret < 0)
+		goto out;
+
+	list_for_each_entry(entry, &target_list, list) {
+		u32 range_len = entry->len;
+
+		/* Reached the limit */
+		if (max_sectors && max_sectors == *sectors_defragged)
+			break;
+
+		if (max_sectors)
+			range_len = min_t(u32, range_len,
+				(max_sectors - *sectors_defragged) * sectorsize);
+
+		if (ra)
+			page_cache_sync_readahead(inode->vfs_inode.i_mapping,
+				ra, NULL, entry->start >> PAGE_SHIFT,
+				((entry->start + range_len - 1) >> PAGE_SHIFT) -
+				(entry->start >> PAGE_SHIFT) + 1);
+		/*
+		 * Here we may not defrag any range if holes are punched before
+		 * we locked the pages.
+		 * But that's fine, it only affects the @sectors_defragged
+		 * accounting.
+		 */
+		ret = defrag_one_range(inode, entry->start, range_len,
+					extent_thresh, newer_than, do_compress);
+		if (ret < 0)
+			break;
+		*sectors_defragged += range_len;
+	}
+out:
+	list_for_each_entry_safe(entry, tmp, &target_list, list) {
+		list_del_init(&entry->list);
+		kfree(entry);
+	}
+	return ret;
+}
+
 /*
  * Btrfs entrance for defrag.
  *
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 09/11] btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()
  2021-08-06  8:12 [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag Qu Wenruo
                   ` (7 preceding siblings ...)
  2021-08-06  8:12 ` [PATCH v5 08/11] btrfs: defrag: introduce a new helper to defrag one cluster Qu Wenruo
@ 2021-08-06  8:12 ` Qu Wenruo
  2021-08-09 11:32   ` Dan Carpenter
  2021-08-06  8:12 ` [PATCH v5 10/11] btrfs: defrag: remove the old infrastructure Qu Wenruo
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2021-08-06  8:12 UTC (permalink / raw)
  To: linux-btrfs

The function defrag_one_cluster() is able to defrag one range well
enough; we only need to do the preparation for it, including:

- Clamp and align the defrag range
- Exclude invalid cases
- Proper inode locking

The old infrastructure will not be removed in this patch, as that would
be too noisy to review.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ioctl.c | 204 +++++++++++++----------------------------------
 1 file changed, 55 insertions(+), 149 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 74346fde06f6..b3ba89d6402e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1759,25 +1759,15 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 		      u64 newer_than, unsigned long max_to_defrag)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct btrfs_root *root = BTRFS_I(inode)->root;
-	unsigned long last_index;
+	unsigned long sectors_defragged = 0;
 	u64 isize = i_size_read(inode);
-	u64 last_len = 0;
-	u64 skip = 0;
-	u64 defrag_end = 0;
-	u64 newer_off = range->start;
-	unsigned long i;
-	unsigned long ra_index = 0;
-	int ret;
-	int defrag_count = 0;
-	int compress_type = BTRFS_COMPRESS_ZLIB;
-	u32 extent_thresh = range->extent_thresh;
-	unsigned long max_cluster = SZ_256K >> PAGE_SHIFT;
-	unsigned long cluster = max_cluster;
-	u64 new_align = ~((u64)SZ_128K - 1);
-	struct page **pages = NULL;
+	u64 cur;
+	u64 last_byte;
 	bool do_compress = range->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
 	bool ra_allocated = false;
+	int compress_type = BTRFS_COMPRESS_ZLIB;
+	int ret;
+	u32 extent_thresh = range->extent_thresh;
 
 	if (isize == 0)
 		return 0;
@@ -1795,6 +1785,14 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 	if (extent_thresh == 0)
 		extent_thresh = SZ_256K;
 
+	if (range->start + range->len > range->start) {
+		/* Got a specific range */
+		last_byte = min(isize, range->start + range->len) - 1;
+	} else {
+		/* Defrag until file end */
+		last_byte = isize - 1;
+	}
+
 	/*
 	 * If we were not given a ra, allocate a readahead context. As
 	 * readahead is just an optimization, defrag will work without it so
@@ -1807,159 +1805,67 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 			file_ra_state_init(ra, inode->i_mapping);
 	}
 
-	pages = kmalloc_array(max_cluster, sizeof(struct page *), GFP_KERNEL);
-	if (!pages) {
-		ret = -ENOMEM;
-		goto out_ra;
-	}
+	/* Align the range */
+	cur = round_down(range->start, fs_info->sectorsize);
+	last_byte = round_up(last_byte, fs_info->sectorsize) - 1;
 
-	/* find the last page to defrag */
-	if (range->start + range->len > range->start) {
-		last_index = min_t(u64, isize - 1,
-			 range->start + range->len - 1) >> PAGE_SHIFT;
-	} else {
-		last_index = (isize - 1) >> PAGE_SHIFT;
-	}
-
-	if (newer_than) {
-		ret = find_new_extents(root, inode, newer_than,
-				       &newer_off, SZ_64K);
-		if (!ret) {
-			range->start = newer_off;
-			/*
-			 * we always align our defrag to help keep
-			 * the extents in the file evenly spaced
-			 */
-			i = (newer_off & new_align) >> PAGE_SHIFT;
-		} else
-			goto out_ra;
-	} else {
-		i = range->start >> PAGE_SHIFT;
-	}
-	if (!max_to_defrag)
-		max_to_defrag = last_index - i + 1;
-
-	/*
-	 * make writeback starts from i, so the defrag range can be
-	 * written sequentially.
-	 */
-	if (i < inode->i_mapping->writeback_index)
-		inode->i_mapping->writeback_index = i;
-
-	while (i <= last_index && defrag_count < max_to_defrag &&
-	       (i < DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE))) {
-		/*
-		 * make sure we stop running if someone unmounts
-		 * the FS
-		 */
-		if (!(inode->i_sb->s_flags & SB_ACTIVE))
-			break;
-
-		if (btrfs_defrag_cancelled(fs_info)) {
-			btrfs_debug(fs_info, "defrag_file cancelled");
-			ret = -EAGAIN;
-			goto error;
-		}
-
-		if (!should_defrag_range(inode, (u64)i << PAGE_SHIFT,
-					 extent_thresh, &last_len, &skip,
-					 &defrag_end, do_compress)){
-			unsigned long next;
-			/*
-			 * the should_defrag function tells us how much to skip
-			 * bump our counter by the suggested amount
-			 */
-			next = DIV_ROUND_UP(skip, PAGE_SIZE);
-			i = max(i + 1, next);
-			continue;
-		}
+	while (cur < last_byte) {
+		u64 cluster_end;
 
-		if (!newer_than) {
-			cluster = (PAGE_ALIGN(defrag_end) >>
-				   PAGE_SHIFT) - i;
-			cluster = min(cluster, max_cluster);
-		} else {
-			cluster = max_cluster;
-		}
+		/* The cluster size 256K should always be page aligned */
+		BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
 
-		if (i + cluster > ra_index) {
-			ra_index = max(i, ra_index);
-			if (ra)
-				page_cache_sync_readahead(inode->i_mapping, ra,
-						NULL, ra_index, cluster);
-			ra_index += cluster;
-		}
+		/* We want the cluster to end at a page boundary when possible */
+		cluster_end = (((cur >> PAGE_SHIFT) +
+			       (SZ_256K >> PAGE_SHIFT)) << PAGE_SHIFT) - 1;
+		cluster_end = min(cluster_end, last_byte);
 
 		btrfs_inode_lock(inode, 0);
 		if (IS_SWAPFILE(inode)) {
 			ret = -ETXTBSY;
-		} else {
-			if (do_compress)
-				BTRFS_I(inode)->defrag_compress = compress_type;
-			ret = cluster_pages_for_defrag(inode, pages, i, cluster);
+			btrfs_inode_unlock(inode, 0);
+			break;
 		}
-		if (ret < 0) {
+		if (!(inode->i_sb->s_flags & SB_ACTIVE)) {
 			btrfs_inode_unlock(inode, 0);
-			goto out_ra;
+			break;
 		}
-
-		defrag_count += ret;
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		if (do_compress)
+			BTRFS_I(inode)->defrag_compress = compress_type;
+		ret = defrag_one_cluster(BTRFS_I(inode), ra, cur,
+				cluster_end + 1 - cur, extent_thresh,
+				newer_than, do_compress,
+				&sectors_defragged, max_to_defrag);
 		btrfs_inode_unlock(inode, 0);
-
-		if (newer_than) {
-			if (newer_off == (u64)-1)
-				break;
-
-			if (ret > 0)
-				i += ret;
-
-			newer_off = max(newer_off + 1,
-					(u64)i << PAGE_SHIFT);
-
-			ret = find_new_extents(root, inode, newer_than,
-					       &newer_off, SZ_64K);
-			if (!ret) {
-				range->start = newer_off;
-				i = (newer_off & new_align) >> PAGE_SHIFT;
-			} else {
-				break;
-			}
-		} else {
-			if (ret > 0) {
-				i += ret;
-				last_len += ret << PAGE_SHIFT;
-			} else {
-				i++;
-				last_len = 0;
-			}
-		}
+		if (ret < 0)
+			break;
+		cur = cluster_end + 1;
 	}
 
-	ret = defrag_count;
-error:
-	if ((range->flags & BTRFS_DEFRAG_RANGE_START_IO)) {
-		filemap_flush(inode->i_mapping);
-		if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
-			     &BTRFS_I(inode)->runtime_flags))
+	if (ra_allocated)
+		kfree(ra);
+	if (sectors_defragged) {
+		/*
+		 * We have defragged some sectors, for compression case
+		 * they need to be written back immediately.
+		 */
+		if (range->flags & BTRFS_DEFRAG_RANGE_START_IO) {
 			filemap_flush(inode->i_mapping);
+			if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
+				     &BTRFS_I(inode)->runtime_flags))
+				filemap_flush(inode->i_mapping);
+		}
+		if (range->compress_type == BTRFS_COMPRESS_LZO)
+			btrfs_set_fs_incompat(fs_info, COMPRESS_LZO);
+		else if (range->compress_type == BTRFS_COMPRESS_ZSTD)
+			btrfs_set_fs_incompat(fs_info, COMPRESS_ZSTD);
+		ret = sectors_defragged;
 	}
-
-	if (range->compress_type == BTRFS_COMPRESS_LZO) {
-		btrfs_set_fs_incompat(fs_info, COMPRESS_LZO);
-	} else if (range->compress_type == BTRFS_COMPRESS_ZSTD) {
-		btrfs_set_fs_incompat(fs_info, COMPRESS_ZSTD);
-	}
-
-out_ra:
 	if (do_compress) {
 		btrfs_inode_lock(inode, 0);
 		BTRFS_I(inode)->defrag_compress = BTRFS_COMPRESS_NONE;
 		btrfs_inode_unlock(inode, 0);
 	}
-	if (ra_allocated)
-		kfree(ra);
-	kfree(pages);
 	return ret;
 }
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 10/11] btrfs: defrag: remove the old infrastructure
  2021-08-06  8:12 [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag Qu Wenruo
                   ` (8 preceding siblings ...)
  2021-08-06  8:12 ` [PATCH v5 09/11] btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file() Qu Wenruo
@ 2021-08-06  8:12 ` Qu Wenruo
  2021-08-06  8:12 ` [PATCH v5 11/11] btrfs: defrag: enable defrag for subpage case Qu Wenruo
  2021-08-23 19:43 ` [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag David Sterba
  11 siblings, 0 replies; 21+ messages in thread
From: Qu Wenruo @ 2021-08-06  8:12 UTC (permalink / raw)
  To: linux-btrfs

Now the old infrastructure can all be removed.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ioctl.c | 313 -----------------------------------------------
 1 file changed, 313 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b3ba89d6402e..fe56183f0872 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -980,107 +980,6 @@ static noinline int btrfs_mksnapshot(const struct path *parent,
 	return ret;
 }
 
-/*
- * When we're defragging a range, we don't want to kick it off again
- * if it is really just waiting for delalloc to send it down.
- * If we find a nice big extent or delalloc range for the bytes in the
- * file you want to defrag, we return 0 to let you know to skip this
- * part of the file
- */
-static int check_defrag_in_cache(struct inode *inode, u64 offset, u32 thresh)
-{
-	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
-	struct extent_map *em = NULL;
-	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
-	const u32 sectorsize = btrfs_sb(inode->i_sb)->sectorsize;
-	u64 end;
-
-	read_lock(&em_tree->lock);
-	em = lookup_extent_mapping(em_tree, offset, sectorsize);
-	read_unlock(&em_tree->lock);
-
-	if (em) {
-		end = extent_map_end(em);
-		free_extent_map(em);
-		if (end - offset > thresh)
-			return 0;
-	}
-	/* if we already have a nice delalloc here, just stop */
-	thresh /= 2;
-	end = count_range_bits(io_tree, &offset, offset + thresh,
-			       thresh, EXTENT_DELALLOC, 1);
-	if (end >= thresh)
-		return 0;
-	return 1;
-}
-
-/*
- * helper function to walk through a file and find extents
- * newer than a specific transid, and smaller than thresh.
- *
- * This is used by the defragging code to find new and small
- * extents
- */
-static int find_new_extents(struct btrfs_root *root,
-			    struct inode *inode, u64 newer_than,
-			    u64 *off, u32 thresh)
-{
-	struct btrfs_path *path;
-	struct btrfs_key min_key;
-	struct extent_buffer *leaf;
-	struct btrfs_file_extent_item *extent;
-	int type;
-	int ret;
-	u64 ino = btrfs_ino(BTRFS_I(inode));
-
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
-
-	min_key.objectid = ino;
-	min_key.type = BTRFS_EXTENT_DATA_KEY;
-	min_key.offset = *off;
-
-	while (1) {
-		ret = btrfs_search_forward(root, &min_key, path, newer_than);
-		if (ret != 0)
-			goto none;
-process_slot:
-		if (min_key.objectid != ino)
-			goto none;
-		if (min_key.type != BTRFS_EXTENT_DATA_KEY)
-			goto none;
-
-		leaf = path->nodes[0];
-		extent = btrfs_item_ptr(leaf, path->slots[0],
-					struct btrfs_file_extent_item);
-
-		type = btrfs_file_extent_type(leaf, extent);
-		if (type == BTRFS_FILE_EXTENT_REG &&
-		    btrfs_file_extent_num_bytes(leaf, extent) < thresh &&
-		    check_defrag_in_cache(inode, min_key.offset, thresh)) {
-			*off = min_key.offset;
-			btrfs_free_path(path);
-			return 0;
-		}
-
-		path->slots[0]++;
-		if (path->slots[0] < btrfs_header_nritems(leaf)) {
-			btrfs_item_key_to_cpu(leaf, &min_key, path->slots[0]);
-			goto process_slot;
-		}
-
-		if (min_key.offset == (u64)-1)
-			goto none;
-
-		min_key.offset++;
-		btrfs_release_path(path);
-	}
-none:
-	btrfs_free_path(path);
-	return -ENOENT;
-}
-
 static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start,
 					       bool locked)
 {
@@ -1137,66 +1036,6 @@ static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em,
 	return ret;
 }
 
-static int should_defrag_range(struct inode *inode, u64 start, u32 thresh,
-			       u64 *last_len, u64 *skip, u64 *defrag_end,
-			       int compress)
-{
-	struct extent_map *em;
-	int ret = 1;
-	bool next_mergeable = true;
-	bool prev_mergeable = true;
-
-	/*
-	 * make sure that once we start defragging an extent, we keep on
-	 * defragging it
-	 */
-	if (start < *defrag_end)
-		return 1;
-
-	*skip = 0;
-
-	em = defrag_lookup_extent(inode, start, false);
-	if (!em)
-		return 0;
-
-	/* this will cover holes, and inline extents */
-	if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
-		ret = 0;
-		goto out;
-	}
-
-	if (!*defrag_end)
-		prev_mergeable = false;
-
-	next_mergeable = defrag_check_next_extent(inode, em, false);
-	/*
-	 * we hit a real extent, if it is big or the next extent is not a
-	 * real extent, don't bother defragging it
-	 */
-	if (!compress && (*last_len == 0 || *last_len >= thresh) &&
-	    (em->len >= thresh || (!next_mergeable && !prev_mergeable)))
-		ret = 0;
-out:
-	/*
-	 * last_len ends up being a counter of how many bytes we've defragged.
-	 * every time we choose not to defrag an extent, we reset *last_len
-	 * so that the next tiny extent will force a defrag.
-	 *
-	 * The end result of this is that tiny extents before a single big
-	 * extent will force at least part of that big extent to be defragged.
-	 */
-	if (ret) {
-		*defrag_end = extent_map_end(em);
-	} else {
-		*last_len = 0;
-		*skip = extent_map_end(em);
-		*defrag_end = 0;
-	}
-
-	free_extent_map(em);
-	return ret;
-}
-
 /*
  * Prepare one page to be defragged.
  *
@@ -1280,158 +1119,6 @@ static struct page *defrag_prepare_one_page(struct btrfs_inode *inode,
 	return page;
 }
 
-/*
- * it doesn't do much good to defrag one or two pages
- * at a time.  This pulls in a nice chunk of pages
- * to COW and defrag.
- *
- * It also makes sure the delalloc code has enough
- * dirty data to avoid making new small extents as part
- * of the defrag
- *
- * It's a good idea to start RA on this range
- * before calling this.
- */
-static int cluster_pages_for_defrag(struct inode *inode,
-				    struct page **pages,
-				    unsigned long start_index,
-				    unsigned long num_pages)
-{
-	unsigned long file_end;
-	u64 isize = i_size_read(inode);
-	u64 page_start;
-	u64 page_end;
-	u64 page_cnt;
-	u64 start = (u64)start_index << PAGE_SHIFT;
-	u64 search_start;
-	int ret;
-	int i;
-	int i_done;
-	struct extent_state *cached_state = NULL;
-	struct extent_changeset *data_reserved = NULL;
-
-	file_end = (isize - 1) >> PAGE_SHIFT;
-	if (!isize || start_index > file_end)
-		return 0;
-
-	page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1);
-
-	ret = btrfs_delalloc_reserve_space(BTRFS_I(inode), &data_reserved,
-			start, page_cnt << PAGE_SHIFT);
-	if (ret)
-		return ret;
-	i_done = 0;
-
-	/* step one, lock all the pages */
-	for (i = 0; i < page_cnt; i++) {
-		struct page *page;
-
-		page = defrag_prepare_one_page(BTRFS_I(inode), start_index + i);
-		if (IS_ERR(page)) {
-			ret = PTR_ERR(page);
-			break;
-		}
-		pages[i] = page;
-		i_done++;
-	}
-	if (!i_done || ret)
-		goto out;
-
-	if (!(inode->i_sb->s_flags & SB_ACTIVE))
-		goto out;
-
-	/*
-	 * So now we have a nice long stream of locked and up to date pages,
-	 * lets wait on them.
-	 */
-	for (i = 0; i < i_done; i++)
-		wait_on_page_writeback(pages[i]);
-
-	page_start = page_offset(pages[0]);
-	page_end = page_offset(pages[i_done - 1]) + PAGE_SIZE;
-
-	lock_extent_bits(&BTRFS_I(inode)->io_tree,
-			 page_start, page_end - 1, &cached_state);
-
-	/*
-	 * When defragmenting we skip ranges that have holes or inline extents,
-	 * (check should_defrag_range()), to avoid unnecessary IO and wasting
-	 * space. At btrfs_defrag_file(), we check if a range should be defragged
-	 * before locking the inode and then, if it should, we trigger a sync
-	 * page cache readahead - we lock the inode only after that to avoid
-	 * blocking for too long other tasks that possibly want to operate on
-	 * other file ranges. But before we were able to get the inode lock,
-	 * some other task may have punched a hole in the range, or we may have
-	 * now an inline extent, in which case we should not defrag. So check
-	 * for that here, where we have the inode and the range locked, and bail
-	 * out if that happened.
-	 */
-	search_start = page_start;
-	while (search_start < page_end) {
-		struct extent_map *em;
-
-		em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, search_start,
-				      page_end - search_start);
-		if (IS_ERR(em)) {
-			ret = PTR_ERR(em);
-			goto out_unlock_range;
-		}
-		if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
-			free_extent_map(em);
-			/* Ok, 0 means we did not defrag anything */
-			ret = 0;
-			goto out_unlock_range;
-		}
-		search_start = extent_map_end(em);
-		free_extent_map(em);
-	}
-
-	clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start,
-			  page_end - 1, EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING |
-			  EXTENT_DEFRAG, 0, 0, &cached_state);
-
-	if (i_done != page_cnt) {
-		spin_lock(&BTRFS_I(inode)->lock);
-		btrfs_mod_outstanding_extents(BTRFS_I(inode), 1);
-		spin_unlock(&BTRFS_I(inode)->lock);
-		btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved,
-				start, (page_cnt - i_done) << PAGE_SHIFT, true);
-	}
-
-
-	set_extent_defrag(&BTRFS_I(inode)->io_tree, page_start, page_end - 1,
-			  &cached_state);
-
-	unlock_extent_cached(&BTRFS_I(inode)->io_tree,
-			     page_start, page_end - 1, &cached_state);
-
-	for (i = 0; i < i_done; i++) {
-		clear_page_dirty_for_io(pages[i]);
-		ClearPageChecked(pages[i]);
-		set_page_dirty(pages[i]);
-		unlock_page(pages[i]);
-		put_page(pages[i]);
-	}
-	btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT);
-	extent_changeset_free(data_reserved);
-	return i_done;
-
-out_unlock_range:
-	unlock_extent_cached(&BTRFS_I(inode)->io_tree,
-			     page_start, page_end - 1, &cached_state);
-out:
-	for (i = 0; i < i_done; i++) {
-		unlock_page(pages[i]);
-		put_page(pages[i]);
-	}
-	btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved,
-			start, page_cnt << PAGE_SHIFT, true);
-	btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT);
-	extent_changeset_free(data_reserved);
-	return ret;
-
-}
-
 struct defrag_target_range {
 	struct list_head list;
 	u64 start;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 11/11] btrfs: defrag: enable defrag for subpage case
  2021-08-06  8:12 [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag Qu Wenruo
                   ` (9 preceding siblings ...)
  2021-08-06  8:12 ` [PATCH v5 10/11] btrfs: defrag: remove the old infrastructure Qu Wenruo
@ 2021-08-06  8:12 ` Qu Wenruo
  2021-08-23 19:43 ` [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag David Sterba
  11 siblings, 0 replies; 21+ messages in thread
From: Qu Wenruo @ 2021-08-06  8:12 UTC (permalink / raw)
  To: linux-btrfs

With the new infrastructure, which has taken subpage into consideration,
we should now be safe to allow defrag to work for the subpage case.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ioctl.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index fe56183f0872..f00a3957a090 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3071,12 +3071,6 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
 		goto out;
 	}
 
-	/* Subpage defrag will be supported in later commits */
-	if (root->fs_info->sectorsize < PAGE_SIZE) {
-		ret = -ENOTTY;
-		goto out;
-	}
-
 	switch (inode->i_mode & S_IFMT) {
 	case S_IFDIR:
 		if (!capable(CAP_SYS_ADMIN)) {
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 09/11] btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()
  2021-08-06  8:12 ` [PATCH v5 09/11] btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file() Qu Wenruo
@ 2021-08-09 11:32   ` Dan Carpenter
  2021-08-09 12:13     ` Qu Wenruo
  0 siblings, 1 reply; 21+ messages in thread
From: Dan Carpenter @ 2021-08-09 11:32 UTC (permalink / raw)
  To: kbuild, Qu Wenruo, linux-btrfs; +Cc: lkp, kbuild-all

Hi Qu,

url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-defrag-rework-to-support-sector-perfect-defrag/20210806-161501
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: h8300-randconfig-m031-20210804 (attached as .config)
compiler: h8300-linux-gcc (GCC) 10.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
fs/btrfs/ioctl.c:1869 btrfs_defrag_file() error: uninitialized symbol 'ret'.

vim +/ret +1869 fs/btrfs/ioctl.c

fe90d1614439a8 Qu Wenruo                 2021-08-06  1757  int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
4cb5300bc839b8 Chris Mason               2011-05-24  1758  		      struct btrfs_ioctl_defrag_range_args *range,
4cb5300bc839b8 Chris Mason               2011-05-24  1759  		      u64 newer_than, unsigned long max_to_defrag)
4cb5300bc839b8 Chris Mason               2011-05-24  1760  {
0b246afa62b0cf Jeff Mahoney              2016-06-22  1761  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1762  	unsigned long sectors_defragged = 0;
151a31b25e5c94 Li Zefan                  2011-09-02  1763  	u64 isize = i_size_read(inode);
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1764  	u64 cur;
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1765  	u64 last_byte;
1e2ef46d89ee41 David Sterba              2017-07-17  1766  	bool do_compress = range->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
fe90d1614439a8 Qu Wenruo                 2021-08-06  1767  	bool ra_allocated = false;
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1768  	int compress_type = BTRFS_COMPRESS_ZLIB;
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1769  	int ret;
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1770  	u32 extent_thresh = range->extent_thresh;
4cb5300bc839b8 Chris Mason               2011-05-24  1771  
0abd5b17249ea5 Liu Bo                    2013-04-16  1772  	if (isize == 0)
0abd5b17249ea5 Liu Bo                    2013-04-16  1773  		return 0;
0abd5b17249ea5 Liu Bo                    2013-04-16  1774  
0abd5b17249ea5 Liu Bo                    2013-04-16  1775  	if (range->start >= isize)
0abd5b17249ea5 Liu Bo                    2013-04-16  1776  		return -EINVAL;
1a419d85a76853 Li Zefan                  2010-10-25  1777  
1e2ef46d89ee41 David Sterba              2017-07-17  1778  	if (do_compress) {
ce96b7ffd11e26 Chengguang Xu             2019-10-10  1779  		if (range->compress_type >= BTRFS_NR_COMPRESS_TYPES)
1a419d85a76853 Li Zefan                  2010-10-25  1780  			return -EINVAL;
1a419d85a76853 Li Zefan                  2010-10-25  1781  		if (range->compress_type)
1a419d85a76853 Li Zefan                  2010-10-25  1782  			compress_type = range->compress_type;
1a419d85a76853 Li Zefan                  2010-10-25  1783  	}
f46b5a66b3316e Christoph Hellwig         2008-06-11  1784  
0abd5b17249ea5 Liu Bo                    2013-04-16  1785  	if (extent_thresh == 0)
ee22184b53c823 Byongho Lee               2015-12-15  1786  		extent_thresh = SZ_256K;
940100a4a7b78b Chris Mason               2010-03-10  1787  
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1788  	if (range->start + range->len > range->start) {
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1789  		/* Got a specific range */
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1790  		last_byte = min(isize, range->start + range->len) - 1;
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1791  	} else {
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1792  		/* Defrag until file end */
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1793  		last_byte = isize - 1;
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1794  	}
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1795  
4cb5300bc839b8 Chris Mason               2011-05-24  1796  	/*
fe90d1614439a8 Qu Wenruo                 2021-08-06  1797  	 * If we were not given a ra, allocate a readahead context. As
0a52d108089f33 David Sterba              2017-06-22  1798  	 * readahead is just an optimization, defrag will work without it so
0a52d108089f33 David Sterba              2017-06-22  1799  	 * we don't error out.
4cb5300bc839b8 Chris Mason               2011-05-24  1800  	 */
fe90d1614439a8 Qu Wenruo                 2021-08-06  1801  	if (!ra) {
fe90d1614439a8 Qu Wenruo                 2021-08-06  1802  		ra_allocated = true;
63e727ecd238be David Sterba              2017-06-22  1803  		ra = kzalloc(sizeof(*ra), GFP_KERNEL);
0a52d108089f33 David Sterba              2017-06-22  1804  		if (ra)
4cb5300bc839b8 Chris Mason               2011-05-24  1805  			file_ra_state_init(ra, inode->i_mapping);
4cb5300bc839b8 Chris Mason               2011-05-24  1806  	}
4cb5300bc839b8 Chris Mason               2011-05-24  1807  
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1808  	/* Align the range */
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1809  	cur = round_down(range->start, fs_info->sectorsize);
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1810  	last_byte = round_up(last_byte, fs_info->sectorsize) - 1;
4cb5300bc839b8 Chris Mason               2011-05-24  1811  
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1812  	while (cur < last_byte) {
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1813  		u64 cluster_end;
1e701a3292e25a Chris Mason               2010-03-11  1814  
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1815  		/* The cluster size 256K should always be page aligned */
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1816  		BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
008873eafbc77d Li Zefan                  2011-09-02  1817  
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1818  		/* We want the cluster ends at page boundary when possible */
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1819  		cluster_end = (((cur >> PAGE_SHIFT) +
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1820  			       (SZ_256K >> PAGE_SHIFT)) << PAGE_SHIFT) - 1;
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1821  		cluster_end = min(cluster_end, last_byte);
940100a4a7b78b Chris Mason               2010-03-10  1822  
64708539cd23b3 Josef Bacik               2021-02-10  1823  		btrfs_inode_lock(inode, 0);
eede2bf34f4fa8 Omar Sandoval             2016-11-03  1824  		if (IS_SWAPFILE(inode)) {
eede2bf34f4fa8 Omar Sandoval             2016-11-03  1825  			ret = -ETXTBSY;
64708539cd23b3 Josef Bacik               2021-02-10  1826  			btrfs_inode_unlock(inode, 0);
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1827  			break;
ecb8bea87d05fd Liu Bo                    2012-03-29  1828  		}
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1829  		if (!(inode->i_sb->s_flags & SB_ACTIVE)) {
64708539cd23b3 Josef Bacik               2021-02-10  1830  			btrfs_inode_unlock(inode, 0);
4cb5300bc839b8 Chris Mason               2011-05-24  1831  			break;

Can we hit this break statement on the first iteration through the loop?

3eaa2885276fd6 Chris Mason               2008-07-24  1832  		}
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1833  		if (do_compress)
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1834  			BTRFS_I(inode)->defrag_compress = compress_type;
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1835  		ret = defrag_one_cluster(BTRFS_I(inode), ra, cur,
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1836  				cluster_end + 1 - cur, extent_thresh,
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1837  				newer_than, do_compress,
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1838  				&sectors_defragged, max_to_defrag);
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1839  		btrfs_inode_unlock(inode, 0);
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1840  		if (ret < 0)
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1841  			break;
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1842  		cur = cluster_end + 1;
4cb5300bc839b8 Chris Mason               2011-05-24  1843  	}
f46b5a66b3316e Christoph Hellwig         2008-06-11  1844  
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1845  	if (ra_allocated)
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1846  		kfree(ra);
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1847  	if (sectors_defragged) {
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1848  		/*
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1849  		 * We have defragged some sectors, for compression case
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1850  		 * they need to be written back immediately.
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1851  		 */
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1852  		if (range->flags & BTRFS_DEFRAG_RANGE_START_IO) {
1e701a3292e25a Chris Mason               2010-03-11  1853  			filemap_flush(inode->i_mapping);
dec8ef90552f7b Filipe Manana             2014-03-01  1854  			if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
dec8ef90552f7b Filipe Manana             2014-03-01  1855  				     &BTRFS_I(inode)->runtime_flags))
1e701a3292e25a Chris Mason               2010-03-11  1856  				filemap_flush(inode->i_mapping);
dec8ef90552f7b Filipe Manana             2014-03-01  1857  		}
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1858  		if (range->compress_type == BTRFS_COMPRESS_LZO)
0b246afa62b0cf Jeff Mahoney              2016-06-22  1859  			btrfs_set_fs_incompat(fs_info, COMPRESS_LZO);
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1860  		else if (range->compress_type == BTRFS_COMPRESS_ZSTD)
5c1aab1dd5445e Nick Terrell              2017-08-09  1861  			btrfs_set_fs_incompat(fs_info, COMPRESS_ZSTD);
d0b928ff1ed56a Qu Wenruo                 2021-08-06  1862  		ret = sectors_defragged;
1a419d85a76853 Li Zefan                  2010-10-25  1863  	}
1e2ef46d89ee41 David Sterba              2017-07-17  1864  	if (do_compress) {
64708539cd23b3 Josef Bacik               2021-02-10  1865  		btrfs_inode_lock(inode, 0);
eec63c65dcbeb1 David Sterba              2017-07-17  1866  		BTRFS_I(inode)->defrag_compress = BTRFS_COMPRESS_NONE;
64708539cd23b3 Josef Bacik               2021-02-10  1867  		btrfs_inode_unlock(inode, 0);
633085c79c84c3 Filipe David Borba Manana 2013-08-16  1868  	}
940100a4a7b78b Chris Mason               2010-03-10 @1869  	return ret;
f46b5a66b3316e Christoph Hellwig         2008-06-11  1870  }

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 09/11] btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()
  2021-08-09 11:32   ` Dan Carpenter
@ 2021-08-09 12:13     ` Qu Wenruo
  2021-08-10  6:19       ` Qu Wenruo
  0 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2021-08-09 12:13 UTC (permalink / raw)
  To: Dan Carpenter, kbuild, Qu Wenruo, linux-btrfs; +Cc: lkp, kbuild-all



On 2021/8/9 7:32 PM, Dan Carpenter wrote:
> Hi Qu,
>
> url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-defrag-rework-to-support-sector-perfect-defrag/20210806-161501
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
> config: h8300-randconfig-m031-20210804 (attached as .config)
> compiler: h8300-linux-gcc (GCC) 10.3.0
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
>
> New smatch warnings:
> fs/btrfs/ioctl.c:1869 btrfs_defrag_file() error: uninitialized symbol 'ret'.

OK, it's indeed a possible case, and I even found a special case where we
don't need to go through the branch reported by the static checker.

>
> vim +/ret +1869 fs/btrfs/ioctl.c
>
> fe90d1614439a8 Qu Wenruo                 2021-08-06  1757  int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
> 4cb5300bc839b8 Chris Mason               2011-05-24  1758  		      struct btrfs_ioctl_defrag_range_args *range,
> 4cb5300bc839b8 Chris Mason               2011-05-24  1759  		      u64 newer_than, unsigned long max_to_defrag)
> 4cb5300bc839b8 Chris Mason               2011-05-24  1760  {
> 0b246afa62b0cf Jeff Mahoney              2016-06-22  1761  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1762  	unsigned long sectors_defragged = 0;
> 151a31b25e5c94 Li Zefan                  2011-09-02  1763  	u64 isize = i_size_read(inode);
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1764  	u64 cur;
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1765  	u64 last_byte;
> 1e2ef46d89ee41 David Sterba              2017-07-17  1766  	bool do_compress = range->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
> fe90d1614439a8 Qu Wenruo                 2021-08-06  1767  	bool ra_allocated = false;
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1768  	int compress_type = BTRFS_COMPRESS_ZLIB;
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1769  	int ret;
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1770  	u32 extent_thresh = range->extent_thresh;
> 4cb5300bc839b8 Chris Mason               2011-05-24  1771
> 0abd5b17249ea5 Liu Bo                    2013-04-16  1772  	if (isize == 0)
> 0abd5b17249ea5 Liu Bo                    2013-04-16  1773  		return 0;
> 0abd5b17249ea5 Liu Bo                    2013-04-16  1774
> 0abd5b17249ea5 Liu Bo                    2013-04-16  1775  	if (range->start >= isize)
> 0abd5b17249ea5 Liu Bo                    2013-04-16  1776  		return -EINVAL;

Firstly, we skip several invalid cases, like empty file or range beyond
isize.

Notice that now range->start < isize; AKA range->start <= isize - 1;

> 1a419d85a76853 Li Zefan                  2010-10-25  1777
> 1e2ef46d89ee41 David Sterba              2017-07-17  1778  	if (do_compress) {
> ce96b7ffd11e26 Chengguang Xu             2019-10-10  1779  		if (range->compress_type >= BTRFS_NR_COMPRESS_TYPES)
> 1a419d85a76853 Li Zefan                  2010-10-25  1780  			return -EINVAL;
> 1a419d85a76853 Li Zefan                  2010-10-25  1781  		if (range->compress_type)
> 1a419d85a76853 Li Zefan                  2010-10-25  1782  			compress_type = range->compress_type;
> 1a419d85a76853 Li Zefan                  2010-10-25  1783  	}
> f46b5a66b3316e Christoph Hellwig         2008-06-11  1784
> 0abd5b17249ea5 Liu Bo                    2013-04-16  1785  	if (extent_thresh == 0)
> ee22184b53c823 Byongho Lee               2015-12-15  1786  		extent_thresh = SZ_256K;
> 940100a4a7b78b Chris Mason               2010-03-10  1787
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1788  	if (range->start + range->len > range->start) {
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1789  		/* Got a specific range */
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1790  		last_byte = min(isize, range->start + range->len) - 1;
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1791  	} else {
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1792  		/* Defrag until file end */
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1793  		last_byte = isize - 1;
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1794  	}

No matter what range->len is, last_byte <= isize - 1;
Also range->start <= isize - 1;

Thus we can have a worst case where range->start == last_byte.

> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1795
> 4cb5300bc839b8 Chris Mason               2011-05-24  1796  	/*
> fe90d1614439a8 Qu Wenruo                 2021-08-06  1797  	 * If we were not given a ra, allocate a readahead context. As
> 0a52d108089f33 David Sterba              2017-06-22  1798  	 * readahead is just an optimization, defrag will work without it so
> 0a52d108089f33 David Sterba              2017-06-22  1799  	 * we don't error out.
> 4cb5300bc839b8 Chris Mason               2011-05-24  1800  	 */
> fe90d1614439a8 Qu Wenruo                 2021-08-06  1801  	if (!ra) {
> fe90d1614439a8 Qu Wenruo                 2021-08-06  1802  		ra_allocated = true;
> 63e727ecd238be David Sterba              2017-06-22  1803  		ra = kzalloc(sizeof(*ra), GFP_KERNEL);
> 0a52d108089f33 David Sterba              2017-06-22  1804  		if (ra)
> 4cb5300bc839b8 Chris Mason               2011-05-24  1805  			file_ra_state_init(ra, inode->i_mapping);
> 4cb5300bc839b8 Chris Mason               2011-05-24  1806  	}
> 4cb5300bc839b8 Chris Mason               2011-05-24  1807
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1808  	/* Align the range */
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1809  	cur = round_down(range->start, fs_info->sectorsize);
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1810  	last_byte = round_up(last_byte, fs_info->sectorsize) - 1;

Even for the worst case, range->start == last_byte.

If range->start = last_byte = 4K (aka isize = 4K + 1), then:
cur = 4K;
last_byte = 4K - 1;

Now we don't need to go through the while() loop at all.

> 4cb5300bc839b8 Chris Mason               2011-05-24  1811
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1812  	while (cur < last_byte) {
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1813  		u64 cluster_end;
> 1e701a3292e25a Chris Mason               2010-03-11  1814
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1815  		/* The cluster size 256K should always be page aligned */
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1816  		BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
> 008873eafbc77d Li Zefan                  2011-09-02  1817
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1818  		/* We want the cluster ends at page boundary when possible */
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1819  		cluster_end = (((cur >> PAGE_SHIFT) +
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1820  			       (SZ_256K >> PAGE_SHIFT)) << PAGE_SHIFT) - 1;
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1821  		cluster_end = min(cluster_end, last_byte);
> 940100a4a7b78b Chris Mason               2010-03-10  1822
> 64708539cd23b3 Josef Bacik               2021-02-10  1823  		btrfs_inode_lock(inode, 0);
> eede2bf34f4fa8 Omar Sandoval             2016-11-03  1824  		if (IS_SWAPFILE(inode)) {
> eede2bf34f4fa8 Omar Sandoval             2016-11-03  1825  			ret = -ETXTBSY;
> 64708539cd23b3 Josef Bacik               2021-02-10  1826  			btrfs_inode_unlock(inode, 0);
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1827  			break;
> ecb8bea87d05fd Liu Bo                    2012-03-29  1828  		}
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1829  		if (!(inode->i_sb->s_flags & SB_ACTIVE)) {
> 64708539cd23b3 Josef Bacik               2021-02-10  1830  			btrfs_inode_unlock(inode, 0);
> 4cb5300bc839b8 Chris Mason               2011-05-24  1831  			break;
>
> Can we hit this break statement on the first iteration through the loop?
>
> 3eaa2885276fd6 Chris Mason               2008-07-24  1832  		}
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1833  		if (do_compress)
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1834  			BTRFS_I(inode)->defrag_compress = compress_type;
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1835  		ret = defrag_one_cluster(BTRFS_I(inode), ra, cur,
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1836  				cluster_end + 1 - cur, extent_thresh,
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1837  				newer_than, do_compress,
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1838  				&sectors_defragged, max_to_defrag);
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1839  		btrfs_inode_unlock(inode, 0);
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1840  		if (ret < 0)
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1841  			break;
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1842  		cur = cluster_end + 1;
> 4cb5300bc839b8 Chris Mason               2011-05-24  1843  	}
> f46b5a66b3316e Christoph Hellwig         2008-06-11  1844
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1845  	if (ra_allocated)
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1846  		kfree(ra);
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1847  	if (sectors_defragged) {
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1848  		/*
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1849  		 * We have defragged some sectors, for compression case
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1850  		 * they need to be written back immediately.
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1851  		 */
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1852  		if (range->flags & BTRFS_DEFRAG_RANGE_START_IO) {
> 1e701a3292e25a Chris Mason               2010-03-11  1853  			filemap_flush(inode->i_mapping);
> dec8ef90552f7b Filipe Manana             2014-03-01  1854  			if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
> dec8ef90552f7b Filipe Manana             2014-03-01  1855  				     &BTRFS_I(inode)->runtime_flags))
> 1e701a3292e25a Chris Mason               2010-03-11  1856  				filemap_flush(inode->i_mapping);
> dec8ef90552f7b Filipe Manana             2014-03-01  1857  		}
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1858  		if (range->compress_type == BTRFS_COMPRESS_LZO)
> 0b246afa62b0cf Jeff Mahoney              2016-06-22  1859  			btrfs_set_fs_incompat(fs_info, COMPRESS_LZO);
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1860  		else if (range->compress_type == BTRFS_COMPRESS_ZSTD)
> 5c1aab1dd5445e Nick Terrell              2017-08-09  1861  			btrfs_set_fs_incompat(fs_info, COMPRESS_ZSTD);
> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1862  		ret = sectors_defragged;
> 1a419d85a76853 Li Zefan                  2010-10-25  1863  	}

We also skip the above (sectors_defragged) branch.

> 1e2ef46d89ee41 David Sterba              2017-07-17  1864  	if (do_compress) {
> 64708539cd23b3 Josef Bacik               2021-02-10  1865  		btrfs_inode_lock(inode, 0);
> eec63c65dcbeb1 David Sterba              2017-07-17  1866  		BTRFS_I(inode)->defrag_compress = BTRFS_COMPRESS_NONE;
> 64708539cd23b3 Josef Bacik               2021-02-10  1867  		btrfs_inode_unlock(inode, 0);
> 633085c79c84c3 Filipe David Borba Manana 2013-08-16  1868  	}
> 940100a4a7b78b Chris Mason               2010-03-10 @1869  	return ret;

And @ret is indeed uninitialized.

Will fix it in next update.
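
One straightforward way would be to just initialize it at declaration
(only a sketch here, the real fix may end up different):

	int ret = 0;	/* an empty loop then returns 0 sectors defragged */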

Thanks,
Qu

> f46b5a66b3316e Christoph Hellwig         2008-06-11  1870  }
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 09/11] btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()
  2021-08-09 12:13     ` Qu Wenruo
@ 2021-08-10  6:19       ` Qu Wenruo
  0 siblings, 0 replies; 21+ messages in thread
From: Qu Wenruo @ 2021-08-10  6:19 UTC (permalink / raw)
  To: Dan Carpenter, kbuild, Qu Wenruo, linux-btrfs; +Cc: lkp, kbuild-all



On 2021/8/9 8:13 PM, Qu Wenruo wrote:
> 
> 
> On 2021/8/9 下午7:32, Dan Carpenter wrote:
>> Hi Qu,
>>
>> url:    
>> https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-defrag-rework-to-support-sector-perfect-defrag/20210806-161501 
>>
>> base:   
>> https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
>> config: h8300-randconfig-m031-20210804 (attached as .config)
>> compiler: h8300-linux-gcc (GCC) 10.3.0
>>
>> If you fix the issue, kindly add following tag as appropriate
>> Reported-by: kernel test robot <lkp@intel.com>
>> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
>>
>> New smatch warnings:
>> fs/btrfs/ioctl.c:1869 btrfs_defrag_file() error: uninitialized symbol 
>> 'ret'.
> 
> OK, it's indeed a possible case, and I even found a special case where we
> don't need to go through the branch reported by the static checker.
> 
>>
>> vim +/ret +1869 fs/btrfs/ioctl.c
>>
>> fe90d1614439a8 Qu Wenruo                 2021-08-06  1757  int 
>> btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
>> 4cb5300bc839b8 Chris Mason               2011-05-24  
>> 1758                struct btrfs_ioctl_defrag_range_args *range,
>> 4cb5300bc839b8 Chris Mason               2011-05-24  
>> 1759                u64 newer_than, unsigned long max_to_defrag)
>> 4cb5300bc839b8 Chris Mason               2011-05-24  1760  {
>> 0b246afa62b0cf Jeff Mahoney              2016-06-22  1761      struct 
>> btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1762      
>> unsigned long sectors_defragged = 0;
>> 151a31b25e5c94 Li Zefan                  2011-09-02  1763      u64 
>> isize = i_size_read(inode);
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1764      u64 cur;
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1765      u64 
>> last_byte;
>> 1e2ef46d89ee41 David Sterba              2017-07-17  1766      bool 
>> do_compress = range->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
>> fe90d1614439a8 Qu Wenruo                 2021-08-06  1767      bool 
>> ra_allocated = false;
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1768      int 
>> compress_type = BTRFS_COMPRESS_ZLIB;
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1769      int ret;
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1770      u32 
>> extent_thresh = range->extent_thresh;
>> 4cb5300bc839b8 Chris Mason               2011-05-24  1771
>> 0abd5b17249ea5 Liu Bo                    2013-04-16  1772      if 
>> (isize == 0)
>> 0abd5b17249ea5 Liu Bo                    2013-04-16  1773          
>> return 0;
>> 0abd5b17249ea5 Liu Bo                    2013-04-16  1774
>> 0abd5b17249ea5 Liu Bo                    2013-04-16  1775      if 
>> (range->start >= isize)
>> 0abd5b17249ea5 Liu Bo                    2013-04-16  1776          
>> return -EINVAL;
> 
> Firstly, we skip several invalid cases, like empty file or range beyond
> isize.
> 
> Notice that now range->start < isize; AKA range->start <= isize - 1;
> 
>> 1a419d85a76853 Li Zefan                  2010-10-25  1777
>> 1e2ef46d89ee41 David Sterba              2017-07-17  1778      if 
>> (do_compress) {
>> ce96b7ffd11e26 Chengguang Xu             2019-10-10  1779          if 
>> (range->compress_type >= BTRFS_NR_COMPRESS_TYPES)
>> 1a419d85a76853 Li Zefan                  2010-10-25  1780              
>> return -EINVAL;
>> 1a419d85a76853 Li Zefan                  2010-10-25  1781          if 
>> (range->compress_type)
>> 1a419d85a76853 Li Zefan                  2010-10-25  1782              
>> compress_type = range->compress_type;
>> 1a419d85a76853 Li Zefan                  2010-10-25  1783      }
>> f46b5a66b3316e Christoph Hellwig         2008-06-11  1784
>> 0abd5b17249ea5 Liu Bo                    2013-04-16  1785      if 
>> (extent_thresh == 0)
>> ee22184b53c823 Byongho Lee               2015-12-15  1786          
>> extent_thresh = SZ_256K;
>> 940100a4a7b78b Chris Mason               2010-03-10  1787
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1788      if 
>> (range->start + range->len > range->start) {
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1789          /* 
>> Got a specific range */
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1790          
>> last_byte = min(isize, range->start + range->len) - 1;
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1791      } else {
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1792          /* 
>> Defrag until file end */
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1793          
>> last_byte = isize - 1;
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1794      }
> 
> No matter what range->len is, last_byte <= isize - 1;
> Also range->start <= isize - 1;
> 
> Thus we can have a worst case where range->start == last_byte.
> 
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1795
>> 4cb5300bc839b8 Chris Mason               2011-05-24  1796      /*
>> fe90d1614439a8 Qu Wenruo                 2021-08-06  1797       * If 
>> we were not given a ra, allocate a readahead context. As
>> 0a52d108089f33 David Sterba              2017-06-22  1798       * 
>> readahead is just an optimization, defrag will work without it so
>> 0a52d108089f33 David Sterba              2017-06-22  1799       * we 
>> don't error out.
>> 4cb5300bc839b8 Chris Mason               2011-05-24  1800       */
>> fe90d1614439a8 Qu Wenruo                 2021-08-06  1801      if (!ra) {
>> fe90d1614439a8 Qu Wenruo                 2021-08-06  1802          
>> ra_allocated = true;
>> 63e727ecd238be David Sterba              2017-06-22  1803          ra 
>> = kzalloc(sizeof(*ra), GFP_KERNEL);
>> 0a52d108089f33 David Sterba              2017-06-22  1804          if 
>> (ra)
>> 4cb5300bc839b8 Chris Mason               2011-05-24  1805              
>> file_ra_state_init(ra, inode->i_mapping);
>> 4cb5300bc839b8 Chris Mason               2011-05-24  1806      }
>> 4cb5300bc839b8 Chris Mason               2011-05-24  1807
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1808      /* 
>> Align the range */
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1809      cur = 
>> round_down(range->start, fs_info->sectorsize);
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1810      
>> last_byte = round_up(last_byte, fs_info->sectorsize) - 1;
> 
> Even for the worst case, range->start == last_byte.
> 
> If range->start = last_byte = 4K (aka isize = 4K + 1), then:
> cur = 4K;
> last_byte = 4K - 1;
> 
> Now we don't need to go through the while() loop at all.

BTW, here the proper way to calculate last_byte should be:

round_up(last_byte + 1, sectorsize) - 1;

This ensures that the rounded-up last_byte will never be smaller than
range->start.
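
For example, with a 4K sectorsize and the worst case above (range->start ==
last_byte == 4K), the two ways of rounding give (hypothetical values, just
to illustrate):

	last_byte = round_up(SZ_4K, SZ_4K) - 1;		/* old: 4K - 1, below range->start */
	last_byte = round_up(SZ_4K + 1, SZ_4K) - 1;	/* new: 8K - 1, covers range->start */

With the new calculation, cur (4K) < last_byte (8K - 1), so the while loop
is entered as expected.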

I have fixed both the uninitialized @ret and the @last_byte problem in my
github repo already.

Thanks,
Qu
> 
>> 4cb5300bc839b8 Chris Mason               2011-05-24  1811
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1812      while 
>> (cur < last_byte) {
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1813          u64 
>> cluster_end;
>> 1e701a3292e25a Chris Mason               2010-03-11  1814
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1815          /* 
>> The cluster size 256K should always be page aligned */
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1816          
>> BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
>> 008873eafbc77d Li Zefan                  2011-09-02  1817
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1818          /* 
>> We want the cluster ends at page boundary when possible */
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1819          
>> cluster_end = (((cur >> PAGE_SHIFT) +
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  
>> 1820                     (SZ_256K >> PAGE_SHIFT)) << PAGE_SHIFT) - 1;
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1821          
>> cluster_end = min(cluster_end, last_byte);
>> 940100a4a7b78b Chris Mason               2010-03-10  1822
>> 64708539cd23b3 Josef Bacik               2021-02-10  1823          
>> btrfs_inode_lock(inode, 0);
>> eede2bf34f4fa8 Omar Sandoval             2016-11-03  1824          if 
>> (IS_SWAPFILE(inode)) {
>> eede2bf34f4fa8 Omar Sandoval             2016-11-03  1825              
>> ret = -ETXTBSY;
>> 64708539cd23b3 Josef Bacik               2021-02-10  1826              
>> btrfs_inode_unlock(inode, 0);
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1827              
>> break;
>> ecb8bea87d05fd Liu Bo                    2012-03-29  1828          }
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1829          if 
>> (!(inode->i_sb->s_flags & SB_ACTIVE)) {
>> 64708539cd23b3 Josef Bacik               2021-02-10  1830              
>> btrfs_inode_unlock(inode, 0);
>> 4cb5300bc839b8 Chris Mason               2011-05-24  1831              
>> break;
>>
>> Can we hit this break statement on the first iteration through the loop?
>>
>> 3eaa2885276fd6 Chris Mason               2008-07-24  1832          }
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1833          if 
>> (do_compress)
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1834              
>> BTRFS_I(inode)->defrag_compress = compress_type;
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1835          ret 
>> = defrag_one_cluster(BTRFS_I(inode), ra, cur,
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  
>> 1836                  cluster_end + 1 - cur, extent_thresh,
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  
>> 1837                  newer_than, do_compress,
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  
>> 1838                  &sectors_defragged, max_to_defrag);
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1839          
>> btrfs_inode_unlock(inode, 0);
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1840          if 
>> (ret < 0)
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1841              
>> break;
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1842          cur 
>> = cluster_end + 1;
>> 4cb5300bc839b8 Chris Mason               2011-05-24  1843      }
>> f46b5a66b3316e Christoph Hellwig         2008-06-11  1844
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1845      if 
>> (ra_allocated)
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1846          
>> kfree(ra);
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1847      if 
>> (sectors_defragged) {
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1848          /*
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1849           * 
>> We have defragged some sectors, for compression case
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1850           * 
>> they need to be written back immediately.
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1851           */
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1852          if 
>> (range->flags & BTRFS_DEFRAG_RANGE_START_IO) {
>> 1e701a3292e25a Chris Mason               2010-03-11  1853              
>> filemap_flush(inode->i_mapping);
>> dec8ef90552f7b Filipe Manana             2014-03-01  1854              
>> if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
>> dec8ef90552f7b Filipe Manana             2014-03-01  
>> 1855                       &BTRFS_I(inode)->runtime_flags))
>> 1e701a3292e25a Chris Mason               2010-03-11  
>> 1856                  filemap_flush(inode->i_mapping);
>> dec8ef90552f7b Filipe Manana             2014-03-01  1857          }
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1858          if 
>> (range->compress_type == BTRFS_COMPRESS_LZO)
>> 0b246afa62b0cf Jeff Mahoney              2016-06-22  1859              
>> btrfs_set_fs_incompat(fs_info, COMPRESS_LZO);
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1860          
>> else if (range->compress_type == BTRFS_COMPRESS_ZSTD)
>> 5c1aab1dd5445e Nick Terrell              2017-08-09  1861              
>> btrfs_set_fs_incompat(fs_info, COMPRESS_ZSTD);
>> d0b928ff1ed56a Qu Wenruo                 2021-08-06  1862          ret 
>> = sectors_defragged;
>> 1a419d85a76853 Li Zefan                  2010-10-25  1863      }
> 
> We also skip the above (sectors_defragged) branch.
> 
>> 1e2ef46d89ee41 David Sterba              2017-07-17  1864      if 
>> (do_compress) {
>> 64708539cd23b3 Josef Bacik               2021-02-10  1865          
>> btrfs_inode_lock(inode, 0);
>> eec63c65dcbeb1 David Sterba              2017-07-17  1866          
>> BTRFS_I(inode)->defrag_compress = BTRFS_COMPRESS_NONE;
>> 64708539cd23b3 Josef Bacik               2021-02-10  1867          
>> btrfs_inode_unlock(inode, 0);
>> 633085c79c84c3 Filipe David Borba Manana 2013-08-16  1868      }
>> 940100a4a7b78b Chris Mason               2010-03-10 @1869      return 
>> ret;
> 
> And @ret is indeed uninitialized.
> 
> Will fix it in next update.
> 
> Thanks,
> Qu
> 
>> f46b5a66b3316e Christoph Hellwig         2008-06-11  1870  }
>>
>> ---
>> 0-DAY CI Kernel Test Service, Intel Corporation
>> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
>>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 04/11] btrfs: defrag: extract the page preparation code into one helper
  2021-08-06  8:12 ` [PATCH v5 04/11] btrfs: defrag: extract the page preparation code into one helper Qu Wenruo
@ 2021-08-23 19:04   ` David Sterba
  0 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2021-08-23 19:04 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Aug 06, 2021 at 04:12:35PM +0800, Qu Wenruo wrote:
> +					    pgoff_t index)
> +{
> +	struct address_space *mapping = inode->vfs_inode.i_mapping;
> +	gfp_t mask = btrfs_alloc_write_mask(mapping);
> +	u64 page_start = index << PAGE_SHIFT;

This needs (u64)index << PAGE_SHIFT, the types are not 64bit safe.
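
For example, on a 32-bit build where pgoff_t is a 32-bit unsigned long, the
shift is done in 32 bits and truncated before the assignment (hypothetical
numbers, just to illustrate):

	pgoff_t index = 0x200000;			/* page at file offset 8 GiB */
	u64 page_start = index << PAGE_SHIFT;		/* wraps to 0 in 32-bit math */
	u64 fixed = (u64)index << PAGE_SHIFT;		/* widened first, correct 8 GiB */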

> +	u64 page_end = page_start + PAGE_SIZE - 1;
> +	struct extent_state *cached_state = NULL;
> +	struct page *page;
> +	int ret;
> +
> +again:
> +	page = find_or_create_page(mapping, index, mask);
> +	if (!page)
> +		return ERR_PTR(-ENOMEM);
> +
> +	ret = set_page_extent_mapped(page);
> +	if (ret < 0) {
> +		unlock_page(page);
> +		put_page(page);
> +		return ERR_PTR(ret);
> +	}
> +
> +	/* Wait for any exists ordered extent in the range */
                        existing

> +	while (1) {
> +		struct btrfs_ordered_extent *ordered;
> +
> +		lock_extent_bits(&inode->io_tree, page_start, page_end,
> +				 &cached_state);
> +		ordered = btrfs_lookup_ordered_range(inode, page_start,
> +						     PAGE_SIZE);
> +		unlock_extent_cached(&inode->io_tree, page_start, page_end,
> +				     &cached_state);
> +		if (!ordered)
> +			break;
> +
> +		unlock_page(page);
> +		btrfs_start_ordered_extent(ordered, 1);
> +		btrfs_put_ordered_extent(ordered);
> +		lock_page(page);
> +		/*
> +		 * we unlocked the page above, so we need check if
> +		 * it was released or not.
> +		 */

Please reformat comments that are moved

> +		if (page->mapping != mapping || !PagePrivate(page)) {
> +			unlock_page(page);
> +			put_page(page);
> +			goto again;
> +		}
> +	}
> +
> +	/*
> +	 * Now the page range has no ordered extent any more.
> +	 * Read the page to make it uptodate.

Same.

> +	 */
> +	if (!PageUptodate(page)) {
> +		btrfs_readpage(NULL, page);
> +		lock_page(page);
> +		if (page->mapping != mapping || !PagePrivate(page)) {
> +			unlock_page(page);
> +			put_page(page);
> +			goto again;
> +		}
> +		if (!PageUptodate(page)) {
> +			unlock_page(page);
> +			put_page(page);
> +			return ERR_PTR(-EIO);
> +		}
> +	}
> +	return page;
> +}

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 05/11] btrfs: defrag: introduce a new helper to collect target file extents
  2021-08-06  8:12 ` [PATCH v5 05/11] btrfs: defrag: introduce a new helper to collect target file extents Qu Wenruo
@ 2021-08-23 19:08   ` David Sterba
  0 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2021-08-23 19:08 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Aug 06, 2021 at 04:12:36PM +0800, Qu Wenruo wrote:
> Introduce a new helper, defrag_collect_targets(), to collect all
> possible targets to be defragged.
> 
> This function will not consider things like max_sectors_to_defrag, thus
> the caller is responsible for ensuring we don't exceed the limit.
> 
> This function will be the first stage of later defrag rework.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/ioctl.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 120 insertions(+)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index c0639780f99c..043c44daa5ae 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1427,6 +1427,126 @@ static int cluster_pages_for_defrag(struct inode *inode,
>  
>  }
>  
> +struct defrag_target_range {
> +	struct list_head list;
> +	u64 start;
> +	u64 len;
> +};
> +
> +/*
> + * Helper to collect all valid target extents.

Please use wording like "Collect all the valid ...", ie. drop any "This
function does ..." or "Helper to do ..."

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 07/11] btrfs: defrag: introduce a helper to defrag a range
  2021-08-06  8:12 ` [PATCH v5 07/11] btrfs: defrag: introduce a helper to defrag a range Qu Wenruo
@ 2021-08-23 19:21   ` David Sterba
  0 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2021-08-23 19:21 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Aug 06, 2021 at 04:12:38PM +0800, Qu Wenruo wrote:
> A new helper, defrag_one_range(), is introduced to defrag one range.
> 
> This function will mostly prepare the needed pages and extent status for
> defrag_one_locked_target().
> 
> As we can only have a consistent view of extent map with page and
> extent bits locked, we need to re-check the range passed in to get a
> real target list for defrag_one_locked_target().
> 
> Since defrag_collect_targets() will call defrag_lookup_extent() and lock
> extent range, we also need to teach those two functions to skip extent
> lock.
> Thus a new parameter, @locked, is introduced to skip the extent lock if the
> caller has already locked the range.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/ioctl.c | 105 ++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 95 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index a21c4c09269a..2f7196f9bd65 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1081,7 +1081,8 @@ static int find_new_extents(struct btrfs_root *root,
>  	return -ENOENT;
>  }
>  
> -static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start)
> +static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start,
> +					       bool locked)
>  {
>  	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
>  	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
> @@ -1101,10 +1102,12 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start)
>  		u64 end = start + sectorsize - 1;
>  
>  		/* get the big lock and read metadata off disk */
> -		lock_extent_bits(io_tree, start, end, &cached);
> +		if (!locked)
> +			lock_extent_bits(io_tree, start, end, &cached);
>  		em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, start,
>  				      sectorsize);
> -		unlock_extent_cached(io_tree, start, end, &cached);
> +		if (!locked)
> +			unlock_extent_cached(io_tree, start, end, &cached);
>  
>  		if (IS_ERR(em))
>  			return NULL;
> @@ -1113,7 +1116,8 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start)
>  	return em;
>  }
>  
> -static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em)
> +static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em,
> +				     bool locked)
>  {
>  	struct extent_map *next;
>  	bool ret = true;
> @@ -1122,7 +1126,7 @@ static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em)
>  	if (em->start + em->len >= i_size_read(inode))
>  		return false;
>  
> -	next = defrag_lookup_extent(inode, em->start + em->len);
> +	next = defrag_lookup_extent(inode, em->start + em->len, locked);
>  	if (!next || next->block_start >= EXTENT_MAP_LAST_BYTE)
>  		ret = false;
>  	else if ((em->block_start + em->block_len == next->block_start) &&
> @@ -1151,7 +1155,7 @@ static int should_defrag_range(struct inode *inode, u64 start, u32 thresh,
>  
>  	*skip = 0;
>  
> -	em = defrag_lookup_extent(inode, start);
> +	em = defrag_lookup_extent(inode, start, false);
>  	if (!em)
>  		return 0;
>  
> @@ -1164,7 +1168,7 @@ static int should_defrag_range(struct inode *inode, u64 start, u32 thresh,
>  	if (!*defrag_end)
>  		prev_mergeable = false;
>  
> -	next_mergeable = defrag_check_next_extent(inode, em);
> +	next_mergeable = defrag_check_next_extent(inode, em, false);
>  	/*
>  	 * we hit a real extent, if it is big or the next extent is not a
>  	 * real extent, don't bother defragging it
> @@ -1445,12 +1449,13 @@ struct defrag_target_range {
>   * @do_compress:   Whether the defrag is doing compression
>   *		   If true, @extent_thresh will be ignored and all regular
>   *		   file extents meeting @newer_than will be targets.
> + * @locked:	   If the range has already hold extent lock
>   * @target_list:   The list of targets file extents
>   */
>  static int defrag_collect_targets(struct btrfs_inode *inode,
>  				  u64 start, u64 len, u32 extent_thresh,
>  				  u64 newer_than, bool do_compress,
> -				  struct list_head *target_list)
> +				  bool locked, struct list_head *target_list)
>  {
>  	u64 cur = start;
>  	int ret = 0;
> @@ -1461,7 +1466,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>  		bool next_mergeable = true;
>  		u64 range_len;
>  
> -		em = defrag_lookup_extent(&inode->vfs_inode, cur);
> +		em = defrag_lookup_extent(&inode->vfs_inode, cur, locked);
>  		if (!em)
>  			break;
>  
> @@ -1485,7 +1490,8 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>  		if (em->len >= extent_thresh)
>  			goto next;
>  
> -		next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em);
> +		next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em,
> +							  locked);
>  		if (!next_mergeable) {
>  			struct defrag_target_range *last;
>  
> @@ -1603,6 +1609,85 @@ static int defrag_one_locked_target(struct btrfs_inode *inode,
>  	return ret;
>  }
>  
> +static int defrag_one_range(struct btrfs_inode *inode,
> +			    u64 start, u32 len,
> +			    u32 extent_thresh, u64 newer_than,
> +			    bool do_compress)
> +{
> +	struct extent_state *cached_state = NULL;
> +	struct defrag_target_range *entry;
> +	struct defrag_target_range *tmp;
> +	LIST_HEAD(target_list);
> +	struct page **pages;
> +	const u32 sectorsize = inode->root->fs_info->sectorsize;
> +	unsigned long last_index = (start + len - 1) >> PAGE_SHIFT;
> +	unsigned long start_index = start >> PAGE_SHIFT;

This is another instance of the unsafe page index type variable, here
it's fine because start is u64, but below

> +	unsigned int nr_pages = last_index - start_index + 1;
> +	int ret = 0;
> +	int i;
> +
> +	ASSERT(nr_pages <= CLUSTER_SIZE / PAGE_SIZE);
> +	ASSERT(IS_ALIGNED(start, sectorsize) && IS_ALIGNED(len, sectorsize));
> +
> +	pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
> +	if (!pages)
> +		return -ENOMEM;
> +
> +	/* Prepare all pages */
> +	for (i = 0; i < nr_pages; i++) {
> +		pages[i] = defrag_prepare_one_page(inode, start_index + i);
> +		if (IS_ERR(pages[i])) {
> +			ret = PTR_ERR(pages[i]);
> +			pages[i] = NULL;
> +			goto free_pages;
> +		}
> +	}
> +	for (i = 0; i < nr_pages; i++)
> +		wait_on_page_writeback(pages[i]);
> +
> +	/* Also lock the pages range */
> +	lock_extent_bits(&inode->io_tree, start_index << PAGE_SHIFT,

The shifts are on unsigned long which can break, so it's better to
declare them u64 right away.
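
For example (just a sketch of the suggested change):

	u64 last_index = (start + len - 1) >> PAGE_SHIFT;
	u64 start_index = start >> PAGE_SHIFT;

so that the start_index << PAGE_SHIFT shifts above and below are computed
in 64 bits from the beginning.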

> +			 (last_index << PAGE_SHIFT) + PAGE_SIZE - 1,
> +			 &cached_state);
> +	/*
> +	 * Now we have a consistent view about the extent map, re-check
> +	 * which range really needs to be defragged.
> +	 *
> +	 * And this time we have extent locked already, pass @locked = true
> +	 * so that we won't re-lock the extent range and cause deadlock.
> +	 */
> +	ret = defrag_collect_targets(inode, start, len, extent_thresh,
> +				     newer_than, do_compress, true,
> +				     &target_list);
> +	if (ret < 0)
> +		goto unlock_extent;
> +
> +	list_for_each_entry(entry, &target_list, list) {
> +		ret = defrag_one_locked_target(inode, entry, pages, nr_pages,
> +					       &cached_state);
> +		if (ret < 0)
> +			break;
> +	}
> +
> +	list_for_each_entry_safe(entry, tmp, &target_list, list) {
> +		list_del_init(&entry->list);
> +		kfree(entry);
> +	}
> +unlock_extent:
> +	unlock_extent_cached(&inode->io_tree, start_index << PAGE_SHIFT,
> +			     (last_index << PAGE_SHIFT) + PAGE_SIZE - 1,

And same here.

> +			     &cached_state);
> +free_pages:
> +	for (i = 0; i < nr_pages; i++) {
> +		if (pages[i]) {
> +			unlock_page(pages[i]);
> +			put_page(pages[i]);
> +		}
> +	}
> +	kfree(pages);
> +	return ret;
> +}
> +
>  /*
>   * Btrfs entrace for defrag.
>   *
> -- 
> 2.32.0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 08/11] btrfs: defrag: introduce a new helper to defrag one cluster
  2021-08-06  8:12 ` [PATCH v5 08/11] btrfs: defrag: introduce a new helper to defrag one cluster Qu Wenruo
@ 2021-08-23 19:27   ` David Sterba
  0 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2021-08-23 19:27 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Aug 06, 2021 at 04:12:39PM +0800, Qu Wenruo wrote:
> This new helper, defrag_one_cluster(), will defrag one cluster (at most
> 256K) by:
> 
> - Collect all initial targets
> 
> - Kick in readahead when possible
> 
> - Call defrag_one_range() on each initial target
>   With some extra range clamping.
> 
> - Update @sectors_defragged parameter
> 
> This involves one behavior change, the defragged sectors accounting is
> no longer as accurate as old behavior, as the initial targets are not
> consistent.
> 
> We can have new holes punched inside the initial target, and we will
> skip such holes later.
> But the defragged sectors accounting doesn't need to be that accurate
> anyway, thus I don't want to pass that extra accounting burden into
> defrag_one_range().

I think it's ok, any parallel operation can change the file during
defragmentation at any point.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag
  2021-08-06  8:12 [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag Qu Wenruo
                   ` (10 preceding siblings ...)
  2021-08-06  8:12 ` [PATCH v5 11/11] btrfs: defrag: enable defrag for subpage case Qu Wenruo
@ 2021-08-23 19:43 ` David Sterba
  2021-08-27  9:18   ` David Sterba
  11 siblings, 1 reply; 21+ messages in thread
From: David Sterba @ 2021-08-23 19:43 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Aug 06, 2021 at 04:12:31PM +0800, Qu Wenruo wrote:
> Now both regular sectorsize and subpage sectorsize can pass defrag test
> group.

> Qu Wenruo (11):
>   btrfs: defrag: pass file_ra_state instead of file for
>     btrfs_defrag_file()
>   btrfs: defrag: also check PagePrivate for subpage cases in
>     cluster_pages_for_defrag()
>   btrfs: defrag: replace hard coded PAGE_SIZE to sectorsize
>   btrfs: defrag: extract the page preparation code into one helper
>   btrfs: defrag: introduce a new helper to collect target file extents
>   btrfs: defrag: introduce a helper to defrag a continuous prepared
>     range
>   btrfs: defrag: introduce a helper to defrag a range
>   btrfs: defrag: introduce a new helper to defrag one cluster
>   btrfs: defrag: use defrag_one_cluster() to implement
>     btrfs_defrag_file()
>   btrfs: defrag: remove the old infrastructure
>   btrfs: defrag: enable defrag for subpage case

Patch 9 was taken from your git repository. The patchset is now in a topic
branch; I'll do one round and then move it to misc-next. Please send any
followups as separate patches, thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag
  2021-08-23 19:43 ` [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag David Sterba
@ 2021-08-27  9:18   ` David Sterba
  0 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2021-08-27  9:18 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs

On Mon, Aug 23, 2021 at 09:43:03PM +0200, David Sterba wrote:
> On Fri, Aug 06, 2021 at 04:12:31PM +0800, Qu Wenruo wrote:
> > Now both regular sectorsize and subpage sectorsize can pass defrag test
> > group.
> 
> > Qu Wenruo (11):
> >   btrfs: defrag: pass file_ra_state instead of file for
> >     btrfs_defrag_file()
> >   btrfs: defrag: also check PagePrivate for subpage cases in
> >     cluster_pages_for_defrag()
> >   btrfs: defrag: replace hard coded PAGE_SIZE to sectorsize
> >   btrfs: defrag: extract the page preparation code into one helper
> >   btrfs: defrag: introduce a new helper to collect target file extents
> >   btrfs: defrag: introduce a helper to defrag a continuous prepared
> >     range
> >   btrfs: defrag: introduce a helper to defrag a range
> >   btrfs: defrag: introduce a new helper to defrag one cluster
> >   btrfs: defrag: use defrag_one_cluster() to implement
> >     btrfs_defrag_file()
> >   btrfs: defrag: remove the old infrastructure
> >   btrfs: defrag: enable defrag for subpage case
> 
> > Patch 9 was taken from your git repository. The patchset is now in a topic
> > branch; I'll do one round and then move it to misc-next. Please send any
> > followups as separate patches, thanks.

Now moved to misc-next, thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-08-27  9:21 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-06  8:12 [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag Qu Wenruo
2021-08-06  8:12 ` [PATCH v5 01/11] btrfs: defrag: pass file_ra_state instead of file for btrfs_defrag_file() Qu Wenruo
2021-08-06  8:12 ` [PATCH v5 02/11] btrfs: defrag: also check PagePrivate for subpage cases in cluster_pages_for_defrag() Qu Wenruo
2021-08-06  8:12 ` [PATCH v5 03/11] btrfs: defrag: replace hard coded PAGE_SIZE to sectorsize Qu Wenruo
2021-08-06  8:12 ` [PATCH v5 04/11] btrfs: defrag: extract the page preparation code into one helper Qu Wenruo
2021-08-23 19:04   ` David Sterba
2021-08-06  8:12 ` [PATCH v5 05/11] btrfs: defrag: introduce a new helper to collect target file extents Qu Wenruo
2021-08-23 19:08   ` David Sterba
2021-08-06  8:12 ` [PATCH v5 06/11] btrfs: defrag: introduce a helper to defrag a continuous prepared range Qu Wenruo
2021-08-06  8:12 ` [PATCH v5 07/11] btrfs: defrag: introduce a helper to defrag a range Qu Wenruo
2021-08-23 19:21   ` David Sterba
2021-08-06  8:12 ` [PATCH v5 08/11] btrfs: defrag: introduce a new helper to defrag one cluster Qu Wenruo
2021-08-23 19:27   ` David Sterba
2021-08-06  8:12 ` [PATCH v5 09/11] btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file() Qu Wenruo
2021-08-09 11:32   ` Dan Carpenter
2021-08-09 12:13     ` Qu Wenruo
2021-08-10  6:19       ` Qu Wenruo
2021-08-06  8:12 ` [PATCH v5 10/11] btrfs: defrag: remove the old infrastructure Qu Wenruo
2021-08-06  8:12 ` [PATCH v5 11/11] btrfs: defrag: enable defrag for subpage case Qu Wenruo
2021-08-23 19:43 ` [PATCH v5 00/11] btrfs: defrag: rework to support sector perfect defrag David Sterba
2021-08-27  9:18   ` David Sterba
