* [PATCH 0/4] btrfs: defrag: don't waste CPU time on non-target extent
@ 2022-01-28  8:12 Qu Wenruo
  2022-01-28  8:12 ` [PATCH 1/4] btrfs: uapi: introduce BTRFS_DEFRAG_RANGE_MASK for later sanity check Qu Wenruo
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Qu Wenruo @ 2022-01-28  8:12 UTC
  To: linux-btrfs

In the rework of btrfs_defrag_file(), one core idea is to defrag cluster
by cluster, which gives us the better layered code structure we have now:

btrfs_defrag_file()
|- defrag_one_cluster()
   |- defrag_one_range()
      |- defrag_one_locked_range()

But there is a catch: btrfs_defrag_file() just advances from one cluster
to the next, never considering cases where the current extent is already
too large and we could skip directly to its end.

This increases CPU usage on very large but unfragmented files.

Fix the behavior in defrag_one_cluster() so that defrag_collect_targets()
reports where the next search should start.

If the current extent is not a target at all, then we can jump to the
end of that non-target extent to save time.
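A condensed sketch of how patch 4 implements this, at the end of
defrag_collect_targets():

	if (!ret && last_scanned_ret) {
		if (!last_target)
			/* Skip to the end of the non-target extent */
			*last_scanned_ret = cur;
		else
			/* Can only go to the end of the given range */
			*last_scanned_ret = start + len;
	}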

To get the missing optimization, also introduce a new structure,
btrfs_defrag_ctrl, so we don't need to pass things like @newer_than and
@max_to_defrag around.

This also removes weird behaviors like reusing range::start for the next
search location.

And since we need to convert the old btrfs_ioctl_defrag_range_args to the
newer btrfs_defrag_ctrl, also do extra sanity checks in the conversion
function.

Such cleanup also brings us closer to exposing these extra policy
parameters in a future enhanced defrag ioctl interface.
(Unfortunately, the reserved space of the existing defrag ioctl is not
large enough to contain them all.)

Qu Wenruo (4):
  btrfs: uapi: introduce BTRFS_DEFRAG_RANGE_MASK for later sanity check
  btrfs: defrag: introduce btrfs_defrag_ctrl structure for later usage
  btrfs: defrag: use btrfs_defrag_ctrl to replace
    btrfs_ioctl_defrag_range_args for btrfs_defrag_file()
  btrfs: defrag: allow defrag_one_cluster() to skip large extent which
    is not a target

 fs/btrfs/ctree.h           |  22 +++-
 fs/btrfs/file.c            |  17 ++--
 fs/btrfs/ioctl.c           | 199 ++++++++++++++++++++++---------------
 include/uapi/linux/btrfs.h |   6 +-
 4 files changed, 152 insertions(+), 92 deletions(-)

-- 
2.34.1



* [PATCH 1/4] btrfs: uapi: introduce BTRFS_DEFRAG_RANGE_MASK for later sanity check
  2022-01-28  8:12 [PATCH 0/4] btrfs: defrag: don't waste CPU time on non-target extent Qu Wenruo
@ 2022-01-28  8:12 ` Qu Wenruo
  2022-02-03 16:58   ` Filipe Manana
  2022-01-28  8:12 ` [PATCH 2/4] btrfs: defrag: introduce btrfs_defrag_ctrl structure for later usage Qu Wenruo
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2022-01-28  8:12 UTC
  To: linux-btrfs

And since we're here, replace the hardcoded bit flags (1, 2) with
(1UL << 0) (1UL << 2).
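For context, here is roughly how the new mask gets used for sanity
checking in a later patch of this series:

	if (args->flags & ~BTRFS_DEFRAG_RANGE_FLAGS_MASK)
		return -EOPNOTSUPP;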

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 include/uapi/linux/btrfs.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 87925444d892..efd246e218d7 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -575,8 +575,10 @@ struct btrfs_ioctl_clone_range_args {
  * Used by:
  * struct btrfs_ioctl_defrag_range_args.flags
  */
-#define BTRFS_DEFRAG_RANGE_COMPRESS 1
-#define BTRFS_DEFRAG_RANGE_START_IO 2
+#define BTRFS_DEFRAG_RANGE_COMPRESS	(1UL << 0)
+#define BTRFS_DEFRAG_RANGE_START_IO	(1UL << 1)
+#define BTRFS_DEFRAG_RANGE_FLAGS_MASK	(BTRFS_DEFRAG_RANGE_COMPRESS |\
+					 BTRFS_DEFRAG_RANGE_START_IO)
 struct btrfs_ioctl_defrag_range_args {
 	/* start of the defrag operation */
 	__u64 start;
-- 
2.34.1



* [PATCH 2/4] btrfs: defrag: introduce btrfs_defrag_ctrl structure for later usage
  2022-01-28  8:12 [PATCH 0/4] btrfs: defrag: don't waste CPU time on non-target extent Qu Wenruo
  2022-01-28  8:12 ` [PATCH 1/4] btrfs: uapi: introduce BTRFS_DEFRAG_RANGE_MASK for later sanity check Qu Wenruo
@ 2022-01-28  8:12 ` Qu Wenruo
  2022-02-03 17:06   ` Filipe Manana
  2022-01-28  8:12 ` [PATCH 3/4] btrfs: defrag: use btrfs_defrag_ctrl to replace btrfs_ioctl_defrag_range_args for btrfs_defrag_file() Qu Wenruo
  2022-01-28  8:12 ` [PATCH 4/4] btrfs: defrag: allow defrag_one_cluster() to skip large extent which is not a target Qu Wenruo
  3 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2022-01-28  8:12 UTC
  To: linux-btrfs

Currently btrfs_defrag_file() accepts not only
btrfs_ioctl_defrag_range_args but also other parameters like @newer_than
and @max_sectors_to_defrag for extra policies.

Those extra values are hidden from the defrag ioctl and have even caused
bugs in the past due to different behaviors based on those extra values.

Here we introduce a new structure, btrfs_defrag_ctrl, to include:

- all members in btrfs_ioctl_defrag_range_args

- @max_sectors_to_defrag and @newer_than

- Extra values which callers of btrfs_defrag_file() may care about,
  like @sectors_defragged and @last_scanned.

With the new structure, also introduce a new helper,
btrfs_defrag_ioctl_args_to_ctrl() to:

- Do extra sanity checks on @compress and @flags

- Do range alignment when possible

- Set default values.
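After the whole series, the defrag ioctl path ends up using the helper
roughly like this (a condensed sketch of the converted caller from a
later patch):

	struct btrfs_defrag_ctrl ctrl = {};
	int ret;

	ret = btrfs_defrag_ioctl_args_to_ctrl(root->fs_info, &range, &ctrl,
					      0, BTRFS_OLDEST_GENERATION);
	if (ret < 0)	/* Unsupported flags or compress type */
		return ret;
	ret = btrfs_defrag_file(file_inode(file), &file->f_ra, &ctrl);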

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h | 19 +++++++++++++++++++
 fs/btrfs/ioctl.c | 41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b4a9b1c58d22..0a441bd703a0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3267,6 +3267,25 @@ int btrfs_fileattr_set(struct user_namespace *mnt_userns,
 int btrfs_ioctl_get_supported_features(void __user *arg);
 void btrfs_sync_inode_flags_to_i_flags(struct inode *inode);
 int __pure btrfs_is_empty_uuid(u8 *uuid);
+
+struct btrfs_defrag_ctrl {
+	/* Values below are for defrag policy */
+	u64	start;
+	u64	len;
+	u32	extent_thresh;
+	u64	newer_than;
+	u64	max_sectors_to_defrag;
+	u8	compress;
+	u8	flags;
+
+	/* Values below are the defrag result */
+	u64	sectors_defragged;
+	u64	last_scanned;	/* Exclusive bytenr */
+};
+int btrfs_defrag_ioctl_args_to_ctrl(struct btrfs_fs_info *fs_info,
+				    struct btrfs_ioctl_defrag_range_args *args,
+				    struct btrfs_defrag_ctrl *ctrl,
+				    u64 max_sectors_to_defrag, u64 newer_than);
 int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 		      struct btrfs_ioctl_defrag_range_args *range,
 		      u64 newer_than, unsigned long max_to_defrag);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 3d3331dd0902..f9b9ee6c50da 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1499,6 +1499,47 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
 	return ret;
 }
 
+/*
+ * Convert the old ioctl format to the new btrfs_defrag_ctrl structure.
+ *
+ * Will also do basic task like setting default values and sanity check.
+ */
+int btrfs_defrag_ioctl_args_to_ctrl(struct btrfs_fs_info *fs_info,
+				    struct btrfs_ioctl_defrag_range_args *args,
+				    struct btrfs_defrag_ctrl *ctrl,
+				    u64 max_sectors_to_defrag, u64 newer_than)
+{
+	u64 range_end;
+
+	if (args->flags & ~BTRFS_DEFRAG_RANGE_FLAGS_MASK)
+		return -EOPNOTSUPP;
+	if (args->compress_type >= BTRFS_NR_COMPRESS_TYPES)
+		return -EOPNOTSUPP;
+
+	ctrl->start = round_down(args->start, fs_info->sectorsize);
+	/*
+	 * If @start + @len does not overflow and @len is not (u64)-1, align
+	 * the length. Otherwise set it to (u64)-1 so btrfs_defrag_file()
+	 * will later determine the length using isize.
+	 */
+	if (!check_add_overflow(args->start, args->len, &range_end) &&
+	    args->len != (u64)-1)
+		ctrl->len = round_up(range_end, fs_info->sectorsize) -
+			    ctrl->start;
+	else
+		ctrl->len = -1;
+	ctrl->flags = args->flags;
+	ctrl->compress = args->compress_type;
+	if (args->extent_thresh == 0)
+		ctrl->extent_thresh = SZ_256K;
+	else
+		ctrl->extent_thresh = args->extent_thresh;
+	ctrl->newer_than = newer_than;
+	ctrl->last_scanned = 0;
+	ctrl->sectors_defragged = 0;
+	return 0;
+}
+
 /*
  * Entry point to file defragmentation.
  *
-- 
2.34.1



* [PATCH 3/4] btrfs: defrag: use btrfs_defrag_ctrl to replace btrfs_ioctl_defrag_range_args for btrfs_defrag_file()
  2022-01-28  8:12 [PATCH 0/4] btrfs: defrag: don't waste CPU time on non-target extent Qu Wenruo
  2022-01-28  8:12 ` [PATCH 1/4] btrfs: uapi: introduce BTRFS_DEFRAG_RANGE_MASK for later sanity check Qu Wenruo
  2022-01-28  8:12 ` [PATCH 2/4] btrfs: defrag: introduce btrfs_defrag_ctrl structure for later usage Qu Wenruo
@ 2022-01-28  8:12 ` Qu Wenruo
  2022-02-03 17:17   ` Filipe Manana
  2022-01-28  8:12 ` [PATCH 4/4] btrfs: defrag: allow defrag_one_cluster() to skip large extent which is not a target Qu Wenruo
  3 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2022-01-28  8:12 UTC
  To: linux-btrfs

This brings the following benefits:

- No more strange range->start update to indicate last scanned bytenr
  We have btrfs_defrag_ctrl::last_scanned (exclusive) for it directly.

- No more return value to indicate defragged sectors
  Now btrfs_defrag_file() will just return 0 if no error happened.
  And btrfs_defrag_ctrl::sectors_defragged will show that value.

- Fewer parameters to carry around
  Now most defrag_* functions only need to fetch their policy parameters
  from btrfs_defrag_ctrl directly.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h |   3 +-
 fs/btrfs/file.c  |  17 +++---
 fs/btrfs/ioctl.c | 144 +++++++++++++++++++++--------------------------
 3 files changed, 73 insertions(+), 91 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0a441bd703a0..b30271676e15 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3287,8 +3287,7 @@ int btrfs_defrag_ioctl_args_to_ctrl(struct btrfs_fs_info *fs_info,
 				    struct btrfs_defrag_ctrl *ctrl,
 				    u64 max_sectors_to_defrag, u64 newer_than);
 int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
-		      struct btrfs_ioctl_defrag_range_args *range,
-		      u64 newer_than, unsigned long max_to_defrag);
+		      struct btrfs_defrag_ctrl *ctrl);
 void btrfs_get_block_group_info(struct list_head *groups_list,
 				struct btrfs_ioctl_space_info *space);
 void btrfs_update_ioctl_balance_args(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 11204dbbe053..f5de8ab9787e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -277,8 +277,7 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
 {
 	struct btrfs_root *inode_root;
 	struct inode *inode;
-	struct btrfs_ioctl_defrag_range_args range;
-	int num_defrag;
+	struct btrfs_defrag_ctrl ctrl = {};
 	int ret;
 
 	/* get the inode */
@@ -297,21 +296,21 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
 
 	/* do a chunk of defrag */
 	clear_bit(BTRFS_INODE_IN_DEFRAG, &BTRFS_I(inode)->runtime_flags);
-	memset(&range, 0, sizeof(range));
-	range.len = (u64)-1;
-	range.start = defrag->last_offset;
+	ctrl.len = (u64)-1;
+	ctrl.start = defrag->last_offset;
+	ctrl.newer_than = defrag->transid;
+	ctrl.max_sectors_to_defrag = BTRFS_DEFRAG_BATCH;
 
 	sb_start_write(fs_info->sb);
-	num_defrag = btrfs_defrag_file(inode, NULL, &range, defrag->transid,
-				       BTRFS_DEFRAG_BATCH);
+	ret = btrfs_defrag_file(inode, NULL, &ctrl);
 	sb_end_write(fs_info->sb);
 	/*
 	 * if we filled the whole defrag batch, there
 	 * must be more work to do.  Queue this defrag
 	 * again
 	 */
-	if (num_defrag == BTRFS_DEFRAG_BATCH) {
-		defrag->last_offset = range.start;
+	if (ctrl.sectors_defragged == BTRFS_DEFRAG_BATCH) {
+		defrag->last_offset = ctrl.last_scanned;
 		btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
 	} else if (defrag->last_offset && !defrag->cycled) {
 		/*
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index f9b9ee6c50da..67ba934be99e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1189,22 +1189,18 @@ struct defrag_target_range {
 /*
  * Collect all valid target extents.
  *
+ * @ctrl:	   extra defrag policy control
  * @start:	   file offset to lookup
  * @len:	   length to lookup
- * @extent_thresh: file extent size threshold, any extent size >= this value
- *		   will be ignored
- * @newer_than:    only defrag extents newer than this value
- * @do_compress:   whether the defrag is doing compression
- *		   if true, @extent_thresh will be ignored and all regular
- *		   file extents meeting @newer_than will be targets.
  * @locked:	   if the range has already held extent lock
  * @target_list:   list of targets file extents
  */
 static int defrag_collect_targets(struct btrfs_inode *inode,
-				  u64 start, u64 len, u32 extent_thresh,
-				  u64 newer_than, bool do_compress,
-				  bool locked, struct list_head *target_list)
+				  const struct btrfs_defrag_ctrl *ctrl,
+				  u64 start, u32 len, bool locked,
+				  struct list_head *target_list)
 {
+	bool do_compress = ctrl->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
 	u64 cur = start;
 	int ret = 0;
 
@@ -1224,7 +1220,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 			goto next;
 
 		/* Skip older extent */
-		if (em->generation < newer_than)
+		if (em->generation < ctrl->newer_than)
 			goto next;
 
 		/*
@@ -1235,7 +1231,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 			goto add;
 
 		/* Skip too large extent */
-		if (em->len >= extent_thresh)
+		if (em->len >= ctrl->extent_thresh)
 			goto next;
 
 		/*
@@ -1363,8 +1359,9 @@ static int defrag_one_locked_target(struct btrfs_inode *inode,
 	return ret;
 }
 
-static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len,
-			    u32 extent_thresh, u64 newer_than, bool do_compress)
+static int defrag_one_range(struct btrfs_inode *inode,
+			    const struct btrfs_defrag_ctrl *ctrl,
+			    u64 start, u32 len)
 {
 	struct extent_state *cached_state = NULL;
 	struct defrag_target_range *entry;
@@ -1408,8 +1405,7 @@ static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len,
 	 * And this time we have extent locked already, pass @locked = true
 	 * so that we won't relock the extent range and cause deadlock.
 	 */
-	ret = defrag_collect_targets(inode, start, len, extent_thresh,
-				     newer_than, do_compress, true,
+	ret = defrag_collect_targets(inode, ctrl, start, len, true,
 				     &target_list);
 	if (ret < 0)
 		goto unlock_extent;
@@ -1440,12 +1436,16 @@ static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len,
 	return ret;
 }
 
+/*
+ * Return <0 for error.
+ * Return >0 if we hit the ctrl->max_sectors_to_defrag limit
+ * Return 0 if we finished the range without error.
+ *
+ * For >= 0 case, ctrl->last_scanned and ctrl->sectors_defragged will be updated.
+ */
 static int defrag_one_cluster(struct btrfs_inode *inode,
 			      struct file_ra_state *ra,
-			      u64 start, u32 len, u32 extent_thresh,
-			      u64 newer_than, bool do_compress,
-			      unsigned long *sectors_defragged,
-			      unsigned long max_sectors)
+			      struct btrfs_defrag_ctrl *ctrl, u32 len)
 {
 	const u32 sectorsize = inode->root->fs_info->sectorsize;
 	struct defrag_target_range *entry;
@@ -1454,9 +1454,8 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
 	int ret;
 
 	BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
-	ret = defrag_collect_targets(inode, start, len, extent_thresh,
-				     newer_than, do_compress, false,
-				     &target_list);
+	ret = defrag_collect_targets(inode, ctrl, ctrl->last_scanned, len,
+				     false, &target_list);
 	if (ret < 0)
 		goto out;
 
@@ -1464,14 +1463,16 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
 		u32 range_len = entry->len;
 
 		/* Reached or beyond the limit */
-		if (max_sectors && *sectors_defragged >= max_sectors) {
+		if (ctrl->max_sectors_to_defrag &&
+		    ctrl->sectors_defragged >= ctrl->max_sectors_to_defrag) {
 			ret = 1;
 			break;
 		}
 
-		if (max_sectors)
+		if (ctrl->max_sectors_to_defrag)
 			range_len = min_t(u32, range_len,
-				(max_sectors - *sectors_defragged) * sectorsize);
+					  (ctrl->max_sectors_to_defrag -
+					   ctrl->sectors_defragged) * sectorsize);
 
 		if (ra)
 			page_cache_sync_readahead(inode->vfs_inode.i_mapping,
@@ -1484,12 +1485,11 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
 		 * But that's fine, it only affects the @sectors_defragged
 		 * accounting.
 		 */
-		ret = defrag_one_range(inode, entry->start, range_len,
-				       extent_thresh, newer_than, do_compress);
+		ret = defrag_one_range(inode, ctrl, entry->start, range_len);
 		if (ret < 0)
 			break;
-		*sectors_defragged += range_len >>
-				      inode->root->fs_info->sectorsize_bits;
+		ctrl->sectors_defragged += range_len >>
+					  inode->root->fs_info->sectorsize_bits;
 	}
 out:
 	list_for_each_entry_safe(entry, tmp, &target_list, list) {
@@ -1545,58 +1545,43 @@ int btrfs_defrag_ioctl_args_to_ctrl(struct btrfs_fs_info *fs_info,
  *
  * @inode:	   inode to be defragged
 * @ra:		   readahead state (can be NULL)
- * @range:	   defrag options including range and flags
- * @newer_than:	   minimum transid to defrag
- * @max_to_defrag: max number of sectors to be defragged, if 0, the whole inode
- *		   will be defragged.
+ * @ctrl:	   defrag options including range and various policy parameters
  *
  * Return <0 for error.
- * Return >=0 for the number of sectors defragged, and range->start will be updated
- * to indicate the file offset where next defrag should be started at.
- * (Mostly for autodefrag, which sets @max_to_defrag thus we may exit early without
- *  defragging all the range).
+ * Return 0 if the defrag is done without error, ctrl->last_scanned and
+ * ctrl->sectors_defragged will be updated.
  */
 int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
-		      struct btrfs_ioctl_defrag_range_args *range,
-		      u64 newer_than, unsigned long max_to_defrag)
+		      struct btrfs_defrag_ctrl *ctrl)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	unsigned long sectors_defragged = 0;
 	u64 isize = i_size_read(inode);
-	u64 cur;
 	u64 last_byte;
-	bool do_compress = range->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
+	bool do_compress = ctrl->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
 	bool ra_allocated = false;
-	int compress_type = BTRFS_COMPRESS_ZLIB;
 	int ret = 0;
-	u32 extent_thresh = range->extent_thresh;
 
 	if (isize == 0)
 		return 0;
 
-	if (range->start >= isize)
+	if (ctrl->start >= isize)
 		return -EINVAL;
 
-	if (do_compress) {
-		if (range->compress_type >= BTRFS_NR_COMPRESS_TYPES)
-			return -EINVAL;
-		if (range->compress_type)
-			compress_type = range->compress_type;
-	}
-
-	if (extent_thresh == 0)
-		extent_thresh = SZ_256K;
+	if (do_compress)
+		ASSERT(ctrl->compress < BTRFS_NR_COMPRESS_TYPES);
 
-	if (range->start + range->len > range->start) {
+	if (ctrl->start + ctrl->len > ctrl->start) {
 		/* Got a specific range */
-		last_byte = min(isize, range->start + range->len);
+		last_byte = min(isize, ctrl->start + ctrl->len);
 	} else {
 		/* Defrag until file end */
 		last_byte = isize;
 	}
+	if (ctrl->extent_thresh == 0)
+		ctrl->extent_thresh = SZ_256K;
 
-	/* Align the range */
-	cur = round_down(range->start, fs_info->sectorsize);
+	ASSERT(IS_ALIGNED(ctrl->start, fs_info->sectorsize));
+	ctrl->last_scanned = ctrl->start;
 	last_byte = round_up(last_byte, fs_info->sectorsize) - 1;
 
 	/*
@@ -1611,14 +1596,14 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 			file_ra_state_init(ra, inode->i_mapping);
 	}
 
-	while (cur < last_byte) {
+	while (ctrl->last_scanned < last_byte) {
 		u64 cluster_end;
 
 		/* The cluster size 256K should always be page aligned */
 		BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
 
 		/* We want the cluster end at page boundary when possible */
-		cluster_end = (((cur >> PAGE_SHIFT) +
+		cluster_end = (((ctrl->last_scanned >> PAGE_SHIFT) +
 			       (SZ_256K >> PAGE_SHIFT)) << PAGE_SHIFT) - 1;
 		cluster_end = min(cluster_end, last_byte);
 
@@ -1633,15 +1618,13 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 			break;
 		}
 		if (do_compress)
-			BTRFS_I(inode)->defrag_compress = compress_type;
-		ret = defrag_one_cluster(BTRFS_I(inode), ra, cur,
-				cluster_end + 1 - cur, extent_thresh,
-				newer_than, do_compress,
-				&sectors_defragged, max_to_defrag);
+			BTRFS_I(inode)->defrag_compress = ctrl->compress;
+		ret = defrag_one_cluster(BTRFS_I(inode), ra, ctrl,
+				cluster_end + 1 - ctrl->last_scanned);
 		btrfs_inode_unlock(inode, 0);
 		if (ret < 0)
 			break;
-		cur = cluster_end + 1;
+		ctrl->last_scanned = cluster_end + 1;
 		if (ret > 0) {
 			ret = 0;
 			break;
@@ -1650,27 +1633,21 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 
 	if (ra_allocated)
 		kfree(ra);
-	/*
-	 * Update range.start for autodefrag, this will indicate where to start
-	 * in next run.
-	 */
-	range->start = cur;
-	if (sectors_defragged) {
+	if (ctrl->sectors_defragged) {
 		/*
 		 * We have defragged some sectors, for compression case they
 		 * need to be written back immediately.
 		 */
-		if (range->flags & BTRFS_DEFRAG_RANGE_START_IO) {
+		if (ctrl->flags & BTRFS_DEFRAG_RANGE_START_IO) {
 			filemap_flush(inode->i_mapping);
 			if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
 				     &BTRFS_I(inode)->runtime_flags))
 				filemap_flush(inode->i_mapping);
 		}
-		if (range->compress_type == BTRFS_COMPRESS_LZO)
+		if (ctrl->compress == BTRFS_COMPRESS_LZO)
 			btrfs_set_fs_incompat(fs_info, COMPRESS_LZO);
-		else if (range->compress_type == BTRFS_COMPRESS_ZSTD)
+		else if (ctrl->compress == BTRFS_COMPRESS_ZSTD)
 			btrfs_set_fs_incompat(fs_info, COMPRESS_ZSTD);
-		ret = sectors_defragged;
 	}
 	if (do_compress) {
 		btrfs_inode_lock(inode, 0);
@@ -3193,6 +3170,7 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
 	struct inode *inode = file_inode(file);
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_ioctl_defrag_range_args range = {0};
+	struct btrfs_defrag_ctrl ctrl = {};
 	int ret;
 
 	ret = mnt_want_write_file(file);
@@ -3238,10 +3216,16 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
 			/* the rest are all set to zero by kzalloc */
 			range.len = (u64)-1;
 		}
-		ret = btrfs_defrag_file(file_inode(file), &file->f_ra,
-					&range, BTRFS_OLDEST_GENERATION, 0);
-		if (ret > 0)
-			ret = 0;
+		ret = btrfs_defrag_ioctl_args_to_ctrl(root->fs_info, &range,
+						      &ctrl, 0, BTRFS_OLDEST_GENERATION);
+		if (ret < 0)
+			break;
+		ret = btrfs_defrag_file(file_inode(file), &file->f_ra, &ctrl);
+		/*
+		 * Although progs doesn't check range->start, still try to keep
+		 * the same behavior to make direct ioctl caller happy.
+		 */
+		range.start = ctrl.last_scanned;
 		break;
 	default:
 		ret = -EINVAL;
-- 
2.34.1



* [PATCH 4/4] btrfs: defrag: allow defrag_one_cluster() to skip large extent which is not a target
  2022-01-28  8:12 [PATCH 0/4] btrfs: defrag: don't waste CPU time on non-target extent Qu Wenruo
                   ` (2 preceding siblings ...)
  2022-01-28  8:12 ` [PATCH 3/4] btrfs: defrag: use btrfs_defrag_ctrl to replace btrfs_ioctl_defrag_range_args for btrfs_defrag_file() Qu Wenruo
@ 2022-01-28  8:12 ` Qu Wenruo
  2022-02-03 17:39   ` Filipe Manana
  3 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2022-01-28  8:12 UTC
  To: linux-btrfs

In the rework of btrfs_defrag_file(), we always call
defrag_one_cluster() and increase the offset by cluster size, which is
only 256K.

But there are cases where we have a large extent (e.g. 128M) which
doesn't need to be defragged at all.

Before the refactor, we could directly skip the range, but now we have to
scan that extent map again and again until the cluster moves past the
non-target extent.

Fix the problem by allowing defrag_one_cluster() to increase
btrfs_defrag_ctrl::last_scanned to the end of an extent, if and only if
the last extent of the cluster is not a target.

The test script looks like this:

	mkfs.btrfs -f $dev > /dev/null

	mount $dev $mnt

	# As btrfs ioctl uses 32M as extent_threshold
	xfs_io -f -c "pwrite 0 64M" $mnt/file1
	sync
	# Some fragmented range to defrag
	xfs_io -s -c "pwrite 65548k 4k" \
		  -c "pwrite 65544k 4k" \
		  -c "pwrite 65540k 4k" \
		  -c "pwrite 65536k 4k" \
		  $mnt/file1
	sync

	echo "=== before ==="
	xfs_io -c "fiemap -v" $mnt/file1
	echo "=== after ==="
	btrfs fi defrag $mnt/file1
	sync
	xfs_io -c "fiemap -v" $mnt/file1
	umount $mnt

With extra ftrace put into defrag_one_cluster(), before the patch it
would result in tons of loops:

(As defrag_one_cluster() is inlined, the function name is its caller)

  btrfs-126062  [005] .....  4682.816026: btrfs_defrag_file: r/i=5/257 start=0 len=262144
  btrfs-126062  [005] .....  4682.816027: btrfs_defrag_file: r/i=5/257 start=262144 len=262144
  btrfs-126062  [005] .....  4682.816028: btrfs_defrag_file: r/i=5/257 start=524288 len=262144
  btrfs-126062  [005] .....  4682.816028: btrfs_defrag_file: r/i=5/257 start=786432 len=262144
  btrfs-126062  [005] .....  4682.816028: btrfs_defrag_file: r/i=5/257 start=1048576 len=262144
  ...
  btrfs-126062  [005] .....  4682.816043: btrfs_defrag_file: r/i=5/257 start=67108864 len=262144

But with this patch there will be just one loop, which then jumps directly
to the end of the extent:

  btrfs-130471  [014] .....  5434.029558: defrag_one_cluster: r/i=5/257 start=0 len=262144
  btrfs-130471  [014] .....  5434.029559: defrag_one_cluster: r/i=5/257 start=67108864 len=16384

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ioctl.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 67ba934be99e..592a542164a4 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1197,10 +1197,11 @@ struct defrag_target_range {
  */
 static int defrag_collect_targets(struct btrfs_inode *inode,
 				  const struct btrfs_defrag_ctrl *ctrl,
-				  u64 start, u32 len, bool locked,
-				  struct list_head *target_list)
+				  u64 start, u32 len, u64 *last_scanned_ret,
+				  bool locked, struct list_head *target_list)
 {
 	bool do_compress = ctrl->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
+	bool last_target = false;
 	u64 cur = start;
 	int ret = 0;
 
@@ -1210,6 +1211,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 		bool next_mergeable = true;
 		u64 range_len;
 
+		last_target = false;
 		em = defrag_lookup_extent(&inode->vfs_inode, cur, locked);
 		if (!em)
 			break;
@@ -1259,6 +1261,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 		}
 
 add:
+		last_target = true;
 		range_len = min(extent_map_end(em), start + len) - cur;
 		/*
 		 * This one is a good target, check if it can be merged into
@@ -1302,6 +1305,17 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 			kfree(entry);
 		}
 	}
+	if (!ret && last_scanned_ret) {
+		/*
+		 * If the last extent is not a target, the caller can skip to
+		 * the end of that extent.
+		 * Otherwise, we can only go to the end of the specified range.
+		 */
+		if (!last_target)
+			*last_scanned_ret = cur;
+		else
+			*last_scanned_ret = start + len;
+	}
 	return ret;
 }
 
@@ -1405,7 +1419,7 @@ static int defrag_one_range(struct btrfs_inode *inode,
 	 * And this time we have extent locked already, pass @locked = true
 	 * so that we won't relock the extent range and cause deadlock.
 	 */
-	ret = defrag_collect_targets(inode, ctrl, start, len, true,
+	ret = defrag_collect_targets(inode, ctrl, start, len, NULL, true,
 				     &target_list);
 	if (ret < 0)
 		goto unlock_extent;
@@ -1448,6 +1462,7 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
 			      struct btrfs_defrag_ctrl *ctrl, u32 len)
 {
 	const u32 sectorsize = inode->root->fs_info->sectorsize;
+	u64 last_scanned;
 	struct defrag_target_range *entry;
 	struct defrag_target_range *tmp;
 	LIST_HEAD(target_list);
@@ -1455,7 +1470,7 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
 
 	BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
 	ret = defrag_collect_targets(inode, ctrl, ctrl->last_scanned, len,
-				     false, &target_list);
+				     &last_scanned, false, &target_list);
 	if (ret < 0)
 		goto out;
 
@@ -1496,6 +1511,8 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
 		list_del_init(&entry->list);
 		kfree(entry);
 	}
+	if (!ret)
+		ctrl->last_scanned = last_scanned;
 	return ret;
 }
 
@@ -1624,7 +1641,6 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
 		btrfs_inode_unlock(inode, 0);
 		if (ret < 0)
 			break;
-		ctrl->last_scanned = cluster_end + 1;
 		if (ret > 0) {
 			ret = 0;
 			break;
-- 
2.34.1



* Re: [PATCH 1/4] btrfs: uapi: introduce BTRFS_DEFRAG_RANGE_MASK for later sanity check
  2022-01-28  8:12 ` [PATCH 1/4] btrfs: uapi: introduce BTRFS_DEFRAG_RANGE_MASK for later sanity check Qu Wenruo
@ 2022-02-03 16:58   ` Filipe Manana
  0 siblings, 0 replies; 19+ messages in thread
From: Filipe Manana @ 2022-02-03 16:58 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Jan 28, 2022 at 04:12:55PM +0800, Qu Wenruo wrote:
> And since we're here, replace the hardcoded bit flags (1, 2) with
> (1UL << 0) (1UL << 2).

"... (1UL << 0) and (1UL << 1), respectively."

Other than that, looks fine, thanks.

> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  include/uapi/linux/btrfs.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
> index 87925444d892..efd246e218d7 100644
> --- a/include/uapi/linux/btrfs.h
> +++ b/include/uapi/linux/btrfs.h
> @@ -575,8 +575,10 @@ struct btrfs_ioctl_clone_range_args {
>   * Used by:
>   * struct btrfs_ioctl_defrag_range_args.flags
>   */
> -#define BTRFS_DEFRAG_RANGE_COMPRESS 1
> -#define BTRFS_DEFRAG_RANGE_START_IO 2
> +#define BTRFS_DEFRAG_RANGE_COMPRESS	(1UL << 0)
> +#define BTRFS_DEFRAG_RANGE_START_IO	(1UL << 1)
> +#define BTRFS_DEFRAG_RANGE_FLAGS_MASK	(BTRFS_DEFRAG_RANGE_COMPRESS |\
> +					 BTRFS_DEFRAG_RANGE_START_IO)
>  struct btrfs_ioctl_defrag_range_args {
>  	/* start of the defrag operation */
>  	__u64 start;
> -- 
> 2.34.1
> 


* Re: [PATCH 2/4] btrfs: defrag: introduce btrfs_defrag_ctrl structure for later usage
  2022-01-28  8:12 ` [PATCH 2/4] btrfs: defrag: introduce btrfs_defrag_ctrl structure for later usage Qu Wenruo
@ 2022-02-03 17:06   ` Filipe Manana
  0 siblings, 0 replies; 19+ messages in thread
From: Filipe Manana @ 2022-02-03 17:06 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Jan 28, 2022 at 04:12:56PM +0800, Qu Wenruo wrote:
> Currently btrfs_defrag_file() accepts not only
> btrfs_ioctl_defrag_range_args but also other parameters like @newer_than
> and @max_sectors_to_defrag for extra policies.
> 
> Those extra values are hidden from the defrag ioctl and have even caused
> bugs in the past due to different behaviors based on those extra values.
> 
> Here we introduce a new structure, btrfs_defrag_ctrl, to include:
> 
> - all members in btrfs_ioctl_defrag_range_args
> 
> - @max_sectors_to_defrag and @newer_than
> 
> - Extra values which callers of btrfs_defrag_file() may care about,
>   like @sectors_defragged and @last_scanned.
> 
> With the new structure, also introduce a new helper,
> btrfs_defrag_ioctl_args_to_ctrl() to:
> 
> - Do extra sanity checks on @compress and @flags
> 
> - Do range alignment when possible
> 
> - Set default values.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/ctree.h | 19 +++++++++++++++++++
>  fs/btrfs/ioctl.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 60 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index b4a9b1c58d22..0a441bd703a0 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3267,6 +3267,25 @@ int btrfs_fileattr_set(struct user_namespace *mnt_userns,
>  int btrfs_ioctl_get_supported_features(void __user *arg);
>  void btrfs_sync_inode_flags_to_i_flags(struct inode *inode);
>  int __pure btrfs_is_empty_uuid(u8 *uuid);
> +
> +struct btrfs_defrag_ctrl {
> +	/* Values below are for defrag policy */

I think you meant input, read-only fields.

> +	u64	start;
> +	u64	len;
> +	u32	extent_thresh;
> +	u64	newer_than;
> +	u64	max_sectors_to_defrag;
> +	u8	compress;
> +	u8	flags;
> +
> +	/* Values below are the defrag result */

And here to say that they are output fields.
Makes it a lot more clear IMHO.
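In other words, the struct comments could read roughly like this
(hypothetical wording, not from the patch):

	struct btrfs_defrag_ctrl {
		/* Input fields, read-only for the defrag code */
		u64	start;
		u64	len;
		u32	extent_thresh;
		u64	newer_than;
		u64	max_sectors_to_defrag;
		u8	compress;
		u8	flags;

		/* Output fields, filled in as the defrag result */
		u64	sectors_defragged;
		u64	last_scanned;	/* Exclusive bytenr */
	};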

> +	u64	sectors_defragged;
> +	u64	last_scanned;	/* Exclusive bytenr */
> +};
> +int btrfs_defrag_ioctl_args_to_ctrl(struct btrfs_fs_info *fs_info,
> +				    struct btrfs_ioctl_defrag_range_args *args,
> +				    struct btrfs_defrag_ctrl *ctrl,
> +				    u64 max_sectors_to_defrag, u64 newer_than);
>  int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
>  		      struct btrfs_ioctl_defrag_range_args *range,
>  		      u64 newer_than, unsigned long max_to_defrag);
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 3d3331dd0902..f9b9ee6c50da 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1499,6 +1499,47 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
>  	return ret;
>  }
>  
> +/*
> + * Convert the old ioctl format to the new btrfs_defrag_ctrl structure.
> + *
> + * Will also do basic task like setting default values and sanity check.

task -> tasks, sanity check -> sanity checks

It's a bit awful to have two structures now, but I don't see a better
way around it.

Thanks.

> + */
> +int btrfs_defrag_ioctl_args_to_ctrl(struct btrfs_fs_info *fs_info,
> +				    struct btrfs_ioctl_defrag_range_args *args,
> +				    struct btrfs_defrag_ctrl *ctrl,
> +				    u64 max_sectors_to_defrag, u64 newer_than)
> +{
> +	u64 range_end;
> +
> +	if (args->flags & ~BTRFS_DEFRAG_RANGE_FLAGS_MASK)
> +		return -EOPNOTSUPP;
> +	if (args->compress_type >= BTRFS_NR_COMPRESS_TYPES)
> +		return -EOPNOTSUPP;
> +
> +	ctrl->start = round_down(args->start, fs_info->sectorsize);
> +	/*
> +	 * If @start + @len does not overflow and @len is not (u64)-1, align
> +	 * the length. Otherwise set it to (u64)-1 so btrfs_defrag_file()
> +	 * will later determine the length using isize.
> +	 */
> +	if (!check_add_overflow(args->start, args->len, &range_end) &&
> +	    args->len != (u64)-1)
> +		ctrl->len = round_up(range_end, fs_info->sectorsize) -
> +			    ctrl->start;
> +	else
> +		ctrl->len = -1;
> +	ctrl->flags = args->flags;
> +	ctrl->compress = args->compress_type;
> +	if (args->extent_thresh == 0)
> +		ctrl->extent_thresh = SZ_256K;
> +	else
> +		ctrl->extent_thresh = args->extent_thresh;
> +	ctrl->newer_than = newer_than;
> +	ctrl->last_scanned = 0;
> +	ctrl->sectors_defragged = 0;
> +	return 0;
> +}
> +
>  /*
>   * Entry point to file defragmentation.
>   *
> -- 
> 2.34.1
> 


* Re: [PATCH 3/4] btrfs: defrag: use btrfs_defrag_ctrl to replace btrfs_ioctl_defrag_range_args for btrfs_defrag_file()
  2022-01-28  8:12 ` [PATCH 3/4] btrfs: defrag: use btrfs_defrag_ctrl to replace btrfs_ioctl_defrag_range_args for btrfs_defrag_file() Qu Wenruo
@ 2022-02-03 17:17   ` Filipe Manana
  2022-02-04  0:30     ` Qu Wenruo
  0 siblings, 1 reply; 19+ messages in thread
From: Filipe Manana @ 2022-02-03 17:17 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Jan 28, 2022 at 04:12:57PM +0800, Qu Wenruo wrote:
> This brings the following benefits:
> 
> - No more strange range->start update to indicate last scanned bytenr
>   We have btrfs_defrag_ctrl::last_scanned (exclusive) for it directly.
> 
> - No more return value to indicate defragged sectors
>   Now btrfs_defrag_file() will just return 0 if no error happened.
>   And btrfs_defrag_ctrl::sectors_defragged will show that value.
> 
> - Fewer parameters to carry around
>   Now most defrag_* functions only need to fetch their policy parameters
>   from btrfs_defrag_ctrl directly.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/ctree.h |   3 +-
>  fs/btrfs/file.c  |  17 +++---
>  fs/btrfs/ioctl.c | 144 +++++++++++++++++++++--------------------------
>  3 files changed, 73 insertions(+), 91 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 0a441bd703a0..b30271676e15 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3287,8 +3287,7 @@ int btrfs_defrag_ioctl_args_to_ctrl(struct btrfs_fs_info *fs_info,
>  				    struct btrfs_defrag_ctrl *ctrl,
>  				    u64 max_sectors_to_defrag, u64 newer_than);
>  int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
> -		      struct btrfs_ioctl_defrag_range_args *range,
> -		      u64 newer_than, unsigned long max_to_defrag);
> +		      struct btrfs_defrag_ctrl *ctrl);
>  void btrfs_get_block_group_info(struct list_head *groups_list,
>  				struct btrfs_ioctl_space_info *space);
>  void btrfs_update_ioctl_balance_args(struct btrfs_fs_info *fs_info,
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 11204dbbe053..f5de8ab9787e 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -277,8 +277,7 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
>  {
>  	struct btrfs_root *inode_root;
>  	struct inode *inode;
> -	struct btrfs_ioctl_defrag_range_args range;
> -	int num_defrag;
> +	struct btrfs_defrag_ctrl ctrl = {};

Most of the code base uses { 0 } for this type of initialization.
IIRC some compiler versions complained about {} and preferred the
version with 0.

I think we should try to be consistent. Personally I find the 0 version
more clear. Though this might be bike shedding territory.
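For illustration, the two initializer styles being discussed (both
zero-initialize every member of the struct):

	struct btrfs_defrag_ctrl a = { 0 };	/* style used in most of the code base */
	struct btrfs_defrag_ctrl b = {};	/* empty-braces style used in this patch */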


>  	int ret;
>  
>  	/* get the inode */
> @@ -297,21 +296,21 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
>  
>  	/* do a chunk of defrag */
>  	clear_bit(BTRFS_INODE_IN_DEFRAG, &BTRFS_I(inode)->runtime_flags);
> -	memset(&range, 0, sizeof(range));
> -	range.len = (u64)-1;
> -	range.start = defrag->last_offset;
> +	ctrl.len = (u64)-1;
> +	ctrl.start = defrag->last_offset;
> +	ctrl.newer_than = defrag->transid;
> +	ctrl.max_sectors_to_defrag = BTRFS_DEFRAG_BATCH;
>  
>  	sb_start_write(fs_info->sb);
> -	num_defrag = btrfs_defrag_file(inode, NULL, &range, defrag->transid,
> -				       BTRFS_DEFRAG_BATCH);
> +	ret = btrfs_defrag_file(inode, NULL, &ctrl);
>  	sb_end_write(fs_info->sb);
>  	/*
>  	 * if we filled the whole defrag batch, there
>  	 * must be more work to do.  Queue this defrag
>  	 * again
>  	 */
> -	if (num_defrag == BTRFS_DEFRAG_BATCH) {
> -		defrag->last_offset = range.start;
> +	if (ctrl.sectors_defragged == BTRFS_DEFRAG_BATCH) {
> +		defrag->last_offset = ctrl.last_scanned;
>  		btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
>  	} else if (defrag->last_offset && !defrag->cycled) {
>  		/*
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index f9b9ee6c50da..67ba934be99e 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1189,22 +1189,18 @@ struct defrag_target_range {
>  /*
>   * Collect all valid target extents.
>   *
> + * @ctrl:	   extra defrag policy control
>   * @start:	   file offset to lookup
>   * @len:	   length to lookup
> - * @extent_thresh: file extent size threshold, any extent size >= this value
> - *		   will be ignored
> - * @newer_than:    only defrag extents newer than this value
> - * @do_compress:   whether the defrag is doing compression
> - *		   if true, @extent_thresh will be ignored and all regular
> - *		   file extents meeting @newer_than will be targets.
>   * @locked:	   if the range has already held extent lock
>   * @target_list:   list of targets file extents
>   */
>  static int defrag_collect_targets(struct btrfs_inode *inode,
> -				  u64 start, u64 len, u32 extent_thresh,
> -				  u64 newer_than, bool do_compress,
> -				  bool locked, struct list_head *target_list)
> +				  const struct btrfs_defrag_ctrl *ctrl,
> +				  u64 start, u32 len, bool locked,
> +				  struct list_head *target_list)
>  {
> +	bool do_compress = ctrl->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
>  	u64 cur = start;
>  	int ret = 0;
>  
> @@ -1224,7 +1220,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>  			goto next;
>  
>  		/* Skip older extent */
> -		if (em->generation < newer_than)
> +		if (em->generation < ctrl->newer_than)
>  			goto next;
>  
>  		/*
> @@ -1235,7 +1231,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>  			goto add;
>  
>  		/* Skip too large extent */
> -		if (em->len >= extent_thresh)
> +		if (em->len >= ctrl->extent_thresh)
>  			goto next;
>  
>  		/*
> @@ -1363,8 +1359,9 @@ static int defrag_one_locked_target(struct btrfs_inode *inode,
>  	return ret;
>  }
>  
> -static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len,
> -			    u32 extent_thresh, u64 newer_than, bool do_compress)
> +static int defrag_one_range(struct btrfs_inode *inode,
> +			    const struct btrfs_defrag_ctrl *ctrl,
> +			    u64 start, u32 len)
>  {
>  	struct extent_state *cached_state = NULL;
>  	struct defrag_target_range *entry;
> @@ -1408,8 +1405,7 @@ static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len,
>  	 * And this time we have extent locked already, pass @locked = true
>  	 * so that we won't relock the extent range and cause deadlock.
>  	 */
> -	ret = defrag_collect_targets(inode, start, len, extent_thresh,
> -				     newer_than, do_compress, true,
> +	ret = defrag_collect_targets(inode, ctrl, start, len, true,
>  				     &target_list);
>  	if (ret < 0)
>  		goto unlock_extent;
> @@ -1440,12 +1436,16 @@ static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len,
>  	return ret;
>  }
>  
> +/*
> + * Return <0 for error.
> + * Return >0 if we hit the ctrl->max_sectors_to_defrag limit
> + * Return 0 if we finished the range without error.
> + *
> + * For >= 0 case, ctrl->last_scanned and ctrl->sectors_defragged will be updated.
> + */
>  static int defrag_one_cluster(struct btrfs_inode *inode,
>  			      struct file_ra_state *ra,
> -			      u64 start, u32 len, u32 extent_thresh,
> -			      u64 newer_than, bool do_compress,
> -			      unsigned long *sectors_defragged,
> -			      unsigned long max_sectors)
> +			      struct btrfs_defrag_ctrl *ctrl, u32 len)
>  {
>  	const u32 sectorsize = inode->root->fs_info->sectorsize;
>  	struct defrag_target_range *entry;
> @@ -1454,9 +1454,8 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
>  	int ret;
>  
>  	BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
> -	ret = defrag_collect_targets(inode, start, len, extent_thresh,
> -				     newer_than, do_compress, false,
> -				     &target_list);
> +	ret = defrag_collect_targets(inode, ctrl, ctrl->last_scanned, len,
> +				     false, &target_list);
>  	if (ret < 0)
>  		goto out;
>  
> @@ -1464,14 +1463,16 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
>  		u32 range_len = entry->len;
>  
>  		/* Reached or beyond the limit */
> -		if (max_sectors && *sectors_defragged >= max_sectors) {
> +		if (ctrl->max_sectors_to_defrag &&
> +		    ctrl->sectors_defragged >= ctrl->max_sectors_to_defrag) {
>  			ret = 1;
>  			break;
>  		}
>  
> -		if (max_sectors)
> +		if (ctrl->max_sectors_to_defrag)
>  			range_len = min_t(u32, range_len,
> -				(max_sectors - *sectors_defragged) * sectorsize);
> +					  (ctrl->max_sectors_to_defrag -
> +					   ctrl->sectors_defragged) * sectorsize);
>  
>  		if (ra)
>  			page_cache_sync_readahead(inode->vfs_inode.i_mapping,
> @@ -1484,12 +1485,11 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
>  		 * But that's fine, it only affects the @sectors_defragged
>  		 * accounting.
>  		 */
> -		ret = defrag_one_range(inode, entry->start, range_len,
> -				       extent_thresh, newer_than, do_compress);
> +		ret = defrag_one_range(inode, ctrl, entry->start, range_len);
>  		if (ret < 0)
>  			break;
> -		*sectors_defragged += range_len >>
> -				      inode->root->fs_info->sectorsize_bits;
> +		ctrl->sectors_defragged += range_len >>
> +					  inode->root->fs_info->sectorsize_bits;
>  	}
>  out:
>  	list_for_each_entry_safe(entry, tmp, &target_list, list) {
> @@ -1545,58 +1545,43 @@ int btrfs_defrag_ioctl_args_to_ctrl(struct btrfs_fs_info *fs_info,
>   *
>   * @inode:	   inode to be defragged
>   * @ra:		   readahead state (can be NULL)
> - * @range:	   defrag options including range and flags
> - * @newer_than:	   minimum transid to defrag
> - * @max_to_defrag: max number of sectors to be defragged, if 0, the whole inode
> - *		   will be defragged.
> + * @ctrl:	   defrag options including range and various policy parameters
>   *
>   * Return <0 for error.
> - * Return >=0 for the number of sectors defragged, and range->start will be updated
> - * to indicate the file offset where next defrag should be started at.
> - * (Mostly for autodefrag, which sets @max_to_defrag thus we may exit early without
> - *  defragging all the range).
> + * Return 0 if the defrag is done without error, ctrl->last_scanned and
> + * ctrl->sectors_defragged will be updated.
>   */
>  int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
> -		      struct btrfs_ioctl_defrag_range_args *range,
> -		      u64 newer_than, unsigned long max_to_defrag)
> +		      struct btrfs_defrag_ctrl *ctrl)
>  {
>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> -	unsigned long sectors_defragged = 0;
>  	u64 isize = i_size_read(inode);
> -	u64 cur;
>  	u64 last_byte;
> -	bool do_compress = range->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
> +	bool do_compress = ctrl->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
>  	bool ra_allocated = false;
> -	int compress_type = BTRFS_COMPRESS_ZLIB;
>  	int ret = 0;
> -	u32 extent_thresh = range->extent_thresh;
>  
>  	if (isize == 0)
>  		return 0;
>  
> -	if (range->start >= isize)
> +	if (ctrl->start >= isize)
>  		return -EINVAL;
>  
> -	if (do_compress) {
> -		if (range->compress_type >= BTRFS_NR_COMPRESS_TYPES)
> -			return -EINVAL;
> -		if (range->compress_type)
> -			compress_type = range->compress_type;
> -	}
> -
> -	if (extent_thresh == 0)
> -		extent_thresh = SZ_256K;
> +	if (do_compress)
> +		ASSERT(ctrl->compress < BTRFS_NR_COMPRESS_TYPES);
>  
> -	if (range->start + range->len > range->start) {
> +	if (ctrl->start + ctrl->len > ctrl->start) {
>  		/* Got a specific range */
> -		last_byte = min(isize, range->start + range->len);
> +		last_byte = min(isize, ctrl->start + ctrl->len);
>  	} else {
>  		/* Defrag until file end */
>  		last_byte = isize;
>  	}
> +	if (ctrl->extent_thresh == 0)
> +		ctrl->extent_thresh = SZ_256K;

Why did you move the logic for checking/setting the extent threshold?
You placed it now in the middle of the logic that sets 'last_byte', which is
not logical and makes it harder to follow.

I adjusted the setup of 'last_byte' recently to avoid that:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6b34cd8e175bfbf4f3f01b6d19eae18245e1a8cc
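A minimal sketch of the ordering being suggested (hypothetical, not the
exact code from the referenced commit):

	if (ctrl->extent_thresh == 0)
		ctrl->extent_thresh = SZ_256K;

	if (ctrl->start + ctrl->len > ctrl->start)
		/* Got a specific range */
		last_byte = min(isize, ctrl->start + ctrl->len);
	else
		/* Defrag until file end */
		last_byte = isize;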

>  
> -	/* Align the range */
> -	cur = round_down(range->start, fs_info->sectorsize);
> +	ASSERT(IS_ALIGNED(ctrl->start, fs_info->sectorsize));
> +	ctrl->last_scanned = ctrl->start;
>  	last_byte = round_up(last_byte, fs_info->sectorsize) - 1;
>  
>  	/*
> @@ -1611,14 +1596,14 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
>  			file_ra_state_init(ra, inode->i_mapping);
>  	}
>  
> -	while (cur < last_byte) {
> +	while (ctrl->last_scanned < last_byte) {
>  		u64 cluster_end;
>  
>  		/* The cluster size 256K should always be page aligned */
>  		BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
>  
>  		/* We want the cluster end at page boundary when possible */
> -		cluster_end = (((cur >> PAGE_SHIFT) +
> +		cluster_end = (((ctrl->last_scanned >> PAGE_SHIFT) +
>  			       (SZ_256K >> PAGE_SHIFT)) << PAGE_SHIFT) - 1;
>  		cluster_end = min(cluster_end, last_byte);
>  
> @@ -1633,15 +1618,13 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
>  			break;
>  		}
>  		if (do_compress)
> -			BTRFS_I(inode)->defrag_compress = compress_type;
> -		ret = defrag_one_cluster(BTRFS_I(inode), ra, cur,
> -				cluster_end + 1 - cur, extent_thresh,
> -				newer_than, do_compress,
> -				&sectors_defragged, max_to_defrag);
> +			BTRFS_I(inode)->defrag_compress = ctrl->compress;
> +		ret = defrag_one_cluster(BTRFS_I(inode), ra, ctrl,
> +				cluster_end + 1 - ctrl->last_scanned);
>  		btrfs_inode_unlock(inode, 0);
>  		if (ret < 0)
>  			break;
> -		cur = cluster_end + 1;
> +		ctrl->last_scanned = cluster_end + 1;
>  		if (ret > 0) {
>  			ret = 0;
>  			break;
> @@ -1650,27 +1633,21 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
>  
>  	if (ra_allocated)
>  		kfree(ra);
> -	/*
> -	 * Update range.start for autodefrag, this will indicate where to start
> -	 * in next run.
> -	 */
> -	range->start = cur;
> -	if (sectors_defragged) {
> +	if (ctrl->sectors_defragged) {
>  		/*
>  		 * We have defragged some sectors, for compression case they
>  		 * need to be written back immediately.
>  		 */
> -		if (range->flags & BTRFS_DEFRAG_RANGE_START_IO) {
> +		if (ctrl->flags & BTRFS_DEFRAG_RANGE_START_IO) {
>  			filemap_flush(inode->i_mapping);
>  			if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
>  				     &BTRFS_I(inode)->runtime_flags))
>  				filemap_flush(inode->i_mapping);
>  		}
> -		if (range->compress_type == BTRFS_COMPRESS_LZO)
> +		if (ctrl->compress == BTRFS_COMPRESS_LZO)
>  			btrfs_set_fs_incompat(fs_info, COMPRESS_LZO);
> -		else if (range->compress_type == BTRFS_COMPRESS_ZSTD)
> +		else if (ctrl->compress == BTRFS_COMPRESS_ZSTD)
>  			btrfs_set_fs_incompat(fs_info, COMPRESS_ZSTD);
> -		ret = sectors_defragged;
>  	}
>  	if (do_compress) {
>  		btrfs_inode_lock(inode, 0);
> @@ -3193,6 +3170,7 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
>  	struct inode *inode = file_inode(file);
>  	struct btrfs_root *root = BTRFS_I(inode)->root;
>  	struct btrfs_ioctl_defrag_range_args range = {0};
> +	struct btrfs_defrag_ctrl ctrl = {};

Same here, but more confusing since both styles are used one right after
the other.

Other than that, it looks fine. Thanks.

>  	int ret;
>  
>  	ret = mnt_want_write_file(file);
> @@ -3238,10 +3216,16 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
>  			/* the rest are all set to zero by kzalloc */
>  			range.len = (u64)-1;
>  		}
> -		ret = btrfs_defrag_file(file_inode(file), &file->f_ra,
> -					&range, BTRFS_OLDEST_GENERATION, 0);
> -		if (ret > 0)
> -			ret = 0;
> +		ret = btrfs_defrag_ioctl_args_to_ctrl(root->fs_info, &range,
> +						      &ctrl, 0, BTRFS_OLDEST_GENERATION);
> +		if (ret < 0)
> +			break;
> +		ret = btrfs_defrag_file(file_inode(file), &file->f_ra, &ctrl);
> +		/*
> +		 * Although progs doesn't check range->start, still try to keep
> +		 * the same behavior to make direct ioctl caller happy.
> +		 */
> +		range.start = ctrl.last_scanned;
>  		break;
>  	default:
>  		ret = -EINVAL;
> -- 
> 2.34.1
> 


* Re: [PATCH 4/4] btrfs: defrag: allow defrag_one_cluster() to skip large extent which is not a target
  2022-01-28  8:12 ` [PATCH 4/4] btrfs: defrag: allow defrag_one_cluster() to skip large extent which is not a target Qu Wenruo
@ 2022-02-03 17:39   ` Filipe Manana
  2022-02-04  0:39     ` Qu Wenruo
  0 siblings, 1 reply; 19+ messages in thread
From: Filipe Manana @ 2022-02-03 17:39 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Jan 28, 2022 at 04:12:58PM +0800, Qu Wenruo wrote:
> In the rework of btrfs_defrag_file(), we always call
> defrag_one_cluster() and increase the offset by cluster size, which is
> only 256K.
> 
> But there are cases where we have a large extent (e.g. 128M) which
> doesn't need to be defragged at all.
> 
> Before the refactor, we could directly skip the range, but now we have to
> scan that extent map again and again until the cluster moves past the
> non-target extent.
> 
> Fix the problem by allowing defrag_one_cluster() to increase
> btrfs_defrag_ctrl::last_scanned to the end of an extent, if and only if
> the last extent of the cluster is not a target.
> 
> The test script looks like this:
> 
> 	mkfs.btrfs -f $dev > /dev/null
> 
> 	mount $dev $mnt
> 
> 	# As btrfs ioctl uses 32M as extent_threshold
> 	xfs_io -f -c "pwrite 0 64M" $mnt/file1
> 	sync
> 	# Some fragmented range to defrag
> 	xfs_io -s -c "pwrite 65548k 4k" \
> 		  -c "pwrite 65544k 4k" \
> 		  -c "pwrite 65540k 4k" \
> 		  -c "pwrite 65536k 4k" \
> 		  $mnt/file1
> 	sync
> 
> 	echo "=== before ==="
> 	xfs_io -c "fiemap -v" $mnt/file1
> 	echo "=== after ==="
> 	btrfs fi defrag $mnt/file1
> 	sync
> 	xfs_io -c "fiemap -v" $mnt/file1
> 	umount $mnt
> 
> With extra ftrace put into defrag_one_cluster(), before the patch it
> would result in tons of loops:
> 
> (As defrag_one_cluster() is inlined, the function name is its caller)
> 
>   btrfs-126062  [005] .....  4682.816026: btrfs_defrag_file: r/i=5/257 start=0 len=262144
>   btrfs-126062  [005] .....  4682.816027: btrfs_defrag_file: r/i=5/257 start=262144 len=262144
>   btrfs-126062  [005] .....  4682.816028: btrfs_defrag_file: r/i=5/257 start=524288 len=262144
>   btrfs-126062  [005] .....  4682.816028: btrfs_defrag_file: r/i=5/257 start=786432 len=262144
>   btrfs-126062  [005] .....  4682.816028: btrfs_defrag_file: r/i=5/257 start=1048576 len=262144
>   ...
>   btrfs-126062  [005] .....  4682.816043: btrfs_defrag_file: r/i=5/257 start=67108864 len=262144
> 
> But with this patch there will be just one loop, which then jumps directly
> to the end of the extent:
> 
>   btrfs-130471  [014] .....  5434.029558: defrag_one_cluster: r/i=5/257 start=0 len=262144
>   btrfs-130471  [014] .....  5434.029559: defrag_one_cluster: r/i=5/257 start=67108864 len=16384

Did you also measure how much time was spent before and after?
It would be interesting to have it here in case you have measured it.
If you don't have it, then it's fine.

> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/ioctl.c | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 67ba934be99e..592a542164a4 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1197,10 +1197,11 @@ struct defrag_target_range {
>   */
>  static int defrag_collect_targets(struct btrfs_inode *inode,
>  				  const struct btrfs_defrag_ctrl *ctrl,
> -				  u64 start, u32 len, bool locked,
> -				  struct list_head *target_list)
> +				  u64 start, u32 len, u64 *last_scanned_ret,
> +				  bool locked, struct list_head *target_list)
>  {
>  	bool do_compress = ctrl->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
> +	bool last_target = false;
>  	u64 cur = start;
>  	int ret = 0;
>  
> @@ -1210,6 +1211,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>  		bool next_mergeable = true;
>  		u64 range_len;
>  
> +		last_target = false;
>  		em = defrag_lookup_extent(&inode->vfs_inode, cur, locked);
>  		if (!em)
>  			break;
> @@ -1259,6 +1261,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>  		}
>  
>  add:
> +		last_target = true;
>  		range_len = min(extent_map_end(em), start + len) - cur;
>  		/*
>  		 * This one is a good target, check if it can be merged into
> @@ -1302,6 +1305,17 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>  			kfree(entry);
>  		}
>  	}
> +	if (!ret && last_scanned_ret) {
> +		/*
> +		 * If the last extent is not a target, the caller can skip to
> +		 * the end of that extent.
> +		 * Otherwise, we can only go to the end of the specified range.
> +		 */
> +		if (!last_target)
> +			*last_scanned_ret = cur;
> +		else
> +			*last_scanned_ret = start + len;

Might be just me, but I found the name "last_target" a bit harder to
decipher. Something like "last_is_target" seems more clear to me, as
it indicates if the last extent we found was a target for defrag.

> +	}
>  	return ret;
>  }
>  
> @@ -1405,7 +1419,7 @@ static int defrag_one_range(struct btrfs_inode *inode,
>  	 * And this time we have extent locked already, pass @locked = true
>  	 * so that we won't relock the extent range and cause deadlock.
>  	 */
> -	ret = defrag_collect_targets(inode, ctrl, start, len, true,
> +	ret = defrag_collect_targets(inode, ctrl, start, len, NULL, true,
>  				     &target_list);
>  	if (ret < 0)
>  		goto unlock_extent;
> @@ -1448,6 +1462,7 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
>  			      struct btrfs_defrag_ctrl *ctrl, u32 len)
>  {
>  	const u32 sectorsize = inode->root->fs_info->sectorsize;
> +	u64 last_scanned;
>  	struct defrag_target_range *entry;
>  	struct defrag_target_range *tmp;
>  	LIST_HEAD(target_list);
> @@ -1455,7 +1470,7 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
>  
>  	BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
>  	ret = defrag_collect_targets(inode, ctrl, ctrl->last_scanned, len,
> -				     false, &target_list);
> +				     &last_scanned, false, &target_list);

So I would invert the logic here.
What I mean is: pass a non-NULL *last_scanned_ret pointer to defrag_collect_targets()
when it's called at defrag_one_range(), and then here, at defrag_one_cluster(), pass it
NULL instead.

The reason is simple and I think it makes more sense:

1) defrag_one_cluster() does a first call to defrag_collect_targets() to scan
   for the extent maps in a range without locking the range in the inode's
   io tree;

2) Then for each range it collects, we call defrag_one_range(). That will
   lock the range in the io tree and then call defrag_collect_targets() again,
   and by this time the extent maps may have changed (due to concurrent mmap
   writes, etc). So it's this second time that we can have an accurate value
   for *last_scanned_ret; see the sketch below.

That would deal with the case where the first pass considered a range for
defrag, but then in the second pass we skipped a subrange because, in the
meanwhile, before we locked the range in the io tree, it got a large extent.

Thanks.
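
A minimal sketch of the inverted call sites (not part of the patch; it
assumes defrag_one_range() grows a last_scanned_ret parameter, which is
hypothetical here, and everything but the two calls is elided):

	/* In defrag_one_range(): the range is locked in the io tree, so
	 * the extent maps are stable and the reported position is
	 * accurate. */
	ret = defrag_collect_targets(inode, ctrl, start, len,
				     last_scanned_ret, true, &target_list);

	/* In defrag_one_cluster(): unlocked first pass, the results may go
	 * stale before the range lock is taken, so report nothing here. */
	ret = defrag_collect_targets(inode, ctrl, ctrl->last_scanned, len,
				     NULL, false, &target_list);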
   

>  	if (ret < 0)
>  		goto out;
>  
> @@ -1496,6 +1511,8 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
>  		list_del_init(&entry->list);
>  		kfree(entry);
>  	}
> +	if (!ret)
> +		ctrl->last_scanned = last_scanned;
>  	return ret;
>  }
>  
> @@ -1624,7 +1641,6 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
>  		btrfs_inode_unlock(inode, 0);
>  		if (ret < 0)
>  			break;
> -		ctrl->last_scanned = cluster_end + 1;
>  		if (ret > 0) {
>  			ret = 0;
>  			break;
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] btrfs: defrag: use btrfs_defrag_ctrl to replace btrfs_ioctl_defrag_range_args for btrfs_defrag_file()
  2022-02-03 17:17   ` Filipe Manana
@ 2022-02-04  0:30     ` Qu Wenruo
  2022-02-04  1:03       ` Qu Wenruo
  2022-02-04 17:57       ` David Sterba
  0 siblings, 2 replies; 19+ messages in thread
From: Qu Wenruo @ 2022-02-04  0:30 UTC (permalink / raw)
  To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs



On 2022/2/4 01:17, Filipe Manana wrote:
> On Fri, Jan 28, 2022 at 04:12:57PM +0800, Qu Wenruo wrote:
>> This brings the following benefits:
>>
>> - No more strange range->start update to indicate last scanned bytenr
>>    We have btrfs_defrag_ctrl::last_scanned (exclusive) for it directly.
>>
>> - No more return value to indicate defragged sectors
>>    Now btrfs_defrag_file() will just return 0 if no error happened.
>>    And btrfs_defrag_ctrl::sectors_defragged will show that value.
>>
>> - Less parameters to carry around
>>    Now most defrag_* functions only need to fetch its policy parameters
>>    from btrfs_defrag_ctrl directly.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/ctree.h |   3 +-
>>   fs/btrfs/file.c  |  17 +++---
>>   fs/btrfs/ioctl.c | 144 +++++++++++++++++++++--------------------------
>>   3 files changed, 73 insertions(+), 91 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 0a441bd703a0..b30271676e15 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -3287,8 +3287,7 @@ int btrfs_defrag_ioctl_args_to_ctrl(struct btrfs_fs_info *fs_info,
>>   				    struct btrfs_defrag_ctrl *ctrl,
>>   				    u64 max_sectors_to_defrag, u64 newer_than);
>>   int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
>> -		      struct btrfs_ioctl_defrag_range_args *range,
>> -		      u64 newer_than, unsigned long max_to_defrag);
>> +		      struct btrfs_defrag_ctrl *ctrl);
>>   void btrfs_get_block_group_info(struct list_head *groups_list,
>>   				struct btrfs_ioctl_space_info *space);
>>   void btrfs_update_ioctl_balance_args(struct btrfs_fs_info *fs_info,
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index 11204dbbe053..f5de8ab9787e 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -277,8 +277,7 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
>>   {
>>   	struct btrfs_root *inode_root;
>>   	struct inode *inode;
>> -	struct btrfs_ioctl_defrag_range_args range;
>> -	int num_defrag;
>> +	struct btrfs_defrag_ctrl ctrl = {};
>
> Most of the code base uses { 0 } for this type of initialization.
> IIRC some compiler versions complained about {} and preferred the
> version with 0.

Exactly what I preferred, but David mentioned that the kernel is moving
towards {}, so that's what I went with.

>
> I think we should try to be consistent. Personally I find the 0 version
> more clear. Though this might be bike shedding territory.
>
>
>>   	int ret;
>>
>>   	/* get the inode */
>> @@ -297,21 +296,21 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
>>
>>   	/* do a chunk of defrag */
>>   	clear_bit(BTRFS_INODE_IN_DEFRAG, &BTRFS_I(inode)->runtime_flags);
>> -	memset(&range, 0, sizeof(range));
>> -	range.len = (u64)-1;
>> -	range.start = defrag->last_offset;
>> +	ctrl.len = (u64)-1;
>> +	ctrl.start = defrag->last_offset;
>> +	ctrl.newer_than = defrag->transid;
>> +	ctrl.max_sectors_to_defrag = BTRFS_DEFRAG_BATCH;
>>
>>   	sb_start_write(fs_info->sb);
>> -	num_defrag = btrfs_defrag_file(inode, NULL, &range, defrag->transid,
>> -				       BTRFS_DEFRAG_BATCH);
>> +	ret = btrfs_defrag_file(inode, NULL, &ctrl);
>>   	sb_end_write(fs_info->sb);
>>   	/*
>>   	 * if we filled the whole defrag batch, there
>>   	 * must be more work to do.  Queue this defrag
>>   	 * again
>>   	 */
>> -	if (num_defrag == BTRFS_DEFRAG_BATCH) {
>> -		defrag->last_offset = range.start;
>> +	if (ctrl.sectors_defragged == BTRFS_DEFRAG_BATCH) {
>> +		defrag->last_offset = ctrl.last_scanned;
>>   		btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
>>   	} else if (defrag->last_offset && !defrag->cycled) {
>>   		/*
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index f9b9ee6c50da..67ba934be99e 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -1189,22 +1189,18 @@ struct defrag_target_range {
>>   /*
>>    * Collect all valid target extents.
>>    *
>> + * @ctrl:	   extra defrag policy control
>>    * @start:	   file offset to lookup
>>    * @len:	   length to lookup
>> - * @extent_thresh: file extent size threshold, any extent size >= this value
>> - *		   will be ignored
>> - * @newer_than:    only defrag extents newer than this value
>> - * @do_compress:   whether the defrag is doing compression
>> - *		   if true, @extent_thresh will be ignored and all regular
>> - *		   file extents meeting @newer_than will be targets.
>>    * @locked:	   if the range has already held extent lock
>>    * @target_list:   list of targets file extents
>>    */
>>   static int defrag_collect_targets(struct btrfs_inode *inode,
>> -				  u64 start, u64 len, u32 extent_thresh,
>> -				  u64 newer_than, bool do_compress,
>> -				  bool locked, struct list_head *target_list)
>> +				  const struct btrfs_defrag_ctrl *ctrl,
>> +				  u64 start, u32 len, bool locked,
>> +				  struct list_head *target_list)
>>   {
>> +	bool do_compress = ctrl->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
>>   	u64 cur = start;
>>   	int ret = 0;
>>
>> @@ -1224,7 +1220,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>>   			goto next;
>>
>>   		/* Skip older extent */
>> -		if (em->generation < newer_than)
>> +		if (em->generation < ctrl->newer_than)
>>   			goto next;
>>
>>   		/*
>> @@ -1235,7 +1231,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>>   			goto add;
>>
>>   		/* Skip too large extent */
>> -		if (em->len >= extent_thresh)
>> +		if (em->len >= ctrl->extent_thresh)
>>   			goto next;
>>
>>   		/*
>> @@ -1363,8 +1359,9 @@ static int defrag_one_locked_target(struct btrfs_inode *inode,
>>   	return ret;
>>   }
>>
>> -static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len,
>> -			    u32 extent_thresh, u64 newer_than, bool do_compress)
>> +static int defrag_one_range(struct btrfs_inode *inode,
>> +			    const struct btrfs_defrag_ctrl *ctrl,
>> +			    u64 start, u32 len)
>>   {
>>   	struct extent_state *cached_state = NULL;
>>   	struct defrag_target_range *entry;
>> @@ -1408,8 +1405,7 @@ static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len,
>>   	 * And this time we have extent locked already, pass @locked = true
>>   	 * so that we won't relock the extent range and cause deadlock.
>>   	 */
>> -	ret = defrag_collect_targets(inode, start, len, extent_thresh,
>> -				     newer_than, do_compress, true,
>> +	ret = defrag_collect_targets(inode, ctrl, start, len, true,
>>   				     &target_list);
>>   	if (ret < 0)
>>   		goto unlock_extent;
>> @@ -1440,12 +1436,16 @@ static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len,
>>   	return ret;
>>   }
>>
>> +/*
>> + * Return <0 for error.
>> + * Return >0 if we hit the ctrl->max_sectors_to_defrag limit
>> + * Return 0 if we finished the range without error.
>> + *
>> + * For >= 0 case, ctrl->last_scanned and ctrl->sectors_defragged will be updated.
>> + */
>>   static int defrag_one_cluster(struct btrfs_inode *inode,
>>   			      struct file_ra_state *ra,
>> -			      u64 start, u32 len, u32 extent_thresh,
>> -			      u64 newer_than, bool do_compress,
>> -			      unsigned long *sectors_defragged,
>> -			      unsigned long max_sectors)
>> +			      struct btrfs_defrag_ctrl *ctrl, u32 len)
>>   {
>>   	const u32 sectorsize = inode->root->fs_info->sectorsize;
>>   	struct defrag_target_range *entry;
>> @@ -1454,9 +1454,8 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
>>   	int ret;
>>
>>   	BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
>> -	ret = defrag_collect_targets(inode, start, len, extent_thresh,
>> -				     newer_than, do_compress, false,
>> -				     &target_list);
>> +	ret = defrag_collect_targets(inode, ctrl, ctrl->last_scanned, len,
>> +				     false, &target_list);
>>   	if (ret < 0)
>>   		goto out;
>>
>> @@ -1464,14 +1463,16 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
>>   		u32 range_len = entry->len;
>>
>>   		/* Reached or beyond the limit */
>> -		if (max_sectors && *sectors_defragged >= max_sectors) {
>> +		if (ctrl->max_sectors_to_defrag &&
>> +		    ctrl->sectors_defragged >= ctrl->max_sectors_to_defrag) {
>>   			ret = 1;
>>   			break;
>>   		}
>>
>> -		if (max_sectors)
>> +		if (ctrl->max_sectors_to_defrag)
>>   			range_len = min_t(u32, range_len,
>> -				(max_sectors - *sectors_defragged) * sectorsize);
>> +					  (ctrl->max_sectors_to_defrag -
>> +					   ctrl->sectors_defragged) * sectorsize);
>>
>>   		if (ra)
>>   			page_cache_sync_readahead(inode->vfs_inode.i_mapping,
>> @@ -1484,12 +1485,11 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
>>   		 * But that's fine, it only affects the @sectors_defragged
>>   		 * accounting.
>>   		 */
>> -		ret = defrag_one_range(inode, entry->start, range_len,
>> -				       extent_thresh, newer_than, do_compress);
>> +		ret = defrag_one_range(inode, ctrl, entry->start, range_len);
>>   		if (ret < 0)
>>   			break;
>> -		*sectors_defragged += range_len >>
>> -				      inode->root->fs_info->sectorsize_bits;
>> +		ctrl->sectors_defragged += range_len >>
>> +					  inode->root->fs_info->sectorsize_bits;
>>   	}
>>   out:
>>   	list_for_each_entry_safe(entry, tmp, &target_list, list) {
>> @@ -1545,58 +1545,43 @@ int btrfs_defrag_ioctl_args_to_ctrl(struct btrfs_fs_info *fs_info,
>>    *
>>    * @inode:	   inode to be defragged
>>    * @ra:		   readahead state (can be NULL)
>> - * @range:	   defrag options including range and flags
>> - * @newer_than:	   minimum transid to defrag
>> - * @max_to_defrag: max number of sectors to be defragged, if 0, the whole inode
>> - *		   will be defragged.
>> + * @ctrl:	   defrag options including range and various policy parameters
>>    *
>>    * Return <0 for error.
>> - * Return >=0 for the number of sectors defragged, and range->start will be updated
>> - * to indicate the file offset where next defrag should be started at.
>> - * (Mostly for autodefrag, which sets @max_to_defrag thus we may exit early without
>> - *  defragging all the range).
>> + * Return 0 if the defrag is done without error, ctrl->last_scanned and
>> + * ctrl->sectors_defragged will be updated.
>>    */
>>   int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
>> -		      struct btrfs_ioctl_defrag_range_args *range,
>> -		      u64 newer_than, unsigned long max_to_defrag)
>> +		      struct btrfs_defrag_ctrl *ctrl)
>>   {
>>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>> -	unsigned long sectors_defragged = 0;
>>   	u64 isize = i_size_read(inode);
>> -	u64 cur;
>>   	u64 last_byte;
>> -	bool do_compress = range->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
>> +	bool do_compress = ctrl->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
>>   	bool ra_allocated = false;
>> -	int compress_type = BTRFS_COMPRESS_ZLIB;
>>   	int ret = 0;
>> -	u32 extent_thresh = range->extent_thresh;
>>
>>   	if (isize == 0)
>>   		return 0;
>>
>> -	if (range->start >= isize)
>> +	if (ctrl->start >= isize)
>>   		return -EINVAL;
>>
>> -	if (do_compress) {
>> -		if (range->compress_type >= BTRFS_NR_COMPRESS_TYPES)
>> -			return -EINVAL;
>> -		if (range->compress_type)
>> -			compress_type = range->compress_type;
>> -	}
>> -
>> -	if (extent_thresh == 0)
>> -		extent_thresh = SZ_256K;
>> +	if (do_compress)
>> +		ASSERT(ctrl->compress < BTRFS_NR_COMPRESS_TYPES);
>>
>> -	if (range->start + range->len > range->start) {
>> +	if (ctrl->start + ctrl->len > ctrl->start) {
>>   		/* Got a specific range */
>> -		last_byte = min(isize, range->start + range->len);
>> +		last_byte = min(isize, ctrl->start + ctrl->len);
>>   	} else {
>>   		/* Defrag until file end */
>>   		last_byte = isize;
>>   	}
>> +	if (ctrl->extent_thresh == 0)
>> +		ctrl->extent_thresh = SZ_256K;
>
> Why did you move the logic for checking/setting the extent threshold?
> You placed it now in the middle of the logic that sets 'last_byte', which is
> not logical and makes it harder to follow.
>
> I adjusted the setup of 'last_byte' recently to avoid that:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6b34cd8e175bfbf4f3f01b6d19eae18245e1a8cc

My bad, originally I put the default value setting into the new helper
btrfs_defrag_ioctl_args_to_ctrl(), but that doesn't handle the autodefrag
path, and the test cases exposed this problem.

Then, when adding the assignment back, I put it in a different location.

I should keep it where it was.

Thanks,
Qu
>
>>
>> -	/* Align the range */
>> -	cur = round_down(range->start, fs_info->sectorsize);
>> +	ASSERT(IS_ALIGNED(ctrl->start, fs_info->sectorsize));
>> +	ctrl->last_scanned = ctrl->start;
>>   	last_byte = round_up(last_byte, fs_info->sectorsize) - 1;
>>
>>   	/*
>> @@ -1611,14 +1596,14 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
>>   			file_ra_state_init(ra, inode->i_mapping);
>>   	}
>>
>> -	while (cur < last_byte) {
>> +	while (ctrl->last_scanned < last_byte) {
>>   		u64 cluster_end;
>>
>>   		/* The cluster size 256K should always be page aligned */
>>   		BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
>>
>>   		/* We want the cluster end at page boundary when possible */
>> -		cluster_end = (((cur >> PAGE_SHIFT) +
>> +		cluster_end = (((ctrl->last_scanned >> PAGE_SHIFT) +
>>   			       (SZ_256K >> PAGE_SHIFT)) << PAGE_SHIFT) - 1;
>>   		cluster_end = min(cluster_end, last_byte);
>>
>> @@ -1633,15 +1618,13 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
>>   			break;
>>   		}
>>   		if (do_compress)
>> -			BTRFS_I(inode)->defrag_compress = compress_type;
>> -		ret = defrag_one_cluster(BTRFS_I(inode), ra, cur,
>> -				cluster_end + 1 - cur, extent_thresh,
>> -				newer_than, do_compress,
>> -				&sectors_defragged, max_to_defrag);
>> +			BTRFS_I(inode)->defrag_compress = ctrl->compress;
>> +		ret = defrag_one_cluster(BTRFS_I(inode), ra, ctrl,
>> +				cluster_end + 1 - ctrl->last_scanned);
>>   		btrfs_inode_unlock(inode, 0);
>>   		if (ret < 0)
>>   			break;
>> -		cur = cluster_end + 1;
>> +		ctrl->last_scanned = cluster_end + 1;
>>   		if (ret > 0) {
>>   			ret = 0;
>>   			break;
>> @@ -1650,27 +1633,21 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
>>
>>   	if (ra_allocated)
>>   		kfree(ra);
>> -	/*
>> -	 * Update range.start for autodefrag, this will indicate where to start
>> -	 * in next run.
>> -	 */
>> -	range->start = cur;
>> -	if (sectors_defragged) {
>> +	if (ctrl->sectors_defragged) {
>>   		/*
>>   		 * We have defragged some sectors, for compression case they
>>   		 * need to be written back immediately.
>>   		 */
>> -		if (range->flags & BTRFS_DEFRAG_RANGE_START_IO) {
>> +		if (ctrl->flags & BTRFS_DEFRAG_RANGE_START_IO) {
>>   			filemap_flush(inode->i_mapping);
>>   			if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
>>   				     &BTRFS_I(inode)->runtime_flags))
>>   				filemap_flush(inode->i_mapping);
>>   		}
>> -		if (range->compress_type == BTRFS_COMPRESS_LZO)
>> +		if (ctrl->compress == BTRFS_COMPRESS_LZO)
>>   			btrfs_set_fs_incompat(fs_info, COMPRESS_LZO);
>> -		else if (range->compress_type == BTRFS_COMPRESS_ZSTD)
>> +		else if (ctrl->compress == BTRFS_COMPRESS_ZSTD)
>>   			btrfs_set_fs_incompat(fs_info, COMPRESS_ZSTD);
>> -		ret = sectors_defragged;
>>   	}
>>   	if (do_compress) {
>>   		btrfs_inode_lock(inode, 0);
>> @@ -3193,6 +3170,7 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
>>   	struct inode *inode = file_inode(file);
>>   	struct btrfs_root *root = BTRFS_I(inode)->root;
>>   	struct btrfs_ioctl_defrag_range_args range = {0};
>> +	struct btrfs_defrag_ctrl ctrl = {};
>
> Same here, but more confusing here since both styles are used one after
> another.
>
> Other than that, it looks fine. Thanks.
>
>>   	int ret;
>>
>>   	ret = mnt_want_write_file(file);
>> @@ -3238,10 +3216,16 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
>>   			/* the rest are all set to zero by kzalloc */
>>   			range.len = (u64)-1;
>>   		}
>> -		ret = btrfs_defrag_file(file_inode(file), &file->f_ra,
>> -					&range, BTRFS_OLDEST_GENERATION, 0);
>> -		if (ret > 0)
>> -			ret = 0;
>> +		ret = btrfs_defrag_ioctl_args_to_ctrl(root->fs_info, &range,
>> +						      &ctrl, 0, BTRFS_OLDEST_GENERATION);
>> +		if (ret < 0)
>> +			break;
>> +		ret = btrfs_defrag_file(file_inode(file), &file->f_ra, &ctrl);
>> +		/*
>> +		 * Although progs doesn't check range->start, still try to keep
>> +		 * the same behavior to make direct ioctl caller happy.
>> +		 */
>> +		range.start = ctrl.last_scanned;
>>   		break;
>>   	default:
>>   		ret = -EINVAL;
>> --
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] btrfs: defrag: allow defrag_one_cluster() to large extent which is not a target
  2022-02-03 17:39   ` Filipe Manana
@ 2022-02-04  0:39     ` Qu Wenruo
  2022-02-04  7:08       ` Qu Wenruo
  0 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2022-02-04  0:39 UTC (permalink / raw)
  To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs



On 2022/2/4 01:39, Filipe Manana wrote:
> On Fri, Jan 28, 2022 at 04:12:58PM +0800, Qu Wenruo wrote:
>> In the rework of btrfs_defrag_file(), we always call
>> defrag_one_cluster() and increase the offset by cluster size, which is
>> only 256K.
>>
>> But there are cases where we have a large extent (e.g. 128M) which
>> doesn't need to be defragged at all.
>>
>> Before the refactor, we could directly skip the range, but now we have to
>> scan that extent map again and again until the cluster moves past the
>> non-target extent.
>>
>> Fix the problem by allowing defrag_one_cluster() to increase
>> btrfs_defrag_ctrl::last_scanned to the end of an extent, if and only if
>> the last extent of the cluster is not a target.
>>
>> The test script looks like this:
>>
>> 	mkfs.btrfs -f $dev > /dev/null
>>
>> 	mount $dev $mnt
>>
>> 	# As btrfs ioctl uses 32M as extent_threshold
>> 	xfs_io -f -c "pwrite 0 64M" $mnt/file1
>> 	sync
>> 	# Some fragmented range to defrag
>> 	xfs_io -s -c "pwrite 65548k 4k" \
>> 		  -c "pwrite 65544k 4k" \
>> 		  -c "pwrite 65540k 4k" \
>> 		  -c "pwrite 65536k 4k" \
>> 		  $mnt/file1
>> 	sync
>>
>> 	echo "=== before ==="
>> 	xfs_io -c "fiemap -v" $mnt/file1
>> 	echo "=== after ==="
>> 	btrfs fi defrag $mnt/file1
>> 	sync
>> 	xfs_io -c "fiemap -v" $mnt/file1
>> 	umount $mnt
>>
>> With extra ftrace put into defrag_one_cluster(), before the patch it
>> would result in tons of loops:
>>
>> (As defrag_one_cluster() is inlined, the function name is its caller)
>>
>>    btrfs-126062  [005] .....  4682.816026: btrfs_defrag_file: r/i=5/257 start=0 len=262144
>>    btrfs-126062  [005] .....  4682.816027: btrfs_defrag_file: r/i=5/257 start=262144 len=262144
>>    btrfs-126062  [005] .....  4682.816028: btrfs_defrag_file: r/i=5/257 start=524288 len=262144
>>    btrfs-126062  [005] .....  4682.816028: btrfs_defrag_file: r/i=5/257 start=786432 len=262144
>>    btrfs-126062  [005] .....  4682.816028: btrfs_defrag_file: r/i=5/257 start=1048576 len=262144
>>    ...
>>    btrfs-126062  [005] .....  4682.816043: btrfs_defrag_file: r/i=5/257 start=67108864 len=262144
>>
>> But with this patch there will be just one loop, then directly to the
>> end of the extent:
>>
>>    btrfs-130471  [014] .....  5434.029558: defrag_one_cluster: r/i=5/257 start=0 len=262144
>>    btrfs-130471  [014] .....  5434.029559: defrag_one_cluster: r/i=5/257 start=67108864 len=16384
>
> Did you also measure how much time was spent before and after?
> It would be interesting to have it here in case you have measured it.
> If you don't have it, then it's fine.

I don't have the exact time spent.

But the trace contains the timestamp, and even with the old behavior,
the time spent before defragging the last cluster is pretty tiny, only 17ns.

>
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/ioctl.c | 26 +++++++++++++++++++++-----
>>   1 file changed, 21 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index 67ba934be99e..592a542164a4 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -1197,10 +1197,11 @@ struct defrag_target_range {
>>    */
>>   static int defrag_collect_targets(struct btrfs_inode *inode,
>>   				  const struct btrfs_defrag_ctrl *ctrl,
>> -				  u64 start, u32 len, bool locked,
>> -				  struct list_head *target_list)
>> +				  u64 start, u32 len, u64 *last_scanned_ret,
>> +				  bool locked, struct list_head *target_list)
>>   {
>>   	bool do_compress = ctrl->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
>> +	bool last_target = false;
>>   	u64 cur = start;
>>   	int ret = 0;
>>
>> @@ -1210,6 +1211,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>>   		bool next_mergeable = true;
>>   		u64 range_len;
>>
>> +		last_target = false;
>>   		em = defrag_lookup_extent(&inode->vfs_inode, cur, locked);
>>   		if (!em)
>>   			break;
>> @@ -1259,6 +1261,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>>   		}
>>
>>   add:
>> +		last_target = true;
>>   		range_len = min(extent_map_end(em), start + len) - cur;
>>   		/*
>>   		 * This one is a good target, check if it can be merged into
>> @@ -1302,6 +1305,17 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>>   			kfree(entry);
>>   		}
>>   	}
>> +	if (!ret && last_scanned_ret) {
>> +		/*
>> +		 * If the last extent is not a target, the caller can skip to
>> +		 * the end of that extent.
>> +		 * Otherwise, we can only go to the end of the specified range.
>> +		 */
>> +		if (!last_target)
>> +			*last_scanned_ret = cur;
>> +		else
>> +			*last_scanned_ret = start + len;
>
> Might be just me, but I found the name "last_target" a bit harder to
> decipher. Something like "last_is_target" seems more clear to me, as
> it indicates if the last extent we found was a target for defrag.

Indeed sounds better.

>
>> +	}
>>   	return ret;
>>   }
>>
>> @@ -1405,7 +1419,7 @@ static int defrag_one_range(struct btrfs_inode *inode,
>>   	 * And this time we have extent locked already, pass @locked = true
>>   	 * so that we won't relock the extent range and cause deadlock.
>>   	 */
>> -	ret = defrag_collect_targets(inode, ctrl, start, len, true,
>> +	ret = defrag_collect_targets(inode, ctrl, start, len, NULL, true,
>>   				     &target_list);
>>   	if (ret < 0)
>>   		goto unlock_extent;
>> @@ -1448,6 +1462,7 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
>>   			      struct btrfs_defrag_ctrl *ctrl, u32 len)
>>   {
>>   	const u32 sectorsize = inode->root->fs_info->sectorsize;
>> +	u64 last_scanned;
>>   	struct defrag_target_range *entry;
>>   	struct defrag_target_range *tmp;
>>   	LIST_HEAD(target_list);
>> @@ -1455,7 +1470,7 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
>>
>>   	BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
>>   	ret = defrag_collect_targets(inode, ctrl, ctrl->last_scanned, len,
>> -				     false, &target_list);
>> +				     &last_scanned, false, &target_list);
>
> So I would invert the logic here.
> What I mean is, to pass a non-NULL pointer *last_scanned_ret to defrag_collect_targets()
> when it's called at defrag_one_range(), and then here, at defrag_one_cluster(), pass it NULL
> instead.

The main reason to pass it at the defrag_one_cluster() call site, and not
the defrag_one_range() one, is to reduce the number of parameters for
defrag_one_range().

>
> The reason is simple and I think makes more sense:
>
> 1) defrag_one_cluster() does a first call to defrag_collect_targets() to scan
>     for the extent maps in a range without locking the range in the inode's
>     iotree;
>
> 2) Then for each range it collects, we call defrag_one_range(). That will
>     lock the range in the io tree and then call again defrag_collect_targets(),
>     this time extent maps may have changed (due to concurrent mmap writes, etc).
>     So it's this second time that we can have an accurate value for
>     *last_scanned_ret

One of the design choices is to not use fully accurate values for the
accounting (again, to pass fewer parameters down the chain), thus things
like btrfs_defrag_ctrl::sectors_defragged are not that accurate.

But since you mentioned it, I guess I can update them to use the most
accurate accounting, especially since after the btrfs_defrag_ctrl refactor
there are already fewer parameters to pass around; a rough sketch of what
that could look like is below.
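
One possible shape for that (a rough sketch only; the extra @defragged
output parameter of defrag_one_range() is hypothetical and not part of
this series):

	/* Let the locked pass report how many bytes it actually queued for
	 * defrag, instead of assuming the whole range_len was defragged. */
	u64 defragged = 0;

	ret = defrag_one_range(inode, ctrl, entry->start, range_len,
			       &defragged);
	if (ret < 0)
		break;
	ctrl->sectors_defragged += defragged >>
				   inode->root->fs_info->sectorsize_bits;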

Thanks,
Qu

>
> That would deal with the case where first pass considered a range for
> defrag, but then in the second pass we skipped a subrange because in the
> meanwhile, before we locked the range in the io tree, it got a large extent.
>
> Thanks.
>
>
>>   	if (ret < 0)
>>   		goto out;
>>
>> @@ -1496,6 +1511,8 @@ static int defrag_one_cluster(struct btrfs_inode *inode,
>>   		list_del_init(&entry->list);
>>   		kfree(entry);
>>   	}
>> +	if (!ret)
>> +		ctrl->last_scanned = last_scanned;
>>   	return ret;
>>   }
>>
>> @@ -1624,7 +1641,6 @@ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
>>   		btrfs_inode_unlock(inode, 0);
>>   		if (ret < 0)
>>   			break;
>> -		ctrl->last_scanned = cluster_end + 1;
>>   		if (ret > 0) {
>>   			ret = 0;
>>   			break;
>> --
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] btrfs: defrag: use btrfs_defrag_ctrl to replace btrfs_ioctl_defrag_range_args for btrfs_defrag_file()
  2022-02-04  0:30     ` Qu Wenruo
@ 2022-02-04  1:03       ` Qu Wenruo
  2022-02-04 17:57       ` David Sterba
  1 sibling, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2022-02-04  1:03 UTC (permalink / raw)
  To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs



On 2022/2/4 08:30, Qu Wenruo wrote:
> 
> 
> On 2022/2/4 01:17, Filipe Manana wrote:
>> On Fri, Jan 28, 2022 at 04:12:57PM +0800, Qu Wenruo wrote:
>>> This brings the following benefits:
>>>
>>> - No more strange range->start update to indicate last scanned bytenr
>>>    We have btrfs_defrag_ctrl::last_scanned (exclusive) for it directly.
>>>
>>> - No more return value to indicate defragged sectors
>>>    Now btrfs_defrag_file() will just return 0 if no error happened.
>>>    And btrfs_defrag_ctrl::sectors_defragged will show that value.
>>>
>>> - Less parameters to carry around
>>>    Now most defrag_* functions only need to fetch its policy parameters
>>>    from btrfs_defrag_ctrl directly.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>   fs/btrfs/ctree.h |   3 +-
>>>   fs/btrfs/file.c  |  17 +++---
>>>   fs/btrfs/ioctl.c | 144 +++++++++++++++++++++--------------------------
>>>   3 files changed, 73 insertions(+), 91 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index 0a441bd703a0..b30271676e15 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -3287,8 +3287,7 @@ int btrfs_defrag_ioctl_args_to_ctrl(struct 
>>> btrfs_fs_info *fs_info,
>>>                       struct btrfs_defrag_ctrl *ctrl,
>>>                       u64 max_sectors_to_defrag, u64 newer_than);
>>>   int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
>>> -              struct btrfs_ioctl_defrag_range_args *range,
>>> -              u64 newer_than, unsigned long max_to_defrag);
>>> +              struct btrfs_defrag_ctrl *ctrl);
>>>   void btrfs_get_block_group_info(struct list_head *groups_list,
>>>                   struct btrfs_ioctl_space_info *space);
>>>   void btrfs_update_ioctl_balance_args(struct btrfs_fs_info *fs_info,
>>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>>> index 11204dbbe053..f5de8ab9787e 100644
>>> --- a/fs/btrfs/file.c
>>> +++ b/fs/btrfs/file.c
>>> @@ -277,8 +277,7 @@ static int __btrfs_run_defrag_inode(struct 
>>> btrfs_fs_info *fs_info,
>>>   {
>>>       struct btrfs_root *inode_root;
>>>       struct inode *inode;
>>> -    struct btrfs_ioctl_defrag_range_args range;
>>> -    int num_defrag;
>>> +    struct btrfs_defrag_ctrl ctrl = {};
>>
>> Most of the code base uses { 0 } for this type of initialization.
>> IIRC some compiler versions complained about {} and preferred the
>> version with 0.
> 
> Exactly what I preferred, but David mentioned that the kernel is moving
> towards {}, so that's what I went with.
> 
>>
>> I think we should try to be consistent. Personally I find the 0 version
>> more clear. Though this might be bike shedding territory.
>>
>>
>>>       int ret;
>>>
>>>       /* get the inode */
>>> @@ -297,21 +296,21 @@ static int __btrfs_run_defrag_inode(struct 
>>> btrfs_fs_info *fs_info,
>>>
>>>       /* do a chunk of defrag */
>>>       clear_bit(BTRFS_INODE_IN_DEFRAG, &BTRFS_I(inode)->runtime_flags);
>>> -    memset(&range, 0, sizeof(range));
>>> -    range.len = (u64)-1;
>>> -    range.start = defrag->last_offset;
>>> +    ctrl.len = (u64)-1;
>>> +    ctrl.start = defrag->last_offset;
>>> +    ctrl.newer_than = defrag->transid;
>>> +    ctrl.max_sectors_to_defrag = BTRFS_DEFRAG_BATCH;
>>>
>>>       sb_start_write(fs_info->sb);
>>> -    num_defrag = btrfs_defrag_file(inode, NULL, &range, 
>>> defrag->transid,
>>> -                       BTRFS_DEFRAG_BATCH);
>>> +    ret = btrfs_defrag_file(inode, NULL, &ctrl);
>>>       sb_end_write(fs_info->sb);
>>>       /*
>>>        * if we filled the whole defrag batch, there
>>>        * must be more work to do.  Queue this defrag
>>>        * again
>>>        */
>>> -    if (num_defrag == BTRFS_DEFRAG_BATCH) {
>>> -        defrag->last_offset = range.start;
>>> +    if (ctrl.sectors_defragged == BTRFS_DEFRAG_BATCH) {
>>> +        defrag->last_offset = ctrl.last_scanned;
>>>           btrfs_requeue_inode_defrag(BTRFS_I(inode), defrag);
>>>       } else if (defrag->last_offset && !defrag->cycled) {
>>>           /*
>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>> index f9b9ee6c50da..67ba934be99e 100644
>>> --- a/fs/btrfs/ioctl.c
>>> +++ b/fs/btrfs/ioctl.c
>>> @@ -1189,22 +1189,18 @@ struct defrag_target_range {
>>>   /*
>>>    * Collect all valid target extents.
>>>    *
>>> + * @ctrl:       extra defrag policy control
>>>    * @start:       file offset to lookup
>>>    * @len:       length to lookup
>>> - * @extent_thresh: file extent size threshold, any extent size >= 
>>> this value
>>> - *           will be ignored
>>> - * @newer_than:    only defrag extents newer than this value
>>> - * @do_compress:   whether the defrag is doing compression
>>> - *           if true, @extent_thresh will be ignored and all regular
>>> - *           file extents meeting @newer_than will be targets.
>>>    * @locked:       if the range has already held extent lock
>>>    * @target_list:   list of targets file extents
>>>    */
>>>   static int defrag_collect_targets(struct btrfs_inode *inode,
>>> -                  u64 start, u64 len, u32 extent_thresh,
>>> -                  u64 newer_than, bool do_compress,
>>> -                  bool locked, struct list_head *target_list)
>>> +                  const struct btrfs_defrag_ctrl *ctrl,
>>> +                  u64 start, u32 len, bool locked,
>>> +                  struct list_head *target_list)
>>>   {
>>> +    bool do_compress = ctrl->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
>>>       u64 cur = start;
>>>       int ret = 0;
>>>
>>> @@ -1224,7 +1220,7 @@ static int defrag_collect_targets(struct 
>>> btrfs_inode *inode,
>>>               goto next;
>>>
>>>           /* Skip older extent */
>>> -        if (em->generation < newer_than)
>>> +        if (em->generation < ctrl->newer_than)
>>>               goto next;
>>>
>>>           /*
>>> @@ -1235,7 +1231,7 @@ static int defrag_collect_targets(struct 
>>> btrfs_inode *inode,
>>>               goto add;
>>>
>>>           /* Skip too large extent */
>>> -        if (em->len >= extent_thresh)
>>> +        if (em->len >= ctrl->extent_thresh)
>>>               goto next;
>>>
>>>           /*
>>> @@ -1363,8 +1359,9 @@ static int defrag_one_locked_target(struct 
>>> btrfs_inode *inode,
>>>       return ret;
>>>   }
>>>
>>> -static int defrag_one_range(struct btrfs_inode *inode, u64 start, 
>>> u32 len,
>>> -                u32 extent_thresh, u64 newer_than, bool do_compress)
>>> +static int defrag_one_range(struct btrfs_inode *inode,
>>> +                const struct btrfs_defrag_ctrl *ctrl,
>>> +                u64 start, u32 len)
>>>   {
>>>       struct extent_state *cached_state = NULL;
>>>       struct defrag_target_range *entry;
>>> @@ -1408,8 +1405,7 @@ static int defrag_one_range(struct btrfs_inode 
>>> *inode, u64 start, u32 len,
>>>        * And this time we have extent locked already, pass @locked = 
>>> true
>>>        * so that we won't relock the extent range and cause deadlock.
>>>        */
>>> -    ret = defrag_collect_targets(inode, start, len, extent_thresh,
>>> -                     newer_than, do_compress, true,
>>> +    ret = defrag_collect_targets(inode, ctrl, start, len, true,
>>>                        &target_list);
>>>       if (ret < 0)
>>>           goto unlock_extent;
>>> @@ -1440,12 +1436,16 @@ static int defrag_one_range(struct 
>>> btrfs_inode *inode, u64 start, u32 len,
>>>       return ret;
>>>   }
>>>
>>> +/*
>>> + * Return <0 for error.
>>> + * Return >0 if we hit the ctrl->max_sectors_to_defrag limit
>>> + * Return 0 if we finished the range without error.
>>> + *
>>> + * For >= 0 case, ctrl->last_scanned and ctrl->sectors_defragged 
>>> will be updated.
>>> + */
>>>   static int defrag_one_cluster(struct btrfs_inode *inode,
>>>                     struct file_ra_state *ra,
>>> -                  u64 start, u32 len, u32 extent_thresh,
>>> -                  u64 newer_than, bool do_compress,
>>> -                  unsigned long *sectors_defragged,
>>> -                  unsigned long max_sectors)
>>> +                  struct btrfs_defrag_ctrl *ctrl, u32 len)
>>>   {
>>>       const u32 sectorsize = inode->root->fs_info->sectorsize;
>>>       struct defrag_target_range *entry;
>>> @@ -1454,9 +1454,8 @@ static int defrag_one_cluster(struct 
>>> btrfs_inode *inode,
>>>       int ret;
>>>
>>>       BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
>>> -    ret = defrag_collect_targets(inode, start, len, extent_thresh,
>>> -                     newer_than, do_compress, false,
>>> -                     &target_list);
>>> +    ret = defrag_collect_targets(inode, ctrl, ctrl->last_scanned, len,
>>> +                     false, &target_list);
>>>       if (ret < 0)
>>>           goto out;
>>>
>>> @@ -1464,14 +1463,16 @@ static int defrag_one_cluster(struct 
>>> btrfs_inode *inode,
>>>           u32 range_len = entry->len;
>>>
>>>           /* Reached or beyond the limit */
>>> -        if (max_sectors && *sectors_defragged >= max_sectors) {
>>> +        if (ctrl->max_sectors_to_defrag &&
>>> +            ctrl->sectors_defragged >= ctrl->max_sectors_to_defrag) {
>>>               ret = 1;
>>>               break;
>>>           }
>>>
>>> -        if (max_sectors)
>>> +        if (ctrl->max_sectors_to_defrag)
>>>               range_len = min_t(u32, range_len,
>>> -                (max_sectors - *sectors_defragged) * sectorsize);
>>> +                      (ctrl->max_sectors_to_defrag -
>>> +                       ctrl->sectors_defragged) * sectorsize);
>>>
>>>           if (ra)
>>>               page_cache_sync_readahead(inode->vfs_inode.i_mapping,
>>> @@ -1484,12 +1485,11 @@ static int defrag_one_cluster(struct 
>>> btrfs_inode *inode,
>>>            * But that's fine, it only affects the @sectors_defragged
>>>            * accounting.
>>>            */
>>> -        ret = defrag_one_range(inode, entry->start, range_len,
>>> -                       extent_thresh, newer_than, do_compress);
>>> +        ret = defrag_one_range(inode, ctrl, entry->start, range_len);
>>>           if (ret < 0)
>>>               break;
>>> -        *sectors_defragged += range_len >>
>>> -                      inode->root->fs_info->sectorsize_bits;
>>> +        ctrl->sectors_defragged += range_len >>
>>> +                      inode->root->fs_info->sectorsize_bits;
>>>       }
>>>   out:
>>>       list_for_each_entry_safe(entry, tmp, &target_list, list) {
>>> @@ -1545,58 +1545,43 @@ int btrfs_defrag_ioctl_args_to_ctrl(struct 
>>> btrfs_fs_info *fs_info,
>>>    *
>>>    * @inode:       inode to be defragged
>>>    * @ra:           readahead state (can be NULL)
>>> - * @range:       defrag options including range and flags
>>> - * @newer_than:       minimum transid to defrag
>>> - * @max_to_defrag: max number of sectors to be defragged, if 0, the 
>>> whole inode
>>> - *           will be defragged.
>>> + * @ctrl:       defrag options including range and various policy 
>>> parameters
>>>    *
>>>    * Return <0 for error.
>>> - * Return >=0 for the number of sectors defragged, and range->start 
>>> will be updated
>>> - * to indicate the file offset where next defrag should be started at.
>>> - * (Mostly for autodefrag, which sets @max_to_defrag thus we may 
>>> exit early without
>>> - *  defragging all the range).
>>> + * Return 0 if the defrag is done without error, ctrl->last_scanned and
>>> + * ctrl->sectors_defragged will be updated.
>>>    */
>>>   int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
>>> -              struct btrfs_ioctl_defrag_range_args *range,
>>> -              u64 newer_than, unsigned long max_to_defrag)
>>> +              struct btrfs_defrag_ctrl *ctrl)
>>>   {
>>>       struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>>> -    unsigned long sectors_defragged = 0;
>>>       u64 isize = i_size_read(inode);
>>> -    u64 cur;
>>>       u64 last_byte;
>>> -    bool do_compress = range->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
>>> +    bool do_compress = ctrl->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
>>>       bool ra_allocated = false;
>>> -    int compress_type = BTRFS_COMPRESS_ZLIB;
>>>       int ret = 0;
>>> -    u32 extent_thresh = range->extent_thresh;
>>>
>>>       if (isize == 0)
>>>           return 0;
>>>
>>> -    if (range->start >= isize)
>>> +    if (ctrl->start >= isize)
>>>           return -EINVAL;
>>>
>>> -    if (do_compress) {
>>> -        if (range->compress_type >= BTRFS_NR_COMPRESS_TYPES)
>>> -            return -EINVAL;
>>> -        if (range->compress_type)
>>> -            compress_type = range->compress_type;
>>> -    }
>>> -
>>> -    if (extent_thresh == 0)
>>> -        extent_thresh = SZ_256K;
>>> +    if (do_compress)
>>> +        ASSERT(ctrl->compress < BTRFS_NR_COMPRESS_TYPES);
>>>
>>> -    if (range->start + range->len > range->start) {
>>> +    if (ctrl->start + ctrl->len > ctrl->start) {
>>>           /* Got a specific range */
>>> -        last_byte = min(isize, range->start + range->len);
>>> +        last_byte = min(isize, ctrl->start + ctrl->len);
>>>       } else {
>>>           /* Defrag until file end */
>>>           last_byte = isize;
>>>       }
>>> +    if (ctrl->extent_thresh == 0)
>>> +        ctrl->extent_thresh = SZ_256K;
>>
>> Why did you move the logic for checking/setting the extent threshold?
>> You placed it now in the middle of the logic that sets 'last_byte', 
>> which is
>> not logical and makes it harder to follow.
>>
>> I adjusted the setup of 'last_byte' recently to avoid that:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6b34cd8e175bfbf4f3f01b6d19eae18245e1a8cc 
>>
> 
> My bad, originally I put the default value setting into the new helper
> btrfs_defrag_ioctl_args_to_ctrl(), but that doesn't handle the autodefrag
> path, and the test cases exposed this problem.
> 
> Then, when adding the assignment back, I put it in a different location.
> 
> I should keep it where it was.

Or, should we just move the default value assignment completely out of
btrfs_defrag_file(), and treat extent_thresh == 0 and the unaligned start
case as invalid? A rough sketch is below.
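
A minimal sketch of that direction (the exact checks are hypothetical, and
autodefrag would need the same defaulting applied on its own path, which is
the catch mentioned earlier in this thread):

	/* In btrfs_defrag_ioctl_args_to_ctrl(): apply the default here and
	 * reject what btrfs_defrag_file() would then treat as invalid. */
	if (range->extent_thresh == 0)
		ctrl->extent_thresh = SZ_256K;
	else
		ctrl->extent_thresh = range->extent_thresh;

	if (!IS_ALIGNED(range->start, fs_info->sectorsize))
		return -EINVAL;
	ctrl->start = range->start;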

Thanks,
Qu
> 
> Thanks,
> Qu
>>
>>>
>>> -    /* Align the range */
>>> -    cur = round_down(range->start, fs_info->sectorsize);
>>> +    ASSERT(IS_ALIGNED(ctrl->start, fs_info->sectorsize));
>>> +    ctrl->last_scanned = ctrl->start;
>>>       last_byte = round_up(last_byte, fs_info->sectorsize) - 1;
>>>
>>>       /*
>>> @@ -1611,14 +1596,14 @@ int btrfs_defrag_file(struct inode *inode, 
>>> struct file_ra_state *ra,
>>>               file_ra_state_init(ra, inode->i_mapping);
>>>       }
>>>
>>> -    while (cur < last_byte) {
>>> +    while (ctrl->last_scanned < last_byte) {
>>>           u64 cluster_end;
>>>
>>>           /* The cluster size 256K should always be page aligned */
>>>           BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
>>>
>>>           /* We want the cluster end at page boundary when possible */
>>> -        cluster_end = (((cur >> PAGE_SHIFT) +
>>> +        cluster_end = (((ctrl->last_scanned >> PAGE_SHIFT) +
>>>                      (SZ_256K >> PAGE_SHIFT)) << PAGE_SHIFT) - 1;
>>>           cluster_end = min(cluster_end, last_byte);
>>>
>>> @@ -1633,15 +1618,13 @@ int btrfs_defrag_file(struct inode *inode, 
>>> struct file_ra_state *ra,
>>>               break;
>>>           }
>>>           if (do_compress)
>>> -            BTRFS_I(inode)->defrag_compress = compress_type;
>>> -        ret = defrag_one_cluster(BTRFS_I(inode), ra, cur,
>>> -                cluster_end + 1 - cur, extent_thresh,
>>> -                newer_than, do_compress,
>>> -                &sectors_defragged, max_to_defrag);
>>> +            BTRFS_I(inode)->defrag_compress = ctrl->compress;
>>> +        ret = defrag_one_cluster(BTRFS_I(inode), ra, ctrl,
>>> +                cluster_end + 1 - ctrl->last_scanned);
>>>           btrfs_inode_unlock(inode, 0);
>>>           if (ret < 0)
>>>               break;
>>> -        cur = cluster_end + 1;
>>> +        ctrl->last_scanned = cluster_end + 1;
>>>           if (ret > 0) {
>>>               ret = 0;
>>>               break;
>>> @@ -1650,27 +1633,21 @@ int btrfs_defrag_file(struct inode *inode, 
>>> struct file_ra_state *ra,
>>>
>>>       if (ra_allocated)
>>>           kfree(ra);
>>> -    /*
>>> -     * Update range.start for autodefrag, this will indicate where 
>>> to start
>>> -     * in next run.
>>> -     */
>>> -    range->start = cur;
>>> -    if (sectors_defragged) {
>>> +    if (ctrl->sectors_defragged) {
>>>           /*
>>>            * We have defragged some sectors, for compression case they
>>>            * need to be written back immediately.
>>>            */
>>> -        if (range->flags & BTRFS_DEFRAG_RANGE_START_IO) {
>>> +        if (ctrl->flags & BTRFS_DEFRAG_RANGE_START_IO) {
>>>               filemap_flush(inode->i_mapping);
>>>               if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
>>>                        &BTRFS_I(inode)->runtime_flags))
>>>                   filemap_flush(inode->i_mapping);
>>>           }
>>> -        if (range->compress_type == BTRFS_COMPRESS_LZO)
>>> +        if (ctrl->compress == BTRFS_COMPRESS_LZO)
>>>               btrfs_set_fs_incompat(fs_info, COMPRESS_LZO);
>>> -        else if (range->compress_type == BTRFS_COMPRESS_ZSTD)
>>> +        else if (ctrl->compress == BTRFS_COMPRESS_ZSTD)
>>>               btrfs_set_fs_incompat(fs_info, COMPRESS_ZSTD);
>>> -        ret = sectors_defragged;
>>>       }
>>>       if (do_compress) {
>>>           btrfs_inode_lock(inode, 0);
>>> @@ -3193,6 +3170,7 @@ static int btrfs_ioctl_defrag(struct file 
>>> *file, void __user *argp)
>>>       struct inode *inode = file_inode(file);
>>>       struct btrfs_root *root = BTRFS_I(inode)->root;
>>>       struct btrfs_ioctl_defrag_range_args range = {0};
>>> +    struct btrfs_defrag_ctrl ctrl = {};
>>
>> Same here, but more confusing here since both styles are used one after
>> another.
>>
>> Other than that, it looks fine. Thanks.
>>
>>>       int ret;
>>>
>>>       ret = mnt_want_write_file(file);
>>> @@ -3238,10 +3216,16 @@ static int btrfs_ioctl_defrag(struct file 
>>> *file, void __user *argp)
>>>               /* the rest are all set to zero by kzalloc */
>>>               range.len = (u64)-1;
>>>           }
>>> -        ret = btrfs_defrag_file(file_inode(file), &file->f_ra,
>>> -                    &range, BTRFS_OLDEST_GENERATION, 0);
>>> -        if (ret > 0)
>>> -            ret = 0;
>>> +        ret = btrfs_defrag_ioctl_args_to_ctrl(root->fs_info, &range,
>>> +                              &ctrl, 0, BTRFS_OLDEST_GENERATION);
>>> +        if (ret < 0)
>>> +            break;
>>> +        ret = btrfs_defrag_file(file_inode(file), &file->f_ra, &ctrl);
>>> +        /*
>>> +         * Although progs doesn't check range->start, still try to keep
>>> +         * the same behavior to make direct ioctl caller happy.
>>> +         */
>>> +        range.start = ctrl.last_scanned;
>>>           break;
>>>       default:
>>>           ret = -EINVAL;
>>> -- 
>>> 2.34.1
>>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] btrfs: defrag: allow defrag_one_cluster() to large extent which is not a target
  2022-02-04  0:39     ` Qu Wenruo
@ 2022-02-04  7:08       ` Qu Wenruo
  0 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2022-02-04  7:08 UTC (permalink / raw)
  To: Filipe Manana, Qu Wenruo; +Cc: linux-btrfs



On 2022/2/4 08:39, Qu Wenruo wrote:
>
>
> On 2022/2/4 01:39, Filipe Manana wrote:
>> On Fri, Jan 28, 2022 at 04:12:58PM +0800, Qu Wenruo wrote:
>>> In the rework of btrfs_defrag_file(), we always call
>>> defrag_one_cluster() and increase the offset by cluster size, which is
>>> only 256K.
>>>
>>> But there are cases where we have a large extent (e.g. 128M) which
>>> doesn't need to be defragged at all.
>>>
>>> Before the refactor, we could directly skip the range, but now we have to
>>> scan that extent map again and again until the cluster moves past the
>>> non-target extent.
>>>
>>> Fix the problem by allowing defrag_one_cluster() to increase
>>> btrfs_defrag_ctrl::last_scanned to the end of an extent, if and only if
>>> the last extent of the cluster is not a target.
>>>
>>> The test script looks like this:
>>>
>>>     mkfs.btrfs -f $dev > /dev/null
>>>
>>>     mount $dev $mnt
>>>
>>>     # As btrfs ioctl uses 32M as extent_threshold
>>>     xfs_io -f -c "pwrite 0 64M" $mnt/file1
>>>     sync
>>>     # Some fragmented range to defrag
>>>     xfs_io -s -c "pwrite 65548k 4k" \
>>>           -c "pwrite 65544k 4k" \
>>>           -c "pwrite 65540k 4k" \
>>>           -c "pwrite 65536k 4k" \
>>>           $mnt/file1
>>>     sync
>>>
>>>     echo "=== before ==="
>>>     xfs_io -c "fiemap -v" $mnt/file1
>>>     echo "=== after ==="
>>>     btrfs fi defrag $mnt/file1
>>>     sync
>>>     xfs_io -c "fiemap -v" $mnt/file1
>>>     umount $mnt
>>>
>>> With extra ftrace put into defrag_one_cluster(), before the patch it
>>> would result in tons of loops:
>>>
>>> (As defrag_one_cluster() is inlined, the function name is its caller)
>>>
>>>    btrfs-126062  [005] .....  4682.816026: btrfs_defrag_file:
>>> r/i=5/257 start=0 len=262144
>>>    btrfs-126062  [005] .....  4682.816027: btrfs_defrag_file:
>>> r/i=5/257 start=262144 len=262144
>>>    btrfs-126062  [005] .....  4682.816028: btrfs_defrag_file:
>>> r/i=5/257 start=524288 len=262144
>>>    btrfs-126062  [005] .....  4682.816028: btrfs_defrag_file:
>>> r/i=5/257 start=786432 len=262144
>>>    btrfs-126062  [005] .....  4682.816028: btrfs_defrag_file:
>>> r/i=5/257 start=1048576 len=262144
>>>    ...
>>>    btrfs-126062  [005] .....  4682.816043: btrfs_defrag_file:
>>> r/i=5/257 start=67108864 len=262144
>>>
>>> But with this patch there will be just one loop, then directly to the
>>> end of the extent:
>>>
>>>    btrfs-130471  [014] .....  5434.029558: defrag_one_cluster:
>>> r/i=5/257 start=0 len=262144
>>>    btrfs-130471  [014] .....  5434.029559: defrag_one_cluster:
>>> r/i=5/257 start=67108864 len=16384
>>
>> Did you also measure how much time was spent before and after?
>> It would be interesting to have it here in case you have measured it.
>> If you don't have it, then it's fine.
>
> I don't have the exact time spent.
>
> But the trace contains the timestamp, and even with the old behavior,
> the time spent before defragging the last cluster is pretty tiny, only
> 17ns.

My bad, it's 17us, which is already somewhat time consuming...

>
>>
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>   fs/btrfs/ioctl.c | 26 +++++++++++++++++++++-----
>>>   1 file changed, 21 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>> index 67ba934be99e..592a542164a4 100644
>>> --- a/fs/btrfs/ioctl.c
>>> +++ b/fs/btrfs/ioctl.c
>>> @@ -1197,10 +1197,11 @@ struct defrag_target_range {
>>>    */
>>>   static int defrag_collect_targets(struct btrfs_inode *inode,
>>>                     const struct btrfs_defrag_ctrl *ctrl,
>>> -                  u64 start, u32 len, bool locked,
>>> -                  struct list_head *target_list)
>>> +                  u64 start, u32 len, u64 *last_scanned_ret,
>>> +                  bool locked, struct list_head *target_list)
>>>   {
>>>       bool do_compress = ctrl->flags & BTRFS_DEFRAG_RANGE_COMPRESS;
>>> +    bool last_target = false;
>>>       u64 cur = start;
>>>       int ret = 0;
>>>
>>> @@ -1210,6 +1211,7 @@ static int defrag_collect_targets(struct
>>> btrfs_inode *inode,
>>>           bool next_mergeable = true;
>>>           u64 range_len;
>>>
>>> +        last_target = false;
>>>           em = defrag_lookup_extent(&inode->vfs_inode, cur, locked);
>>>           if (!em)
>>>               break;
>>> @@ -1259,6 +1261,7 @@ static int defrag_collect_targets(struct
>>> btrfs_inode *inode,
>>>           }
>>>
>>>   add:
>>> +        last_target = true;
>>>           range_len = min(extent_map_end(em), start + len) - cur;
>>>           /*
>>>            * This one is a good target, check if it can be merged into
>>> @@ -1302,6 +1305,17 @@ static int defrag_collect_targets(struct
>>> btrfs_inode *inode,
>>>               kfree(entry);
>>>           }
>>>       }
>>> +    if (!ret && last_scanned_ret) {
>>> +        /*
>>> +         * If the last extent is not a target, the caller can skip to
>>> +         * the end of that extent.
>>> +         * Otherwise, we can only go to the end of the specified range.
>>> +         */
>>> +        if (!last_target)
>>> +            *last_scanned_ret = cur;
>>> +        else
>>> +            *last_scanned_ret = start + len;
>>
>> Might be just me, but I found the name "last_target" a bit harder to
>> decipher. Something like "last_is_target" seems more clear to me, as
>> it indicates if the last extent we found was a target for defrag.
>
> Indeed sounds better.
>
>>
>>> +    }
>>>       return ret;
>>>   }
>>>
>>> @@ -1405,7 +1419,7 @@ static int defrag_one_range(struct btrfs_inode
>>> *inode,
>>>        * And this time we have extent locked already, pass @locked =
>>> true
>>>        * so that we won't relock the extent range and cause deadlock.
>>>        */
>>> -    ret = defrag_collect_targets(inode, ctrl, start, len, true,
>>> +    ret = defrag_collect_targets(inode, ctrl, start, len, NULL, true,
>>>                        &target_list);
>>>       if (ret < 0)
>>>           goto unlock_extent;
>>> @@ -1448,6 +1462,7 @@ static int defrag_one_cluster(struct
>>> btrfs_inode *inode,
>>>                     struct btrfs_defrag_ctrl *ctrl, u32 len)
>>>   {
>>>       const u32 sectorsize = inode->root->fs_info->sectorsize;
>>> +    u64 last_scanned;
>>>       struct defrag_target_range *entry;
>>>       struct defrag_target_range *tmp;
>>>       LIST_HEAD(target_list);
>>> @@ -1455,7 +1470,7 @@ static int defrag_one_cluster(struct
>>> btrfs_inode *inode,
>>>
>>>       BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
>>>       ret = defrag_collect_targets(inode, ctrl, ctrl->last_scanned, len,
>>> -                     false, &target_list);
>>> +                     &last_scanned, false, &target_list);
>>
>> So I would invert the logic here.
>> What I mean is, to pass a non-NULL pointer *last_scanned_ret to
>> defrag_collect_targets()
>> when it's called at defrag_one_range(), and then here, at
>> defrag_one_cluster(), pass it NULL
>> instead.
>
> The main reason to pass it to defrag_one_cluster() call site but not
> defrag_one_range() one is to reduce the parameter for defrag_one_range().
>
>>
>> The reason is simple and I think makes more sense:
>>
>> 1) defrag_one_cluster() does a first call to defrag_collect_targets()
>> to scan
>>     for the extent maps in a range without locking the range in the
>> inode's
>>     iotree;
>>
>> 2) Then for each range it collects, we call defrag_one_range(). That will
>>     lock the range in the io tree and then call again
>> defrag_collect_targets(),
>>     this time extent maps may have changed (due to concurrent mmap
>> writes, etc).
>>     So it's this second time that we can have an accurate value for
>>     *last_scanned_ret
>
> One of the design is to not use so accurate values for accounting
> (again, to pass less parameters down the chain), thus things like
> btrfs_defrag_ctrl::sectors_defragged is not that accurate.
>
> But since you mentioned, I guess I can update them to the most accurate
> accounting, especially after the btrfs_defrag_ctrl refactor, there is
> already less parameters to pass around.
>
> Thanks,
> Qu
>
>>
>> That would deal with the case where first pass considered a range for
>> defrag, but then in the second pass we skipped a subrange because in the
>> meanwhile, before we locked the range in the io tree, it got a large
>> extent.
>>
>> Thanks.
>>
>>
>>>       if (ret < 0)
>>>           goto out;
>>>
>>> @@ -1496,6 +1511,8 @@ static int defrag_one_cluster(struct
>>> btrfs_inode *inode,
>>>           list_del_init(&entry->list);
>>>           kfree(entry);
>>>       }
>>> +    if (!ret)
>>> +        ctrl->last_scanned = last_scanned;
>>>       return ret;
>>>   }
>>>
>>> @@ -1624,7 +1641,6 @@ int btrfs_defrag_file(struct inode *inode,
>>> struct file_ra_state *ra,
>>>           btrfs_inode_unlock(inode, 0);
>>>           if (ret < 0)
>>>               break;
>>> -        ctrl->last_scanned = cluster_end + 1;
>>>           if (ret > 0) {
>>>               ret = 0;
>>>               break;
>>> --
>>> 2.34.1
>>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] btrfs: defrag: use btrfs_defrag_ctrl to replace btrfs_ioctl_defrag_range_args for btrfs_defrag_file()
  2022-02-04  0:30     ` Qu Wenruo
  2022-02-04  1:03       ` Qu Wenruo
@ 2022-02-04 17:57       ` David Sterba
  2022-02-04 18:00         ` David Sterba
  1 sibling, 1 reply; 19+ messages in thread
From: David Sterba @ 2022-02-04 17:57 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Filipe Manana, Qu Wenruo, linux-btrfs

On Fri, Feb 04, 2022 at 08:30:01AM +0800, Qu Wenruo wrote:
> On 2022/2/4 01:17, Filipe Manana wrote:
> > On Fri, Jan 28, 2022 at 04:12:57PM +0800, Qu Wenruo wrote:
> >> --- a/fs/btrfs/file.c
> >> +++ b/fs/btrfs/file.c
> >> @@ -277,8 +277,7 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
> >>   {
> >>   	struct btrfs_root *inode_root;
> >>   	struct inode *inode;
> >> -	struct btrfs_ioctl_defrag_range_args range;
> >> -	int num_defrag;
> >> +	struct btrfs_defrag_ctrl ctrl = {};
> >
> > Most of the code base uses { 0 } for this type of initialization.
> > IIRC some compiler versions complained about {} and preferred the
> > version with 0.
> 
> Exactly what I preferred, but David mentioned that the kernel is moving
> towards {}, so that's what I went with.
> 
> > I think we should try to be consistent. Personally I find the 0 version
> > more clear. Though this might be bike shedding territory.

The preferred style is {} because { 0 } does not consistently initialize
the structures on all compilers. I can't find the mail/commit with the
reasoning, but if there's a pointer as the first member, it gets
initialized (the 0 matches) while the rest of the structure is not.

^ permalink raw reply	[flat|nested] 19+ messages in thread
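
For reference, a standalone sketch of the two initializer forms under
discussion (members abbreviated and u64 typedef'd so that it compiles
outside the kernel; the real struct btrfs_defrag_ctrl has more fields):

	typedef unsigned long long u64;

	struct btrfs_defrag_ctrl {
		u64 start;
		u64 len;
		u64 sectors_defragged;
		/* ... */
	};

	/* Empty-braces form: a GNU C extension (standard only since C23)
	 * that zero-initializes every member. */
	struct btrfs_defrag_ctrl a = {};

	/* Zero-initializer form, which this thread settles on as the
	 * preferred btrfs style: the 0 initializes the first member and
	 * every remaining member is implicitly zero-initialized, as if
	 * it had static storage duration. */
	struct btrfs_defrag_ctrl b = { 0 };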

* Re: [PATCH 3/4] btrfs: defrag: use btrfs_defrag_ctrl to replace btrfs_ioctl_defrag_range_args for btrfs_defrag_file()
  2022-02-04 17:57       ` David Sterba
@ 2022-02-04 18:00         ` David Sterba
  2022-02-04 18:17           ` David Sterba
  0 siblings, 1 reply; 19+ messages in thread
From: David Sterba @ 2022-02-04 18:00 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Filipe Manana, Qu Wenruo, linux-btrfs

On Fri, Feb 04, 2022 at 06:57:42PM +0100, David Sterba wrote:
> On Fri, Feb 04, 2022 at 08:30:01AM +0800, Qu Wenruo wrote:
> > On 2022/2/4 01:17, Filipe Manana wrote:
> > > On Fri, Jan 28, 2022 at 04:12:57PM +0800, Qu Wenruo wrote:
> > >> --- a/fs/btrfs/file.c
> > >> +++ b/fs/btrfs/file.c
> > >> @@ -277,8 +277,7 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
> > >>   {
> > >>   	struct btrfs_root *inode_root;
> > >>   	struct inode *inode;
> > >> -	struct btrfs_ioctl_defrag_range_args range;
> > >> -	int num_defrag;
> > >> +	struct btrfs_defrag_ctrl ctrl = {};
> > >
> > > Most of the code base uses { 0 } for this type of initialization.
> > > IIRC some compiler versions complained about {} and preferred the
> > > version with 0.
> > 
> > Exactly what I preferred, but David mentioned that the kernel is moving
> > towards {}, so that's what I went with.
> > 
> > > I think we should try to be consistent. Personally I find the 0 version
> > > more clear. Though this might be bike shedding territory.
> 
> The preferred style is {} because { 0 } does not consistently initialize
> the structures on all compilers. I can't find the mail/commit with the
> reasoning, but if there's a pointer as the first member, it gets
> initialized (the 0 matches) while the rest of the structure is not.

I thought we had already converted all the { 0 } to {}, but no, so I get
the consistency argument.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] btrfs: defrag: use btrfs_defrag_ctrl to replace btrfs_ioctl_defrag_range_args for btrfs_defrag_file()
  2022-02-04 18:00         ` David Sterba
@ 2022-02-04 18:17           ` David Sterba
  2022-02-08 17:00             ` David Sterba
  0 siblings, 1 reply; 19+ messages in thread
From: David Sterba @ 2022-02-04 18:17 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Filipe Manana, Qu Wenruo, linux-btrfs

On Fri, Feb 04, 2022 at 07:00:31PM +0100, David Sterba wrote:
> On Fri, Feb 04, 2022 at 06:57:42PM +0100, David Sterba wrote:
> > On Fri, Feb 04, 2022 at 08:30:01AM +0800, Qu Wenruo wrote:
> > > On 2022/2/4 01:17, Filipe Manana wrote:
> > > > On Fri, Jan 28, 2022 at 04:12:57PM +0800, Qu Wenruo wrote:
> > > >> --- a/fs/btrfs/file.c
> > > >> +++ b/fs/btrfs/file.c
> > > >> @@ -277,8 +277,7 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
> > > >>   {
> > > >>   	struct btrfs_root *inode_root;
> > > >>   	struct inode *inode;
> > > >> -	struct btrfs_ioctl_defrag_range_args range;
> > > >> -	int num_defrag;
> > > >> +	struct btrfs_defrag_ctrl ctrl = {};
> > > >
> > > > Most of the code base uses { 0 } for this type of initialization.
> > > > IIRC some compiler versions complained about {} and preferred the
> > > > version with 0.
> > > 
> > > Exactly what I preferred, but David mentioned that the kernel is moving
> > > towards {}, so that's what I went with.
> > > 
> > > > I think we should try to be consistent. Personally I find the 0 version
> > > > more clear. Though this might be bike shedding territory.
> > 
> > The preferred style is {} because { 0 } does not consistently initialize
> > the structures on all compilers. I can't find the mail/commit with the
> > reasoning, but if there's a pointer as the first member, it gets
> > initialized (the 0 matches) while the rest of the structure is not.
> 
> I thought we had already converted all the { 0 } to {}, but no, so I get
> the consistency argument.

https://lore.kernel.org/all/20210910225207.3272766-1-keescook@chromium.org/

A tree-wide change, but it did not get pulled in the end. The problematic
compiler was gcc 4.9, but the current minimum requirement is 5.1, so I
wonder how relevant it still is.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] btrfs: defrag: use btrfs_defrag_ctrl to replace btrfs_ioctl_defrag_range_args for btrfs_defrag_file()
  2022-02-04 18:17           ` David Sterba
@ 2022-02-08 17:00             ` David Sterba
  2022-02-09  0:15               ` Qu Wenruo
  0 siblings, 1 reply; 19+ messages in thread
From: David Sterba @ 2022-02-08 17:00 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Filipe Manana, Qu Wenruo, linux-btrfs

On Fri, Feb 04, 2022 at 07:17:07PM +0100, David Sterba wrote:
> > > > >> --- a/fs/btrfs/file.c
> > > > >> +++ b/fs/btrfs/file.c
> > > > >> @@ -277,8 +277,7 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
> > > > >>   {
> > > > >>   	struct btrfs_root *inode_root;
> > > > >>   	struct inode *inode;
> > > > >> -	struct btrfs_ioctl_defrag_range_args range;
> > > > >> -	int num_defrag;
> > > > >> +	struct btrfs_defrag_ctrl ctrl = {};
> > > > >
> > > > > Most of the code base uses { 0 } for this type of initialization.
> > > > > IIRC some compiler versions complained about {} and preferred the
> > > > > version with 0.
> > > > 
> > > > Exactly what I preferred, but David mentioned that the kernel is moving
> > > > towards {}, so that's what I went with.
> > > > 
> > > > > I think we should try to be consistent. Personally I find the 0 version
> > > > > more clear. Though this might be bike shedding territory.
> > > 
> > > The preferred style is {} because { 0 } does not consistently initialize
> > > the structures on all compilers. I can't find the mail/commit with the
> > > reasoning, but if there's a pointer as the first member, it gets
> > > initialized (the 0 matches) while the rest of the structure is not.
> > 
> > I thought we had already converted all the { 0 } to {}, but no, so I get
> > the consistency argument.
> 
> https://lore.kernel.org/all/20210910225207.3272766-1-keescook@chromium.org/
> 
> A tree-wide change, but it did not get pulled in the end. The problematic
> compiler was gcc 4.9, but the current minimum requirement is 5.1, so I
> wonder how relevant it still is.

I've asked Kees; it seems the conversion is not necessary. Thus, I'd
stay with the { 0 } that we have now, as it's the most used initializer.

No need to resend patches just because there's a {}; I'll fix it up
anywhere I see it.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] btrfs: defrag: use btrfs_defrag_ctrl to replace btrfs_ioctl_defrag_range_args for btrfs_defrag_file()
  2022-02-08 17:00             ` David Sterba
@ 2022-02-09  0:15               ` Qu Wenruo
  2022-02-14 16:19                 ` David Sterba
  0 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2022-02-09  0:15 UTC (permalink / raw)
  To: dsterba, Filipe Manana, Qu Wenruo, linux-btrfs



On 2022/2/9 01:00, David Sterba wrote:
> On Fri, Feb 04, 2022 at 07:17:07PM +0100, David Sterba wrote:
>>>>>>> --- a/fs/btrfs/file.c
>>>>>>> +++ b/fs/btrfs/file.c
>>>>>>> @@ -277,8 +277,7 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info,
>>>>>>>    {
>>>>>>>    	struct btrfs_root *inode_root;
>>>>>>>    	struct inode *inode;
>>>>>>> -	struct btrfs_ioctl_defrag_range_args range;
>>>>>>> -	int num_defrag;
>>>>>>> +	struct btrfs_defrag_ctrl ctrl = {};
>>>>>>
>>>>>> Most of the code base uses { 0 } for this type of initialization.
>>>>>> IIRC some compiler versions complained about {} and preferred the
>>>>>> version with 0.
>>>>>
>>>>> Exactly what I preferred, but David mentioned that the kernel is moving
>>>>> towards {}, so that's what I went with.
>>>>>
>>>>>> I think we should try to be consistent. Personally I find the 0 version
>>>>>> more clear. Though this might be bike shedding territory.
>>>>
>>>> The preferred style is {} because { 0 } does not consistently initialize
>>>> the structures on all compilers. I can't find the mail/commit with the
>>>> reasoning, but if there's a pointer as the first member, it gets
>>>> initialized (the 0 matches) while the rest of the structure is not.
>>>
>>> I thought we had already converted all the { 0 } to {}, but no, so I get
>>> the consistency argument.
>>
>> https://lore.kernel.org/all/20210910225207.3272766-1-keescook@chromium.org/
>>
>> A tree-wide change, but it did not get pulled in the end. The problematic
>> compiler was gcc 4.9, but the current minimum requirement is 5.1, so I
>> wonder how relevant it still is.
>
> I've asked Kees; it seems the conversion is not necessary. Thus, I'd
> stay with the { 0 } that we have now, as it's the most used initializer.
>
> No need to resend patches just because there's a {}; I'll fix it up
> anywhere I see it.

So the preferred style reverts to "= { 0 };", right?

Thanks,
Qu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] btrfs: defrag: use btrfs_defrag_ctrl to replace btrfs_ioctl_defrag_range_args for btrfs_defrag_file()
  2022-02-09  0:15               ` Qu Wenruo
@ 2022-02-14 16:19                 ` David Sterba
  0 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2022-02-14 16:19 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Filipe Manana, Qu Wenruo, linux-btrfs

On Wed, Feb 09, 2022 at 08:15:42AM +0800, Qu Wenruo wrote:
> On 2022/2/9 01:00, David Sterba wrote:
> >>>
> >>> I thought we had already converted all the { 0 } to {}, but no, so I get
> >>> the consistency argument.
> >>
> >> https://lore.kernel.org/all/20210910225207.3272766-1-keescook@chromium.org/
> >>
> >> A tree-wide change, but it did not get pulled in the end. The problematic
> >> compiler was gcc 4.9, but the current minimum requirement is 5.1, so I
> >> wonder how relevant it still is.
> >
> > I've asked Kees; it seems the conversion is not necessary. Thus, I'd
> > stay with the { 0 } that we have now, as it's the most used initializer.
> >
> > No need to resend patches just because there's a {}; I'll fix it up
> > anywhere I see it.
> 
> So the preferred style reverts to "= { 0 };", right?

Yes

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-02-14 16:23 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-28  8:12 [PATCH 0/4] btrfs: defrag: don't waste CPU time on non-target extent Qu Wenruo
2022-01-28  8:12 ` [PATCH 1/4] btrfs: uapi: introduce BTRFS_DEFRAG_RANGE_MASK for later sanity check Qu Wenruo
2022-02-03 16:58   ` Filipe Manana
2022-01-28  8:12 ` [PATCH 2/4] btrfs: defrag: introduce btrfs_defrag_ctrl structure for later usage Qu Wenruo
2022-02-03 17:06   ` Filipe Manana
2022-01-28  8:12 ` [PATCH 3/4] btrfs: defrag: use btrfs_defrag_ctrl to replace btrfs_ioctl_defrag_range_args for btrfs_defrag_file() Qu Wenruo
2022-02-03 17:17   ` Filipe Manana
2022-02-04  0:30     ` Qu Wenruo
2022-02-04  1:03       ` Qu Wenruo
2022-02-04 17:57       ` David Sterba
2022-02-04 18:00         ` David Sterba
2022-02-04 18:17           ` David Sterba
2022-02-08 17:00             ` David Sterba
2022-02-09  0:15               ` Qu Wenruo
2022-02-14 16:19                 ` David Sterba
2022-01-28  8:12 ` [PATCH 4/4] btrfs: defrag: allow defrag_one_cluster() to large extent which is not a target Qu Wenruo
2022-02-03 17:39   ` Filipe Manana
2022-02-04  0:39     ` Qu Wenruo
2022-02-04  7:08       ` Qu Wenruo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.