* [PATCH v2 0/5] Minor improvements and cleanups to ext4 mballoc
@ 2024-04-23 12:40 Kemeng Shi
  2024-04-23 12:40 ` [PATCH v2 1/5] ext4: keep "prefetch_grp" and "nr" consistent Kemeng Shi
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Kemeng Shi @ 2024-04-23 12:40 UTC (permalink / raw)
  To: tytso, adilger.kernel, linux-ext4, linux-kernel
  Cc: jack, ojaswin, ritesh.list

v1->v2:
-Collect Reviewed-by tags from Jan and Ojaswin
-Rewrite changelog of "ext4: call ext4_mb_mark_free_simple to free
continuous bits in found chunk"
-Remove "cr =" from the comments on criteria names as the names carry
a CR_ prefix now.
-Only open code the repeated check in next_linear_group

This series contains some minor improvements and cleanups to ext4
mballoc. No failures were found in the "kvm-xfstests smoke" run or in
the unit tests. More details can be found in the respective patches.
Thanks!

Kemeng Shi (5):
  ext4: keep "prefetch_grp" and "nr" consistent
  ext4: add test_mb_mark_used_cost to estimate cost of mb_mark_used
  ext4: call ext4_mb_mark_free_simple to free continuous bits in found
    chunk
  ext4: use correct criteria name instead stale integer number in
    comment
  ext4: open coding repeated check in next_linear_group

 fs/ext4/ext4.h         |  9 +++--
 fs/ext4/mballoc-test.c | 52 ++++++++++++++++++++++++++
 fs/ext4/mballoc.c      | 84 ++++++++++++++++++++++--------------------
 fs/ext4/mballoc.h      |  4 +-
 4 files changed, 105 insertions(+), 44 deletions(-)

-- 
2.30.0



* [PATCH v2 1/5] ext4: keep "prefetch_grp" and "nr" consistent
  2024-04-23 12:40 [PATCH v2 0/5] Minor improvements and cleanups to ext4 mballoc Kemeng Shi
@ 2024-04-23 12:40 ` Kemeng Shi
  2024-04-23 12:40 ` [PATCH v2 2/5] ext4: add test_mb_mark_used_cost to estimate cost of mb_mark_used Kemeng Shi
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Kemeng Shi @ 2024-04-23 12:40 UTC (permalink / raw)
  To: tytso, adilger.kernel, linux-ext4, linux-kernel
  Cc: jack, ojaswin, ritesh.list

Keep "prefetch_grp" and "nr" consistent to avoid to call
ext4_mb_prefetch_fini with non-prefetched groups.
When we step into next criteria, "prefetch_grp" is set to prefetch start
of new criteria while "nr" is number of the prefetched group in previous
criteria. If previous criteria and next criteria are both inexpensive
(< CR_GOAL_LEN_SLOW) and prefetch_ios reachs sbi->s_mb_prefetch_limit
in previous criteria, "prefetch_grp" and "nr" will be inconsistent and
may introduce unexpected cost to do ext4_mb_init_group for non-prefetched
groups.
Reset "nr" to 0 when we reset "prefetch_grp" to goal group to keep them
consistent.
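
As an illustration, a minimal userspace model of the bookkeeping
(made-up group numbers, not the kernel code): ext4_mb_prefetch_fini(sb,
group, nr) finalizes the "nr" groups preceding "group", so the two
values must always describe the same batch.

	#include <stdio.h>

	int main(void)
	{
		unsigned int prefetch_grp, nr;

		nr = 4;			/* criteria N: groups [12, 16) prefetched */
		prefetch_grp = 16;

		prefetch_grp = 8;	/* next criteria: restart from the goal group */
		nr = 0;			/* the fix: restart the batch count too */

		/* without "nr = 0", fini would cover groups [4, 8),
		 * which were never prefetched */
		printf("fini covers groups [%u, %u)\n",
		       prefetch_grp - nr, prefetch_grp);
		return 0;
	}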

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/mballoc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 12b3f196010b..a61fc52956b2 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2856,6 +2856,7 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 		group = ac->ac_g_ex.fe_group;
 		ac->ac_groups_linear_remaining = sbi->s_mb_max_linear_groups;
 		prefetch_grp = group;
+		nr = 0;
 
 		for (i = 0, new_cr = cr; i < ngroups; i++,
 		     ext4_mb_choose_next_group(ac, &new_cr, &group, ngroups)) {
-- 
2.30.0



* [PATCH v2 2/5] ext4: add test_mb_mark_used_cost to estimate cost of mb_mark_used
  2024-04-23 12:40 [PATCH v2 0/5] Minor improvements and cleanups to ext4 mballoc Kemeng Shi
  2024-04-23 12:40 ` [PATCH v2 1/5] ext4: keep "prefetch_grp" and "nr" consistent Kemeng Shi
@ 2024-04-23 12:40 ` Kemeng Shi
  2024-04-23 12:40 ` [PATCH v2 3/5] ext4: call ext4_mb_mark_free_simple to free continuous bits in found chunk Kemeng Shi
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Kemeng Shi @ 2024-04-23 12:40 UTC (permalink / raw)
  To: tytso, adilger.kernel, linux-ext4, linux-kernel
  Cc: jack, ojaswin, ritesh.list

Add test_mb_mark_used_cost to estimate the cost of mb_mark_used. The
test repeatedly marks randomly generated ranges in use with
mb_mark_used, frees them again, and reports the total jiffies spent in
the marking loops.
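
The reported figure is in raw jiffies, which depend on CONFIG_HZ. As an
aside (not part of this patch), the kunit_info() line in the diff below
could report milliseconds instead via the standard jiffies_to_msecs()
helper:

	kunit_info(test, "mb_mark_used cost: %u ms over %d rounds\n",
		   jiffies_to_msecs(all), COUNT_FOR_ESTIMATE);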

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc-test.c | 52 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
index 044ca5238f41..df482782d695 100644
--- a/fs/ext4/mballoc-test.c
+++ b/fs/ext4/mballoc-test.c
@@ -859,6 +859,56 @@ static void test_mb_free_blocks(struct kunit *test)
 	ext4_mb_unload_buddy(&e4b);
 }
 
+#define COUNT_FOR_ESTIMATE 100000
+static void test_mb_mark_used_cost(struct kunit *test)
+{
+	struct ext4_buddy e4b;
+	struct super_block *sb = (struct super_block *)test->priv;
+	struct ext4_free_extent ex;
+	int ret;
+	struct test_range ranges[TEST_RANGE_COUNT];
+	int i, j;
+	unsigned long start, end, all = 0;
+
+	/* buddy cache assumes that each page contains at least one block */
+	if (sb->s_blocksize > PAGE_SIZE)
+		kunit_skip(test, "blocksize exceeds pagesize");
+
+	ret = ext4_mb_load_buddy(sb, TEST_GOAL_GROUP, &e4b);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	ex.fe_group = TEST_GOAL_GROUP;
+	for (j = 0; j < COUNT_FOR_ESTIMATE; j++) {
+		mbt_generate_test_ranges(sb, ranges, TEST_RANGE_COUNT);
+		start = jiffies;
+		for (i = 0; i < TEST_RANGE_COUNT; i++) {
+			if (ranges[i].len == 0)
+				continue;
+
+			ex.fe_start = ranges[i].start;
+			ex.fe_len = ranges[i].len;
+			ext4_lock_group(sb, TEST_GOAL_GROUP);
+			mb_mark_used(&e4b, &ex);
+			ext4_unlock_group(sb, TEST_GOAL_GROUP);
+		}
+		end = jiffies;
+		all += (end - start);
+
+		for (i = 0; i < TEST_RANGE_COUNT; i++) {
+			if (ranges[i].len == 0)
+				continue;
+
+			ext4_lock_group(sb, TEST_GOAL_GROUP);
+			mb_free_blocks(NULL, &e4b, ranges[i].start,
+				       ranges[i].len);
+			ext4_unlock_group(sb, TEST_GOAL_GROUP);
+		}
+	}
+
+	kunit_info(test, "costed jiffies %lu\n", all);
+	ext4_mb_unload_buddy(&e4b);
+}
+
 static const struct mbt_ext4_block_layout mbt_test_layouts[] = {
 	{
 		.blocksize_bits = 10,
@@ -901,6 +951,8 @@ static struct kunit_case mbt_test_cases[] = {
 	KUNIT_CASE_PARAM(test_mb_mark_used, mbt_layouts_gen_params),
 	KUNIT_CASE_PARAM(test_mb_free_blocks, mbt_layouts_gen_params),
 	KUNIT_CASE_PARAM(test_mark_diskspace_used, mbt_layouts_gen_params),
+	KUNIT_CASE_PARAM_ATTR(test_mb_mark_used_cost, mbt_layouts_gen_params,
+			      { .speed = KUNIT_SPEED_SLOW }),
 	{}
 };
 
-- 
2.30.0



* [PATCH v2 3/5] ext4: call ext4_mb_mark_free_simple to free continuous bits in found chunk
  2024-04-23 12:40 [PATCH v2 0/5] Minor improvements and cleanups to ext4 mballoc Kemeng Shi
  2024-04-23 12:40 ` [PATCH v2 1/5] ext4: keep "prefetch_grp" and "nr" consistent Kemeng Shi
  2024-04-23 12:40 ` [PATCH v2 2/5] ext4: add test_mb_mark_used_cost to estimate cost of mb_mark_used Kemeng Shi
@ 2024-04-23 12:40 ` Kemeng Shi
  2024-04-23 12:40 ` [PATCH v2 4/5] ext4: use correct criteria name instead stale integer number in comment Kemeng Shi
  2024-04-23 12:40 ` [PATCH v2 5/5] ext4: open coding repeated check in next_linear_group Kemeng Shi
  4 siblings, 0 replies; 9+ messages in thread
From: Kemeng Shi @ 2024-04-23 12:40 UTC (permalink / raw)
  To: tytso, adilger.kernel, linux-ext4, linux-kernel
  Cc: jack, ojaswin, ritesh.list

In mb_mark_used, we find a free chunk and mark it in use. For a chunk
in the middle of the passed range, we can simply mark the whole chunk
in use. For a chunk at the end of the range, we may need to mark only
part of the chunk in use and keep the rest of the chunk free. To mark
part of a chunk in use, we first mark the whole chunk in use and then
mark a continuous range at the end of the chunk free again.
To mark that continuous range free, mb_mark_used currently repeats
"mb_find_buddy; mb_clear_bit; ..." several times, which can be done by
simply calling ext4_mb_mark_free_simple, which frees continuous bits in
a more efficient way.
Just call ext4_mb_mark_free_simple in mb_mark_used to reuse the
existing, efficient code for freeing the continuous blocks in the chunk
at the end of the passed range.
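
For a worked example of the ord_start/ord_end arithmetic in the diff
below (a standalone userspace model with made-up numbers):

	#include <stdio.h>

	int main(void)
	{
		/* mark [21, 24) used inside the free order-4 chunk [16, 32) */
		int start = 21, len = 3, ord = 4;
		int ord_start = (start >> ord) << ord;	/* 16 */
		int ord_end = ord_start + (1 << ord);	/* 32 */

		if (start > ord_start)		/* head stays free */
			printf("free head: [%d, %d)\n", ord_start, start);
		if (start + len < ord_end)	/* tail stays free */
			printf("free tail: [%d, %d)\n", start + len, ord_end);
		return 0;
	}

Each freed span previously required splitting one buddy level per loop
iteration; with this patch it is handed to ext4_mb_mark_free_simple in
a single call.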

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a61fc52956b2..5acf413808a2 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2040,13 +2040,12 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
 	int ord;
 	int mlen = 0;
 	int max = 0;
-	int cur;
 	int start = ex->fe_start;
 	int len = ex->fe_len;
 	unsigned ret = 0;
 	int len0 = len;
 	void *buddy;
-	bool split = false;
+	int ord_start, ord_end;
 
 	BUG_ON(start + len > (e4b->bd_sb->s_blocksize << 3));
 	BUG_ON(e4b->bd_group != ex->fe_group);
@@ -2071,16 +2070,12 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
 
 	/* let's maintain buddy itself */
 	while (len) {
-		if (!split)
-			ord = mb_find_order_for_block(e4b, start);
+		ord = mb_find_order_for_block(e4b, start);
 
 		if (((start >> ord) << ord) == start && len >= (1 << ord)) {
 			/* the whole chunk may be allocated at once! */
 			mlen = 1 << ord;
-			if (!split)
-				buddy = mb_find_buddy(e4b, ord, &max);
-			else
-				split = false;
+			buddy = mb_find_buddy(e4b, ord, &max);
 			BUG_ON((start >> ord) >= max);
 			mb_set_bit(start >> ord, buddy);
 			e4b->bd_info->bb_counters[ord]--;
@@ -2094,20 +2089,29 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
 		if (ret == 0)
 			ret = len | (ord << 16);
 
-		/* we have to split large buddy */
 		BUG_ON(ord <= 0);
 		buddy = mb_find_buddy(e4b, ord, &max);
 		mb_set_bit(start >> ord, buddy);
 		e4b->bd_info->bb_counters[ord]--;
 
-		ord--;
-		cur = (start >> ord) & ~1U;
-		buddy = mb_find_buddy(e4b, ord, &max);
-		mb_clear_bit(cur, buddy);
-		mb_clear_bit(cur + 1, buddy);
-		e4b->bd_info->bb_counters[ord]++;
-		e4b->bd_info->bb_counters[ord]++;
-		split = true;
+		ord_start = (start >> ord) << ord;
+		ord_end = ord_start + (1 << ord);
+		/* first chunk */
+		if (start > ord_start)
+			ext4_mb_mark_free_simple(e4b->bd_sb, e4b->bd_buddy,
+						 ord_start, start - ord_start,
+						 e4b->bd_info);
+
+		/* last chunk */
+		if (start + len < ord_end) {
+			ext4_mb_mark_free_simple(e4b->bd_sb, e4b->bd_buddy,
+						 start + len,
+						 ord_end - (start + len),
+						 e4b->bd_info);
+			break;
+		}
+		len = start + len - ord_end;
+		start = ord_end;
 	}
 	mb_set_largest_free_order(e4b->bd_sb, e4b->bd_info);
 
-- 
2.30.0



* [PATCH v2 4/5] ext4: use correct criteria name instead stale integer number in comment
  2024-04-23 12:40 [PATCH v2 0/5] Minor improvements and cleanups to ext4 mballoc Kemeng Shi
                   ` (2 preceding siblings ...)
  2024-04-23 12:40 ` [PATCH v2 3/5] ext4: call ext4_mb_mark_free_simple to free continuous bits in found chunk Kemeng Shi
@ 2024-04-23 12:40 ` Kemeng Shi
  2024-04-23 21:43   ` Jan Kara
  2024-04-23 12:40 ` [PATCH v2 5/5] ext4: open coding repeated check in next_linear_group Kemeng Shi
  4 siblings, 1 reply; 9+ messages in thread
From: Kemeng Shi @ 2024-04-23 12:40 UTC (permalink / raw)
  To: tytso, adilger.kernel, linux-ext4, linux-kernel
  Cc: jack, ojaswin, ritesh.list

Use the correct criteria name instead of the stale integer number in
comments.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
---
 fs/ext4/ext4.h    |  9 ++++++---
 fs/ext4/mballoc.c | 14 ++++++++------
 fs/ext4/mballoc.h |  4 ++--
 3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 023571f8dd1b..9bd3764d1121 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -213,11 +213,14 @@ enum criteria {
 #define EXT4_MB_USE_RESERVED		0x2000
 /* Do strict check for free blocks while retrying block allocation */
 #define EXT4_MB_STRICT_CHECK		0x4000
-/* Large fragment size list lookup succeeded at least once for cr = 0 */
+/* Large fragment size list lookup succeeded at least once for
+ * CR_POWER2_ALIGNED */
 #define EXT4_MB_CR_POWER2_ALIGNED_OPTIMIZED		0x8000
-/* Avg fragment size rb tree lookup succeeded at least once for cr = 1 */
+/* Avg fragment size rb tree lookup succeeded at least once for
+ * CR_GOAL_LEN_FAST */
 #define EXT4_MB_CR_GOAL_LEN_FAST_OPTIMIZED		0x00010000
-/* Avg fragment size rb tree lookup succeeded at least once for cr = 1.5 */
+/* Avg fragment size rb tree lookup succeeded at least once for
+ * CR_BEST_AVAIL_LEN */
 #define EXT4_MB_CR_BEST_AVAIL_LEN_OPTIMIZED		0x00020000
 
 struct ext4_allocation_request {
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5acf413808a2..71b2f9a18875 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1131,8 +1131,9 @@ static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac,
 		ext4_mb_choose_next_group_best_avail(ac, new_cr, group);
 	} else {
 		/*
-		 * TODO: For CR=2, we can arrange groups in an rb tree sorted by
-		 * bb_free. But until that happens, we should never come here.
+		 * TODO: For CR=CR_GOAL_LEN_SLOW, we can arrange groups in an
+		 * rb tree sorted by bb_free. But until that happens, we should
+		 * never come here.
 		 */
 		WARN_ON(1);
 	}
@@ -3445,10 +3446,11 @@ static int ext4_mb_init_backend(struct super_block *sb)
 	}
 	if (sbi->s_mb_prefetch > ext4_get_groups_count(sb))
 		sbi->s_mb_prefetch = ext4_get_groups_count(sb);
-	/* now many real IOs to prefetch within a single allocation at cr=0
-	 * given cr=0 is an CPU-related optimization we shouldn't try to
-	 * load too many groups, at some point we should start to use what
-	 * we've got in memory.
+	/*
+	 * now many real IOs to prefetch within a single allocation at
+	 * cr=CR_POWER2_ALIGNED. Given cr=CR_POWER2_ALIGNED is an CPU-related
+	 * optimization we shouldn't try to load too many groups, at some point
+	 * we should start to use what we've got in memory.
 	 * with an average random access time 5ms, it'd take a second to get
 	 * 200 groups (* N with flex_bg), so let's make this limit 4
 	 */
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index 56938532b4ce..042437d8860f 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -187,8 +187,8 @@ struct ext4_allocation_context {
 	struct ext4_free_extent ac_f_ex;
 
 	/*
-	 * goal len can change in CR1.5, so save the original len. This is
-	 * used while adjusting the PA window and for accounting.
+	 * goal len can change in CR_BEST_AVAIL_LEN, so save the original len.
+	 * This is used while adjusting the PA window and for accounting.
 	 */
 	ext4_grpblk_t	ac_orig_goal_len;
 
-- 
2.30.0



* [PATCH v2 5/5] ext4: open coding repeated check in next_linear_group
  2024-04-23 12:40 [PATCH v2 0/5] Minor improvements and cleanups to ext4 mballoc Kemeng Shi
                   ` (3 preceding siblings ...)
  2024-04-23 12:40 ` [PATCH v2 4/5] ext4: use correct criteria name instead stale integer number in comment Kemeng Shi
@ 2024-04-23 12:40 ` Kemeng Shi
  2024-04-23 21:46   ` Jan Kara
  4 siblings, 1 reply; 9+ messages in thread
From: Kemeng Shi @ 2024-04-23 12:40 UTC (permalink / raw)
  To: tytso, adilger.kernel, linux-ext4, linux-kernel
  Cc: jack, ojaswin, ritesh.list

Open code the repeated check in next_linear_group.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 71b2f9a18875..4afe5bb94bf4 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1076,23 +1076,11 @@ static inline int should_optimize_scan(struct ext4_allocation_context *ac)
 }
 
 /*
- * Return next linear group for allocation. If linear traversal should not be
- * performed, this function just returns the same group
+ * Return next linear group for allocation.
  */
 static ext4_group_t
-next_linear_group(struct ext4_allocation_context *ac, ext4_group_t group,
-		  ext4_group_t ngroups)
+next_linear_group(ext4_group_t group, ext4_group_t ngroups)
 {
-	if (!should_optimize_scan(ac))
-		goto inc_and_return;
-
-	if (ac->ac_groups_linear_remaining) {
-		ac->ac_groups_linear_remaining--;
-		goto inc_and_return;
-	}
-
-	return group;
-inc_and_return:
 	/*
 	 * Artificially restricted ngroups for non-extent
 	 * files makes group > ngroups possible on first loop.
@@ -1118,8 +1106,19 @@ static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac,
 {
 	*new_cr = ac->ac_criteria;
 
-	if (!should_optimize_scan(ac) || ac->ac_groups_linear_remaining) {
-		*group = next_linear_group(ac, *group, ngroups);
+	if (!should_optimize_scan(ac)) {
+		*group = next_linear_group(*group, ngroups);
+		return;
+	}
+
+	/*
+	 * Optimized scanning can return non adjacent groups which can cause
+	 * seek overhead for rotational disks. So try few linear groups before
+	 * trying optimized scan.
+	 */
+	if (ac->ac_groups_linear_remaining) {
+		*group = next_linear_group(*group, ngroups);
+		ac->ac_groups_linear_remaining--;
 		return;
 	}
 
-- 
2.30.0



* Re: [PATCH v2 4/5] ext4: use correct criteria name instead stale integer number in comment
  2024-04-23 12:40 ` [PATCH v2 4/5] ext4: use correct criteria name instead stale integer number in comment Kemeng Shi
@ 2024-04-23 21:43   ` Jan Kara
  2024-04-24  1:20     ` Kemeng Shi
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2024-04-23 21:43 UTC (permalink / raw)
  To: Kemeng Shi
  Cc: tytso, adilger.kernel, linux-ext4, linux-kernel, jack, ojaswin,
	ritesh.list

On Tue 23-04-24 20:40:45, Kemeng Shi wrote:
> Use the correct criteria name instead of the stale integer number in
> comments.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>

You have cleaned up the superfluous "CR=" bits in several places but still
left them in a couple more :). See below:

> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 5acf413808a2..71b2f9a18875 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1131,8 +1131,9 @@ static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac,
>  		ext4_mb_choose_next_group_best_avail(ac, new_cr, group);
>  	} else {
>  		/*
> -		 * TODO: For CR=2, we can arrange groups in an rb tree sorted by
> -		 * bb_free. But until that happens, we should never come here.
> +		 * TODO: For CR=CR_GOAL_LEN_SLOW, we can arrange groups in an
			     ^^^ Still you have left these superfluous
"CR=" bits here.
 
> +		 * rb tree sorted by bb_free. But until that happens, we should
> +		 * never come here.
>  		 */
>  		WARN_ON(1);
>  	}
> @@ -3445,10 +3446,11 @@ static int ext4_mb_init_backend(struct super_block *sb)
>  	}
>  	if (sbi->s_mb_prefetch > ext4_get_groups_count(sb))
>  		sbi->s_mb_prefetch = ext4_get_groups_count(sb);
> -	/* now many real IOs to prefetch within a single allocation at cr=0
> -	 * given cr=0 is an CPU-related optimization we shouldn't try to
> -	 * load too many groups, at some point we should start to use what
> -	 * we've got in memory.
> +	/*
> +	 * now many real IOs to prefetch within a single allocation at
> +	 * cr=CR_POWER2_ALIGNED. Given cr=CR_POWER2_ALIGNED is an CPU-related
           ^^^  and here               ^^^

> +	 * optimization we shouldn't try to load too many groups, at some point
> +	 * we should start to use what we've got in memory.
>  	 * with an average random access time 5ms, it'd take a second to get
>  	 * 200 groups (* N with flex_bg), so let's make this limit 4
>  	 */

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 5/5] ext4: open coding repeated check in next_linear_group
  2024-04-23 12:40 ` [PATCH v2 5/5] ext4: open coding repeated check in next_linear_group Kemeng Shi
@ 2024-04-23 21:46   ` Jan Kara
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2024-04-23 21:46 UTC (permalink / raw)
  To: Kemeng Shi
  Cc: tytso, adilger.kernel, linux-ext4, linux-kernel, jack, ojaswin,
	ritesh.list

On Tue 23-04-24 20:40:46, Kemeng Shi wrote:
> Open code the repeated check in next_linear_group.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/mballoc.c | 31 +++++++++++++++----------------
>  1 file changed, 15 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 71b2f9a18875..4afe5bb94bf4 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1076,23 +1076,11 @@ static inline int should_optimize_scan(struct ext4_allocation_context *ac)
>  }
>  
>  /*
> - * Return next linear group for allocation. If linear traversal should not be
> - * performed, this function just returns the same group
> + * Return next linear group for allocation.
>   */
>  static ext4_group_t
> -next_linear_group(struct ext4_allocation_context *ac, ext4_group_t group,
> -		  ext4_group_t ngroups)
> +next_linear_group(ext4_group_t group, ext4_group_t ngroups)
>  {
> -	if (!should_optimize_scan(ac))
> -		goto inc_and_return;
> -
> -	if (ac->ac_groups_linear_remaining) {
> -		ac->ac_groups_linear_remaining--;
> -		goto inc_and_return;
> -	}
> -
> -	return group;
> -inc_and_return:
>  	/*
>  	 * Artificially restricted ngroups for non-extent
>  	 * files makes group > ngroups possible on first loop.
> @@ -1118,8 +1106,19 @@ static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac,
>  {
>  	*new_cr = ac->ac_criteria;
>  
> -	if (!should_optimize_scan(ac) || ac->ac_groups_linear_remaining) {
> -		*group = next_linear_group(ac, *group, ngroups);
> +	if (!should_optimize_scan(ac)) {
> +		*group = next_linear_group(*group, ngroups);
> +		return;
> +	}
> +
> +	/*
> +	 * Optimized scanning can return non adjacent groups which can cause
> +	 * seek overhead for rotational disks. So try few linear groups before
> +	 * trying optimized scan.
> +	 */
> +	if (ac->ac_groups_linear_remaining) {
> +		*group = next_linear_group(*group, ngroups);
> +		ac->ac_groups_linear_remaining--;
>  		return;
>  	}
>  
> -- 
> 2.30.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 4/5] ext4: use correct criteria name instead stale integer number in comment
  2024-04-23 21:43   ` Jan Kara
@ 2024-04-24  1:20     ` Kemeng Shi
  0 siblings, 0 replies; 9+ messages in thread
From: Kemeng Shi @ 2024-04-24  1:20 UTC (permalink / raw)
  To: Jan Kara
  Cc: tytso, adilger.kernel, linux-ext4, linux-kernel, ojaswin, ritesh.list



on 4/24/2024 5:43 AM, Jan Kara wrote:
> On Tue 23-04-24 20:40:45, Kemeng Shi wrote:
>> Use the correct criteria name instead of the stale integer number in
>> comments.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> 
> You have cleaned up the superfluous "CR=" bits in several places but still
> left them in a couple more :). See below:
Sorry, it seems that I misunderstood what you meant in your last reply.
I will clean up all the remaining "CR=" leftovers in the next version.
Thanks for the feedback.

Kemeng
> 
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 5acf413808a2..71b2f9a18875 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -1131,8 +1131,9 @@ static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac,
>>  		ext4_mb_choose_next_group_best_avail(ac, new_cr, group);
>>  	} else {
>>  		/*
>> -		 * TODO: For CR=2, we can arrange groups in an rb tree sorted by
>> -		 * bb_free. But until that happens, we should never come here.
>> +		 * TODO: For CR=CR_GOAL_LEN_SLOW, we can arrange groups in an
> 			     ^^^ Still you have left these superfluous
> "CR=" bits here.
>  
>> +		 * rb tree sorted by bb_free. But until that happens, we should
>> +		 * never come here.
>>  		 */
>>  		WARN_ON(1);
>>  	}
>> @@ -3445,10 +3446,11 @@ static int ext4_mb_init_backend(struct super_block *sb)
>>  	}
>>  	if (sbi->s_mb_prefetch > ext4_get_groups_count(sb))
>>  		sbi->s_mb_prefetch = ext4_get_groups_count(sb);
>> -	/* now many real IOs to prefetch within a single allocation at cr=0
>> -	 * given cr=0 is an CPU-related optimization we shouldn't try to
>> -	 * load too many groups, at some point we should start to use what
>> -	 * we've got in memory.
>> +	/*
>> +	 * now many real IOs to prefetch within a single allocation at
>> +	 * cr=CR_POWER2_ALIGNED. Given cr=CR_POWER2_ALIGNED is an CPU-related
>            ^^^  and here               ^^^
> 
>> +	 * optimization we shouldn't try to load too many groups, at some point
>> +	 * we should start to use what we've got in memory.
>>  	 * with an average random access time 5ms, it'd take a second to get
>>  	 * 200 groups (* N with flex_bg), so let's make this limit 4
>>  	 */
> 
> 								Honza
> 


