linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 00/12] cleanups and unit test for mballoc
@ 2023-09-19 20:15 Kemeng Shi
  2023-09-19 20:15 ` [PATCH v7 01/12] ext4: make state in ext4_mb_mark_bb to be bool Kemeng Shi
                   ` (11 more replies)
  0 siblings, 12 replies; 27+ messages in thread
From: Kemeng Shi @ 2023-09-19 20:15 UTC (permalink / raw)
  To: tytso, adilger.kernel, ritesh.list; +Cc: ojaswin, linux-ext4, linux-kernel

v6->v7:
1. Remove struct ext4_mark_context and call ext4_mb_mark_context with all
needed arguments directly.
2. Add new patch to make state in ext4_mb_mark_bb bool and make state in
new-added function ext4_mb_mark_context bool.
3. Fix typos in git messages.
4. Keep on-disk bitmap marking code tight in patch separating on-disk
bitmap and buddy bitmap freeing.
5. Test 64k blocksize instead of 256k blocksize.
6. Remove RVB from Ojaswin in changed patches and collect updated RVB from
Ritesh.
7. As there is no much functional change to initial version, "kvm-xfstest smoke"
test and kunit test are ran and result is good.
8. Plan to add ext4_inode_block_valid in ext4_mb_mark_context in new
series!

v5->v6:
1. Separate block bitmap and buddy bitmap freeing in individual patch
and rewrite the descriptions.
2. Remove #ifdef around KUNIT_STATIC_STUB_REDIRECT which should be
only defined when CONFIG_KUNIT is enabled after fix [7] which was merged
into kunit-next/kunit
3. Use mbt prefix to distiguish test APIs from actual kernel APIs.
4. Add prepare function for ext4_mark_context and rename
ext4_mb_mark_group_bb to ext4_mb_mark_context
5. Support to run mballoc test with different layouts setting and
different block size is tested.

v4->v5:
1. WARN on free blocks to uninitialized group is removed as normal
fast commit route may triggers this, see [1] for details. The review
tag from Ojaswin of changed patch is also removed and a futher review
is needed.

v3->v4:
1. Collect Reviewed-by from Ojaswin
2. Do improve as Ojaswin kindly suggested: Fix typo in commit,
WARN if try to clear bit of uninitialized group and improve
refactoring of AGGRESSIVE_CHECK code.
3. Fix conflic on patch 16
4. Improve git log in patch 16,17

v2->v3:
1. Fix build warnings on "ext4: add some kunit stub for mballoc kunit
test" and "ext4: add first unit test for ext4_mb_new_blocks_simple in
mballoc"

This series is a new version of unmerged patches from [1]. The first 6
patches of this series factor out codes to mark blocks used or freed
which will update on disk block bitmap and group descriptor. Several
reasons to do this:
1. pair behavior of alloc/free bits. For example,
ext4_mb_new_blocks_simple will update free_clusters in struct flex_groups
in ext4_mb_mark_bb while ext4_free_blocks_simple forgets this.
2. remove repeat code to read from disk, update and write back to disk.
3. reduce future unit test mocks to avoid real IO to update structure
on disk.

The last 2 patches add a unit test for mballoc. Before more unit tests
are added, there are something should be discussed:
1. How to test static function in mballoc.c
Currently, include mballoc-test.c in mballoc.c to test static function
in mballoc.c from mballoc-test.c which is one way suggested in [2].
Not sure if there is any more elegant way to test static function without
touch mballoc.c.
2. How to add mocks to function in mballoc.c which may issue IO to disk
Currently, KUNIT_STATIC_STUB_REDIRECT is added to functions as suggested
in kunit document [3].
3. How to simulate a block bitmap.
Currently, a fake buffer_head with bitmap data is returned, then no
futher change is needed.
If we simulate a block bitmap with an array of data structure like:
struct test_bitmap {
       unsigned int	start;
       unsigned int	len;
}
which is suggested by Theodore in [4], then we need to add mocks to
function which expected bitmap from bitmap_bh->b_data, like
mb_find_next_bit, mb_find_next_zero_bit and maybe more.

Would like to hear any suggestion! Thanks!

I run kvm-xfstest with config "ext4/all" and "-g auto" together with
patchset for resize, you can see detail report in [6].

Unit test result is as followings:
# ./tools/testing/kunit/kunit.py run --kunitconfig=fs/ext4/.kunitconfig --raw_output
[02:56:14] Configuring KUnit Kernel ...
[02:56:14] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=88
[02:56:19] Starting KUnit Kernel (1/1)...
KTAP version 1
1..2
    KTAP version 1
    # Subtest: ext4_mballoc_test
    1..1
        KTAP version 1
        # Subtest: test_new_blocks_simple
        ok 1 block_bits=10 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
        ok 2 block_bits=12 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
        ok 3 block_bits=16 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
    # test_new_blocks_simple: pass:3 fail:0 skip:0 total:3
    ok 1 test_new_blocks_simple
# Totals: pass:3 fail:0 skip:0 total:3
ok 1 ext4_mballoc_test
    KTAP version 1
    # Subtest: ext4_inode_test
    1..1
        KTAP version 1
        # Subtest: inode_test_xtimestamp_decoding
        ok 1 1901-12-13 Lower bound of 32bit < 0 timestamp, no extra bits
        ok 2 1969-12-31 Upper bound of 32bit < 0 timestamp, no extra bits
        ok 3 1970-01-01 Lower bound of 32bit >=0 timestamp, no extra bits
        ok 4 2038-01-19 Upper bound of 32bit >=0 timestamp, no extra bits
        ok 5 2038-01-19 Lower bound of 32bit <0 timestamp, lo extra sec bit on
        ok 6 2106-02-07 Upper bound of 32bit <0 timestamp, lo extra sec bit on
        ok 7 2106-02-07 Lower bound of 32bit >=0 timestamp, lo extra sec bit on
        ok 8 2174-02-25 Upper bound of 32bit >=0 timestamp, lo extra sec bit on
        ok 9 2174-02-25 Lower bound of 32bit <0 timestamp, hi extra sec bit on
        ok 10 2242-03-16 Upper bound of 32bit <0 timestamp, hi extra sec bit on
        ok 11 2242-03-16 Lower bound of 32bit >=0 timestamp, hi extra sec bit on
        ok 12 2310-04-04 Upper bound of 32bit >=0 timestamp, hi extra sec bit on
        ok 13 2310-04-04 Upper bound of 32bit>=0 timestamp, hi extra sec bit 1. 1 ns
        ok 14 2378-04-22 Lower bound of 32bit>= timestamp. Extra sec bits 1. Max ns
        ok 15 2378-04-22 Lower bound of 32bit >=0 timestamp. All extra sec bits on
        ok 16 2446-05-10 Upper bound of 32bit >=0 timestamp. All extra sec bits on
    # inode_test_xtimestamp_decoding: pass:16 fail:0 skip:0 total:16
    ok 1 inode_test_xtimestamp_decoding
# Totals: pass:16 fail:0 skip:0 total:16
ok 2 ext4_inode_test
[02:56:19] Elapsed time: 5.173s total, 0.001s configuring, 5.055s building, 0.074s running

[1]
https://lore.kernel.org/linux-ext4/20230603150327.3596033-1-shikemeng@huaweicloud.com/T/#m5ff8e3a058ce1cb272dfef3262cd3202ce6e4358
[2]
https://lore.kernel.org/linux-ext4/ZC3MoWn2UO6p+Swp@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com/
[3]
https://docs.kernel.org/dev-tools/kunit/usage.html#testing-static-functions
[4]
https://docs.kernel.org/dev-tools/kunit/api/functionredirection.html#c.KUNIT_STATIC_STUB_REDIRECT
[5]
https://lore.kernel.org/linux-ext4/20230317155047.GB3270589@mit.edu/
[6]
https://lore.kernel.org/linux-ext4/20230629120044.1261968-1-shikemeng@huaweicloud.com/T/#mcc8fb0697fd54d9267c02c027e1eb3468026ae56
[7]
https://lore.kernel.org/lkml/20230725172051.2142641-1-shikemeng@huaweicloud.com/


Kemeng Shi (12):
  ext4: make state in ext4_mb_mark_bb to be bool
  ext4: factor out codes to update block bitmap and group descriptor on
    disk from ext4_mb_mark_bb
  ext4: call ext4_mb_mark_context in ext4_free_blocks_simple
  ext4: extend ext4_mb_mark_context to support allocation under journal
  ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used
  ext4: Separate block bitmap and buddy bitmap freeing in
    ext4_mb_clear_bb()
  ext4: call ext4_mb_mark_context in ext4_mb_clear_bb
  ext4: Separate block bitmap and buddy bitmap freeing in
    ext4_group_add_blocks()
  ext4: call ext4_mb_mark_context in ext4_group_add_blocks()
  ext4: add some kunit stub for mballoc kunit test
  ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc
  ext4: run mballoc test with different layouts setting

 fs/ext4/balloc.c       |  10 +
 fs/ext4/ext4.h         |   2 +-
 fs/ext4/fast_commit.c  |   8 +-
 fs/ext4/mballoc-test.c | 349 ++++++++++++++++++++++++++++
 fs/ext4/mballoc.c      | 512 +++++++++++++++--------------------------
 5 files changed, 550 insertions(+), 331 deletions(-)
 create mode 100644 fs/ext4/mballoc-test.c

-- 
2.30.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v7 01/12] ext4: make state in ext4_mb_mark_bb to be bool
  2023-09-19 20:15 [PATCH v7 00/12] cleanups and unit test for mballoc Kemeng Shi
@ 2023-09-19 20:15 ` Kemeng Shi
  2023-09-27  6:10   ` Ritesh Harjani
  2023-09-19 20:15 ` [PATCH v7 02/12] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb Kemeng Shi
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Kemeng Shi @ 2023-09-19 20:15 UTC (permalink / raw)
  To: tytso, adilger.kernel, ritesh.list; +Cc: ojaswin, linux-ext4, linux-kernel

As state could only be either 0 or 1, just make it bool.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/ext4.h        | 2 +-
 fs/ext4/fast_commit.c | 8 ++++----
 fs/ext4/mballoc.c     | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 84618c46f239..88186ea2322f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2918,7 +2918,7 @@ extern int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 extern int ext4_trim_fs(struct super_block *, struct fstrim_range *);
 extern void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid);
 extern void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
-		       int len, int state);
+			    int len, bool state);
 static inline bool ext4_mb_cr_expensive(enum criteria cr)
 {
 	return cr >= CR_GOAL_LEN_SLOW;
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index b06de728b3b6..87c009e0c59a 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -1806,7 +1806,7 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
 			 * at the end of the FC replay using our array of
 			 * modified inodes.
 			 */
-			ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0);
+			ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, false);
 			goto next;
 		}
 
@@ -1875,7 +1875,7 @@ ext4_fc_replay_del_range(struct super_block *sb,
 		if (ret > 0) {
 			remaining -= ret;
 			cur += ret;
-			ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0);
+			ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, false);
 		} else {
 			remaining -= map.m_len;
 			cur += map.m_len;
@@ -1934,12 +1934,12 @@ static void ext4_fc_set_bitmaps_and_counters(struct super_block *sb)
 				if (!IS_ERR(path)) {
 					for (j = 0; j < path->p_depth; j++)
 						ext4_mb_mark_bb(inode->i_sb,
-							path[j].p_block, 1, 1);
+							path[j].p_block, 1, true);
 					ext4_free_ext_path(path);
 				}
 				cur += ret;
 				ext4_mb_mark_bb(inode->i_sb, map.m_pblk,
-							map.m_len, 1);
+							map.m_len, true);
 			} else {
 				cur = cur + (map.m_len ? map.m_len : 1);
 			}
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 1e599305d85f..cf09adfbaf11 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4077,7 +4077,7 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
  * blocks in bitmaps and update counters.
  */
 void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
-			int len, int state)
+		     int len, bool state)
 {
 	struct buffer_head *bitmap_bh = NULL;
 	struct ext4_group_desc *gdp;
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 02/12] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb
  2023-09-19 20:15 [PATCH v7 00/12] cleanups and unit test for mballoc Kemeng Shi
  2023-09-19 20:15 ` [PATCH v7 01/12] ext4: make state in ext4_mb_mark_bb to be bool Kemeng Shi
@ 2023-09-19 20:15 ` Kemeng Shi
  2023-09-27  8:49   ` Ritesh Harjani
  2023-09-19 20:15 ` [PATCH v7 03/12] ext4: call ext4_mb_mark_context in ext4_free_blocks_simple Kemeng Shi
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Kemeng Shi @ 2023-09-19 20:15 UTC (permalink / raw)
  To: tytso, adilger.kernel, ritesh.list; +Cc: ojaswin, linux-ext4, linux-kernel

There are several reasons to add a general function ext4_mb_mark_context
to update block bitmap and group descriptor on disk:
1. pair behavior of alloc/free bits. For example,
ext4_mb_new_blocks_simple will update free_clusters in struct flex_groups
in ext4_mb_mark_bb while ext4_free_blocks_simple forgets this.
2. remove repeat code to read from disk, update and write back to disk.
3. reduce future unit test mocks to catch real IO to update structure
on disk.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 147 ++++++++++++++++++++++++----------------------
 1 file changed, 77 insertions(+), 70 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index cf09adfbaf11..e1320eea46e9 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3953,6 +3953,80 @@ void ext4_exit_mballoc(void)
 	ext4_groupinfo_destroy_slabs();
 }
 
+static int
+ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
+		     ext4_grpblk_t blkoff, ext4_grpblk_t len)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct buffer_head *bitmap_bh = NULL;
+	struct ext4_group_desc *gdp;
+	struct buffer_head *gdp_bh;
+	int err;
+	unsigned int i, already, changed;
+
+	bitmap_bh = ext4_read_block_bitmap(sb, group);
+	if (IS_ERR(bitmap_bh))
+		return PTR_ERR(bitmap_bh);
+
+	err = -EIO;
+	gdp = ext4_get_group_desc(sb, group, &gdp_bh);
+	if (!gdp)
+		goto out_err;
+
+	ext4_lock_group(sb, group);
+	if (ext4_has_group_desc_csum(sb) &&
+	    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
+		gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
+		ext4_free_group_clusters_set(sb, gdp,
+			ext4_free_clusters_after_init(sb, group, gdp));
+	}
+
+	already = 0;
+	for (i = 0; i < len; i++)
+		if (mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
+				state)
+			already++;
+	changed = len - already;
+
+	if (state) {
+		mb_set_bits(bitmap_bh->b_data, blkoff, len);
+		ext4_free_group_clusters_set(sb, gdp,
+			ext4_free_group_clusters(sb, gdp) - changed);
+	} else {
+		mb_clear_bits(bitmap_bh->b_data, blkoff, len);
+		ext4_free_group_clusters_set(sb, gdp,
+			ext4_free_group_clusters(sb, gdp) + changed);
+	}
+
+	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
+	ext4_group_desc_csum_set(sb, group, gdp);
+	ext4_unlock_group(sb, group);
+
+	if (sbi->s_log_groups_per_flex) {
+		ext4_group_t flex_group = ext4_flex_group(sbi, group);
+		struct flex_groups *fg = sbi_array_rcu_deref(sbi,
+					   s_flex_groups, flex_group);
+
+		if (state)
+			atomic64_sub(changed, &fg->free_clusters);
+		else
+			atomic64_add(changed, &fg->free_clusters);
+	}
+
+	err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
+	if (err)
+		goto out_err;
+	err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
+	if (err)
+		goto out_err;
+
+	sync_dirty_buffer(bitmap_bh);
+	sync_dirty_buffer(gdp_bh);
+
+out_err:
+	brelse(bitmap_bh);
+	return err;
+}
 
 /*
  * Check quota and mark chosen space (ac->ac_b_ex) non-free in bitmaps
@@ -4079,15 +4153,11 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
 		     int len, bool state)
 {
-	struct buffer_head *bitmap_bh = NULL;
-	struct ext4_group_desc *gdp;
-	struct buffer_head *gdp_bh;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	ext4_group_t group;
 	ext4_grpblk_t blkoff;
-	int i, err = 0;
-	int already;
-	unsigned int clen, clen_changed, thisgrp_len;
+	int err = 0;
+	unsigned int clen, thisgrp_len;
 
 	while (len > 0) {
 		ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
@@ -4108,80 +4178,17 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
 			ext4_error(sb, "Marking blocks in system zone - "
 				   "Block = %llu, len = %u",
 				   block, thisgrp_len);
-			bitmap_bh = NULL;
 			break;
 		}
 
-		bitmap_bh = ext4_read_block_bitmap(sb, group);
-		if (IS_ERR(bitmap_bh)) {
-			err = PTR_ERR(bitmap_bh);
-			bitmap_bh = NULL;
-			break;
-		}
-
-		err = -EIO;
-		gdp = ext4_get_group_desc(sb, group, &gdp_bh);
-		if (!gdp)
-			break;
-
-		ext4_lock_group(sb, group);
-		already = 0;
-		for (i = 0; i < clen; i++)
-			if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
-					 !state)
-				already++;
-
-		clen_changed = clen - already;
-		if (state)
-			mb_set_bits(bitmap_bh->b_data, blkoff, clen);
-		else
-			mb_clear_bits(bitmap_bh->b_data, blkoff, clen);
-		if (ext4_has_group_desc_csum(sb) &&
-		    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
-			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
-			ext4_free_group_clusters_set(sb, gdp,
-			     ext4_free_clusters_after_init(sb, group, gdp));
-		}
-		if (state)
-			clen = ext4_free_group_clusters(sb, gdp) - clen_changed;
-		else
-			clen = ext4_free_group_clusters(sb, gdp) + clen_changed;
-
-		ext4_free_group_clusters_set(sb, gdp, clen);
-		ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
-		ext4_group_desc_csum_set(sb, group, gdp);
-
-		ext4_unlock_group(sb, group);
-
-		if (sbi->s_log_groups_per_flex) {
-			ext4_group_t flex_group = ext4_flex_group(sbi, group);
-			struct flex_groups *fg = sbi_array_rcu_deref(sbi,
-						   s_flex_groups, flex_group);
-
-			if (state)
-				atomic64_sub(clen_changed, &fg->free_clusters);
-			else
-				atomic64_add(clen_changed, &fg->free_clusters);
-
-		}
-
-		err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
-		if (err)
-			break;
-		sync_dirty_buffer(bitmap_bh);
-		err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
-		sync_dirty_buffer(gdp_bh);
+		err = ext4_mb_mark_context(sb, state, group, blkoff, clen);
 		if (err)
 			break;
 
 		block += thisgrp_len;
 		len -= thisgrp_len;
-		brelse(bitmap_bh);
 		BUG_ON(len < 0);
 	}
-
-	if (err)
-		brelse(bitmap_bh);
 }
 
 /*
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 03/12] ext4: call ext4_mb_mark_context in ext4_free_blocks_simple
  2023-09-19 20:15 [PATCH v7 00/12] cleanups and unit test for mballoc Kemeng Shi
  2023-09-19 20:15 ` [PATCH v7 01/12] ext4: make state in ext4_mb_mark_bb to be bool Kemeng Shi
  2023-09-19 20:15 ` [PATCH v7 02/12] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb Kemeng Shi
@ 2023-09-19 20:15 ` Kemeng Shi
  2023-09-27  8:52   ` Ritesh Harjani
  2023-09-19 20:15 ` [PATCH v7 04/12] ext4: extend ext4_mb_mark_context to support allocation under journal Kemeng Shi
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Kemeng Shi @ 2023-09-19 20:15 UTC (permalink / raw)
  To: tytso, adilger.kernel, ritesh.list; +Cc: ojaswin, linux-ext4, linux-kernel

call ext4_mb_mark_context in ext4_free_blocks_simple to:
1. remove repeat code
2. pair update of free_clusters in ext4_mb_new_blocks_simple.
3. add missing ext4_lock_group/ext4_unlock_group protection.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 33 +--------------------------------
 1 file changed, 1 insertion(+), 32 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index e1320eea46e9..cd2fd5dbfcdd 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6393,43 +6393,12 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
 static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
 					unsigned long count)
 {
-	struct buffer_head *bitmap_bh;
 	struct super_block *sb = inode->i_sb;
-	struct ext4_group_desc *gdp;
-	struct buffer_head *gdp_bh;
 	ext4_group_t group;
 	ext4_grpblk_t blkoff;
-	int already_freed = 0, err, i;
 
 	ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
-	bitmap_bh = ext4_read_block_bitmap(sb, group);
-	if (IS_ERR(bitmap_bh)) {
-		pr_warn("Failed to read block bitmap\n");
-		return;
-	}
-	gdp = ext4_get_group_desc(sb, group, &gdp_bh);
-	if (!gdp)
-		goto err_out;
-
-	for (i = 0; i < count; i++) {
-		if (!mb_test_bit(blkoff + i, bitmap_bh->b_data))
-			already_freed++;
-	}
-	mb_clear_bits(bitmap_bh->b_data, blkoff, count);
-	err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
-	if (err)
-		goto err_out;
-	ext4_free_group_clusters_set(
-		sb, gdp, ext4_free_group_clusters(sb, gdp) +
-		count - already_freed);
-	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
-	ext4_group_desc_csum_set(sb, group, gdp);
-	ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
-	sync_dirty_buffer(bitmap_bh);
-	sync_dirty_buffer(gdp_bh);
-
-err_out:
-	brelse(bitmap_bh);
+	ext4_mb_mark_context(sb, false, group, blkoff, count);
 }
 
 /**
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 04/12] ext4: extend ext4_mb_mark_context to support allocation under journal
  2023-09-19 20:15 [PATCH v7 00/12] cleanups and unit test for mballoc Kemeng Shi
                   ` (2 preceding siblings ...)
  2023-09-19 20:15 ` [PATCH v7 03/12] ext4: call ext4_mb_mark_context in ext4_free_blocks_simple Kemeng Shi
@ 2023-09-19 20:15 ` Kemeng Shi
  2023-09-27  9:13   ` Ritesh Harjani
  2023-09-19 20:15 ` [PATCH v7 05/12] ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used Kemeng Shi
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Kemeng Shi @ 2023-09-19 20:15 UTC (permalink / raw)
  To: tytso, adilger.kernel, ritesh.list; +Cc: ojaswin, linux-ext4, linux-kernel

Previously, ext4_mb_mark_context is only called under fast commit
replay path, so there is no valid handle when we update block bitmap
and group descriptor. This patch try to extend ext4_mb_mark_context
to be used by code under journal. There are several improvement:
1. add "handle_t *handle" to struct ext4_mark_context to journal block
bitmap and group descriptor update inside ext4_mb_mark_context (the
added journal code is based on ext4_mb_mark_diskspace_used where
ext4_mb_mark_context is going to be used.)
2. add EXT4_MB_BITMAP_MARKED_CHECK flag to control check if bits in block
bitmap are already marked as allocation code under journal asserts that
all bits to be changed are not marked before.
3. add "ext4_grpblk_t changed" to struct ext4_mark_context to notify number
of bits in block bitmap has changed.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 64 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 49 insertions(+), 15 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index cd2fd5dbfcdd..bdf07b2904ca 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3953,26 +3953,47 @@ void ext4_exit_mballoc(void)
 	ext4_groupinfo_destroy_slabs();
 }
 
+#define EXT4_MB_BITMAP_MARKED_CHECK 0x0001
+#define EXT4_MB_SYNC_UPDATE 0x0002
 static int
-ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
-		     ext4_grpblk_t blkoff, ext4_grpblk_t len)
+ext4_mb_mark_context(handle_t *handle, struct super_block *sb, bool state,
+		     ext4_group_t group, ext4_grpblk_t blkoff,
+		     ext4_grpblk_t len, int flags, ext4_grpblk_t *ret_changed)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	struct buffer_head *bitmap_bh = NULL;
 	struct ext4_group_desc *gdp;
 	struct buffer_head *gdp_bh;
 	int err;
-	unsigned int i, already, changed;
+	unsigned int i, already, changed = len;
 
+	if (ret_changed)
+		*ret_changed = 0;
 	bitmap_bh = ext4_read_block_bitmap(sb, group);
 	if (IS_ERR(bitmap_bh))
 		return PTR_ERR(bitmap_bh);
 
+	if (handle) {
+		BUFFER_TRACE(bitmap_bh, "getting write access");
+		err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
+						    EXT4_JTR_NONE);
+		if (err)
+			goto out_err;
+	}
+
 	err = -EIO;
 	gdp = ext4_get_group_desc(sb, group, &gdp_bh);
 	if (!gdp)
 		goto out_err;
 
+	if (handle) {
+		BUFFER_TRACE(gdp_bh, "get_write_access");
+		err = ext4_journal_get_write_access(handle, sb, gdp_bh,
+						    EXT4_JTR_NONE);
+		if (err)
+			goto out_err;
+	}
+
 	ext4_lock_group(sb, group);
 	if (ext4_has_group_desc_csum(sb) &&
 	    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
@@ -3981,12 +4002,14 @@ ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
 			ext4_free_clusters_after_init(sb, group, gdp));
 	}
 
-	already = 0;
-	for (i = 0; i < len; i++)
-		if (mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
-				state)
-			already++;
-	changed = len - already;
+	if (flags & EXT4_MB_BITMAP_MARKED_CHECK) {
+		already = 0;
+		for (i = 0; i < len; i++)
+			if (mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
+					state)
+				already++;
+		changed = len - already;
+	}
 
 	if (state) {
 		mb_set_bits(bitmap_bh->b_data, blkoff, len);
@@ -4001,6 +4024,8 @@ ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
 	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
 	ext4_group_desc_csum_set(sb, group, gdp);
 	ext4_unlock_group(sb, group);
+	if (ret_changed)
+		*ret_changed = changed;
 
 	if (sbi->s_log_groups_per_flex) {
 		ext4_group_t flex_group = ext4_flex_group(sbi, group);
@@ -4013,15 +4038,17 @@ ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
 			atomic64_add(changed, &fg->free_clusters);
 	}
 
-	err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
+	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
 	if (err)
 		goto out_err;
-	err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
+	err = ext4_handle_dirty_metadata(handle, NULL, gdp_bh);
 	if (err)
 		goto out_err;
 
-	sync_dirty_buffer(bitmap_bh);
-	sync_dirty_buffer(gdp_bh);
+	if (flags & EXT4_MB_SYNC_UPDATE) {
+		sync_dirty_buffer(bitmap_bh);
+		sync_dirty_buffer(gdp_bh);
+	}
 
 out_err:
 	brelse(bitmap_bh);
@@ -4181,7 +4208,11 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
 			break;
 		}
 
-		err = ext4_mb_mark_context(sb, state, group, blkoff, clen);
+		err = ext4_mb_mark_context(NULL, sb, state,
+					   group, blkoff, clen,
+					   EXT4_MB_BITMAP_MARKED_CHECK |
+					   EXT4_MB_SYNC_UPDATE,
+					   NULL);
 		if (err)
 			break;
 
@@ -6398,7 +6429,10 @@ static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
 	ext4_grpblk_t blkoff;
 
 	ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
-	ext4_mb_mark_context(sb, false, group, blkoff, count);
+	ext4_mb_mark_context(NULL, sb, false, group, blkoff, count,
+			     EXT4_MB_BITMAP_MARKED_CHECK |
+			     EXT4_MB_SYNC_UPDATE,
+			     NULL);
 }
 
 /**
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 05/12] ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used
  2023-09-19 20:15 [PATCH v7 00/12] cleanups and unit test for mballoc Kemeng Shi
                   ` (3 preceding siblings ...)
  2023-09-19 20:15 ` [PATCH v7 04/12] ext4: extend ext4_mb_mark_context to support allocation under journal Kemeng Shi
@ 2023-09-19 20:15 ` Kemeng Shi
  2023-09-27  9:58   ` Ritesh Harjani
  2023-09-19 20:15 ` [PATCH v7 06/12] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb() Kemeng Shi
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Kemeng Shi @ 2023-09-19 20:15 UTC (permalink / raw)
  To: tytso, adilger.kernel, ritesh.list; +Cc: ojaswin, linux-ext4, linux-kernel

call ext4_mb_mark_context in ext4_mb_mark_diskspace_used to:
1. remove repeat code to normally update bitmap and group descriptor
on disk.
2. call ext4_mb_mark_context instead of only setting bits in block bitmap
to fix the bitmap. Function ext4_mb_mark_context will also update
checksum of bitmap and other counter along with the bit change to keep
the cosistent with bit change or block bitmap will be marked corrupted as
checksum of bitmap is in inconsistent state.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 86 +++++++++++------------------------------------
 1 file changed, 20 insertions(+), 66 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index bdf07b2904ca..87e5922997fb 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4063,13 +4063,13 @@ static noinline_for_stack int
 ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 				handle_t *handle, unsigned int reserv_clstrs)
 {
-	struct buffer_head *bitmap_bh = NULL;
 	struct ext4_group_desc *gdp;
-	struct buffer_head *gdp_bh;
 	struct ext4_sb_info *sbi;
 	struct super_block *sb;
 	ext4_fsblk_t block;
 	int err, len;
+	int flags = 0;
+	ext4_grpblk_t changed;
 
 	BUG_ON(ac->ac_status != AC_STATUS_FOUND);
 	BUG_ON(ac->ac_b_ex.fe_len <= 0);
@@ -4077,32 +4077,13 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 	sb = ac->ac_sb;
 	sbi = EXT4_SB(sb);
 
-	bitmap_bh = ext4_read_block_bitmap(sb, ac->ac_b_ex.fe_group);
-	if (IS_ERR(bitmap_bh)) {
-		return PTR_ERR(bitmap_bh);
-	}
-
-	BUFFER_TRACE(bitmap_bh, "getting write access");
-	err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
-					    EXT4_JTR_NONE);
-	if (err)
-		goto out_err;
-
-	err = -EIO;
-	gdp = ext4_get_group_desc(sb, ac->ac_b_ex.fe_group, &gdp_bh);
+	gdp = ext4_get_group_desc(sb, ac->ac_b_ex.fe_group, NULL);
 	if (!gdp)
-		goto out_err;
-
+		return -EIO;
 	ext4_debug("using block group %u(%d)\n", ac->ac_b_ex.fe_group,
 			ext4_free_group_clusters(sb, gdp));
 
-	BUFFER_TRACE(gdp_bh, "get_write_access");
-	err = ext4_journal_get_write_access(handle, sb, gdp_bh, EXT4_JTR_NONE);
-	if (err)
-		goto out_err;
-
 	block = ext4_grp_offs_to_block(sb, &ac->ac_b_ex);
-
 	len = EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
 	if (!ext4_inode_block_valid(ac->ac_inode, block, len)) {
 		ext4_error(sb, "Allocating blocks %llu-%llu which overlap "
@@ -4111,41 +4092,29 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 		 * Fix the bitmap and return EFSCORRUPTED
 		 * We leak some of the blocks here.
 		 */
-		ext4_lock_group(sb, ac->ac_b_ex.fe_group);
-		mb_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
-			      ac->ac_b_ex.fe_len);
-		ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
-		err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
+		err = ext4_mb_mark_context(handle, sb, true,
+					   ac->ac_b_ex.fe_group,
+					   ac->ac_b_ex.fe_start,
+					   ac->ac_b_ex.fe_len,
+					   0, NULL);
 		if (!err)
 			err = -EFSCORRUPTED;
-		goto out_err;
+		return err;
 	}
 
-	ext4_lock_group(sb, ac->ac_b_ex.fe_group);
 #ifdef AGGRESSIVE_CHECK
-	{
-		int i;
-		for (i = 0; i < ac->ac_b_ex.fe_len; i++) {
-			BUG_ON(mb_test_bit(ac->ac_b_ex.fe_start + i,
-						bitmap_bh->b_data));
-		}
-	}
+	flags |= EXT4_MB_BITMAP_MARKED_CHECK;
 #endif
-	mb_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
-		      ac->ac_b_ex.fe_len);
-	if (ext4_has_group_desc_csum(sb) &&
-	    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
-		gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
-		ext4_free_group_clusters_set(sb, gdp,
-					     ext4_free_clusters_after_init(sb,
-						ac->ac_b_ex.fe_group, gdp));
-	}
-	len = ext4_free_group_clusters(sb, gdp) - ac->ac_b_ex.fe_len;
-	ext4_free_group_clusters_set(sb, gdp, len);
-	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
-	ext4_group_desc_csum_set(sb, ac->ac_b_ex.fe_group, gdp);
+	err = ext4_mb_mark_context(handle, sb, true, ac->ac_b_ex.fe_group,
+				   ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len,
+				   flags, &changed);
+
+	if (err && changed == 0)
+		return err;
 
-	ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
+#ifdef AGGRESSIVE_CHECK
+	BUG_ON(changed != ac->ac_b_ex.fe_len);
+#endif
 	percpu_counter_sub(&sbi->s_freeclusters_counter, ac->ac_b_ex.fe_len);
 	/*
 	 * Now reduce the dirty block count also. Should not go negative
@@ -4155,21 +4124,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 		percpu_counter_sub(&sbi->s_dirtyclusters_counter,
 				   reserv_clstrs);
 
-	if (sbi->s_log_groups_per_flex) {
-		ext4_group_t flex_group = ext4_flex_group(sbi,
-							  ac->ac_b_ex.fe_group);
-		atomic64_sub(ac->ac_b_ex.fe_len,
-			     &sbi_array_rcu_deref(sbi, s_flex_groups,
-						  flex_group)->free_clusters);
-	}
-
-	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
-	if (err)
-		goto out_err;
-	err = ext4_handle_dirty_metadata(handle, NULL, gdp_bh);
-
-out_err:
-	brelse(bitmap_bh);
 	return err;
 }
 
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 06/12] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb()
  2023-09-19 20:15 [PATCH v7 00/12] cleanups and unit test for mballoc Kemeng Shi
                   ` (4 preceding siblings ...)
  2023-09-19 20:15 ` [PATCH v7 05/12] ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used Kemeng Shi
@ 2023-09-19 20:15 ` Kemeng Shi
  2023-09-27 11:06   ` Ritesh Harjani
  2023-09-19 20:15 ` [PATCH v7 07/12] ext4: call ext4_mb_mark_context in ext4_mb_clear_bb Kemeng Shi
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Kemeng Shi @ 2023-09-19 20:15 UTC (permalink / raw)
  To: tytso, adilger.kernel, ritesh.list; +Cc: ojaswin, linux-ext4, linux-kernel

This patch separates block bitmap and buddy bitmap freeing in order to
update block bitmap with ext4_mb_mark_context in following patch.

Separated freeing is safe with concurrent allocation as long as:
1. Firstly allocate block in buddy bitmap, and then in block bitmap.
2. Firstly free block in block bitmap, and then buddy bitmap.
Then freed block will only be available to allocation when both buddy
bitmap and block bitmap are updated by freeing.
Allocation obeys rule 1 already, just do sperated freeing with rule 2.

Separated freeing has no race with generate_buddy as:
Once ext4_mb_load_buddy_gfp is executed successfully, the update-to-date
buddy page can be found in sbi->s_buddy_cache and no more buddy
initialization of the buddy page will be executed concurrently until
buddy page is unloaded. As we always do free in "load buddy, free,
unload buddy" sequence, separated freeing has no race with generate_buddy.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 98 +++++++++++++++++++++++------------------------
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 87e5922997fb..8c1127aa7f59 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6423,7 +6423,7 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 		ext4_error(sb, "Freeing blocks in system zone - "
 			   "Block = %llu, count = %lu", block, count);
 		/* err = 0. ext4_std_error should be a no op */
-		goto error_return;
+		goto error_out;
 	}
 	flags |= EXT4_FREE_BLOCKS_VALIDATED;
 
@@ -6447,31 +6447,39 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 		flags &= ~EXT4_FREE_BLOCKS_VALIDATED;
 	}
 	count_clusters = EXT4_NUM_B2C(sbi, count);
+	trace_ext4_mballoc_free(sb, inode, block_group, bit, count_clusters);
+
+	/* __GFP_NOFAIL: retry infinitely, ignore TIF_MEMDIE and memcg limit. */
+	err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
+				     GFP_NOFS|__GFP_NOFAIL);
+	if (err)
+		goto error_out;
+
+	if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
+	    !ext4_inode_block_valid(inode, block, count)) {
+		ext4_error(sb, "Freeing blocks in system zone - "
+			   "Block = %llu, count = %lu", block, count);
+		/* err = 0. ext4_std_error should be a no op */
+		goto error_clean;
+	}
+
 	bitmap_bh = ext4_read_block_bitmap(sb, block_group);
 	if (IS_ERR(bitmap_bh)) {
 		err = PTR_ERR(bitmap_bh);
 		bitmap_bh = NULL;
-		goto error_return;
+		goto error_clean;
 	}
 	gdp = ext4_get_group_desc(sb, block_group, &gd_bh);
 	if (!gdp) {
 		err = -EIO;
-		goto error_return;
-	}
-
-	if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
-	    !ext4_inode_block_valid(inode, block, count)) {
-		ext4_error(sb, "Freeing blocks in system zone - "
-			   "Block = %llu, count = %lu", block, count);
-		/* err = 0. ext4_std_error should be a no op */
-		goto error_return;
+		goto error_clean;
 	}
 
 	BUFFER_TRACE(bitmap_bh, "getting write access");
 	err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
 					    EXT4_JTR_NONE);
 	if (err)
-		goto error_return;
+		goto error_clean;
 
 	/*
 	 * We are about to modify some metadata.  Call the journal APIs
@@ -6481,7 +6489,7 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 	BUFFER_TRACE(gd_bh, "get_write_access");
 	err = ext4_journal_get_write_access(handle, sb, gd_bh, EXT4_JTR_NONE);
 	if (err)
-		goto error_return;
+		goto error_clean;
 #ifdef AGGRESSIVE_CHECK
 	{
 		int i;
@@ -6489,13 +6497,30 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 			BUG_ON(!mb_test_bit(bit + i, bitmap_bh->b_data));
 	}
 #endif
-	trace_ext4_mballoc_free(sb, inode, block_group, bit, count_clusters);
+	ext4_lock_group(sb, block_group);
+	mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
+	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
+	ext4_free_group_clusters_set(sb, gdp, ret);
+	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
+	ext4_group_desc_csum_set(sb, block_group, gdp);
+	ext4_unlock_group(sb, block_group);
 
-	/* __GFP_NOFAIL: retry infinitely, ignore TIF_MEMDIE and memcg limit. */
-	err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
-				     GFP_NOFS|__GFP_NOFAIL);
-	if (err)
-		goto error_return;
+	if (sbi->s_log_groups_per_flex) {
+		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
+		atomic64_add(count_clusters,
+			     &sbi_array_rcu_deref(sbi, s_flex_groups,
+						  flex_group)->free_clusters);
+	}
+
+	/* We dirtied the bitmap block */
+	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
+	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
+
+	/* And the group descriptor block */
+	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
+	ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
+	if (!err)
+		err = ret;
 
 	/*
 	 * We need to make sure we don't reuse the freed block until after the
@@ -6519,13 +6544,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 		new_entry->efd_tid = handle->h_transaction->t_tid;
 
 		ext4_lock_group(sb, block_group);
-		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
 		ext4_mb_free_metadata(handle, &e4b, new_entry);
 	} else {
-		/* need to update group_info->bb_free and bitmap
-		 * with group lock held. generate_buddy look at
-		 * them with group lock_held
-		 */
 		if (test_opt(sb, DISCARD)) {
 			err = ext4_issue_discard(sb, block_group, bit,
 						 count_clusters, NULL);
@@ -6538,23 +6558,11 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 			EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
 
 		ext4_lock_group(sb, block_group);
-		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
 		mb_free_blocks(inode, &e4b, bit, count_clusters);
 	}
 
-	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
-	ext4_free_group_clusters_set(sb, gdp, ret);
-	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
-	ext4_group_desc_csum_set(sb, block_group, gdp);
 	ext4_unlock_group(sb, block_group);
 
-	if (sbi->s_log_groups_per_flex) {
-		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
-		atomic64_add(count_clusters,
-			     &sbi_array_rcu_deref(sbi, s_flex_groups,
-						  flex_group)->free_clusters);
-	}
-
 	/*
 	 * on a bigalloc file system, defer the s_freeclusters_counter
 	 * update to the caller (ext4_remove_space and friends) so they
@@ -6567,28 +6575,20 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 				   count_clusters);
 	}
 
-	ext4_mb_unload_buddy(&e4b);
-
-	/* We dirtied the bitmap block */
-	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
-	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
-
-	/* And the group descriptor block */
-	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
-	ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
-	if (!err)
-		err = ret;
-
 	if (overflow && !err) {
 		block += count;
 		count = overflow;
+		ext4_mb_unload_buddy(&e4b);
 		put_bh(bitmap_bh);
 		/* The range changed so it's no longer validated */
 		flags &= ~EXT4_FREE_BLOCKS_VALIDATED;
 		goto do_more;
 	}
-error_return:
+
+error_clean:
+	ext4_mb_unload_buddy(&e4b);
 	brelse(bitmap_bh);
+error_out:
 	ext4_std_error(sb, err);
 }
 
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 07/12] ext4: call ext4_mb_mark_context in ext4_mb_clear_bb
  2023-09-19 20:15 [PATCH v7 00/12] cleanups and unit test for mballoc Kemeng Shi
                   ` (5 preceding siblings ...)
  2023-09-19 20:15 ` [PATCH v7 06/12] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb() Kemeng Shi
@ 2023-09-19 20:15 ` Kemeng Shi
  2023-09-27 11:14   ` Ritesh Harjani
  2023-09-19 20:15 ` [PATCH v7 08/12] ext4: Separate block bitmap and buddy bitmap freeing in ext4_group_add_blocks() Kemeng Shi
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Kemeng Shi @ 2023-09-19 20:15 UTC (permalink / raw)
  To: tytso, adilger.kernel, ritesh.list; +Cc: ojaswin, linux-ext4, linux-kernel

Call ext4_mb_mark_context in ext4_mb_clear_bb to remove repeat code.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 69 +++++++----------------------------------------
 1 file changed, 10 insertions(+), 59 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 8c1127aa7f59..730397af6975 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6402,19 +6402,17 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 			       ext4_fsblk_t block, unsigned long count,
 			       int flags)
 {
-	struct buffer_head *bitmap_bh = NULL;
 	struct super_block *sb = inode->i_sb;
-	struct ext4_group_desc *gdp;
 	struct ext4_group_info *grp;
 	unsigned int overflow;
 	ext4_grpblk_t bit;
-	struct buffer_head *gd_bh;
 	ext4_group_t block_group;
 	struct ext4_sb_info *sbi;
 	struct ext4_buddy e4b;
 	unsigned int count_clusters;
 	int err = 0;
-	int ret;
+	int mark_flags = 0;
+	ext4_grpblk_t changed;
 
 	sbi = EXT4_SB(sb);
 
@@ -6463,64 +6461,19 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 		goto error_clean;
 	}
 
-	bitmap_bh = ext4_read_block_bitmap(sb, block_group);
-	if (IS_ERR(bitmap_bh)) {
-		err = PTR_ERR(bitmap_bh);
-		bitmap_bh = NULL;
-		goto error_clean;
-	}
-	gdp = ext4_get_group_desc(sb, block_group, &gd_bh);
-	if (!gdp) {
-		err = -EIO;
-		goto error_clean;
-	}
-
-	BUFFER_TRACE(bitmap_bh, "getting write access");
-	err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
-					    EXT4_JTR_NONE);
-	if (err)
-		goto error_clean;
-
-	/*
-	 * We are about to modify some metadata.  Call the journal APIs
-	 * to unshare ->b_data if a currently-committing transaction is
-	 * using it
-	 */
-	BUFFER_TRACE(gd_bh, "get_write_access");
-	err = ext4_journal_get_write_access(handle, sb, gd_bh, EXT4_JTR_NONE);
-	if (err)
-		goto error_clean;
 #ifdef AGGRESSIVE_CHECK
-	{
-		int i;
-		for (i = 0; i < count_clusters; i++)
-			BUG_ON(!mb_test_bit(bit + i, bitmap_bh->b_data));
-	}
+	mark_flags |= EXT4_MB_BITMAP_MARKED_CHECK;
 #endif
-	ext4_lock_group(sb, block_group);
-	mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
-	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
-	ext4_free_group_clusters_set(sb, gdp, ret);
-	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
-	ext4_group_desc_csum_set(sb, block_group, gdp);
-	ext4_unlock_group(sb, block_group);
+	err = ext4_mb_mark_context(handle, sb, false, block_group, bit,
+				   count_clusters, mark_flags, &changed);
 
-	if (sbi->s_log_groups_per_flex) {
-		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
-		atomic64_add(count_clusters,
-			     &sbi_array_rcu_deref(sbi, s_flex_groups,
-						  flex_group)->free_clusters);
-	}
 
-	/* We dirtied the bitmap block */
-	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
-	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
+	if (err && changed == 0)
+		goto error_clean;
 
-	/* And the group descriptor block */
-	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
-	ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
-	if (!err)
-		err = ret;
+#ifdef AGGRESSIVE_CHECK
+	BUG_ON(changed != count_clusters);
+#endif
 
 	/*
 	 * We need to make sure we don't reuse the freed block until after the
@@ -6579,7 +6532,6 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 		block += count;
 		count = overflow;
 		ext4_mb_unload_buddy(&e4b);
-		put_bh(bitmap_bh);
 		/* The range changed so it's no longer validated */
 		flags &= ~EXT4_FREE_BLOCKS_VALIDATED;
 		goto do_more;
@@ -6587,7 +6539,6 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 
 error_clean:
 	ext4_mb_unload_buddy(&e4b);
-	brelse(bitmap_bh);
 error_out:
 	ext4_std_error(sb, err);
 }
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 08/12] ext4: Separate block bitmap and buddy bitmap freeing in ext4_group_add_blocks()
  2023-09-19 20:15 [PATCH v7 00/12] cleanups and unit test for mballoc Kemeng Shi
                   ` (6 preceding siblings ...)
  2023-09-19 20:15 ` [PATCH v7 07/12] ext4: call ext4_mb_mark_context in ext4_mb_clear_bb Kemeng Shi
@ 2023-09-19 20:15 ` Kemeng Shi
  2023-09-27 11:16   ` Ritesh Harjani
  2023-09-19 20:15 ` [PATCH v7 09/12] ext4: call ext4_mb_mark_context " Kemeng Shi
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Kemeng Shi @ 2023-09-19 20:15 UTC (permalink / raw)
  To: tytso, adilger.kernel, ritesh.list; +Cc: ojaswin, linux-ext4, linux-kernel

This patch separates block bitmap and buddy bitmap freeing in order to
udpate block bitmap with ext4_mb_mark_context in following patch.
The reason why this can be sperated is explained in previous submit.
Put the explanation here to simplify the code archeology to
ext4_group_add_blocks():

Separated freeing is safe with concurrent allocation as long as:
1. Firstly allocate block in buddy bitmap, and then in block bitmap.
2. Firstly free block in block bitmap, and then buddy bitmap.
Then freed block will only be available to allocation when both buddy
bitmap and block bitmap are updated by freeing.
Allocation obeys rule 1 already, just do sperated freeing with rule 2.

Separated freeing has no race with generate_buddy as:
Once ext4_mb_load_buddy_gfp is executed successfully, the update-to-date
buddy page can be found in sbi->s_buddy_cache and no more buddy
initialization of the buddy page will be executed concurrently until
buddy page is unloaded. As we always do free in "load buddy, free,
unload buddy" sequence, separated freeing has no race with generate_buddy.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 54 +++++++++++++++++++++++------------------------
 1 file changed, 26 insertions(+), 28 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 730397af6975..404121445fc3 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6685,35 +6685,39 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 		ext4_warning(sb, "too many blocks added to group %u",
 			     block_group);
 		err = -EINVAL;
-		goto error_return;
+		goto error_out;
+	}
+
+	err = ext4_mb_load_buddy(sb, block_group, &e4b);
+	if (err)
+		goto error_out;
+
+	if (!ext4_sb_block_valid(sb, NULL, block, count)) {
+		ext4_error(sb, "Adding blocks in system zones - "
+			   "Block = %llu, count = %lu",
+			   block, count);
+		err = -EINVAL;
+		goto error_clean;
 	}
 
 	bitmap_bh = ext4_read_block_bitmap(sb, block_group);
 	if (IS_ERR(bitmap_bh)) {
 		err = PTR_ERR(bitmap_bh);
 		bitmap_bh = NULL;
-		goto error_return;
+		goto error_clean;
 	}
 
 	desc = ext4_get_group_desc(sb, block_group, &gd_bh);
 	if (!desc) {
 		err = -EIO;
-		goto error_return;
-	}
-
-	if (!ext4_sb_block_valid(sb, NULL, block, count)) {
-		ext4_error(sb, "Adding blocks in system zones - "
-			   "Block = %llu, count = %lu",
-			   block, count);
-		err = -EINVAL;
-		goto error_return;
+		goto error_clean;
 	}
 
 	BUFFER_TRACE(bitmap_bh, "getting write access");
 	err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
 					    EXT4_JTR_NONE);
 	if (err)
-		goto error_return;
+		goto error_clean;
 
 	/*
 	 * We are about to modify some metadata.  Call the journal APIs
@@ -6723,7 +6727,7 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 	BUFFER_TRACE(gd_bh, "get_write_access");
 	err = ext4_journal_get_write_access(handle, sb, gd_bh, EXT4_JTR_NONE);
 	if (err)
-		goto error_return;
+		goto error_clean;
 
 	for (i = 0, clusters_freed = 0; i < cluster_count; i++) {
 		BUFFER_TRACE(bitmap_bh, "clear bit");
@@ -6736,26 +6740,14 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 		}
 	}
 
-	err = ext4_mb_load_buddy(sb, block_group, &e4b);
-	if (err)
-		goto error_return;
-
-	/*
-	 * need to update group_info->bb_free and bitmap
-	 * with group lock held. generate_buddy look at
-	 * them with group lock_held
-	 */
 	ext4_lock_group(sb, block_group);
 	mb_clear_bits(bitmap_bh->b_data, bit, cluster_count);
-	mb_free_blocks(NULL, &e4b, bit, cluster_count);
 	free_clusters_count = clusters_freed +
 		ext4_free_group_clusters(sb, desc);
 	ext4_free_group_clusters_set(sb, desc, free_clusters_count);
 	ext4_block_bitmap_csum_set(sb, desc, bitmap_bh);
 	ext4_group_desc_csum_set(sb, block_group, desc);
 	ext4_unlock_group(sb, block_group);
-	percpu_counter_add(&sbi->s_freeclusters_counter,
-			   clusters_freed);
 
 	if (sbi->s_log_groups_per_flex) {
 		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
@@ -6764,8 +6756,6 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 						  flex_group)->free_clusters);
 	}
 
-	ext4_mb_unload_buddy(&e4b);
-
 	/* We dirtied the bitmap block */
 	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
 	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
@@ -6776,8 +6766,16 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 	if (!err)
 		err = ret;
 
-error_return:
+	ext4_lock_group(sb, block_group);
+	mb_free_blocks(NULL, &e4b, bit, cluster_count);
+	ext4_unlock_group(sb, block_group);
+	percpu_counter_add(&sbi->s_freeclusters_counter,
+			   clusters_freed);
+
+error_clean:
 	brelse(bitmap_bh);
+	ext4_mb_unload_buddy(&e4b);
+error_out:
 	ext4_std_error(sb, err);
 	return err;
 }
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 09/12] ext4: call ext4_mb_mark_context in ext4_group_add_blocks()
  2023-09-19 20:15 [PATCH v7 00/12] cleanups and unit test for mballoc Kemeng Shi
                   ` (7 preceding siblings ...)
  2023-09-19 20:15 ` [PATCH v7 08/12] ext4: Separate block bitmap and buddy bitmap freeing in ext4_group_add_blocks() Kemeng Shi
@ 2023-09-19 20:15 ` Kemeng Shi
  2023-09-27 11:19   ` Ritesh Harjani
  2023-09-19 20:15 ` [PATCH v7 10/12] ext4: add some kunit stub for mballoc kunit test Kemeng Shi
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Kemeng Shi @ 2023-09-19 20:15 UTC (permalink / raw)
  To: tytso, adilger.kernel, ritesh.list; +Cc: ojaswin, linux-ext4, linux-kernel

Call ext4_mb_mark_context in ext4_group_add_blocks() to remove repeat code.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 82 ++++++-----------------------------------------
 1 file changed, 10 insertions(+), 72 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 404121445fc3..db0b15098a80 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6657,23 +6657,19 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
 int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 			 ext4_fsblk_t block, unsigned long count)
 {
-	struct buffer_head *bitmap_bh = NULL;
-	struct buffer_head *gd_bh;
 	ext4_group_t block_group;
 	ext4_grpblk_t bit;
-	unsigned int i;
-	struct ext4_group_desc *desc;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	struct ext4_buddy e4b;
-	int err = 0, ret, free_clusters_count;
-	ext4_grpblk_t clusters_freed;
+	int err = 0;
 	ext4_fsblk_t first_cluster = EXT4_B2C(sbi, block);
 	ext4_fsblk_t last_cluster = EXT4_B2C(sbi, block + count - 1);
 	unsigned long cluster_count = last_cluster - first_cluster + 1;
+	ext4_grpblk_t changed;
 
 	ext4_debug("Adding block(s) %llu-%llu\n", block, block + count - 1);
 
-	if (count == 0)
+	if (cluster_count == 0)
 		return 0;
 
 	ext4_get_group_no_and_offset(sb, block, &block_group, &bit);
@@ -6700,80 +6696,22 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 		goto error_clean;
 	}
 
-	bitmap_bh = ext4_read_block_bitmap(sb, block_group);
-	if (IS_ERR(bitmap_bh)) {
-		err = PTR_ERR(bitmap_bh);
-		bitmap_bh = NULL;
-		goto error_clean;
-	}
-
-	desc = ext4_get_group_desc(sb, block_group, &gd_bh);
-	if (!desc) {
-		err = -EIO;
-		goto error_clean;
-	}
-
-	BUFFER_TRACE(bitmap_bh, "getting write access");
-	err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
-					    EXT4_JTR_NONE);
-	if (err)
-		goto error_clean;
-
-	/*
-	 * We are about to modify some metadata.  Call the journal APIs
-	 * to unshare ->b_data if a currently-committing transaction is
-	 * using it
-	 */
-	BUFFER_TRACE(gd_bh, "get_write_access");
-	err = ext4_journal_get_write_access(handle, sb, gd_bh, EXT4_JTR_NONE);
-	if (err)
+	err = ext4_mb_mark_context(handle, sb, false, block_group, bit,
+				   cluster_count, EXT4_MB_BITMAP_MARKED_CHECK,
+				   &changed);
+	if (err && changed == 0)
 		goto error_clean;
 
-	for (i = 0, clusters_freed = 0; i < cluster_count; i++) {
-		BUFFER_TRACE(bitmap_bh, "clear bit");
-		if (!mb_test_bit(bit + i, bitmap_bh->b_data)) {
-			ext4_error(sb, "bit already cleared for block %llu",
-				   (ext4_fsblk_t)(block + i));
-			BUFFER_TRACE(bitmap_bh, "bit already cleared");
-		} else {
-			clusters_freed++;
-		}
-	}
-
-	ext4_lock_group(sb, block_group);
-	mb_clear_bits(bitmap_bh->b_data, bit, cluster_count);
-	free_clusters_count = clusters_freed +
-		ext4_free_group_clusters(sb, desc);
-	ext4_free_group_clusters_set(sb, desc, free_clusters_count);
-	ext4_block_bitmap_csum_set(sb, desc, bitmap_bh);
-	ext4_group_desc_csum_set(sb, block_group, desc);
-	ext4_unlock_group(sb, block_group);
-
-	if (sbi->s_log_groups_per_flex) {
-		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
-		atomic64_add(clusters_freed,
-			     &sbi_array_rcu_deref(sbi, s_flex_groups,
-						  flex_group)->free_clusters);
-	}
-
-	/* We dirtied the bitmap block */
-	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
-	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
-
-	/* And the group descriptor block */
-	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
-	ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
-	if (!err)
-		err = ret;
+	if (changed != cluster_count)
+		ext4_error(sb, "bit already cleared in group %u", block_group);
 
 	ext4_lock_group(sb, block_group);
 	mb_free_blocks(NULL, &e4b, bit, cluster_count);
 	ext4_unlock_group(sb, block_group);
 	percpu_counter_add(&sbi->s_freeclusters_counter,
-			   clusters_freed);
+			   changed);
 
 error_clean:
-	brelse(bitmap_bh);
 	ext4_mb_unload_buddy(&e4b);
 error_out:
 	ext4_std_error(sb, err);
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 10/12] ext4: add some kunit stub for mballoc kunit test
  2023-09-19 20:15 [PATCH v7 00/12] cleanups and unit test for mballoc Kemeng Shi
                   ` (8 preceding siblings ...)
  2023-09-19 20:15 ` [PATCH v7 09/12] ext4: call ext4_mb_mark_context " Kemeng Shi
@ 2023-09-19 20:15 ` Kemeng Shi
  2023-09-19 20:15 ` [PATCH v7 11/12] ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc Kemeng Shi
  2023-09-19 20:15 ` [PATCH v7 12/12] ext4: run mballoc test with different layouts setting Kemeng Shi
  11 siblings, 0 replies; 27+ messages in thread
From: Kemeng Shi @ 2023-09-19 20:15 UTC (permalink / raw)
  To: tytso, adilger.kernel, ritesh.list; +Cc: ojaswin, linux-ext4, linux-kernel

Multiblocks allocation will read and write block bitmap and group
descriptor which reside on disk. Add kunit stub to function
ext4_get_group_desc, ext4_read_block_bitmap_nowait, ext4_wait_block_bitmap
and ext4_mb_mark_context to avoid real IO to disk.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/ext4/balloc.c  | 10 ++++++++++
 fs/ext4/mballoc.c |  5 +++++
 2 files changed, 15 insertions(+)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 79b20d6ae39e..b0bf255b159f 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -22,6 +22,7 @@
 #include "mballoc.h"
 
 #include <trace/events/ext4.h>
+#include <kunit/static_stub.h>
 
 static unsigned ext4_num_base_meta_clusters(struct super_block *sb,
 					    ext4_group_t block_group);
@@ -274,6 +275,9 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block *sb,
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	struct buffer_head *bh_p;
 
+	KUNIT_STATIC_STUB_REDIRECT(ext4_get_group_desc,
+				   sb, block_group, bh);
+
 	if (block_group >= ngroups) {
 		ext4_error(sb, "block_group >= groups_count - block_group = %u,"
 			   " groups_count = %u", block_group, ngroups);
@@ -468,6 +472,9 @@ ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group,
 	ext4_fsblk_t bitmap_blk;
 	int err;
 
+	KUNIT_STATIC_STUB_REDIRECT(ext4_read_block_bitmap_nowait,
+				   sb, block_group, ignore_locked);
+
 	desc = ext4_get_group_desc(sb, block_group, NULL);
 	if (!desc)
 		return ERR_PTR(-EFSCORRUPTED);
@@ -563,6 +570,9 @@ int ext4_wait_block_bitmap(struct super_block *sb, ext4_group_t block_group,
 {
 	struct ext4_group_desc *desc;
 
+	KUNIT_STATIC_STUB_REDIRECT(ext4_wait_block_bitmap,
+				   sb, block_group, bh);
+
 	if (!buffer_new(bh))
 		return 0;
 	desc = ext4_get_group_desc(sb, block_group, NULL);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index db0b15098a80..70dc446996a0 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -18,6 +18,7 @@
 #include <linux/backing-dev.h>
 #include <linux/freezer.h>
 #include <trace/events/ext4.h>
+#include <kunit/static_stub.h>
 
 /*
  * MUSTDO:
@@ -3967,6 +3968,10 @@ ext4_mb_mark_context(handle_t *handle, struct super_block *sb, bool state,
 	int err;
 	unsigned int i, already, changed = len;
 
+	KUNIT_STATIC_STUB_REDIRECT(ext4_mb_mark_context,
+				   handle, sb, state, group, blkoff, len,
+				   flags, ret_changed);
+
 	if (ret_changed)
 		*ret_changed = 0;
 	bitmap_bh = ext4_read_block_bitmap(sb, group);
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 11/12] ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc
  2023-09-19 20:15 [PATCH v7 00/12] cleanups and unit test for mballoc Kemeng Shi
                   ` (9 preceding siblings ...)
  2023-09-19 20:15 ` [PATCH v7 10/12] ext4: add some kunit stub for mballoc kunit test Kemeng Shi
@ 2023-09-19 20:15 ` Kemeng Shi
  2023-09-27 11:39   ` Ritesh Harjani
  2023-09-19 20:15 ` [PATCH v7 12/12] ext4: run mballoc test with different layouts setting Kemeng Shi
  11 siblings, 1 reply; 27+ messages in thread
From: Kemeng Shi @ 2023-09-19 20:15 UTC (permalink / raw)
  To: tytso, adilger.kernel, ritesh.list; +Cc: ojaswin, linux-ext4, linux-kernel

Here are prepared work:
1. Include mballoc-test.c to mballoc.c to be able test static function
in mballoc.c.
2. Implement static stub to avoid read IO to disk.
3. Construct fake super_block. Only partial members are set, more members
will be set when more functions are tested.
Then unit test for ext4_mb_new_blocks_simple is added.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc-test.c | 325 +++++++++++++++++++++++++++++++++++++++++
 fs/ext4/mballoc.c      |   4 +
 2 files changed, 329 insertions(+)
 create mode 100644 fs/ext4/mballoc-test.c

diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
new file mode 100644
index 000000000000..120c4944d2e1
--- /dev/null
+++ b/fs/ext4/mballoc-test.c
@@ -0,0 +1,325 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KUnit test of ext4 multiblocks allocation.
+ */
+
+#include <kunit/test.h>
+#include <kunit/static_stub.h>
+
+#include "ext4.h"
+
+struct mbt_grp_ctx {
+	struct buffer_head bitmap_bh;
+	/* desc and gd_bh are just the place holders for now */
+	struct ext4_group_desc desc;
+	struct buffer_head gd_bh;
+};
+
+struct mbt_ctx {
+	struct mbt_grp_ctx *grp_ctx;
+};
+
+struct mbt_ext4_super_block {
+	struct super_block sb;
+	struct mbt_ctx mbt_ctx;
+};
+
+#define MBT_CTX(_sb) (&(container_of((_sb), struct mbt_ext4_super_block, sb)->mbt_ctx))
+#define MBT_GRP_CTX(_sb, _group) (&MBT_CTX(_sb)->grp_ctx[_group])
+
+static struct super_block *mbt_ext4_alloc_super_block(void)
+{
+	struct ext4_super_block *es = kzalloc(sizeof(*es), GFP_KERNEL);
+	struct ext4_sb_info *sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
+	struct mbt_ext4_super_block *fsb = kzalloc(sizeof(*fsb), GFP_KERNEL);
+
+	if (fsb == NULL || sbi == NULL || es == NULL)
+		goto out;
+
+	sbi->s_es = es;
+	fsb->sb.s_fs_info = sbi;
+	return &fsb->sb;
+
+out:
+	kfree(fsb);
+	kfree(sbi);
+	kfree(es);
+	return NULL;
+}
+
+static void mbt_ext4_free_super_block(struct super_block *sb)
+{
+	struct mbt_ext4_super_block *fsb =
+		container_of(sb, struct mbt_ext4_super_block, sb);
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+	kfree(sbi->s_es);
+	kfree(sbi);
+	kfree(fsb);
+}
+
+struct mbt_ext4_block_layout {
+	unsigned char blocksize_bits;
+	unsigned int cluster_bits;
+	uint32_t blocks_per_group;
+	ext4_group_t group_count;
+	uint16_t desc_size;
+};
+
+static void mbt_init_sb_layout(struct super_block *sb,
+			       struct mbt_ext4_block_layout *layout)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_super_block *es = sbi->s_es;
+
+	sb->s_blocksize = 1UL << layout->blocksize_bits;
+	sb->s_blocksize_bits = layout->blocksize_bits;
+
+	sbi->s_groups_count = layout->group_count;
+	sbi->s_blocks_per_group = layout->blocks_per_group;
+	sbi->s_cluster_bits = layout->cluster_bits;
+	sbi->s_cluster_ratio = 1U << layout->cluster_bits;
+	sbi->s_clusters_per_group = layout->blocks_per_group >>
+				    layout->cluster_bits;
+	sbi->s_desc_size = layout->desc_size;
+
+	es->s_first_data_block = cpu_to_le32(0);
+	es->s_blocks_count_lo = cpu_to_le32(layout->blocks_per_group *
+					    layout->group_count);
+}
+
+static int mbt_grp_ctx_init(struct super_block *sb,
+			    struct mbt_grp_ctx *grp_ctx)
+{
+	grp_ctx->bitmap_bh.b_data = kzalloc(EXT4_BLOCK_SIZE(sb), GFP_KERNEL);
+	if (grp_ctx->bitmap_bh.b_data == NULL)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void mbt_grp_ctx_release(struct mbt_grp_ctx *grp_ctx)
+{
+	kfree(grp_ctx->bitmap_bh.b_data);
+	grp_ctx->bitmap_bh.b_data = NULL;
+}
+
+static void mbt_ctx_mark_used(struct super_block *sb, ext4_group_t group,
+			      unsigned int start, unsigned int len)
+{
+	struct mbt_grp_ctx *grp_ctx = MBT_GRP_CTX(sb, group);
+
+	mb_set_bits(grp_ctx->bitmap_bh.b_data, start, len);
+}
+
+/* called after mbt_init_sb_layout */
+static int mbt_ctx_init(struct super_block *sb)
+{
+	struct mbt_ctx *ctx = MBT_CTX(sb);
+	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
+
+	ctx->grp_ctx = kcalloc(ngroups, sizeof(struct mbt_grp_ctx),
+			       GFP_KERNEL);
+	if (ctx->grp_ctx == NULL)
+		return -ENOMEM;
+
+	for (i = 0; i < ngroups; i++)
+		if (mbt_grp_ctx_init(sb, &ctx->grp_ctx[i]))
+			goto out;
+
+	/*
+	 * first data block(first cluster in first group) is used by
+	 * metadata, mark it used to avoid to alloc data block at first
+	 * block which will fail ext4_sb_block_valid check.
+	 */
+	mb_set_bits(ctx->grp_ctx[0].bitmap_bh.b_data, 0, 1);
+
+	return 0;
+out:
+	while (i-- > 0)
+		mbt_grp_ctx_release(&ctx->grp_ctx[i]);
+	kfree(ctx->grp_ctx);
+	return -ENOMEM;
+}
+
+static void mbt_ctx_release(struct super_block *sb)
+{
+	struct mbt_ctx *ctx = MBT_CTX(sb);
+	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
+
+	for (i = 0; i < ngroups; i++)
+		mbt_grp_ctx_release(&ctx->grp_ctx[i]);
+	kfree(ctx->grp_ctx);
+}
+
+static struct buffer_head *
+ext4_read_block_bitmap_nowait_stub(struct super_block *sb, ext4_group_t block_group,
+				   bool ignore_locked)
+{
+	struct mbt_grp_ctx *grp_ctx = MBT_GRP_CTX(sb, block_group);
+
+	/* paired with brelse from caller of ext4_read_block_bitmap_nowait */
+	get_bh(&grp_ctx->bitmap_bh);
+	return &grp_ctx->bitmap_bh;
+}
+
+static int ext4_wait_block_bitmap_stub(struct super_block *sb,
+				       ext4_group_t block_group,
+				       struct buffer_head *bh)
+{
+	return 0;
+}
+
+static struct ext4_group_desc *
+ext4_get_group_desc_stub(struct super_block *sb, ext4_group_t block_group,
+			 struct buffer_head **bh)
+{
+	struct mbt_grp_ctx *grp_ctx = MBT_GRP_CTX(sb, block_group);
+
+	if (bh != NULL)
+		*bh = &grp_ctx->gd_bh;
+
+	return &grp_ctx->desc;
+}
+
+static int
+ext4_mb_mark_context_stub(handle_t *handle, struct super_block *sb, bool state,
+			  ext4_group_t group, ext4_grpblk_t blkoff,
+			  ext4_grpblk_t len, int flags,
+			  ext4_grpblk_t *ret_changed)
+{
+	struct mbt_grp_ctx *grp_ctx = MBT_GRP_CTX(sb, group);
+	struct buffer_head *bitmap_bh = &grp_ctx->bitmap_bh;
+
+	if (state)
+		mb_set_bits(bitmap_bh->b_data, blkoff, len);
+	else
+		mb_clear_bits(bitmap_bh->b_data, blkoff, len);
+
+	return 0;
+}
+
+#define TEST_BLOCKSIZE_BITS 10
+#define TEST_CLUSTER_BITS 3
+#define TEST_BLOCKS_PER_GROUP 8192
+#define TEST_GROUP_COUNT 4
+#define TEST_DESC_SIZE 64
+#define TEST_GOAL_GROUP 1
+static int mbt_kunit_init(struct kunit *test)
+{
+	struct mbt_ext4_block_layout layout = {
+		.blocksize_bits = TEST_BLOCKSIZE_BITS,
+		.cluster_bits = TEST_CLUSTER_BITS,
+		.blocks_per_group = TEST_BLOCKS_PER_GROUP,
+		.group_count = TEST_GROUP_COUNT,
+		.desc_size = TEST_DESC_SIZE,
+	};
+	struct super_block *sb;
+	int ret;
+
+	sb = mbt_ext4_alloc_super_block();
+	if (sb == NULL)
+		return -ENOMEM;
+
+	mbt_init_sb_layout(sb, &layout);
+
+	ret = mbt_ctx_init(sb);
+	if (ret != 0) {
+		mbt_ext4_free_super_block(sb);
+		return ret;
+	}
+
+	test->priv = sb;
+	kunit_activate_static_stub(test,
+				   ext4_read_block_bitmap_nowait,
+				   ext4_read_block_bitmap_nowait_stub);
+	kunit_activate_static_stub(test,
+				   ext4_wait_block_bitmap,
+				   ext4_wait_block_bitmap_stub);
+	kunit_activate_static_stub(test,
+				   ext4_get_group_desc,
+				   ext4_get_group_desc_stub);
+	kunit_activate_static_stub(test,
+				   ext4_mb_mark_context,
+				   ext4_mb_mark_context_stub);
+	return 0;
+}
+
+static void mbt_kunit_exit(struct kunit *test)
+{
+	struct super_block *sb = (struct super_block *)test->priv;
+
+	mbt_ctx_release(sb);
+	mbt_ext4_free_super_block(sb);
+}
+
+static void test_new_blocks_simple(struct kunit *test)
+{
+	struct super_block *sb = (struct super_block *)test->priv;
+	struct inode inode = { .i_sb = sb, };
+	struct ext4_allocation_request ar;
+	ext4_group_t i, goal_group = TEST_GOAL_GROUP;
+	int err = 0;
+	ext4_fsblk_t found;
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+	ar.inode = &inode;
+
+	/* get block at goal */
+	ar.goal = ext4_group_first_block_no(sb, goal_group);
+	found = ext4_mb_new_blocks_simple(&ar, &err);
+	KUNIT_ASSERT_EQ_MSG(test, ar.goal, found,
+		"failed to alloc block at goal, expected %llu found %llu",
+		ar.goal, found);
+
+	/* get block after goal in goal group */
+	ar.goal = ext4_group_first_block_no(sb, goal_group);
+	found = ext4_mb_new_blocks_simple(&ar, &err);
+	KUNIT_ASSERT_EQ_MSG(test, ar.goal + EXT4_C2B(sbi, 1), found,
+		"failed to alloc block after goal in goal group, expected %llu found %llu",
+		ar.goal + 1, found);
+
+	/* get block after goal group */
+	mbt_ctx_mark_used(sb, goal_group, 0, EXT4_CLUSTERS_PER_GROUP(sb));
+	ar.goal = ext4_group_first_block_no(sb, goal_group);
+	found = ext4_mb_new_blocks_simple(&ar, &err);
+	KUNIT_ASSERT_EQ_MSG(test,
+		ext4_group_first_block_no(sb, goal_group + 1), found,
+		"failed to alloc block after goal group, expected %llu found %llu",
+		ext4_group_first_block_no(sb, goal_group + 1), found);
+
+	/* get block before goal group */
+	for (i = goal_group; i < ext4_get_groups_count(sb); i++)
+		mbt_ctx_mark_used(sb, i, 0, EXT4_CLUSTERS_PER_GROUP(sb));
+	ar.goal = ext4_group_first_block_no(sb, goal_group);
+	found = ext4_mb_new_blocks_simple(&ar, &err);
+	KUNIT_ASSERT_EQ_MSG(test,
+		ext4_group_first_block_no(sb, 0) + EXT4_C2B(sbi, 1), found,
+		"failed to alloc block before goal group, expected %llu found %llu",
+		ext4_group_first_block_no(sb, 0 + EXT4_C2B(sbi, 1)), found);
+
+	/* no block available, fail to allocate block */
+	for (i = 0; i < ext4_get_groups_count(sb); i++)
+		mbt_ctx_mark_used(sb, i, 0, EXT4_CLUSTERS_PER_GROUP(sb));
+	ar.goal = ext4_group_first_block_no(sb, goal_group);
+	found = ext4_mb_new_blocks_simple(&ar, &err);
+	KUNIT_ASSERT_NE_MSG(test, err, 0,
+		"unexpectedly get block when no block is available");
+}
+
+
+static struct kunit_case mbt_test_cases[] = {
+	KUNIT_CASE(test_new_blocks_simple),
+	{}
+};
+
+static struct kunit_suite mbt_test_suite = {
+	.name = "ext4_mballoc_test",
+	.init = mbt_kunit_init,
+	.exit = mbt_kunit_exit,
+	.test_cases = mbt_test_cases,
+};
+
+kunit_test_suites(&mbt_test_suite);
+
+MODULE_LICENSE("GPL");
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 70dc446996a0..95125711468e 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -7026,3 +7026,7 @@ ext4_mballoc_query_range(
 
 	return error;
 }
+
+#ifdef CONFIG_EXT4_KUNIT_TESTS
+#include "mballoc-test.c"
+#endif
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v7 12/12] ext4: run mballoc test with different layouts setting
  2023-09-19 20:15 [PATCH v7 00/12] cleanups and unit test for mballoc Kemeng Shi
                   ` (10 preceding siblings ...)
  2023-09-19 20:15 ` [PATCH v7 11/12] ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc Kemeng Shi
@ 2023-09-19 20:15 ` Kemeng Shi
  11 siblings, 0 replies; 27+ messages in thread
From: Kemeng Shi @ 2023-09-19 20:15 UTC (permalink / raw)
  To: tytso, adilger.kernel, ritesh.list; +Cc: ojaswin, linux-ext4, linux-kernel

Use KUNIT_CASE_PARAM to run mballoc test with different layouts setting.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/ext4/mballoc-test.c | 52 ++++++++++++++++++++++++++++++------------
 1 file changed, 38 insertions(+), 14 deletions(-)

diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
index 120c4944d2e1..f94901fd3835 100644
--- a/fs/ext4/mballoc-test.c
+++ b/fs/ext4/mballoc-test.c
@@ -199,21 +199,11 @@ ext4_mb_mark_context_stub(handle_t *handle, struct super_block *sb, bool state,
 	return 0;
 }
 
-#define TEST_BLOCKSIZE_BITS 10
-#define TEST_CLUSTER_BITS 3
-#define TEST_BLOCKS_PER_GROUP 8192
-#define TEST_GROUP_COUNT 4
-#define TEST_DESC_SIZE 64
 #define TEST_GOAL_GROUP 1
 static int mbt_kunit_init(struct kunit *test)
 {
-	struct mbt_ext4_block_layout layout = {
-		.blocksize_bits = TEST_BLOCKSIZE_BITS,
-		.cluster_bits = TEST_CLUSTER_BITS,
-		.blocks_per_group = TEST_BLOCKS_PER_GROUP,
-		.group_count = TEST_GROUP_COUNT,
-		.desc_size = TEST_DESC_SIZE,
-	};
+	struct mbt_ext4_block_layout *layout =
+		(struct mbt_ext4_block_layout *)(test->param_value);
 	struct super_block *sb;
 	int ret;
 
@@ -221,7 +211,7 @@ static int mbt_kunit_init(struct kunit *test)
 	if (sb == NULL)
 		return -ENOMEM;
 
-	mbt_init_sb_layout(sb, &layout);
+	mbt_init_sb_layout(sb, layout);
 
 	ret = mbt_ctx_init(sb);
 	if (ret != 0) {
@@ -307,9 +297,43 @@ static void test_new_blocks_simple(struct kunit *test)
 		"unexpectedly get block when no block is available");
 }
 
+static const struct mbt_ext4_block_layout mbt_test_layouts[] = {
+	{
+		.blocksize_bits = 10,
+		.cluster_bits = 3,
+		.blocks_per_group = 8192,
+		.group_count = 4,
+		.desc_size = 64,
+	},
+	{
+		.blocksize_bits = 12,
+		.cluster_bits = 3,
+		.blocks_per_group = 8192,
+		.group_count = 4,
+		.desc_size = 64,
+	},
+	{
+		.blocksize_bits = 16,
+		.cluster_bits = 3,
+		.blocks_per_group = 8192,
+		.group_count = 4,
+		.desc_size = 64,
+	},
+};
+
+static void mbt_show_layout(const struct mbt_ext4_block_layout *layout,
+			    char *desc)
+{
+	snprintf(desc, KUNIT_PARAM_DESC_SIZE, "block_bits=%d cluster_bits=%d "
+		 "blocks_per_group=%d group_count=%d desc_size=%d\n",
+		 layout->blocksize_bits, layout->cluster_bits,
+		 layout->blocks_per_group, layout->group_count,
+		 layout->desc_size);
+}
+KUNIT_ARRAY_PARAM(mbt_layouts, mbt_test_layouts, mbt_show_layout);
 
 static struct kunit_case mbt_test_cases[] = {
-	KUNIT_CASE(test_new_blocks_simple),
+	KUNIT_CASE_PARAM(test_new_blocks_simple, mbt_layouts_gen_params),
 	{}
 };
 
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 01/12] ext4: make state in ext4_mb_mark_bb to be bool
  2023-09-19 20:15 ` [PATCH v7 01/12] ext4: make state in ext4_mb_mark_bb to be bool Kemeng Shi
@ 2023-09-27  6:10   ` Ritesh Harjani
  2023-09-27  6:51     ` Kemeng Shi
  0 siblings, 1 reply; 27+ messages in thread
From: Ritesh Harjani @ 2023-09-27  6:10 UTC (permalink / raw)
  To: Kemeng Shi, tytso, adilger.kernel; +Cc: ojaswin, linux-ext4, linux-kernel

Kemeng Shi <shikemeng@huaweicloud.com> writes:

> As state could only be either 0 or 1, just make it bool.

Sure.

>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  fs/ext4/ext4.h        | 2 +-
>  fs/ext4/fast_commit.c | 8 ++++----
>  fs/ext4/mballoc.c     | 2 +-
>  3 files changed, 6 insertions(+), 6 deletions(-)

But why not convert at all places?

git grep "ext4_mb_mark_bb" .
fs/ext4/ext4.h:extern void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
fs/ext4/extents.c:                                      ext4_mb_mark_bb(inode->i_sb,
fs/ext4/extents.c:                      ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0);
fs/ext4/fast_commit.c:                  ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0);
fs/ext4/fast_commit.c:                  ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0);
fs/ext4/fast_commit.c:                                          ext4_mb_mark_bb(inode->i_sb,
fs/ext4/fast_commit.c:                          ext4_mb_mark_bb(inode->i_sb, map.m_pblk,
fs/ext4/mballoc.c:void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
fs/ext4/mballoc.c:      ext4_mb_mark_bb(sb, block, 1, 1);

-ritesh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 01/12] ext4: make state in ext4_mb_mark_bb to be bool
  2023-09-27  6:10   ` Ritesh Harjani
@ 2023-09-27  6:51     ` Kemeng Shi
  0 siblings, 0 replies; 27+ messages in thread
From: Kemeng Shi @ 2023-09-27  6:51 UTC (permalink / raw)
  To: Ritesh Harjani, tytso, adilger.kernel; +Cc: ojaswin, linux-ext4, linux-kernel



on 9/27/2023 2:10 PM, Ritesh Harjani wrote:
> Kemeng Shi <shikemeng@huaweicloud.com> writes:
> 
>> As state could only be either 0 or 1, just make it bool.
> 
> Sure.
> 
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>>  fs/ext4/ext4.h        | 2 +-
>>  fs/ext4/fast_commit.c | 8 ++++----
>>  fs/ext4/mballoc.c     | 2 +-
>>  3 files changed, 6 insertions(+), 6 deletions(-)
> 
> But why not convert at all places?
Sorry for this. Will convert at all places in next version. Thanks.
> 
> git grep "ext4_mb_mark_bb" .
> fs/ext4/ext4.h:extern void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
> fs/ext4/extents.c:                                      ext4_mb_mark_bb(inode->i_sb,
> fs/ext4/extents.c:                      ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0);
> fs/ext4/fast_commit.c:                  ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0);
> fs/ext4/fast_commit.c:                  ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0);
> fs/ext4/fast_commit.c:                                          ext4_mb_mark_bb(inode->i_sb,
> fs/ext4/fast_commit.c:                          ext4_mb_mark_bb(inode->i_sb, map.m_pblk,
> fs/ext4/mballoc.c:void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
> fs/ext4/mballoc.c:      ext4_mb_mark_bb(sb, block, 1, 1);
> 
> -ritesh
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 02/12] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb
  2023-09-19 20:15 ` [PATCH v7 02/12] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb Kemeng Shi
@ 2023-09-27  8:49   ` Ritesh Harjani
  2023-09-28  3:31     ` Kemeng Shi
  0 siblings, 1 reply; 27+ messages in thread
From: Ritesh Harjani @ 2023-09-27  8:49 UTC (permalink / raw)
  To: Kemeng Shi, tytso, adilger.kernel; +Cc: ojaswin, linux-ext4, linux-kernel

Kemeng Shi <shikemeng@huaweicloud.com> writes:

> There are several reasons to add a general function ext4_mb_mark_context
> to update block bitmap and group descriptor on disk:
> 1. pair behavior of alloc/free bits. For example,
> ext4_mb_new_blocks_simple will update free_clusters in struct flex_groups
> in ext4_mb_mark_bb while ext4_free_blocks_simple forgets this.
> 2. remove repeat code to read from disk, update and write back to disk.
> 3. reduce future unit test mocks to catch real IO to update structure
> on disk.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  fs/ext4/mballoc.c | 147 ++++++++++++++++++++++++----------------------
>  1 file changed, 77 insertions(+), 70 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index cf09adfbaf11..e1320eea46e9 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -3953,6 +3953,80 @@ void ext4_exit_mballoc(void)
>  	ext4_groupinfo_destroy_slabs();
>  }
>  
> +static int
> +ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
> +		     ext4_grpblk_t blkoff, ext4_grpblk_t len)


ext4_grpblk_t is defined as int.
    /* data type for block offset of block group */
    typedef int ext4_grpblk_t;

I think len should be unsigned int (u32) here. 

> +{
> +	struct ext4_sb_info *sbi = EXT4_SB(sb);
> +	struct buffer_head *bitmap_bh = NULL;
> +	struct ext4_group_desc *gdp;
> +	struct buffer_head *gdp_bh;
> +	int err;
> +	unsigned int i, already, changed;
> +
> +	bitmap_bh = ext4_read_block_bitmap(sb, group);
> +	if (IS_ERR(bitmap_bh))
> +		return PTR_ERR(bitmap_bh);
> +
> +	err = -EIO;
> +	gdp = ext4_get_group_desc(sb, group, &gdp_bh);
> +	if (!gdp)
> +		goto out_err;
> +
> +	ext4_lock_group(sb, group);
> +	if (ext4_has_group_desc_csum(sb) &&
> +	    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
> +		gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
> +		ext4_free_group_clusters_set(sb, gdp,
> +			ext4_free_clusters_after_init(sb, group, gdp));
> +	}
> +
> +	already = 0;
> +	for (i = 0; i < len; i++)
> +		if (mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
> +				state)
> +			already++;
> +	changed = len - already;
> +
> +	if (state) {
> +		mb_set_bits(bitmap_bh->b_data, blkoff, len);
> +		ext4_free_group_clusters_set(sb, gdp,
> +			ext4_free_group_clusters(sb, gdp) - changed);
> +	} else {
> +		mb_clear_bits(bitmap_bh->b_data, blkoff, len);
> +		ext4_free_group_clusters_set(sb, gdp,
> +			ext4_free_group_clusters(sb, gdp) + changed);
> +	}
> +
> +	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
> +	ext4_group_desc_csum_set(sb, group, gdp);
> +	ext4_unlock_group(sb, group);
> +
> +	if (sbi->s_log_groups_per_flex) {
> +		ext4_group_t flex_group = ext4_flex_group(sbi, group);
> +		struct flex_groups *fg = sbi_array_rcu_deref(sbi,
> +					   s_flex_groups, flex_group);
> +
> +		if (state)
> +			atomic64_sub(changed, &fg->free_clusters);
> +		else
> +			atomic64_add(changed, &fg->free_clusters);
> +	}
> +
> +	err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
> +	if (err)
> +		goto out_err;
> +	err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
> +	if (err)
> +		goto out_err;
> +
> +	sync_dirty_buffer(bitmap_bh);
> +	sync_dirty_buffer(gdp_bh);
> +
> +out_err:
> +	brelse(bitmap_bh);
> +	return err;
> +}
>  
>  /*
>   * Check quota and mark chosen space (ac->ac_b_ex) non-free in bitmaps
> @@ -4079,15 +4153,11 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
>  void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
>  		     int len, bool state)

Even ext4_mb_mark_bb should take len as unsigned int IMO.
For e.g. ext4_fc_replay_add_range() passes map.m_len which is also
unsigned int.


Otherwise the patch looks good to me. Feel free to add - 

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

-ritesh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 03/12] ext4: call ext4_mb_mark_context in ext4_free_blocks_simple
  2023-09-19 20:15 ` [PATCH v7 03/12] ext4: call ext4_mb_mark_context in ext4_free_blocks_simple Kemeng Shi
@ 2023-09-27  8:52   ` Ritesh Harjani
  0 siblings, 0 replies; 27+ messages in thread
From: Ritesh Harjani @ 2023-09-27  8:52 UTC (permalink / raw)
  To: Kemeng Shi, tytso, adilger.kernel; +Cc: ojaswin, linux-ext4, linux-kernel

Kemeng Shi <shikemeng@huaweicloud.com> writes:

> call ext4_mb_mark_context in ext4_free_blocks_simple to:
> 1. remove repeat code
> 2. pair update of free_clusters in ext4_mb_new_blocks_simple.
> 3. add missing ext4_lock_group/ext4_unlock_group protection.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  fs/ext4/mballoc.c | 33 +--------------------------------
>  1 file changed, 1 insertion(+), 32 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index e1320eea46e9..cd2fd5dbfcdd 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6393,43 +6393,12 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
>  static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
>  					unsigned long count)

This might need some auditing later (need not be as part of this series)
on why it is an unsigned long. I think it is just a left over and an
unsigned int should be sufficient.

But either ways this patch looks good to me. Feel free to add -

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

-ritesh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 04/12] ext4: extend ext4_mb_mark_context to support allocation under journal
  2023-09-19 20:15 ` [PATCH v7 04/12] ext4: extend ext4_mb_mark_context to support allocation under journal Kemeng Shi
@ 2023-09-27  9:13   ` Ritesh Harjani
  0 siblings, 0 replies; 27+ messages in thread
From: Ritesh Harjani @ 2023-09-27  9:13 UTC (permalink / raw)
  To: Kemeng Shi, tytso, adilger.kernel; +Cc: ojaswin, linux-ext4, linux-kernel

Kemeng Shi <shikemeng@huaweicloud.com> writes:

> Previously, ext4_mb_mark_context is only called under fast commit
> replay path, so there is no valid handle when we update block bitmap
> and group descriptor. This patch try to extend ext4_mb_mark_context
> to be used by code under journal. There are several improvement:
> 1. add "handle_t *handle" to struct ext4_mark_context to journal block
> bitmap and group descriptor update inside ext4_mb_mark_context (the
> added journal code is based on ext4_mb_mark_diskspace_used where
> ext4_mb_mark_context is going to be used.)

> 2. add EXT4_MB_BITMAP_MARKED_CHECK flag to control check if bits in block
> bitmap are already marked as allocation code under journal asserts that
> all bits to be changed are not marked before.

Maybe we can reword this to... 
Adds a flag argument to ext4_mb_mark_context() which controls
a. EXT4_MB_BITMAP_MARKED_CHECK - whether block bitmap checking is needed.
b. EXT4_MB_SYNC_UPDATE - whether dirty buffers (bitmap and group descriptor) needs sync.


> 3. add "ext4_grpblk_t changed" to struct ext4_mark_context to notify number
> of bits in block bitmap has changed.

We should remove above point 3 as there is no "struct ext4_mark_context"
in this v7 series.

>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  fs/ext4/mballoc.c | 64 ++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 49 insertions(+), 15 deletions(-)


The changes looks good to me. With commit msg updated, feel free to add-

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 05/12] ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used
  2023-09-19 20:15 ` [PATCH v7 05/12] ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used Kemeng Shi
@ 2023-09-27  9:58   ` Ritesh Harjani
  0 siblings, 0 replies; 27+ messages in thread
From: Ritesh Harjani @ 2023-09-27  9:58 UTC (permalink / raw)
  To: Kemeng Shi, tytso, adilger.kernel; +Cc: ojaswin, linux-ext4, linux-kernel

Kemeng Shi <shikemeng@huaweicloud.com> writes:

> call ext4_mb_mark_context in ext4_mb_mark_diskspace_used to:
> 1. remove repeat code to normally update bitmap and group descriptor
> on disk.
> 2. call ext4_mb_mark_context instead of only setting bits in block bitmap
> to fix the bitmap. Function ext4_mb_mark_context will also update
> checksum of bitmap and other counter along with the bit change to keep
> the cosistent with bit change or block bitmap will be marked corrupted as
> checksum of bitmap is in inconsistent state.
>

Rewording point 2 to... 
Now that we have a common API for marking blocks inuse/free in block
bitmap, use that instead of open coding it in function
ext4_mb_mark_diskspace_used(). The current code was not updating
checksum and other counters. ext4_mb_mark_context() should fix these
consistency problems.


Also I now see why you have used "int" (ext4_grpblk_t) for len in
ext4_mb_mark_context(). The reason is because this is "cluster len" which
is defined as ext4_grpblk_t in "struct ext4_free_extent"

I think by default we anyway have 8192 blocks per block group. So it
should be ok for now. I anyway think the usage of blocks and cluster (&
their data type) is confusing at different places which needs an auditing/cleanup.

This patch looks good to me. Feel free to add - 

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 06/12] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb()
  2023-09-19 20:15 ` [PATCH v7 06/12] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb() Kemeng Shi
@ 2023-09-27 11:06   ` Ritesh Harjani
  0 siblings, 0 replies; 27+ messages in thread
From: Ritesh Harjani @ 2023-09-27 11:06 UTC (permalink / raw)
  To: Kemeng Shi, tytso, adilger.kernel; +Cc: ojaswin, linux-ext4, linux-kernel

Kemeng Shi <shikemeng@huaweicloud.com> writes:

> This patch separates block bitmap and buddy bitmap freeing in order to
> update block bitmap with ext4_mb_mark_context in following patch.
>
> Separated freeing is safe with concurrent allocation as long as:
> 1. Firstly allocate block in buddy bitmap, and then in block bitmap.
> 2. Firstly free block in block bitmap, and then buddy bitmap.
> Then freed block will only be available to allocation when both buddy
> bitmap and block bitmap are updated by freeing.
> Allocation obeys rule 1 already, just do sperated freeing with rule 2.
>
> Separated freeing has no race with generate_buddy as:
> Once ext4_mb_load_buddy_gfp is executed successfully, the update-to-date
> buddy page can be found in sbi->s_buddy_cache and no more buddy
> initialization of the buddy page will be executed concurrently until
> buddy page is unloaded. As we always do free in "load buddy, free,
> unload buddy" sequence, separated freeing has no race with generate_buddy.
>

Agreed. And thanks for adding a separate patch which talks about this
change. 

> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  fs/ext4/mballoc.c | 98 +++++++++++++++++++++++------------------------
>  1 file changed, 49 insertions(+), 49 deletions(-)

The patch looks good to me. Please feel free to add - 

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 07/12] ext4: call ext4_mb_mark_context in ext4_mb_clear_bb
  2023-09-19 20:15 ` [PATCH v7 07/12] ext4: call ext4_mb_mark_context in ext4_mb_clear_bb Kemeng Shi
@ 2023-09-27 11:14   ` Ritesh Harjani
  0 siblings, 0 replies; 27+ messages in thread
From: Ritesh Harjani @ 2023-09-27 11:14 UTC (permalink / raw)
  To: Kemeng Shi, tytso, adilger.kernel; +Cc: ojaswin, linux-ext4, linux-kernel

Kemeng Shi <shikemeng@huaweicloud.com> writes:

> Call ext4_mb_mark_context in ext4_mb_clear_bb to remove repeat code.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  fs/ext4/mballoc.c | 69 +++++++----------------------------------------
>  1 file changed, 10 insertions(+), 59 deletions(-)

Looks good to me. Please feel free to add - 

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 08/12] ext4: Separate block bitmap and buddy bitmap freeing in ext4_group_add_blocks()
  2023-09-19 20:15 ` [PATCH v7 08/12] ext4: Separate block bitmap and buddy bitmap freeing in ext4_group_add_blocks() Kemeng Shi
@ 2023-09-27 11:16   ` Ritesh Harjani
  0 siblings, 0 replies; 27+ messages in thread
From: Ritesh Harjani @ 2023-09-27 11:16 UTC (permalink / raw)
  To: Kemeng Shi, tytso, adilger.kernel; +Cc: ojaswin, linux-ext4, linux-kernel

Kemeng Shi <shikemeng@huaweicloud.com> writes:

> This patch separates block bitmap and buddy bitmap freeing in order to
> udpate block bitmap with ext4_mb_mark_context in following patch.
> The reason why this can be sperated is explained in previous submit.
> Put the explanation here to simplify the code archeology to
> ext4_group_add_blocks():
>
> Separated freeing is safe with concurrent allocation as long as:
> 1. Firstly allocate block in buddy bitmap, and then in block bitmap.
> 2. Firstly free block in block bitmap, and then buddy bitmap.
> Then freed block will only be available to allocation when both buddy
> bitmap and block bitmap are updated by freeing.
> Allocation obeys rule 1 already, just do sperated freeing with rule 2.
>
> Separated freeing has no race with generate_buddy as:
> Once ext4_mb_load_buddy_gfp is executed successfully, the update-to-date
> buddy page can be found in sbi->s_buddy_cache and no more buddy
> initialization of the buddy page will be executed concurrently until
> buddy page is unloaded. As we always do free in "load buddy, free,
> unload buddy" sequence, separated freeing has no race with generate_buddy.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  fs/ext4/mballoc.c | 54 +++++++++++++++++++++++------------------------
>  1 file changed, 26 insertions(+), 28 deletions(-)
>

Looks good to me. Please feel free to add - 

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 09/12] ext4: call ext4_mb_mark_context in ext4_group_add_blocks()
  2023-09-19 20:15 ` [PATCH v7 09/12] ext4: call ext4_mb_mark_context " Kemeng Shi
@ 2023-09-27 11:19   ` Ritesh Harjani
  0 siblings, 0 replies; 27+ messages in thread
From: Ritesh Harjani @ 2023-09-27 11:19 UTC (permalink / raw)
  To: Kemeng Shi, tytso, adilger.kernel; +Cc: ojaswin, linux-ext4, linux-kernel

Kemeng Shi <shikemeng@huaweicloud.com> writes:

> Call ext4_mb_mark_context in ext4_group_add_blocks() to remove repeat code.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  fs/ext4/mballoc.c | 82 ++++++-----------------------------------------
>  1 file changed, 10 insertions(+), 72 deletions(-)
>

Looks good to me. Please feel free to add - 

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 11/12] ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc
  2023-09-19 20:15 ` [PATCH v7 11/12] ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc Kemeng Shi
@ 2023-09-27 11:39   ` Ritesh Harjani
  0 siblings, 0 replies; 27+ messages in thread
From: Ritesh Harjani @ 2023-09-27 11:39 UTC (permalink / raw)
  To: Kemeng Shi, tytso, adilger.kernel; +Cc: ojaswin, linux-ext4, linux-kernel

Kemeng Shi <shikemeng@huaweicloud.com> writes:

> Here are prepared work:
> 1. Include mballoc-test.c to mballoc.c to be able test static function
> in mballoc.c.
> 2. Implement static stub to avoid read IO to disk.
> 3. Construct fake super_block. Only partial members are set, more members
> will be set when more functions are tested.
> Then unit test for ext4_mb_new_blocks_simple is added.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  fs/ext4/mballoc-test.c | 325 +++++++++++++++++++++++++++++++++++++++++
>  fs/ext4/mballoc.c      |   4 +
>  2 files changed, 329 insertions(+)
>  create mode 100644 fs/ext4/mballoc-test.c

I guess I missed to add by RB in previous revision. I see there are no
functional changes to this patch from previous revision. 

Please feel free to add - 
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

Thanks!
-ritesh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 02/12] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb
  2023-09-27  8:49   ` Ritesh Harjani
@ 2023-09-28  3:31     ` Kemeng Shi
  2023-09-28  4:42       ` Ritesh Harjani
  0 siblings, 1 reply; 27+ messages in thread
From: Kemeng Shi @ 2023-09-28  3:31 UTC (permalink / raw)
  To: Ritesh Harjani, tytso, adilger.kernel; +Cc: ojaswin, linux-ext4, linux-kernel



on 9/27/2023 4:49 PM, Ritesh Harjani wrote:
> Kemeng Shi <shikemeng@huaweicloud.com> writes:
> 
>> There are several reasons to add a general function ext4_mb_mark_context
>> to update block bitmap and group descriptor on disk:
>> 1. pair behavior of alloc/free bits. For example,
>> ext4_mb_new_blocks_simple will update free_clusters in struct flex_groups
>> in ext4_mb_mark_bb while ext4_free_blocks_simple forgets this.
>> 2. remove repeat code to read from disk, update and write back to disk.
>> 3. reduce future unit test mocks to catch real IO to update structure
>> on disk.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>>  fs/ext4/mballoc.c | 147 ++++++++++++++++++++++++----------------------
>>  1 file changed, 77 insertions(+), 70 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index cf09adfbaf11..e1320eea46e9 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -3953,6 +3953,80 @@ void ext4_exit_mballoc(void)
>>  	ext4_groupinfo_destroy_slabs();
>>  }
>>  
>> +static int
>> +ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
>> +		     ext4_grpblk_t blkoff, ext4_grpblk_t len)
> 
> 
> ext4_grpblk_t is defined as int.
>     /* data type for block offset of block group */
>     typedef int ext4_grpblk_t;
> 
> I think len should be unsigned int (u32) here. 
> 
Hi Ritesh, thanks for reply and a lot suggestions to this patch and other
patches in this series.
I define len as ext4_grpblk_t as I think ext4_grpblk_t is supposed to fit
block or cluster number of single group.

Here are some examples save block number of group to ext4_grpblk_t:
static ext4_fsblk_t ext4_valid_block_bitmap(...)
{
        ...
        ext4_grpblk_t max_bit = EXT4_CLUSTERS_PER_GROUP(sb);
        ...
}

static ext4_fsblk_t ext4_mb_new_blocks_simple(...)
{
        ...
        ext4_grpblk_t max = EXT4_CLUSTERS_PER_GROUP(sb);
        ...
}

/* len could be group block number if group has only one fragment */
static int mb_avg_fragment_size_order(..., ext4_grpblk_t len)

As ext4_grpblk_t is data type for block offset of block group, so
ext4_grpblk_t fits "block number of group" - 1. If we support block
number of group > INT_MAX + 1, ext4_grpblk_t should be unsigned int anyway.
IMO, it's more simple just make ext4_grpblk_t data type for block number
in a single group and make it unsigned int if block number of group is
possible to >= INT_MAX + 1. Does this makes to you.

>> +{
>> +	struct ext4_sb_info *sbi = EXT4_SB(sb);
>> +	struct buffer_head *bitmap_bh = NULL;
>> +	struct ext4_group_desc *gdp;
>> +	struct buffer_head *gdp_bh;
>> +	int err;
>> +	unsigned int i, already, changed;
>> +
>> +	bitmap_bh = ext4_read_block_bitmap(sb, group);
>> +	if (IS_ERR(bitmap_bh))
>> +		return PTR_ERR(bitmap_bh);
>> +
>> +	err = -EIO;
>> +	gdp = ext4_get_group_desc(sb, group, &gdp_bh);
>> +	if (!gdp)
>> +		goto out_err;
>> +
>> +	ext4_lock_group(sb, group);
>> +	if (ext4_has_group_desc_csum(sb) &&
>> +	    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
>> +		gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
>> +		ext4_free_group_clusters_set(sb, gdp,
>> +			ext4_free_clusters_after_init(sb, group, gdp));
>> +	}
>> +
>> +	already = 0;
>> +	for (i = 0; i < len; i++)
>> +		if (mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
>> +				state)
>> +			already++;
>> +	changed = len - already;
>> +
>> +	if (state) {
>> +		mb_set_bits(bitmap_bh->b_data, blkoff, len);
>> +		ext4_free_group_clusters_set(sb, gdp,
>> +			ext4_free_group_clusters(sb, gdp) - changed);
>> +	} else {
>> +		mb_clear_bits(bitmap_bh->b_data, blkoff, len);
>> +		ext4_free_group_clusters_set(sb, gdp,
>> +			ext4_free_group_clusters(sb, gdp) + changed);
>> +	}
>> +
>> +	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>> +	ext4_group_desc_csum_set(sb, group, gdp);
>> +	ext4_unlock_group(sb, group);
>> +
>> +	if (sbi->s_log_groups_per_flex) {
>> +		ext4_group_t flex_group = ext4_flex_group(sbi, group);
>> +		struct flex_groups *fg = sbi_array_rcu_deref(sbi,
>> +					   s_flex_groups, flex_group);
>> +
>> +		if (state)
>> +			atomic64_sub(changed, &fg->free_clusters);
>> +		else
>> +			atomic64_add(changed, &fg->free_clusters);
>> +	}
>> +
>> +	err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
>> +	if (err)
>> +		goto out_err;
>> +	err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
>> +	if (err)
>> +		goto out_err;
>> +
>> +	sync_dirty_buffer(bitmap_bh);
>> +	sync_dirty_buffer(gdp_bh);
>> +
>> +out_err:
>> +	brelse(bitmap_bh);
>> +	return err;
>> +}
>>  
>>  /*
>>   * Check quota and mark chosen space (ac->ac_b_ex) non-free in bitmaps
>> @@ -4079,15 +4153,11 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
>>  void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
>>  		     int len, bool state)
> 
> Even ext4_mb_mark_bb should take len as unsigned int IMO.
> For e.g. ext4_fc_replay_add_range() passes map.m_len which is also
> unsigned int.
If we agree ext4_grpblk_t to be data type for block number in group,
I think it's more reasonable to take len as ext4_grpblk_t too.

Look forward to you reply. Thanks!
> 
> 
> Otherwise the patch looks good to me. Feel free to add - 
> 
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> 
> -ritesh
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 02/12] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb
  2023-09-28  3:31     ` Kemeng Shi
@ 2023-09-28  4:42       ` Ritesh Harjani
  2023-09-28  7:45         ` Kemeng Shi
  0 siblings, 1 reply; 27+ messages in thread
From: Ritesh Harjani @ 2023-09-28  4:42 UTC (permalink / raw)
  To: Kemeng Shi, tytso, adilger.kernel; +Cc: ojaswin, linux-ext4, linux-kernel

Kemeng Shi <shikemeng@huaweicloud.com> writes:

> on 9/27/2023 4:49 PM, Ritesh Harjani wrote:
>> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>> 
>>> There are several reasons to add a general function ext4_mb_mark_context
>>> to update block bitmap and group descriptor on disk:
>>> 1. pair behavior of alloc/free bits. For example,
>>> ext4_mb_new_blocks_simple will update free_clusters in struct flex_groups
>>> in ext4_mb_mark_bb while ext4_free_blocks_simple forgets this.
>>> 2. remove repeat code to read from disk, update and write back to disk.
>>> 3. reduce future unit test mocks to catch real IO to update structure
>>> on disk.
>>>
>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>> ---
>>>  fs/ext4/mballoc.c | 147 ++++++++++++++++++++++++----------------------
>>>  1 file changed, 77 insertions(+), 70 deletions(-)
>>>
>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>> index cf09adfbaf11..e1320eea46e9 100644
>>> --- a/fs/ext4/mballoc.c
>>> +++ b/fs/ext4/mballoc.c
>>> @@ -3953,6 +3953,80 @@ void ext4_exit_mballoc(void)
>>>  	ext4_groupinfo_destroy_slabs();
>>>  }
>>>  
>>> +static int
>>> +ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
>>> +		     ext4_grpblk_t blkoff, ext4_grpblk_t len)
>> 
>> 
>> ext4_grpblk_t is defined as int.
>>     /* data type for block offset of block group */
>>     typedef int ext4_grpblk_t;
>> 
>> I think len should be unsigned int (u32) here. 
>> 
> Hi Ritesh, thanks for reply and a lot suggestions to this patch and other
> patches in this series.
> I define len as ext4_grpblk_t as I think ext4_grpblk_t is supposed to fit
> block or cluster number of single group.
>

At different places the use of datatype for no. of blocks/clusters within
a group gets very confusing :(

However, IMO ext4_grpblk_t should be fine for using len argument here.
I did respond about that while reviewing in some later patches [1]

[1]: https://lore.kernel.org/linux-ext4/87r0mkey45.fsf@doe.com/

So, I don't think we need any changes to this patch. 
    Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

Also overall the series looks good. There are just some minor
changes suggested in 1st patch and some commit msg updates suggested for
other changes. If you send a v8, then I think that looks good to be
picked up :) 

Thanks a lot for working on it & the suggested changes!

-ritesh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v7 02/12] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb
  2023-09-28  4:42       ` Ritesh Harjani
@ 2023-09-28  7:45         ` Kemeng Shi
  0 siblings, 0 replies; 27+ messages in thread
From: Kemeng Shi @ 2023-09-28  7:45 UTC (permalink / raw)
  To: Ritesh Harjani, tytso, adilger.kernel; +Cc: ojaswin, linux-ext4, linux-kernel



on 9/28/2023 12:42 PM, Ritesh Harjani wrote:
> Kemeng Shi <shikemeng@huaweicloud.com> writes:
> 
>> on 9/27/2023 4:49 PM, Ritesh Harjani wrote:
>>> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>>>
>>>> There are several reasons to add a general function ext4_mb_mark_context
>>>> to update block bitmap and group descriptor on disk:
>>>> 1. pair behavior of alloc/free bits. For example,
>>>> ext4_mb_new_blocks_simple will update free_clusters in struct flex_groups
>>>> in ext4_mb_mark_bb while ext4_free_blocks_simple forgets this.
>>>> 2. remove repeat code to read from disk, update and write back to disk.
>>>> 3. reduce future unit test mocks to catch real IO to update structure
>>>> on disk.
>>>>
>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>> ---
>>>>  fs/ext4/mballoc.c | 147 ++++++++++++++++++++++++----------------------
>>>>  1 file changed, 77 insertions(+), 70 deletions(-)
>>>>
>>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>>> index cf09adfbaf11..e1320eea46e9 100644
>>>> --- a/fs/ext4/mballoc.c
>>>> +++ b/fs/ext4/mballoc.c
>>>> @@ -3953,6 +3953,80 @@ void ext4_exit_mballoc(void)
>>>>  	ext4_groupinfo_destroy_slabs();
>>>>  }
>>>>  
>>>> +static int
>>>> +ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
>>>> +		     ext4_grpblk_t blkoff, ext4_grpblk_t len)
>>>
>>>
>>> ext4_grpblk_t is defined as int.
>>>     /* data type for block offset of block group */
>>>     typedef int ext4_grpblk_t;
>>>
>>> I think len should be unsigned int (u32) here. 
>>>
>> Hi Ritesh, thanks for reply and a lot suggestions to this patch and other
>> patches in this series.
>> I define len as ext4_grpblk_t as I think ext4_grpblk_t is supposed to fit
>> block or cluster number of single group.
>>
> 
> At different places the use of datatype for no. of blocks/clusters within
> a group gets very confusing :(
> > However, IMO ext4_grpblk_t should be fine for using len argument here.
> I did respond about that while reviewing in some later patches [1]
> 
> [1]: https://lore.kernel.org/linux-ext4/87r0mkey45.fsf@doe.com/
> 
> So, I don't think we need any changes to this patch. 
>     Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> 
Sorry for missing this.
> Also overall the series looks good. There are just some minor
> changes suggested in 1st patch and some commit msg updates suggested for
> other changes. If you send a v8, then I think that looks good to be
> picked up :) 
> 
> Thanks a lot for working on it & the suggested changes!
Thanks a lot for your suggestion and patience! I will send a v8 include
all changes that you suggested. Thanks!
> 
> -ritesh
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2023-09-28  7:45 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-19 20:15 [PATCH v7 00/12] cleanups and unit test for mballoc Kemeng Shi
2023-09-19 20:15 ` [PATCH v7 01/12] ext4: make state in ext4_mb_mark_bb to be bool Kemeng Shi
2023-09-27  6:10   ` Ritesh Harjani
2023-09-27  6:51     ` Kemeng Shi
2023-09-19 20:15 ` [PATCH v7 02/12] ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb Kemeng Shi
2023-09-27  8:49   ` Ritesh Harjani
2023-09-28  3:31     ` Kemeng Shi
2023-09-28  4:42       ` Ritesh Harjani
2023-09-28  7:45         ` Kemeng Shi
2023-09-19 20:15 ` [PATCH v7 03/12] ext4: call ext4_mb_mark_context in ext4_free_blocks_simple Kemeng Shi
2023-09-27  8:52   ` Ritesh Harjani
2023-09-19 20:15 ` [PATCH v7 04/12] ext4: extend ext4_mb_mark_context to support allocation under journal Kemeng Shi
2023-09-27  9:13   ` Ritesh Harjani
2023-09-19 20:15 ` [PATCH v7 05/12] ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used Kemeng Shi
2023-09-27  9:58   ` Ritesh Harjani
2023-09-19 20:15 ` [PATCH v7 06/12] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb() Kemeng Shi
2023-09-27 11:06   ` Ritesh Harjani
2023-09-19 20:15 ` [PATCH v7 07/12] ext4: call ext4_mb_mark_context in ext4_mb_clear_bb Kemeng Shi
2023-09-27 11:14   ` Ritesh Harjani
2023-09-19 20:15 ` [PATCH v7 08/12] ext4: Separate block bitmap and buddy bitmap freeing in ext4_group_add_blocks() Kemeng Shi
2023-09-27 11:16   ` Ritesh Harjani
2023-09-19 20:15 ` [PATCH v7 09/12] ext4: call ext4_mb_mark_context " Kemeng Shi
2023-09-27 11:19   ` Ritesh Harjani
2023-09-19 20:15 ` [PATCH v7 10/12] ext4: add some kunit stub for mballoc kunit test Kemeng Shi
2023-09-19 20:15 ` [PATCH v7 11/12] ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc Kemeng Shi
2023-09-27 11:39   ` Ritesh Harjani
2023-09-19 20:15 ` [PATCH v7 12/12] ext4: run mballoc test with different layouts setting Kemeng Shi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).