* [PATCHv1 0/9] ext4: fast_commit fixes, stricter block checking & cleanups
@ 2022-02-05 14:09 Ritesh Harjani
  2022-02-05 14:09 ` [PATCHv1 1/9] ext4: Correct cluster len and clusters changed accounting in ext4_mb_mark_bb Ritesh Harjani
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Ritesh Harjani @ 2022-02-05 14:09 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, Theodore Ts'o, Jan Kara, Harshad Shirwadkar,
	Ritesh Harjani

Hello,

Please find v1 of this patch series aimed at fixing some of the issues
identified in fast_commit. This also adds some stricter checking of
blocks to be freed in ext4_mb_clear_bb(), ext4_group_add_blocks() &
ext4_mb_mark_bb().

I have tested this with a few different fast_commit configs and the normal
4k config with -g log,quick, and haven't seen any surprises.

RFC -> v1:
==========
1. Added Patch-1, which correctly accounts for flex_bg->free_clusters.
2. Addressed review comments from Jan.
3. The order of the patches may have changed slightly.

[RFC] - https://lore.kernel.org/all/a9770b46522c03989bdd96f63f7d0bfb2cf499ab.1643642105.git.riteshh@linux.ibm.com/


Ritesh Harjani (9):
  ext4: Correct cluster len and clusters changed accounting in ext4_mb_mark_bb
  ext4: Fixes ext4_mb_mark_bb() with flex_bg with fast_commit
  ext4: Refactor ext4_free_blocks() to pull out ext4_mb_clear_bb()
  ext4: Use in_range() for range checking in ext4_fc_replay_check_excluded
  ext4: Rename ext4_set_bits to mb_set_bits
  ext4: No need to test for block bitmap bits in ext4_mb_mark_bb()
  ext4: Add ext4_sb_block_valid() refactored out of ext4_inode_block_valid()
  ext4: Add strict range checks while freeing blocks
  ext4: Add extra check in ext4_mb_mark_bb() to prevent against possible
    corruption

 fs/ext4/block_validity.c |  25 +--
 fs/ext4/ext4.h           |   5 +-
 fs/ext4/fast_commit.c    |   4 +-
 fs/ext4/mballoc.c        | 342 ++++++++++++++++++++++-----------------
 fs/ext4/resize.c         |   4 +-
 5 files changed, 219 insertions(+), 161 deletions(-)

--
2.31.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCHv1 1/9] ext4: Correct cluster len and clusters changed accounting in ext4_mb_mark_bb
  2022-02-05 14:09 [PATCHv1 0/9] ext4: fast_commit fixes, stricter block checking & cleanups Ritesh Harjani
@ 2022-02-05 14:09 ` Ritesh Harjani
  2022-02-07 15:28   ` Jan Kara
  2022-02-05 14:09 ` [PATCHv1 2/9] ext4: Fixes ext4_mb_mark_bb() with flex_bg with fast_commit Ritesh Harjani
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Ritesh Harjani @ 2022-02-05 14:09 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, Theodore Ts'o, Jan Kara, Harshad Shirwadkar,
	Ritesh Harjani

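ext4_mb_mark_bb() currently calculates the cluster length (clen) and
flex_group->free_clusters incorrectly. clen is computed with EXT4_B2C(),
which rounds the block count down to whole clusters, instead of
EXT4_NUM_B2C(), which rounds up. The flex group's free_clusters is always
decremented by len, a block count, even when blocks are being freed and
regardless of how many clusters actually changed state. This patch fixes
both.

Identified by code review of ext4_mb_mark_bb().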
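To make the two accounting errors concrete, here is a small userspace
sketch. b2c_round_down() and num_b2c_round_up() are hypothetical stand-ins
that mirror the shift in EXT4_B2C() and the round-up in EXT4_NUM_B2C();
free_clusters_delta() models the clen_changed bookkeeping this patch adds.
It is an illustration under those assumptions, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for EXT4_B2C(): a plain shift, drops a partial cluster. */
static uint64_t b2c_round_down(uint64_t blks, unsigned int cluster_bits)
{
	return blks >> cluster_bits;
}

/* Hypothetical stand-in for EXT4_NUM_B2C(): a partial cluster counts as one. */
static uint64_t num_b2c_round_up(uint64_t blks, unsigned int cluster_bits)
{
	assert(cluster_bits < 64);
	uint64_t ratio = (uint64_t)1 << cluster_bits;	/* blocks per cluster */

	return (blks + ratio - 1) >> cluster_bits;
}

/*
 * Change in a group's free-cluster count when marking clen clusters, of
 * which 'already' were in the requested state: only clen - already
 * clusters really change, and the sign depends on allocate vs. free.
 */
static int64_t free_clusters_delta(unsigned int clen, unsigned int already,
				   int state)
{
	unsigned int clen_changed = clen - already;

	/* state != 0: marking in use, so free clusters go down. */
	return state ? -(int64_t)clen_changed : (int64_t)clen_changed;
}
```

For example, with 4 blocks per cluster (cluster_bits = 2) and a
cluster-aligned 5-block range, the round-down form counts 1 cluster while
the range actually occupies 2. The flex group counter likewise has to move
by clen_changed clusters in the direction given by state, rather than
always dropping by len blocks.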

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 fs/ext4/mballoc.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c781974df9d0..2f117ce3bb73 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3899,10 +3899,11 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	ext4_group_t group;
 	ext4_grpblk_t blkoff;
-	int i, clen, err;
+	int i, err;
 	int already;
+	unsigned int clen, clen_changed;
 
-	clen = EXT4_B2C(sbi, len);
+	clen = EXT4_NUM_B2C(sbi, len);
 
 	ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
 	bitmap_bh = ext4_read_block_bitmap(sb, group);
@@ -3923,6 +3924,7 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
 		if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) == !state)
 			already++;
 
+	clen_changed = clen - already;
 	if (state)
 		ext4_set_bits(bitmap_bh->b_data, blkoff, clen);
 	else
@@ -3935,9 +3937,9 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
 						group, gdp));
 	}
 	if (state)
-		clen = ext4_free_group_clusters(sb, gdp) - clen + already;
+		clen = ext4_free_group_clusters(sb, gdp) - clen_changed;
 	else
-		clen = ext4_free_group_clusters(sb, gdp) + clen - already;
+		clen = ext4_free_group_clusters(sb, gdp) + clen_changed;
 
 	ext4_free_group_clusters_set(sb, gdp, clen);
 	ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh);
@@ -3947,10 +3949,13 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
 
 	if (sbi->s_log_groups_per_flex) {
 		ext4_group_t flex_group = ext4_flex_group(sbi, group);
+		struct flex_groups *fg = sbi_array_rcu_deref(sbi,
+					   s_flex_groups, flex_group);
 
-		atomic64_sub(len,
-			     &sbi_array_rcu_deref(sbi, s_flex_groups,
-						  flex_group)->free_clusters);
+		if (state)
+			atomic64_sub(clen_changed, &fg->free_clusters);
+		else
+			atomic64_add(clen_changed, &fg->free_clusters);
 	}
 
 	err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
-- 
2.31.1



* [PATCHv1 2/9] ext4: Fixes ext4_mb_mark_bb() with flex_bg with fast_commit
  2022-02-05 14:09 [PATCHv1 0/9] ext4: fast_commit fixes, stricter block checking & cleanups Ritesh Harjani
  2022-02-05 14:09 ` [PATCHv1 1/9] ext4: Correct cluster len and clusters changed accounting in ext4_mb_mark_bb Ritesh Harjani
@ 2022-02-05 14:09 ` Ritesh Harjani
  2022-02-07 16:37   ` Jan Kara
  2022-02-05 14:09 ` [PATCHv1 3/9] ext4: Refactor ext4_free_blocks() to pull out ext4_mb_clear_bb() Ritesh Harjani
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Ritesh Harjani @ 2022-02-05 14:09 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, Theodore Ts'o, Jan Kara, Harshad Shirwadkar,
	Ritesh Harjani

With the flex_bg feature (enabled by default), an extent of a given inode
may span blocks from two different block groups. ext4_mb_mark_bb() reads
the block bitmap buffer_head only once, for the starting block group, and
does not read the next group's bitmap when the extent crosses a block
group boundary. The loop below then accesses memory beyond the block
bitmap buffer_head, resulting in a data abort.

	for (i = 0; i < clen; i++)
		if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) == !state)
			already++;

This patch makes ext4_mb_mark_bb() detect block group boundaries and read
the block bitmap buffer_head (bitmap_bh) for each block group that the
range spans.
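The per-group split done by the new loop can be sketched in userspace as
follows. This is a simplified model that assumes block-granular bitmaps
(no bigalloc); split_by_group() and struct grp_chunk are hypothetical
names, and first_data_block / blocks_per_group stand in for
s_first_data_block and EXT4_BLOCKS_PER_GROUP(sb).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct grp_chunk {
	uint64_t group;		/* block group index */
	uint64_t offset;	/* first block, relative to the group start */
	uint64_t len;		/* number of blocks handled in this group */
};

/*
 * Split (block, len) into per-group pieces, one loop iteration per group,
 * so that no piece runs past the end of its group's bitmap. Writes at most
 * max_out chunks and returns how many were written.
 */
static size_t split_by_group(uint64_t block, uint64_t len,
			     uint64_t first_data_block,
			     uint64_t blocks_per_group,
			     struct grp_chunk *out, size_t max_out)
{
	size_t n = 0;

	assert(blocks_per_group > 0 && block >= first_data_block);
	while (len > 0 && n < max_out) {
		uint64_t rel = block - first_data_block;
		uint64_t offset = rel % blocks_per_group;
		/* Never go past the end of the current group. */
		uint64_t room = blocks_per_group - offset;
		uint64_t this_len = len < room ? len : room;

		out[n].group = rel / blocks_per_group;
		out[n].offset = offset;
		out[n].len = this_len;
		n++;

		block += this_len;
		len -= this_len;
	}
	return n;
}
```

For example, with 32768 blocks per group and s_first_data_block = 0, a
16-block range starting at block 32760 is split into 8 blocks at the end
of group 0 and 8 blocks at the start of group 1. Without the split, the
loop would have tested 16 bits starting at offset 32760 in group 0's
32768-bit bitmap.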

Without this patch, I could easily hit a data access abort on a Power platform:

<...>
[   74.327662] EXT4-fs error (device loop3): ext4_mb_generate_buddy:1141: group 11, block bitmap and bg descriptor inconsistent: 21248 vs 23294 free clusters
[   74.533214] EXT4-fs (loop3): shut down requested (2)
[   74.536705] Aborting journal on device loop3-8.
[   74.702705] BUG: Unable to handle kernel data access on read at 0xc00000005e980000
[   74.703727] Faulting instruction address: 0xc0000000007bffb8
cpu 0xd: Vector: 300 (Data Access) at [c000000015db7060]
    pc: c0000000007bffb8: ext4_mb_mark_bb+0x198/0x5a0
    lr: c0000000007bfeec: ext4_mb_mark_bb+0xcc/0x5a0
    sp: c000000015db7300
   msr: 800000000280b033
   dar: c00000005e980000
 dsisr: 40000000
  current = 0xc000000027af6880
  paca    = 0xc00000003ffd5200   irqmask: 0x03   irq_happened: 0x01
    pid   = 5167, comm = mount
<...>
enter ? for help
[c000000015db7380] c000000000782708 ext4_ext_clear_bb+0x378/0x410
[c000000015db7400] c000000000813f14 ext4_fc_replay+0x1794/0x2000
[c000000015db7580] c000000000833f7c do_one_pass+0xe9c/0x12a0
[c000000015db7710] c000000000834504 jbd2_journal_recover+0x184/0x2d0
[c000000015db77c0] c000000000841398 jbd2_journal_load+0x188/0x4a0
[c000000015db7880] c000000000804de8 ext4_fill_super+0x2638/0x3e10
[c000000015db7a40] c0000000005f8404 get_tree_bdev+0x2b4/0x350
[c000000015db7ae0] c0000000007ef058 ext4_get_tree+0x28/0x40
[c000000015db7b00] c0000000005f6344 vfs_get_tree+0x44/0x100
[c000000015db7b70] c00000000063c408 path_mount+0xdd8/0xe70
[c000000015db7c40] c00000000063c8f0 sys_mount+0x450/0x550
[c000000015db7d50] c000000000035770 system_call_exception+0x4a0/0x4e0
[c000000015db7e10] c00000000000c74c system_call_common+0xec/0x250

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 fs/ext4/mballoc.c | 131 +++++++++++++++++++++++++++-------------------
 1 file changed, 76 insertions(+), 55 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 2f117ce3bb73..d0bd51b1e1ad 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3901,72 +3901,93 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
 	ext4_grpblk_t blkoff;
 	int i, err;
 	int already;
-	unsigned int clen, clen_changed;
+	unsigned int clen, clen_changed, thisgrp_len;
 
-	clen = EXT4_NUM_B2C(sbi, len);
-
-	ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
-	bitmap_bh = ext4_read_block_bitmap(sb, group);
-	if (IS_ERR(bitmap_bh)) {
-		err = PTR_ERR(bitmap_bh);
-		bitmap_bh = NULL;
-		goto out_err;
-	}
-
-	err = -EIO;
-	gdp = ext4_get_group_desc(sb, group, &gdp_bh);
-	if (!gdp)
-		goto out_err;
+	while (len > 0) {
+		ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
 
-	ext4_lock_group(sb, group);
-	already = 0;
-	for (i = 0; i < clen; i++)
-		if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) == !state)
-			already++;
-
-	clen_changed = clen - already;
-	if (state)
-		ext4_set_bits(bitmap_bh->b_data, blkoff, clen);
-	else
-		mb_test_and_clear_bits(bitmap_bh->b_data, blkoff, clen);
-	if (ext4_has_group_desc_csum(sb) &&
-	    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
-		gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
-		ext4_free_group_clusters_set(sb, gdp,
-					     ext4_free_clusters_after_init(sb,
-						group, gdp));
-	}
-	if (state)
-		clen = ext4_free_group_clusters(sb, gdp) - clen_changed;
-	else
-		clen = ext4_free_group_clusters(sb, gdp) + clen_changed;
+		/*
+		 * Check to see if we are freeing blocks across a group
+		 * boundary.
+		 * In case of flex_bg, this can happen that (block, len) may
+		 * span across more than one group. In that case we need to
+		 * get the corresponding group metadata to work with.
+		 * For this we have goto again loop.
+		 */
+		thisgrp_len = min_t(unsigned int, (unsigned int)len,
+			EXT4_BLOCKS_PER_GROUP(sb) - EXT4_C2B(sbi, blkoff));
+		clen = EXT4_NUM_B2C(sbi, thisgrp_len);
 
-	ext4_free_group_clusters_set(sb, gdp, clen);
-	ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh);
-	ext4_group_desc_csum_set(sb, group, gdp);
+		bitmap_bh = ext4_read_block_bitmap(sb, group);
+		if (IS_ERR(bitmap_bh)) {
+			err = PTR_ERR(bitmap_bh);
+			bitmap_bh = NULL;
+			break;
+		}
 
-	ext4_unlock_group(sb, group);
+		err = -EIO;
+		gdp = ext4_get_group_desc(sb, group, &gdp_bh);
+		if (!gdp)
+			break;
 
-	if (sbi->s_log_groups_per_flex) {
-		ext4_group_t flex_group = ext4_flex_group(sbi, group);
-		struct flex_groups *fg = sbi_array_rcu_deref(sbi,
-					   s_flex_groups, flex_group);
+		ext4_lock_group(sb, group);
+		already = 0;
+		for (i = 0; i < clen; i++)
+			if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
+					 !state)
+				already++;
 
+		clen_changed = clen - already;
 		if (state)
-			atomic64_sub(clen_changed, &fg->free_clusters);
+			ext4_set_bits(bitmap_bh->b_data, blkoff, clen);
 		else
-			atomic64_add(clen_changed, &fg->free_clusters);
+			mb_test_and_clear_bits(bitmap_bh->b_data, blkoff, clen);
+		if (ext4_has_group_desc_csum(sb) &&
+		    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
+			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
+			ext4_free_group_clusters_set(sb, gdp,
+			     ext4_free_clusters_after_init(sb, group, gdp));
+		}
+		if (state)
+			clen = ext4_free_group_clusters(sb, gdp) - clen_changed;
+		else
+			clen = ext4_free_group_clusters(sb, gdp) + clen_changed;
+
+		ext4_free_group_clusters_set(sb, gdp, clen);
+		ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh);
+		ext4_group_desc_csum_set(sb, group, gdp);
+
+		ext4_unlock_group(sb, group);
+
+		if (sbi->s_log_groups_per_flex) {
+			ext4_group_t flex_group = ext4_flex_group(sbi, group);
+			struct flex_groups *fg = sbi_array_rcu_deref(sbi,
+						   s_flex_groups, flex_group);
+
+			if (state)
+				atomic64_sub(clen_changed, &fg->free_clusters);
+			else
+				atomic64_add(clen_changed, &fg->free_clusters);
+
+		}
+
+		err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
+		if (err)
+			break;
+		sync_dirty_buffer(bitmap_bh);
+		err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
+		sync_dirty_buffer(gdp_bh);
+		if (err)
+			break;
+
+		block += thisgrp_len;
+		len = len - thisgrp_len;
+		put_bh(bitmap_bh);
+		BUG_ON(len < 0);
 	}
 
-	err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
 	if (err)
-		goto out_err;
-	sync_dirty_buffer(bitmap_bh);
-	err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
-	sync_dirty_buffer(gdp_bh);
-
-out_err:
-	brelse(bitmap_bh);
+		brelse(bitmap_bh);
 }
 
 /*
-- 
2.31.1



* [PATCHv1 3/9] ext4: Refactor ext4_free_blocks() to pull out ext4_mb_clear_bb()
  2022-02-05 14:09 [PATCHv1 0/9] ext4: fast_commit fixes, stricter block checking & cleanups Ritesh Harjani
  2022-02-05 14:09 ` [PATCHv1 1/9] ext4: Correct cluster len and clusters changed accounting in ext4_mb_mark_bb Ritesh Harjani
  2022-02-05 14:09 ` [PATCHv1 2/9] ext4: Fixes ext4_mb_mark_bb() with flex_bg with fast_commit Ritesh Harjani
@ 2022-02-05 14:09 ` Ritesh Harjani
  2022-02-05 14:09 ` [PATCHv1 4/9] ext4: Use in_range() for range checking in ext4_fc_replay_check_excluded Ritesh Harjani
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Ritesh Harjani @ 2022-02-05 14:09 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, Theodore Ts'o, Jan Kara, Harshad Shirwadkar,
	Ritesh Harjani

ext4_free_blocks() has become too long and confusing. This patch moves the
logic that clears the block bitmap and frees the blocks out of it and into
a new helper, ext4_mb_clear_bb().

No functional change.
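After the split, ext4_free_blocks() still performs the partial-cluster
adjustment before handing (block, count) to ext4_mb_clear_bb(). Below is a
simplified userspace sketch of that adjustment, assuming a power-of-two
cluster ratio; align_free_range() is a hypothetical name, and the two
booleans stand in for EXT4_FREE_BLOCKS_NOFREE_FIRST_CLUSTER and
EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * ratio is the cluster size in blocks (s_cluster_ratio). Returns false
 * when nothing is left to free after trimming.
 */
static bool align_free_range(uint64_t *block, uint64_t *count, uint64_t ratio,
			     bool keep_first, bool keep_last)
{
	uint64_t overflow = *block % ratio;	/* like EXT4_PBLK_COFF() */

	if (overflow) {
		if (keep_first) {
			/* Skip ahead to the next cluster boundary. */
			overflow = ratio - overflow;
			if (*count <= overflow)
				return false;
			*block += overflow;
			*count -= overflow;
		} else {
			/* Extend back to the start of the partial cluster. */
			*block -= overflow;
			*count += overflow;
		}
	}

	overflow = *count % ratio;		/* like EXT4_LBLK_COFF() */
	if (overflow) {
		if (keep_last) {
			/* Drop the trailing partial cluster. */
			if (*count <= overflow)
				return false;
			*count -= overflow;
		} else {
			/* Round the length up to a whole cluster. */
			*count += ratio - overflow;
		}
	}
	return true;
}
```

With 4 blocks per cluster, freeing blocks 6..10 (block = 6, count = 5)
with neither flag set widens the range to blocks 4..11; with only the
NOFREE_FIRST flag set it becomes blocks 8..11.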

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/mballoc.c | 180 ++++++++++++++++++++++++++--------------------
 1 file changed, 102 insertions(+), 78 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index d0bd51b1e1ad..91058f81a0c6 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5872,7 +5872,8 @@ static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
 }
 
 /**
- * ext4_free_blocks() -- Free given blocks and update quota
+ * ext4_mb_clear_bb() -- helper function for freeing blocks.
+ *			Used by ext4_free_blocks()
  * @handle:		handle for this transaction
  * @inode:		inode
  * @bh:			optional buffer of the block to be freed
@@ -5880,9 +5881,9 @@ static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
  * @count:		number of blocks to be freed
  * @flags:		flags used by ext4_free_blocks
  */
-void ext4_free_blocks(handle_t *handle, struct inode *inode,
-		      struct buffer_head *bh, ext4_fsblk_t block,
-		      unsigned long count, int flags)
+static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
+			       ext4_fsblk_t block, unsigned long count,
+			       int flags)
 {
 	struct buffer_head *bitmap_bh = NULL;
 	struct super_block *sb = inode->i_sb;
@@ -5899,80 +5900,6 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
 
 	sbi = EXT4_SB(sb);
 
-	if (sbi->s_mount_state & EXT4_FC_REPLAY) {
-		ext4_free_blocks_simple(inode, block, count);
-		return;
-	}
-
-	might_sleep();
-	if (bh) {
-		if (block)
-			BUG_ON(block != bh->b_blocknr);
-		else
-			block = bh->b_blocknr;
-	}
-
-	if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
-	    !ext4_inode_block_valid(inode, block, count)) {
-		ext4_error(sb, "Freeing blocks not in datazone - "
-			   "block = %llu, count = %lu", block, count);
-		goto error_return;
-	}
-
-	ext4_debug("freeing block %llu\n", block);
-	trace_ext4_free_blocks(inode, block, count, flags);
-
-	if (bh && (flags & EXT4_FREE_BLOCKS_FORGET)) {
-		BUG_ON(count > 1);
-
-		ext4_forget(handle, flags & EXT4_FREE_BLOCKS_METADATA,
-			    inode, bh, block);
-	}
-
-	/*
-	 * If the extent to be freed does not begin on a cluster
-	 * boundary, we need to deal with partial clusters at the
-	 * beginning and end of the extent.  Normally we will free
-	 * blocks at the beginning or the end unless we are explicitly
-	 * requested to avoid doing so.
-	 */
-	overflow = EXT4_PBLK_COFF(sbi, block);
-	if (overflow) {
-		if (flags & EXT4_FREE_BLOCKS_NOFREE_FIRST_CLUSTER) {
-			overflow = sbi->s_cluster_ratio - overflow;
-			block += overflow;
-			if (count > overflow)
-				count -= overflow;
-			else
-				return;
-		} else {
-			block -= overflow;
-			count += overflow;
-		}
-	}
-	overflow = EXT4_LBLK_COFF(sbi, count);
-	if (overflow) {
-		if (flags & EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER) {
-			if (count > overflow)
-				count -= overflow;
-			else
-				return;
-		} else
-			count += sbi->s_cluster_ratio - overflow;
-	}
-
-	if (!bh && (flags & EXT4_FREE_BLOCKS_FORGET)) {
-		int i;
-		int is_metadata = flags & EXT4_FREE_BLOCKS_METADATA;
-
-		for (i = 0; i < count; i++) {
-			cond_resched();
-			if (is_metadata)
-				bh = sb_find_get_block(inode->i_sb, block + i);
-			ext4_forget(handle, is_metadata, inode, bh, block + i);
-		}
-	}
-
 do_more:
 	overflow = 0;
 	ext4_get_group_no_and_offset(sb, block, &block_group, &bit);
@@ -6140,6 +6067,103 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
 	return;
 }
 
+/**
+ * ext4_free_blocks() -- Free given blocks and update quota
+ * @handle:		handle for this transaction
+ * @inode:		inode
+ * @bh:			optional buffer of the block to be freed
+ * @block:		starting physical block to be freed
+ * @count:		number of blocks to be freed
+ * @flags:		flags used by ext4_free_blocks
+ */
+void ext4_free_blocks(handle_t *handle, struct inode *inode,
+		      struct buffer_head *bh, ext4_fsblk_t block,
+		      unsigned long count, int flags)
+{
+	struct super_block *sb = inode->i_sb;
+	unsigned int overflow;
+	struct ext4_sb_info *sbi;
+
+	sbi = EXT4_SB(sb);
+
+	if (sbi->s_mount_state & EXT4_FC_REPLAY) {
+		ext4_free_blocks_simple(inode, block, count);
+		return;
+	}
+
+	might_sleep();
+	if (bh) {
+		if (block)
+			BUG_ON(block != bh->b_blocknr);
+		else
+			block = bh->b_blocknr;
+	}
+
+	if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
+	    !ext4_inode_block_valid(inode, block, count)) {
+		ext4_error(sb, "Freeing blocks not in datazone - "
+			   "block = %llu, count = %lu", block, count);
+		return;
+	}
+
+	ext4_debug("freeing block %llu\n", block);
+	trace_ext4_free_blocks(inode, block, count, flags);
+
+	if (bh && (flags & EXT4_FREE_BLOCKS_FORGET)) {
+		BUG_ON(count > 1);
+
+		ext4_forget(handle, flags & EXT4_FREE_BLOCKS_METADATA,
+			    inode, bh, block);
+	}
+
+	/*
+	 * If the extent to be freed does not begin on a cluster
+	 * boundary, we need to deal with partial clusters at the
+	 * beginning and end of the extent.  Normally we will free
+	 * blocks at the beginning or the end unless we are explicitly
+	 * requested to avoid doing so.
+	 */
+	overflow = EXT4_PBLK_COFF(sbi, block);
+	if (overflow) {
+		if (flags & EXT4_FREE_BLOCKS_NOFREE_FIRST_CLUSTER) {
+			overflow = sbi->s_cluster_ratio - overflow;
+			block += overflow;
+			if (count > overflow)
+				count -= overflow;
+			else
+				return;
+		} else {
+			block -= overflow;
+			count += overflow;
+		}
+	}
+	overflow = EXT4_LBLK_COFF(sbi, count);
+	if (overflow) {
+		if (flags & EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER) {
+			if (count > overflow)
+				count -= overflow;
+			else
+				return;
+		} else
+			count += sbi->s_cluster_ratio - overflow;
+	}
+
+	if (!bh && (flags & EXT4_FREE_BLOCKS_FORGET)) {
+		int i;
+		int is_metadata = flags & EXT4_FREE_BLOCKS_METADATA;
+
+		for (i = 0; i < count; i++) {
+			cond_resched();
+			if (is_metadata)
+				bh = sb_find_get_block(inode->i_sb, block + i);
+			ext4_forget(handle, is_metadata, inode, bh, block + i);
+		}
+	}
+
+	ext4_mb_clear_bb(handle, inode, block, count, flags);
+	return;
+}
+
 /**
  * ext4_group_add_blocks() -- Add given blocks to an existing group
  * @handle:			handle to this transaction
-- 
2.31.1



* [PATCHv1 4/9] ext4: Use in_range() for range checking in ext4_fc_replay_check_excluded
  2022-02-05 14:09 [PATCHv1 0/9] ext4: fast_commit fixes, stricter block checking & cleanups Ritesh Harjani
                   ` (2 preceding siblings ...)
  2022-02-05 14:09 ` [PATCHv1 3/9] ext4: Refactor ext4_free_blocks() to pull out ext4_mb_clear_bb() Ritesh Harjani
@ 2022-02-05 14:09 ` Ritesh Harjani
  2022-02-05 14:09 ` [PATCHv1 5/9] ext4: Rename ext4_set_bits to mb_set_bits Ritesh Harjani
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Ritesh Harjani @ 2022-02-05 14:09 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, Theodore Ts'o, Jan Kara, Harshad Shirwadkar,
	Ritesh Harjani

Use the in_range() helper instead of open-coding the range check.
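The sketch below assumes in_range() is defined in fs/ext4/ext4.h as
((b) >= (first) && (b) <= (first) + (len) - 1) (worth re-checking against
the tree). It copies that definition into userspace and compares it with
the open-coded check it replaces; open_coded() and with_in_range() are
hypothetical wrappers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed copy of the ext4.h helper: b lies in [first, first + len - 1]. */
#define in_range(b, first, len)	((b) >= (first) && (b) <= (first) + (len) - 1)

/* The check that ext4_fc_replay_check_excluded() used to open-code. */
static bool open_coded(uint64_t blk, uint64_t pblk, uint64_t len)
{
	return blk >= pblk && blk < pblk + len;
}

/* The same check expressed through in_range(). */
static bool with_in_range(uint64_t blk, uint64_t pblk, uint64_t len)
{
	return in_range(blk, pblk, len);
}
```

The two forms agree whenever len > 0. With len == 0 and first == 0, the
unsigned (first) + (len) - 1 wraps around and in_range() returns true, so
the existing skip of regions with len == 0 just above the check keeps that
case from arising.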

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/fast_commit.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 7964ee34e322..3c5baca38767 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -1875,8 +1875,8 @@ bool ext4_fc_replay_check_excluded(struct super_block *sb, ext4_fsblk_t blk)
 		if (state->fc_regions[i].ino == 0 ||
 			state->fc_regions[i].len == 0)
 			continue;
-		if (blk >= state->fc_regions[i].pblk &&
-		    blk < state->fc_regions[i].pblk + state->fc_regions[i].len)
+		if (in_range(blk, state->fc_regions[i].pblk,
+					state->fc_regions[i].len))
 			return true;
 	}
 	return false;
-- 
2.31.1



* [PATCHv1 5/9] ext4: Rename ext4_set_bits to mb_set_bits
  2022-02-05 14:09 [PATCHv1 0/9] ext4: fast_commit fixes, stricter block checking & cleanups Ritesh Harjani
                   ` (3 preceding siblings ...)
  2022-02-05 14:09 ` [PATCHv1 4/9] ext4: Use in_range() for range checking in ext4_fc_replay_check_excluded Ritesh Harjani
@ 2022-02-05 14:09 ` Ritesh Harjani
  2022-02-07 16:38   ` Jan Kara
  2022-02-05 14:09 ` [PATCHv1 6/9] ext4: No need to test for block bitmap bits in ext4_mb_mark_bb() Ritesh Harjani
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Ritesh Harjani @ 2022-02-05 14:09 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, Theodore Ts'o, Jan Kara, Harshad Shirwadkar,
	Ritesh Harjani

Rename ext4_set_bits() to mb_set_bits() so that it follows the same
naming convention as the other mballoc bitmap helpers, such as
mb_test_and_clear_bits(). This was done with the following command:

grep -nr "ext4_set_bits" fs/ext4/ | cut -d ":" -f 1 | xargs sed -i 's/ext4_set_bits/mb_set_bits/g'

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 fs/ext4/ext4.h    |  2 +-
 fs/ext4/mballoc.c | 14 +++++++-------
 fs/ext4/resize.c  |  4 ++--
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 09d8f60ebf0f..8c1d0e352f47 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1279,7 +1279,7 @@ struct ext4_inode_info {
 #define ext4_find_next_zero_bit		find_next_zero_bit_le
 #define ext4_find_next_bit		find_next_bit_le

-extern void ext4_set_bits(void *bm, int cur, int len);
+extern void mb_set_bits(void *bm, int cur, int len);

 /*
  * Maximal mount counts between two filesystem checks
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 91058f81a0c6..f80af108d05e 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1689,7 +1689,7 @@ static int mb_test_and_clear_bits(void *bm, int cur, int len)
 	return zero_bit;
 }

-void ext4_set_bits(void *bm, int cur, int len)
+void mb_set_bits(void *bm, int cur, int len)
 {
 	__u32 *addr;

@@ -1996,7 +1996,7 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
 	mb_set_largest_free_order(e4b->bd_sb, e4b->bd_info);

 	mb_update_avg_fragment_size(e4b->bd_sb, e4b->bd_info);
-	ext4_set_bits(e4b->bd_bitmap, ex->fe_start, len0);
+	mb_set_bits(e4b->bd_bitmap, ex->fe_start, len0);
 	mb_check_buddy(e4b);

 	return ret;
@@ -3825,7 +3825,7 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 		 * We leak some of the blocks here.
 		 */
 		ext4_lock_group(sb, ac->ac_b_ex.fe_group);
-		ext4_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
+		mb_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
 			      ac->ac_b_ex.fe_len);
 		ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
 		err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
@@ -3844,7 +3844,7 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 		}
 	}
 #endif
-	ext4_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
+	mb_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
 		      ac->ac_b_ex.fe_len);
 	if (ext4_has_group_desc_csum(sb) &&
 	    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
@@ -3939,7 +3939,7 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,

 		clen_changed = clen - already;
 		if (state)
-			ext4_set_bits(bitmap_bh->b_data, blkoff, clen);
+			mb_set_bits(bitmap_bh->b_data, blkoff, clen);
 		else
 			mb_test_and_clear_bits(bitmap_bh->b_data, blkoff, clen);
 		if (ext4_has_group_desc_csum(sb) &&
@@ -4459,7 +4459,7 @@ static void ext4_mb_generate_from_freelist(struct super_block *sb, void *bitmap,

 	while (n) {
 		entry = rb_entry(n, struct ext4_free_data, efd_node);
-		ext4_set_bits(bitmap, entry->efd_start_cluster, entry->efd_count);
+		mb_set_bits(bitmap, entry->efd_start_cluster, entry->efd_count);
 		n = rb_next(n);
 	}
 	return;
@@ -4500,7 +4500,7 @@ void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap,
 		if (unlikely(len == 0))
 			continue;
 		BUG_ON(groupnr != group);
-		ext4_set_bits(bitmap, start, len);
+		mb_set_bits(bitmap, start, len);
 		preallocated += len;
 	}
 	mb_debug(sb, "preallocated %d for group %u\n", preallocated, group);
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index ee8f02f406cb..f507f34be602 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -483,7 +483,7 @@ static int set_flexbg_block_bitmap(struct super_block *sb, handle_t *handle,
 		}
 		ext4_debug("mark block bitmap %#04llx (+%llu/%u)\n",
 			   first_cluster, first_cluster - start, count2);
-		ext4_set_bits(bh->b_data, first_cluster - start, count2);
+		mb_set_bits(bh->b_data, first_cluster - start, count2);

 		err = ext4_handle_dirty_metadata(handle, NULL, bh);
 		brelse(bh);
@@ -632,7 +632,7 @@ static int setup_new_flex_group_blocks(struct super_block *sb,
 		if (overhead != 0) {
 			ext4_debug("mark backup superblock %#04llx (+0)\n",
 				   start);
-			ext4_set_bits(bh->b_data, 0,
+			mb_set_bits(bh->b_data, 0,
 				      EXT4_NUM_B2C(sbi, overhead));
 		}
 		ext4_mark_bitmap_end(EXT4_B2C(sbi, group_data[i].blocks_count),
--
2.31.1



* [PATCHv1 6/9] ext4: No need to test for block bitmap bits in ext4_mb_mark_bb()
  2022-02-05 14:09 [PATCHv1 0/9] ext4: fast_commit fixes, stricter block checking & cleanups Ritesh Harjani
                   ` (4 preceding siblings ...)
  2022-02-05 14:09 ` [PATCHv1 5/9] ext4: Rename ext4_set_bits to mb_set_bits Ritesh Harjani
@ 2022-02-05 14:09 ` Ritesh Harjani
  2022-02-05 14:09 ` [PATCHv1 7/9] ext4: Add ext4_sb_block_valid() refactored out of ext4_inode_block_valid() Ritesh Harjani
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Ritesh Harjani @ 2022-02-05 14:09 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, Theodore Ts'o, Jan Kara, Harshad Shirwadkar,
	Ritesh Harjani

ext4_mb_mark_bb() does not use the return value of mb_test_and_clear_bits(),
so simply use mb_clear_bits() instead.
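The return value is not needed because ext4_mb_mark_bb() already counts
the bits that are in the requested state in its own loop (already), so
the extra information mb_test_and_clear_bits() provides, the index of the
first bit that was already clear, is discarded. A byte-wise userspace
sketch of the two behaviours follows; the names are hypothetical and this
is not the word-at-a-time kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

static int bm_test_bit(const uint8_t *bm, int nr)
{
	return (bm[nr >> 3] >> (nr & 7)) & 1;
}

static void bm_clear_bit(uint8_t *bm, int nr)
{
	bm[nr >> 3] &= (uint8_t)~(1u << (nr & 7));
}

/* Like mb_clear_bits(): clear bits [cur, cur + len), report nothing. */
static void bm_clear_bits(uint8_t *bm, int cur, int len)
{
	for (int i = cur; i < cur + len; i++)
		bm_clear_bit(bm, i);
}

/*
 * Like mb_test_and_clear_bits(): clear the same range, but also return the
 * first bit that was already zero, or -1 if every bit was set.
 */
static int bm_test_and_clear_bits(uint8_t *bm, int cur, int len)
{
	int zero_bit = -1;

	for (int i = cur; i < cur + len; i++) {
		if (!bm_test_bit(bm, i) && zero_bit == -1)
			zero_bit = i;
		bm_clear_bit(bm, i);
	}
	return zero_bit;
}
```

Both helpers leave the bitmap in the same state; only the reporting
differs.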

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/mballoc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index f80af108d05e..23313963bb56 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3941,7 +3941,7 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
 		if (state)
 			mb_set_bits(bitmap_bh->b_data, blkoff, clen);
 		else
-			mb_test_and_clear_bits(bitmap_bh->b_data, blkoff, clen);
+			mb_clear_bits(bitmap_bh->b_data, blkoff, clen);
 		if (ext4_has_group_desc_csum(sb) &&
 		    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
 			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
-- 
2.31.1



* [PATCHv1 7/9] ext4: Add ext4_sb_block_valid() refactored out of ext4_inode_block_valid()
  2022-02-05 14:09 [PATCHv1 0/9] ext4: fast_commit fixes, stricter block checking & cleanups Ritesh Harjani
                   ` (5 preceding siblings ...)
  2022-02-05 14:09 ` [PATCHv1 6/9] ext4: No need to test for block bitmap bits in ext4_mb_mark_bb() Ritesh Harjani
@ 2022-02-05 14:09 ` Ritesh Harjani
  2022-02-07 16:42   ` Jan Kara
  2022-02-05 14:09 ` [PATCHv1 8/9] ext4: Add strict range checks while freeing blocks Ritesh Harjani
  2022-02-05 14:09 ` [PATCHv1 9/9] ext4: Add extra check in ext4_mb_mark_bb() to prevent against possible corruption Ritesh Harjani
  8 siblings, 1 reply; 19+ messages in thread
From: Ritesh Harjani @ 2022-02-05 14:09 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, Theodore Ts'o, Jan Kara, Harshad Shirwadkar,
	Ritesh Harjani

This API will be needed in places where no inode is available, e.g. when
freeing blocks in ext4_group_add_blocks().
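The intended meaning of the optional inode argument can be modelled in
userspace as below. This is a simplified sketch: struct zone and
sb_block_valid() are hypothetical, zones are scanned linearly instead of
through the rbtree, the bounds checks against s_first_data_block and the
block count are left out, and a NULL ino pointer stands in for a NULL
inode. A range that overlaps a system zone is treated as valid only when
an inode is given and owns that zone. This relies on the result being set
to "invalid" on overlap. Assuming ret is initialised to 1 earlier in the
function (outside the hunk shown), the hunk as posted leaves ret at 1 when
inode is NULL, so that path may deserve a second look.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened stand-in for one ext4_system_zone entry. */
struct zone {
	uint64_t start_blk;
	uint64_t count;
	unsigned long ino;	/* owning inode (e.g. 8, the journal), 0 if none */
};

static bool sb_block_valid(const struct zone *zones, size_t nzones,
			   const unsigned long *ino,
			   uint64_t start_blk, uint64_t count)
{
	for (size_t i = 0; i < nzones; i++) {
		const struct zone *z = &zones[i];

		/* [start_blk, start_blk + count) overlaps this zone? */
		if (start_blk < z->start_blk + z->count &&
		    z->start_blk < start_blk + count)
			/* Invalid unless the caller's inode owns the zone. */
			return ino != NULL && z->ino == *ino;
	}
	return true;	/* no overlap with any system zone */
}
```

Callers without an inode, such as ext4_group_add_blocks(), would then
reject any overlap with system zones, including zones owned by the
journal inode.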

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 fs/ext4/block_validity.c | 25 ++++++++++++++++---------
 fs/ext4/ext4.h           |  3 +++
 2 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/block_validity.c b/fs/ext4/block_validity.c
index 4666b55b736e..a9195d5ac1e7 100644
--- a/fs/ext4/block_validity.c
+++ b/fs/ext4/block_validity.c
@@ -292,15 +292,10 @@ void ext4_release_system_zone(struct super_block *sb)
 		call_rcu(&system_blks->rcu, ext4_destroy_system_zone);
 }
 
-/*
- * Returns 1 if the passed-in block region (start_blk,
- * start_blk+count) is valid; 0 if some part of the block region
- * overlaps with some other filesystem metadata blocks.
- */
-int ext4_inode_block_valid(struct inode *inode, ext4_fsblk_t start_blk,
-			  unsigned int count)
+int ext4_sb_block_valid(struct super_block *sb, struct inode *inode,
+				ext4_fsblk_t start_blk, unsigned int count)
 {
-	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	struct ext4_system_blocks *system_blks;
 	struct ext4_system_zone *entry;
 	struct rb_node *n;
@@ -329,7 +324,8 @@ int ext4_inode_block_valid(struct inode *inode, ext4_fsblk_t start_blk,
 		else if (start_blk >= (entry->start_blk + entry->count))
 			n = n->rb_right;
 		else {
-			ret = (entry->ino == inode->i_ino);
+			if (inode)
+				ret = (entry->ino == inode->i_ino);
 			break;
 		}
 	}
@@ -338,6 +334,17 @@ int ext4_inode_block_valid(struct inode *inode, ext4_fsblk_t start_blk,
 	return ret;
 }
 
+/*
+ * Returns 1 if the passed-in block region (start_blk,
+ * start_blk+count) is valid; 0 if some part of the block region
+ * overlaps with some other filesystem metadata blocks.
+ */
+int ext4_inode_block_valid(struct inode *inode, ext4_fsblk_t start_blk,
+			  unsigned int count)
+{
+	return ext4_sb_block_valid(inode->i_sb, inode, start_blk, count);
+}
+
 int ext4_check_blockref(const char *function, unsigned int line,
 			struct inode *inode, __le32 *p, unsigned int max)
 {
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 8c1d0e352f47..4f7851c1e432 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3706,6 +3706,9 @@ extern int ext4_inode_block_valid(struct inode *inode,
 				  unsigned int count);
 extern int ext4_check_blockref(const char *, unsigned int,
 			       struct inode *, __le32 *, unsigned int);
+extern int ext4_sb_block_valid(struct super_block *sb, struct inode *inode,
+				ext4_fsblk_t start_blk, unsigned int count);
+
 
 /* extents.c */
 struct ext4_ext_path;
-- 
2.31.1



* [PATCHv1 8/9] ext4: Add strict range checks while freeing blocks
  2022-02-05 14:09 [PATCHv1 0/9] ext4: fast_commit fixes, stricter block checking & cleanups Ritesh Harjani
                   ` (6 preceding siblings ...)
  2022-02-05 14:09 ` [PATCHv1 7/9] ext4: Add ext4_sb_block_valid() refactored out of ext4_inode_block_valid() Ritesh Harjani
@ 2022-02-05 14:09 ` Ritesh Harjani
  2022-02-07 16:44   ` Jan Kara
  2022-02-05 14:09 ` [PATCHv1 9/9] ext4: Add extra check in ext4_mb_mark_bb() to prevent against possible corruption Ritesh Harjani
  8 siblings, 1 reply; 19+ messages in thread
From: Ritesh Harjani @ 2022-02-05 14:09 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, Theodore Ts'o, Jan Kara, Harshad Shirwadkar,
	Ritesh Harjani

Currently ext4_mb_clear_bb() & ext4_group_add_blocks() only check
whether the given block range (which is to be freed) overlaps the FS
metadata blocks of the block's own block group.
To detect FS errors early, it is better to add stricter checks in
those functions that verify whether the given blocks belong to any
critical FS metadata within the system zone.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 fs/ext4/mballoc.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 23313963bb56..9f2b3a057918 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5930,13 +5930,7 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 		goto error_return;
 	}
 
-	if (in_range(ext4_block_bitmap(sb, gdp), block, count) ||
-	    in_range(ext4_inode_bitmap(sb, gdp), block, count) ||
-	    in_range(block, ext4_inode_table(sb, gdp),
-		     sbi->s_itb_per_group) ||
-	    in_range(block + count - 1, ext4_inode_table(sb, gdp),
-		     sbi->s_itb_per_group)) {
-
+	if (!ext4_inode_block_valid(inode, block, count)) {
 		ext4_error(sb, "Freeing blocks in system zone - "
 			   "Block = %llu, count = %lu", block, count);
 		/* err = 0. ext4_std_error should be a no op */
@@ -6007,7 +6001,7 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 						 NULL);
 			if (err && err != -EOPNOTSUPP)
 				ext4_msg(sb, KERN_WARNING, "discard request in"
-					 " group:%d block:%d count:%lu failed"
+					 " group:%u block:%d count:%lu failed"
 					 " with %d", block_group, bit, count,
 					 err);
 		} else
@@ -6220,11 +6214,7 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 		goto error_return;
 	}
 
-	if (in_range(ext4_block_bitmap(sb, desc), block, count) ||
-	    in_range(ext4_inode_bitmap(sb, desc), block, count) ||
-	    in_range(block, ext4_inode_table(sb, desc), sbi->s_itb_per_group) ||
-	    in_range(block + count - 1, ext4_inode_table(sb, desc),
-		     sbi->s_itb_per_group)) {
+	if (!ext4_sb_block_valid(sb, NULL, block, count)) {
 		ext4_error(sb, "Adding blocks in system zones - "
 			   "Block = %llu, count = %lu",
 			   block, count);
-- 
2.31.1
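
For illustration, the difference between the removed per-group check and a full overlap test can be sketched in userspace C. This is a simplified sketch, not the kernel code: struct group_layout and the helper names are hypothetical, in_range_u64() mimics the semantics of ext4's in_range() helper, and only one group's block bitmap, inode bitmap and inode table are modelled (the kernel's system-zone tree also covers metadata outside the freed block's own group). The removed check only tests whether the start or the end of the range lands in the inode table, so a range that strictly contains the inode table slips through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same semantics as ext4's in_range(): is b within [first, first + len - 1]? */
static bool in_range_u64(uint64_t b, uint64_t first, uint64_t len)
{
	assert(len > 0);	/* len == 0 would make first + len - 1 underflow */
	return b >= first && b <= first + len - 1;
}

/* Hypothetical per-group metadata layout, for illustration only. */
struct group_layout {
	uint64_t block_bitmap;
	uint64_t inode_bitmap;
	uint64_t inode_table;
	uint64_t itb_per_group;	/* inode table length in blocks */
};

/* Mirrors the removed check: true means "overlaps group metadata". */
static bool old_overlaps_metadata(const struct group_layout *g,
				  uint64_t block, uint64_t count)
{
	return in_range_u64(g->block_bitmap, block, count) ||
	       in_range_u64(g->inode_bitmap, block, count) ||
	       in_range_u64(block, g->inode_table, g->itb_per_group) ||
	       in_range_u64(block + count - 1, g->inode_table,
			    g->itb_per_group);
}

/* Full interval test: do [a, a + alen) and [b, b + blen) intersect? */
static bool extents_overlap(uint64_t a, uint64_t alen,
			    uint64_t b, uint64_t blen)
{
	return a < b + blen && b < a + alen;
}

/* Stricter variant: any intersection with any metadata extent counts. */
static bool strict_overlaps_metadata(const struct group_layout *g,
				     uint64_t block, uint64_t count)
{
	return extents_overlap(block, count, g->block_bitmap, 1) ||
	       extents_overlap(block, count, g->inode_bitmap, 1) ||
	       extents_overlap(block, count, g->inode_table, g->itb_per_group);
}
```

With the hypothetical layout { 500, 501, 1000, 10 }, freeing blocks 990..1019 passes the old check (neither endpoint is inside the inode table 1000..1009, and neither bitmap block is inside the freed range) but is caught by the overlap test.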



* [PATCHv1 9/9] ext4: Add extra check in ext4_mb_mark_bb() to prevent against possible corruption
  2022-02-05 14:09 [PATCHv1 0/9] ext4: fast_commit fixes, stricter block checking & cleanups Ritesh Harjani
                   ` (7 preceding siblings ...)
  2022-02-05 14:09 ` [PATCHv1 8/9] ext4: Add strict range checks while freeing blocks Ritesh Harjani
@ 2022-02-05 14:09 ` Ritesh Harjani
  2022-02-07 16:45   ` Jan Kara
  8 siblings, 1 reply; 19+ messages in thread
From: Ritesh Harjani @ 2022-02-05 14:09 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, Theodore Ts'o, Jan Kara, Harshad Shirwadkar,
	Ritesh Harjani

This patch adds an extra check in ext4_mb_mark_bb() to make sure we
report an error and bail out if we were about to mark/clear the bitmap
bits of any critical FS metadata blocks, to prevent accidental
corruption.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 fs/ext4/mballoc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 9f2b3a057918..75c20a10529a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3918,6 +3918,14 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
 			EXT4_BLOCKS_PER_GROUP(sb) - EXT4_C2B(sbi, blkoff));
 		clen = EXT4_NUM_B2C(sbi, thisgrp_len);
 
+		if (!ext4_sb_block_valid(sb, NULL, block, thisgrp_len)) {
+			ext4_error(sb, "Marking blocks in system zone - "
+				   "Block = %llu, len = %u",
+				   block, thisgrp_len);
+			bitmap_bh = NULL;
+			break;
+		}
+
 		bitmap_bh = ext4_read_block_bitmap(sb, group);
 		if (IS_ERR(bitmap_bh)) {
 			err = PTR_ERR(bitmap_bh);
-- 
2.31.1



* Re: [PATCHv1 1/9] ext4: Correct cluster len and clusters changed accounting in ext4_mb_mark_bb
  2022-02-05 14:09 ` [PATCHv1 1/9] ext4: Correct cluster len and clusters changed accounting in ext4_mb_mark_bb Ritesh Harjani
@ 2022-02-07 15:28   ` Jan Kara
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2022-02-07 15:28 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, linux-fsdevel, Theodore Ts'o, Jan Kara,
	Harshad Shirwadkar

On Sat 05-02-22 19:39:50, Ritesh Harjani wrote:
> ext4_mb_mark_bb() currently wrongly calculates cluster len (clen) and
> flex_group->free_clusters. This patch fixes that.
> 
> Identified based on code review of ext4_mb_mark_bb() function.
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/mballoc.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index c781974df9d0..2f117ce3bb73 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -3899,10 +3899,11 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
>  	struct ext4_sb_info *sbi = EXT4_SB(sb);
>  	ext4_group_t group;
>  	ext4_grpblk_t blkoff;
> -	int i, clen, err;
> +	int i, err;
>  	int already;
> +	unsigned int clen, clen_changed;
>  
> -	clen = EXT4_B2C(sbi, len);
> +	clen = EXT4_NUM_B2C(sbi, len);
>  
>  	ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
>  	bitmap_bh = ext4_read_block_bitmap(sb, group);
> @@ -3923,6 +3924,7 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
>  		if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) == !state)
>  			already++;
>  
> +	clen_changed = clen - already;
>  	if (state)
>  		ext4_set_bits(bitmap_bh->b_data, blkoff, clen);
>  	else
> @@ -3935,9 +3937,9 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
>  						group, gdp));
>  	}
>  	if (state)
> -		clen = ext4_free_group_clusters(sb, gdp) - clen + already;
> +		clen = ext4_free_group_clusters(sb, gdp) - clen_changed;
>  	else
> -		clen = ext4_free_group_clusters(sb, gdp) + clen - already;
> +		clen = ext4_free_group_clusters(sb, gdp) + clen_changed;
>  
>  	ext4_free_group_clusters_set(sb, gdp, clen);
>  	ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh);
> @@ -3947,10 +3949,13 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
>  
>  	if (sbi->s_log_groups_per_flex) {
>  		ext4_group_t flex_group = ext4_flex_group(sbi, group);
> +		struct flex_groups *fg = sbi_array_rcu_deref(sbi,
> +					   s_flex_groups, flex_group);
>  
> -		atomic64_sub(len,
> -			     &sbi_array_rcu_deref(sbi, s_flex_groups,
> -						  flex_group)->free_clusters);
> +		if (state)
> +			atomic64_sub(clen_changed, &fg->free_clusters);
> +		else
> +			atomic64_add(clen_changed, &fg->free_clusters);
>  	}
>  
>  	err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
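
A minimal userspace sketch of the two fixes in patch 1, assuming power-of-two cluster ratios as in ext4: b2c() and num_b2c() follow the definitions of EXT4_B2C() and EXT4_NUM_B2C() in fs/ext4/ext4.h, while struct cluster_info and update_free_clusters() are hypothetical names. B2C() rounds down, so using it on a length can undercount clusters; NUM_B2C() rounds up. The free-cluster update moves the count only by the clusters whose bit actually changed (clen - already), in the direction given by state, which the removed flex_bg code (atomic64_sub(len, ...)) did not do.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the two ext4_sb_info fields the macros use. */
struct cluster_info {
	unsigned int s_cluster_bits;	/* log2(blocks per cluster) */
	unsigned int s_cluster_ratio;	/* blocks per cluster */
};

/* Like EXT4_B2C(): block number -> cluster number (rounds down). */
static uint64_t b2c(const struct cluster_info *ci, uint64_t blk)
{
	return blk >> ci->s_cluster_bits;
}

/* Like EXT4_NUM_B2C(): clusters needed to hold nblks blocks (rounds up). */
static uint64_t num_b2c(const struct cluster_info *ci, uint64_t nblks)
{
	return (nblks + ci->s_cluster_ratio - 1) >> ci->s_cluster_bits;
}

/*
 * Free-cluster accounting in the style of the patch: only clusters whose
 * bit actually flipped (clen - already) change the free count.
 * state != 0 means "mark in use", state == 0 means "mark free".
 */
static int64_t update_free_clusters(int64_t free, unsigned int clen,
				    unsigned int already, int state)
{
	unsigned int clen_changed;

	assert(already <= clen);
	clen_changed = clen - already;
	return state ? free - clen_changed : free + clen_changed;
}
```

With 16-block clusters (s_cluster_bits = 4), b2c() maps a length of 20 blocks to 1 cluster while num_b2c() gives 2; marking 5 clusters in use when 2 of them were already set lowers a free count of 100 to 97, and clearing them raises it to 103.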


* Re: [PATCHv1 2/9] ext4: Fixes ext4_mb_mark_bb() with flex_bg with fast_commit
  2022-02-05 14:09 ` [PATCHv1 2/9] ext4: Fixes ext4_mb_mark_bb() with flex_bg with fast_commit Ritesh Harjani
@ 2022-02-07 16:37   ` Jan Kara
  2022-02-08  3:11     ` Ritesh Harjani
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2022-02-07 16:37 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, linux-fsdevel, Theodore Ts'o, Jan Kara,
	Harshad Shirwadkar

On Sat 05-02-22 19:39:51, Ritesh Harjani wrote:
> With the flex_bg feature (which is enabled by default), extents for
> any given inode might span blocks from two different block groups.
> ext4_mb_mark_bb() only reads the block bitmap buffer_head once, for the
> starting block group, and fails to read it again when the extent
> crosses into another block group. The loop below then accesses memory
> beyond the block group bitmap buffer_head, resulting in a data abort.
> 
> 	for (i = 0; i < clen; i++)
> 		if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) == !state)
> 			already++;
> 
> This patch adds a block group boundary check to ext4_mb_mark_bb() and
> re-reads the buffer_head (bitmap_bh) for each block group the range
> touches.
>
> Without this patch, I was easily able to hit a data access abort on the
> Power platform.
> 
> <...>
> [   74.327662] EXT4-fs error (device loop3): ext4_mb_generate_buddy:1141: group 11, block bitmap and bg descriptor inconsistent: 21248 vs 23294 free clusters
> [   74.533214] EXT4-fs (loop3): shut down requested (2)
> [   74.536705] Aborting journal on device loop3-8.
> [   74.702705] BUG: Unable to handle kernel data access on read at 0xc00000005e980000
> [   74.703727] Faulting instruction address: 0xc0000000007bffb8
> cpu 0xd: Vector: 300 (Data Access) at [c000000015db7060]
>     pc: c0000000007bffb8: ext4_mb_mark_bb+0x198/0x5a0
>     lr: c0000000007bfeec: ext4_mb_mark_bb+0xcc/0x5a0
>     sp: c000000015db7300
>    msr: 800000000280b033
>    dar: c00000005e980000
>  dsisr: 40000000
>   current = 0xc000000027af6880
>   paca    = 0xc00000003ffd5200   irqmask: 0x03   irq_happened: 0x01
>     pid   = 5167, comm = mount
> <...>
> enter ? for help
> [c000000015db7380] c000000000782708 ext4_ext_clear_bb+0x378/0x410
> [c000000015db7400] c000000000813f14 ext4_fc_replay+0x1794/0x2000
> [c000000015db7580] c000000000833f7c do_one_pass+0xe9c/0x12a0
> [c000000015db7710] c000000000834504 jbd2_journal_recover+0x184/0x2d0
> [c000000015db77c0] c000000000841398 jbd2_journal_load+0x188/0x4a0
> [c000000015db7880] c000000000804de8 ext4_fill_super+0x2638/0x3e10
> [c000000015db7a40] c0000000005f8404 get_tree_bdev+0x2b4/0x350
> [c000000015db7ae0] c0000000007ef058 ext4_get_tree+0x28/0x40
> [c000000015db7b00] c0000000005f6344 vfs_get_tree+0x44/0x100
> [c000000015db7b70] c00000000063c408 path_mount+0xdd8/0xe70
> [c000000015db7c40] c00000000063c8f0 sys_mount+0x450/0x550
> [c000000015db7d50] c000000000035770 system_call_exception+0x4a0/0x4e0
> [c000000015db7e10] c00000000000c74c system_call_common+0xec/0x250
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

Just two nits below. Otherwise feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

> ---
>  fs/ext4/mballoc.c | 131 +++++++++++++++++++++++++++-------------------
>  1 file changed, 76 insertions(+), 55 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 2f117ce3bb73..d0bd51b1e1ad 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -3901,72 +3901,93 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
>  	ext4_grpblk_t blkoff;
>  	int i, err;
>  	int already;
> -	unsigned int clen, clen_changed;
> +	unsigned int clen, clen_changed, thisgrp_len;
>  
> -	clen = EXT4_NUM_B2C(sbi, len);
> -
> -	ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
> -	bitmap_bh = ext4_read_block_bitmap(sb, group);
> -	if (IS_ERR(bitmap_bh)) {
> -		err = PTR_ERR(bitmap_bh);
> -		bitmap_bh = NULL;
> -		goto out_err;
> -	}
> -
> -	err = -EIO;
> -	gdp = ext4_get_group_desc(sb, group, &gdp_bh);
> -	if (!gdp)
> -		goto out_err;
> +	while (len > 0) {
> +		ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
>  
> -	ext4_lock_group(sb, group);
> -	already = 0;
> -	for (i = 0; i < clen; i++)
> -		if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) == !state)
> -			already++;
> -
> -	clen_changed = clen - already;
> -	if (state)
> -		ext4_set_bits(bitmap_bh->b_data, blkoff, clen);
> -	else
> -		mb_test_and_clear_bits(bitmap_bh->b_data, blkoff, clen);
> -	if (ext4_has_group_desc_csum(sb) &&
> -	    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
> -		gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
> -		ext4_free_group_clusters_set(sb, gdp,
> -					     ext4_free_clusters_after_init(sb,
> -						group, gdp));
> -	}
> -	if (state)
> -		clen = ext4_free_group_clusters(sb, gdp) - clen_changed;
> -	else
> -		clen = ext4_free_group_clusters(sb, gdp) + clen_changed;
> +		/*
> +		 * Check to see if we are freeing blocks across a group
> +		 * boundary.
> +		 * In case of flex_bg, this can happen that (block, len) may
> +		 * span across more than one group. In that case we need to
> +		 * get the corresponding group metadata to work with.
> +		 * For this we have goto again loop.
> +		 */
> +		thisgrp_len = min_t(unsigned int, (unsigned int)len,
> +			EXT4_BLOCKS_PER_GROUP(sb) - EXT4_C2B(sbi, blkoff));
> +		clen = EXT4_NUM_B2C(sbi, thisgrp_len);
>  
> -	ext4_free_group_clusters_set(sb, gdp, clen);
> -	ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh);
> -	ext4_group_desc_csum_set(sb, group, gdp);
> +		bitmap_bh = ext4_read_block_bitmap(sb, group);
> +		if (IS_ERR(bitmap_bh)) {
> +			err = PTR_ERR(bitmap_bh);
> +			bitmap_bh = NULL;
> +			break;
> +		}
>  
> -	ext4_unlock_group(sb, group);
> +		err = -EIO;
> +		gdp = ext4_get_group_desc(sb, group, &gdp_bh);
> +		if (!gdp)
> +			break;
>  
> -	if (sbi->s_log_groups_per_flex) {
> -		ext4_group_t flex_group = ext4_flex_group(sbi, group);
> -		struct flex_groups *fg = sbi_array_rcu_deref(sbi,
> -					   s_flex_groups, flex_group);
> +		ext4_lock_group(sb, group);
> +		already = 0;
> +		for (i = 0; i < clen; i++)
> +			if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
> +					 !state)
> +				already++;
>  
> +		clen_changed = clen - already;
>  		if (state)
> -			atomic64_sub(clen_changed, &fg->free_clusters);
> +			ext4_set_bits(bitmap_bh->b_data, blkoff, clen);
>  		else
> -			atomic64_add(clen_changed, &fg->free_clusters);
> +			mb_test_and_clear_bits(bitmap_bh->b_data, blkoff, clen);
> +		if (ext4_has_group_desc_csum(sb) &&
> +		    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
> +			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
> +			ext4_free_group_clusters_set(sb, gdp,
> +			     ext4_free_clusters_after_init(sb, group, gdp));
> +		}
> +		if (state)
> +			clen = ext4_free_group_clusters(sb, gdp) - clen_changed;
> +		else
> +			clen = ext4_free_group_clusters(sb, gdp) + clen_changed;
> +
> +		ext4_free_group_clusters_set(sb, gdp, clen);
> +		ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh);
> +		ext4_group_desc_csum_set(sb, group, gdp);
> +
> +		ext4_unlock_group(sb, group);
> +
> +		if (sbi->s_log_groups_per_flex) {
> +			ext4_group_t flex_group = ext4_flex_group(sbi, group);
> +			struct flex_groups *fg = sbi_array_rcu_deref(sbi,
> +						   s_flex_groups, flex_group);
> +
> +			if (state)
> +				atomic64_sub(clen_changed, &fg->free_clusters);
> +			else
> +				atomic64_add(clen_changed, &fg->free_clusters);
> +
> +		}
> +
> +		err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
> +		if (err)
> +			break;
> +		sync_dirty_buffer(bitmap_bh);
> +		err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
> +		sync_dirty_buffer(gdp_bh);
> +		if (err)
> +			break;
> +
> +		block += thisgrp_len;
> +		len = len - thisgrp_len;
		^^^ Maybe: len -= thisgrp_len;

> +		put_bh(bitmap_bh);
		^^ brelse() would be more usual here...


								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
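
The control flow of the reworked ext4_mb_mark_bb() loop (advance block by thisgrp_len and decrement len until it reaches zero) can be sketched in userspace C as below. This is only an illustration under simplifying assumptions: split_by_group() and struct group_chunk are hypothetical, the cluster ratio is taken as 1 and s_first_data_block as 0 (ext4_get_group_no_and_offset() accounts for both in the kernel), and no bitmaps are read or modified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One per-group piece of a (block, len) range. */
struct group_chunk {
	uint64_t group;		/* block group number */
	uint64_t offset;	/* offset of the first block within that group */
	uint64_t len;		/* number of blocks that fall in this group */
};

/*
 * Split [block, block + len) into per-group chunks, advancing by
 * thisgrp_len each pass. Returns the number of chunks written to out
 * (at most max_out).
 */
static size_t split_by_group(uint64_t block, uint64_t len,
			     uint64_t blocks_per_group,
			     struct group_chunk *out, size_t max_out)
{
	size_t n = 0;

	assert(blocks_per_group > 0);
	while (len > 0 && n < max_out) {
		uint64_t group = block / blocks_per_group;
		uint64_t off = block % blocks_per_group;
		uint64_t room = blocks_per_group - off;	/* blocks left in group */
		uint64_t thisgrp_len = len < room ? len : room;

		out[n].group = group;
		out[n].offset = off;
		out[n].len = thisgrp_len;
		n++;

		block += thisgrp_len;
		len -= thisgrp_len;
	}
	return n;
}
```

For example, with 32768 blocks per group, a range starting at block 32760 with length 20 splits into 8 blocks at the end of group 0 and 12 blocks at the start of group 1; the unpatched code would have tested bits past the end of group 0's bitmap instead.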


* Re: [PATCHv1 5/9] ext4: Rename ext4_set_bits to mb_set_bits
  2022-02-05 14:09 ` [PATCHv1 5/9] ext4: Rename ext4_set_bits to mb_set_bits Ritesh Harjani
@ 2022-02-07 16:38   ` Jan Kara
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2022-02-07 16:38 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, linux-fsdevel, Theodore Ts'o, Jan Kara,
	Harshad Shirwadkar

On Sat 05-02-22 19:39:54, Ritesh Harjani wrote:
> ext4_set_bits() should be named mb_set_bits() to follow a uniform API
> naming convention.
> This was done via the command below:
> 
> grep -nr "ext4_set_bits" fs/ext4/ | cut -d ":" -f 1 | xargs sed -i 's/ext4_set_bits/mb_set_bits/g'
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/ext4.h    |  2 +-
>  fs/ext4/mballoc.c | 14 +++++++-------
>  fs/ext4/resize.c  |  4 ++--
>  3 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 09d8f60ebf0f..8c1d0e352f47 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1279,7 +1279,7 @@ struct ext4_inode_info {
>  #define ext4_find_next_zero_bit		find_next_zero_bit_le
>  #define ext4_find_next_bit		find_next_bit_le
> 
> -extern void ext4_set_bits(void *bm, int cur, int len);
> +extern void mb_set_bits(void *bm, int cur, int len);
> 
>  /*
>   * Maximal mount counts between two filesystem checks
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 91058f81a0c6..f80af108d05e 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1689,7 +1689,7 @@ static int mb_test_and_clear_bits(void *bm, int cur, int len)
>  	return zero_bit;
>  }
> 
> -void ext4_set_bits(void *bm, int cur, int len)
> +void mb_set_bits(void *bm, int cur, int len)
>  {
>  	__u32 *addr;
> 
> @@ -1996,7 +1996,7 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>  	mb_set_largest_free_order(e4b->bd_sb, e4b->bd_info);
> 
>  	mb_update_avg_fragment_size(e4b->bd_sb, e4b->bd_info);
> -	ext4_set_bits(e4b->bd_bitmap, ex->fe_start, len0);
> +	mb_set_bits(e4b->bd_bitmap, ex->fe_start, len0);
>  	mb_check_buddy(e4b);
> 
>  	return ret;
> @@ -3825,7 +3825,7 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
>  		 * We leak some of the blocks here.
>  		 */
>  		ext4_lock_group(sb, ac->ac_b_ex.fe_group);
> -		ext4_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
> +		mb_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
>  			      ac->ac_b_ex.fe_len);
>  		ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
>  		err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
> @@ -3844,7 +3844,7 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
>  		}
>  	}
>  #endif
> -	ext4_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
> +	mb_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
>  		      ac->ac_b_ex.fe_len);
>  	if (ext4_has_group_desc_csum(sb) &&
>  	    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
> @@ -3939,7 +3939,7 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
> 
>  		clen_changed = clen - already;
>  		if (state)
> -			ext4_set_bits(bitmap_bh->b_data, blkoff, clen);
> +			mb_set_bits(bitmap_bh->b_data, blkoff, clen);
>  		else
>  			mb_test_and_clear_bits(bitmap_bh->b_data, blkoff, clen);
>  		if (ext4_has_group_desc_csum(sb) &&
> @@ -4459,7 +4459,7 @@ static void ext4_mb_generate_from_freelist(struct super_block *sb, void *bitmap,
> 
>  	while (n) {
>  		entry = rb_entry(n, struct ext4_free_data, efd_node);
> -		ext4_set_bits(bitmap, entry->efd_start_cluster, entry->efd_count);
> +		mb_set_bits(bitmap, entry->efd_start_cluster, entry->efd_count);
>  		n = rb_next(n);
>  	}
>  	return;
> @@ -4500,7 +4500,7 @@ void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap,
>  		if (unlikely(len == 0))
>  			continue;
>  		BUG_ON(groupnr != group);
> -		ext4_set_bits(bitmap, start, len);
> +		mb_set_bits(bitmap, start, len);
>  		preallocated += len;
>  	}
>  	mb_debug(sb, "preallocated %d for group %u\n", preallocated, group);
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index ee8f02f406cb..f507f34be602 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -483,7 +483,7 @@ static int set_flexbg_block_bitmap(struct super_block *sb, handle_t *handle,
>  		}
>  		ext4_debug("mark block bitmap %#04llx (+%llu/%u)\n",
>  			   first_cluster, first_cluster - start, count2);
> -		ext4_set_bits(bh->b_data, first_cluster - start, count2);
> +		mb_set_bits(bh->b_data, first_cluster - start, count2);
> 
>  		err = ext4_handle_dirty_metadata(handle, NULL, bh);
>  		brelse(bh);
> @@ -632,7 +632,7 @@ static int setup_new_flex_group_blocks(struct super_block *sb,
>  		if (overhead != 0) {
>  			ext4_debug("mark backup superblock %#04llx (+0)\n",
>  				   start);
> -			ext4_set_bits(bh->b_data, 0,
> +			mb_set_bits(bh->b_data, 0,
>  				      EXT4_NUM_B2C(sbi, overhead));
>  		}
>  		ext4_mark_bitmap_end(EXT4_B2C(sbi, group_data[i].blocks_count),
> --
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCHv1 7/9] ext4: Add ext4_sb_block_valid() refactored out of ext4_inode_block_valid()
  2022-02-05 14:09 ` [PATCHv1 7/9] ext4: Add ext4_sb_block_valid() refactored out of ext4_inode_block_valid() Ritesh Harjani
@ 2022-02-07 16:42   ` Jan Kara
  2022-02-08  3:03     ` Ritesh Harjani
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2022-02-07 16:42 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, linux-fsdevel, Theodore Ts'o, Jan Kara,
	Harshad Shirwadkar

On Sat 05-02-22 19:39:56, Ritesh Harjani wrote:
> This API will be needed in places where we don't have an inode,
> e.g. while freeing blocks in ext4_group_add_blocks().
> 
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

...

> @@ -329,7 +324,8 @@ int ext4_inode_block_valid(struct inode *inode, ext4_fsblk_t start_blk,
>  		else if (start_blk >= (entry->start_blk + entry->count))
>  			n = n->rb_right;
>  		else {
> -			ret = (entry->ino == inode->i_ino);
> +			if (inode)
> +				ret = (entry->ino == inode->i_ino);
>  			break;

In case inode is not passed, we must not overlap any entry in the rbtree.
So we should return 0, not 1.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCHv1 8/9] ext4: Add strict range checks while freeing blocks
  2022-02-05 14:09 ` [PATCHv1 8/9] ext4: Add strict range checks while freeing blocks Ritesh Harjani
@ 2022-02-07 16:44   ` Jan Kara
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2022-02-07 16:44 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, linux-fsdevel, Theodore Ts'o, Jan Kara,
	Harshad Shirwadkar

On Sat 05-02-22 19:39:57, Ritesh Harjani wrote:
> Currently ext4_mb_clear_bb() & ext4_group_add_blocks() only check
> whether the given block range (which is to be freed) overlaps the FS
> metadata blocks of the block's own block group.
> To detect FS errors early, it is better to add stricter checks in
> those functions that verify whether the given blocks belong to any
> critical FS metadata within the system zone.
> 
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/mballoc.c | 16 +++-------------
>  1 file changed, 3 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 23313963bb56..9f2b3a057918 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -5930,13 +5930,7 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>  		goto error_return;
>  	}
>  
> -	if (in_range(ext4_block_bitmap(sb, gdp), block, count) ||
> -	    in_range(ext4_inode_bitmap(sb, gdp), block, count) ||
> -	    in_range(block, ext4_inode_table(sb, gdp),
> -		     sbi->s_itb_per_group) ||
> -	    in_range(block + count - 1, ext4_inode_table(sb, gdp),
> -		     sbi->s_itb_per_group)) {
> -
> +	if (!ext4_inode_block_valid(inode, block, count)) {
>  		ext4_error(sb, "Freeing blocks in system zone - "
>  			   "Block = %llu, count = %lu", block, count);
>  		/* err = 0. ext4_std_error should be a no op */
> @@ -6007,7 +6001,7 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>  						 NULL);
>  			if (err && err != -EOPNOTSUPP)
>  				ext4_msg(sb, KERN_WARNING, "discard request in"
> -					 " group:%d block:%d count:%lu failed"
> +					 " group:%u block:%d count:%lu failed"
>  					 " with %d", block_group, bit, count,
>  					 err);
>  		} else
> @@ -6220,11 +6214,7 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
>  		goto error_return;
>  	}
>  
> -	if (in_range(ext4_block_bitmap(sb, desc), block, count) ||
> -	    in_range(ext4_inode_bitmap(sb, desc), block, count) ||
> -	    in_range(block, ext4_inode_table(sb, desc), sbi->s_itb_per_group) ||
> -	    in_range(block + count - 1, ext4_inode_table(sb, desc),
> -		     sbi->s_itb_per_group)) {
> +	if (!ext4_sb_block_valid(sb, NULL, block, count)) {
>  		ext4_error(sb, "Adding blocks in system zones - "
>  			   "Block = %llu, count = %lu",
>  			   block, count);
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCHv1 9/9] ext4: Add extra check in ext4_mb_mark_bb() to prevent against possible corruption
  2022-02-05 14:09 ` [PATCHv1 9/9] ext4: Add extra check in ext4_mb_mark_bb() to prevent against possible corruption Ritesh Harjani
@ 2022-02-07 16:45   ` Jan Kara
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2022-02-07 16:45 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, linux-fsdevel, Theodore Ts'o, Jan Kara,
	Harshad Shirwadkar

On Sat 05-02-22 19:39:58, Ritesh Harjani wrote:
> This patch adds an extra check in ext4_mb_mark_bb() to make sure we
> report an error and bail out if we were about to mark/clear the bitmap
> bits of any critical FS metadata blocks, to prevent accidental
> corruption.
> 
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/mballoc.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 9f2b3a057918..75c20a10529a 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -3918,6 +3918,14 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
>  			EXT4_BLOCKS_PER_GROUP(sb) - EXT4_C2B(sbi, blkoff));
>  		clen = EXT4_NUM_B2C(sbi, thisgrp_len);
>  
> +		if (!ext4_sb_block_valid(sb, NULL, block, thisgrp_len)) {
> +			ext4_error(sb, "Marking blocks in system zone - "
> +				   "Block = %llu, len = %u",
> +				   block, thisgrp_len);
> +			bitmap_bh = NULL;
> +			break;
> +		}
> +
>  		bitmap_bh = ext4_read_block_bitmap(sb, group);
>  		if (IS_ERR(bitmap_bh)) {
>  			err = PTR_ERR(bitmap_bh);
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCHv1 7/9] ext4: Add ext4_sb_block_valid() refactored out of ext4_inode_block_valid()
  2022-02-07 16:42   ` Jan Kara
@ 2022-02-08  3:03     ` Ritesh Harjani
  0 siblings, 0 replies; 19+ messages in thread
From: Ritesh Harjani @ 2022-02-08  3:03 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, linux-fsdevel, Theodore Ts'o, Harshad Shirwadkar

On 22/02/07 05:42PM, Jan Kara wrote:
> On Sat 05-02-22 19:39:56, Ritesh Harjani wrote:
> > This API will be needed in places where we don't have an inode,
> > e.g. while freeing blocks in ext4_group_add_blocks().
> >
> > Suggested-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
>
> ...
>
> > @@ -329,7 +324,8 @@ int ext4_inode_block_valid(struct inode *inode, ext4_fsblk_t start_blk,
> >  		else if (start_blk >= (entry->start_blk + entry->count))
> >  			n = n->rb_right;
> >  		else {
> > -			ret = (entry->ino == inode->i_ino);
> > +			if (inode)
> > +				ret = (entry->ino == inode->i_ino);
> >  			break;
>
> In case inode is not passed, we must not overlap any entry in the rbtree.
> So we should return 0, not 1.
>
Damn! Thanks for catching that. I don't know how I missed it.
Will make the change below then.
	else {
		ret = 0;
		if (inode)
			ret = (entry->ino == inode->i_ino);
		break;
	}

-riteshh

> 								Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
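
To make the corrected return-value semantics concrete, here is a simplified sketch of the lookup. It is not the kernel code: struct zone and sb_block_valid() are hypothetical, a linear scan stands in for the rbtree walk, a NULL ino pointer plays the role of the NULL inode argument, and rejecting empty or wrapping ranges is a simplification for this sketch. The overlap comparisons match the ones quoted above, and an overlap counts as valid only when an inode is given and owns the overlapping zone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified system-zone entry. */
struct zone {
	uint64_t start_blk;	/* first metadata block of the zone */
	uint64_t count;		/* number of blocks in the zone */
	uint32_t ino;		/* owning inode number, 0 if none (sketch convention) */
};

/*
 * Returns 1 if [start_blk, start_blk + count) is valid, 0 otherwise.
 * ino == NULL models calling ext4_sb_block_valid() with a NULL inode.
 */
static int sb_block_valid(const struct zone *zones, size_t nzones,
			  uint64_t start_blk, uint64_t count,
			  const uint32_t *ino)
{
	size_t i;

	assert(zones != NULL || nzones == 0);

	/* Sketch-only simplification: reject empty or wrapping ranges. */
	if (count == 0 || start_blk + count < start_blk)
		return 0;

	for (i = 0; i < nzones; i++) {
		const struct zone *e = &zones[i];

		if (start_blk + count - 1 < e->start_blk)
			continue;	/* range ends before this zone */
		if (start_blk >= e->start_blk + e->count)
			continue;	/* range starts after this zone */

		/* Overlap: valid only if an inode was passed and owns the zone. */
		return ino ? (e->ino == *ino) : 0;
	}
	return 1;	/* no overlap with any system zone */
}
```

With zones { 100, 10, 0 } and { 200, 5, 12 }, checking blocks 105..106 with a NULL inode now yields 0, as Jan requested; before the fix, ret would have kept its initial value of 1 in that case.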


* Re: [PATCHv1 2/9] ext4: Fixes ext4_mb_mark_bb() with flex_bg with fast_commit
  2022-02-07 16:37   ` Jan Kara
@ 2022-02-08  3:11     ` Ritesh Harjani
  2022-02-08  9:59       ` Jan Kara
  0 siblings, 1 reply; 19+ messages in thread
From: Ritesh Harjani @ 2022-02-08  3:11 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, linux-fsdevel, Theodore Ts'o, Harshad Shirwadkar

On 22/02/07 05:37PM, Jan Kara wrote:
> On Sat 05-02-22 19:39:51, Ritesh Harjani wrote:
> > With the flex_bg feature (which is enabled by default), extents for
> > any given inode might span blocks from two different block groups.
> > ext4_mb_mark_bb() only reads the block bitmap buffer_head once, for the
> > starting block group, and fails to read it again when the extent
> > crosses into another block group. The loop below then accesses memory
> > beyond the block group bitmap buffer_head, resulting in a data abort.
> >
> > 	for (i = 0; i < clen; i++)
> > 		if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) == !state)
> > 			already++;
> >
> > This patch adds a block group boundary check to ext4_mb_mark_bb() and
> > re-reads the buffer_head (bitmap_bh) for each block group the range
> > touches.
> >
> > Without this patch, I was easily able to hit a data access abort on the
> > Power platform.
> >
> > <...>
> > [   74.327662] EXT4-fs error (device loop3): ext4_mb_generate_buddy:1141: group 11, block bitmap and bg descriptor inconsistent: 21248 vs 23294 free clusters
> > [   74.533214] EXT4-fs (loop3): shut down requested (2)
> > [   74.536705] Aborting journal on device loop3-8.
> > [   74.702705] BUG: Unable to handle kernel data access on read at 0xc00000005e980000
> > [   74.703727] Faulting instruction address: 0xc0000000007bffb8
> > cpu 0xd: Vector: 300 (Data Access) at [c000000015db7060]
> >     pc: c0000000007bffb8: ext4_mb_mark_bb+0x198/0x5a0
> >     lr: c0000000007bfeec: ext4_mb_mark_bb+0xcc/0x5a0
> >     sp: c000000015db7300
> >    msr: 800000000280b033
> >    dar: c00000005e980000
> >  dsisr: 40000000
> >   current = 0xc000000027af6880
> >   paca    = 0xc00000003ffd5200   irqmask: 0x03   irq_happened: 0x01
> >     pid   = 5167, comm = mount
> > <...>
> > enter ? for help
> > [c000000015db7380] c000000000782708 ext4_ext_clear_bb+0x378/0x410
> > [c000000015db7400] c000000000813f14 ext4_fc_replay+0x1794/0x2000
> > [c000000015db7580] c000000000833f7c do_one_pass+0xe9c/0x12a0
> > [c000000015db7710] c000000000834504 jbd2_journal_recover+0x184/0x2d0
> > [c000000015db77c0] c000000000841398 jbd2_journal_load+0x188/0x4a0
> > [c000000015db7880] c000000000804de8 ext4_fill_super+0x2638/0x3e10
> > [c000000015db7a40] c0000000005f8404 get_tree_bdev+0x2b4/0x350
> > [c000000015db7ae0] c0000000007ef058 ext4_get_tree+0x28/0x40
> > [c000000015db7b00] c0000000005f6344 vfs_get_tree+0x44/0x100
> > [c000000015db7b70] c00000000063c408 path_mount+0xdd8/0xe70
> > [c000000015db7c40] c00000000063c8f0 sys_mount+0x450/0x550
> > [c000000015db7d50] c000000000035770 system_call_exception+0x4a0/0x4e0
> > [c000000015db7e10] c00000000000c74c system_call_common+0xec/0x250
> >
> > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
>
> Just two nits below. Otherwise feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
> > ---
> >  fs/ext4/mballoc.c | 131 +++++++++++++++++++++++++++-------------------
> >  1 file changed, 76 insertions(+), 55 deletions(-)
> >
> > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> > index 2f117ce3bb73..d0bd51b1e1ad 100644
> > --- a/fs/ext4/mballoc.c
> > +++ b/fs/ext4/mballoc.c
> > @@ -3901,72 +3901,93 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
> >  	ext4_grpblk_t blkoff;
> >  	int i, err;
> >  	int already;
> > -	unsigned int clen, clen_changed;
> > +	unsigned int clen, clen_changed, thisgrp_len;
> >
> > -	clen = EXT4_NUM_B2C(sbi, len);
> > -
> > -	ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
> > -	bitmap_bh = ext4_read_block_bitmap(sb, group);
> > -	if (IS_ERR(bitmap_bh)) {
> > -		err = PTR_ERR(bitmap_bh);
> > -		bitmap_bh = NULL;
> > -		goto out_err;
> > -	}
> > -
> > -	err = -EIO;
> > -	gdp = ext4_get_group_desc(sb, group, &gdp_bh);
> > -	if (!gdp)
> > -		goto out_err;
> > +	while (len > 0) {
> > +		ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
> >
> > -	ext4_lock_group(sb, group);
> > -	already = 0;
> > -	for (i = 0; i < clen; i++)
> > -		if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) == !state)
> > -			already++;
> > -
> > -	clen_changed = clen - already;
> > -	if (state)
> > -		ext4_set_bits(bitmap_bh->b_data, blkoff, clen);
> > -	else
> > -		mb_test_and_clear_bits(bitmap_bh->b_data, blkoff, clen);
> > -	if (ext4_has_group_desc_csum(sb) &&
> > -	    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
> > -		gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
> > -		ext4_free_group_clusters_set(sb, gdp,
> > -					     ext4_free_clusters_after_init(sb,
> > -						group, gdp));
> > -	}
> > -	if (state)
> > -		clen = ext4_free_group_clusters(sb, gdp) - clen_changed;
> > -	else
> > -		clen = ext4_free_group_clusters(sb, gdp) + clen_changed;
> > +		/*
> > +		 * Check to see if we are freeing blocks across a group
> > +		 * boundary.
> > +		 * In case of flex_bg, this can happen that (block, len) may
> > +		 * span across more than one group. In that case we need to
> > +		 * get the corresponding group metadata to work with.
> > +		 * For this we have goto again loop.
> > +		 */
> > +		thisgrp_len = min_t(unsigned int, (unsigned int)len,
> > +			EXT4_BLOCKS_PER_GROUP(sb) - EXT4_C2B(sbi, blkoff));
> > +		clen = EXT4_NUM_B2C(sbi, thisgrp_len);
> >
> > -	ext4_free_group_clusters_set(sb, gdp, clen);
> > -	ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh);
> > -	ext4_group_desc_csum_set(sb, group, gdp);
> > +		bitmap_bh = ext4_read_block_bitmap(sb, group);
> > +		if (IS_ERR(bitmap_bh)) {
> > +			err = PTR_ERR(bitmap_bh);
> > +			bitmap_bh = NULL;
> > +			break;
> > +		}
> >
> > -	ext4_unlock_group(sb, group);
> > +		err = -EIO;
> > +		gdp = ext4_get_group_desc(sb, group, &gdp_bh);
> > +		if (!gdp)
> > +			break;
> >
> > -	if (sbi->s_log_groups_per_flex) {
> > -		ext4_group_t flex_group = ext4_flex_group(sbi, group);
> > -		struct flex_groups *fg = sbi_array_rcu_deref(sbi,
> > -					   s_flex_groups, flex_group);
> > +		ext4_lock_group(sb, group);
> > +		already = 0;
> > +		for (i = 0; i < clen; i++)
> > +			if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
> > +					 !state)
> > +				already++;
> >
> > +		clen_changed = clen - already;
> >  		if (state)
> > -			atomic64_sub(clen_changed, &fg->free_clusters);
> > +			ext4_set_bits(bitmap_bh->b_data, blkoff, clen);
> >  		else
> > -			atomic64_add(clen_changed, &fg->free_clusters);
> > +			mb_test_and_clear_bits(bitmap_bh->b_data, blkoff, clen);
> > +		if (ext4_has_group_desc_csum(sb) &&
> > +		    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
> > +			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
> > +			ext4_free_group_clusters_set(sb, gdp,
> > +			     ext4_free_clusters_after_init(sb, group, gdp));
> > +		}
> > +		if (state)
> > +			clen = ext4_free_group_clusters(sb, gdp) - clen_changed;
> > +		else
> > +			clen = ext4_free_group_clusters(sb, gdp) + clen_changed;
> > +
> > +		ext4_free_group_clusters_set(sb, gdp, clen);
> > +		ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh);
> > +		ext4_group_desc_csum_set(sb, group, gdp);
> > +
> > +		ext4_unlock_group(sb, group);
> > +
> > +		if (sbi->s_log_groups_per_flex) {
> > +			ext4_group_t flex_group = ext4_flex_group(sbi, group);
> > +			struct flex_groups *fg = sbi_array_rcu_deref(sbi,
> > +						   s_flex_groups, flex_group);
> > +
> > +			if (state)
> > +				atomic64_sub(clen_changed, &fg->free_clusters);
> > +			else
> > +				atomic64_add(clen_changed, &fg->free_clusters);
> > +
> > +		}
> > +
> > +		err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
> > +		if (err)
> > +			break;
> > +		sync_dirty_buffer(bitmap_bh);
> > +		err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
> > +		sync_dirty_buffer(gdp_bh);
> > +		if (err)
> > +			break;
> > +
> > +		block += thisgrp_len;
> > +		len = len - thisgrp_len;
> 		^^^ Maybe: len -= thisgrp_len;
>
> > +		put_bh(bitmap_bh);
> 		^^ brelse() would be more usual here...

Sure, I will make the above two changes.

Btw, are there any general rules for when to use put_bh() vs. brelse()?

I used put_bh() above on the assumption that, in the non-error path of
the loop, ext4_read_block_bitmap() returns the bh with b_count elevated,
so we can simply drop that reference with put_bh() at the end.

But when an error may have occurred somewhere in between, it is safer
to use brelse().
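A minimal userspace sketch of that reasoning follows. It is hypothetical: mock_bh, read_bitmap(), mark_range(), BLOCKS_PER_GROUP and BAD_DESC_GROUP are illustrative stand-ins rather than kernel APIs, and it assumes a cluster ratio of 1 and s_first_data_block of 0. It shows (block, len) being clamped to one block group per iteration, like thisgrp_len in the patch, with put_bh() on the success path (a reference is known to be held) and brelse() on the error path (bitmap_bh may be NULL or may still hold a reference).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the per-group loop in ext4_mb_mark_bb().
 * Assumptions: cluster ratio 1, s_first_data_block 0, and "reading" a
 * block bitmap only bumps a reference count.
 */
#define BLOCKS_PER_GROUP 32768u
#define NGROUPS 8u
#define BAD_DESC_GROUP 5u	/* pretend the group descriptor lookup fails here */

struct mock_bh {
	int b_count;
};

static struct mock_bh bitmaps[NGROUPS];

/* Like ext4_read_block_bitmap(): returns the bh with b_count elevated. */
static struct mock_bh *read_bitmap(unsigned int group)
{
	if (group >= NGROUPS)
		return NULL;		/* simulated read failure */
	bitmaps[group].b_count++;
	return &bitmaps[group];
}

static void mock_put_bh(struct mock_bh *bh)
{
	bh->b_count--;		/* caller must hold a reference */
}

static void mock_brelse(struct mock_bh *bh)
{
	if (bh)			/* NULL is tolerated on error paths */
		mock_put_bh(bh);
}

/* Returns the number of per-group chunks handled, or -1 on error. */
static int mark_range(unsigned long long block, unsigned int len)
{
	struct mock_bh *bitmap_bh = NULL;
	int chunks = 0, err = 0;

	while (len > 0) {
		unsigned int group = (unsigned int)(block / BLOCKS_PER_GROUP);
		unsigned int blkoff = (unsigned int)(block % BLOCKS_PER_GROUP);
		unsigned int room = BLOCKS_PER_GROUP - blkoff;
		/* Clamp to the end of this group, like thisgrp_len. */
		unsigned int thisgrp_len = len < room ? len : room;

		bitmap_bh = read_bitmap(group);
		if (!bitmap_bh) {
			err = -1;	/* nothing held: bitmap_bh is NULL */
			break;
		}
		if (group == BAD_DESC_GROUP) {
			err = -1;	/* a reference is still held */
			break;
		}

		/* ... mark bits blkoff .. blkoff + thisgrp_len - 1 here ... */
		chunks++;
		assert(thisgrp_len > 0 && thisgrp_len <= len);
		block += thisgrp_len;
		len -= thisgrp_len;
		mock_put_bh(bitmap_bh);	/* success path: ref known held */
	}

	if (err)
		mock_brelse(bitmap_bh);	/* NULL or held, both handled */
	return err ? err : chunks;
}
```

For example, mark_range(BLOCKS_PER_GROUP - 40, 100) is split into a 40-block chunk in group 0 and a 60-block chunk in group 1, and every bitmap reference taken along the way is dropped again, including on the two simulated error paths.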

-ritesh

>
>
> 								Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR


* Re: [PATCHv1 2/9] ext4: Fixes ext4_mb_mark_bb() with flex_bg with fast_commit
  2022-02-08  3:11     ` Ritesh Harjani
@ 2022-02-08  9:59       ` Jan Kara
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2022-02-08  9:59 UTC
  To: Ritesh Harjani
  Cc: Jan Kara, linux-ext4, linux-fsdevel, Theodore Ts'o,
	Harshad Shirwadkar

On Tue 08-02-22 08:41:07, Ritesh Harjani wrote:
> On 22/02/07 05:37PM, Jan Kara wrote:
> > On Sat 05-02-22 19:39:51, Ritesh Harjani wrote:
> > > With the flex_bg feature (which is enabled by default), the extents of
> > > a given inode may span blocks from two different block groups.
> > > ext4_mb_mark_bb() reads the block bitmap buffer_head only once, for the
> > > starting block group, and does not re-read it when the extent crosses
> > > into the next block group. The loop below then accesses memory beyond
> > > the block group bitmap buffer_head, resulting in a data abort.
> > >
> > > 	for (i = 0; i < clen; i++)
> > > 		if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) == !state)
> > > 			already++;
> > >
> > > This patch adds a block group boundary check in ext4_mb_mark_bb() and
> > > reads the block bitmap buffer_head (bitmap_bh) for every block group
> > > that the range spans.
> > >
> > > Without this patch, I was easily able to hit a data access abort on the Power platform.
> > >
> > > <...>
> > > [   74.327662] EXT4-fs error (device loop3): ext4_mb_generate_buddy:1141: group 11, block bitmap and bg descriptor inconsistent: 21248 vs 23294 free clusters
> > > [   74.533214] EXT4-fs (loop3): shut down requested (2)
> > > [   74.536705] Aborting journal on device loop3-8.
> > > [   74.702705] BUG: Unable to handle kernel data access on read at 0xc00000005e980000
> > > [   74.703727] Faulting instruction address: 0xc0000000007bffb8
> > > cpu 0xd: Vector: 300 (Data Access) at [c000000015db7060]
> > >     pc: c0000000007bffb8: ext4_mb_mark_bb+0x198/0x5a0
> > >     lr: c0000000007bfeec: ext4_mb_mark_bb+0xcc/0x5a0
> > >     sp: c000000015db7300
> > >    msr: 800000000280b033
> > >    dar: c00000005e980000
> > >  dsisr: 40000000
> > >   current = 0xc000000027af6880
> > >   paca    = 0xc00000003ffd5200   irqmask: 0x03   irq_happened: 0x01
> > >     pid   = 5167, comm = mount
> > > <...>
> > > enter ? for help
> > > [c000000015db7380] c000000000782708 ext4_ext_clear_bb+0x378/0x410
> > > [c000000015db7400] c000000000813f14 ext4_fc_replay+0x1794/0x2000
> > > [c000000015db7580] c000000000833f7c do_one_pass+0xe9c/0x12a0
> > > [c000000015db7710] c000000000834504 jbd2_journal_recover+0x184/0x2d0
> > > [c000000015db77c0] c000000000841398 jbd2_journal_load+0x188/0x4a0
> > > [c000000015db7880] c000000000804de8 ext4_fill_super+0x2638/0x3e10
> > > [c000000015db7a40] c0000000005f8404 get_tree_bdev+0x2b4/0x350
> > > [c000000015db7ae0] c0000000007ef058 ext4_get_tree+0x28/0x40
> > > [c000000015db7b00] c0000000005f6344 vfs_get_tree+0x44/0x100
> > > [c000000015db7b70] c00000000063c408 path_mount+0xdd8/0xe70
> > > [c000000015db7c40] c00000000063c8f0 sys_mount+0x450/0x550
> > > [c000000015db7d50] c000000000035770 system_call_exception+0x4a0/0x4e0
> > > [c000000015db7e10] c00000000000c74c system_call_common+0xec/0x250
> > >
> > > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> >
> > Just two nits below. Otherwise feel free to add:
> >
> > Reviewed-by: Jan Kara <jack@suse.cz>
> >
> > > ---
> > >  fs/ext4/mballoc.c | 131 +++++++++++++++++++++++++++-------------------
> > >  1 file changed, 76 insertions(+), 55 deletions(-)
> > >
> > > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> > > index 2f117ce3bb73..d0bd51b1e1ad 100644
> > > --- a/fs/ext4/mballoc.c
> > > +++ b/fs/ext4/mballoc.c
> > > @@ -3901,72 +3901,93 @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
> > >  	ext4_grpblk_t blkoff;
> > >  	int i, err;
> > >  	int already;
> > > -	unsigned int clen, clen_changed;
> > > +	unsigned int clen, clen_changed, thisgrp_len;
> > >
> > > -	clen = EXT4_NUM_B2C(sbi, len);
> > > -
> > > -	ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
> > > -	bitmap_bh = ext4_read_block_bitmap(sb, group);
> > > -	if (IS_ERR(bitmap_bh)) {
> > > -		err = PTR_ERR(bitmap_bh);
> > > -		bitmap_bh = NULL;
> > > -		goto out_err;
> > > -	}
> > > -
> > > -	err = -EIO;
> > > -	gdp = ext4_get_group_desc(sb, group, &gdp_bh);
> > > -	if (!gdp)
> > > -		goto out_err;
> > > +	while (len > 0) {
> > > +		ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
> > >
> > > -	ext4_lock_group(sb, group);
> > > -	already = 0;
> > > -	for (i = 0; i < clen; i++)
> > > -		if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) == !state)
> > > -			already++;
> > > -
> > > -	clen_changed = clen - already;
> > > -	if (state)
> > > -		ext4_set_bits(bitmap_bh->b_data, blkoff, clen);
> > > -	else
> > > -		mb_test_and_clear_bits(bitmap_bh->b_data, blkoff, clen);
> > > -	if (ext4_has_group_desc_csum(sb) &&
> > > -	    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
> > > -		gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
> > > -		ext4_free_group_clusters_set(sb, gdp,
> > > -					     ext4_free_clusters_after_init(sb,
> > > -						group, gdp));
> > > -	}
> > > -	if (state)
> > > -		clen = ext4_free_group_clusters(sb, gdp) - clen_changed;
> > > -	else
> > > -		clen = ext4_free_group_clusters(sb, gdp) + clen_changed;
> > > +		/*
> > > +		 * Check to see if we are freeing blocks across a group
> > > +		 * boundary.
> > > +		 * In case of flex_bg, this can happen that (block, len) may
> > > +		 * span across more than one group. In that case we need to
> > > +		 * get the corresponding group metadata to work with.
> > > +		 * For this we have goto again loop.
> > > +		 */
> > > +		thisgrp_len = min_t(unsigned int, (unsigned int)len,
> > > +			EXT4_BLOCKS_PER_GROUP(sb) - EXT4_C2B(sbi, blkoff));
> > > +		clen = EXT4_NUM_B2C(sbi, thisgrp_len);
> > >
> > > -	ext4_free_group_clusters_set(sb, gdp, clen);
> > > -	ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh);
> > > -	ext4_group_desc_csum_set(sb, group, gdp);
> > > +		bitmap_bh = ext4_read_block_bitmap(sb, group);
> > > +		if (IS_ERR(bitmap_bh)) {
> > > +			err = PTR_ERR(bitmap_bh);
> > > +			bitmap_bh = NULL;
> > > +			break;
> > > +		}
> > >
> > > -	ext4_unlock_group(sb, group);
> > > +		err = -EIO;
> > > +		gdp = ext4_get_group_desc(sb, group, &gdp_bh);
> > > +		if (!gdp)
> > > +			break;
> > >
> > > -	if (sbi->s_log_groups_per_flex) {
> > > -		ext4_group_t flex_group = ext4_flex_group(sbi, group);
> > > -		struct flex_groups *fg = sbi_array_rcu_deref(sbi,
> > > -					   s_flex_groups, flex_group);
> > > +		ext4_lock_group(sb, group);
> > > +		already = 0;
> > > +		for (i = 0; i < clen; i++)
> > > +			if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
> > > +					 !state)
> > > +				already++;
> > >
> > > +		clen_changed = clen - already;
> > >  		if (state)
> > > -			atomic64_sub(clen_changed, &fg->free_clusters);
> > > +			ext4_set_bits(bitmap_bh->b_data, blkoff, clen);
> > >  		else
> > > -			atomic64_add(clen_changed, &fg->free_clusters);
> > > +			mb_test_and_clear_bits(bitmap_bh->b_data, blkoff, clen);
> > > +		if (ext4_has_group_desc_csum(sb) &&
> > > +		    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
> > > +			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
> > > +			ext4_free_group_clusters_set(sb, gdp,
> > > +			     ext4_free_clusters_after_init(sb, group, gdp));
> > > +		}
> > > +		if (state)
> > > +			clen = ext4_free_group_clusters(sb, gdp) - clen_changed;
> > > +		else
> > > +			clen = ext4_free_group_clusters(sb, gdp) + clen_changed;
> > > +
> > > +		ext4_free_group_clusters_set(sb, gdp, clen);
> > > +		ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh);
> > > +		ext4_group_desc_csum_set(sb, group, gdp);
> > > +
> > > +		ext4_unlock_group(sb, group);
> > > +
> > > +		if (sbi->s_log_groups_per_flex) {
> > > +			ext4_group_t flex_group = ext4_flex_group(sbi, group);
> > > +			struct flex_groups *fg = sbi_array_rcu_deref(sbi,
> > > +						   s_flex_groups, flex_group);
> > > +
> > > +			if (state)
> > > +				atomic64_sub(clen_changed, &fg->free_clusters);
> > > +			else
> > > +				atomic64_add(clen_changed, &fg->free_clusters);
> > > +
> > > +		}
> > > +
> > > +		err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
> > > +		if (err)
> > > +			break;
> > > +		sync_dirty_buffer(bitmap_bh);
> > > +		err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
> > > +		sync_dirty_buffer(gdp_bh);
> > > +		if (err)
> > > +			break;
> > > +
> > > +		block += thisgrp_len;
> > > +		len = len - thisgrp_len;
> > 		^^^ Maybe: len -= thisgrp_len;
> >
> > > +		put_bh(bitmap_bh);
> > 		^^ brelse() would be more usual here...
> 
> Sure, I will make the above two changes.
> 
> Btw, are there any general rules for when to use put_bh() vs. brelse()?
> 
> I used put_bh() above on the assumption that, in the non-error path of
> the loop, ext4_read_block_bitmap() returns the bh with b_count elevated,
> so we can simply drop that reference with put_bh() at the end.
> 
> But when an error may have occurred somewhere in between, it is safer
> to use brelse().

The only difference between put_bh() and brelse() is the safety checks in
brelse(). So I generally use brelse() in higher-level code and put_bh() in
low-level code where the overhead of the additional checks could matter. But
I guess opinions can differ :).
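A rough userspace model of those safety checks is sketched below. It follows the structure of the kernel helpers in include/linux/buffer_head.h and fs/buffer.c: brelse() ignores a NULL bh and otherwise calls __brelse(), which warns instead of dropping a reference that is not held, while put_bh() simply decrements b_count. The mock_* names are hypothetical, and the real helpers use an atomic_t b_count plus a memory barrier, both omitted here.

```c
#include <stdio.h>

/*
 * Hypothetical userspace model of put_bh() and brelse(). The real
 * helpers operate on struct buffer_head with an atomic_t b_count.
 */
struct mock_bh {
	int b_count;
};

static int brelse_warnings;	/* counts "free buffer" warnings */

/* put_bh(): drop one reference unconditionally, no checks. */
static void mock_put_bh(struct mock_bh *bh)
{
	bh->b_count--;
}

/* __brelse(): only drop a reference that is actually held. */
static void mock__brelse(struct mock_bh *bh)
{
	if (bh->b_count) {
		mock_put_bh(bh);
		return;
	}
	brelse_warnings++;
	fprintf(stderr, "brelse: trying to free free buffer\n");
}

/* brelse(): also accept NULL, which keeps error paths simple. */
static void mock_brelse(struct mock_bh *bh)
{
	if (bh)
		mock__brelse(bh);
}
```

Releasing a buffer twice with mock_brelse() leaves b_count at 0 and records one warning, whereas a second mock_put_bh() would silently drive it to -1; that NULL test and count test are the extra overhead being weighed in low-level code.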

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


end of thread, other threads:[~2022-02-08 11:31 UTC | newest]

Thread overview: 19+ messages
2022-02-05 14:09 [PATCHv1 0/9] ext4: fast_commit fixes, stricter block checking & cleanups Ritesh Harjani
2022-02-05 14:09 ` [PATCHv1 1/9] ext4: Correct cluster len and clusters changed accounting in ext4_mb_mark_bb Ritesh Harjani
2022-02-07 15:28   ` Jan Kara
2022-02-05 14:09 ` [PATCHv1 2/9] ext4: Fixes ext4_mb_mark_bb() with flex_bg with fast_commit Ritesh Harjani
2022-02-07 16:37   ` Jan Kara
2022-02-08  3:11     ` Ritesh Harjani
2022-02-08  9:59       ` Jan Kara
2022-02-05 14:09 ` [PATCHv1 3/9] ext4: Refactor ext4_free_blocks() to pull out ext4_mb_clear_bb() Ritesh Harjani
2022-02-05 14:09 ` [PATCHv1 4/9] ext4: Use in_range() for range checking in ext4_fc_replay_check_excluded Ritesh Harjani
2022-02-05 14:09 ` [PATCHv1 5/9] ext4: Rename ext4_set_bits to mb_set_bits Ritesh Harjani
2022-02-07 16:38   ` Jan Kara
2022-02-05 14:09 ` [PATCHv1 6/9] ext4: No need to test for block bitmap bits in ext4_mb_mark_bb() Ritesh Harjani
2022-02-05 14:09 ` [PATCHv1 7/9] ext4: Add ext4_sb_block_valid() refactored out of ext4_inode_block_valid() Ritesh Harjani
2022-02-07 16:42   ` Jan Kara
2022-02-08  3:03     ` Ritesh Harjani
2022-02-05 14:09 ` [PATCHv1 8/9] ext4: Add strict range checks while freeing blocks Ritesh Harjani
2022-02-07 16:44   ` Jan Kara
2022-02-05 14:09 ` [PATCHv1 9/9] ext4: Add extra check in ext4_mb_mark_bb() to prevent against possible corruption Ritesh Harjani
2022-02-07 16:45   ` Jan Kara
