linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/3] ext4: fix a inode checksum error
@ 2021-09-01  2:09 Zhang Yi
  2021-09-01  2:09 ` [PATCH v5 1/3] ext4: factor out ext4_fill_raw_inode() Zhang Yi
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Zhang Yi @ 2021-09-01  2:09 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

We find a checksum error and a inode corruption problem while doing
stress test, this 3 patches address to fix them. The first two patches
are prepare to do the fix, the last patch fix these two issue.

 - Checksum error

    EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid

 - Inode corruption

    e2fsck 1.46.0 (29-Jan-2020)
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Entry 'foo' in / (2) has deleted/unused inode 17.  Clear<y>? yes
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    Inode bitmap differences:  -17
    Fix<y>? yes
    Free inodes count wrong for group #0 (32750, counted=32751).
    Fix<y>? yes
    Free inodes count wrong (32750, counted=32751).
    Fix<y>? yes

Changes since v4:
 - Drop first three already applied patches.
 - Remove 'in_mem' parameter passing __ext4_get_inode_loc() in the last
   patch.

Changes since v3:
 - Postpone initialization to ext4_do_update_inode() may cause zeroout
   newly set xattr entry. So switch to do initialization in
   __ext4_get_inode_loc().

Changes since v2:
 - Instead of using WARN_ON_ONCE to prevent ext4_do_update_inode()
   return before filling the inode buffer, keep the error and postpone
   the report after the updating in the third patch.
 - Fix some language mistacks in the last patch.

Changes since v1:
 - Add a patch to prevent ext4_do_update_inode() return before filling
   the inode buffer.
 - Do not use BH_New flag to indicate the empty buffer, postpone the
   zero and uptodate logic into ext4_do_update_inode() before filling
   the inode buffer.

Thanks,
Yi.

Zhang Yi (3):
  ext4: factor out ext4_fill_raw_inode()
  ext4: move ext4_fill_raw_inode() related functions
  ext4: prevent getting empty inode buffer

 fs/ext4/inode.c | 316 +++++++++++++++++++++++++-----------------------
 1 file changed, 165 insertions(+), 151 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v5 1/3] ext4: factor out ext4_fill_raw_inode()
  2021-09-01  2:09 [PATCH v5 0/3] ext4: fix a inode checksum error Zhang Yi
@ 2021-09-01  2:09 ` Zhang Yi
  2021-09-20 13:57   ` Jan Kara
  2021-09-01  2:09 ` [PATCH v5 2/3] ext4: move ext4_fill_raw_inode() related functions Zhang Yi
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Zhang Yi @ 2021-09-01  2:09 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

Factor out ext4_fill_raw_inode() from ext4_do_update_inode(), which is
use to fill the in-mem inode contents into the inode table buffer, in
preparation for initializing the exclusive inode buffer without reading
the block in __ext4_get_inode_loc().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c | 85 +++++++++++++++++++++++++++----------------------
 1 file changed, 47 insertions(+), 38 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 8323d3e8f393..c7186460c14d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4902,9 +4902,8 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
 	return ERR_PTR(ret);
 }
 
-static int ext4_inode_blocks_set(handle_t *handle,
-				struct ext4_inode *raw_inode,
-				struct ext4_inode_info *ei)
+static int ext4_inode_blocks_set(struct ext4_inode *raw_inode,
+				 struct ext4_inode_info *ei)
 {
 	struct inode *inode = &(ei->vfs_inode);
 	u64 i_blocks = READ_ONCE(inode->i_blocks);
@@ -5007,37 +5006,16 @@ static void ext4_update_other_inodes_time(struct super_block *sb,
 	rcu_read_unlock();
 }
 
-/*
- * Post the struct inode info into an on-disk inode location in the
- * buffer-cache.  This gobbles the caller's reference to the
- * buffer_head in the inode location struct.
- *
- * The caller must have write access to iloc->bh.
- */
-static int ext4_do_update_inode(handle_t *handle,
-				struct inode *inode,
-				struct ext4_iloc *iloc)
+static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode)
 {
-	struct ext4_inode *raw_inode = ext4_raw_inode(iloc);
 	struct ext4_inode_info *ei = EXT4_I(inode);
-	struct buffer_head *bh = iloc->bh;
-	struct super_block *sb = inode->i_sb;
-	int err = 0, block;
-	int need_datasync = 0, set_large_file = 0;
 	uid_t i_uid;
 	gid_t i_gid;
 	projid_t i_projid;
+	int block;
+	int err;
 
-	spin_lock(&ei->i_raw_lock);
-
-	/*
-	 * For fields not tracked in the in-memory inode, initialise them
-	 * to zero for new inodes.
-	 */
-	if (ext4_test_inode_state(inode, EXT4_STATE_NEW))
-		memset(raw_inode, 0, EXT4_SB(inode->i_sb)->s_inode_size);
-
-	err = ext4_inode_blocks_set(handle, raw_inode, ei);
+	err = ext4_inode_blocks_set(raw_inode, ei);
 
 	raw_inode->i_mode = cpu_to_le16(inode->i_mode);
 	i_uid = i_uid_read(inode);
@@ -5079,16 +5057,8 @@ static int ext4_do_update_inode(handle_t *handle,
 		raw_inode->i_file_acl_high =
 			cpu_to_le16(ei->i_file_acl >> 32);
 	raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl);
-	if (READ_ONCE(ei->i_disksize) != ext4_isize(inode->i_sb, raw_inode)) {
-		ext4_isize_set(raw_inode, ei->i_disksize);
-		need_datasync = 1;
-	}
-	if (ei->i_disksize > 0x7fffffffULL) {
-		if (!ext4_has_feature_large_file(sb) ||
-				EXT4_SB(sb)->s_es->s_rev_level ==
-		    cpu_to_le32(EXT4_GOOD_OLD_REV))
-			set_large_file = 1;
-	}
+	ext4_isize_set(raw_inode, ei->i_disksize);
+
 	raw_inode->i_generation = cpu_to_le32(inode->i_generation);
 	if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) {
 		if (old_valid_dev(inode->i_rdev)) {
@@ -5128,6 +5098,45 @@ static int ext4_do_update_inode(handle_t *handle,
 		raw_inode->i_projid = cpu_to_le32(i_projid);
 
 	ext4_inode_csum_set(inode, raw_inode, ei);
+	return err;
+}
+
+/*
+ * Post the struct inode info into an on-disk inode location in the
+ * buffer-cache.  This gobbles the caller's reference to the
+ * buffer_head in the inode location struct.
+ *
+ * The caller must have write access to iloc->bh.
+ */
+static int ext4_do_update_inode(handle_t *handle,
+				struct inode *inode,
+				struct ext4_iloc *iloc)
+{
+	struct ext4_inode *raw_inode = ext4_raw_inode(iloc);
+	struct ext4_inode_info *ei = EXT4_I(inode);
+	struct buffer_head *bh = iloc->bh;
+	struct super_block *sb = inode->i_sb;
+	int err;
+	int need_datasync = 0, set_large_file = 0;
+
+	spin_lock(&ei->i_raw_lock);
+
+	/*
+	 * For fields not tracked in the in-memory inode, initialise them
+	 * to zero for new inodes.
+	 */
+	if (ext4_test_inode_state(inode, EXT4_STATE_NEW))
+		memset(raw_inode, 0, EXT4_SB(inode->i_sb)->s_inode_size);
+
+	if (READ_ONCE(ei->i_disksize) != ext4_isize(inode->i_sb, raw_inode))
+		need_datasync = 1;
+	if (ei->i_disksize > 0x7fffffffULL) {
+		if (!ext4_has_feature_large_file(sb) ||
+		    EXT4_SB(sb)->s_es->s_rev_level == cpu_to_le32(EXT4_GOOD_OLD_REV))
+			set_large_file = 1;
+	}
+
+	err = ext4_fill_raw_inode(inode, raw_inode);
 	spin_unlock(&ei->i_raw_lock);
 	if (err) {
 		EXT4_ERROR_INODE(inode, "corrupted inode contents");
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v5 2/3] ext4: move ext4_fill_raw_inode() related functions
  2021-09-01  2:09 [PATCH v5 0/3] ext4: fix a inode checksum error Zhang Yi
  2021-09-01  2:09 ` [PATCH v5 1/3] ext4: factor out ext4_fill_raw_inode() Zhang Yi
@ 2021-09-01  2:09 ` Zhang Yi
  2021-09-20 13:57   ` Jan Kara
  2021-09-01  2:09 ` [PATCH v5 3/3] ext4: prevent getting empty inode buffer Zhang Yi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Zhang Yi @ 2021-09-01  2:09 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

In preparation for calling ext4_fill_raw_inode() in
__ext4_get_inode_loc(), move three related functions before
__ext4_get_inode_loc(), no logical change.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c | 293 ++++++++++++++++++++++++------------------------
 1 file changed, 147 insertions(+), 146 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c7186460c14d..3c36e701e30e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4292,6 +4292,153 @@ int ext4_truncate(struct inode *inode)
 	return err;
 }
 
+static inline u64 ext4_inode_peek_iversion(const struct inode *inode)
+{
+	if (unlikely(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
+		return inode_peek_iversion_raw(inode);
+	else
+		return inode_peek_iversion(inode);
+}
+
+static int ext4_inode_blocks_set(struct ext4_inode *raw_inode,
+				 struct ext4_inode_info *ei)
+{
+	struct inode *inode = &(ei->vfs_inode);
+	u64 i_blocks = READ_ONCE(inode->i_blocks);
+	struct super_block *sb = inode->i_sb;
+
+	if (i_blocks <= ~0U) {
+		/*
+		 * i_blocks can be represented in a 32 bit variable
+		 * as multiple of 512 bytes
+		 */
+		raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
+		raw_inode->i_blocks_high = 0;
+		ext4_clear_inode_flag(inode, EXT4_INODE_HUGE_FILE);
+		return 0;
+	}
+
+	/*
+	 * This should never happen since sb->s_maxbytes should not have
+	 * allowed this, sb->s_maxbytes was set according to the huge_file
+	 * feature in ext4_fill_super().
+	 */
+	if (!ext4_has_feature_huge_file(sb))
+		return -EFSCORRUPTED;
+
+	if (i_blocks <= 0xffffffffffffULL) {
+		/*
+		 * i_blocks can be represented in a 48 bit variable
+		 * as multiple of 512 bytes
+		 */
+		raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
+		raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
+		ext4_clear_inode_flag(inode, EXT4_INODE_HUGE_FILE);
+	} else {
+		ext4_set_inode_flag(inode, EXT4_INODE_HUGE_FILE);
+		/* i_block is stored in file system block size */
+		i_blocks = i_blocks >> (inode->i_blkbits - 9);
+		raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
+		raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
+	}
+	return 0;
+}
+
+static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode)
+{
+	struct ext4_inode_info *ei = EXT4_I(inode);
+	uid_t i_uid;
+	gid_t i_gid;
+	projid_t i_projid;
+	int block;
+	int err;
+
+	err = ext4_inode_blocks_set(raw_inode, ei);
+
+	raw_inode->i_mode = cpu_to_le16(inode->i_mode);
+	i_uid = i_uid_read(inode);
+	i_gid = i_gid_read(inode);
+	i_projid = from_kprojid(&init_user_ns, ei->i_projid);
+	if (!(test_opt(inode->i_sb, NO_UID32))) {
+		raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
+		raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
+		/*
+		 * Fix up interoperability with old kernels. Otherwise,
+		 * old inodes get re-used with the upper 16 bits of the
+		 * uid/gid intact.
+		 */
+		if (ei->i_dtime && list_empty(&ei->i_orphan)) {
+			raw_inode->i_uid_high = 0;
+			raw_inode->i_gid_high = 0;
+		} else {
+			raw_inode->i_uid_high =
+				cpu_to_le16(high_16_bits(i_uid));
+			raw_inode->i_gid_high =
+				cpu_to_le16(high_16_bits(i_gid));
+		}
+	} else {
+		raw_inode->i_uid_low = cpu_to_le16(fs_high2lowuid(i_uid));
+		raw_inode->i_gid_low = cpu_to_le16(fs_high2lowgid(i_gid));
+		raw_inode->i_uid_high = 0;
+		raw_inode->i_gid_high = 0;
+	}
+	raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
+
+	EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
+	EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
+	EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
+	EXT4_EINODE_SET_XTIME(i_crtime, ei, raw_inode);
+
+	raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
+	raw_inode->i_flags = cpu_to_le32(ei->i_flags & 0xFFFFFFFF);
+	if (likely(!test_opt2(inode->i_sb, HURD_COMPAT)))
+		raw_inode->i_file_acl_high =
+			cpu_to_le16(ei->i_file_acl >> 32);
+	raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl);
+	ext4_isize_set(raw_inode, ei->i_disksize);
+
+	raw_inode->i_generation = cpu_to_le32(inode->i_generation);
+	if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) {
+		if (old_valid_dev(inode->i_rdev)) {
+			raw_inode->i_block[0] =
+				cpu_to_le32(old_encode_dev(inode->i_rdev));
+			raw_inode->i_block[1] = 0;
+		} else {
+			raw_inode->i_block[0] = 0;
+			raw_inode->i_block[1] =
+				cpu_to_le32(new_encode_dev(inode->i_rdev));
+			raw_inode->i_block[2] = 0;
+		}
+	} else if (!ext4_has_inline_data(inode)) {
+		for (block = 0; block < EXT4_N_BLOCKS; block++)
+			raw_inode->i_block[block] = ei->i_data[block];
+	}
+
+	if (likely(!test_opt2(inode->i_sb, HURD_COMPAT))) {
+		u64 ivers = ext4_inode_peek_iversion(inode);
+
+		raw_inode->i_disk_version = cpu_to_le32(ivers);
+		if (ei->i_extra_isize) {
+			if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
+				raw_inode->i_version_hi =
+					cpu_to_le32(ivers >> 32);
+			raw_inode->i_extra_isize =
+				cpu_to_le16(ei->i_extra_isize);
+		}
+	}
+
+	if (i_projid != EXT4_DEF_PROJID &&
+	    !ext4_has_feature_project(inode->i_sb))
+		err = err ?: -EFSCORRUPTED;
+
+	if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE &&
+	    EXT4_FITS_IN_INODE(raw_inode, ei, i_projid))
+		raw_inode->i_projid = cpu_to_le32(i_projid);
+
+	ext4_inode_csum_set(inode, raw_inode, ei);
+	return err;
+}
+
 /*
  * ext4_get_inode_loc returns with an extra refcount against the inode's
  * underlying buffer_head on success. If 'in_mem' is true, we have all
@@ -4580,13 +4727,6 @@ static inline void ext4_inode_set_iversion_queried(struct inode *inode, u64 val)
 	else
 		inode_set_iversion_queried(inode, val);
 }
-static inline u64 ext4_inode_peek_iversion(const struct inode *inode)
-{
-	if (unlikely(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
-		return inode_peek_iversion_raw(inode);
-	else
-		return inode_peek_iversion(inode);
-}
 
 struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
 			  ext4_iget_flags flags, const char *function,
@@ -4902,50 +5042,6 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
 	return ERR_PTR(ret);
 }
 
-static int ext4_inode_blocks_set(struct ext4_inode *raw_inode,
-				 struct ext4_inode_info *ei)
-{
-	struct inode *inode = &(ei->vfs_inode);
-	u64 i_blocks = READ_ONCE(inode->i_blocks);
-	struct super_block *sb = inode->i_sb;
-
-	if (i_blocks <= ~0U) {
-		/*
-		 * i_blocks can be represented in a 32 bit variable
-		 * as multiple of 512 bytes
-		 */
-		raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
-		raw_inode->i_blocks_high = 0;
-		ext4_clear_inode_flag(inode, EXT4_INODE_HUGE_FILE);
-		return 0;
-	}
-
-	/*
-	 * This should never happen since sb->s_maxbytes should not have
-	 * allowed this, sb->s_maxbytes was set according to the huge_file
-	 * feature in ext4_fill_super().
-	 */
-	if (!ext4_has_feature_huge_file(sb))
-		return -EFSCORRUPTED;
-
-	if (i_blocks <= 0xffffffffffffULL) {
-		/*
-		 * i_blocks can be represented in a 48 bit variable
-		 * as multiple of 512 bytes
-		 */
-		raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
-		raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
-		ext4_clear_inode_flag(inode, EXT4_INODE_HUGE_FILE);
-	} else {
-		ext4_set_inode_flag(inode, EXT4_INODE_HUGE_FILE);
-		/* i_block is stored in file system block size */
-		i_blocks = i_blocks >> (inode->i_blkbits - 9);
-		raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
-		raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
-	}
-	return 0;
-}
-
 static void __ext4_update_other_inode_time(struct super_block *sb,
 					   unsigned long orig_ino,
 					   unsigned long ino,
@@ -5006,101 +5102,6 @@ static void ext4_update_other_inodes_time(struct super_block *sb,
 	rcu_read_unlock();
 }
 
-static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode)
-{
-	struct ext4_inode_info *ei = EXT4_I(inode);
-	uid_t i_uid;
-	gid_t i_gid;
-	projid_t i_projid;
-	int block;
-	int err;
-
-	err = ext4_inode_blocks_set(raw_inode, ei);
-
-	raw_inode->i_mode = cpu_to_le16(inode->i_mode);
-	i_uid = i_uid_read(inode);
-	i_gid = i_gid_read(inode);
-	i_projid = from_kprojid(&init_user_ns, ei->i_projid);
-	if (!(test_opt(inode->i_sb, NO_UID32))) {
-		raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
-		raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
-		/*
-		 * Fix up interoperability with old kernels. Otherwise,
-		 * old inodes get re-used with the upper 16 bits of the
-		 * uid/gid intact.
-		 */
-		if (ei->i_dtime && list_empty(&ei->i_orphan)) {
-			raw_inode->i_uid_high = 0;
-			raw_inode->i_gid_high = 0;
-		} else {
-			raw_inode->i_uid_high =
-				cpu_to_le16(high_16_bits(i_uid));
-			raw_inode->i_gid_high =
-				cpu_to_le16(high_16_bits(i_gid));
-		}
-	} else {
-		raw_inode->i_uid_low = cpu_to_le16(fs_high2lowuid(i_uid));
-		raw_inode->i_gid_low = cpu_to_le16(fs_high2lowgid(i_gid));
-		raw_inode->i_uid_high = 0;
-		raw_inode->i_gid_high = 0;
-	}
-	raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
-
-	EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
-	EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
-	EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
-	EXT4_EINODE_SET_XTIME(i_crtime, ei, raw_inode);
-
-	raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
-	raw_inode->i_flags = cpu_to_le32(ei->i_flags & 0xFFFFFFFF);
-	if (likely(!test_opt2(inode->i_sb, HURD_COMPAT)))
-		raw_inode->i_file_acl_high =
-			cpu_to_le16(ei->i_file_acl >> 32);
-	raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl);
-	ext4_isize_set(raw_inode, ei->i_disksize);
-
-	raw_inode->i_generation = cpu_to_le32(inode->i_generation);
-	if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) {
-		if (old_valid_dev(inode->i_rdev)) {
-			raw_inode->i_block[0] =
-				cpu_to_le32(old_encode_dev(inode->i_rdev));
-			raw_inode->i_block[1] = 0;
-		} else {
-			raw_inode->i_block[0] = 0;
-			raw_inode->i_block[1] =
-				cpu_to_le32(new_encode_dev(inode->i_rdev));
-			raw_inode->i_block[2] = 0;
-		}
-	} else if (!ext4_has_inline_data(inode)) {
-		for (block = 0; block < EXT4_N_BLOCKS; block++)
-			raw_inode->i_block[block] = ei->i_data[block];
-	}
-
-	if (likely(!test_opt2(inode->i_sb, HURD_COMPAT))) {
-		u64 ivers = ext4_inode_peek_iversion(inode);
-
-		raw_inode->i_disk_version = cpu_to_le32(ivers);
-		if (ei->i_extra_isize) {
-			if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
-				raw_inode->i_version_hi =
-					cpu_to_le32(ivers >> 32);
-			raw_inode->i_extra_isize =
-				cpu_to_le16(ei->i_extra_isize);
-		}
-	}
-
-	if (i_projid != EXT4_DEF_PROJID &&
-	    !ext4_has_feature_project(inode->i_sb))
-		err = err ?: -EFSCORRUPTED;
-
-	if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE &&
-	    EXT4_FITS_IN_INODE(raw_inode, ei, i_projid))
-		raw_inode->i_projid = cpu_to_le32(i_projid);
-
-	ext4_inode_csum_set(inode, raw_inode, ei);
-	return err;
-}
-
 /*
  * Post the struct inode info into an on-disk inode location in the
  * buffer-cache.  This gobbles the caller's reference to the
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v5 3/3] ext4: prevent getting empty inode buffer
  2021-09-01  2:09 [PATCH v5 0/3] ext4: fix a inode checksum error Zhang Yi
  2021-09-01  2:09 ` [PATCH v5 1/3] ext4: factor out ext4_fill_raw_inode() Zhang Yi
  2021-09-01  2:09 ` [PATCH v5 2/3] ext4: move ext4_fill_raw_inode() related functions Zhang Yi
@ 2021-09-01  2:09 ` Zhang Yi
  2021-09-20 13:58   ` Jan Kara
  2021-10-14 10:57 ` [PATCH v5 0/3] ext4: fix a inode checksum error Zhang Yi
  2021-10-28 14:47 ` Theodore Ts'o
  4 siblings, 1 reply; 9+ messages in thread
From: Zhang Yi @ 2021-09-01  2:09 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

In ext4_get_inode_loc(), we may skip IO and get an zero && uptodate
inode buffer when the inode monopolize an inode block for performance
reason. For most cases, ext4_mark_iloc_dirty() will fill the inode
buffer to make it fine, but we could miss this call if something bad
happened. Finally, __ext4_get_inode_loc_noinmem() may probably get an
empty inode buffer and trigger ext4 error.

For example, if we remove a nonexistent xattr on inode A,
ext4_xattr_set_handle() will return ENODATA before invoking
ext4_mark_iloc_dirty(), it will left an uptodate but zero buffer. We
will get checksum error message in ext4_iget() when getting inode again.

  EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid

Even worse, if we allocate another inode B at the same inode block, it
will corrupt the inode A on disk when write back inode B.

So this patch initialize the inode buffer by filling the in-mem inode
contents if we skip read I/O, ensure that the buffer is really uptodate.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3c36e701e30e..a8388ec91f9f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4441,12 +4441,12 @@ static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode
 
 /*
  * ext4_get_inode_loc returns with an extra refcount against the inode's
- * underlying buffer_head on success. If 'in_mem' is true, we have all
- * data in memory that is needed to recreate the on-disk version of this
- * inode.
+ * underlying buffer_head on success. If we pass 'inode' and it does not
+ * have in-inode xattr, we have all inode data in memory that is needed
+ * to recreate the on-disk version of this inode.
  */
 static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
-				struct ext4_iloc *iloc, int in_mem,
+				struct inode *inode, struct ext4_iloc *iloc,
 				ext4_fsblk_t *ret_block)
 {
 	struct ext4_group_desc	*gdp;
@@ -4486,7 +4486,7 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
 	 * is the only valid inode in the block, we need not read the
 	 * block.
 	 */
-	if (in_mem) {
+	if (inode && !ext4_test_inode_state(inode, EXT4_STATE_XATTR)) {
 		struct buffer_head *bitmap_bh;
 		int i, start;
 
@@ -4514,8 +4514,13 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
 		}
 		brelse(bitmap_bh);
 		if (i == start + inodes_per_block) {
+			struct ext4_inode *raw_inode =
+				(struct ext4_inode *) (bh->b_data + iloc->offset);
+
 			/* all other inodes are free, so skip I/O */
 			memset(bh->b_data, 0, bh->b_size);
+			if (!ext4_test_inode_state(inode, EXT4_STATE_NEW))
+				ext4_fill_raw_inode(inode, raw_inode);
 			set_buffer_uptodate(bh);
 			unlock_buffer(bh);
 			goto has_buffer;
@@ -4576,7 +4581,7 @@ static int __ext4_get_inode_loc_noinmem(struct inode *inode,
 	ext4_fsblk_t err_blk;
 	int ret;
 
-	ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, iloc, 0,
+	ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, NULL, iloc,
 					&err_blk);
 
 	if (ret == -EIO)
@@ -4591,9 +4596,8 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)
 	ext4_fsblk_t err_blk;
 	int ret;
 
-	/* We have all inode data except xattrs in memory here. */
-	ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, iloc,
-		!ext4_test_inode_state(inode, EXT4_STATE_XATTR), &err_blk);
+	ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, inode, iloc,
+					&err_blk);
 
 	if (ret == -EIO)
 		ext4_error_inode_block(inode, err_blk, EIO,
@@ -4606,7 +4610,7 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)
 int ext4_get_fc_inode_loc(struct super_block *sb, unsigned long ino,
 			  struct ext4_iloc *iloc)
 {
-	return __ext4_get_inode_loc(sb, ino, iloc, 0, NULL);
+	return __ext4_get_inode_loc(sb, ino, NULL, iloc, NULL);
 }
 
 static bool ext4_should_enable_dax(struct inode *inode)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 1/3] ext4: factor out ext4_fill_raw_inode()
  2021-09-01  2:09 ` [PATCH v5 1/3] ext4: factor out ext4_fill_raw_inode() Zhang Yi
@ 2021-09-20 13:57   ` Jan Kara
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2021-09-20 13:57 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Wed 01-09-21 10:09:53, Zhang Yi wrote:
> Factor out ext4_fill_raw_inode() from ext4_do_update_inode(), which is
> use to fill the in-mem inode contents into the inode table buffer, in
> preparation for initializing the exclusive inode buffer without reading
> the block in __ext4_get_inode_loc().
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/inode.c | 85 +++++++++++++++++++++++++++----------------------
>  1 file changed, 47 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 8323d3e8f393..c7186460c14d 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4902,9 +4902,8 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
>  	return ERR_PTR(ret);
>  }
>  
> -static int ext4_inode_blocks_set(handle_t *handle,
> -				struct ext4_inode *raw_inode,
> -				struct ext4_inode_info *ei)
> +static int ext4_inode_blocks_set(struct ext4_inode *raw_inode,
> +				 struct ext4_inode_info *ei)
>  {
>  	struct inode *inode = &(ei->vfs_inode);
>  	u64 i_blocks = READ_ONCE(inode->i_blocks);
> @@ -5007,37 +5006,16 @@ static void ext4_update_other_inodes_time(struct super_block *sb,
>  	rcu_read_unlock();
>  }
>  
> -/*
> - * Post the struct inode info into an on-disk inode location in the
> - * buffer-cache.  This gobbles the caller's reference to the
> - * buffer_head in the inode location struct.
> - *
> - * The caller must have write access to iloc->bh.
> - */
> -static int ext4_do_update_inode(handle_t *handle,
> -				struct inode *inode,
> -				struct ext4_iloc *iloc)
> +static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode)
>  {
> -	struct ext4_inode *raw_inode = ext4_raw_inode(iloc);
>  	struct ext4_inode_info *ei = EXT4_I(inode);
> -	struct buffer_head *bh = iloc->bh;
> -	struct super_block *sb = inode->i_sb;
> -	int err = 0, block;
> -	int need_datasync = 0, set_large_file = 0;
>  	uid_t i_uid;
>  	gid_t i_gid;
>  	projid_t i_projid;
> +	int block;
> +	int err;
>  
> -	spin_lock(&ei->i_raw_lock);
> -
> -	/*
> -	 * For fields not tracked in the in-memory inode, initialise them
> -	 * to zero for new inodes.
> -	 */
> -	if (ext4_test_inode_state(inode, EXT4_STATE_NEW))
> -		memset(raw_inode, 0, EXT4_SB(inode->i_sb)->s_inode_size);
> -
> -	err = ext4_inode_blocks_set(handle, raw_inode, ei);
> +	err = ext4_inode_blocks_set(raw_inode, ei);
>  
>  	raw_inode->i_mode = cpu_to_le16(inode->i_mode);
>  	i_uid = i_uid_read(inode);
> @@ -5079,16 +5057,8 @@ static int ext4_do_update_inode(handle_t *handle,
>  		raw_inode->i_file_acl_high =
>  			cpu_to_le16(ei->i_file_acl >> 32);
>  	raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl);
> -	if (READ_ONCE(ei->i_disksize) != ext4_isize(inode->i_sb, raw_inode)) {
> -		ext4_isize_set(raw_inode, ei->i_disksize);
> -		need_datasync = 1;
> -	}
> -	if (ei->i_disksize > 0x7fffffffULL) {
> -		if (!ext4_has_feature_large_file(sb) ||
> -				EXT4_SB(sb)->s_es->s_rev_level ==
> -		    cpu_to_le32(EXT4_GOOD_OLD_REV))
> -			set_large_file = 1;
> -	}
> +	ext4_isize_set(raw_inode, ei->i_disksize);
> +
>  	raw_inode->i_generation = cpu_to_le32(inode->i_generation);
>  	if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) {
>  		if (old_valid_dev(inode->i_rdev)) {
> @@ -5128,6 +5098,45 @@ static int ext4_do_update_inode(handle_t *handle,
>  		raw_inode->i_projid = cpu_to_le32(i_projid);
>  
>  	ext4_inode_csum_set(inode, raw_inode, ei);
> +	return err;
> +}
> +
> +/*
> + * Post the struct inode info into an on-disk inode location in the
> + * buffer-cache.  This gobbles the caller's reference to the
> + * buffer_head in the inode location struct.
> + *
> + * The caller must have write access to iloc->bh.
> + */
> +static int ext4_do_update_inode(handle_t *handle,
> +				struct inode *inode,
> +				struct ext4_iloc *iloc)
> +{
> +	struct ext4_inode *raw_inode = ext4_raw_inode(iloc);
> +	struct ext4_inode_info *ei = EXT4_I(inode);
> +	struct buffer_head *bh = iloc->bh;
> +	struct super_block *sb = inode->i_sb;
> +	int err;
> +	int need_datasync = 0, set_large_file = 0;
> +
> +	spin_lock(&ei->i_raw_lock);
> +
> +	/*
> +	 * For fields not tracked in the in-memory inode, initialise them
> +	 * to zero for new inodes.
> +	 */
> +	if (ext4_test_inode_state(inode, EXT4_STATE_NEW))
> +		memset(raw_inode, 0, EXT4_SB(inode->i_sb)->s_inode_size);
> +
> +	if (READ_ONCE(ei->i_disksize) != ext4_isize(inode->i_sb, raw_inode))
> +		need_datasync = 1;
> +	if (ei->i_disksize > 0x7fffffffULL) {
> +		if (!ext4_has_feature_large_file(sb) ||
> +		    EXT4_SB(sb)->s_es->s_rev_level == cpu_to_le32(EXT4_GOOD_OLD_REV))
> +			set_large_file = 1;
> +	}
> +
> +	err = ext4_fill_raw_inode(inode, raw_inode);
>  	spin_unlock(&ei->i_raw_lock);
>  	if (err) {
>  		EXT4_ERROR_INODE(inode, "corrupted inode contents");
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 2/3] ext4: move ext4_fill_raw_inode() related functions
  2021-09-01  2:09 ` [PATCH v5 2/3] ext4: move ext4_fill_raw_inode() related functions Zhang Yi
@ 2021-09-20 13:57   ` Jan Kara
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2021-09-20 13:57 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Wed 01-09-21 10:09:54, Zhang Yi wrote:
> In preparation for calling ext4_fill_raw_inode() in
> __ext4_get_inode_loc(), move three related functions before
> __ext4_get_inode_loc(), no logical change.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/inode.c | 293 ++++++++++++++++++++++++------------------------
>  1 file changed, 147 insertions(+), 146 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index c7186460c14d..3c36e701e30e 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4292,6 +4292,153 @@ int ext4_truncate(struct inode *inode)
>  	return err;
>  }
>  
> +static inline u64 ext4_inode_peek_iversion(const struct inode *inode)
> +{
> +	if (unlikely(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> +		return inode_peek_iversion_raw(inode);
> +	else
> +		return inode_peek_iversion(inode);
> +}
> +
> +static int ext4_inode_blocks_set(struct ext4_inode *raw_inode,
> +				 struct ext4_inode_info *ei)
> +{
> +	struct inode *inode = &(ei->vfs_inode);
> +	u64 i_blocks = READ_ONCE(inode->i_blocks);
> +	struct super_block *sb = inode->i_sb;
> +
> +	if (i_blocks <= ~0U) {
> +		/*
> +		 * i_blocks can be represented in a 32 bit variable
> +		 * as multiple of 512 bytes
> +		 */
> +		raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
> +		raw_inode->i_blocks_high = 0;
> +		ext4_clear_inode_flag(inode, EXT4_INODE_HUGE_FILE);
> +		return 0;
> +	}
> +
> +	/*
> +	 * This should never happen since sb->s_maxbytes should not have
> +	 * allowed this, sb->s_maxbytes was set according to the huge_file
> +	 * feature in ext4_fill_super().
> +	 */
> +	if (!ext4_has_feature_huge_file(sb))
> +		return -EFSCORRUPTED;
> +
> +	if (i_blocks <= 0xffffffffffffULL) {
> +		/*
> +		 * i_blocks can be represented in a 48 bit variable
> +		 * as multiple of 512 bytes
> +		 */
> +		raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
> +		raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
> +		ext4_clear_inode_flag(inode, EXT4_INODE_HUGE_FILE);
> +	} else {
> +		ext4_set_inode_flag(inode, EXT4_INODE_HUGE_FILE);
> +		/* i_block is stored in file system block size */
> +		i_blocks = i_blocks >> (inode->i_blkbits - 9);
> +		raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
> +		raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
> +	}
> +	return 0;
> +}
> +
> +static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode)
> +{
> +	struct ext4_inode_info *ei = EXT4_I(inode);
> +	uid_t i_uid;
> +	gid_t i_gid;
> +	projid_t i_projid;
> +	int block;
> +	int err;
> +
> +	err = ext4_inode_blocks_set(raw_inode, ei);
> +
> +	raw_inode->i_mode = cpu_to_le16(inode->i_mode);
> +	i_uid = i_uid_read(inode);
> +	i_gid = i_gid_read(inode);
> +	i_projid = from_kprojid(&init_user_ns, ei->i_projid);
> +	if (!(test_opt(inode->i_sb, NO_UID32))) {
> +		raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
> +		raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
> +		/*
> +		 * Fix up interoperability with old kernels. Otherwise,
> +		 * old inodes get re-used with the upper 16 bits of the
> +		 * uid/gid intact.
> +		 */
> +		if (ei->i_dtime && list_empty(&ei->i_orphan)) {
> +			raw_inode->i_uid_high = 0;
> +			raw_inode->i_gid_high = 0;
> +		} else {
> +			raw_inode->i_uid_high =
> +				cpu_to_le16(high_16_bits(i_uid));
> +			raw_inode->i_gid_high =
> +				cpu_to_le16(high_16_bits(i_gid));
> +		}
> +	} else {
> +		raw_inode->i_uid_low = cpu_to_le16(fs_high2lowuid(i_uid));
> +		raw_inode->i_gid_low = cpu_to_le16(fs_high2lowgid(i_gid));
> +		raw_inode->i_uid_high = 0;
> +		raw_inode->i_gid_high = 0;
> +	}
> +	raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
> +
> +	EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
> +	EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
> +	EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
> +	EXT4_EINODE_SET_XTIME(i_crtime, ei, raw_inode);
> +
> +	raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
> +	raw_inode->i_flags = cpu_to_le32(ei->i_flags & 0xFFFFFFFF);
> +	if (likely(!test_opt2(inode->i_sb, HURD_COMPAT)))
> +		raw_inode->i_file_acl_high =
> +			cpu_to_le16(ei->i_file_acl >> 32);
> +	raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl);
> +	ext4_isize_set(raw_inode, ei->i_disksize);
> +
> +	raw_inode->i_generation = cpu_to_le32(inode->i_generation);
> +	if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) {
> +		if (old_valid_dev(inode->i_rdev)) {
> +			raw_inode->i_block[0] =
> +				cpu_to_le32(old_encode_dev(inode->i_rdev));
> +			raw_inode->i_block[1] = 0;
> +		} else {
> +			raw_inode->i_block[0] = 0;
> +			raw_inode->i_block[1] =
> +				cpu_to_le32(new_encode_dev(inode->i_rdev));
> +			raw_inode->i_block[2] = 0;
> +		}
> +	} else if (!ext4_has_inline_data(inode)) {
> +		for (block = 0; block < EXT4_N_BLOCKS; block++)
> +			raw_inode->i_block[block] = ei->i_data[block];
> +	}
> +
> +	if (likely(!test_opt2(inode->i_sb, HURD_COMPAT))) {
> +		u64 ivers = ext4_inode_peek_iversion(inode);
> +
> +		raw_inode->i_disk_version = cpu_to_le32(ivers);
> +		if (ei->i_extra_isize) {
> +			if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
> +				raw_inode->i_version_hi =
> +					cpu_to_le32(ivers >> 32);
> +			raw_inode->i_extra_isize =
> +				cpu_to_le16(ei->i_extra_isize);
> +		}
> +	}
> +
> +	if (i_projid != EXT4_DEF_PROJID &&
> +	    !ext4_has_feature_project(inode->i_sb))
> +		err = err ?: -EFSCORRUPTED;
> +
> +	if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE &&
> +	    EXT4_FITS_IN_INODE(raw_inode, ei, i_projid))
> +		raw_inode->i_projid = cpu_to_le32(i_projid);
> +
> +	ext4_inode_csum_set(inode, raw_inode, ei);
> +	return err;
> +}
> +
>  /*
>   * ext4_get_inode_loc returns with an extra refcount against the inode's
>   * underlying buffer_head on success. If 'in_mem' is true, we have all
> @@ -4580,13 +4727,6 @@ static inline void ext4_inode_set_iversion_queried(struct inode *inode, u64 val)
>  	else
>  		inode_set_iversion_queried(inode, val);
>  }
> -static inline u64 ext4_inode_peek_iversion(const struct inode *inode)
> -{
> -	if (unlikely(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> -		return inode_peek_iversion_raw(inode);
> -	else
> -		return inode_peek_iversion(inode);
> -}
>  
>  struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
>  			  ext4_iget_flags flags, const char *function,
> @@ -4902,50 +5042,6 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
>  	return ERR_PTR(ret);
>  }
>  
> -static int ext4_inode_blocks_set(struct ext4_inode *raw_inode,
> -				 struct ext4_inode_info *ei)
> -{
> -	struct inode *inode = &(ei->vfs_inode);
> -	u64 i_blocks = READ_ONCE(inode->i_blocks);
> -	struct super_block *sb = inode->i_sb;
> -
> -	if (i_blocks <= ~0U) {
> -		/*
> -		 * i_blocks can be represented in a 32 bit variable
> -		 * as multiple of 512 bytes
> -		 */
> -		raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
> -		raw_inode->i_blocks_high = 0;
> -		ext4_clear_inode_flag(inode, EXT4_INODE_HUGE_FILE);
> -		return 0;
> -	}
> -
> -	/*
> -	 * This should never happen since sb->s_maxbytes should not have
> -	 * allowed this, sb->s_maxbytes was set according to the huge_file
> -	 * feature in ext4_fill_super().
> -	 */
> -	if (!ext4_has_feature_huge_file(sb))
> -		return -EFSCORRUPTED;
> -
> -	if (i_blocks <= 0xffffffffffffULL) {
> -		/*
> -		 * i_blocks can be represented in a 48 bit variable
> -		 * as multiple of 512 bytes
> -		 */
> -		raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
> -		raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
> -		ext4_clear_inode_flag(inode, EXT4_INODE_HUGE_FILE);
> -	} else {
> -		ext4_set_inode_flag(inode, EXT4_INODE_HUGE_FILE);
> -		/* i_block is stored in file system block size */
> -		i_blocks = i_blocks >> (inode->i_blkbits - 9);
> -		raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
> -		raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
> -	}
> -	return 0;
> -}
> -
>  static void __ext4_update_other_inode_time(struct super_block *sb,
>  					   unsigned long orig_ino,
>  					   unsigned long ino,
> @@ -5006,101 +5102,6 @@ static void ext4_update_other_inodes_time(struct super_block *sb,
>  	rcu_read_unlock();
>  }
>  
> -static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode)
> -{
> -	struct ext4_inode_info *ei = EXT4_I(inode);
> -	uid_t i_uid;
> -	gid_t i_gid;
> -	projid_t i_projid;
> -	int block;
> -	int err;
> -
> -	err = ext4_inode_blocks_set(raw_inode, ei);
> -
> -	raw_inode->i_mode = cpu_to_le16(inode->i_mode);
> -	i_uid = i_uid_read(inode);
> -	i_gid = i_gid_read(inode);
> -	i_projid = from_kprojid(&init_user_ns, ei->i_projid);
> -	if (!(test_opt(inode->i_sb, NO_UID32))) {
> -		raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
> -		raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
> -		/*
> -		 * Fix up interoperability with old kernels. Otherwise,
> -		 * old inodes get re-used with the upper 16 bits of the
> -		 * uid/gid intact.
> -		 */
> -		if (ei->i_dtime && list_empty(&ei->i_orphan)) {
> -			raw_inode->i_uid_high = 0;
> -			raw_inode->i_gid_high = 0;
> -		} else {
> -			raw_inode->i_uid_high =
> -				cpu_to_le16(high_16_bits(i_uid));
> -			raw_inode->i_gid_high =
> -				cpu_to_le16(high_16_bits(i_gid));
> -		}
> -	} else {
> -		raw_inode->i_uid_low = cpu_to_le16(fs_high2lowuid(i_uid));
> -		raw_inode->i_gid_low = cpu_to_le16(fs_high2lowgid(i_gid));
> -		raw_inode->i_uid_high = 0;
> -		raw_inode->i_gid_high = 0;
> -	}
> -	raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
> -
> -	EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
> -	EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
> -	EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
> -	EXT4_EINODE_SET_XTIME(i_crtime, ei, raw_inode);
> -
> -	raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
> -	raw_inode->i_flags = cpu_to_le32(ei->i_flags & 0xFFFFFFFF);
> -	if (likely(!test_opt2(inode->i_sb, HURD_COMPAT)))
> -		raw_inode->i_file_acl_high =
> -			cpu_to_le16(ei->i_file_acl >> 32);
> -	raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl);
> -	ext4_isize_set(raw_inode, ei->i_disksize);
> -
> -	raw_inode->i_generation = cpu_to_le32(inode->i_generation);
> -	if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) {
> -		if (old_valid_dev(inode->i_rdev)) {
> -			raw_inode->i_block[0] =
> -				cpu_to_le32(old_encode_dev(inode->i_rdev));
> -			raw_inode->i_block[1] = 0;
> -		} else {
> -			raw_inode->i_block[0] = 0;
> -			raw_inode->i_block[1] =
> -				cpu_to_le32(new_encode_dev(inode->i_rdev));
> -			raw_inode->i_block[2] = 0;
> -		}
> -	} else if (!ext4_has_inline_data(inode)) {
> -		for (block = 0; block < EXT4_N_BLOCKS; block++)
> -			raw_inode->i_block[block] = ei->i_data[block];
> -	}
> -
> -	if (likely(!test_opt2(inode->i_sb, HURD_COMPAT))) {
> -		u64 ivers = ext4_inode_peek_iversion(inode);
> -
> -		raw_inode->i_disk_version = cpu_to_le32(ivers);
> -		if (ei->i_extra_isize) {
> -			if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
> -				raw_inode->i_version_hi =
> -					cpu_to_le32(ivers >> 32);
> -			raw_inode->i_extra_isize =
> -				cpu_to_le16(ei->i_extra_isize);
> -		}
> -	}
> -
> -	if (i_projid != EXT4_DEF_PROJID &&
> -	    !ext4_has_feature_project(inode->i_sb))
> -		err = err ?: -EFSCORRUPTED;
> -
> -	if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE &&
> -	    EXT4_FITS_IN_INODE(raw_inode, ei, i_projid))
> -		raw_inode->i_projid = cpu_to_le32(i_projid);
> -
> -	ext4_inode_csum_set(inode, raw_inode, ei);
> -	return err;
> -}
> -
>  /*
>   * Post the struct inode info into an on-disk inode location in the
>   * buffer-cache.  This gobbles the caller's reference to the
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 3/3] ext4: prevent getting empty inode buffer
  2021-09-01  2:09 ` [PATCH v5 3/3] ext4: prevent getting empty inode buffer Zhang Yi
@ 2021-09-20 13:58   ` Jan Kara
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2021-09-20 13:58 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Wed 01-09-21 10:09:55, Zhang Yi wrote:
> In ext4_get_inode_loc(), we may skip IO and get an zero && uptodate
> inode buffer when the inode monopolize an inode block for performance
> reason. For most cases, ext4_mark_iloc_dirty() will fill the inode
> buffer to make it fine, but we could miss this call if something bad
> happened. Finally, __ext4_get_inode_loc_noinmem() may probably get an
> empty inode buffer and trigger ext4 error.
> 
> For example, if we remove a nonexistent xattr on inode A,
> ext4_xattr_set_handle() will return ENODATA before invoking
> ext4_mark_iloc_dirty(), it will left an uptodate but zero buffer. We
> will get checksum error message in ext4_iget() when getting inode again.
> 
>   EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid
> 
> Even worse, if we allocate another inode B at the same inode block, it
> will corrupt the inode A on disk when write back inode B.
> 
> So this patch initialize the inode buffer by filling the in-mem inode
> contents if we skip read I/O, ensure that the buffer is really uptodate.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/inode.c | 24 ++++++++++++++----------
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 3c36e701e30e..a8388ec91f9f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4441,12 +4441,12 @@ static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode
>  
>  /*
>   * ext4_get_inode_loc returns with an extra refcount against the inode's
> - * underlying buffer_head on success. If 'in_mem' is true, we have all
> - * data in memory that is needed to recreate the on-disk version of this
> - * inode.
> + * underlying buffer_head on success. If we pass 'inode' and it does not
> + * have in-inode xattr, we have all inode data in memory that is needed
> + * to recreate the on-disk version of this inode.
>   */
>  static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
> -				struct ext4_iloc *iloc, int in_mem,
> +				struct inode *inode, struct ext4_iloc *iloc,
>  				ext4_fsblk_t *ret_block)
>  {
>  	struct ext4_group_desc	*gdp;
> @@ -4486,7 +4486,7 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
>  	 * is the only valid inode in the block, we need not read the
>  	 * block.
>  	 */
> -	if (in_mem) {
> +	if (inode && !ext4_test_inode_state(inode, EXT4_STATE_XATTR)) {
>  		struct buffer_head *bitmap_bh;
>  		int i, start;
>  
> @@ -4514,8 +4514,13 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
>  		}
>  		brelse(bitmap_bh);
>  		if (i == start + inodes_per_block) {
> +			struct ext4_inode *raw_inode =
> +				(struct ext4_inode *) (bh->b_data + iloc->offset);
> +
>  			/* all other inodes are free, so skip I/O */
>  			memset(bh->b_data, 0, bh->b_size);
> +			if (!ext4_test_inode_state(inode, EXT4_STATE_NEW))
> +				ext4_fill_raw_inode(inode, raw_inode);
>  			set_buffer_uptodate(bh);
>  			unlock_buffer(bh);
>  			goto has_buffer;
> @@ -4576,7 +4581,7 @@ static int __ext4_get_inode_loc_noinmem(struct inode *inode,
>  	ext4_fsblk_t err_blk;
>  	int ret;
>  
> -	ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, iloc, 0,
> +	ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, NULL, iloc,
>  					&err_blk);
>  
>  	if (ret == -EIO)
> @@ -4591,9 +4596,8 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)
>  	ext4_fsblk_t err_blk;
>  	int ret;
>  
> -	/* We have all inode data except xattrs in memory here. */
> -	ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, iloc,
> -		!ext4_test_inode_state(inode, EXT4_STATE_XATTR), &err_blk);
> +	ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, inode, iloc,
> +					&err_blk);
>  
>  	if (ret == -EIO)
>  		ext4_error_inode_block(inode, err_blk, EIO,
> @@ -4606,7 +4610,7 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)
>  int ext4_get_fc_inode_loc(struct super_block *sb, unsigned long ino,
>  			  struct ext4_iloc *iloc)
>  {
> -	return __ext4_get_inode_loc(sb, ino, iloc, 0, NULL);
> +	return __ext4_get_inode_loc(sb, ino, NULL, iloc, NULL);
>  }
>  
>  static bool ext4_should_enable_dax(struct inode *inode)
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 0/3] ext4: fix a inode checksum error
  2021-09-01  2:09 [PATCH v5 0/3] ext4: fix a inode checksum error Zhang Yi
                   ` (2 preceding siblings ...)
  2021-09-01  2:09 ` [PATCH v5 3/3] ext4: prevent getting empty inode buffer Zhang Yi
@ 2021-10-14 10:57 ` Zhang Yi
  2021-10-28 14:47 ` Theodore Ts'o
  4 siblings, 0 replies; 9+ messages in thread
From: Zhang Yi @ 2021-10-14 10:57 UTC (permalink / raw)
  To: tytso; +Cc: adilger.kernel, jack, yukuai3, linux-ext4

Hi, Ted. Do you consider to merge this remaining patch set?

Thanks,
Yi.

On 2021/9/1 10:09, Zhang Yi wrote:
> We find a checksum error and a inode corruption problem while doing
> stress test, this 3 patches address to fix them. The first two patches
> are prepare to do the fix, the last patch fix these two issue.
> 
>  - Checksum error
> 
>     EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid
> 
>  - Inode corruption
> 
>     e2fsck 1.46.0 (29-Jan-2020)
>     Pass 1: Checking inodes, blocks, and sizes
>     Pass 2: Checking directory structure
>     Entry 'foo' in / (2) has deleted/unused inode 17.  Clear<y>? yes
>     Pass 3: Checking directory connectivity
>     Pass 4: Checking reference counts
>     Pass 5: Checking group summary information
>     Inode bitmap differences:  -17
>     Fix<y>? yes
>     Free inodes count wrong for group #0 (32750, counted=32751).
>     Fix<y>? yes
>     Free inodes count wrong (32750, counted=32751).
>     Fix<y>? yes
> 
> Changes since v4:
>  - Drop first three already applied patches.
>  - Remove 'in_mem' parameter passing __ext4_get_inode_loc() in the last
>    patch.
> 
> Changes since v3:
>  - Postpone initialization to ext4_do_update_inode() may cause zeroout
>    newly set xattr entry. So switch to do initialization in
>    __ext4_get_inode_loc().
> 
> Changes since v2:
>  - Instead of using WARN_ON_ONCE to prevent ext4_do_update_inode()
>    return before filling the inode buffer, keep the error and postpone
>    the report after the updating in the third patch.
>  - Fix some language mistacks in the last patch.
> 
> Changes since v1:
>  - Add a patch to prevent ext4_do_update_inode() return before filling
>    the inode buffer.
>  - Do not use BH_New flag to indicate the empty buffer, postpone the
>    zero and uptodate logic into ext4_do_update_inode() before filling
>    the inode buffer.
> 
> Thanks,
> Yi.
> 
> Zhang Yi (3):
>   ext4: factor out ext4_fill_raw_inode()
>   ext4: move ext4_fill_raw_inode() related functions
>   ext4: prevent getting empty inode buffer
> 
>  fs/ext4/inode.c | 316 +++++++++++++++++++++++++-----------------------
>  1 file changed, 165 insertions(+), 151 deletions(-)
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 0/3] ext4: fix a inode checksum error
  2021-09-01  2:09 [PATCH v5 0/3] ext4: fix a inode checksum error Zhang Yi
                   ` (3 preceding siblings ...)
  2021-10-14 10:57 ` [PATCH v5 0/3] ext4: fix a inode checksum error Zhang Yi
@ 2021-10-28 14:47 ` Theodore Ts'o
  4 siblings, 0 replies; 9+ messages in thread
From: Theodore Ts'o @ 2021-10-28 14:47 UTC (permalink / raw)
  To: linux-ext4, Zhang Yi; +Cc: Theodore Ts'o, jack, adilger.kernel, yukuai3

On Wed, 1 Sep 2021 10:09:52 +0800, Zhang Yi wrote:
> We find a checksum error and a inode corruption problem while doing
> stress test, this 3 patches address to fix them. The first two patches
> are prepare to do the fix, the last patch fix these two issue.
> 
>  - Checksum error
> 
>     EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid
> 
> [...]

Applied, thanks!

[1/3] ext4: factor out ext4_fill_raw_inode()
      commit: 6a6af6b5ee4363e29fc86552ddfd94c5000d845d
[2/3] ext4: move ext4_fill_raw_inode() related functions
      commit: 4bc2b511975a6fe5ce9f2b2dcc84b152354d1f90
[3/3] ext4: prevent getting empty inode buffer
      commit: fb5becb151d54652ae613e34157f1aadb8447ed8

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-10-28 14:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-01  2:09 [PATCH v5 0/3] ext4: fix a inode checksum error Zhang Yi
2021-09-01  2:09 ` [PATCH v5 1/3] ext4: factor out ext4_fill_raw_inode() Zhang Yi
2021-09-20 13:57   ` Jan Kara
2021-09-01  2:09 ` [PATCH v5 2/3] ext4: move ext4_fill_raw_inode() related functions Zhang Yi
2021-09-20 13:57   ` Jan Kara
2021-09-01  2:09 ` [PATCH v5 3/3] ext4: prevent getting empty inode buffer Zhang Yi
2021-09-20 13:58   ` Jan Kara
2021-10-14 10:57 ` [PATCH v5 0/3] ext4: fix a inode checksum error Zhang Yi
2021-10-28 14:47 ` Theodore Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).