linux-kernel.vger.kernel.org archive mirror
* [PATCH v2.3 00/23] ext4: Add metadata checksumming
@ 2012-01-07  8:27 Darrick J. Wong
  2012-01-07  8:27 ` [PATCH 01/23] ext4: Create a new BH_Verified flag to avoid unnecessary metadata validation Darrick J. Wong
                   ` (22 more replies)
  0 siblings, 23 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:27 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Hi all,

This patchset adds crc32c checksums to most of the ext4 metadata objects.  A
full design document is on the ext4 wiki[1] but I will summarize that document here.

As much as we wish our storage hardware were totally reliable, it is still
quite possible for data to be corrupted on disk, corrupted during transfer over
a wire, or written to the wrong places.  To protect against this sort of
non-hostile corruption, it is desirable to store checksums of metadata objects
on the filesystem so that broken metadata is detected before it can shred the
filesystem.

The crc32c polynomial was chosen for its improved error detection capabilities
over crc32 and crc16, and because of its hardware acceleration on current and
upcoming Intel and Sparc chips.

Each type of metadata object has been retrofitted to store a checksum as follows:

- The superblock stores a crc32c of itself.
- Each inode stores crc32c(fs_uuid + inode_num + inode_gen + inode +
  slack_space_after_inode)
- Block and inode bitmaps each get their own crc32c(fs_uuid + group_num +
  bitmap), stored in the block group descriptor.
- Each extent tree block stores a crc32c(fs_uuid + inode_num + inode_gen +
  extent_entries) in unused space at the end of the block.
- Each directory leaf block has an unused-looking directory entry big enough to
  store a crc32c(fs_uuid + inode_num + inode_gen + block) at the end of the
  block.
- Each directory htree block is shortened to contain a crc32c(fs_uuid +
  inode_num + inode_gen + block) at the end of the block.
- Extended attribute blocks store crc32c(fs_uuid + id + ea_block) in the
  header, where id is, depending on the refcount, either the inode_num and
  inode_gen; or the block number.
- MMP blocks store crc32c(fs_uuid + mmpblock) at the end of the MMP block.
- Block group descriptors can now use crc32c instead of crc16.
- The journal now has a v2 checksum feature flag.
- crc32c(j_uuid + block) checksums have been inserted into descriptor blocks,
  commit blocks, revoke blocks, and the journal superblock.
- Each block tag in a descriptor block has a checksum of the related data block.
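
To make the seeding scheme in the list above concrete, here is a minimal
userspace sketch.  It is not code from the patches: it assumes a plain bitwise
CRC32C (reflected Castagnoli polynomial 0x82F63B78) with no final inversion,
the helper names (crc32c_update, put_le32, sketch_inode_csum) are hypothetical,
and feeding inode_num and inode_gen as little-endian bytes is an assumption.
It also ignores the checksum fields stored inside the inode itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C update: slow, but has no dependencies. */
static uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Store a 32-bit value as little-endian bytes (assumed byte order). */
static void put_le32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)((v >> 8) & 0xff);
	out[2] = (uint8_t)((v >> 16) & 0xff);
	out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Hypothetical helper: crc32c(fs_uuid + inode_num + inode_gen + inode). */
static uint32_t sketch_inode_csum(const uint8_t uuid[16], uint32_t ino,
				  uint32_t gen, const void *raw_inode,
				  size_t inode_size)
{
	uint8_t le[4];
	/* The UUID part is the same for every object and could be cached. */
	uint32_t crc = crc32c_update(~0u, uuid, 16);

	put_le32(le, ino);
	crc = crc32c_update(crc, le, sizeof(le));
	put_le32(le, gen);
	crc = crc32c_update(crc, le, sizeof(le));
	return crc32c_update(crc, raw_inode, inode_size);
}
```

Because each call continues from the previous CRC state, feeding the fields
one after another gives the same result as checksumming their concatenation.
Mixing in the UUID and inode number means a block that is valid for one inode
fails verification if it is read back in place of another.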

The patchset for e2fsprogs will be sent under separate cover only to linux-ext4
as it is quite lengthy (~51 patches).

As far as performance impact goes, I see almost no change with a standard mail
server ffsb simulation.  On a test that involves only file creation and
deletion and extent tree modifications, I see a drop of about 50 percent with
the current kernel crc32c implementation; this improves to a drop of about 20
percent with the enclosed crc32c implementation.  However, given that metadata
is usually a small fraction of total IO, the cost of enabling this feature
seems reasonable.

There are a few unresolved issues:

- I haven't fixed it up to checksum the exclude bitmap yet.  I'll probably
  submit that as an add-on to the snapshot patchset.

- The journal commit hooks are not yet used to delay crc32c calculation until
  dirty buffers are actually being written to disk.

- Interaction with online resize code.  Yongqiang seems to be in the process of
  rewriting this not to use custom metadata block write functions, but I haven't
  looked at it very closely yet.

Please have a look at the design document and patches, and please feel free to
suggest any changes.

v2: Checksum the MMP block, store the checksum type in the superblock, include
the inode generation in file checksums, and finally solve the problem of limited
space in block groups by splitting the checksum into two halves.

v2.1: Checksum the reserved parts of the htree tail structure.  Fix some flag
handling bugs with the mb cache init routine wherein bitmaps could fail to be
checksummed at read time.

v2.2: Reincorporate the FS UUID in the bitmap checksum calculations.  Move all
disk layout changes to the front and the feature flag enablement to the end of
the patch set.  Fail journal recovery if revoke block fails checksum.

v2.3: Update the design document URL, and various minor naming cleanups.
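
The "two halves" approach mentioned in the v2 notes can be sketched as follows.
This is only an illustration: struct sketch_desc and both helpers are
hypothetical, the fields are host-endian here whereas the on-disk fields are
little-endian, and the sketch assumes both halves are always present.

```c
#include <stdint.h>

/* Hypothetical stand-in for a descriptor with split checksum fields. */
struct sketch_desc {
	uint16_t csum_lo;	/* low 16 bits of the 32-bit checksum */
	uint16_t csum_hi;	/* high 16 bits of the 32-bit checksum */
};

/* Split a 32-bit checksum across the two 16-bit fields. */
static void sketch_csum_store(struct sketch_desc *d, uint32_t csum)
{
	d->csum_lo = (uint16_t)(csum & 0xFFFFu);
	d->csum_hi = (uint16_t)(csum >> 16);
}

/* Reassemble the 32-bit checksum from the two halves. */
static uint32_t sketch_csum_load(const struct sketch_desc *d)
{
	return ((uint32_t)d->csum_hi << 16) | d->csum_lo;
}
```

Storing 0xDEADBEEF this way puts 0xBEEF in csum_lo and 0xDEAD in csum_hi.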

This patchset has been tested on 3.2.0-rc7 on x64, i386, ppc64, and ppc32.  The
patches seem to work fine on all four platforms.

--D

[1] https://ext4.wiki.kernel.org/articles/e/x/t/Ext4_Metadata_Checksums_4d24.html



* [PATCH 01/23] ext4: Create a new BH_Verified flag to avoid unnecessary metadata validation
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
@ 2012-01-07  8:27 ` Darrick J. Wong
  2012-01-07  8:28 ` [PATCH 02/23] ext4: Change on-disk layout to support extended metadata checksumming Darrick J. Wong
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:27 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Create a new BH_Verified flag to indicate that we've verified all the data in a
buffer_head for correctness.  This allows us to bypass expensive verification
steps when they are not necessary without missing them when they are.

v2: The later jbd2 metadata checksumming patches need a BH_Verified flag for
journal metadata, so put the flag in jbd2.h so that both drivers can share the
same bit.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 fs/ext4/extents.c          |   35 ++++++++++++++++++++++++++---------
 include/linux/jbd_common.h |    2 ++
 2 files changed, 28 insertions(+), 9 deletions(-)
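
For readers who want the idea without the kernel context, the verify-once
pattern described above looks roughly like this in userspace.  Everything
here is a hypothetical stand-in: struct sketch_buf plays the role of the
buffer_head, the "verified" member plays the role of the BH_Verified bit, and
the payload test is a placeholder for the real extent checks.

```c
#include <stdbool.h>

struct sketch_buf {
	bool verified;		/* stands in for the BH_Verified bit */
	int payload;		/* stands in for the block contents */
	int checks_run;		/* counts expensive checks, for illustration */
};

/* Placeholder for an expensive validation pass; negative payload is "bad". */
static int sketch_expensive_check(struct sketch_buf *b)
{
	b->checks_run++;
	return b->payload < 0 ? -1 : 0;
}

/* Validate once; later calls on a good buffer return immediately. */
static int sketch_check_block(struct sketch_buf *b)
{
	int ret;

	if (b->verified)
		return 0;
	ret = sketch_expensive_check(b);
	if (ret)
		return ret;
	b->verified = true;
	return 0;
}
```

Note that the flag is only set after a successful check, so a buffer that
fails validation is checked again on every lookup.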


diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 607b155..148f89f 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -396,6 +396,26 @@ int ext4_ext_check_inode(struct inode *inode)
 	return ext4_ext_check(inode, ext_inode_hdr(inode), ext_depth(inode));
 }
 
+static int __ext4_ext_check_block(const char *function, unsigned int line,
+				  struct inode *inode,
+				  struct ext4_extent_header *eh,
+				  int depth,
+				  struct buffer_head *bh)
+{
+	int ret;
+
+	if (buffer_verified(bh))
+		return 0;
+	ret = ext4_ext_check(inode, eh, depth);
+	if (ret)
+		return ret;
+	set_buffer_verified(bh);
+	return ret;
+}
+
+#define ext4_ext_check_block(inode, eh, depth, bh)	\
+	__ext4_ext_check_block(__func__, __LINE__, inode, eh, depth, bh)
+
 #ifdef EXT_DEBUG
 static void ext4_ext_show_path(struct inode *inode, struct ext4_ext_path *path)
 {
@@ -652,8 +672,6 @@ ext4_ext_find_extent(struct inode *inode, ext4_lblk_t block,
 	i = depth;
 	/* walk through the tree */
 	while (i) {
-		int need_to_validate = 0;
-
 		ext_debug("depth %d: num %d, max %d\n",
 			  ppos, le16_to_cpu(eh->eh_entries), le16_to_cpu(eh->eh_max));
 
@@ -672,8 +690,6 @@ ext4_ext_find_extent(struct inode *inode, ext4_lblk_t block,
 				put_bh(bh);
 				goto err;
 			}
-			/* validate the extent entries */
-			need_to_validate = 1;
 		}
 		eh = ext_block_hdr(bh);
 		ppos++;
@@ -687,7 +703,7 @@ ext4_ext_find_extent(struct inode *inode, ext4_lblk_t block,
 		path[ppos].p_hdr = eh;
 		i--;
 
-		if (need_to_validate && ext4_ext_check(inode, eh, i))
+		if (ext4_ext_check_block(inode, eh, i, bh))
 			goto err;
 	}
 
@@ -1328,7 +1344,8 @@ got_index:
 			return -EIO;
 		eh = ext_block_hdr(bh);
 		/* subtract from p_depth to get proper eh_depth */
-		if (ext4_ext_check(inode, eh, path->p_depth - depth)) {
+		if (ext4_ext_check_block(inode, eh,
+					 path->p_depth - depth, bh)) {
 			put_bh(bh);
 			return -EIO;
 		}
@@ -1341,7 +1358,7 @@ got_index:
 	if (bh == NULL)
 		return -EIO;
 	eh = ext_block_hdr(bh);
-	if (ext4_ext_check(inode, eh, path->p_depth - depth)) {
+	if (ext4_ext_check_block(inode, eh, path->p_depth - depth, bh)) {
 		put_bh(bh);
 		return -EIO;
 	}
@@ -2572,8 +2589,8 @@ again:
 				err = -EIO;
 				break;
 			}
-			if (ext4_ext_check(inode, ext_block_hdr(bh),
-							depth - i - 1)) {
+			if (ext4_ext_check_block(inode, ext_block_hdr(bh),
+							depth - i - 1, bh)) {
 				err = -EIO;
 				break;
 			}
diff --git a/include/linux/jbd_common.h b/include/linux/jbd_common.h
index 6230f85..6133679 100644
--- a/include/linux/jbd_common.h
+++ b/include/linux/jbd_common.h
@@ -12,6 +12,7 @@ enum jbd_state_bits {
 	BH_State,		/* Pins most journal_head state */
 	BH_JournalHead,		/* Pins bh->b_private and jh->b_bh */
 	BH_Unshadow,		/* Dummy bit, for BJ_Shadow wakeup filtering */
+	BH_Verified,		/* Metadata block has been verified ok */
 	BH_JBDPrivateStart,	/* First bit available for private use by FS */
 };
 
@@ -24,6 +25,7 @@ TAS_BUFFER_FNS(Revoked, revoked)
 BUFFER_FNS(RevokeValid, revokevalid)
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)
+BUFFER_FNS(Verified, verified)
 
 static inline struct buffer_head *jh2bh(struct journal_head *jh)
 {



* [PATCH 02/23] ext4: Change on-disk layout to support extended metadata checksumming
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
  2012-01-07  8:27 ` [PATCH 01/23] ext4: Create a new BH_Verified flag to avoid unnecessary metadata validation Darrick J. Wong
@ 2012-01-07  8:28 ` Darrick J. Wong
  2012-01-07  8:28 ` [PATCH 03/23] ext4: Record the checksum algorithm in use in the superblock Darrick J. Wong
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:28 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Define flags and change structure definitions to allow checksumming of ext4
metadata.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 fs/ext4/ext4.h         |   43 +++++++++++++++++++++++++++++++++++--------
 fs/ext4/ext4_extents.h |   13 +++++++++++++
 fs/ext4/namei.c        |    8 ++++++++
 fs/ext4/xattr.h        |    4 +++-
 4 files changed, 59 insertions(+), 9 deletions(-)
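
The ext4_extent_tail comment in this patch relies on block_size % 12 >= 4.
The arithmetic can be checked with a short sketch.  The helper names are
hypothetical, and placing the tail directly after the last possible 12-byte
entry is an assumption made for this illustration.

```c
#include <stdint.h>

#define SKETCH_EXT_REC_SIZE	12u	/* header and each entry: 12 bytes */
#define SKETCH_EXT_TAIL_SIZE	4u	/* one 32-bit checksum */

/* Number of 12-byte entries that fit after the 12-byte header. */
static uint32_t sketch_max_entries(uint32_t block_size)
{
	return (block_size - SKETCH_EXT_REC_SIZE) / SKETCH_EXT_REC_SIZE;
}

/* Byte offset just past the last possible entry (assumed tail position). */
static uint32_t sketch_tail_offset(uint32_t block_size)
{
	return SKETCH_EXT_REC_SIZE * (1u + sketch_max_entries(block_size));
}

/* Nonzero if a 4-byte tail fits in the leftover space of the block. */
static int sketch_tail_fits(uint32_t block_size)
{
	return sketch_tail_offset(block_size) + SKETCH_EXT_TAIL_SIZE
		<= block_size;
}
```

For a 4096-byte block this gives 340 entries and a tail at offset 4092, which
uses the last four bytes exactly; for 2048 bytes the tail sits at 2040 with
four bytes to spare.  Powers of two from 1024 upward alternate between a
remainder of 4 and 8, so the tail fits without shrinking eh_max.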


diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 5b0e26a..f85fb34 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -289,7 +289,9 @@ struct ext4_group_desc
 	__le16	bg_free_inodes_count_lo;/* Free inodes count */
 	__le16	bg_used_dirs_count_lo;	/* Directories count */
 	__le16	bg_flags;		/* EXT4_BG_flags (INODE_UNINIT, etc) */
-	__u32	bg_reserved[2];		/* Likely block/inode bitmap checksum */
+	__le32  bg_exclude_bitmap_lo;   /* Exclude bitmap for snapshots */
+	__le16  bg_block_bitmap_csum_lo;/* crc32c(s_uuid+grp_num+bbitmap) LE */
+	__le16  bg_inode_bitmap_csum_lo;/* crc32c(s_uuid+grp_num+ibitmap) LE */
 	__le16  bg_itable_unused_lo;	/* Unused inodes count */
 	__le16  bg_checksum;		/* crc16(sb_uuid+group+desc) */
 	__le32	bg_block_bitmap_hi;	/* Blocks bitmap block MSB */
@@ -299,7 +301,10 @@ struct ext4_group_desc
 	__le16	bg_free_inodes_count_hi;/* Free inodes count MSB */
 	__le16	bg_used_dirs_count_hi;	/* Directories count MSB */
 	__le16  bg_itable_unused_hi;    /* Unused inodes count MSB */
-	__u32	bg_reserved2[3];
+	__le32  bg_exclude_bitmap_hi;   /* Exclude bitmap block MSB */
+	__le16  bg_block_bitmap_csum_hi;/* crc32c(s_uuid+grp_num+bbitmap) BE */
+	__le16  bg_inode_bitmap_csum_hi;/* crc32c(s_uuid+grp_num+ibitmap) BE */
+	__u32   bg_reserved;
 };
 
 /*
@@ -632,7 +637,8 @@ struct ext4_inode {
 			__le16	l_i_file_acl_high;
 			__le16	l_i_uid_high;	/* these 2 fields */
 			__le16	l_i_gid_high;	/* were reserved2[0] */
-			__u32	l_i_reserved2;
+			__le16	l_i_checksum_lo;/* crc32c(uuid+inum+inode) LE */
+			__le16	l_i_reserved;
 		} linux2;
 		struct {
 			__le16	h_i_reserved1;	/* Obsoleted fragment number/size which are removed in ext4 */
@@ -648,7 +654,7 @@ struct ext4_inode {
 		} masix2;
 	} osd2;				/* OS dependent 2 */
 	__le16	i_extra_isize;
-	__le16	i_pad1;
+	__le16	i_checksum_hi;	/* crc32c(uuid+inum+inode) BE */
 	__le32  i_ctime_extra;  /* extra Change time      (nsec << 2 | epoch) */
 	__le32  i_mtime_extra;  /* extra Modification time(nsec << 2 | epoch) */
 	__le32  i_atime_extra;  /* extra Access time      (nsec << 2 | epoch) */
@@ -750,7 +756,7 @@ do {									       \
 #define i_gid_low	i_gid
 #define i_uid_high	osd2.linux2.l_i_uid_high
 #define i_gid_high	osd2.linux2.l_i_gid_high
-#define i_reserved2	osd2.linux2.l_i_reserved2
+#define i_checksum_lo	osd2.linux2.l_i_checksum_lo
 
 #elif defined(__GNU__)
 
@@ -982,6 +988,9 @@ extern void ext4_set_bits(void *bm, int cur, int len);
 #define EXT4_ERRORS_PANIC		3	/* Panic */
 #define EXT4_ERRORS_DEFAULT		EXT4_ERRORS_CONTINUE
 
+/* Metadata checksum algorithm codes */
+#define EXT4_CRC32C_CHKSUM		1
+
 /*
  * Structure of the super block
  */
@@ -1068,7 +1077,7 @@ struct ext4_super_block {
 	__le64  s_mmp_block;            /* Block for multi-mount protection */
 	__le32  s_raid_stripe_width;    /* blocks on all data disks (N*stride)*/
 	__u8	s_log_groups_per_flex;  /* FLEX_BG group size */
-	__u8	s_reserved_char_pad;
+	__u8	s_checksum_type;	/* metadata checksum algorithm used */
 	__le16  s_reserved_pad;
 	__le64	s_kbytes_written;	/* nr of lifetime kilobytes written */
 	__le32	s_snapshot_inum;	/* Inode number of active snapshot */
@@ -1094,7 +1103,8 @@ struct ext4_super_block {
 	__le32	s_usr_quota_inum;	/* inode for tracking user quota */
 	__le32	s_grp_quota_inum;	/* inode for tracking group quota */
 	__le32	s_overhead_clusters;	/* overhead blocks/clusters in fs */
-	__le32  s_reserved[109];        /* Padding to the end of the block */
+	__le32	s_reserved[108];	/* Padding to the end of the block */
+	__le32	s_checksum;		/* crc32c(superblock) */
 };
 
 #define EXT4_S_ERR_LEN (EXT4_S_ERR_END - EXT4_S_ERR_START)
@@ -1397,6 +1407,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
 #define EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE	0x0040
 #define EXT4_FEATURE_RO_COMPAT_QUOTA		0x0100
 #define EXT4_FEATURE_RO_COMPAT_BIGALLOC		0x0200
+#define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM	0x0400
 
 #define EXT4_FEATURE_INCOMPAT_COMPRESSION	0x0001
 #define EXT4_FEATURE_INCOMPAT_FILETYPE		0x0002
@@ -1409,6 +1420,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
 #define EXT4_FEATURE_INCOMPAT_FLEX_BG		0x0200
 #define EXT4_FEATURE_INCOMPAT_EA_INODE		0x0400 /* EA in inode */
 #define EXT4_FEATURE_INCOMPAT_DIRDATA		0x1000 /* data in dirent */
+#define EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM	0x2000
 
 #define EXT2_FEATURE_COMPAT_SUPP	EXT4_FEATURE_COMPAT_EXT_ATTR
 #define EXT2_FEATURE_INCOMPAT_SUPP	(EXT4_FEATURE_INCOMPAT_FILETYPE| \
@@ -1506,6 +1518,18 @@ struct ext4_dir_entry_2 {
 };
 
 /*
+ * This is a bogus directory entry at the end of each leaf block that
+ * records checksums.
+ */
+struct ext4_dir_entry_tail {
+	__le32	det_reserved_zero1;	/* Pretend to be unused */
+	__le16	det_rec_len;		/* 12 */
+	__u8	det_reserved_zero2;	/* Zero name length */
+	__u8	det_reserved_ft;	/* 0xDE, fake file type */
+	__le32	det_checksum;		/* crc32c(uuid+inum+dirblock) */
+};
+
+/*
  * Ext4 directory file types.  Only the low 3 bits are used.  The
  * other bits are reserved for now.
  */
@@ -1520,6 +1544,8 @@ struct ext4_dir_entry_2 {
 
 #define EXT4_FT_MAX		8
 
+#define EXT4_FT_DIR_CSUM	0xDE
+
 /*
  * EXT4_DIR_PAD defines the directory entries boundaries
  *
@@ -1716,7 +1742,8 @@ struct mmp_struct {
 	__le16	mmp_check_interval;
 
 	__le16	mmp_pad1;
-	__le32	mmp_pad2[227];
+	__le32	mmp_pad2[226];
+	__le32	mmp_checksum;		/* crc32c(uuid+mmp_block) */
 };
 
 /* arguments passed to the mmp thread */
diff --git a/fs/ext4/ext4_extents.h b/fs/ext4/ext4_extents.h
index a52db3a..5c7fbad 100644
--- a/fs/ext4/ext4_extents.h
+++ b/fs/ext4/ext4_extents.h
@@ -63,9 +63,22 @@
  * ext4_inode has i_block array (60 bytes total).
  * The first 12 bytes store ext4_extent_header;
  * the remainder stores an array of ext4_extent.
+ * For non-inode extent blocks, ext4_extent_tail
+ * follows the array.
  */
 
 /*
+ * This is the extent tail on-disk structure.
+ * All other extent structures are 12 bytes long.  It turns out that
+ * block_size % 12 >= 4 for at least all powers of 2 greater than 512, which
+ * covers all valid ext4 block sizes.  Therefore, this tail structure can be
+ * crammed into the end of the block without having to rebalance the tree.
+ */
+struct ext4_extent_tail {
+	__le32	et_checksum;	/* crc32c(uuid+inum+extent_block) */
+};
+
+/*
  * This is the extent on-disk structure.
  * It's used at the bottom of the tree.
  */
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index aa4c782..c10ae34 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -145,6 +145,14 @@ struct dx_map_entry
 	u16 size;
 };
 
+/*
+ * This goes at the end of each htree block.
+ */
+struct dx_tail {
+	u32 dt_reserved;
+	__le32 dt_checksum;	/* crc32c(uuid+inum+dirblock) */
+};
+
 static inline ext4_lblk_t dx_get_block(struct dx_entry *entry);
 static void dx_set_block(struct dx_entry *entry, ext4_lblk_t value);
 static inline unsigned dx_get_hash(struct dx_entry *entry);
diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h
index 25b7387..91f31ca 100644
--- a/fs/ext4/xattr.h
+++ b/fs/ext4/xattr.h
@@ -27,7 +27,9 @@ struct ext4_xattr_header {
 	__le32	h_refcount;	/* reference count */
 	__le32	h_blocks;	/* number of disk blocks used */
 	__le32	h_hash;		/* hash value of all attributes */
-	__u32	h_reserved[4];	/* zero right now */
+	__le32	h_checksum;	/* crc32c(uuid+id+xattrblock) */
+				/* id = inum if refcount=1, blknum otherwise */
+	__u32	h_reserved[3];	/* zero right now */
 };
 
 struct ext4_xattr_ibody_header {



* [PATCH 03/23] ext4: Record the checksum algorithm in use in the superblock
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
  2012-01-07  8:27 ` [PATCH 01/23] ext4: Create a new BH_Verified flag to avoid unnecessary metadata validation Darrick J. Wong
  2012-01-07  8:28 ` [PATCH 02/23] ext4: Change on-disk layout to support extended metadata checksumming Darrick J. Wong
@ 2012-01-07  8:28 ` Darrick J. Wong
  2012-01-07  8:28 ` [PATCH 04/23] ext4: Only call out to crc32c if necessary Darrick J. Wong
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:28 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Record the type of checksum algorithm we're using for metadata in the
superblock, in case we ever want/need to change the algorithm.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 fs/ext4/super.c |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)
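
As a rough userspace illustration of the mount-time rule this patch adds: the
checksum type only matters when the metadata checksum feature is set, and any
unknown type is rejected.  struct sketch_sb and the SKETCH_* names are
hypothetical; the constant values are the ones defined in patch 02.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RO_COMPAT_METADATA_CSUM	0x0400u	/* value from patch 02 */
#define SKETCH_CRC32C_CHKSUM		1u	/* value from patch 02 */

/* Hypothetical stand-in for the relevant superblock fields. */
struct sketch_sb {
	uint32_t ro_compat;
	uint8_t checksum_type;
};

/* True if the filesystem may be mounted as far as the type is concerned. */
static bool sketch_csum_type_ok(const struct sketch_sb *sb)
{
	if (!(sb->ro_compat & SKETCH_RO_COMPAT_METADATA_CSUM))
		return true;	/* feature off: the type field is ignored */
	return sb->checksum_type == SKETCH_CRC32C_CHKSUM;
}
```

A filesystem written by a future kernel with a different algorithm code would
fail this check instead of being mounted with checksums that cannot be
verified.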


diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3e1329e..de9f78c 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -111,6 +111,16 @@ static struct file_system_type ext3_fs_type = {
 #define IS_EXT3_SB(sb) (0)
 #endif
 
+static int ext4_verify_csum_type(struct super_block *sb,
+				 struct ext4_super_block *es)
+{
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+					EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return 1;
+
+	return es->s_checksum_type == EXT4_CRC32C_CHKSUM;
+}
+
 void *ext4_kvmalloc(size_t size, gfp_t flags)
 {
 	void *ret;
@@ -3181,6 +3191,14 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		goto cantfind_ext4;
 	sbi->s_kbytes_written = le64_to_cpu(es->s_kbytes_written);
 
+	/* Check for a known checksum algorithm */
+	if (!ext4_verify_csum_type(sb, es)) {
+		ext4_msg(sb, KERN_ERR, "VFS: Found ext4 filesystem with "
+			 "unknown checksum algorithm.");
+		silent = 1;
+		goto cantfind_ext4;
+	}
+
 	/* Set defaults before we parse the mount options */
 	def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
 	set_opt(sb, INIT_INODE_TABLE);



* [PATCH 04/23] ext4: Only call out to crc32c if necessary
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (2 preceding siblings ...)
  2012-01-07  8:28 ` [PATCH 03/23] ext4: Record the checksum algorithm in use in the superblock Darrick J. Wong
@ 2012-01-07  8:28 ` Darrick J. Wong
  2012-01-07  8:28 ` [PATCH 05/23] ext4: Calculate and verify superblock checksum Darrick J. Wong
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:28 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Only obtain a reference to the cryptoapi and crc32c if we happen to mount a
filesystem with metadata checksumming enabled.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 fs/ext4/Kconfig |    2 ++
 fs/ext4/ext4.h  |   23 +++++++++++++++++++++++
 fs/ext4/super.c |   16 ++++++++++++++++
 3 files changed, 41 insertions(+), 0 deletions(-)
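
The acquire-only-if-needed pattern from the description above, reduced to a
userspace sketch.  All names are hypothetical and malloc/free merely stand in
for obtaining and releasing the crc32c driver reference.

```c
#include <stdlib.h>

#define SKETCH_RO_COMPAT_METADATA_CSUM	0x0400u	/* value from patch 02 */

struct sketch_driver {
	int unused;		/* placeholder for driver state */
};

struct sketch_fs {
	unsigned int ro_compat;
	struct sketch_driver *csum_driver;	/* NULL when not needed */
};

/* Take a driver reference only if the feature flag asks for it. */
static int sketch_mount(struct sketch_fs *fs)
{
	fs->csum_driver = NULL;
	if (fs->ro_compat & SKETCH_RO_COMPAT_METADATA_CSUM) {
		fs->csum_driver = malloc(sizeof(*fs->csum_driver));
		if (!fs->csum_driver)
			return -1;	/* fail the mount, nothing to undo */
	}
	return 0;
}

/* Release the reference if one was taken. */
static void sketch_umount(struct sketch_fs *fs)
{
	free(fs->csum_driver);	/* free(NULL) is a no-op */
	fs->csum_driver = NULL;
}
```

Keeping the pointer NULL when the feature is off lets the teardown path stay
unconditional, which mirrors the NULL checks in the put_super and
failed_mount paths of this patch.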


diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
index 9ed1bb1..c22f170 100644
--- a/fs/ext4/Kconfig
+++ b/fs/ext4/Kconfig
@@ -2,6 +2,8 @@ config EXT4_FS
 	tristate "The Extended 4 (ext4) filesystem"
 	select JBD2
 	select CRC16
+	select CRYPTO
+	select CRYPTO_CRC32C
 	help
 	  This is the next generation of the ext3 filesystem.
 
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index f85fb34..a6de0d2 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -29,6 +29,7 @@
 #include <linux/wait.h>
 #include <linux/blockgroup_lock.h>
 #include <linux/percpu_counter.h>
+#include <crypto/hash.h>
 #ifdef __KERNEL__
 #include <linux/compat.h>
 #endif
@@ -1259,6 +1260,9 @@ struct ext4_sb_info {
 
 	/* record the last minlen when FITRIM is called. */
 	atomic_t s_last_trim_minblks;
+
+	/* Reference to checksum algorithm driver via cryptoapi */
+	struct crypto_shash *s_chksum_driver;
 };
 
 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
@@ -1614,6 +1618,25 @@ static inline __le16 ext4_rec_len_to_disk(unsigned len, unsigned blocksize)
 #define DX_HASH_HALF_MD4_UNSIGNED	4
 #define DX_HASH_TEA_UNSIGNED		5
 
+static inline u32 ext4_chksum(struct ext4_sb_info *sbi, u32 crc,
+			      const void *address, unsigned int length)
+{
+	struct {
+		struct shash_desc shash;
+		char ctx[crypto_shash_descsize(sbi->s_chksum_driver)];
+	} desc;
+	int err;
+
+	desc.shash.tfm = sbi->s_chksum_driver;
+	desc.shash.flags = 0;
+	*(u32 *)desc.ctx = crc;
+
+	err = crypto_shash_update(&desc.shash, address, length);
+	BUG_ON(err);
+
+	return *(u32 *)desc.ctx;
+}
+
 #ifdef __KERNEL__
 
 /* hash info structure used by the directory hash */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index de9f78c..58c697f 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -887,6 +887,8 @@ static void ext4_put_super(struct super_block *sb)
 	unlock_super(sb);
 	kobject_put(&sbi->s_kobj);
 	wait_for_completion(&sbi->s_kobj_unregister);
+	if (sbi->s_chksum_driver)
+		crypto_free_shash(sbi->s_chksum_driver);
 	kfree(sbi->s_blockgroup_lock);
 	kfree(sbi);
 }
@@ -3199,6 +3201,18 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		goto cantfind_ext4;
 	}
 
+	/* Load the checksum driver */
+	if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
+				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
+		sbi->s_chksum_driver = crypto_alloc_shash("crc32c", 0, 0);
+		if (IS_ERR(sbi->s_chksum_driver)) {
+			ext4_msg(sb, KERN_ERR, "Cannot load crc32c driver.");
+			ret = PTR_ERR(sbi->s_chksum_driver);
+			sbi->s_chksum_driver = NULL;
+			goto failed_mount;
+		}
+	}
+
 	/* Set defaults before we parse the mount options */
 	def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
 	set_opt(sb, INIT_INODE_TABLE);
@@ -3879,6 +3893,8 @@ failed_mount2:
 		brelse(sbi->s_group_desc[i]);
 	ext4_kvfree(sbi->s_group_desc);
 failed_mount:
+	if (sbi->s_chksum_driver)
+		crypto_free_shash(sbi->s_chksum_driver);
 	if (sbi->s_proc) {
 		remove_proc_entry(sb->s_id, ext4_proc_root);
 	}



* [PATCH 05/23] ext4: Calculate and verify superblock checksum
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (3 preceding siblings ...)
  2012-01-07  8:28 ` [PATCH 04/23] ext4: Only call out to crc32c if necessary Darrick J. Wong
@ 2012-01-07  8:28 ` Darrick J. Wong
  2012-01-07  8:28 ` [PATCH 06/23] ext4: Calculate and verify inode checksums Darrick J. Wong
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:28 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Calculate and verify the superblock checksum.  Since the UUID and block group
number are embedded in each copy of the superblock, we need only checksum the
entire block.  Refactor some of the code to eliminate open-coding of the
checksum update call.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 fs/ext4/ext4.h      |   10 ++++++++++
 fs/ext4/ext4_jbd2.c |    9 ++++++++-
 fs/ext4/ext4_jbd2.h |    7 +++++--
 fs/ext4/inode.c     |    3 +--
 fs/ext4/namei.c     |    4 ++--
 fs/ext4/resize.c    |    6 +++++-
 fs/ext4/super.c     |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 78 insertions(+), 8 deletions(-)
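
A userspace sketch of the set/verify pair described above, checksumming
everything in the block up to (but not including) the checksum field.  The
names are hypothetical, struct sketch_super is a 1024-byte stand-in rather
than the real layout, the checksum is kept host-endian here instead of
little-endian, and the bitwise CRC32C (no final inversion) is an assumption
made to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C update (reflected polynomial 0x82F63B78). */
static uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Stand-in superblock: 1020 bytes of content, then the checksum. */
struct sketch_super {
	uint8_t body[1020];
	uint32_t checksum;
};

/* Checksum covers every byte before the checksum field. */
static uint32_t sketch_super_csum(const struct sketch_super *sb)
{
	return crc32c_update(~0u, sb, offsetof(struct sketch_super, checksum));
}

static void sketch_super_csum_set(struct sketch_super *sb)
{
	sb->checksum = sketch_super_csum(sb);
}

/* Returns 1 if the stored checksum matches the contents, 0 otherwise. */
static int sketch_super_csum_verify(const struct sketch_super *sb)
{
	return sb->checksum == sketch_super_csum(sb);
}
```

Any later change to the body without a matching call to the set helper makes
verification fail, which is why the patch routes superblock updates through
helpers that recompute the checksum before the buffer is dirtied.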


diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index a6de0d2..9d1504e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1263,6 +1263,9 @@ struct ext4_sb_info {
 
 	/* Reference to checksum algorithm driver via cryptoapi */
 	struct crypto_shash *s_chksum_driver;
+
+	/* Precomputed FS UUID checksum for seeding other checksums */
+	__u32 s_csum_seed;
 };
 
 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
@@ -1976,6 +1979,10 @@ extern int ext4_group_extend(struct super_block *sb,
 				ext4_fsblk_t n_blocks_count);
 
 /* super.c */
+extern int ext4_superblock_csum_verify(struct super_block *sb,
+				       struct ext4_super_block *es);
+extern void ext4_superblock_csum_set(struct super_block *sb,
+				     struct ext4_super_block *es);
 extern void *ext4_kvmalloc(size_t size, gfp_t flags);
 extern void *ext4_kvzalloc(size_t size, gfp_t flags);
 extern void ext4_kvfree(void *ptr);
@@ -2251,6 +2258,9 @@ static inline void ext4_unlock_group(struct super_block *sb,
 
 static inline void ext4_mark_super_dirty(struct super_block *sb)
 {
+	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
+
+	ext4_superblock_csum_set(sb, es);
 	if (EXT4_SB(sb)->s_journal == NULL)
 		sb->s_dirt =1;
 }
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index aca1790..90f7c2e 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -138,16 +138,23 @@ int __ext4_handle_dirty_metadata(const char *where, unsigned int line,
 }
 
 int __ext4_handle_dirty_super(const char *where, unsigned int line,
-			      handle_t *handle, struct super_block *sb)
+			      handle_t *handle, struct super_block *sb,
+			      int now)
 {
 	struct buffer_head *bh = EXT4_SB(sb)->s_sbh;
 	int err = 0;
 
 	if (ext4_handle_valid(handle)) {
+		ext4_superblock_csum_set(sb,
+				(struct ext4_super_block *)bh->b_data);
 		err = jbd2_journal_dirty_metadata(handle, bh);
 		if (err)
 			ext4_journal_abort_handle(where, line, __func__,
 						  bh, handle, err);
+	} else if (now) {
+		ext4_superblock_csum_set(sb,
+				(struct ext4_super_block *)bh->b_data);
+		mark_buffer_dirty(bh);
 	} else
 		sb->s_dirt = 1;
 	return err;
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 5802fa1..ed9b78d 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -141,7 +141,8 @@ int __ext4_handle_dirty_metadata(const char *where, unsigned int line,
 				 struct buffer_head *bh);
 
 int __ext4_handle_dirty_super(const char *where, unsigned int line,
-			      handle_t *handle, struct super_block *sb);
+			      handle_t *handle, struct super_block *sb,
+			      int now);
 
 #define ext4_journal_get_write_access(handle, bh) \
 	__ext4_journal_get_write_access(__func__, __LINE__, (handle), (bh))
@@ -153,8 +154,10 @@ int __ext4_handle_dirty_super(const char *where, unsigned int line,
 #define ext4_handle_dirty_metadata(handle, inode, bh) \
 	__ext4_handle_dirty_metadata(__func__, __LINE__, (handle), (inode), \
 				     (bh))
+#define ext4_handle_dirty_super_now(handle, sb) \
+	__ext4_handle_dirty_super(__func__, __LINE__, (handle), (sb), 1)
 #define ext4_handle_dirty_super(handle, sb) \
-	__ext4_handle_dirty_super(__func__, __LINE__, (handle), (sb))
+	__ext4_handle_dirty_super(__func__, __LINE__, (handle), (sb), 0)
 
 handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks);
 int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 92655fd..07cd63d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4044,8 +4044,7 @@ static int ext4_do_update_inode(handle_t *handle,
 					EXT4_FEATURE_RO_COMPAT_LARGE_FILE);
 			sb->s_dirt = 1;
 			ext4_handle_sync(handle);
-			err = ext4_handle_dirty_metadata(handle, NULL,
-					EXT4_SB(sb)->s_sbh);
+			err = ext4_handle_dirty_super_now(handle, sb);
 		}
 	}
 	raw_inode->i_generation = cpu_to_le32(inode->i_generation);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index c10ae34..ebfc499 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2021,7 +2021,7 @@ int ext4_orphan_add(handle_t *handle, struct inode *inode)
 	/* Insert this inode at the head of the on-disk orphan list... */
 	NEXT_ORPHAN(inode) = le32_to_cpu(EXT4_SB(sb)->s_es->s_last_orphan);
 	EXT4_SB(sb)->s_es->s_last_orphan = cpu_to_le32(inode->i_ino);
-	err = ext4_handle_dirty_metadata(handle, NULL, EXT4_SB(sb)->s_sbh);
+	err = ext4_handle_dirty_super_now(handle, sb);
 	rc = ext4_mark_iloc_dirty(handle, inode, &iloc);
 	if (!err)
 		err = rc;
@@ -2094,7 +2094,7 @@ int ext4_orphan_del(handle_t *handle, struct inode *inode)
 		if (err)
 			goto out_brelse;
 		sbi->s_es->s_last_orphan = cpu_to_le32(ino_next);
-		err = ext4_handle_dirty_metadata(handle, NULL, sbi->s_sbh);
+		err = ext4_handle_dirty_super_now(handle, inode->i_sb);
 	} else {
 		struct ext4_iloc iloc2;
 		struct inode *i_prev =
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 996780a..c33b72a 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -511,7 +511,7 @@ static int add_new_gdb(handle_t *handle, struct inode *inode,
 	ext4_kvfree(o_group_desc);
 
 	le16_add_cpu(&es->s_reserved_gdt_blocks, -1);
-	err = ext4_handle_dirty_metadata(handle, NULL, EXT4_SB(sb)->s_sbh);
+	err = ext4_handle_dirty_super_now(handle, sb);
 	if (err)
 		ext4_std_error(sb, err);
 
@@ -682,6 +682,8 @@ static void update_backups(struct super_block *sb,
 		goto exit_err;
 	}
 
+	ext4_superblock_csum_set(sb, (struct ext4_super_block *)data);
+
 	while ((group = ext4_list_backups(sb, &three, &five, &seven)) < last) {
 		struct buffer_head *bh;
 
@@ -925,6 +927,8 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
 	/* Update the global fs size fields */
 	sbi->s_groups_count++;
 
+	ext4_superblock_csum_set(sb,
+				 (struct ext4_super_block *)primary->b_data);
 	err = ext4_handle_dirty_metadata(handle, NULL, primary);
 	if (unlikely(err)) {
 		ext4_std_error(sb, err);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 58c697f..99bee03 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -121,6 +121,38 @@ static int ext4_verify_csum_type(struct super_block *sb,
 	return es->s_checksum_type == EXT4_CRC32C_CHKSUM;
 }
 
+static __le32 ext4_superblock_csum(struct super_block *sb,
+				   struct ext4_super_block *es)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	int offset = offsetof(struct ext4_super_block, s_checksum);
+	__u32 csum;
+
+	csum = ext4_chksum(sbi, ~0, (char *)es, offset);
+
+	return cpu_to_le32(csum);
+}
+
+int ext4_superblock_csum_verify(struct super_block *sb,
+				struct ext4_super_block *es)
+{
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return 1;
+
+	return es->s_checksum == ext4_superblock_csum(sb, es);
+}
+
+void ext4_superblock_csum_set(struct super_block *sb,
+			      struct ext4_super_block *es)
+{
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+		EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return;
+
+	es->s_checksum = ext4_superblock_csum(sb, es);
+}
+
 void *ext4_kvmalloc(size_t size, gfp_t flags)
 {
 	void *ret;
@@ -3213,6 +3245,20 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		}
 	}
 
+	/* Check superblock checksum */
+	if (!ext4_superblock_csum_verify(sb, es)) {
+		ext4_msg(sb, KERN_ERR, "VFS: Found ext4 filesystem with "
+			 "invalid superblock checksum.  Run e2fsck?");
+		silent = 1;
+		goto cantfind_ext4;
+	}
+
+	/* Precompute checksum seed for all metadata */
+	if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
+			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		sbi->s_csum_seed = ext4_chksum(sbi, ~0, es->s_uuid,
+					       sizeof(es->s_uuid));
+
 	/* Set defaults before we parse the mount options */
 	def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
 	set_opt(sb, INIT_INODE_TABLE);
@@ -4218,6 +4264,7 @@ static int ext4_commit_super(struct super_block *sb, int sync)
 				&EXT4_SB(sb)->s_freeinodes_counter));
 	sb->s_dirt = 0;
 	BUFFER_TRACE(sbh, "marking dirty");
+	ext4_superblock_csum_set(sb, es);
 	mark_buffer_dirty(sbh);
 	if (sync) {
 		error = sync_dirty_buffer(sbh);



* [PATCH 06/23] ext4: Calculate and verify inode checksums
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (4 preceding siblings ...)
  2012-01-07  8:28 ` [PATCH 05/23] ext4: Calculate and verify superblock checksum Darrick J. Wong
@ 2012-01-07  8:28 ` Darrick J. Wong
  2012-01-07  8:28 ` [PATCH 07/23] ext4: Calculate and verify checksums for inode bitmaps Darrick J. Wong
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:28 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

This patch introduces to ext4 the ability to calculate and verify inode
checksums.  This requires the use of a new ro compatibility flag and some
accompanying e2fsprogs patches to provide the relevant features in tune2fs and
e2fsck.  The inode generation changes have been integrated into this patch.
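
The seed precomputation in this patch can be sketched in userspace as follows.  This is only an illustration, not the kernel code: crc32c_update(), put_le32() and inode_csum_seed() are hypothetical names, and the sketch assumes that ext4_chksum() acts as a plain running crc32c update (reflected polynomial 0x82F63B78) with no final inversion.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise crc32c update (Castagnoli, reflected polynomial 0x82F63B78).
 * Assumption: no final inversion is applied, so the result of one call
 * can be fed back in as the seed of the next call.
 */
static uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Store a 32-bit value as little-endian bytes, like cpu_to_le32(). */
static void put_le32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)v;
	out[1] = (uint8_t)(v >> 8);
	out[2] = (uint8_t)(v >> 16);
	out[3] = (uint8_t)(v >> 24);
}

/*
 * Hypothetical helper: chain uuid -> inode number -> generation into one
 * per-inode seed, mirroring how s_csum_seed and i_csum_seed are built.
 */
static uint32_t inode_csum_seed(const uint8_t uuid[16], uint32_t inum,
				uint32_t gen)
{
	uint8_t le[4];
	uint32_t csum;

	csum = crc32c_update(~0u, uuid, 16);	/* per-filesystem seed */
	put_le32(le, inum);
	csum = crc32c_update(csum, le, sizeof(le));
	put_le32(le, gen);
	return crc32c_update(csum, le, sizeof(le));
}
```

Because the update is a running checksum, chaining the three calls gives the same value as checksumming uuid, inum and gen laid out back to back.  That is why the seed can be cached per inode and reused for every later checksum of that inode's metadata.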

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 fs/ext4/ext4.h   |    3 +
 fs/ext4/ialloc.c |   13 ++++++
 fs/ext4/inode.c  |  111 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 fs/ext4/ioctl.c  |    7 +++
 4 files changed, 126 insertions(+), 8 deletions(-)


diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9d1504e..57c1e39 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -897,6 +897,9 @@ struct ext4_inode_info {
 	 */
 	tid_t i_sync_tid;
 	tid_t i_datasync_tid;
+
+	/* Precomputed uuid+inum+igen checksum for seeding inode checksums */
+	__u32 i_csum_seed;
 };
 
 /*
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 00beb4f..9884fa1 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -892,6 +892,19 @@ got:
 	inode->i_generation = sbi->s_next_generation++;
 	spin_unlock(&sbi->s_next_gen_lock);
 
+	/* Precompute checksum seed for inode metadata */
+	if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
+			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
+		__u32 csum;
+		struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+		__le32 inum = cpu_to_le32(inode->i_ino);
+		__le32 gen = cpu_to_le32(inode->i_generation);
+		csum = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)&inum,
+				   sizeof(inum));
+		ei->i_csum_seed = ext4_chksum(sbi, csum, (__u8 *)&gen,
+					      sizeof(gen));
+	}
+
 	ext4_clear_state_flags(ei); /* Only relevant on 32-bit archs */
 	ext4_set_inode_state(inode, EXT4_STATE_NEW);
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 07cd63d..d8f6a63 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -48,6 +48,73 @@
 
 #define MPAGE_DA_EXTENT_TAIL 0x01
 
+static __u32 ext4_inode_csum(struct inode *inode, struct ext4_inode *raw,
+			      struct ext4_inode_info *ei)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	__u16 csum_lo;
+	__u16 csum_hi = 0;
+	__u32 csum;
+
+	csum_lo = raw->i_checksum_lo;
+	raw->i_checksum_lo = 0;
+	if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE &&
+	    EXT4_FITS_IN_INODE(raw, ei, i_checksum_hi)) {
+		csum_hi = raw->i_checksum_hi;
+		raw->i_checksum_hi = 0;
+	}
+
+	csum = ext4_chksum(sbi, ei->i_csum_seed, (__u8 *)raw,
+			   EXT4_INODE_SIZE(inode->i_sb));
+
+	raw->i_checksum_lo = csum_lo;
+	if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE &&
+	    EXT4_FITS_IN_INODE(raw, ei, i_checksum_hi))
+		raw->i_checksum_hi = csum_hi;
+
+	return csum;
+}
+
+static int ext4_inode_csum_verify(struct inode *inode, struct ext4_inode *raw,
+				  struct ext4_inode_info *ei)
+{
+	__u32 provided, calculated;
+
+	if (EXT4_SB(inode->i_sb)->s_es->s_creator_os !=
+	    cpu_to_le32(EXT4_OS_LINUX) ||
+	    !EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+		EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return 1;
+
+	provided = le16_to_cpu(raw->i_checksum_lo);
+	calculated = ext4_inode_csum(inode, raw, ei);
+	if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE &&
+	    EXT4_FITS_IN_INODE(raw, ei, i_checksum_hi))
+		provided |= ((__u32)le16_to_cpu(raw->i_checksum_hi)) << 16;
+	else
+		calculated &= 0xFFFF;
+
+	return provided == calculated;
+}
+
+static void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw,
+				struct ext4_inode_info *ei)
+{
+	__u32 csum;
+
+	if (EXT4_SB(inode->i_sb)->s_es->s_creator_os !=
+	    cpu_to_le32(EXT4_OS_LINUX) ||
+	    !EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+		EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return;
+
+	csum = ext4_inode_csum(inode, raw, ei);
+	raw->i_checksum_lo = cpu_to_le16(csum & 0xFFFF);
+	if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE &&
+	    EXT4_FITS_IN_INODE(raw, ei, i_checksum_hi))
+		raw->i_checksum_hi = cpu_to_le16(csum >> 16);
+}
+
 static inline int ext4_begin_ordered_truncate(struct inode *inode,
 					      loff_t new_size)
 {
@@ -3751,6 +3818,39 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
 	if (ret < 0)
 		goto bad_inode;
 	raw_inode = ext4_raw_inode(&iloc);
+
+	if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
+		ei->i_extra_isize = le16_to_cpu(raw_inode->i_extra_isize);
+		if (EXT4_GOOD_OLD_INODE_SIZE + ei->i_extra_isize >
+		    EXT4_INODE_SIZE(inode->i_sb)) {
+			EXT4_ERROR_INODE(inode, "bad extra_isize (%u != %u)",
+				EXT4_GOOD_OLD_INODE_SIZE + ei->i_extra_isize,
+				EXT4_INODE_SIZE(inode->i_sb));
+			ret = -EIO;
+			goto bad_inode;
+		}
+	} else
+		ei->i_extra_isize = 0;
+
+	/* Precompute checksum seed for inode metadata */
+	if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
+			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
+		struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+		__u32 csum;
+		__le32 inum = cpu_to_le32(inode->i_ino);
+		__le32 gen = raw_inode->i_generation;
+		csum = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)&inum,
+				   sizeof(inum));
+		ei->i_csum_seed = ext4_chksum(sbi, csum, (__u8 *)&gen,
+					      sizeof(gen));
+	}
+
+	if (!ext4_inode_csum_verify(inode, raw_inode, ei)) {
+		EXT4_ERROR_INODE(inode, "checksum invalid");
+		ret = -EIO;
+		goto bad_inode;
+	}
+
 	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
 	inode->i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low);
 	inode->i_gid = (gid_t)le16_to_cpu(raw_inode->i_gid_low);
@@ -3828,12 +3928,6 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
 	}
 
 	if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
-		ei->i_extra_isize = le16_to_cpu(raw_inode->i_extra_isize);
-		if (EXT4_GOOD_OLD_INODE_SIZE + ei->i_extra_isize >
-		    EXT4_INODE_SIZE(inode->i_sb)) {
-			ret = -EIO;
-			goto bad_inode;
-		}
 		if (ei->i_extra_isize == 0) {
 			/* The extra space is currently unused. Use it. */
 			ei->i_extra_isize = sizeof(struct ext4_inode) -
@@ -3845,8 +3939,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
 			if (*magic == cpu_to_le32(EXT4_XATTR_MAGIC))
 				ext4_set_inode_state(inode, EXT4_STATE_XATTR);
 		}
-	} else
-		ei->i_extra_isize = 0;
+	}
 
 	EXT4_INODE_GET_XTIME(i_ctime, inode, raw_inode);
 	EXT4_INODE_GET_XTIME(i_mtime, inode, raw_inode);
@@ -4071,6 +4164,8 @@ static int ext4_do_update_inode(handle_t *handle,
 		raw_inode->i_extra_isize = cpu_to_le16(ei->i_extra_isize);
 	}
 
+	ext4_inode_csum_set(inode, raw_inode, ei);
+
 	BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata");
 	rc = ext4_handle_dirty_metadata(handle, NULL, bh);
 	if (!err)
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index a567968..4217f99 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -150,6 +150,13 @@ flags_out:
 		if (!inode_owner_or_capable(inode))
 			return -EPERM;
 
+		if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+				EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
+			ext4_warning(sb, "Setting inode version is not "
+				     "supported with metadata_csum enabled.");
+			return -ENOTTY;
+		}
+
 		err = mnt_want_write(filp->f_path.mnt);
 		if (err)
 			return err;



* [PATCH 07/23] ext4: Calculate and verify checksums for inode bitmaps
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (5 preceding siblings ...)
  2012-01-07  8:28 ` [PATCH 06/23] ext4: Calculate and verify inode checksums Darrick J. Wong
@ 2012-01-07  8:28 ` Darrick J. Wong
  2012-01-07  8:28 ` [PATCH 08/23] ext4: Calculate and verify block bitmap checksum Darrick J. Wong
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:28 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Compute and verify the checksum of the inode bitmap; the checksum is stored in
the block group descriptor.
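
The way a 32-bit checksum is split across two 16-bit descriptor fields can be sketched as below.  This is a simplified model under stated assumptions: struct csum_slot, csum_store() and csum_matches() are hypothetical names, has_hi stands in for the s_desc_size comparison, and on-disk byte order is ignored.

```c
#include <stdint.h>

/* Hypothetical stand-in for the lo/hi checksum fields of a descriptor. */
struct csum_slot {
	uint16_t lo;
	uint16_t hi;
	int has_hi;	/* nonzero when the descriptor is large enough */
};

/* Store the low half always, the high half only if there is room. */
static void csum_store(struct csum_slot *s, uint32_t csum)
{
	s->lo = (uint16_t)(csum & 0xFFFF);
	s->hi = s->has_hi ? (uint16_t)(csum >> 16) : 0;
}

/*
 * Compare a freshly calculated checksum against the stored one.  Without
 * the high field, only the low 16 bits of the calculation are compared.
 */
static int csum_matches(const struct csum_slot *s, uint32_t calculated)
{
	uint32_t provided = s->lo;

	if (s->has_hi)
		provided |= (uint32_t)s->hi << 16;
	else
		calculated &= 0xFFFF;

	return provided == calculated;
}
```

With small descriptors the check degrades to a 16-bit comparison, so two checksums that differ only in their upper halves are treated as equal.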

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 fs/ext4/bitmap.c |   39 +++++++++++++++++++++++++++++++++++++++
 fs/ext4/ext4.h   |   13 +++++++++++++
 fs/ext4/ialloc.c |   30 +++++++++++++++++++++++++++---
 3 files changed, 79 insertions(+), 3 deletions(-)


diff --git a/fs/ext4/bitmap.c b/fs/ext4/bitmap.c
index fa3af81..0ae4a01 100644
--- a/fs/ext4/bitmap.c
+++ b/fs/ext4/bitmap.c
@@ -29,3 +29,42 @@ unsigned int ext4_count_free(struct buffer_head *map, unsigned int numchars)
 
 #endif  /*  EXT4FS_DEBUG  */
 
+int ext4_inode_bitmap_csum_verify(struct super_block *sb, ext4_group_t group,
+				  struct ext4_group_desc *gdp,
+				  struct buffer_head *bh, int sz)
+{
+	__u32 hi;
+	__u32 provided, calculated;
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return 1;
+
+	provided = le16_to_cpu(gdp->bg_inode_bitmap_csum_lo);
+	calculated = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)bh->b_data, sz);
+	if (sbi->s_desc_size >= EXT4_BG_INODE_BITMAP_CSUM_HI_END) {
+		hi = le16_to_cpu(gdp->bg_inode_bitmap_csum_hi);
+		provided |= (hi << 16);
+	} else
+		calculated &= 0xFFFF;
+
+	return provided == calculated;
+}
+
+void ext4_inode_bitmap_csum_set(struct super_block *sb, ext4_group_t group,
+				struct ext4_group_desc *gdp,
+				struct buffer_head *bh, int sz)
+{
+	__u32 csum;
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return;
+
+	csum = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)bh->b_data, sz);
+	gdp->bg_inode_bitmap_csum_lo = cpu_to_le16(csum & 0xFFFF);
+	if (sbi->s_desc_size >= EXT4_BG_INODE_BITMAP_CSUM_HI_END)
+		gdp->bg_inode_bitmap_csum_hi = cpu_to_le16(csum >> 16);
+}
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 57c1e39..4bb9fb3 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -308,6 +308,13 @@ struct ext4_group_desc
 	__u32   bg_reserved;
 };
 
+#define EXT4_BG_INODE_BITMAP_CSUM_HI_END	\
+	(offsetof(struct ext4_group_desc, bg_inode_bitmap_csum_hi) + \
+	 sizeof(__le16))
+#define EXT4_BG_BLOCK_BITMAP_CSUM_HI_END	\
+	(offsetof(struct ext4_group_desc, bg_block_bitmap_csum_hi) + \
+	 sizeof(__le16))
+
 /*
  * Structure of a flex block group info
  */
@@ -1815,6 +1822,12 @@ struct mmpd_data {
 
 /* bitmap.c */
 extern unsigned int ext4_count_free(struct buffer_head *, unsigned);
+void ext4_inode_bitmap_csum_set(struct super_block *sb, ext4_group_t group,
+				struct ext4_group_desc *gdp,
+				struct buffer_head *bh, int sz);
+int ext4_inode_bitmap_csum_verify(struct super_block *sb, ext4_group_t group,
+				  struct ext4_group_desc *gdp,
+				  struct buffer_head *bh, int sz);
 
 /* balloc.c */
 extern unsigned int ext4_block_group(struct super_block *sb,
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 9884fa1..59e3644 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -82,12 +82,17 @@ static unsigned ext4_init_inode_bitmap(struct super_block *sb,
 		ext4_free_inodes_set(sb, gdp, 0);
 		ext4_itable_unused_set(sb, gdp, 0);
 		memset(bh->b_data, 0xff, sb->s_blocksize);
+		ext4_inode_bitmap_csum_set(sb, block_group, gdp, bh,
+					   EXT4_INODES_PER_GROUP(sb) / 8);
 		return 0;
 	}
 
 	memset(bh->b_data, 0, (EXT4_INODES_PER_GROUP(sb) + 7) / 8);
 	ext4_mark_bitmap_end(EXT4_INODES_PER_GROUP(sb), sb->s_blocksize * 8,
 			bh->b_data);
+	ext4_inode_bitmap_csum_set(sb, block_group, gdp, bh,
+				   EXT4_INODES_PER_GROUP(sb) / 8);
+	gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
 
 	return EXT4_INODES_PER_GROUP(sb);
 }
@@ -118,12 +123,12 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
 		return NULL;
 	}
 	if (bitmap_uptodate(bh))
-		return bh;
+		goto verify;
 
 	lock_buffer(bh);
 	if (bitmap_uptodate(bh)) {
 		unlock_buffer(bh);
-		return bh;
+		goto verify;
 	}
 
 	ext4_lock_group(sb, block_group);
@@ -131,6 +136,7 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
 		ext4_init_inode_bitmap(sb, bh, block_group, desc);
 		set_bitmap_uptodate(bh);
 		set_buffer_uptodate(bh);
+		set_buffer_verified(bh);
 		ext4_unlock_group(sb, block_group);
 		unlock_buffer(bh);
 		return bh;
@@ -144,7 +150,7 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
 		 */
 		set_bitmap_uptodate(bh);
 		unlock_buffer(bh);
-		return bh;
+		goto verify;
 	}
 	/*
 	 * submit the buffer_head for read. We can
@@ -161,6 +167,20 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
 			    block_group, bitmap_blk);
 		return NULL;
 	}
+
+verify:
+	ext4_lock_group(sb, block_group);
+	if (!buffer_verified(bh) &&
+	    !ext4_inode_bitmap_csum_verify(sb, block_group, desc, bh,
+					   EXT4_INODES_PER_GROUP(sb) / 8)) {
+		ext4_unlock_group(sb, block_group);
+		put_bh(bh);
+		ext4_error(sb, "Corrupt inode bitmap - block_group = %u, "
+			   "inode_bitmap = %llu", block_group, bitmap_blk);
+		return NULL;
+	}
+	ext4_unlock_group(sb, block_group);
+	set_buffer_verified(bh);
 	return bh;
 }
 
@@ -265,6 +285,8 @@ void ext4_free_inode(handle_t *handle, struct inode *inode)
 		ext4_used_dirs_set(sb, gdp, count);
 		percpu_counter_dec(&sbi->s_dirs_counter);
 	}
+	ext4_inode_bitmap_csum_set(sb, block_group, gdp, bitmap_bh,
+				   EXT4_INODES_PER_GROUP(sb) / 8);
 	gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
 	ext4_unlock_group(sb, block_group);
 
@@ -673,6 +695,8 @@ static int ext4_claim_inode(struct super_block *sb,
 			atomic_inc(&sbi->s_flex_groups[f].used_dirs);
 		}
 	}
+	ext4_inode_bitmap_csum_set(sb, group, gdp, inode_bitmap_bh,
+				   EXT4_INODES_PER_GROUP(sb) / 8);
 	gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp);
 err_ret:
 	ext4_unlock_group(sb, group);



* [PATCH 08/23] ext4: Calculate and verify block bitmap checksum
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (6 preceding siblings ...)
  2012-01-07  8:28 ` [PATCH 07/23] ext4: Calculate and verify checksums for inode bitmaps Darrick J. Wong
@ 2012-01-07  8:28 ` Darrick J. Wong
  2012-01-07  8:28 ` [PATCH 09/23] ext4: Verify and calculate checksums for extent tree blocks Darrick J. Wong
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:28 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Compute and verify the checksum of the block bitmap; this checksum is stored in
the block group descriptor.
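
The diff below skips re-verification once a buffer has been marked verified.  A rough userspace sketch of that verify-once pattern follows.  struct cached_block, toy_sum() and block_ok() are hypothetical names, and toy_sum() is a deliberately trivial stand-in for crc32c so that the example stays short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer_head carrying a verified flag. */
struct cached_block {
	const uint8_t *data;
	size_t len;
	bool verified;	/* plays the role of the BH_Verified flag */
};

/* Counts how often the (potentially expensive) checksum is computed. */
static unsigned int verify_calls;

/* Toy checksum used only for illustration; NOT crc32c. */
static uint32_t toy_sum(const uint8_t *p, size_t n)
{
	uint32_t s = 0;

	while (n--)
		s = s * 31 + *p++;
	return s;
}

/* Verify a block at most once; later calls return early via the flag. */
static bool block_ok(struct cached_block *b, uint32_t expected)
{
	if (b->verified)
		return true;

	verify_calls++;
	if (toy_sum(b->data, b->len) != expected)
		return false;	/* flag stays clear, so a retry re-checks */

	b->verified = true;
	return true;
}
```

A block that fails verification is never flagged, so each later access repeats the check instead of silently trusting corrupt contents.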

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 fs/ext4/balloc.c  |   38 ++++++++++++++++----
 fs/ext4/bitmap.c  |   40 +++++++++++++++++++++
 fs/ext4/ext4.h    |   10 +++++
 fs/ext4/ialloc.c  |    4 ++
 fs/ext4/mballoc.c |   99 ++++++++++++++++++++++++++++++++++++++++++++++-------
 5 files changed, 169 insertions(+), 22 deletions(-)


diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 12ccacd..dcac4bd 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -172,6 +172,8 @@ void ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
 		ext4_free_inodes_set(sb, gdp, 0);
 		ext4_itable_unused_set(sb, gdp, 0);
 		memset(bh->b_data, 0xff, sb->s_blocksize);
+		ext4_block_bitmap_csum_set(sb, block_group, gdp, bh,
+					   EXT4_BLOCKS_PER_GROUP(sb) / 8);
 		return;
 	}
 	memset(bh->b_data, 0, sb->s_blocksize);
@@ -208,6 +210,9 @@ void ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
 	 */
 	ext4_mark_bitmap_end(num_clusters_in_group(sb, block_group),
 			     sb->s_blocksize * 8, bh->b_data);
+	ext4_block_bitmap_csum_set(sb, block_group, gdp, bh,
+				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
+	gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
 }
 
 /* Return the number of free blocks in a block group.  It is used when
@@ -273,10 +278,10 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block *sb,
 	return desc;
 }
 
-static int ext4_valid_block_bitmap(struct super_block *sb,
-					struct ext4_group_desc *desc,
-					unsigned int block_group,
-					struct buffer_head *bh)
+int ext4_valid_block_bitmap(struct super_block *sb,
+			    struct ext4_group_desc *desc,
+			    unsigned int block_group,
+			    struct buffer_head *bh)
 {
 	ext4_grpblk_t offset;
 	ext4_grpblk_t next_zero_bit;
@@ -353,12 +358,12 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
 	}
 
 	if (bitmap_uptodate(bh))
-		return bh;
+		goto verify;
 
 	lock_buffer(bh);
 	if (bitmap_uptodate(bh)) {
 		unlock_buffer(bh);
-		return bh;
+		goto verify;
 	}
 	ext4_lock_group(sb, block_group);
 	if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
@@ -377,7 +382,7 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
 		 */
 		set_bitmap_uptodate(bh);
 		unlock_buffer(bh);
-		return bh;
+		goto verify;
 	}
 	/*
 	 * submit the buffer_head for read. We can
@@ -394,11 +399,26 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
 			    block_group, bitmap_blk);
 		return NULL;
 	}
-	ext4_valid_block_bitmap(sb, desc, block_group, bh);
+
+verify:
+	if (buffer_verified(bh))
+		return bh;
 	/*
 	 * file system mounted not to panic on error,
-	 * continue with corrupt bitmap
+	 * -EIO with corrupt bitmap
 	 */
+	ext4_lock_group(sb, block_group);
+	if (!ext4_valid_block_bitmap(sb, desc, block_group, bh) ||
+	    !ext4_block_bitmap_csum_verify(sb, block_group, desc, bh,
+					   EXT4_BLOCKS_PER_GROUP(sb) / 8)) {
+		ext4_unlock_group(sb, block_group);
+		put_bh(bh);
+		ext4_error(sb, "Corrupt block bitmap - block_group = %u, "
+			   "block_bitmap = %llu", block_group, bitmap_blk);
+		return NULL;
+	}
+	ext4_unlock_group(sb, block_group);
+	set_buffer_verified(bh);
 	return bh;
 }
 
diff --git a/fs/ext4/bitmap.c b/fs/ext4/bitmap.c
index 0ae4a01..33ad69e 100644
--- a/fs/ext4/bitmap.c
+++ b/fs/ext4/bitmap.c
@@ -68,3 +68,43 @@ void ext4_inode_bitmap_csum_set(struct super_block *sb, ext4_group_t group,
 	if (sbi->s_desc_size >= EXT4_BG_INODE_BITMAP_CSUM_HI_END)
 		gdp->bg_inode_bitmap_csum_hi = cpu_to_le16(csum >> 16);
 }
+
+int ext4_block_bitmap_csum_verify(struct super_block *sb, ext4_group_t group,
+				  struct ext4_group_desc *gdp,
+				  struct buffer_head *bh, int sz)
+{
+	__u32 hi;
+	__u32 provided, calculated;
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return 1;
+
+	provided = le16_to_cpu(gdp->bg_block_bitmap_csum_lo);
+	calculated = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)bh->b_data, sz);
+	if (sbi->s_desc_size >= EXT4_BG_BLOCK_BITMAP_CSUM_HI_END) {
+		hi = le16_to_cpu(gdp->bg_block_bitmap_csum_hi);
+		provided |= (hi << 16);
+	} else
+		calculated &= 0xFFFF;
+
+	return provided == calculated;
+}
+
+void ext4_block_bitmap_csum_set(struct super_block *sb, ext4_group_t group,
+				struct ext4_group_desc *gdp,
+				struct buffer_head *bh, int sz)
+{
+	__u32 csum;
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return;
+
+	csum = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)bh->b_data, sz);
+	gdp->bg_block_bitmap_csum_lo = cpu_to_le16(csum & 0xFFFF);
+	if (sbi->s_desc_size >= EXT4_BG_BLOCK_BITMAP_CSUM_HI_END)
+		gdp->bg_block_bitmap_csum_hi = cpu_to_le16(csum >> 16);
+}
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 4bb9fb3..f62a822 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1828,8 +1828,18 @@ void ext4_inode_bitmap_csum_set(struct super_block *sb, ext4_group_t group,
 int ext4_inode_bitmap_csum_verify(struct super_block *sb, ext4_group_t group,
 				  struct ext4_group_desc *gdp,
 				  struct buffer_head *bh, int sz);
+void ext4_block_bitmap_csum_set(struct super_block *sb, ext4_group_t group,
+				struct ext4_group_desc *gdp,
+				struct buffer_head *bh, int sz);
+int ext4_block_bitmap_csum_verify(struct super_block *sb, ext4_group_t group,
+				  struct ext4_group_desc *gdp,
+				  struct buffer_head *bh, int sz);
 
 /* balloc.c */
+extern int ext4_valid_block_bitmap(struct super_block *sb,
+				   struct ext4_group_desc *desc,
+				   unsigned int block_group,
+				   struct buffer_head *bh);
 extern unsigned int ext4_block_group(struct super_block *sb,
 			ext4_fsblk_t blocknr);
 extern ext4_grpblk_t ext4_block_group_offset(struct super_block *sb,
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 59e3644..261ffce 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -854,6 +854,10 @@ got:
 			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
 			ext4_free_group_clusters_set(sb, gdp,
 				ext4_free_clusters_after_init(sb, group, gdp));
+			ext4_block_bitmap_csum_set(sb, group, gdp,
+						   block_bitmap_bh,
+						   EXT4_BLOCKS_PER_GROUP(sb) /
+						   8);
 			gdp->bg_checksum = ext4_group_desc_csum(sbi, group,
 								gdp);
 		}
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index e2d8be8..dbd1453 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -754,6 +754,53 @@ void ext4_mb_generate_buddy(struct super_block *sb,
 	spin_unlock(&EXT4_SB(sb)->s_bal_lock);
 }
 
+static void ext4_mb_verify_block_bitmap(struct super_block *sb,
+					ext4_group_t group,
+					struct buffer_head *bh)
+{
+	struct ext4_group_desc *desc;
+
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return;
+
+	if (buffer_verified(bh))
+		return;
+
+	desc = ext4_get_group_desc(sb, group, NULL);
+	if (!desc)
+		return;
+
+	ext4_lock_group(sb, group);
+	if (!ext4_valid_block_bitmap(sb, desc, group, bh) ||
+	    !ext4_block_bitmap_csum_verify(sb, group, desc, bh,
+					   EXT4_BLOCKS_PER_GROUP(sb) / 8)) {
+		ext4_unlock_group(sb, group);
+		ext4_error(sb, "Corrupt block bitmap, group = %u", group);
+		return;
+	}
+	set_buffer_verified(bh);
+	ext4_unlock_group(sb, group);
+}
+
+struct ext4_csum_data {
+	struct super_block	*cd_sb;
+	ext4_group_t		cd_group;
+};
+
+static void ext4_end_buffer_read_sync(struct buffer_head *bh, int uptodate)
+{
+	struct super_block *sb =
+		((struct ext4_csum_data *)bh->b_private)->cd_sb;
+	ext4_group_t group = ((struct ext4_csum_data *)bh->b_private)->cd_group;
+
+	if (uptodate)
+		ext4_mb_verify_block_bitmap(sb, group, bh);
+
+	bh->b_private = NULL;
+	end_buffer_read_sync(bh, uptodate);
+}
+
 /* The buddy information is attached the buddy cache inode
  * for convenience. The information regarding each group
  * is loaded via ext4_mb_load_buddy. The information involve
@@ -786,11 +833,12 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
 	int first_block;
 	struct super_block *sb;
 	struct buffer_head *bhs;
-	struct buffer_head **bh;
+	struct buffer_head **bh = NULL;
 	struct inode *inode;
 	char *data;
 	char *bitmap;
 	struct ext4_group_info *grinfo;
+	struct ext4_csum_data *csd = NULL;
 
 	mb_debug(1, "init page %lu\n", page->index);
 
@@ -804,6 +852,14 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
 	if (groups_per_page == 0)
 		groups_per_page = 1;
 
+	if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
+			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
+		csd = kzalloc(sizeof(struct ext4_csum_data) * groups_per_page,
+			      GFP_NOFS);
+		if (csd == NULL)
+			goto out;
+	}
+
 	/* allocate buffer_heads to read bitmaps */
 	if (groups_per_page > 1) {
 		err = -ENOMEM;
@@ -845,11 +901,13 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
 		if (bh[i] == NULL)
 			goto out;
 
-		if (bitmap_uptodate(bh[i]))
+		if (bitmap_uptodate(bh[i]) &&
+		    (csd && buffer_verified(bh[i])))
 			continue;
 
 		lock_buffer(bh[i]);
 		if (bitmap_uptodate(bh[i])) {
+			ext4_mb_verify_block_bitmap(sb, first_group + i, bh[i]);
 			unlock_buffer(bh[i]);
 			continue;
 		}
@@ -860,6 +918,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
 			set_bitmap_uptodate(bh[i]);
 			set_buffer_uptodate(bh[i]);
 			ext4_unlock_group(sb, first_group + i);
+			ext4_mb_verify_block_bitmap(sb, first_group + i, bh[i]);
 			unlock_buffer(bh[i]);
 			continue;
 		}
@@ -870,6 +929,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
 			 * bitmap is also uptodate
 			 */
 			set_bitmap_uptodate(bh[i]);
+			ext4_mb_verify_block_bitmap(sb, first_group + i, bh[i]);
 			unlock_buffer(bh[i]);
 			continue;
 		}
@@ -881,22 +941,28 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
 		 * get set with buffer lock held.
 		 */
 		set_bitmap_uptodate(bh[i]);
-		bh[i]->b_end_io = end_buffer_read_sync;
+		if (csd) {
+			csd[i].cd_sb = sb;
+			csd[i].cd_group = first_group + i;
+			bh[i]->b_private = csd + i;
+			bh[i]->b_end_io = ext4_end_buffer_read_sync;
+		} else
+			bh[i]->b_end_io = end_buffer_read_sync;
 		submit_bh(READ, bh[i]);
 		mb_debug(1, "read bitmap for group %u\n", first_group + i);
 	}
 
-	/* wait for I/O completion */
-	for (i = 0; i < groups_per_page; i++)
-		if (bh[i])
-			wait_on_buffer(bh[i]);
-
-	err = -EIO;
-	for (i = 0; i < groups_per_page; i++)
-		if (bh[i] && !buffer_uptodate(bh[i]))
-			goto out;
-
+	/* Wait for I/O completion and checksum verification */
 	err = 0;
+	for (i = 0; i < groups_per_page; i++) {
+		if (bh[i] == NULL)
+			continue;
+		wait_on_buffer(bh[i]);
+		if (!buffer_uptodate(bh[i]) ||
+		    (csd && !buffer_verified(bh[i])))
+			err = -EIO;
+	}
+
 	first_block = page->index * blocks_per_page;
 	for (i = 0; i < blocks_per_page; i++) {
 		int group;
@@ -973,6 +1039,7 @@ out:
 		if (bh != &bhs)
 			kfree(bh);
 	}
+	kfree(csd);
 	return err;
 }
 
@@ -2850,6 +2917,8 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 	}
 	len = ext4_free_group_clusters(sb, gdp) - ac->ac_b_ex.fe_len;
 	ext4_free_group_clusters_set(sb, gdp, len);
+	ext4_block_bitmap_csum_set(sb, ac->ac_b_ex.fe_group, gdp, bitmap_bh,
+				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
 	gdp->bg_checksum = ext4_group_desc_csum(sbi, ac->ac_b_ex.fe_group, gdp);
 
 	ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
@@ -4716,6 +4785,8 @@ do_more:
 
 	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
 	ext4_free_group_clusters_set(sb, gdp, ret);
+	ext4_block_bitmap_csum_set(sb, block_group, gdp, bitmap_bh,
+				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
 	gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
 	ext4_unlock_group(sb, block_group);
 	percpu_counter_add(&sbi->s_freeclusters_counter, count_clusters);
@@ -4860,6 +4931,8 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 	mb_free_blocks(NULL, &e4b, bit, count);
 	blk_free_count = blocks_freed + ext4_free_group_clusters(sb, desc);
 	ext4_free_group_clusters_set(sb, desc, blk_free_count);
+	ext4_block_bitmap_csum_set(sb, block_group, desc, bitmap_bh,
+				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
 	desc->bg_checksum = ext4_group_desc_csum(sbi, block_group, desc);
 	ext4_unlock_group(sb, block_group);
 	percpu_counter_add(&sbi->s_freeclusters_counter,



* [PATCH 09/23] ext4: Verify and calculate checksums for extent tree blocks
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (7 preceding siblings ...)
  2012-01-07  8:28 ` [PATCH 08/23] ext4: Calculate and verify block bitmap checksum Darrick J. Wong
@ 2012-01-07  8:28 ` Darrick J. Wong
  2012-01-07  8:28 ` [PATCH 10/23] ext4: Calculate and verify checksums for htree nodes Darrick J. Wong
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:28 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Calculate and verify the checksum for each extent tree block.  The checksum is
located in the space immediately after the last possible ext4_extent in the
block.  This space is typically the last 4-8 bytes in the block.
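
The tail offset arithmetic can be checked with a short sketch.  It assumes the 12-byte sizes of struct ext4_extent_header and struct ext4_extent, and assumes that eh_max for a non-root block is (block_size - 12) / 12 rounded down.  The function names are hypothetical.

```c
#include <stddef.h>

/* Assumed on-disk sizes of the extent header and of one extent entry. */
enum {
	EXTENT_HEADER_SIZE = 12,
	EXTENT_ENTRY_SIZE = 12
};

/* Assumed eh_max for a full (non-root) extent tree block. */
static size_t extent_max_entries(size_t block_size)
{
	return (block_size - EXTENT_HEADER_SIZE) / EXTENT_ENTRY_SIZE;
}

/* Same arithmetic as EXT4_EXTENT_TAIL_OFFSET(), in terms of block size. */
static size_t extent_tail_offset(size_t block_size)
{
	return EXTENT_HEADER_SIZE +
	       EXTENT_ENTRY_SIZE * extent_max_entries(block_size);
}

/* Bytes left between the last possible entry and the end of the block. */
static size_t extent_tail_space(size_t block_size)
{
	return block_size - extent_tail_offset(block_size);
}
```

Under these assumptions a 4096-byte block holds 340 entries and leaves 4 bytes at offset 4092, while a 2048-byte block holds 169 entries and leaves 8 bytes at offset 2040.  In both cases there is room for a 32-bit checksum.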

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 fs/ext4/ext4_extents.h |   11 +++++++++++
 fs/ext4/extents.c      |   50 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+), 0 deletions(-)


diff --git a/fs/ext4/ext4_extents.h b/fs/ext4/ext4_extents.h
index 5c7fbad..7514176 100644
--- a/fs/ext4/ext4_extents.h
+++ b/fs/ext4/ext4_extents.h
@@ -114,6 +114,17 @@ struct ext4_extent_header {
 
 #define EXT4_EXT_MAGIC		cpu_to_le16(0xf30a)
 
+#define EXT4_EXTENT_TAIL_OFFSET(hdr) \
+	(sizeof(struct ext4_extent_header) + \
+	 (sizeof(struct ext4_extent) * le16_to_cpu((hdr)->eh_max)))
+
+static inline struct ext4_extent_tail *
+find_ext4_extent_tail(struct ext4_extent_header *eh)
+{
+	return (struct ext4_extent_tail *)(((void *)eh) +
+					   EXT4_EXTENT_TAIL_OFFSET(eh));
+}
+
 /*
  * Array of ext4_ext_path contains path to some extent.
  * Creation/lookup routines use it for traversal/splitting/etc.
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 148f89f..c501980 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -45,6 +45,46 @@
 
 #include <trace/events/ext4.h>
 
+static __le32 ext4_extent_block_csum(struct inode *inode,
+				     struct ext4_extent_header *eh)
+{
+	struct ext4_inode_info *ei = EXT4_I(inode);
+	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	__u32 csum;
+
+	csum = ext4_chksum(sbi, ei->i_csum_seed, (__u8 *)eh,
+			   EXT4_EXTENT_TAIL_OFFSET(eh));
+	return cpu_to_le32(csum);
+}
+
+static int ext4_extent_block_csum_verify(struct inode *inode,
+					 struct ext4_extent_header *eh)
+{
+	struct ext4_extent_tail *et;
+
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+		EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return 1;
+
+	et = find_ext4_extent_tail(eh);
+	if (et->et_checksum != ext4_extent_block_csum(inode, eh))
+		return 0;
+	return 1;
+}
+
+static void ext4_extent_block_csum_set(struct inode *inode,
+				       struct ext4_extent_header *eh)
+{
+	struct ext4_extent_tail *et;
+
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+		EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return;
+
+	et = find_ext4_extent_tail(eh);
+	et->et_checksum = ext4_extent_block_csum(inode, eh);
+}
+
 static int ext4_split_extent(handle_t *handle,
 				struct inode *inode,
 				struct ext4_ext_path *path,
@@ -103,6 +143,7 @@ static int __ext4_ext_dirty(const char *where, unsigned int line,
 {
 	int err;
 	if (path->p_bh) {
+		ext4_extent_block_csum_set(inode, ext_block_hdr(path->p_bh));
 		/* path points to block */
 		err = __ext4_handle_dirty_metadata(where, line, handle,
 						   inode, path->p_bh);
@@ -375,6 +416,12 @@ static int __ext4_ext_check(const char *function, unsigned int line,
 		error_msg = "invalid extent entries";
 		goto corrupted;
 	}
+	/* Verify checksum on non-root extent tree nodes */
+	if (ext_depth(inode) != depth &&
+	    !ext4_extent_block_csum_verify(inode, eh)) {
+		error_msg = "extent tree corrupted";
+		goto corrupted;
+	}
 	return 0;
 
 corrupted:
@@ -914,6 +961,7 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
 		le16_add_cpu(&neh->eh_entries, m);
 	}
 
+	ext4_extent_block_csum_set(inode, neh);
 	set_buffer_uptodate(bh);
 	unlock_buffer(bh);
 
@@ -992,6 +1040,7 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
 				sizeof(struct ext4_extent_idx) * m);
 			le16_add_cpu(&neh->eh_entries, m);
 		}
+		ext4_extent_block_csum_set(inode, neh);
 		set_buffer_uptodate(bh);
 		unlock_buffer(bh);
 
@@ -1089,6 +1138,7 @@ static int ext4_ext_grow_indepth(handle_t *handle, struct inode *inode,
 	else
 		neh->eh_max = cpu_to_le16(ext4_ext_space_block(inode, 0));
 	neh->eh_magic = EXT4_EXT_MAGIC;
+	ext4_extent_block_csum_set(inode, neh);
 	set_buffer_uptodate(bh);
 	unlock_buffer(bh);
 



* [PATCH 10/23] ext4: Calculate and verify checksums for htree nodes
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (8 preceding siblings ...)
  2012-01-07  8:28 ` [PATCH 09/23] ext4: Verify and calculate checksums for extent tree blocks Darrick J. Wong
@ 2012-01-07  8:28 ` Darrick J. Wong
  2012-01-07  8:29 ` [PATCH 11/23] ext4: Calculate and verify checksums of directory leaf blocks Darrick J. Wong
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:28 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Calculate and verify the checksum for directory index tree (htree) node blocks.
The checksum is stored in the last 4 bytes of the htree block, which requires
the dx_entry array to stop one dx_entry short of the end of the block.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 fs/ext4/namei.c |  160 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 156 insertions(+), 4 deletions(-)
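
The cost of "one dx_entry" can be checked with a small standalone sketch of
the limit arithmetic used by dx_root_limit() and dx_node_limit().  The sizes
below (8-byte dx_entry, 8-byte dx_tail made of a reserved word plus the
checksum, 8-byte dx_root_info) are assumptions about the on-disk structures,
and the function names are hypothetical.

```c
#include <assert.h>

enum {
	DX_ENTRY_SIZE     = 8,	/* assumed: 32-bit hash + 32-bit block */
	DX_TAIL_SIZE      = 8,	/* assumed: 32-bit reserved + 32-bit checksum */
	DX_ROOT_INFO_SIZE = 8,	/* assumed size of struct dx_root_info */
};

/* Same rounding as EXT4_DIR_REC_LEN(): 8-byte dirent header plus the
 * name, padded to a multiple of 4. */
unsigned int dir_rec_len(unsigned int name_len)
{
	return (name_len + 8 + 3) & ~3u;
}

/* Entry limit for an interior htree node. */
unsigned int node_limit(unsigned int blocksize, int has_csum)
{
	unsigned int space = blocksize - dir_rec_len(0);

	assert(blocksize >= 1024);
	if (has_csum)
		space -= DX_TAIL_SIZE;
	return space / DX_ENTRY_SIZE;
}

/* Entry limit for the htree root, which also holds "." and "..". */
unsigned int root_limit(unsigned int blocksize, int has_csum)
{
	unsigned int space = blocksize - dir_rec_len(1) - dir_rec_len(2) -
			     DX_ROOT_INFO_SIZE;

	if (has_csum)
		space -= DX_TAIL_SIZE;
	return space / DX_ENTRY_SIZE;
}

/* Where the tail starts: count_offset plus 'limit' entries. */
unsigned int dx_tail_offset(unsigned int count_offset, unsigned int limit)
{
	return count_offset + limit * DX_ENTRY_SIZE;
}
```

With 4096-byte blocks and the count_offset values used by get_dx_countlimit()
(8 for a node, 32 for the root), the tail starts at byte 4088 in both cases,
so the checksum occupies the last 4 bytes of the block.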


diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index ebfc499..8721f51 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -188,6 +188,121 @@ static struct buffer_head * ext4_dx_find_entry(struct inode *dir,
 static int ext4_dx_add_entry(handle_t *handle, struct dentry *dentry,
 			     struct inode *inode);
 
+/* checksumming functions */
+static struct dx_countlimit *get_dx_countlimit(struct inode *inode,
+					       struct ext4_dir_entry *dirent,
+					       int *offset)
+{
+	struct ext4_dir_entry *dp;
+	struct dx_root_info *root;
+	int count_offset;
+
+	if (le16_to_cpu(dirent->rec_len) == EXT4_BLOCK_SIZE(inode->i_sb))
+		count_offset = 8;
+	else if (le16_to_cpu(dirent->rec_len) == 12) {
+		dp = (struct ext4_dir_entry *)(((void *)dirent) + 12);
+		if (le16_to_cpu(dp->rec_len) !=
+		    EXT4_BLOCK_SIZE(inode->i_sb) - 12)
+			return NULL;
+		root = (struct dx_root_info *)(((void *)dp + 12));
+		if (root->reserved_zero ||
+		    root->info_length != sizeof(struct dx_root_info))
+			return NULL;
+		count_offset = 32;
+	} else
+		return NULL;
+
+	if (offset)
+		*offset = count_offset;
+	return (struct dx_countlimit *)(((void *)dirent) + count_offset);
+}
+
+static __le32 ext4_dx_csum(struct inode *inode, struct ext4_dir_entry *dirent,
+			   int count_offset, int count, struct dx_tail *t)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	struct ext4_inode_info *ei = EXT4_I(inode);
+	__u32 csum, old_csum;
+	int size;
+
+	size = count_offset + (count * sizeof(struct dx_entry));
+	old_csum = t->dt_checksum;
+	t->dt_checksum = 0;
+	csum = ext4_chksum(sbi, ei->i_csum_seed, (__u8 *)dirent, size);
+	csum = ext4_chksum(sbi, csum, (__u8 *)t, sizeof(struct dx_tail));
+	t->dt_checksum = old_csum;
+
+	return cpu_to_le32(csum);
+}
+
+static int ext4_dx_csum_verify(struct inode *inode,
+			       struct ext4_dir_entry *dirent)
+{
+	struct dx_countlimit *c;
+	struct dx_tail *t;
+	int count_offset, limit, count;
+
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+					EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return 1;
+
+	c = get_dx_countlimit(inode, dirent, &count_offset);
+	if (!c) {
+		EXT4_ERROR_INODE(inode, "dir seems corrupt?  Run e2fsck -D.");
+		return 1;
+	}
+	limit = le16_to_cpu(c->limit);
+	count = le16_to_cpu(c->count);
+	if (count_offset + (limit * sizeof(struct dx_entry)) >
+	    EXT4_BLOCK_SIZE(inode->i_sb) - sizeof(struct dx_tail)) {
+		EXT4_ERROR_INODE(inode, "metadata_csum set but no space for "
+				 "tree checksum found.  Run e2fsck -D.");
+		return 1;
+	}
+	t = (struct dx_tail *)(((struct dx_entry *)c) + limit);
+
+	if (t->dt_checksum != ext4_dx_csum(inode, dirent, count_offset,
+					    count, t))
+		return 0;
+	return 1;
+}
+
+static void ext4_dx_csum_set(struct inode *inode, struct ext4_dir_entry *dirent)
+{
+	struct dx_countlimit *c;
+	struct dx_tail *t;
+	int count_offset, limit, count;
+
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+					EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return;
+
+	c = get_dx_countlimit(inode, dirent, &count_offset);
+	if (!c) {
+		EXT4_ERROR_INODE(inode, "dir seems corrupt?  Run e2fsck -D.");
+		return;
+	}
+	limit = le16_to_cpu(c->limit);
+	count = le16_to_cpu(c->count);
+	if (count_offset + (limit * sizeof(struct dx_entry)) >
+	    EXT4_BLOCK_SIZE(inode->i_sb) - sizeof(struct dx_tail)) {
+		EXT4_ERROR_INODE(inode, "metadata_csum set but no space for "
+				 "tree checksum.  Run e2fsck -D.");
+		return;
+	}
+	t = (struct dx_tail *)(((struct dx_entry *)c) + limit);
+
+	t->dt_checksum = ext4_dx_csum(inode, dirent, count_offset, count, t);
+}
+
+static inline int ext4_handle_dirty_dx_node(handle_t *handle,
+					    struct inode *inode,
+					    struct buffer_head *bh)
+{
+	ext4_dx_csum_set(inode, (struct ext4_dir_entry *)bh->b_data);
+	return ext4_handle_dirty_metadata(handle, inode, bh);
+}
+
 /*
  * p is at least 6 bytes before the end of page
  */
@@ -247,12 +362,20 @@ static inline unsigned dx_root_limit(struct inode *dir, unsigned infosize)
 {
 	unsigned entry_space = dir->i_sb->s_blocksize - EXT4_DIR_REC_LEN(1) -
 		EXT4_DIR_REC_LEN(2) - infosize;
+
+	if (EXT4_HAS_RO_COMPAT_FEATURE(dir->i_sb,
+				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		entry_space -= sizeof(struct dx_tail);
 	return entry_space / sizeof(struct dx_entry);
 }
 
 static inline unsigned dx_node_limit(struct inode *dir)
 {
 	unsigned entry_space = dir->i_sb->s_blocksize - EXT4_DIR_REC_LEN(0);
+
+	if (EXT4_HAS_RO_COMPAT_FEATURE(dir->i_sb,
+				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		entry_space -= sizeof(struct dx_tail);
 	return entry_space / sizeof(struct dx_entry);
 }
 
@@ -398,6 +521,15 @@ dx_probe(const struct qstr *d_name, struct inode *dir,
 		goto fail;
 	}
 
+	if (!buffer_verified(bh) &&
+	    !ext4_dx_csum_verify(dir, (struct ext4_dir_entry *)bh->b_data)) {
+		ext4_warning(dir->i_sb, "Root failed checksum");
+		brelse(bh);
+		*err = ERR_BAD_DX_DIR;
+		goto fail;
+	}
+	set_buffer_verified(bh);
+
 	entries = (struct dx_entry *) (((char *)&root->info) +
 				       root->info.info_length);
 
@@ -458,6 +590,17 @@ dx_probe(const struct qstr *d_name, struct inode *dir,
 		if (!(bh = ext4_bread (NULL,dir, dx_get_block(at), 0, err)))
 			goto fail2;
 		at = entries = ((struct dx_node *) bh->b_data)->entries;
+
+		if (!buffer_verified(bh) &&
+		    !ext4_dx_csum_verify(dir,
+					 (struct ext4_dir_entry *)bh->b_data)) {
+			ext4_warning(dir->i_sb, "Node failed checksum");
+			brelse(bh);
+			*err = ERR_BAD_DX_DIR;
+			goto fail;
+		}
+		set_buffer_verified(bh);
+
 		if (dx_get_limit(entries) != dx_node_limit (dir)) {
 			ext4_warning(dir->i_sb,
 				     "dx entry: limit != node limit");
@@ -557,6 +700,15 @@ static int ext4_htree_next_block(struct inode *dir, __u32 hash,
 		if (!(bh = ext4_bread(NULL, dir, dx_get_block(p->at),
 				      0, &err)))
 			return err; /* Failure */
+
+		if (!buffer_verified(bh) &&
+		    !ext4_dx_csum_verify(dir,
+					 (struct ext4_dir_entry *)bh->b_data)) {
+			ext4_warning(dir->i_sb, "Node failed checksum");
+			return -EIO;
+		}
+		set_buffer_verified(bh);
+
 		p++;
 		brelse(p->bh);
 		p->bh = bh;
@@ -1232,7 +1384,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
 	err = ext4_handle_dirty_metadata(handle, dir, bh2);
 	if (err)
 		goto journal_error;
-	err = ext4_handle_dirty_metadata(handle, dir, frame->bh);
+	err = ext4_handle_dirty_dx_node(handle, dir, frame->bh);
 	if (err)
 		goto journal_error;
 	brelse(bh2);
@@ -1419,7 +1571,7 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
 	frame->bh = bh;
 	bh = bh2;
 
-	ext4_handle_dirty_metadata(handle, dir, frame->bh);
+	ext4_handle_dirty_dx_node(handle, dir, frame->bh);
 	ext4_handle_dirty_metadata(handle, dir, bh);
 
 	de = do_split(handle,dir, &bh, frame, &hinfo, &retval);
@@ -1594,7 +1746,7 @@ static int ext4_dx_add_entry(handle_t *handle, struct dentry *dentry,
 			dxtrace(dx_show_index("node", frames[1].entries));
 			dxtrace(dx_show_index("node",
 			       ((struct dx_node *) bh2->b_data)->entries));
-			err = ext4_handle_dirty_metadata(handle, dir, bh2);
+			err = ext4_handle_dirty_dx_node(handle, dir, bh2);
 			if (err)
 				goto journal_error;
 			brelse (bh2);
@@ -1620,7 +1772,7 @@ static int ext4_dx_add_entry(handle_t *handle, struct dentry *dentry,
 			if (err)
 				goto journal_error;
 		}
-		err = ext4_handle_dirty_metadata(handle, dir, frames[0].bh);
+		err = ext4_handle_dirty_dx_node(handle, dir, frames[0].bh);
 		if (err) {
 			ext4_std_error(inode->i_sb, err);
 			goto cleanup;



* [PATCH 11/23] ext4: Calculate and verify checksums of directory leaf blocks
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (9 preceding siblings ...)
  2012-01-07  8:28 ` [PATCH 10/23] ext4: Calculate and verify checksums for htree nodes Darrick J. Wong
@ 2012-01-07  8:29 ` Darrick J. Wong
  2012-01-07  8:29 ` [PATCH 12/23] ext4: Calculate and verify checksums of extended attribute blocks Darrick J. Wong
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:29 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Calculate and verify the checksums for directory leaf blocks (i.e. blocks that
only contain actual directory entries).  The checksum lives in what looks to be
an unused directory entry with a 0 name_len at the end of the block.  This
scheme is not used for internal htree nodes because the mechanism in place
there only costs one dx_entry, whereas the "empty" directory entry would cost
two dx_entries.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 fs/ext4/dir.c   |   12 +++
 fs/ext4/ext4.h  |    2 
 fs/ext4/namei.c |  260 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 259 insertions(+), 15 deletions(-)
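
The "unused-looking directory entry" can be pictured with a standalone sketch
of the tail record.  The layout below (12 bytes, zero inode, rec_len of 12,
zero name_len, file type 0xDE, then the checksum) is an assumption about
struct ext4_dir_entry_tail, which is defined elsewhere in the series; the
struct and function names here are hypothetical, and on-disk byte order is
ignored for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FT_DIR_CSUM 0xDE	/* assumed value of EXT4_FT_DIR_CSUM */

/* Assumed layout: overlays a dirent with inode 0 and name_len 0. */
struct dirent_tail {
	uint32_t reserved_zero1;	/* dirent inode field, 0 = unused */
	uint16_t rec_len;		/* always 12 */
	uint8_t  reserved_zero2;	/* dirent name_len field, 0 */
	uint8_t  reserved_ft;		/* dirent file_type field, 0xDE */
	uint32_t checksum;
};

/* Write an empty tail into the last 12 bytes of a directory block. */
void tail_init(uint8_t *block, size_t blocksize)
{
	struct dirent_tail t;

	assert(blocksize >= sizeof(t));
	memset(&t, 0, sizeof(t));
	t.rec_len = (uint16_t)sizeof(t);
	t.reserved_ft = FT_DIR_CSUM;
	memcpy(block + blocksize - sizeof(t), &t, sizeof(t));
}

/* Same shape of check as get_dirent_tail() in the non-PARANOID case. */
int tail_present(const uint8_t *block, size_t blocksize)
{
	struct dirent_tail t;

	memcpy(&t, block + blocksize - sizeof(t), sizeof(t));
	return t.reserved_zero1 == 0 &&
	       t.rec_len == (uint16_t)sizeof(t) &&
	       t.reserved_zero2 == 0 &&
	       t.reserved_ft == FT_DIR_CSUM;
}

/* rec_len for the last real dirent so that it ends where the tail
 * begins (or at the end of the block when checksums are off). */
size_t last_rec_len(size_t blocksize, size_t de_offset, int has_csum)
{
	size_t csum_size = has_csum ? sizeof(struct dirent_tail) : 0;

	return blocksize - csum_size - de_offset;
}
```

Because the tail looks like an empty dirent, code that walks rec_len chains
should step over it without special handling, which is presumably why the
scheme works for leaf blocks.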


diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 164c560..bc40c9e 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -180,6 +180,18 @@ static int ext4_readdir(struct file *filp,
 			continue;
 		}
 
+		/* Check the checksum */
+		if (!buffer_verified(bh) &&
+		    !ext4_dirent_csum_verify(inode,
+				(struct ext4_dir_entry *)bh->b_data)) {
+			EXT4_ERROR_FILE(filp, 0, "directory fails checksum "
+					"at offset %llu",
+					(unsigned long long)filp->f_pos);
+			filp->f_pos += sb->s_blocksize - offset;
+			continue;
+		}
+		set_buffer_verified(bh);
+
 revalidate:
 		/* If the dir block has changed since the last call to
 		 * readdir(2), then we might be pointing to an invalid
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index f62a822..48536d8 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1992,6 +1992,8 @@ extern long ext4_compat_ioctl(struct file *, unsigned int, unsigned long);
 extern int ext4_ext_migrate(struct inode *);
 
 /* namei.c */
+extern int ext4_dirent_csum_verify(struct inode *inode,
+				   struct ext4_dir_entry *dirent);
 extern int ext4_orphan_add(handle_t *, struct inode *);
 extern int ext4_orphan_del(handle_t *, struct inode *);
 extern int ext4_htree_fill_tree(struct file *dir_file, __u32 start_hash,
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 8721f51..77cd218 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -189,6 +189,115 @@ static int ext4_dx_add_entry(handle_t *handle, struct dentry *dentry,
 			     struct inode *inode);
 
 /* checksumming functions */
+#define EXT4_DIRENT_TAIL(block, blocksize) \
+	((struct ext4_dir_entry_tail *)(((void *)(block)) + \
+					((blocksize) - \
+					 sizeof(struct ext4_dir_entry_tail))))
+
+static void initialize_dirent_tail(struct ext4_dir_entry_tail *t,
+				   unsigned int blocksize)
+{
+	memset(t, 0, sizeof(struct ext4_dir_entry_tail));
+	t->det_rec_len = ext4_rec_len_to_disk(
+			sizeof(struct ext4_dir_entry_tail), blocksize);
+	t->det_reserved_ft = EXT4_FT_DIR_CSUM;
+}
+
+/* Walk through a dirent block to find a checksum "dirent" at the tail */
+static struct ext4_dir_entry_tail *get_dirent_tail(struct inode *inode,
+						   struct ext4_dir_entry *de)
+{
+	struct ext4_dir_entry_tail *t;
+
+#ifdef PARANOID
+	struct ext4_dir_entry *d, *top;
+
+	d = de;
+	top = (struct ext4_dir_entry *)(((void *)de) +
+		(EXT4_BLOCK_SIZE(inode->i_sb) -
+		sizeof(struct ext4_dir_entry_tail)));
+	while (d < top && d->rec_len)
+		d = (struct ext4_dir_entry *)(((void *)d) +
+		    le16_to_cpu(d->rec_len));
+
+	if (d != top)
+		return NULL;
+
+	t = (struct ext4_dir_entry_tail *)d;
+#else
+	t = EXT4_DIRENT_TAIL(de, EXT4_BLOCK_SIZE(inode->i_sb));
+#endif
+
+	if (t->det_reserved_zero1 ||
+	    le16_to_cpu(t->det_rec_len) != sizeof(struct ext4_dir_entry_tail) ||
+	    t->det_reserved_zero2 ||
+	    t->det_reserved_ft != EXT4_FT_DIR_CSUM)
+		return NULL;
+
+	return t;
+}
+
+static __le32 ext4_dirent_csum(struct inode *inode,
+			       struct ext4_dir_entry *dirent, int size)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	struct ext4_inode_info *ei = EXT4_I(inode);
+	__u32 csum;
+
+	csum = ext4_chksum(sbi, ei->i_csum_seed, (__u8 *)dirent, size);
+	return cpu_to_le32(csum);
+}
+
+int ext4_dirent_csum_verify(struct inode *inode, struct ext4_dir_entry *dirent)
+{
+	struct ext4_dir_entry_tail *t;
+
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+					EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return 1;
+
+	t = get_dirent_tail(inode, dirent);
+	if (!t) {
+		EXT4_ERROR_INODE(inode, "metadata_csum set but no space in dir "
+				 "leaf for checksum.  Please run e2fsck -D.");
+		return 0;
+	}
+
+	if (t->det_checksum != ext4_dirent_csum(inode, dirent,
+						(void *)t - (void *)dirent))
+		return 0;
+
+	return 1;
+}
+
+static void ext4_dirent_csum_set(struct inode *inode,
+				 struct ext4_dir_entry *dirent)
+{
+	struct ext4_dir_entry_tail *t;
+
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+					EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return;
+
+	t = get_dirent_tail(inode, dirent);
+	if (!t) {
+		EXT4_ERROR_INODE(inode, "metadata_csum set but no space in dir "
+				 "leaf for checksum.  Please run e2fsck -D.");
+		return;
+	}
+
+	t->det_checksum = ext4_dirent_csum(inode, dirent,
+					   (void *)t - (void *)dirent);
+}
+
+static inline int ext4_handle_dirty_dirent_node(handle_t *handle,
+						struct inode *inode,
+						struct buffer_head *bh)
+{
+	ext4_dirent_csum_set(inode, (struct ext4_dir_entry *)bh->b_data);
+	return ext4_handle_dirty_metadata(handle, inode, bh);
+}
+
 static struct dx_countlimit *get_dx_countlimit(struct inode *inode,
 					       struct ext4_dir_entry *dirent,
 					       int *offset)
@@ -737,6 +846,11 @@ static int htree_dirblock_to_tree(struct file *dir_file,
 	if (!(bh = ext4_bread (NULL, dir, block, 0, &err)))
 		return err;
 
+	if (!buffer_verified(bh) &&
+	    !ext4_dirent_csum_verify(dir, (struct ext4_dir_entry *)bh->b_data))
+		return -EIO;
+	set_buffer_verified(bh);
+
 	de = (struct ext4_dir_entry_2 *) bh->b_data;
 	top = (struct ext4_dir_entry_2 *) ((char *) de +
 					   dir->i_sb->s_blocksize -
@@ -1096,6 +1210,15 @@ restart:
 			brelse(bh);
 			goto next;
 		}
+		if (!buffer_verified(bh) &&
+		    !ext4_dirent_csum_verify(dir,
+				(struct ext4_dir_entry *)bh->b_data)) {
+			EXT4_ERROR_INODE(dir, "checksumming directory "
+					 "block %lu", (unsigned long)block);
+			brelse(bh);
+			goto next;
+		}
+		set_buffer_verified(bh);
 		i = search_dirblock(bh, dir, d_name,
 			    block << EXT4_BLOCK_SIZE_BITS(sb), res_dir);
 		if (i == 1) {
@@ -1147,6 +1270,16 @@ static struct buffer_head * ext4_dx_find_entry(struct inode *dir, const struct q
 		if (!(bh = ext4_bread(NULL, dir, block, 0, err)))
 			goto errout;
 
+		if (!buffer_verified(bh) &&
+		    !ext4_dirent_csum_verify(dir,
+				(struct ext4_dir_entry *)bh->b_data)) {
+			EXT4_ERROR_INODE(dir, "checksumming directory "
+					 "block %lu", (unsigned long)block);
+			brelse(bh);
+			*err = -EIO;
+			goto errout;
+		}
+		set_buffer_verified(bh);
 		retval = search_dirblock(bh, dir, d_name,
 					 block << EXT4_BLOCK_SIZE_BITS(sb),
 					 res_dir);
@@ -1319,8 +1452,14 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
 	char *data1 = (*bh)->b_data, *data2;
 	unsigned split, move, size;
 	struct ext4_dir_entry_2 *de = NULL, *de2;
+	struct ext4_dir_entry_tail *t;
+	int	csum_size = 0;
 	int	err = 0, i;
 
+	if (EXT4_HAS_RO_COMPAT_FEATURE(dir->i_sb,
+				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		csum_size = sizeof(struct ext4_dir_entry_tail);
+
 	bh2 = ext4_append (handle, dir, &newblock, &err);
 	if (!(bh2)) {
 		brelse(*bh);
@@ -1367,10 +1506,20 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
 	/* Fancy dance to stay within two buffers */
 	de2 = dx_move_dirents(data1, data2, map + split, count - split, blocksize);
 	de = dx_pack_dirents(data1, blocksize);
-	de->rec_len = ext4_rec_len_to_disk(data1 + blocksize - (char *) de,
+	de->rec_len = ext4_rec_len_to_disk(data1 + (blocksize - csum_size) -
+					   (char *) de,
 					   blocksize);
-	de2->rec_len = ext4_rec_len_to_disk(data2 + blocksize - (char *) de2,
+	de2->rec_len = ext4_rec_len_to_disk(data2 + (blocksize - csum_size) -
+					    (char *) de2,
 					    blocksize);
+	if (csum_size) {
+		t = EXT4_DIRENT_TAIL(data2, blocksize);
+		initialize_dirent_tail(t, blocksize);
+
+		t = EXT4_DIRENT_TAIL(data1, blocksize);
+		initialize_dirent_tail(t, blocksize);
+	}
+
 	dxtrace(dx_show_leaf (hinfo, (struct ext4_dir_entry_2 *) data1, blocksize, 1));
 	dxtrace(dx_show_leaf (hinfo, (struct ext4_dir_entry_2 *) data2, blocksize, 1));
 
@@ -1381,7 +1530,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
 		de = de2;
 	}
 	dx_insert_block(frame, hash2 + continued, newblock);
-	err = ext4_handle_dirty_metadata(handle, dir, bh2);
+	err = ext4_handle_dirty_dirent_node(handle, dir, bh2);
 	if (err)
 		goto journal_error;
 	err = ext4_handle_dirty_dx_node(handle, dir, frame->bh);
@@ -1421,11 +1570,16 @@ static int add_dirent_to_buf(handle_t *handle, struct dentry *dentry,
 	unsigned short	reclen;
 	int		nlen, rlen, err;
 	char		*top;
+	int		csum_size = 0;
+
+	if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		csum_size = sizeof(struct ext4_dir_entry_tail);
 
 	reclen = EXT4_DIR_REC_LEN(namelen);
 	if (!de) {
 		de = (struct ext4_dir_entry_2 *)bh->b_data;
-		top = bh->b_data + blocksize - reclen;
+		top = bh->b_data + (blocksize - csum_size) - reclen;
 		while ((char *) de <= top) {
 			if (ext4_check_dir_entry(dir, NULL, de, bh, offset))
 				return -EIO;
@@ -1481,7 +1635,7 @@ static int add_dirent_to_buf(handle_t *handle, struct dentry *dentry,
 	dir->i_version++;
 	ext4_mark_inode_dirty(handle, dir);
 	BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata");
-	err = ext4_handle_dirty_metadata(handle, dir, bh);
+	err = ext4_handle_dirty_dirent_node(handle, dir, bh);
 	if (err)
 		ext4_std_error(dir->i_sb, err);
 	return 0;
@@ -1502,6 +1656,7 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
 	struct dx_frame	frames[2], *frame;
 	struct dx_entry *entries;
 	struct ext4_dir_entry_2	*de, *de2;
+	struct ext4_dir_entry_tail *t;
 	char		*data1, *top;
 	unsigned	len;
 	int		retval;
@@ -1509,6 +1664,11 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
 	struct dx_hash_info hinfo;
 	ext4_lblk_t  block;
 	struct fake_dirent *fde;
+	int		csum_size = 0;
+
+	if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		csum_size = sizeof(struct ext4_dir_entry_tail);
 
 	blocksize =  dir->i_sb->s_blocksize;
 	dxtrace(printk(KERN_DEBUG "Creating index: inode %lu\n", dir->i_ino));
@@ -1529,7 +1689,7 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
 		brelse(bh);
 		return -EIO;
 	}
-	len = ((char *) root) + blocksize - (char *) de;
+	len = ((char *) root) + (blocksize - csum_size) - (char *) de;
 
 	/* Allocate new block for the 0th block's dirents */
 	bh2 = ext4_append(handle, dir, &block, &retval);
@@ -1545,8 +1705,15 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
 	top = data1 + len;
 	while ((char *)(de2 = ext4_next_entry(de, blocksize)) < top)
 		de = de2;
-	de->rec_len = ext4_rec_len_to_disk(data1 + blocksize - (char *) de,
+	de->rec_len = ext4_rec_len_to_disk(data1 + (blocksize - csum_size) -
+					   (char *) de,
 					   blocksize);
+
+	if (csum_size) {
+		t = EXT4_DIRENT_TAIL(data1, blocksize);
+		initialize_dirent_tail(t, blocksize);
+	}
+
 	/* Initialize the root; the dot dirents already exist */
 	de = (struct ext4_dir_entry_2 *) (&root->dotdot);
 	de->rec_len = ext4_rec_len_to_disk(blocksize - EXT4_DIR_REC_LEN(2),
@@ -1572,7 +1739,7 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
 	bh = bh2;
 
 	ext4_handle_dirty_dx_node(handle, dir, frame->bh);
-	ext4_handle_dirty_metadata(handle, dir, bh);
+	ext4_handle_dirty_dirent_node(handle, dir, bh);
 
 	de = do_split(handle,dir, &bh, frame, &hinfo, &retval);
 	if (!de) {
@@ -1608,11 +1775,17 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry,
 	struct inode *dir = dentry->d_parent->d_inode;
 	struct buffer_head *bh;
 	struct ext4_dir_entry_2 *de;
+	struct ext4_dir_entry_tail *t;
 	struct super_block *sb;
 	int	retval;
 	int	dx_fallback=0;
 	unsigned blocksize;
 	ext4_lblk_t block, blocks;
+	int	csum_size = 0;
+
+	if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		csum_size = sizeof(struct ext4_dir_entry_tail);
 
 	sb = dir->i_sb;
 	blocksize = sb->s_blocksize;
@@ -1631,6 +1804,11 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry,
 		bh = ext4_bread(handle, dir, block, 0, &retval);
 		if(!bh)
 			return retval;
+		if (!buffer_verified(bh) &&
+		    !ext4_dirent_csum_verify(dir,
+				(struct ext4_dir_entry *)bh->b_data))
+			return -EIO;
+		set_buffer_verified(bh);
 		retval = add_dirent_to_buf(handle, dentry, inode, NULL, bh);
 		if (retval != -ENOSPC) {
 			brelse(bh);
@@ -1647,7 +1825,13 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry,
 		return retval;
 	de = (struct ext4_dir_entry_2 *) bh->b_data;
 	de->inode = 0;
-	de->rec_len = ext4_rec_len_to_disk(blocksize, blocksize);
+	de->rec_len = ext4_rec_len_to_disk(blocksize - csum_size, blocksize);
+
+	if (csum_size) {
+		t = EXT4_DIRENT_TAIL(bh->b_data, blocksize);
+		initialize_dirent_tail(t, blocksize);
+	}
+
 	retval = add_dirent_to_buf(handle, dentry, inode, de, bh);
 	brelse(bh);
 	if (retval == 0)
@@ -1679,6 +1863,11 @@ static int ext4_dx_add_entry(handle_t *handle, struct dentry *dentry,
 	if (!(bh = ext4_bread(handle,dir, dx_get_block(frame->at), 0, &err)))
 		goto cleanup;
 
+	if (!buffer_verified(bh) &&
+	    !ext4_dirent_csum_verify(dir, (struct ext4_dir_entry *)bh->b_data))
+		goto journal_error;
+	set_buffer_verified(bh);
+
 	BUFFER_TRACE(bh, "get_write_access");
 	err = ext4_journal_get_write_access(handle, bh);
 	if (err)
@@ -1804,12 +1993,17 @@ static int ext4_delete_entry(handle_t *handle,
 {
 	struct ext4_dir_entry_2 *de, *pde;
 	unsigned int blocksize = dir->i_sb->s_blocksize;
+	int csum_size = 0;
 	int i, err;
 
+	if (EXT4_HAS_RO_COMPAT_FEATURE(dir->i_sb,
+				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		csum_size = sizeof(struct ext4_dir_entry_tail);
+
 	i = 0;
 	pde = NULL;
 	de = (struct ext4_dir_entry_2 *) bh->b_data;
-	while (i < bh->b_size) {
+	while (i < bh->b_size - csum_size) {
 		if (ext4_check_dir_entry(dir, NULL, de, bh, i))
 			return -EIO;
 		if (de == de_del)  {
@@ -1830,7 +2024,7 @@ static int ext4_delete_entry(handle_t *handle,
 				de->inode = 0;
 			dir->i_version++;
 			BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata");
-			err = ext4_handle_dirty_metadata(handle, dir, bh);
+			err = ext4_handle_dirty_dirent_node(handle, dir, bh);
 			if (unlikely(err)) {
 				ext4_std_error(dir->i_sb, err);
 				return err;
@@ -1972,9 +2166,15 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, int mode)
 	struct inode *inode;
 	struct buffer_head *dir_block = NULL;
 	struct ext4_dir_entry_2 *de;
+	struct ext4_dir_entry_tail *t;
 	unsigned int blocksize = dir->i_sb->s_blocksize;
+	int csum_size = 0;
 	int err, retries = 0;
 
+	if (EXT4_HAS_RO_COMPAT_FEATURE(dir->i_sb,
+				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		csum_size = sizeof(struct ext4_dir_entry_tail);
+
 	if (EXT4_DIR_LINK_MAX(dir))
 		return -EMLINK;
 
@@ -2015,16 +2215,24 @@ retry:
 	ext4_set_de_type(dir->i_sb, de, S_IFDIR);
 	de = ext4_next_entry(de, blocksize);
 	de->inode = cpu_to_le32(dir->i_ino);
-	de->rec_len = ext4_rec_len_to_disk(blocksize - EXT4_DIR_REC_LEN(1),
+	de->rec_len = ext4_rec_len_to_disk(blocksize -
+					   (csum_size + EXT4_DIR_REC_LEN(1)),
 					   blocksize);
 	de->name_len = 2;
 	strcpy(de->name, "..");
 	ext4_set_de_type(dir->i_sb, de, S_IFDIR);
 	set_nlink(inode, 2);
+
+	if (csum_size) {
+		t = EXT4_DIRENT_TAIL(dir_block->b_data, blocksize);
+		initialize_dirent_tail(t, blocksize);
+	}
+
 	BUFFER_TRACE(dir_block, "call ext4_handle_dirty_metadata");
-	err = ext4_handle_dirty_metadata(handle, inode, dir_block);
+	err = ext4_handle_dirty_dirent_node(handle, inode, dir_block);
 	if (err)
 		goto out_clear_inode;
+	set_buffer_verified(dir_block);
 	err = ext4_mark_inode_dirty(handle, inode);
 	if (!err)
 		err = ext4_add_entry(handle, dentry, inode);
@@ -2074,6 +2282,14 @@ static int empty_dir(struct inode *inode)
 				     inode->i_ino);
 		return 1;
 	}
+	if (!buffer_verified(bh) &&
+	    !ext4_dirent_csum_verify(inode,
+			(struct ext4_dir_entry *)bh->b_data)) {
+		EXT4_ERROR_INODE(inode, "checksum error reading directory "
+				 "lblock 0");
+		return -EIO;
+	}
+	set_buffer_verified(bh);
 	de = (struct ext4_dir_entry_2 *) bh->b_data;
 	de1 = ext4_next_entry(de, sb->s_blocksize);
 	if (le32_to_cpu(de->inode) != inode->i_ino ||
@@ -2105,6 +2321,14 @@ static int empty_dir(struct inode *inode)
 				offset += sb->s_blocksize;
 				continue;
 			}
+			if (!buffer_verified(bh) &&
+			    !ext4_dirent_csum_verify(inode,
+					(struct ext4_dir_entry *)bh->b_data)) {
+				EXT4_ERROR_INODE(inode, "checksum error "
+						 "reading directory lblock 0");
+				return -EIO;
+			}
+			set_buffer_verified(bh);
 			de = (struct ext4_dir_entry_2 *) bh->b_data;
 		}
 		if (ext4_check_dir_entry(inode, NULL, de, bh, offset)) {
@@ -2605,6 +2829,11 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 		dir_bh = ext4_bread(handle, old_inode, 0, 0, &retval);
 		if (!dir_bh)
 			goto end_rename;
+		if (!buffer_verified(dir_bh) &&
+		    !ext4_dirent_csum_verify(old_inode,
+				(struct ext4_dir_entry *)dir_bh->b_data))
+			goto end_rename;
+		set_buffer_verified(dir_bh);
 		if (le32_to_cpu(PARENT_INO(dir_bh->b_data,
 				old_dir->i_sb->s_blocksize)) != old_dir->i_ino)
 			goto end_rename;
@@ -2635,7 +2864,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 					ext4_current_time(new_dir);
 		ext4_mark_inode_dirty(handle, new_dir);
 		BUFFER_TRACE(new_bh, "call ext4_handle_dirty_metadata");
-		retval = ext4_handle_dirty_metadata(handle, new_dir, new_bh);
+		retval = ext4_handle_dirty_dirent_node(handle, new_dir, new_bh);
 		if (unlikely(retval)) {
 			ext4_std_error(new_dir->i_sb, retval);
 			goto end_rename;
@@ -2689,7 +2918,8 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 		PARENT_INO(dir_bh->b_data, new_dir->i_sb->s_blocksize) =
 						cpu_to_le32(new_dir->i_ino);
 		BUFFER_TRACE(dir_bh, "call ext4_handle_dirty_metadata");
-		retval = ext4_handle_dirty_metadata(handle, old_inode, dir_bh);
+		retval = ext4_handle_dirty_dirent_node(handle, old_inode,
+						       dir_bh);
 		if (retval) {
 			ext4_std_error(old_dir->i_sb, retval);
 			goto end_rename;



* [PATCH 12/23] ext4: Calculate and verify checksums of extended attribute blocks
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (10 preceding siblings ...)
  2012-01-07  8:29 ` [PATCH 11/23] ext4: Calculate and verify checksums of directory leaf blocks Darrick J. Wong
@ 2012-01-07  8:29 ` Darrick J. Wong
  2012-01-07  8:29 ` [PATCH 13/23] ext4: Add new feature to make block group checksums use metadata_csum algorithm Darrick J. Wong
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:29 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Calculate and verify the checksums of extended attribute blocks.  This only
applies to separate EA blocks that are pointed to by inode->i_file_acl (i.e.
external EA blocks); the checksum lives in the EA header.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 fs/ext4/xattr.c |   87 ++++++++++++++++++++++++++++++++++++++++++++++---------
 1 files changed, 73 insertions(+), 14 deletions(-)
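
For readers following along, the checksum scheme described above can be
sketched in userspace roughly as follows.  This is an illustration under
stated assumptions, not the patch's code: crc32c_update() is a slow bitwise
stand-in for ext4_chksum(), struct demo_ea_block is a made-up 64-byte layout
rather than the real ext4_xattr_header, and every demo_* name is hypothetical.
Endianness of the stored checksum is ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC32C update without pre/post inversion; a slow userspace
 * stand-in (assumption) for the kernel's ext4_chksum(). */
static uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Hypothetical 64-byte EA block; NOT the real ext4_xattr_header layout. */
struct demo_ea_block {
	uint32_t magic;
	uint32_t refcount;
	uint32_t checksum;
	uint8_t  payload[52];
};

/* Shared blocks (refcount != 1) are keyed on the block number because no
 * single inode owns them; private blocks reuse the per-inode seed. */
static uint32_t demo_ea_seed(uint32_t fs_seed, uint32_t inode_seed,
			     uint64_t block_nr, uint32_t refcount)
{
	uint8_t le[8];

	if (refcount == 1)
		return inode_seed;
	for (int i = 0; i < 8; i++)	/* serialize as little-endian */
		le[i] = (uint8_t)(block_nr >> (8 * i));
	return crc32c_update(fs_seed, le, sizeof(le));
}

/* Checksum the whole block with the checksum field temporarily zeroed. */
static uint32_t demo_ea_csum(uint32_t seed, struct demo_ea_block *blk)
{
	uint32_t saved, csum;

	assert(blk != NULL);
	saved = blk->checksum;
	blk->checksum = 0;
	csum = crc32c_update(seed, blk, sizeof(*blk));
	blk->checksum = saved;
	return csum;
}

static void demo_ea_csum_set(uint32_t seed, struct demo_ea_block *blk)
{
	blk->checksum = demo_ea_csum(seed, blk);
}

static int demo_ea_csum_verify(uint32_t seed, struct demo_ea_block *blk)
{
	return blk->checksum == demo_ea_csum(seed, blk);
}
```

Because the seed depends on h_refcount, any change to the refcount has to be
followed by a fresh checksum, which is why the patch routes every dirtying of
an EA block through ext4_handle_dirty_xattr_block().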


diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 93a00d8..51d314b 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -122,6 +122,58 @@ const struct xattr_handler *ext4_xattr_handlers[] = {
 	NULL
 };
 
+static __le32 ext4_xattr_block_csum(struct inode *inode,
+				    sector_t block_nr,
+				    struct ext4_xattr_header *hdr)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	struct ext4_inode_info *ei = EXT4_I(inode);
+	__u32 csum, old;
+
+	old = hdr->h_checksum;
+	hdr->h_checksum = 0;
+	if (le32_to_cpu(hdr->h_refcount) != 1) {
+		__le64 le_blk = cpu_to_le64(block_nr);	/* always 64-bit */
+		csum = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)&le_blk,
+				   sizeof(le_blk));
+	} else
+		csum = ei->i_csum_seed;
+	csum = ext4_chksum(sbi, csum, (__u8 *)hdr,
+			   EXT4_BLOCK_SIZE(inode->i_sb));
+	hdr->h_checksum = old;
+	return cpu_to_le32(csum);
+}
+
+static int ext4_xattr_block_csum_verify(struct inode *inode,
+					sector_t block_nr,
+					struct ext4_xattr_header *hdr)
+{
+	if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+		EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) &&
+	    (hdr->h_checksum != ext4_xattr_block_csum(inode, block_nr, hdr)))
+		return 0;
+	return 1;
+}
+
+static void ext4_xattr_block_csum_set(struct inode *inode,
+				      sector_t block_nr,
+				      struct ext4_xattr_header *hdr)
+{
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+		EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return;
+
+	hdr->h_checksum = ext4_xattr_block_csum(inode, block_nr, hdr);
+}
+
+static inline int ext4_handle_dirty_xattr_block(handle_t *handle,
+						struct inode *inode,
+						struct buffer_head *bh)
+{
+	ext4_xattr_block_csum_set(inode, bh->b_blocknr, BHDR(bh));
+	return ext4_handle_dirty_metadata(handle, inode, bh);
+}
+
 static inline const struct xattr_handler *
 ext4_xattr_handler(int name_index)
 {
@@ -156,14 +208,21 @@ ext4_xattr_check_names(struct ext4_xattr_entry *entry, void *end)
 }
 
 static inline int
-ext4_xattr_check_block(struct buffer_head *bh)
+ext4_xattr_check_block(struct inode *inode, struct buffer_head *bh)
 {
 	int error;
 
+	if (buffer_verified(bh))
+		return 0;
+
 	if (BHDR(bh)->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC) ||
 	    BHDR(bh)->h_blocks != cpu_to_le32(1))
 		return -EIO;
+	if (!ext4_xattr_block_csum_verify(inode, bh->b_blocknr, BHDR(bh)))
+		return -EIO;
 	error = ext4_xattr_check_names(BFIRST(bh), bh->b_data + bh->b_size);
+	if (!error)
+		set_buffer_verified(bh);
 	return error;
 }
 
@@ -226,7 +285,7 @@ ext4_xattr_block_get(struct inode *inode, int name_index, const char *name,
 		goto cleanup;
 	ea_bdebug(bh, "b_count=%d, refcount=%d",
 		atomic_read(&(bh->b_count)), le32_to_cpu(BHDR(bh)->h_refcount));
-	if (ext4_xattr_check_block(bh)) {
+	if (ext4_xattr_check_block(inode, bh)) {
 bad_block:
 		EXT4_ERROR_INODE(inode, "bad block %llu",
 				 EXT4_I(inode)->i_file_acl);
@@ -370,7 +429,7 @@ ext4_xattr_block_list(struct dentry *dentry, char *buffer, size_t buffer_size)
 		goto cleanup;
 	ea_bdebug(bh, "b_count=%d, refcount=%d",
 		atomic_read(&(bh->b_count)), le32_to_cpu(BHDR(bh)->h_refcount));
-	if (ext4_xattr_check_block(bh)) {
+	if (ext4_xattr_check_block(inode, bh)) {
 		EXT4_ERROR_INODE(inode, "bad block %llu",
 				 EXT4_I(inode)->i_file_acl);
 		error = -EIO;
@@ -489,7 +548,7 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
 				 EXT4_FREE_BLOCKS_FORGET);
 	} else {
 		le32_add_cpu(&BHDR(bh)->h_refcount, -1);
-		error = ext4_handle_dirty_metadata(handle, inode, bh);
+		error = ext4_handle_dirty_xattr_block(handle, inode, bh);
 		if (IS_SYNC(inode))
 			ext4_handle_sync(handle);
 		dquot_free_block(inode, 1);
@@ -662,7 +721,7 @@ ext4_xattr_block_find(struct inode *inode, struct ext4_xattr_info *i,
 		ea_bdebug(bs->bh, "b_count=%d, refcount=%d",
 			atomic_read(&(bs->bh->b_count)),
 			le32_to_cpu(BHDR(bs->bh)->h_refcount));
-		if (ext4_xattr_check_block(bs->bh)) {
+		if (ext4_xattr_check_block(inode, bs->bh)) {
 			EXT4_ERROR_INODE(inode, "bad block %llu",
 					 EXT4_I(inode)->i_file_acl);
 			error = -EIO;
@@ -725,9 +784,9 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 			if (error == -EIO)
 				goto bad_block;
 			if (!error)
-				error = ext4_handle_dirty_metadata(handle,
-								   inode,
-								   bs->bh);
+				error = ext4_handle_dirty_xattr_block(handle,
+								      inode,
+								      bs->bh);
 			if (error)
 				goto cleanup;
 			goto inserted;
@@ -796,9 +855,9 @@ inserted:
 				ea_bdebug(new_bh, "reusing; refcount now=%d",
 					le32_to_cpu(BHDR(new_bh)->h_refcount));
 				unlock_buffer(new_bh);
-				error = ext4_handle_dirty_metadata(handle,
-								   inode,
-								   new_bh);
+				error = ext4_handle_dirty_xattr_block(handle,
+								      inode,
+								      new_bh);
 				if (error)
 					goto cleanup_dquot;
 			}
@@ -854,8 +913,8 @@ getblk_failed:
 			set_buffer_uptodate(new_bh);
 			unlock_buffer(new_bh);
 			ext4_xattr_cache_insert(new_bh);
-			error = ext4_handle_dirty_metadata(handle,
-							   inode, new_bh);
+			error = ext4_handle_dirty_xattr_block(handle,
+							      inode, new_bh);
 			if (error)
 				goto cleanup;
 		}
@@ -1192,7 +1251,7 @@ retry:
 		error = -EIO;
 		if (!bh)
 			goto cleanup;
-		if (ext4_xattr_check_block(bh)) {
+		if (ext4_xattr_check_block(inode, bh)) {
 			EXT4_ERROR_INODE(inode, "bad block %llu",
 					 EXT4_I(inode)->i_file_acl);
 			error = -EIO;



* [PATCH 13/23] ext4: Add new feature to make block group checksums use metadata_csum algorithm
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (11 preceding siblings ...)
  2012-01-07  8:29 ` [PATCH 12/23] ext4: Calculate and verify checksums of extended attribute blocks Darrick J. Wong
@ 2012-01-07  8:29 ` Darrick J. Wong
       [not found]   ` <8ED6E1F9-DB56-4D31-BCA8-2A3A8D514BD5@dilger.ca>
  2012-01-07  8:29 ` [PATCH 14/23] ext4: Add checksums to the MMP block Darrick J. Wong
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:29 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

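Add a new feature flag that lets block group descriptors use the (faster)
metadata_csum checksum algorithm: a crc32c over the group number and the
descriptor, truncated to the 16 bits available in bg_checksum.  Without the
flag, the existing crc16 code is still used.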

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 fs/ext4/balloc.c  |    2 +-
 fs/ext4/ext4.h    |    4 ++-
 fs/ext4/ialloc.c  |   11 ++++-----
 fs/ext4/mballoc.c |    6 ++---
 fs/ext4/resize.c  |    2 +-
 fs/ext4/super.c   |   65 ++++++++++++++++++++++++++++++++++++++---------------
 6 files changed, 59 insertions(+), 31 deletions(-)
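
As a rough userspace illustration of the new code path (a sketch only, not
the kernel code): the group number is fed in as little-endian bytes, the
descriptor is checksummed with bg_checksum zeroed, and the low 16 bits of the
result are kept.  crc32c_update() is a bitwise stand-in for ext4_chksum(),
and struct demo_group_desc is a hypothetical 32-byte layout, not the real
struct ext4_group_desc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC32C update without pre/post inversion; a slow userspace
 * stand-in (assumption) for the kernel's ext4_chksum(). */
static uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Hypothetical 32-byte descriptor; NOT the real ext4_group_desc layout. */
struct demo_group_desc {
	uint32_t block_bitmap;
	uint32_t inode_bitmap;
	uint32_t inode_table;
	uint16_t free_blocks;
	uint16_t free_inodes;
	uint16_t used_dirs;
	uint16_t flags;
	uint8_t  reserved[10];
	uint16_t checksum;
};

/* crc32c(seed, le32(group) + descriptor with checksum zeroed), low 16 bits. */
static uint16_t demo_group_desc_csum(uint32_t fs_seed, uint32_t group,
				     struct demo_group_desc *gd)
{
	uint8_t le_group[4];
	uint16_t saved;
	uint32_t csum32;

	assert(gd != NULL);
	for (int i = 0; i < 4; i++)	/* serialize as little-endian */
		le_group[i] = (uint8_t)(group >> (8 * i));

	saved = gd->checksum;
	gd->checksum = 0;
	csum32 = crc32c_update(fs_seed, le_group, sizeof(le_group));
	csum32 = crc32c_update(csum32, gd, sizeof(*gd));
	gd->checksum = saved;

	return (uint16_t)(csum32 & 0xFFFF);	/* only 16 bits fit on disk */
}

static void demo_group_desc_csum_set(uint32_t fs_seed, uint32_t group,
				     struct demo_group_desc *gd)
{
	gd->checksum = demo_group_desc_csum(fs_seed, group, gd);
}

static int demo_group_desc_csum_verify(uint32_t fs_seed, uint32_t group,
				       struct demo_group_desc *gd)
{
	return gd->checksum == demo_group_desc_csum(fs_seed, group, gd);
}
```

Zeroing the field before hashing means the result does not depend on
whatever stale value bg_checksum held, so set and verify can share one
routine.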


diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index dcac4bd..300f751 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -212,7 +212,7 @@ void ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
 			     sb->s_blocksize * 8, bh->b_data);
 	ext4_block_bitmap_csum_set(sb, block_group, gdp, bh,
 				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
-	gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
+	ext4_group_desc_csum_set(sbi, block_group, gdp);
 }
 
 /* Return the number of free blocks in a block group.  It is used when
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 48536d8..e24386f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2086,10 +2086,10 @@ extern void ext4_used_dirs_set(struct super_block *sb,
 				struct ext4_group_desc *bg, __u32 count);
 extern void ext4_itable_unused_set(struct super_block *sb,
 				   struct ext4_group_desc *bg, __u32 count);
-extern __le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 group,
-				   struct ext4_group_desc *gdp);
 extern int ext4_group_desc_csum_verify(struct ext4_sb_info *sbi, __u32 group,
 				       struct ext4_group_desc *gdp);
+extern void ext4_group_desc_csum_set(struct ext4_sb_info *sbi, __u32 group,
+				     struct ext4_group_desc *gdp);
 
 static inline ext4_fsblk_t ext4_blocks_count(struct ext4_super_block *es)
 {
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 261ffce..6d33995 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -92,7 +92,7 @@ static unsigned ext4_init_inode_bitmap(struct super_block *sb,
 			bh->b_data);
 	ext4_inode_bitmap_csum_set(sb, block_group, gdp, bh,
 				   EXT4_INODES_PER_GROUP(sb) / 8);
-	gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
+	ext4_group_desc_csum_set(sbi, block_group, gdp);
 
 	return EXT4_INODES_PER_GROUP(sb);
 }
@@ -287,7 +287,7 @@ void ext4_free_inode(handle_t *handle, struct inode *inode)
 	}
 	ext4_inode_bitmap_csum_set(sb, block_group, gdp, bitmap_bh,
 				   EXT4_INODES_PER_GROUP(sb) / 8);
-	gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
+	ext4_group_desc_csum_set(sbi, block_group, gdp);
 	ext4_unlock_group(sb, block_group);
 
 	percpu_counter_inc(&sbi->s_freeinodes_counter);
@@ -697,7 +697,7 @@ static int ext4_claim_inode(struct super_block *sb,
 	}
 	ext4_inode_bitmap_csum_set(sb, group, gdp, inode_bitmap_bh,
 				   EXT4_INODES_PER_GROUP(sb) / 8);
-	gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp);
+	ext4_group_desc_csum_set(sbi, group, gdp);
 err_ret:
 	ext4_unlock_group(sb, group);
 	up_read(&grp->alloc_sem);
@@ -858,8 +858,7 @@ got:
 						   block_bitmap_bh,
 						   EXT4_BLOCKS_PER_GROUP(sb) /
 						   8);
-			gdp->bg_checksum = ext4_group_desc_csum(sbi, group,
-								gdp);
+			ext4_group_desc_csum_set(sbi, group, gdp);
 		}
 		ext4_unlock_group(sb, group);
 
@@ -1223,7 +1222,7 @@ int ext4_init_inode_table(struct super_block *sb, ext4_group_t group,
 skip_zeroout:
 	ext4_lock_group(sb, group);
 	gdp->bg_flags |= cpu_to_le16(EXT4_BG_INODE_ZEROED);
-	gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp);
+	ext4_group_desc_csum_set(sbi, group, gdp);
 	ext4_unlock_group(sb, group);
 
 	BUFFER_TRACE(group_desc_bh,
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index dbd1453..2bbd9ee 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2919,7 +2919,7 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 	ext4_free_group_clusters_set(sb, gdp, len);
 	ext4_block_bitmap_csum_set(sb, ac->ac_b_ex.fe_group, gdp, bitmap_bh,
 				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
-	gdp->bg_checksum = ext4_group_desc_csum(sbi, ac->ac_b_ex.fe_group, gdp);
+	ext4_group_desc_csum_set(sbi, ac->ac_b_ex.fe_group, gdp);
 
 	ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
 	percpu_counter_sub(&sbi->s_freeclusters_counter, ac->ac_b_ex.fe_len);
@@ -4787,7 +4787,7 @@ do_more:
 	ext4_free_group_clusters_set(sb, gdp, ret);
 	ext4_block_bitmap_csum_set(sb, block_group, gdp, bitmap_bh,
 				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
-	gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
+	ext4_group_desc_csum_set(sbi, block_group, gdp);
 	ext4_unlock_group(sb, block_group);
 	percpu_counter_add(&sbi->s_freeclusters_counter, count_clusters);
 
@@ -4933,7 +4933,7 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 	ext4_free_group_clusters_set(sb, desc, blk_free_count);
 	ext4_block_bitmap_csum_set(sb, block_group, desc, bitmap_bh,
 				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
-	desc->bg_checksum = ext4_group_desc_csum(sbi, block_group, desc);
+	ext4_group_desc_csum_set(sbi, block_group, desc);
 	ext4_unlock_group(sb, block_group);
 	percpu_counter_add(&sbi->s_freeclusters_counter,
 			   EXT4_B2C(sbi, blocks_freed));
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index c33b72a..083f429 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -880,7 +880,7 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
 	ext4_free_group_clusters_set(sb, gdp, input->free_blocks_count);
 	ext4_free_inodes_set(sb, gdp, EXT4_INODES_PER_GROUP(sb));
 	gdp->bg_flags = cpu_to_le16(EXT4_BG_INODE_ZEROED);
-	gdp->bg_checksum = ext4_group_desc_csum(sbi, input->group, gdp);
+	ext4_group_desc_csum_set(sbi, input->group, gdp);
 
 	/*
 	 * We can allocate memory for mb_alloc based on the new group
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 99bee03..1b2d91b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2091,29 +2091,49 @@ failed:
 	return 0;
 }
 
-__le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 block_group,
-			    struct ext4_group_desc *gdp)
+static __le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 block_group,
+				   struct ext4_group_desc *gdp)
 {
+	int offset;
 	__u16 crc = 0;
+	__le32 le_group = cpu_to_le32(block_group);
 
-	if (sbi->s_es->s_feature_ro_compat &
-	    cpu_to_le32(EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
-		int offset = offsetof(struct ext4_group_desc, bg_checksum);
-		__le32 le_group = cpu_to_le32(block_group);
-
-		crc = crc16(~0, sbi->s_es->s_uuid, sizeof(sbi->s_es->s_uuid));
-		crc = crc16(crc, (__u8 *)&le_group, sizeof(le_group));
-		crc = crc16(crc, (__u8 *)gdp, offset);
-		offset += sizeof(gdp->bg_checksum); /* skip checksum */
-		/* for checksum of struct ext4_group_desc do the rest...*/
-		if ((sbi->s_es->s_feature_incompat &
-		     cpu_to_le32(EXT4_FEATURE_INCOMPAT_64BIT)) &&
-		    offset < le16_to_cpu(sbi->s_es->s_desc_size))
-			crc = crc16(crc, (__u8 *)gdp + offset,
-				    le16_to_cpu(sbi->s_es->s_desc_size) -
-					offset);
+	if ((sbi->s_es->s_feature_ro_compat &
+	     cpu_to_le32(EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) &&
+	    (sbi->s_es->s_feature_incompat &
+	     cpu_to_le32(EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM))) {
+		/* Use new metadata_csum algorithm */
+		__u16 old_csum;
+		__u32 csum32;
+
+		old_csum = gdp->bg_checksum;
+		gdp->bg_checksum = 0;
+		csum32 = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)&le_group,
+				     sizeof(le_group));
+		csum32 = ext4_chksum(sbi, csum32, (__u8 *)gdp,
+				     sbi->s_desc_size);
+		gdp->bg_checksum = old_csum;
+
+		crc = csum32 & 0xFFFF;
+		goto out;
 	}
 
+	/* old crc16 code */
+	offset = offsetof(struct ext4_group_desc, bg_checksum);
+
+	crc = crc16(~0, sbi->s_es->s_uuid, sizeof(sbi->s_es->s_uuid));
+	crc = crc16(crc, (__u8 *)&le_group, sizeof(le_group));
+	crc = crc16(crc, (__u8 *)gdp, offset);
+	offset += sizeof(gdp->bg_checksum); /* skip checksum */
+	/* for checksum of struct ext4_group_desc do the rest...*/
+	if ((sbi->s_es->s_feature_incompat &
+	     cpu_to_le32(EXT4_FEATURE_INCOMPAT_64BIT)) &&
+	    offset < le16_to_cpu(sbi->s_es->s_desc_size))
+		crc = crc16(crc, (__u8 *)gdp + offset,
+			    le16_to_cpu(sbi->s_es->s_desc_size) -
+				offset);
+
+out:
 	return cpu_to_le16(crc);
 }
 
@@ -2128,6 +2148,15 @@ int ext4_group_desc_csum_verify(struct ext4_sb_info *sbi, __u32 block_group,
 	return 1;
 }
 
+void ext4_group_desc_csum_set(struct ext4_sb_info *sbi, __u32 block_group,
+			      struct ext4_group_desc *gdp)
+{
+	if (!(sbi->s_es->s_feature_ro_compat &
+	     cpu_to_le32(EXT4_FEATURE_RO_COMPAT_GDT_CSUM)))
+		return;
+	gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
+}
+
 /* Called at mount-time, super-block is locked */
 static int ext4_check_descriptors(struct super_block *sb,
 				  ext4_group_t *first_not_zeroed)



* [PATCH 14/23] ext4: Add checksums to the MMP block
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (12 preceding siblings ...)
  2012-01-07  8:29 ` [PATCH 13/23] ext4: Add new feature to make block group checksums use metadata_csum algorithm Darrick J. Wong
@ 2012-01-07  8:29 ` Darrick J. Wong
  2012-01-07  8:29 ` [PATCH 15/23] jbd2: Change disk layout for metadata checksumming Darrick J. Wong
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:29 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Compute and verify a checksum for the MMP block.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 fs/ext4/ext4.h |    3 +++
 fs/ext4/mmp.c  |   44 +++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 42 insertions(+), 5 deletions(-)
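
A small userspace sketch of the approach, for illustration only: the checksum
field is the last member, so the checksum covers everything before it via
offsetof() and nothing needs to be zeroed first.  crc32c_update() is a
bitwise stand-in for ext4_chksum(), and struct demo_mmp is a hypothetical,
shortened layout rather than the real struct mmp_struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC32C update without pre/post inversion; a slow userspace
 * stand-in (assumption) for the kernel's ext4_chksum(). */
static uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Hypothetical, shortened MMP block; NOT the real mmp_struct layout. */
struct demo_mmp {
	uint32_t magic;
	uint32_t seq;
	uint64_t time;
	char     nodename[16];
	uint32_t checksum;	/* last field, excluded from the checksum */
};

/* Checksum every byte that precedes the checksum field. */
static uint32_t demo_mmp_csum(uint32_t fs_seed, const struct demo_mmp *mmp)
{
	assert(mmp != NULL);
	return crc32c_update(fs_seed, mmp,
			     offsetof(struct demo_mmp, checksum));
}

static void demo_mmp_csum_set(uint32_t fs_seed, struct demo_mmp *mmp)
{
	mmp->checksum = demo_mmp_csum(fs_seed, mmp);
}

static int demo_mmp_csum_verify(uint32_t fs_seed, const struct demo_mmp *mmp)
{
	return mmp->checksum == demo_mmp_csum(fs_seed, mmp);
}
```

Setting the checksum inside the write helper, as the patch does in
write_mmp_block(), keeps every writer of the block covered without touching
each call site's logic.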


diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index e24386f..325bae0 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2378,6 +2378,9 @@ extern int ext4_bio_write_page(struct ext4_io_submit *io,
 
 /* mmp.c */
 extern int ext4_multi_mount_protect(struct super_block *, ext4_fsblk_t);
+extern void ext4_mmp_csum_set(struct super_block *sb, struct mmp_struct *mmp);
+extern int ext4_mmp_csum_verify(struct super_block *sb,
+				struct mmp_struct *mmp);
 
 /* BH_Uninit flag: blocks are allocated but uninitialized on disk */
 enum ext4_state_bits {
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index 7ea4ba4..142e9d7 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -6,12 +6,45 @@
 
 #include "ext4.h"
 
+/* Checksumming functions */
+static __le32 ext4_mmp_csum(struct super_block *sb, struct mmp_struct *mmp)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	int offset = offsetof(struct mmp_struct, mmp_checksum);
+	__u32 csum;
+
+	csum = ext4_chksum(sbi, sbi->s_csum_seed, (char *)mmp, offset);
+
+	return cpu_to_le32(csum);
+}
+
+int ext4_mmp_csum_verify(struct super_block *sb, struct mmp_struct *mmp)
+{
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return 1;
+
+	return mmp->mmp_checksum == ext4_mmp_csum(sb, mmp);
+}
+
+void ext4_mmp_csum_set(struct super_block *sb, struct mmp_struct *mmp)
+{
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
+		return;
+
+	mmp->mmp_checksum = ext4_mmp_csum(sb, mmp);
+}
+
 /*
  * Write the MMP block using WRITE_SYNC to try to get the block on-disk
  * faster.
  */
-static int write_mmp_block(struct buffer_head *bh)
+static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
 {
+	struct mmp_struct *mmp = (struct mmp_struct *)(bh->b_data);
+
+	ext4_mmp_csum_set(sb, mmp);
 	mark_buffer_dirty(bh);
 	lock_buffer(bh);
 	bh->b_end_io = end_buffer_write_sync;
@@ -59,7 +92,8 @@ static int read_mmp_block(struct super_block *sb, struct buffer_head **bh,
 	}
 
 	mmp = (struct mmp_struct *)((*bh)->b_data);
-	if (le32_to_cpu(mmp->mmp_magic) != EXT4_MMP_MAGIC)
+	if (le32_to_cpu(mmp->mmp_magic) != EXT4_MMP_MAGIC ||
+	    !ext4_mmp_csum_verify(sb, mmp))
 		return -EINVAL;
 
 	return 0;
@@ -120,7 +154,7 @@ static int kmmpd(void *data)
 		mmp->mmp_time = cpu_to_le64(get_seconds());
 		last_update_time = jiffies;
 
-		retval = write_mmp_block(bh);
+		retval = write_mmp_block(sb, bh);
 		/*
 		 * Don't spew too many error messages. Print one every
 		 * (s_mmp_update_interval * 60) seconds.
@@ -200,7 +234,7 @@ static int kmmpd(void *data)
 	mmp->mmp_seq = cpu_to_le32(EXT4_MMP_SEQ_CLEAN);
 	mmp->mmp_time = cpu_to_le64(get_seconds());
 
-	retval = write_mmp_block(bh);
+	retval = write_mmp_block(sb, bh);
 
 failed:
 	kfree(data);
@@ -299,7 +333,7 @@ skip:
 	seq = mmp_new_seq();
 	mmp->mmp_seq = cpu_to_le32(seq);
 
-	retval = write_mmp_block(bh);
+	retval = write_mmp_block(sb, bh);
 	if (retval)
 		goto failed;
 



* [PATCH 15/23] jbd2: Change disk layout for metadata checksumming
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (13 preceding siblings ...)
  2012-01-07  8:29 ` [PATCH 14/23] ext4: Add checksums to the MMP block Darrick J. Wong
@ 2012-01-07  8:29 ` Darrick J. Wong
  2012-01-07  8:29 ` [PATCH 16/23] jbd2: Enable journal clients to enable v2 checksumming Darrick J. Wong
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:29 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Define flags and allocate space in on-disk journal structures to support
checksumming of journal metadata.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 include/linux/jbd2.h |   28 +++++++++++++++++++++++++++-
 1 files changed, 27 insertions(+), 1 deletions(-)
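
The journal superblock change has to fit in the space that s_padding[44]
used to occupy, between offsets 0x0050 and 0x0100.  The arithmetic
(1 + 3 + 42*4 + 4 = 176 = 44*4) can be checked at compile time with a sketch
like the one below.  The demo_* structs model only that region and use
host-endian integer types; they are illustrative and are not the real
journal_superblock_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_REGION_START 0x0050u	/* offset of the old s_padding[] */
#define DEMO_REGION_END   0x0100u	/* offset of s_users[] */

/* The region before the patch (illustrative model only). */
struct demo_old_region {
	uint32_t s_padding[44];
};

/* The same region after the patch (illustrative model only). */
struct demo_new_region {
	uint8_t  s_checksum_type;
	uint8_t  s_padding2[3];
	uint32_t s_padding[42];
	uint32_t s_checksum;
};

/* Both layouts must fill exactly the bytes between 0x50 and 0x100. */
_Static_assert(sizeof(struct demo_old_region) ==
	       DEMO_REGION_END - DEMO_REGION_START,
	       "old padding does not span 0x50..0x100");
_Static_assert(sizeof(struct demo_new_region) ==
	       sizeof(struct demo_old_region),
	       "new fields changed the size of the region");

/* Absolute offset of s_checksum within the modelled superblock. */
static size_t demo_checksum_offset(void)
{
	return DEMO_REGION_START +
	       offsetof(struct demo_new_region, s_checksum);
}
```

In this model s_checksum lands in the last four bytes before s_users[], so
the fields that follow keep their offsets.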


diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 2092ea2..ecd9b45 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -147,12 +147,24 @@ typedef struct journal_header_s
 #define JBD2_CRC32_CHKSUM   1
 #define JBD2_MD5_CHKSUM     2
 #define JBD2_SHA1_CHKSUM    3
+#define JBD2_CRC32C_CHKSUM  4
 
 #define JBD2_CRC32_CHKSUM_SIZE 4
 
 #define JBD2_CHECKSUM_BYTES (32 / sizeof(u32))
 /*
  * Commit block header for storing transactional checksums:
+ *
+ * NOTE: If FEATURE_COMPAT_CHECKSUM (checksum v1) is set, the h_chksum*
+ * fields are used to store a checksum of the descriptor and data blocks.
+ *
+ * If FEATURE_INCOMPAT_CSUM_V2 (checksum v2) is set, then the h_chksum
+ * field is used to store crc32c(uuid+commit_block).  Each journal metadata
+ * block gets its own checksum, and data block checksums are stored in
+ * journal_block_tag (in the descriptor).  The other h_chksum* fields are
+ * not used.
+ *
+ * Checksum v1 and v2 are mutually exclusive features.
  */
 struct commit_header {
 	__be32		h_magic;
@@ -177,11 +189,17 @@ typedef struct journal_block_tag_s
 	__be32		t_blocknr;	/* The on-disk block number */
 	__be32		t_flags;	/* See below */
 	__be32		t_blocknr_high; /* most-significant high 32bits. */
+	__be32		t_checksum;	/* crc32c(uuid+seq+block) */
 } journal_block_tag_t;
 
 #define JBD2_TAG_SIZE32 (offsetof(journal_block_tag_t, t_blocknr_high))
 #define JBD2_TAG_SIZE64 (sizeof(journal_block_tag_t))
 
+/* Tail of descriptor block, for checksumming */
+struct jbd2_journal_block_tail {
+	__be32		t_checksum;	/* crc32c(uuid+descr_block) */
+};
+
 /*
  * The revoke descriptor: used on disk to describe a series of blocks to
  * be revoked from the log
@@ -192,6 +210,10 @@ typedef struct jbd2_journal_revoke_header_s
 	__be32		 r_count;	/* Count of bytes used in the block */
 } jbd2_journal_revoke_header_t;
 
+/* Tail of revoke block, for checksumming */
+struct jbd2_journal_revoke_tail {
+	__be32		r_checksum;	/* crc32c(uuid+revoke_block) */
+};
 
 /* Definitions for the journal tag flags word: */
 #define JBD2_FLAG_ESCAPE		1	/* on-disk block is escaped */
@@ -241,7 +263,10 @@ typedef struct journal_superblock_s
 	__be32	s_max_trans_data;	/* Limit of data blocks per trans. */
 
 /* 0x0050 */
-	__u32	s_padding[44];
+	__u8	s_checksum_type;	/* checksum type */
+	__u8	s_padding2[3];
+	__u32	s_padding[42];
+	__be32	s_checksum;		/* crc32c(superblock) */
 
 /* 0x0100 */
 	__u8	s_users[16*48];		/* ids of all fs'es sharing the log */
@@ -263,6 +288,7 @@ typedef struct journal_superblock_s
 #define JBD2_FEATURE_INCOMPAT_REVOKE		0x00000001
 #define JBD2_FEATURE_INCOMPAT_64BIT		0x00000002
 #define JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT	0x00000004
+#define JBD2_FEATURE_INCOMPAT_CSUM_V2		0x00000008
 
 /* Features known to this kernel version: */
 #define JBD2_KNOWN_COMPAT_FEATURES	JBD2_FEATURE_COMPAT_CHECKSUM



* [PATCH 16/23] jbd2: Enable journal clients to enable v2 checksumming
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (14 preceding siblings ...)
  2012-01-07  8:29 ` [PATCH 15/23] jbd2: Change disk layout for metadata checksumming Darrick J. Wong
@ 2012-01-07  8:29 ` Darrick J. Wong
  2012-01-07  8:29 ` [PATCH 17/23] jbd2: Grab a reference to the crc32c driver only when necessary Darrick J. Wong
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:29 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

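Add the code that lets journal clients enable the new journal checksumming
features.  When journal checksumming is requested, ext4 asks for checksum v2
if metadata_csum is set and for v1 otherwise; jbd2 refuses to load a journal
superblock that has both v1 and v2 enabled.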

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 fs/ext4/super.c   |   55 ++++++++++++++++++++++++++++++++++++++++-------------
 fs/jbd2/journal.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+), 13 deletions(-)
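
The v1/v2 exclusion rules in jbd2_journal_set_features() can be modelled in
userspace along these lines.  This is a sketch under assumptions: the flags
are kept in host byte order instead of big-endian, the DEMO_* values are
chosen for the demo, the availability checks are left out, and struct
demo_jsb is a hypothetical reduction of the journal superblock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Demo flag values (assumptions made for this sketch). */
#define DEMO_COMPAT_CHECKSUM	0x1u	/* checksum v1 */
#define DEMO_INCOMPAT_CSUM_V2	0x8u	/* checksum v2 */
#define DEMO_CRC32C_CHKSUM	4

/* Hypothetical reduction of the journal superblock to the fields used. */
struct demo_jsb {
	uint32_t compat;
	uint32_t incompat;
	uint8_t  checksum_type;
};

/* Turn on the requested features while keeping v1 and v2 exclusive. */
static void demo_set_features(struct demo_jsb *sb, uint32_t compat,
			      uint32_t incompat)
{
	assert(sb != NULL);

	/* Asked for v2 and v1 together?  Only grant v2. */
	if ((incompat & DEMO_INCOMPAT_CSUM_V2) &&
	    (compat & DEMO_COMPAT_CHECKSUM))
		compat &= ~DEMO_COMPAT_CHECKSUM;

	/* Newly enabling v2: record the algorithm and drop v1. */
	if ((incompat & DEMO_INCOMPAT_CSUM_V2) &&
	    !(sb->incompat & DEMO_INCOMPAT_CSUM_V2)) {
		sb->checksum_type = DEMO_CRC32C_CHKSUM;
		sb->compat &= ~DEMO_COMPAT_CHECKSUM;
	}

	/* Newly enabling v1: drop v2. */
	if ((compat & DEMO_COMPAT_CHECKSUM) &&
	    !(sb->compat & DEMO_COMPAT_CHECKSUM))
		sb->incompat &= ~DEMO_INCOMPAT_CSUM_V2;

	sb->compat |= compat;
	sb->incompat |= incompat;
}

/* Mirrors the load-time check: v1 and v2 must never both be set. */
static int demo_features_valid(const struct demo_jsb *sb)
{
	return !((sb->compat & DEMO_COMPAT_CHECKSUM) &&
		 (sb->incompat & DEMO_INCOMPAT_CSUM_V2));
}
```

Whichever version was requested last wins, so the superblock never ends up
in the state that journal_get_superblock() rejects.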


diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 1b2d91b..1d7c5fb 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3172,6 +3172,44 @@ static void ext4_destroy_lazyinit_thread(void)
 	kthread_stop(ext4_lazyinit_task);
 }
 
+static int set_journal_csum_feature_set(struct super_block *sb)
+{
+	int ret = 1;
+	int compat, incompat;
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+	if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
+				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
+		/* journal checksum v2 */
+		compat = 0;
+		incompat = JBD2_FEATURE_INCOMPAT_CSUM_V2;
+	} else {
+		/* journal checksum v1 */
+		compat = JBD2_FEATURE_COMPAT_CHECKSUM;
+		incompat = 0;
+	}
+
+	if (test_opt(sb, JOURNAL_ASYNC_COMMIT)) {
+		ret = jbd2_journal_set_features(sbi->s_journal,
+				compat, 0,
+				JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT |
+				incompat);
+	} else if (test_opt(sb, JOURNAL_CHECKSUM)) {
+		ret = jbd2_journal_set_features(sbi->s_journal,
+				compat, 0,
+				incompat);
+		jbd2_journal_clear_features(sbi->s_journal, 0, 0,
+				JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
+	} else {
+		jbd2_journal_clear_features(sbi->s_journal,
+				JBD2_FEATURE_COMPAT_CHECKSUM, 0,
+				JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT |
+				JBD2_FEATURE_INCOMPAT_CSUM_V2);
+	}
+
+	return ret;
+}
+
 static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 {
 	char *orig_data = kstrdup(data, GFP_KERNEL);
@@ -3761,19 +3799,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		goto failed_mount_wq;
 	}
 
-	if (test_opt(sb, JOURNAL_ASYNC_COMMIT)) {
-		jbd2_journal_set_features(sbi->s_journal,
-				JBD2_FEATURE_COMPAT_CHECKSUM, 0,
-				JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
-	} else if (test_opt(sb, JOURNAL_CHECKSUM)) {
-		jbd2_journal_set_features(sbi->s_journal,
-				JBD2_FEATURE_COMPAT_CHECKSUM, 0, 0);
-		jbd2_journal_clear_features(sbi->s_journal, 0, 0,
-				JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
-	} else {
-		jbd2_journal_clear_features(sbi->s_journal,
-				JBD2_FEATURE_COMPAT_CHECKSUM, 0,
-				JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
+	if (!set_journal_csum_feature_set(sb)) {
+		ext4_msg(sb, KERN_ERR, "Failed to set journal checksum "
+			 "feature set");
+		goto failed_mount_wq;
 	}
 
 	/* We have now updated the journal if required, so we can
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 0fa0123..300b46f 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -100,6 +100,15 @@ static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 static void __journal_abort_soft (journal_t *journal, int errno);
 static int jbd2_journal_create_slab(size_t slab_size);
 
+/* Checksumming functions */
+int jbd2_verify_csum_type(journal_t *j, journal_superblock_t *sb)
+{
+	if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+		return 1;
+
+	return sb->s_checksum_type == JBD2_CRC32C_CHKSUM;
+}
+
 /*
  * Helper function used to manage commit timeouts
  */
@@ -1222,6 +1231,9 @@ static int journal_get_superblock(journal_t *journal)
 		}
 	}
 
+	if (buffer_verified(bh))
+		return 0;
+
 	sb = journal->j_superblock;
 
 	err = -EINVAL;
@@ -1259,6 +1271,21 @@ static int journal_get_superblock(journal_t *journal)
 		goto out;
 	}
 
+	if (JBD2_HAS_COMPAT_FEATURE(journal, JBD2_FEATURE_COMPAT_CHECKSUM) &&
+	    JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2)) {
+		/* Can't have checksum v1 and v2 on at the same time! */
+		printk(KERN_ERR "JBD: Can't enable checksumming v1 and v2 "
+		       "at the same time!\n");
+		goto out;
+	}
+
+	if (!jbd2_verify_csum_type(journal, sb)) {
+		printk(KERN_ERR "JBD: Unknown checksum type\n");
+		goto out;
+	}
+
+	set_buffer_verified(bh);
+
 	return 0;
 
 out:
@@ -1502,6 +1529,10 @@ int jbd2_journal_check_available_features (journal_t *journal, unsigned long com
 int jbd2_journal_set_features (journal_t *journal, unsigned long compat,
 			  unsigned long ro, unsigned long incompat)
 {
+#define INCOMPAT_FEATURE_ON(f) \
+		((incompat & (f)) && !(sb->s_feature_incompat & cpu_to_be32(f)))
+#define COMPAT_FEATURE_ON(f) \
+		((compat & (f)) && !(sb->s_feature_compat & cpu_to_be32(f)))
 	journal_superblock_t *sb;
 
 	if (jbd2_journal_check_used_features(journal, compat, ro, incompat))
@@ -1510,16 +1541,35 @@ int jbd2_journal_set_features (journal_t *journal, unsigned long compat,
 	if (!jbd2_journal_check_available_features(journal, compat, ro, incompat))
 		return 0;
 
+	/* Asking for checksumming v2 and v1?  Only give them v2. */
+	if (incompat & JBD2_FEATURE_INCOMPAT_CSUM_V2 &&
+	    compat & JBD2_FEATURE_COMPAT_CHECKSUM)
+		compat &= ~JBD2_FEATURE_COMPAT_CHECKSUM;
+
 	jbd_debug(1, "Setting new features 0x%lx/0x%lx/0x%lx\n",
 		  compat, ro, incompat);
 
 	sb = journal->j_superblock;
 
+	/* If enabling v2 checksums, update superblock */
+	if (INCOMPAT_FEATURE_ON(JBD2_FEATURE_INCOMPAT_CSUM_V2)) {
+		sb->s_checksum_type = JBD2_CRC32C_CHKSUM;
+		sb->s_feature_compat &=
+			~cpu_to_be32(JBD2_FEATURE_COMPAT_CHECKSUM);
+	}
+
+	/* If enabling v1 checksums, downgrade superblock */
+	if (COMPAT_FEATURE_ON(JBD2_FEATURE_COMPAT_CHECKSUM))
+		sb->s_feature_incompat &=
+			~cpu_to_be32(JBD2_FEATURE_INCOMPAT_CSUM_V2);
+
 	sb->s_feature_compat    |= cpu_to_be32(compat);
 	sb->s_feature_ro_compat |= cpu_to_be32(ro);
 	sb->s_feature_incompat  |= cpu_to_be32(incompat);
 
 	return 1;
+#undef COMPAT_FEATURE_ON
+#undef INCOMPAT_FEATURE_ON
 }
 
 /*



* [PATCH 17/23] jbd2: Grab a reference to the crc32c driver only when necessary
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (15 preceding siblings ...)
  2012-01-07  8:29 ` [PATCH 16/23] jbd2: Enable journal clients to enable v2 checksumming Darrick J. Wong
@ 2012-01-07  8:29 ` Darrick J. Wong
  2012-01-07  8:29 ` [PATCH 18/23] jbd2: Checksum journal superblock Darrick J. Wong
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:29 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Obtain a reference to the crc32c driver only when the journal needs it for
checksum v2.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 fs/jbd2/Kconfig      |    2 ++
 fs/jbd2/journal.c    |   25 +++++++++++++++++++++++++
 include/linux/jbd2.h |   23 +++++++++++++++++++++++
 3 files changed, 50 insertions(+), 0 deletions(-)


diff --git a/fs/jbd2/Kconfig b/fs/jbd2/Kconfig
index f32f346..69a48c2 100644
--- a/fs/jbd2/Kconfig
+++ b/fs/jbd2/Kconfig
@@ -1,6 +1,8 @@
 config JBD2
 	tristate
 	select CRC32
+	select CRYPTO
+	select CRYPTO_CRC32C
 	help
 	  This is a generic journaling layer for block devices that support
 	  both 32-bit and 64-bit block numbers.  It is currently used by
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 300b46f..bb7f03f 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1284,6 +1284,17 @@ static int journal_get_superblock(journal_t *journal)
 		goto out;
 	}
 
+	/* Load the checksum driver */
+	if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2)) {
+		journal->j_chksum_driver = crypto_alloc_shash("crc32c", 0, 0);
+		if (IS_ERR(journal->j_chksum_driver)) {
+			printk(KERN_ERR "JBD: Cannot load crc32c driver.\n");
+			err = PTR_ERR(journal->j_chksum_driver);
+			journal->j_chksum_driver = NULL;
+			goto out;
+		}
+	}
+
 	set_buffer_verified(bh);
 
 	return 0;
@@ -1440,6 +1451,8 @@ int jbd2_journal_destroy(journal_t *journal)
 		iput(journal->j_inode);
 	if (journal->j_revoke)
 		jbd2_journal_destroy_revoke(journal);
+	if (journal->j_chksum_driver)
+		crypto_free_shash(journal->j_chksum_driver);
 	kfree(journal->j_wbuf);
 	kfree(journal);
 
@@ -1556,6 +1569,18 @@ int jbd2_journal_set_features (journal_t *journal, unsigned long compat,
 		sb->s_checksum_type = JBD2_CRC32C_CHKSUM;
 		sb->s_feature_compat &=
 			~cpu_to_be32(JBD2_FEATURE_COMPAT_CHECKSUM);
+
+		/* Load the checksum driver */
+		if (journal->j_chksum_driver == NULL) {
+			journal->j_chksum_driver = crypto_alloc_shash("crc32c",
+								      0, 0);
+			if (IS_ERR(journal->j_chksum_driver)) {
+				printk(KERN_ERR "JBD: Cannot load crc32c "
+				       "driver.\n");
+				journal->j_chksum_driver = NULL;
+				return 0;
+			}
+		}
 	}
 
 	/* If enabling v1 checksums, downgrade superblock */
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index ecd9b45..5b2abf9 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -31,6 +31,7 @@
 #include <linux/mutex.h>
 #include <linux/timer.h>
 #include <linux/slab.h>
+#include <crypto/hash.h>
 #endif
 
 #define journal_oom_retry 1
@@ -965,6 +966,9 @@ struct journal_s
 	 * superblock pointer here
 	 */
 	void *j_private;
+
+	/* Reference to checksum algorithm driver via cryptoapi */
+	struct crypto_shash *j_chksum_driver;
 };
 
 /*
@@ -1283,6 +1287,25 @@ static inline int jbd_space_needed(journal_t *journal)
 
 extern int jbd_blocks_per_page(struct inode *inode);
 
+static inline u32 jbd2_chksum(journal_t *journal, u32 crc,
+			      const void *address, unsigned int length)
+{
+	struct {
+		struct shash_desc shash;
+		char ctx[crypto_shash_descsize(journal->j_chksum_driver)];
+	} desc;
+	int err;
+
+	desc.shash.tfm = journal->j_chksum_driver;
+	desc.shash.flags = 0;
+	*(u32 *)desc.ctx = crc;
+
+	err = crypto_shash_update(&desc.shash, address, length);
+	BUG_ON(err);
+
+	return *(u32 *)desc.ctx;
+}
+
 #ifdef __KERNEL__
 
 #define buffer_trace_init(bh)	do {} while (0)



* [PATCH 18/23] jbd2: Checksum journal superblock
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (16 preceding siblings ...)
  2012-01-07  8:29 ` [PATCH 17/23] jbd2: Grab a reference to the crc32c driver only when necessary Darrick J. Wong
@ 2012-01-07  8:29 ` Darrick J. Wong
  2012-01-07  8:30 ` [PATCH 19/23] jbd2: Checksum revocation blocks Darrick J. Wong
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:29 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Calculate and verify a checksum covering the journal superblock.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
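For readers who want to experiment outside the kernel, here is a minimal
sketch of the two things this patch does: checksum a structure that contains
its own checksum field, and derive a seed from the UUID.  The struct toy_sb
layout and every toy_* name are assumptions for illustration only; the real
journal superblock is much larger, and htonl() stands in for cpu_to_be32().

```c
#include <arpa/inet.h>	/* htonl(): host order to big-endian (POSIX) */
#include <stddef.h>
#include <stdint.h>

/* Raw CRC-32C update, no initial or final inversion (chainable). */
uint32_t crc32c_update(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Hypothetical, heavily trimmed stand-in for the journal superblock.
 * 4 + 16 + 4 bytes, so there is no padding to worry about. */
struct toy_sb {
	uint32_t s_sequence;
	uint8_t  s_uuid[16];
	uint32_t s_checksum;	/* stored big-endian */
};

/* Checksum the whole structure with the checksum field treated as zero,
 * then put the old field value back. */
uint32_t toy_sb_csum(struct toy_sb *sb)
{
	uint32_t saved = sb->s_checksum, csum;

	sb->s_checksum = 0;
	csum = crc32c_update(~0u, sb, sizeof(*sb));
	sb->s_checksum = saved;
	return htonl(csum);
}

void toy_sb_csum_set(struct toy_sb *sb)
{
	sb->s_checksum = toy_sb_csum(sb);
}

int toy_sb_csum_verify(struct toy_sb *sb)
{
	return sb->s_checksum == toy_sb_csum(sb);
}

/* Seed used for the other journal blocks: the CRC of the UUID alone. */
uint32_t toy_csum_seed(const struct toy_sb *sb)
{
	return crc32c_update(~0u, sb->s_uuid, sizeof(sb->s_uuid));
}
```

Precomputing the seed once means a journal block checksum only has to hash
the block itself, while still being tied to this journal's UUID.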
 fs/jbd2/journal.c    |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/jbd2.h |    3 +++
 2 files changed, 53 insertions(+), 0 deletions(-)


diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index bb7f03f..bc80533 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -109,6 +109,34 @@ int jbd2_verify_csum_type(journal_t *j, journal_superblock_t *sb)
 	return sb->s_checksum_type == JBD2_CRC32C_CHKSUM;
 }
 
+static __u32 jbd2_superblock_csum(journal_t *j, journal_superblock_t *sb)
+{
+	__u32 csum, old_csum;
+
+	old_csum = sb->s_checksum;
+	sb->s_checksum = 0;
+	csum = jbd2_chksum(j, ~0, (char *)sb, sizeof(journal_superblock_t));
+	sb->s_checksum = old_csum;
+
+	return cpu_to_be32(csum);
+}
+
+int jbd2_superblock_csum_verify(journal_t *j, journal_superblock_t *sb)
+{
+	if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+		return 1;
+
+	return sb->s_checksum == jbd2_superblock_csum(j, sb);
+}
+
+void jbd2_superblock_csum_set(journal_t *j, journal_superblock_t *sb)
+{
+	if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+		return;
+
+	sb->s_checksum = jbd2_superblock_csum(j, sb);
+}
+
 /*
  * Helper function used to manage commit timeouts
  */
@@ -1178,6 +1206,7 @@ void jbd2_journal_update_superblock(journal_t *journal, int wait)
 	sb->s_sequence = cpu_to_be32(journal->j_tail_sequence);
 	sb->s_start    = cpu_to_be32(journal->j_tail);
 	sb->s_errno    = cpu_to_be32(journal->j_errno);
+	jbd2_superblock_csum_set(journal, sb);
 	read_unlock(&journal->j_state_lock);
 
 	BUFFER_TRACE(bh, "marking dirty");
@@ -1295,6 +1324,17 @@ static int journal_get_superblock(journal_t *journal)
 		}
 	}
 
+	/* Check superblock checksum */
+	if (!jbd2_superblock_csum_verify(journal, sb)) {
+		printk(KERN_ERR "JBD: journal checksum error\n");
+		goto out;
+	}
+
+	/* Precompute checksum seed for all metadata */
+	if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+		journal->j_csum_seed = jbd2_chksum(journal, ~0, sb->s_uuid,
+						   sizeof(sb->s_uuid));
+
 	set_buffer_verified(bh);
 
 	return 0;
@@ -1581,6 +1621,13 @@ int jbd2_journal_set_features (journal_t *journal, unsigned long compat,
 				return 0;
 			}
 		}
+
+		/* Precompute checksum seed for all metadata */
+		if (JBD2_HAS_INCOMPAT_FEATURE(journal,
+					      JBD2_FEATURE_INCOMPAT_CSUM_V2))
+			journal->j_csum_seed = jbd2_chksum(journal, ~0,
+							   sb->s_uuid,
+							   sizeof(sb->s_uuid));
 	}
 
 	/* If enabling v1 checksums, downgrade superblock */
@@ -1591,6 +1638,7 @@ int jbd2_journal_set_features (journal_t *journal, unsigned long compat,
 	sb->s_feature_compat    |= cpu_to_be32(compat);
 	sb->s_feature_ro_compat |= cpu_to_be32(ro);
 	sb->s_feature_incompat  |= cpu_to_be32(incompat);
+	jbd2_journal_update_superblock(journal, 0);
 
 	return 1;
 #undef COMPAT_FEATURE_ON
@@ -1621,6 +1669,7 @@ void jbd2_journal_clear_features(journal_t *journal, unsigned long compat,
 	sb->s_feature_compat    &= ~cpu_to_be32(compat);
 	sb->s_feature_ro_compat &= ~cpu_to_be32(ro);
 	sb->s_feature_incompat  &= ~cpu_to_be32(incompat);
+	jbd2_journal_update_superblock(journal, 0);
 }
 EXPORT_SYMBOL(jbd2_journal_clear_features);
 
@@ -1669,6 +1718,7 @@ static int journal_convert_superblock_v1(journal_t *journal,
 
 	sb->s_nr_users = cpu_to_be32(1);
 	sb->s_header.h_blocktype = cpu_to_be32(JBD2_SUPERBLOCK_V2);
+	jbd2_superblock_csum_set(journal, sb);
 	journal->j_format_version = 2;
 
 	bh = journal->j_sb_buffer;
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 5b2abf9..51d4a0b 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -969,6 +969,9 @@ struct journal_s
 
 	/* Reference to checksum algorithm driver via cryptoapi */
 	struct crypto_shash *j_chksum_driver;
+
+	/* Precomputed journal UUID checksum for seeding other checksums */
+	__u32 j_csum_seed;
 };
 
 /*



* [PATCH 19/23] jbd2: Checksum revocation blocks
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (17 preceding siblings ...)
  2012-01-07  8:29 ` [PATCH 18/23] jbd2: Checksum journal superblock Darrick J. Wong
@ 2012-01-07  8:30 ` Darrick J. Wong
  2012-01-07  8:30 ` [PATCH 20/23] jbd2: Checksum descriptor blocks Darrick J. Wong
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:30 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Compute and verify checksums of revoke blocks inside the journal.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
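The space accounting for the checksum tail can be tried out with the small
sketch below.  It is not the patch's code: the function names are invented,
and the 16-byte header and 4-byte tail used in the note afterwards are
assumptions about the on-disk structure sizes.  It also uses a stricter
"full" test than the posted hunk, checking that the next record ends before
the tail begins rather than only where it starts.

```c
#include <stddef.h>

/* Stricter "is the block full?" test: the next record must end before the
 * space reserved for the checksum tail begins. */
int revoke_block_full(size_t offset, size_t record_len,
		      size_t blocksize, size_t csum_size)
{
	return offset + record_len > blocksize - csum_size;
}

/* Number of revoke records that fit after a header of header_len bytes. */
size_t revoke_records_per_block(size_t blocksize, size_t header_len,
				size_t record_len, size_t csum_size)
{
	size_t offset = header_len, n = 0;

	while (!revoke_block_full(offset, record_len, blocksize, csum_size)) {
		offset += record_len;
		n++;
	}
	return n;
}
```

Under those assumed sizes, a 4096-byte block holds 510 eight-byte records
without a tail and 509 with one.  The posted check, offset >= j_blocksize -
csum_size, appears to let an 8-byte record start at offset 4088 (4088 < 4092),
where it would overlap a 4-byte tail, so the stricter form may be worth
considering for 64-bit journals.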
 fs/jbd2/recovery.c |   22 ++++++++++++++++++++++
 fs/jbd2/revoke.c   |   27 ++++++++++++++++++++++++++-
 2 files changed, 48 insertions(+), 1 deletions(-)


diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index da6d7ba..bfc52cd 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -703,6 +703,25 @@ static int do_one_pass(journal_t *journal,
 	return err;
 }
 
+static int jbd2_revoke_block_csum_verify(journal_t *j,
+					 void *buf)
+{
+	struct jbd2_journal_revoke_tail *tail;
+	__u32 provided, calculated;
+
+	if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+		return 1;
+
+	tail = (struct jbd2_journal_revoke_tail *)(buf + j->j_blocksize -
+			sizeof(struct jbd2_journal_revoke_tail));
+	provided = tail->r_checksum;
+	tail->r_checksum = 0;
+	calculated = jbd2_chksum(j, j->j_csum_seed, buf, j->j_blocksize);
+	tail->r_checksum = provided;
+
+	provided = be32_to_cpu(provided);
+	return provided == calculated;
+}
 
 /* Scan a revoke record, marking all blocks mentioned as revoked. */
 
@@ -717,6 +736,9 @@ static int scan_revoke_records(journal_t *journal, struct buffer_head *bh,
 	offset = sizeof(jbd2_journal_revoke_header_t);
 	max = be32_to_cpu(header->r_count);
 
+	if (!jbd2_revoke_block_csum_verify(journal, header))
+		return -EINVAL;
+
 	if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_64BIT))
 		record_len = 8;
 
diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c
index 69fd935..a197ba5 100644
--- a/fs/jbd2/revoke.c
+++ b/fs/jbd2/revoke.c
@@ -548,6 +548,7 @@ static void write_one_revoke_record(journal_t *journal,
 				    struct jbd2_revoke_record_s *record,
 				    int write_op)
 {
+	int csum_size = 0;
 	struct journal_head *descriptor;
 	int offset;
 	journal_header_t *header;
@@ -562,9 +563,13 @@ static void write_one_revoke_record(journal_t *journal,
 	descriptor = *descriptorp;
 	offset = *offsetp;
 
+	/* Do we need to leave space at the end for a checksum? */
+	if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+		csum_size = sizeof(struct jbd2_journal_revoke_tail);
+
 	/* Make sure we have a descriptor with space left for the record */
 	if (descriptor) {
-		if (offset == journal->j_blocksize) {
+		if (offset >= journal->j_blocksize - csum_size) {
 			flush_descriptor(journal, descriptor, offset, write_op);
 			descriptor = NULL;
 		}
@@ -601,6 +606,24 @@ static void write_one_revoke_record(journal_t *journal,
 	*offsetp = offset;
 }
 
+static void jbd2_revoke_csum_set(journal_t *j,
+				 struct journal_head *descriptor)
+{
+	struct jbd2_journal_revoke_tail *tail;
+	__u32 csum;
+
+	if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+		return;
+
+	tail = (struct jbd2_journal_revoke_tail *)
+			(jh2bh(descriptor)->b_data + j->j_blocksize -
+			sizeof(struct jbd2_journal_revoke_tail));
+	tail->r_checksum = 0;
+	csum = jbd2_chksum(j, j->j_csum_seed, jh2bh(descriptor)->b_data,
+			   j->j_blocksize);
+	tail->r_checksum = cpu_to_be32(csum);
+}
+
 /*
  * Flush a revoke descriptor out to the journal.  If we are aborting,
  * this is a noop; otherwise we are generating a buffer which needs to
@@ -622,6 +645,8 @@ static void flush_descriptor(journal_t *journal,
 
 	header = (jbd2_journal_revoke_header_t *) jh2bh(descriptor)->b_data;
 	header->r_count = cpu_to_be32(offset);
+	jbd2_revoke_csum_set(journal, descriptor);
+
 	set_buffer_jwrite(bh);
 	BUFFER_TRACE(bh, "write");
 	set_buffer_dirty(bh);



* [PATCH 20/23] jbd2: Checksum descriptor blocks
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (18 preceding siblings ...)
  2012-01-07  8:30 ` [PATCH 19/23] jbd2: Checksum revocation blocks Darrick J. Wong
@ 2012-01-07  8:30 ` Darrick J. Wong
  2012-01-07  8:30 ` [PATCH 21/23] jbd2: Checksum commit blocks Darrick J. Wong
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:30 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Calculate and verify a checksum of each descriptor block.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
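The verify helper in this patch zeroes the tail checksum, hashes the block,
and restores the field.  As a hedged alternative sketch, the same value can
be computed without writing to the buffer by hashing the body and then four
zero bytes, since the CRC update is chainable.  TAIL_LEN, the block_csum_*
names and the 4-byte tail size are assumptions for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Raw CRC-32C update, no initial or final inversion (chainable). */
uint32_t crc32c_update(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

#define TAIL_LEN 4u	/* assumed size of the checksum tail */

/* Checksum a block as if its last TAIL_LEN bytes were zero, without
 * touching the buffer: hash the body, then hash TAIL_LEN zero bytes. */
uint32_t block_csum_readonly(uint32_t seed, const uint8_t *block,
			     size_t blocksize)
{
	static const uint8_t zeros[TAIL_LEN];
	uint32_t crc = crc32c_update(seed, block, blocksize - TAIL_LEN);

	return crc32c_update(crc, zeros, TAIL_LEN);
}

/* Store the checksum big-endian in the last TAIL_LEN bytes. */
void block_csum_set(uint32_t seed, uint8_t *block, size_t blocksize)
{
	uint32_t c = block_csum_readonly(seed, block, blocksize);
	uint8_t *tail = block + blocksize - TAIL_LEN;

	tail[0] = (uint8_t)(c >> 24);
	tail[1] = (uint8_t)(c >> 16);
	tail[2] = (uint8_t)(c >> 8);
	tail[3] = (uint8_t)c;
}

/* Returns 1 if the stored tail matches the recomputed checksum. */
int block_csum_verify(uint32_t seed, const uint8_t *block, size_t blocksize)
{
	const uint8_t *tail = block + blocksize - TAIL_LEN;
	uint32_t stored = (uint32_t)tail[0] << 24 | (uint32_t)tail[1] << 16 |
			  (uint32_t)tail[2] << 8 | (uint32_t)tail[3];

	return stored == block_csum_readonly(seed, block, blocksize);
}
```

The read-only form costs one extra short update call but leaves the buffer
untouched during verification, which may matter if the buffer is shared.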
 fs/jbd2/commit.c   |   26 ++++++++++++++++++++++++--
 fs/jbd2/recovery.c |   37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 60 insertions(+), 3 deletions(-)


diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 68d704d..fab143a 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -302,6 +302,24 @@ static void write_tag_block(int tag_bytes, journal_block_tag_t *tag,
 		tag->t_blocknr_high = cpu_to_be32((block >> 31) >> 1);
 }
 
+static void jbd2_descr_block_csum_set(journal_t *j,
+				      struct journal_head *descriptor)
+{
+	struct jbd2_journal_block_tail *tail;
+	__u32 csum;
+
+	if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+		return;
+
+	tail = (struct jbd2_journal_block_tail *)
+			(jh2bh(descriptor)->b_data + j->j_blocksize -
+			sizeof(struct jbd2_journal_block_tail));
+	tail->t_checksum = 0;
+	csum = jbd2_chksum(j, j->j_csum_seed, jh2bh(descriptor)->b_data,
+			   j->j_blocksize);
+	tail->t_checksum = cpu_to_be32(csum);
+}
+
 /*
  * jbd2_journal_commit_transaction
  *
@@ -331,6 +349,10 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	struct buffer_head *cbh = NULL; /* For transactional checksums */
 	__u32 crc32_sum = ~0;
 	struct blk_plug plug;
+	int csum_size = 0;
+
+	if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+		csum_size = sizeof(struct jbd2_journal_block_tail);
 
 	/*
 	 * First job: lock down the current transaction and wait for
@@ -623,7 +645,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 
 		if (bufs == journal->j_wbufsize ||
 		    commit_transaction->t_buffers == NULL ||
-		    space_left < tag_bytes + 16) {
+		    space_left < tag_bytes + 16 + csum_size) {
 
 			jbd_debug(4, "JBD2: Submit %d IOs\n", bufs);
 
@@ -632,7 +654,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
                            the last tag we set up. */
 
 			tag->t_flags |= cpu_to_be32(JBD2_FLAG_LAST_TAG);
-
+			jbd2_descr_block_csum_set(journal, descriptor);
 start_journal_io:
 			for (i = 0; i < bufs; i++) {
 				struct buffer_head *bh = wbuf[i];
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index bfc52cd..513006b 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -173,6 +173,25 @@ static int jread(struct buffer_head **bhp, journal_t *journal,
 	return 0;
 }
 
+static int jbd2_descr_block_csum_verify(journal_t *j,
+					void *buf)
+{
+	struct jbd2_journal_block_tail *tail;
+	__u32 provided, calculated;
+
+	if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+		return 1;
+
+	tail = (struct jbd2_journal_block_tail *)(buf + j->j_blocksize -
+			sizeof(struct jbd2_journal_block_tail));
+	provided = tail->t_checksum;
+	tail->t_checksum = 0;
+	calculated = jbd2_chksum(j, j->j_csum_seed, buf, j->j_blocksize);
+	tail->t_checksum = provided;
+
+	provided = be32_to_cpu(provided);
+	return provided == calculated;
+}
 
 /*
  * Count the number of in-use tags in a journal descriptor block.
@@ -185,6 +204,9 @@ static int count_tags(journal_t *journal, struct buffer_head *bh)
 	int			nr = 0, size = journal->j_blocksize;
 	int			tag_bytes = journal_tag_bytes(journal);
 
+	if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+		size -= sizeof(struct jbd2_journal_block_tail);
+
 	tagp = &bh->b_data[sizeof(journal_header_t)];
 
 	while ((tagp - bh->b_data + tag_bytes) <= size) {
@@ -363,6 +385,7 @@ static int do_one_pass(journal_t *journal,
 	int			blocktype;
 	int			tag_bytes = journal_tag_bytes(journal);
 	__u32			crc32_sum = ~0; /* Transactional Checksums */
+	int			descr_csum_size = 0;
 
 	/*
 	 * First thing is to establish what we expect to find in the log
@@ -448,6 +471,18 @@ static int do_one_pass(journal_t *journal,
 
 		switch(blocktype) {
 		case JBD2_DESCRIPTOR_BLOCK:
+			/* Verify checksum first */
+			if (JBD2_HAS_INCOMPAT_FEATURE(journal,
+					JBD2_FEATURE_INCOMPAT_CSUM_V2))
+				descr_csum_size =
+					sizeof(struct jbd2_journal_block_tail);
+			if (descr_csum_size > 0 &&
+			    !jbd2_descr_block_csum_verify(journal,
+							  bh->b_data)) {
+				err = -EIO;
+				goto failed;
+			}
+
 			/* If it is a valid descriptor block, replay it
 			 * in pass REPLAY; if journal_checksums enabled, then
 			 * calculate checksums in PASS_SCAN, otherwise,
@@ -478,7 +513,7 @@ static int do_one_pass(journal_t *journal,
 
 			tagp = &bh->b_data[sizeof(journal_header_t)];
 			while ((tagp - bh->b_data + tag_bytes)
-			       <= journal->j_blocksize) {
+			       <= journal->j_blocksize - descr_csum_size) {
 				unsigned long io_block;
 
 				tag = (journal_block_tag_t *) tagp;



* [PATCH 21/23] jbd2: Checksum commit blocks
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (19 preceding siblings ...)
  2012-01-07  8:30 ` [PATCH 20/23] jbd2: Checksum descriptor blocks Darrick J. Wong
@ 2012-01-07  8:30 ` Darrick J. Wong
  2012-01-07  8:30 ` [PATCH 22/23] jbd2: Checksum data blocks that are stored in the journal Darrick J. Wong
  2012-01-07  8:30 ` [PATCH 23/23] ext4/jbd2: Add metadata checksumming to the list of supported features Darrick J. Wong
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:30 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Calculate and verify the checksum of commit blocks.  In checksum v2, deprecate
most of the checksum v1 commit block checksum fields, since each block has its
own checksum.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
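The recovery-side decision added to do_one_pass() can be modelled in
isolation as below.  This is only a sketch of the control flow as I read the
hunk: struct toy_recovery, enum scan_action and commit_block_scanned() are
hypothetical names, and the real code additionally releases the buffer and
advances next_commit_ID.

```c
#include <stdint.h>

enum scan_action { SCAN_CONTINUE, SCAN_STOP };

/* Hypothetical subset of the recovery bookkeeping. */
struct toy_recovery {
	uint32_t end_transaction;	/* ID at which later passes stop */
	uint32_t failed_commit;		/* 0 if no failed commit recorded */
};

/* Called during the scan pass for each commit block.  A bad checksum marks
 * the end of the usable log; without async commit it also records the
 * failed commit ID and stops the scan. */
enum scan_action commit_block_scanned(struct toy_recovery *r,
				      uint32_t commit_id,
				      int csum_ok, int async_commit)
{
	if (csum_ok)
		return SCAN_CONTINUE;

	r->end_transaction = commit_id;
	if (!async_commit) {
		r->failed_commit = commit_id;
		return SCAN_STOP;
	}
	return SCAN_CONTINUE;
}
```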
 fs/jbd2/commit.c   |   19 +++++++++++++++++++
 fs/jbd2/recovery.c |   31 +++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+), 0 deletions(-)


diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index fab143a..ccf0b6f 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -86,6 +86,24 @@ nope:
 	__brelse(bh);
 }
 
+static void jbd2_commit_block_csum_set(journal_t *j,
+				       struct journal_head *descriptor)
+{
+	struct commit_header *h;
+	__u32 csum;
+
+	if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+		return;
+
+	h = (struct commit_header *)(jh2bh(descriptor)->b_data);
+	h->h_chksum_type = 0;
+	h->h_chksum_size = 0;
+	h->h_chksum[0] = 0;
+	csum = jbd2_chksum(j, j->j_csum_seed, jh2bh(descriptor)->b_data,
+			   j->j_blocksize);
+	h->h_chksum[0] = cpu_to_be32(csum);
+}
+
 /*
  * Done it all: now submit the commit record.  We should have
  * cleaned up our previous buffers by now, so if we are in abort
@@ -129,6 +147,7 @@ static int journal_submit_commit_record(journal_t *journal,
 		tmp->h_chksum_size 	= JBD2_CRC32_CHKSUM_SIZE;
 		tmp->h_chksum[0] 	= cpu_to_be32(crc32_sum);
 	}
+	jbd2_commit_block_csum_set(journal, descriptor);
 
 	JBUFFER_TRACE(descriptor, "submit commit block");
 	lock_buffer(bh);
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 513006b..a757d8d 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -372,6 +372,24 @@ static int calc_chksums(journal_t *journal, struct buffer_head *bh,
 	return 0;
 }
 
+static int jbd2_commit_block_csum_verify(journal_t *j, void *buf)
+{
+	struct commit_header *h;
+	__u32 provided, calculated;
+
+	if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+		return 1;
+
+	h = buf;
+	provided = h->h_chksum[0];
+	h->h_chksum[0] = 0;
+	calculated = jbd2_chksum(j, j->j_csum_seed, buf, j->j_blocksize);
+	h->h_chksum[0] = provided;
+
+	provided = be32_to_cpu(provided);
+	return provided == calculated;
+}
+
 static int do_one_pass(journal_t *journal,
 			struct recovery_info *info, enum passtype pass)
 {
@@ -682,6 +700,19 @@ static int do_one_pass(journal_t *journal,
 				}
 				crc32_sum = ~0;
 			}
+			if (pass == PASS_SCAN &&
+			    !jbd2_commit_block_csum_verify(journal,
+							   bh->b_data)) {
+				info->end_transaction = next_commit_ID;
+
+				if (!JBD2_HAS_INCOMPAT_FEATURE(journal,
+				     JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) {
+					journal->j_failed_commit =
+						next_commit_ID;
+					brelse(bh);
+					break;
+				}
+			}
 			brelse(bh);
 			next_commit_ID++;
 			continue;



* [PATCH 22/23] jbd2: Checksum data blocks that are stored in the journal
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (20 preceding siblings ...)
  2012-01-07  8:30 ` [PATCH 21/23] jbd2: Checksum commit blocks Darrick J. Wong
@ 2012-01-07  8:30 ` Darrick J. Wong
  2012-01-07  8:30 ` [PATCH 23/23] ext4/jbd2: Add metadata checksumming to the list of supported features Darrick J. Wong
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:30 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Calculate and verify a checksum for each data block stored in the journal.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
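To show why the transaction sequence number is mixed into each tag checksum,
here is a small userspace sketch of the chaining: seed, then the big-endian
sequence number, then the block contents.  tag_csum() is a hypothetical name
and the bitwise CRC is a stand-in for the kernel's crc32c driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Raw CRC-32C update, no initial or final inversion (chainable). */
uint32_t crc32c_update(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Checksum for one journalled data block: seed -> big-endian transaction
 * sequence number -> block contents. */
uint32_t tag_csum(uint32_t seed, uint32_t sequence,
		  const void *data, size_t len)
{
	const uint8_t seq_be[4] = {
		(uint8_t)(sequence >> 24), (uint8_t)(sequence >> 16),
		(uint8_t)(sequence >> 8),  (uint8_t)sequence,
	};
	uint32_t crc = crc32c_update(seed, seq_be, sizeof(seq_be));

	return crc32c_update(crc, data, len);
}
```

With this construction, identical block contents logged in two different
transactions get different checksums, so a stale copy left over from an older
transaction should not verify against a newer tag.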
 fs/jbd2/commit.c   |   22 ++++++++++++++++++++++
 fs/jbd2/journal.c  |   10 ++++++++--
 fs/jbd2/recovery.c |   30 ++++++++++++++++++++++++++++++
 3 files changed, 60 insertions(+), 2 deletions(-)


diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index ccf0b6f..25bb1c3 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -339,6 +339,26 @@ static void jbd2_descr_block_csum_set(journal_t *j,
 	tail->t_checksum = cpu_to_be32(csum);
 }
 
+static void jbd2_block_tag_csum_set(journal_t *j, journal_block_tag_t *tag,
+				    struct buffer_head *bh, __u32 sequence)
+{
+	struct page *page = bh->b_page;
+	__u8 *addr;
+	__u32 csum;
+
+	if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+		return;
+
+	sequence = cpu_to_be32(sequence);
+	addr = kmap_atomic(page, KM_USER0);
+	csum = jbd2_chksum(j, j->j_csum_seed, (__u8 *)&sequence,
+			  sizeof(sequence));
+	csum = jbd2_chksum(j, csum, addr + offset_in_page(bh->b_data),
+			  bh->b_size);
+	kunmap_atomic(addr, KM_USER0);
+
+	tag->t_checksum = cpu_to_be32(csum);
+}
 /*
  * jbd2_journal_commit_transaction
  *
@@ -649,6 +669,8 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 		tag = (journal_block_tag_t *) tagp;
 		write_tag_block(tag_bytes, tag, jh2bh(jh)->b_blocknr);
 		tag->t_flags = cpu_to_be32(tag_flag);
+		jbd2_block_tag_csum_set(journal, tag, jh2bh(new_jh),
+					commit_transaction->t_tid);
 		tagp += tag_bytes;
 		space_left -= tag_bytes;
 
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index bc80533..ff4a870 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2005,10 +2005,16 @@ int jbd2_journal_blocks_per_page(struct inode *inode)
  */
 size_t journal_tag_bytes(journal_t *journal)
 {
+	journal_block_tag_t tag;
+	size_t x = 0;
+
+	if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+		x += sizeof(tag.t_checksum);
+
 	if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_64BIT))
-		return JBD2_TAG_SIZE64;
+		return x + JBD2_TAG_SIZE64;
 	else
-		return JBD2_TAG_SIZE32;
+		return x + JBD2_TAG_SIZE32;
 }
 
 /*
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index a757d8d..3dd8544 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -390,6 +390,23 @@ static int jbd2_commit_block_csum_verify(journal_t *j, void *buf)
 	return provided == calculated;
 }
 
+static int jbd2_block_tag_csum_verify(journal_t *j, journal_block_tag_t *tag,
+				      void *buf, __u32 sequence)
+{
+	__u32 provided, calculated;
+
+	if (!JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2))
+		return 1;
+
+	sequence = cpu_to_be32(sequence);
+	calculated = jbd2_chksum(j, j->j_csum_seed, (__u8 *)&sequence,
+				 sizeof(sequence));
+	calculated = jbd2_chksum(j, calculated, buf, j->j_blocksize);
+	provided = be32_to_cpu(tag->t_checksum);
+
+	return provided == calculated;
+}
+
 static int do_one_pass(journal_t *journal,
 			struct recovery_info *info, enum passtype pass)
 {
@@ -566,6 +583,19 @@ static int do_one_pass(journal_t *journal,
 						goto skip_write;
 					}
 
+					/* Look for block corruption */
+					if (!jbd2_block_tag_csum_verify(
+						journal, tag, obh->b_data,
+						be32_to_cpu(tmp->h_sequence))) {
+						brelse(obh);
+						success = -EIO;
+						printk(KERN_ERR "JBD: Invalid "
+						       "checksum recovering "
+						       "block %llu in log\n",
+						       blocknr);
+						continue;
+					}
+
 					/* Find a buffer for the new
 					 * data being restored */
 					nbh = __getblk(journal->j_fs_dev,



* [PATCH 23/23] ext4/jbd2: Add metadata checksumming to the list of supported features
  2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
                   ` (21 preceding siblings ...)
  2012-01-07  8:30 ` [PATCH 22/23] jbd2: Checksum data blocks that are stored in the journal Darrick J. Wong
@ 2012-01-07  8:30 ` Darrick J. Wong
  22 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-01-07  8:30 UTC (permalink / raw)
  To: Andreas Dilger, Theodore Tso, Darrick J. Wong
  Cc: Sunil Mushran, Martin K Petersen, Greg Freemyer, Amir Goldstein,
	linux-kernel, Andi Kleen, Mingming Cao, Joel Becker,
	linux-fsdevel, linux-ext4, Coly Li

Activate the metadata checksumming feature by adding it to ext4 and jbd2's
lists of supported features.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 fs/ext4/ext4.h       |    6 ++++--
 include/linux/jbd2.h |    3 ++-
 2 files changed, 6 insertions(+), 3 deletions(-)


diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 325bae0..af756bc 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1461,7 +1461,8 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
 					 EXT4_FEATURE_INCOMPAT_EXTENTS| \
 					 EXT4_FEATURE_INCOMPAT_64BIT| \
 					 EXT4_FEATURE_INCOMPAT_FLEX_BG| \
-					 EXT4_FEATURE_INCOMPAT_MMP)
+					 EXT4_FEATURE_INCOMPAT_MMP| \
+					 EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)
 #define EXT4_FEATURE_RO_COMPAT_SUPP	(EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER| \
 					 EXT4_FEATURE_RO_COMPAT_LARGE_FILE| \
 					 EXT4_FEATURE_RO_COMPAT_GDT_CSUM| \
@@ -1469,7 +1470,8 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
 					 EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE | \
 					 EXT4_FEATURE_RO_COMPAT_BTREE_DIR |\
 					 EXT4_FEATURE_RO_COMPAT_HUGE_FILE |\
-					 EXT4_FEATURE_RO_COMPAT_BIGALLOC)
+					 EXT4_FEATURE_RO_COMPAT_BIGALLOC |\
+					 EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
 
 /*
  * Default values for user and/or group using reserved blocks
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 51d4a0b..b1c5857 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -296,7 +296,8 @@ typedef struct journal_superblock_s
 #define JBD2_KNOWN_ROCOMPAT_FEATURES	0
 #define JBD2_KNOWN_INCOMPAT_FEATURES	(JBD2_FEATURE_INCOMPAT_REVOKE | \
 					JBD2_FEATURE_INCOMPAT_64BIT | \
-					JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)
+					JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT | \
+					JBD2_FEATURE_INCOMPAT_CSUM_V2)
 
 #ifdef __KERNEL__
 



* Re: [PATCH 13/23] ext4: Add new feature to make block group checksums use metadata_csum algorithm
       [not found]   ` <8ED6E1F9-DB56-4D31-BCA8-2A3A8D514BD5@dilger.ca>
@ 2012-02-13 22:28     ` Ted Ts'o
  2012-02-29  1:27       ` [RFC] e2fsprogs: Rework metadata_csum/gdt_csum flag handling Darrick J. Wong
  2012-02-29  1:32       ` [RFC] ext4: Rework metadata_csum/gdt_csum flag handling in kernel Darrick J. Wong
  0 siblings, 2 replies; 31+ messages in thread
From: Ted Ts'o @ 2012-02-13 22:28 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Darrick J. Wong, Sunil Mushran, Martin K Petersen, Greg Freemyer,
	Amir Goldstein, linux-kernel, Andi Kleen, Mingming Cao,
	Joel Becker, linux-fsdevel, linux-ext4, Coly Li

On Mon, Feb 13, 2012 at 02:40:26PM -0700, Andreas Dilger wrote:
> 
> If a kernel understands METADATA_CSUM, it will check this first and
> ignore whether GDT_CSUM is set or not (though it shouldn't ever be
> set at the same time).  Either of these features will cause an older
> kernel to mount the filesystem read-only, which is all that is needed.

... note that when we check for the uninit bits, this will have to be
done if GDT_CSUM || METADATA_CSUM are set.

					- Ted


* [RFC] e2fsprogs: Rework metadata_csum/gdt_csum flag handling
  2012-02-13 22:28     ` Ted Ts'o
@ 2012-02-29  1:27       ` Darrick J. Wong
  2012-02-29  5:40         ` Andreas Dilger
  2012-02-29  1:32       ` [RFC] ext4: Rework metadata_csum/gdt_csum flag handling in kernel Darrick J. Wong
  1 sibling, 1 reply; 31+ messages in thread
From: Darrick J. Wong @ 2012-02-29  1:27 UTC (permalink / raw)
  To: Ted Ts'o, Andreas Dilger, Sunil Mushran, Martin K Petersen,
	Greg Freemyer, Amir Goldstein, linux-kernel, Andi Kleen,
	Mingming Cao, Joel Becker, linux-fsdevel, linux-ext4, Coly Li

Ok, I've reworked the block group descriptor checksum handling code per this
email thread.  INCOMPAT_BG_USE_META_CSUM is gone.  METADATA_CSUM implies (and
in fact overrides) GDT_CSUM.  When both are set, the group descriptor checksum
uses the same function as all other metadata blocks' checksums (by default
crc32c).  I created a helper function to determine if group descriptor
checksums are enabled, and the actual gdt checksum verify/set functions are
smart enough to use the correct function.

Below are the changes that I intend to make to e2fsprogs.  I'll integrate these
changes into the (huge) e2fsprogs patchset, but wanted to aggregate the changes
here first to avoid overwhelming reviewers.  I'll send a kernel patch shortly.

Question: What will happen to old kernels when both METADATA_CSUM and GDT_CSUM
are set?  Should tune2fs/e2fsck change METADATA_CSUM|GDT_CSUM to only
METADATA_CSUM if they encounter it?  I'm a little concerned that a
pre-METADATA_CSUM kernel will see the GDT_CSUM flag, assume it's ok to proceed
in ro mode, and then get confused.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
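For reviewers skimming the diff, the flag handling described above boils
down to roughly the following.  This is a standalone sketch, not the
e2fsprogs helper itself: it takes the raw ro_compat word instead of an
ext2_filsys, the enum and function names are invented, and the flag values
are the ones I believe the mainline headers use.

```c
#include <stdint.h>

#define RO_COMPAT_GDT_CSUM	0x0010u	/* assumed value */
#define RO_COMPAT_METADATA_CSUM	0x0400u	/* assumed value */

enum gd_csum_kind { GD_CSUM_NONE, GD_CSUM_CRC16, GD_CSUM_CRC32C };

/* Group descriptor checksums are in use if either feature is set. */
int has_group_desc_csum(uint32_t ro_compat)
{
	return (ro_compat &
		(RO_COMPAT_GDT_CSUM | RO_COMPAT_METADATA_CSUM)) != 0;
}

/* Pick the checksum function; METADATA_CSUM wins when both are set. */
enum gd_csum_kind group_desc_csum_kind(uint32_t ro_compat)
{
	if (ro_compat & RO_COMPAT_METADATA_CSUM)
		return GD_CSUM_CRC32C;
	if (ro_compat & RO_COMPAT_GDT_CSUM)
		return GD_CSUM_CRC16;
	return GD_CSUM_NONE;
}
```

Callers that only care about the uninit bits would use the first predicate;
only the set/verify routines need the second.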

 debugfs/debugfs.c        |    3 +--
 e2fsck/pass5.c           |   18 ++++++-----------
 e2fsck/super.c           |    3 +--
 e2fsck/unix.c            |    2 +-
 lib/e2p/feature.c        |    2 --
 lib/ext2fs/alloc.c       |    6 ++----
 lib/ext2fs/alloc_stats.c |    3 +--
 lib/ext2fs/csum.c        |   13 ++++--------
 lib/ext2fs/ext2_fs.h     |    6 +++++-
 lib/ext2fs/ext2fs.h      |   12 ++++++++----
 lib/ext2fs/initialize.c  |    3 +--
 lib/ext2fs/inode.c       |    9 +++------
 lib/ext2fs/openfs.c      |    3 +--
 lib/ext2fs/rw_bitmaps.c  |   12 ++++--------
 misc/dumpe2fs.c          |    4 ++--
 misc/mke2fs.c            |   23 +++++-----------------
 misc/tune2fs.c           |   48 +++-------------------------------------------
 resize/resize2fs.c       |   12 ++++--------
 18 files changed, 52 insertions(+), 130 deletions(-)

diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
index c1cbf06..9c8e48e 100644
--- a/debugfs/debugfs.c
+++ b/debugfs/debugfs.c
@@ -357,8 +357,7 @@ void do_show_super_stats(int argc, char *argv[])
 		return;
 	}
 
-	gdt_csum = EXT2_HAS_RO_COMPAT_FEATURE(current_fs->super,
-					      EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
+	gdt_csum = ext2fs_has_group_desc_csum(current_fs);
 	for (i = 0; i < current_fs->group_desc_count; i++) {
 		fprintf(out, " Group %2d: block bitmap at %llu, "
 		        "inode bitmap at %llu, "
diff --git a/e2fsck/pass5.c b/e2fsck/pass5.c
index f1ce6d7..c5dba0b 100644
--- a/e2fsck/pass5.c
+++ b/e2fsck/pass5.c
@@ -88,7 +88,7 @@ static void check_inode_bitmap_checksum(e2fsck_t ctx)
 	int		nbytes;
 	ext2_ino_t	ino_itr;
 	errcode_t	retval;
-	int		csum_flag = 0;
+	int		csum_flag;
 
 	/* If bitmap is dirty from being fixed, checksum will be corrected */
 	if (ext2fs_test_ib_dirty(ctx->fs))
@@ -103,9 +103,7 @@ static void check_inode_bitmap_checksum(e2fsck_t ctx)
 		fatal_error(ctx, 0);
 	}
 
-	if (EXT2_HAS_RO_COMPAT_FEATURE(ctx->fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
-		csum_flag = 1;
+	csum_flag = ext2fs_has_group_desc_csum(ctx->fs);
 
 	clear_problem_context(&pctx);
 	for (i = 0; i < ctx->fs->group_desc_count; i++) {
@@ -149,7 +147,7 @@ static void check_block_bitmap_checksum(e2fsck_t ctx)
 	int		nbytes;
 	blk64_t		blk_itr;
 	errcode_t	retval;
-	int		csum_flag = 0;
+	int		csum_flag;
 
 	/* If bitmap is dirty from being fixed, checksum will be corrected */
 	if (ext2fs_test_bb_dirty(ctx->fs))
@@ -164,9 +162,7 @@ static void check_block_bitmap_checksum(e2fsck_t ctx)
 		fatal_error(ctx, 0);
 	}
 
-	if (EXT2_HAS_RO_COMPAT_FEATURE(ctx->fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
-		csum_flag = 1;
+	csum_flag = ext2fs_has_group_desc_csum(ctx->fs);
 
 	clear_problem_context(&pctx);
 	for (i = 0; i < ctx->fs->group_desc_count; i++) {
@@ -322,8 +318,7 @@ static void check_block_bitmaps(e2fsck_t ctx)
 		goto errout;
 	}
 
-	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
+	csum_flag = ext2fs_has_group_desc_csum(fs);
 redo_counts:
 	had_problem = 0;
 	save_problem = 0;
@@ -599,8 +594,7 @@ static void check_inode_bitmaps(e2fsck_t ctx)
 		goto errout;
 	}
 
-	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
+	csum_flag = ext2fs_has_group_desc_csum(fs);
 redo_counts:
 	had_problem = 0;
 	save_problem = 0;
diff --git a/e2fsck/super.c b/e2fsck/super.c
index dbd337c..5f6fb08 100644
--- a/e2fsck/super.c
+++ b/e2fsck/super.c
@@ -583,8 +583,7 @@ void check_super_block(e2fsck_t ctx)
 	first_block = sb->s_first_data_block;
 	last_block = ext2fs_blocks_count(sb)-1;
 
-	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
+	csum_flag = ext2fs_has_group_desc_csum(fs);
 	for (i = 0; i < fs->group_desc_count; i++) {
 		pctx.group = i;
 
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index 9319e40..d3fb8f8 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -1658,7 +1658,7 @@ no_journal:
 	}
 
 	if ((run_result & E2F_FLAG_CANCEL) == 0 &&
-	    sb->s_feature_ro_compat & EXT4_FEATURE_RO_COMPAT_GDT_CSUM &&
+	    ext2fs_has_group_desc_csum(ctx->fs) &&
 	    !(ctx->options & E2F_OPT_READONLY)) {
 		retval = ext2fs_set_gdt_csum(ctx->fs);
 		if (retval) {
diff --git a/lib/e2p/feature.c b/lib/e2p/feature.c
index 9f9c6dd..486f846 100644
--- a/lib/e2p/feature.c
+++ b/lib/e2p/feature.c
@@ -87,8 +87,6 @@ static struct feature feature_list[] = {
 			"mmp" },
 	{       E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_FLEX_BG,
 			"flex_bg"},
-	{	E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM,
-			"bg_use_meta_csum"},
 	{	0, 0, 0 },
 };
 
diff --git a/lib/ext2fs/alloc.c b/lib/ext2fs/alloc.c
index 948a0ec..e62ed68 100644
--- a/lib/ext2fs/alloc.c
+++ b/lib/ext2fs/alloc.c
@@ -36,8 +36,7 @@ static void check_block_uninit(ext2_filsys fs, ext2fs_block_bitmap map,
 	blk64_t		blk, super_blk, old_desc_blk, new_desc_blk;
 	int		old_desc_blocks;
 
-	if (!(EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					 EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) ||
+	if (!ext2fs_has_group_desc_csum(fs) ||
 	    !(ext2fs_bg_flags_test(fs, group, EXT2_BG_BLOCK_UNINIT)))
 		return;
 
@@ -83,8 +82,7 @@ static void check_inode_uninit(ext2_filsys fs, ext2fs_inode_bitmap map,
 {
 	ext2_ino_t	i, ino;
 
-	if (!(EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					 EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) ||
+	if (!ext2fs_has_group_desc_csum(fs) ||
 	    !(ext2fs_bg_flags_test(fs, group, EXT2_BG_INODE_UNINIT)))
 		return;
 
diff --git a/lib/ext2fs/alloc_stats.c b/lib/ext2fs/alloc_stats.c
index adec363..4229084 100644
--- a/lib/ext2fs/alloc_stats.c
+++ b/lib/ext2fs/alloc_stats.c
@@ -38,8 +38,7 @@ void ext2fs_inode_alloc_stats2(ext2_filsys fs, ext2_ino_t ino,
 	/* We don't strictly need to be clearing the uninit flag if inuse < 0
 	 * (i.e. freeing inodes) but it also means something is bad. */
 	ext2fs_bg_flags_clear(fs, group, EXT2_BG_INODE_UNINIT);
-	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
+	if (ext2fs_has_group_desc_csum(fs)) {
 		ext2_ino_t first_unused_inode =	fs->super->s_inodes_per_group -
 			ext2fs_bg_itable_unused(fs, group) +
 			group * fs->super->s_inodes_per_group + 1;
diff --git a/lib/ext2fs/csum.c b/lib/ext2fs/csum.c
index 99ca652..425f736 100644
--- a/lib/ext2fs/csum.c
+++ b/lib/ext2fs/csum.c
@@ -743,9 +743,7 @@ STATIC __u16 ext2fs_group_desc_csum(ext2_filsys fs, dgrp_t group)
 #endif
 
 	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) &&
-	    EXT2_HAS_INCOMPAT_FEATURE(fs->super,
-			EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)) {
+			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
 		/* new metadata csum code */
 		__u16 old_crc;
 		__u32 crc32;
@@ -781,8 +779,7 @@ out:
 
 int ext2fs_group_desc_csum_verify(ext2_filsys fs, dgrp_t group)
 {
-	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
+	if (ext2fs_has_group_desc_csum(fs) &&
 	    (ext2fs_bg_checksum(fs, group) !=
 	     ext2fs_group_desc_csum(fs, group)))
 		return 0;
@@ -792,8 +789,7 @@ int ext2fs_group_desc_csum_verify(ext2_filsys fs, dgrp_t group)
 
 void ext2fs_group_desc_csum_set(ext2_filsys fs, dgrp_t group)
 {
-	if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+	if (!ext2fs_has_group_desc_csum(fs))
 		return;
 
 	/* ext2fs_bg_checksum_set() sets the actual checksum field but
@@ -827,8 +823,7 @@ errcode_t ext2fs_set_gdt_csum(ext2_filsys fs)
 	if (!fs->inode_map)
 		return EXT2_ET_NO_INODE_BITMAP;
 
-	if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+	if (!ext2fs_has_group_desc_csum(fs))
 		return 0;
 
 	for (i = 0; i < fs->group_desc_count; i++) {
diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index c2e7fbe..89df977 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -729,6 +729,11 @@ struct ext2_super_block {
 #define EXT4_FEATURE_RO_COMPAT_HAS_SNAPSHOT	0x0080
 #define EXT4_FEATURE_RO_COMPAT_QUOTA		0x0100
 #define EXT4_FEATURE_RO_COMPAT_BIGALLOC		0x0200
+/*
+ * METADATA_CSUM implies GDT_CSUM.  When METADATA_CSUM is set, group
+ * descriptor checksums use the same algorithm as all other data
+ * structures' checksums.
+ */
 #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM	0x0400
 #define EXT4_FEATURE_RO_COMPAT_REPLICA		0x0800
 
@@ -743,7 +748,6 @@ struct ext2_super_block {
 #define EXT4_FEATURE_INCOMPAT_FLEX_BG		0x0200
 #define EXT4_FEATURE_INCOMPAT_EA_INODE		0x0400
 #define EXT4_FEATURE_INCOMPAT_DIRDATA		0x1000
-#define EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM	0x8000
 
 #define EXT2_FEATURE_COMPAT_SUPP	0
 #define EXT2_FEATURE_INCOMPAT_SUPP    (EXT2_FEATURE_INCOMPAT_FILETYPE| \
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index ff2799a..28cb626 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -579,8 +579,7 @@ typedef struct ext2_icount *ext2_icount_t;
 					 EXT3_FEATURE_INCOMPAT_EXTENTS|\
 					 EXT4_FEATURE_INCOMPAT_FLEX_BG|\
 					 EXT4_FEATURE_INCOMPAT_MMP|\
-					 EXT4_FEATURE_INCOMPAT_64BIT|\
-					 EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)
+					 EXT4_FEATURE_INCOMPAT_64BIT)
 #else
 #define EXT2_LIB_FEATURE_INCOMPAT_SUPP	(EXT2_FEATURE_INCOMPAT_FILETYPE|\
 					 EXT3_FEATURE_INCOMPAT_JOURNAL_DEV|\
@@ -589,8 +588,7 @@ typedef struct ext2_icount *ext2_icount_t;
 					 EXT3_FEATURE_INCOMPAT_EXTENTS|\
 					 EXT4_FEATURE_INCOMPAT_FLEX_BG|\
 					 EXT4_FEATURE_INCOMPAT_MMP|\
-					 EXT4_FEATURE_INCOMPAT_64BIT|\
-					 EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)
+					 EXT4_FEATURE_INCOMPAT_64BIT)
 #endif
 #ifdef CONFIG_QUOTA
 #define EXT2_LIB_FEATURE_RO_COMPAT_SUPP	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER|\
@@ -646,6 +644,12 @@ typedef struct stat ext2fs_struct_stat;
 /*
  * function prototypes
  */
+static inline int ext2fs_has_group_desc_csum(ext2_filsys fs)
+{
+	return EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
+			EXT4_FEATURE_RO_COMPAT_GDT_CSUM |
+			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM);
+}
 
 /* alloc.c */
 extern errcode_t ext2fs_new_inode(ext2_filsys fs, ext2_ino_t dir, int mode,
diff --git a/lib/ext2fs/initialize.c b/lib/ext2fs/initialize.c
index a63ea18..a22cab4 100644
--- a/lib/ext2fs/initialize.c
+++ b/lib/ext2fs/initialize.c
@@ -435,8 +435,7 @@ ipg_retry:
 	 * bitmaps will be accounted for when allocated).
 	 */
 	free_blocks = 0;
-	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
+	csum_flag = ext2fs_has_group_desc_csum(fs);
 	for (i = 0; i < fs->group_desc_count; i++) {
 		/*
 		 * Don't set the BLOCK_UNINIT group for the last group
diff --git a/lib/ext2fs/inode.c b/lib/ext2fs/inode.c
index 74703c5..3e6d853 100644
--- a/lib/ext2fs/inode.c
+++ b/lib/ext2fs/inode.c
@@ -157,8 +157,7 @@ errcode_t ext2fs_open_inode_scan(ext2_filsys fs, int buffer_blocks,
 						     scan->current_group);
 	scan->inodes_left = EXT2_INODES_PER_GROUP(scan->fs->super);
 	scan->blocks_left = scan->fs->inode_blocks_per_group;
-	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
+	if (ext2fs_has_group_desc_csum(fs)) {
 		scan->inodes_left -=
 			ext2fs_bg_itable_unused(fs, scan->current_group);
 		scan->blocks_left =
@@ -183,8 +182,7 @@ errcode_t ext2fs_open_inode_scan(ext2_filsys fs, int buffer_blocks,
 	}
 	if (scan->fs->badblocks && scan->fs->badblocks->num)
 		scan->scan_flags |= EXT2_SF_CHK_BADBLOCKS;
-	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+	if (ext2fs_has_group_desc_csum(fs))
 		scan->scan_flags |= EXT2_SF_DO_LAZY;
 	*ret_scan = scan;
 	return 0;
@@ -250,8 +248,7 @@ static errcode_t get_next_blockgroup(ext2_inode_scan scan)
 	scan->bytes_left = 0;
 	scan->inodes_left = EXT2_INODES_PER_GROUP(fs->super);
 	scan->blocks_left = fs->inode_blocks_per_group;
-	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
+	if (ext2fs_has_group_desc_csum(fs)) {
 		scan->inodes_left -=
 			ext2fs_bg_itable_unused(fs, scan->current_group);
 		scan->blocks_left =
diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
index d2b64f4..2dc9b94 100644
--- a/lib/ext2fs/openfs.c
+++ b/lib/ext2fs/openfs.c
@@ -382,8 +382,7 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
 	 * If recovery is from backup superblock, Clear _UNININT flags &
 	 * reset bg_itable_unused to zero
 	 */
-	if (superblock > 1 && EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
+	if (superblock > 1 && ext2fs_has_group_desc_csum(fs)) {
 		dgrp_t group;
 
 		for (group = 0; group < fs->group_desc_count; group++) {
diff --git a/lib/ext2fs/rw_bitmaps.c b/lib/ext2fs/rw_bitmaps.c
index a5097c1..18e18aa 100644
--- a/lib/ext2fs/rw_bitmaps.c
+++ b/lib/ext2fs/rw_bitmaps.c
@@ -36,7 +36,7 @@ static errcode_t write_bitmaps(ext2_filsys fs, int do_inode, int do_block)
 	unsigned int	nbits;
 	errcode_t	retval;
 	char		*block_buf = NULL, *inode_buf = NULL;
-	int		csum_flag = 0;
+	int		csum_flag;
 	blk64_t		blk;
 	blk64_t		blk_itr = EXT2FS_B2C(fs, fs->super->s_first_data_block);
 	ext2_ino_t	ino_itr = 1;
@@ -46,9 +46,7 @@ static errcode_t write_bitmaps(ext2_filsys fs, int do_inode, int do_block)
 	if (!(fs->flags & EXT2_FLAG_RW))
 		return EXT2_ET_RO_FILSYS;
 
-	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
-		csum_flag = 1;
+	csum_flag = ext2fs_has_group_desc_csum(fs);
 
 	inode_nbytes = block_nbytes = 0;
 	if (do_block) {
@@ -168,7 +166,7 @@ static errcode_t read_bitmaps(ext2_filsys fs, int do_inode, int do_block)
 	errcode_t retval;
 	int block_nbytes = EXT2_CLUSTERS_PER_GROUP(fs->super) / 8;
 	int inode_nbytes = EXT2_INODES_PER_GROUP(fs->super) / 8;
-	int csum_flag = 0;
+	int csum_flag;
 	int do_image = fs->flags & EXT2_FLAG_IMAGE_FILE;
 	unsigned int	cnt;
 	blk64_t	blk;
@@ -181,9 +179,7 @@ static errcode_t read_bitmaps(ext2_filsys fs, int do_inode, int do_block)
 
 	fs->write_bitmaps = ext2fs_write_bitmaps;
 
-	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
-		csum_flag = 1;
+	csum_flag = ext2fs_has_group_desc_csum(fs);
 
 	retval = ext2fs_get_mem(strlen(fs->device_name) + 80, &buf);
 	if (retval)
diff --git a/misc/dumpe2fs.c b/misc/dumpe2fs.c
index b8f386e..3ceb0f8 100644
--- a/misc/dumpe2fs.c
+++ b/misc/dumpe2fs.c
@@ -114,7 +114,7 @@ static void print_bg_opts(ext2_filsys fs, dgrp_t i)
 {
 	int first = 1, bg_flags = 0;
 
-	if (fs->super->s_feature_ro_compat & EXT4_FEATURE_RO_COMPAT_GDT_CSUM)
+	if (ext2fs_has_group_desc_csum(fs))
 		bg_flags = ext2fs_bg_flags(fs, i);
 
 	print_bg_opt(bg_flags, EXT2_BG_INODE_UNINIT, "INODE_UNINIT",
@@ -190,7 +190,7 @@ static void list_desc (ext2_filsys fs)
 		print_range(first_block, last_block);
 		fputs(")", stdout);
 		print_bg_opts(fs, i);
-		if (fs->super->s_feature_ro_compat & EXT4_FEATURE_RO_COMPAT_GDT_CSUM)
+		if (ext2fs_has_group_desc_csum(fs))
 			printf(_("  Checksum 0x%04x, unused inodes %u\n"),
 			       ext2fs_bg_checksum(fs, i),
 			       ext2fs_bg_itable_unused(fs, i));
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 8852735..f5d3d3b 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -885,8 +885,7 @@ static __u32 ok_features[3] = {
 		EXT2_FEATURE_INCOMPAT_META_BG|
 		EXT4_FEATURE_INCOMPAT_FLEX_BG|
 		EXT4_FEATURE_INCOMPAT_MMP |
-		EXT4_FEATURE_INCOMPAT_64BIT |
-		EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM,
+		EXT4_FEATURE_INCOMPAT_64BIT,
 	/* R/O compat */
 	EXT2_FEATURE_RO_COMPAT_LARGE_FILE|
 		EXT4_FEATURE_RO_COMPAT_HUGE_FILE|
@@ -2049,7 +2048,8 @@ static int should_do_undo(const char *name)
 	int csum_flag, force_undo;
 
 	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(&fs_param,
-					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
+				EXT4_FEATURE_RO_COMPAT_GDT_CSUM |
+				EXT4_FEATURE_RO_COMPAT_METADATA_CSUM);
 	force_undo = get_int_from_profile(fs_types, "force_undo", 0);
 	if (!force_undo && (!csum_flag || !lazy_itable_init))
 		return 0;
@@ -2306,19 +2306,6 @@ int main (int argc, char *argv[])
 	if (!quiet &&
 	    EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
 				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
-		if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
-			printf(_("Group descriptor checksums "
-				 "are not enabled.  This reduces the "
-				 "coverage of metadata checksumming.  "
-				 "Pass -O uninit_bg to rectify.\n"));
-		if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
-		    !EXT2_HAS_INCOMPAT_FEATURE(fs->super,
-				EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM))
-			printf(_("Group descriptor checksums will not use "
-				 "the faster metadata_checksum algorithm.  "
-				 "Pass -O bg_use_meta_csum to rectify.\n"));
 		if (!EXT2_HAS_INCOMPAT_FEATURE(fs->super,
 				EXT3_FEATURE_INCOMPAT_EXTENTS))
 			printf(_("Extents are not enabled.  The file extent "
@@ -2358,6 +2345,7 @@ int main (int argc, char *argv[])
 	    (fs_param.s_feature_ro_compat &
 	     (EXT4_FEATURE_RO_COMPAT_HUGE_FILE|EXT4_FEATURE_RO_COMPAT_GDT_CSUM|
 	      EXT4_FEATURE_RO_COMPAT_DIR_NLINK|
+	      EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|
 	      EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE)))
 		fs->super->s_kbytes_written = 1;
 
@@ -2505,8 +2493,7 @@ int main (int argc, char *argv[])
 		 * inodes as unused; we want e2fsck to consider all
 		 * inodes as potentially containing recoverable data.
 		 */
-		if (fs->super->s_feature_ro_compat &
-		    EXT4_FEATURE_RO_COMPAT_GDT_CSUM) {
+		if (ext2fs_has_group_desc_csum(fs)) {
 			for (i = 1; i < fs->group_desc_count; i++)
 				ext2fs_bg_itable_unused_set(fs, i, 0);
 		}
diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index cba4d4c..5a55412 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -92,7 +92,6 @@ static unsigned long new_inode_size;
 static char *ext_mount_opts;
 static int usrquota, grpquota;
 static int rewrite_checksums;
-static int rewrite_bgs_for_checksum;
 
 int journal_size, journal_flags;
 char *journal_device;
@@ -138,8 +137,7 @@ static __u32 ok_features[3] = {
 	EXT2_FEATURE_INCOMPAT_FILETYPE |
 		EXT3_FEATURE_INCOMPAT_EXTENTS |
 		EXT4_FEATURE_INCOMPAT_FLEX_BG |
-		EXT4_FEATURE_INCOMPAT_MMP |
-		EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM,
+		EXT4_FEATURE_INCOMPAT_MMP,
 	/* R/O compat */
 	EXT2_FEATURE_RO_COMPAT_LARGE_FILE |
 		EXT4_FEATURE_RO_COMPAT_HUGE_FILE|
@@ -159,8 +157,7 @@ static __u32 clear_ok_features[3] = {
 	/* Incompat */
 	EXT2_FEATURE_INCOMPAT_FILETYPE |
 		EXT4_FEATURE_INCOMPAT_FLEX_BG |
-		EXT4_FEATURE_INCOMPAT_MMP |
-		EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM,
+		EXT4_FEATURE_INCOMPAT_MMP,
 	/* R/O compat */
 	EXT2_FEATURE_RO_COMPAT_LARGE_FILE |
 		EXT4_FEATURE_RO_COMPAT_HUGE_FILE|
@@ -718,29 +715,6 @@ static void rewrite_metadata_checksums(ext2_filsys fs)
 }
 
 /*
- * Rewrite just the block group checksums.  Only call this function if
- * you're _not_ calling rewrite_metadata_checksums; this function exists
- * to handle the case that you're changing bg_use_meta_csum and NOT changing
- * either gdt_csum or metadata_csum.
- */
-static void rewrite_bg_checksums(ext2_filsys fs)
-{
-	int i;
-
-	if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					EXT4_FEATURE_RO_COMPAT_GDT_CSUM) ||
-	    !EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
-		return;
-
-	ext2fs_init_csum_seed(fs);
-	for (i = 0; i < fs->group_desc_count; i++)
-		ext2fs_group_desc_csum_set(fs, i);
-	fs->flags &= ~EXT2_FLAG_SUPER_ONLY;
-	ext2fs_mark_super_dirty(fs);
-}
-
-/*
  * Update the feature set as provided by the user.
  */
 static int update_feature_set(ext2_filsys fs, char *features)
@@ -912,20 +886,6 @@ mmp_error:
 		}
 	}
 
-	if (FEATURE_ON(E2P_FEATURE_INCOMPAT,
-		       EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)) {
-		if (check_fsck_needed(fs))
-			exit(1);
-		rewrite_bgs_for_checksum = 1;
-	}
-
-	if (FEATURE_OFF(E2P_FEATURE_INCOMPAT,
-			EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)) {
-		if (check_fsck_needed(fs))
-			exit(1);
-		rewrite_bgs_for_checksum = 1;
-	}
-
 	if (FEATURE_ON(E2P_FEATURE_RO_INCOMPAT,
 		       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
 		if (check_fsck_needed(fs))
@@ -965,7 +925,7 @@ mmp_error:
 			}
 			gd->bg_itable_unused = 0;
 			gd->bg_flags = 0;
-			gd->bg_checksum = 0;
+			ext2fs_group_desc_csum_set(fs, i);
 		}
 		fs->flags &= ~EXT2_FLAG_SUPER_ONLY;
 	}
@@ -2588,8 +2548,6 @@ retry_open:
 	}
 	if (rewrite_checksums)
 		rewrite_metadata_checksums(fs);
-	else if (rewrite_bgs_for_checksum)
-		rewrite_bg_checksums(fs);
 	if (I_flag) {
 		if (mount_flags & EXT2_MF_MOUNTED) {
 			fputs(_("The inode size may only be "
diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index dc2805d..8a02ff4 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -191,8 +191,7 @@ static void fix_uninit_block_bitmaps(ext2_filsys fs)
 	int		old_desc_blocks;
 	dgrp_t		g;
 
-	if (!(EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					 EXT4_FEATURE_RO_COMPAT_GDT_CSUM)))
+	if (!ext2fs_has_group_desc_csum(fs))
 		return;
 
 	for (g=0; g < fs->group_desc_count; g++) {
@@ -482,8 +481,7 @@ retry:
 	group_block = fs->super->s_first_data_block +
 		old_fs->group_desc_count * fs->super->s_blocks_per_group;
 
-	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
+	csum_flag = ext2fs_has_group_desc_csum(fs);
 	adj = old_fs->group_desc_count;
 	max_group = fs->group_desc_count - adj;
 	if (fs->super->s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG)
@@ -743,8 +741,7 @@ static void mark_fs_metablock(ext2_resize_t rfs,
 	} else if (IS_INODE_TB(fs, group, blk)) {
 		ext2fs_inode_table_loc_set(fs, group, 0);
 		rfs->needed_blocks++;
-	} else if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					      EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
+	} else if (ext2fs_has_group_desc_csum(fs) &&
 		   (ext2fs_bg_flags_test(fs, group, EXT2_BG_BLOCK_UNINIT))) {
 		/*
 		 * If the block bitmap is uninitialized, which means
@@ -804,8 +801,7 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
 	for (blk = ext2fs_blocks_count(fs->super);
 	     blk < ext2fs_blocks_count(old_fs->super); blk++) {
 		g = ext2fs_group_of_blk2(fs, blk);
-		if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
+		if (ext2fs_has_group_desc_csum(fs) &&
 		    ext2fs_bg_flags_test(old_fs, g, EXT2_BG_BLOCK_UNINIT)) {
 			/*
 			 * The block bitmap is uninitialized, so skip



* [RFC] ext4: Rework metadata_csum/gdt_csum flag handling in kernel
  2012-02-13 22:28     ` Ted Ts'o
  2012-02-29  1:27       ` [RFC] e2fsprogs: Rework metadata_csum/gdt_csum flag handling Darrick J. Wong
@ 2012-02-29  1:32       ` Darrick J. Wong
  2012-02-29  5:48         ` Andreas Dilger
  1 sibling, 1 reply; 31+ messages in thread
From: Darrick J. Wong @ 2012-02-29  1:32 UTC (permalink / raw)
  To: Ted Ts'o, Andreas Dilger, Sunil Mushran, Martin K Petersen,
	Greg Freemyer, Amir Goldstein, linux-kernel, Andi Kleen,
	Mingming Cao, Joel Becker, linux-fsdevel, linux-ext4, Coly Li

Ok, I've reworked the block group descriptor checksum handling code per this
email thread.  INCOMPAT_BG_USE_META_CSUM is gone.  METADATA_CSUM implies (and
in fact overrides) GDT_CSUM.  When METADATA_CSUM is set, the group descriptor
checksum uses the same function as all other metadata blocks' checksums (by
default crc32c).  I created a helper function to determine if group descriptor
checksums are enabled, and the actual gdt checksum verify/set functions are
smart enough to use the correct function.
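
As a rough illustration of what "uses the same function as all other metadata
blocks' checksums" can mean for a 16-bit descriptor field, here is a
standalone sketch.  It is not the kernel code: sketch_crc32c is a plain
bitwise CRC-32C (Castagnoli) with the conventional initial and final
inversion, sketch_gd_csum16 is a hypothetical helper, and keeping only the low
16 bits of the 32-bit result is an assumption about how the value is made to
fit the existing checksum field.  The real code also mixes in per-filesystem
data that is omitted here.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C, reflected polynomial 0x82F63B78. */
static uint32_t sketch_crc32c(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;
	int k;

	crc = ~crc;
	while (len--) {
		crc ^= *p++;
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Assumed truncation: keep the low 16 bits of the 32-bit checksum. */
static uint16_t sketch_gd_csum16(const void *desc, size_t len)
{
	return (uint16_t)(sketch_crc32c(0, desc, len) & 0xFFFFu);
}
```

Truncating a 32-bit CRC gives up some detection strength, but it lets the
descriptor layout stay unchanged, which is presumably why no new on-disk field
is introduced for the descriptor checksum.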

Below are the changes that I intend to make to the kernel.  I'll integrate these
changes into the (huge) kernel patchset, but wanted to aggregate the changes
here first to avoid overwhelming reviewers.

Question: What will happen to old kernels when METADATA_CSUM and GDT_CSUM are
both set?  Should the kernel reject the combination and ask for fsck?  I think
a kernel with this patch will be ok, but older kernels might not be...?

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---

 fs/ext4/balloc.c  |    4 ++--
 fs/ext4/ext4.h    |   20 +++++++++++++++-----
 fs/ext4/ialloc.c  |   19 ++++++++-----------
 fs/ext4/inode.c   |    3 +--
 fs/ext4/mballoc.c |    6 +++---
 fs/ext4/resize.c  |    9 +++------
 fs/ext4/super.c   |   23 ++++++++++-------------
 7 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 6eee0e6..b5a7951 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -168,7 +168,7 @@ void ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
 
 	/* If checksum is bad mark all blocks used to prevent allocation
 	 * essentially implementing a per-group read-only flag. */
-	if (!ext4_group_desc_csum_verify(sbi, block_group, gdp)) {
+	if (!ext4_group_desc_csum_verify(sb, block_group, gdp)) {
 		ext4_error(sb, "Checksum bad for group %u", block_group);
 		ext4_free_group_clusters_set(sb, gdp, 0);
 		ext4_free_inodes_set(sb, gdp, 0);
@@ -214,7 +214,7 @@ void ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
 			     sb->s_blocksize * 8, bh->b_data);
 	ext4_block_bitmap_csum_set(sb, block_group, gdp, bh,
 				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
-	ext4_group_desc_csum_set(sbi, block_group, gdp);
+	ext4_group_desc_csum_set(sb, block_group, gdp);
 }
 
 /* Return the number of free blocks in a block group.  It is used when
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 70bd236..a518930 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1434,6 +1434,11 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
 #define EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE	0x0040
 #define EXT4_FEATURE_RO_COMPAT_QUOTA		0x0100
 #define EXT4_FEATURE_RO_COMPAT_BIGALLOC		0x0200
+/*
+ * METADATA_CSUM implies GDT_CSUM.  When METADATA_CSUM is set, group
+ * descriptor checksums use the same algorithm as all other data
+ * structures' checksums.
+ */
 #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM	0x0400
 
 #define EXT4_FEATURE_INCOMPAT_COMPRESSION	0x0001
@@ -1449,7 +1454,6 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
 #define EXT4_FEATURE_INCOMPAT_DIRDATA		0x1000 /* data in dirent */
 #define EXT4_FEATURE_INCOMPAT_INLINEDATA	0x2000 /* data in inode */
 #define EXT4_FEATURE_INCOMPAT_LARGEDIR		0x4000 /* >2GB or 3-lvl htree */
-#define EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM	0x8000
 
 #define EXT2_FEATURE_COMPAT_SUPP	EXT4_FEATURE_COMPAT_EXT_ATTR
 #define EXT2_FEATURE_INCOMPAT_SUPP	(EXT4_FEATURE_INCOMPAT_FILETYPE| \
@@ -1473,8 +1477,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
 					 EXT4_FEATURE_INCOMPAT_EXTENTS| \
 					 EXT4_FEATURE_INCOMPAT_64BIT| \
 					 EXT4_FEATURE_INCOMPAT_FLEX_BG| \
-					 EXT4_FEATURE_INCOMPAT_MMP| \
-					 EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)
+					 EXT4_FEATURE_INCOMPAT_MMP)
 #define EXT4_FEATURE_RO_COMPAT_SUPP	(EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER| \
 					 EXT4_FEATURE_RO_COMPAT_LARGE_FILE| \
 					 EXT4_FEATURE_RO_COMPAT_GDT_CSUM| \
@@ -2092,11 +2095,18 @@ extern void ext4_used_dirs_set(struct super_block *sb,
 				struct ext4_group_desc *bg, __u32 count);
 extern void ext4_itable_unused_set(struct super_block *sb,
 				   struct ext4_group_desc *bg, __u32 count);
-extern int ext4_group_desc_csum_verify(struct ext4_sb_info *sbi, __u32 group,
+extern int ext4_group_desc_csum_verify(struct super_block *sb, __u32 group,
 				       struct ext4_group_desc *gdp);
-extern void ext4_group_desc_csum_set(struct ext4_sb_info *sbi, __u32 group,
+extern void ext4_group_desc_csum_set(struct super_block *sb, __u32 group,
 				     struct ext4_group_desc *gdp);
 
+static inline int ext4_has_group_desc_csum(struct super_block *sb)
+{
+	return EXT4_HAS_RO_COMPAT_FEATURE(sb,
+					  EXT4_FEATURE_RO_COMPAT_GDT_CSUM |
+					  EXT4_FEATURE_RO_COMPAT_METADATA_CSUM);
+}
+
 static inline ext4_fsblk_t ext4_blocks_count(struct ext4_super_block *es)
 {
 	return ((ext4_fsblk_t)le32_to_cpu(es->s_blocks_count_hi) << 32) |
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index b9b6b27..1ade34d 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -70,13 +70,11 @@ static unsigned ext4_init_inode_bitmap(struct super_block *sb,
 				       ext4_group_t block_group,
 				       struct ext4_group_desc *gdp)
 {
-	struct ext4_sb_info *sbi = EXT4_SB(sb);
-
 	J_ASSERT_BH(bh, buffer_locked(bh));
 
 	/* If checksum is bad mark all blocks and inodes use to prevent
 	 * allocation, essentially implementing a per-group read-only flag. */
-	if (!ext4_group_desc_csum_verify(sbi, block_group, gdp)) {
+	if (!ext4_group_desc_csum_verify(sb, block_group, gdp)) {
 		ext4_error(sb, "Checksum bad for group %u", block_group);
 		ext4_free_group_clusters_set(sb, gdp, 0);
 		ext4_free_inodes_set(sb, gdp, 0);
@@ -92,7 +90,7 @@ static unsigned ext4_init_inode_bitmap(struct super_block *sb,
 			bh->b_data);
 	ext4_inode_bitmap_csum_set(sb, block_group, gdp, bh,
 				   EXT4_INODES_PER_GROUP(sb) / 8);
-	ext4_group_desc_csum_set(sbi, block_group, gdp);
+	ext4_group_desc_csum_set(sb, block_group, gdp);
 
 	return EXT4_INODES_PER_GROUP(sb);
 }
@@ -287,7 +285,7 @@ void ext4_free_inode(handle_t *handle, struct inode *inode)
 	}
 	ext4_inode_bitmap_csum_set(sb, block_group, gdp, bitmap_bh,
 				   EXT4_INODES_PER_GROUP(sb) / 8);
-	ext4_group_desc_csum_set(sbi, block_group, gdp);
+	ext4_group_desc_csum_set(sb, block_group, gdp);
 	ext4_unlock_group(sb, block_group);
 
 	percpu_counter_inc(&sbi->s_freeinodes_counter);
@@ -657,8 +655,7 @@ static int ext4_claim_inode(struct super_block *sb,
 	}
 	/* If we didn't allocate from within the initialized part of the inode
 	 * table then we need to initialize up to this inode. */
-	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
-
+	if (ext4_has_group_desc_csum(sb)) {
 		if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
 			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
 			/* When marking the block group with
@@ -697,7 +694,7 @@ static int ext4_claim_inode(struct super_block *sb,
 	}
 	ext4_inode_bitmap_csum_set(sb, group, gdp, inode_bitmap_bh,
 				   EXT4_INODES_PER_GROUP(sb) / 8);
-	ext4_group_desc_csum_set(sbi, group, gdp);
+	ext4_group_desc_csum_set(sb, group, gdp);
 err_ret:
 	ext4_unlock_group(sb, group);
 	up_read(&grp->alloc_sem);
@@ -832,7 +829,7 @@ repeat_in_this_group:
 
 got:
 	/* We may have to initialize the block bitmap if it isn't already */
-	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
+	if (ext4_has_group_desc_csum(sb) &&
 	    gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
 		struct buffer_head *block_bitmap_bh;
 
@@ -858,7 +855,7 @@ got:
 						   block_bitmap_bh,
 						   EXT4_BLOCKS_PER_GROUP(sb) /
 						   8);
-			ext4_group_desc_csum_set(sbi, group, gdp);
+			ext4_group_desc_csum_set(sb, group, gdp);
 		}
 		ext4_unlock_group(sb, group);
 
@@ -1226,7 +1223,7 @@ int ext4_init_inode_table(struct super_block *sb, ext4_group_t group,
 skip_zeroout:
 	ext4_lock_group(sb, group);
 	gdp->bg_flags |= cpu_to_le16(EXT4_BG_INODE_ZEROED);
-	ext4_group_desc_csum_set(sbi, group, gdp);
+	ext4_group_desc_csum_set(sb, group, gdp);
 	ext4_unlock_group(sb, group);
 
 	BUFFER_TRACE(group_desc_bh,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c0200cf..e94ac91 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3573,8 +3573,7 @@ make_io:
 				b = table;
 			end = b + EXT4_SB(sb)->s_inode_readahead_blks;
 			num = EXT4_INODES_PER_GROUP(sb);
-			if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+			if (ext4_has_group_desc_csum(sb))
 				num -= ext4_itable_unused_count(sb, gdp);
 			table += num / inodes_per_block;
 			if (end > table)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5f2e2ed..d6062e7 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2919,7 +2919,7 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 	ext4_free_group_clusters_set(sb, gdp, len);
 	ext4_block_bitmap_csum_set(sb, ac->ac_b_ex.fe_group, gdp, bitmap_bh,
 				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
-	ext4_group_desc_csum_set(sbi, ac->ac_b_ex.fe_group, gdp);
+	ext4_group_desc_csum_set(sb, ac->ac_b_ex.fe_group, gdp);
 
 	ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
 	percpu_counter_sub(&sbi->s_freeclusters_counter, ac->ac_b_ex.fe_len);
@@ -4787,7 +4787,7 @@ do_more:
 	ext4_free_group_clusters_set(sb, gdp, ret);
 	ext4_block_bitmap_csum_set(sb, block_group, gdp, bitmap_bh,
 				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
-	ext4_group_desc_csum_set(sbi, block_group, gdp);
+	ext4_group_desc_csum_set(sb, block_group, gdp);
 	ext4_unlock_group(sb, block_group);
 	percpu_counter_add(&sbi->s_freeclusters_counter, count_clusters);
 
@@ -4933,7 +4933,7 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 	ext4_free_group_clusters_set(sb, desc, blk_free_count);
 	ext4_block_bitmap_csum_set(sb, block_group, desc, bitmap_bh,
 				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
-	ext4_group_desc_csum_set(sbi, block_group, desc);
+	ext4_group_desc_csum_set(sb, block_group, desc);
 	ext4_unlock_group(sb, block_group);
 	percpu_counter_add(&sbi->s_freeclusters_counter,
 			   EXT4_B2C(sbi, blocks_freed));
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 2363532..21ace95 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1106,7 +1106,7 @@ static int ext4_setup_new_descs(handle_t *handle, struct super_block *sb,
 					     EXT4_B2C(sbi, group_data->free_blocks_count));
 		ext4_free_inodes_set(sb, gdp, EXT4_INODES_PER_GROUP(sb));
 		gdp->bg_flags = cpu_to_le16(*bg_flags);
-		ext4_group_desc_csum_set(sbi, group, gdp);
+		ext4_group_desc_csum_set(sb, group, gdp);
 
 		err = ext4_handle_dirty_metadata(handle, NULL, gdb_bh);
 		if (unlikely(err)) {
@@ -1342,17 +1342,14 @@ static int ext4_setup_next_flex_gd(struct super_block *sb,
 			   (1 + ext4_bg_num_gdb(sb, group + i) +
 			    le16_to_cpu(es->s_reserved_gdt_blocks)) : 0;
 		group_data[i].free_blocks_count = blocks_per_group - overhead;
-		if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
-					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+		if (ext4_has_group_desc_csum(sb))
 			flex_gd->bg_flags[i] = EXT4_BG_BLOCK_UNINIT |
 					       EXT4_BG_INODE_UNINIT;
 		else
 			flex_gd->bg_flags[i] = EXT4_BG_INODE_ZEROED;
 	}
 
-	if (last_group == n_group &&
-	    EXT4_HAS_RO_COMPAT_FEATURE(sb,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+	if (last_group == n_group && ext4_has_group_desc_csum(sb))
 		/* We need to initialize block bitmap of last group. */
 		flex_gd->bg_flags[i - 1] &= ~EXT4_BG_BLOCK_UNINIT;
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2190044..6196bfa 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2097,9 +2097,7 @@ static __le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 block_group,
 	__le32 le_group = cpu_to_le32(block_group);
 
 	if ((sbi->s_es->s_feature_ro_compat &
-	     cpu_to_le32(EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) &&
-	    (sbi->s_es->s_feature_incompat &
-	     cpu_to_le32(EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM))) {
+	     cpu_to_le32(EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))) {
 		/* Use new metadata_csum algorithm */
 		__u16 old_csum;
 		__u32 csum32;
@@ -2135,24 +2133,23 @@ out:
 	return cpu_to_le16(crc);
 }
 
-int ext4_group_desc_csum_verify(struct ext4_sb_info *sbi, __u32 block_group,
+int ext4_group_desc_csum_verify(struct super_block *sb, __u32 block_group,
 				struct ext4_group_desc *gdp)
 {
-	if ((sbi->s_es->s_feature_ro_compat &
-	     cpu_to_le32(EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) &&
-	    (gdp->bg_checksum != ext4_group_desc_csum(sbi, block_group, gdp)))
+	if (ext4_has_group_desc_csum(sb) &&
+	    (gdp->bg_checksum != ext4_group_desc_csum(EXT4_SB(sb),
+						      block_group, gdp)))
 		return 0;
 
 	return 1;
 }
 
-void ext4_group_desc_csum_set(struct ext4_sb_info *sbi, __u32 block_group,
+void ext4_group_desc_csum_set(struct super_block *sb, __u32 block_group,
 			      struct ext4_group_desc *gdp)
 {
-	if (!(sbi->s_es->s_feature_ro_compat &
-	     cpu_to_le32(EXT4_FEATURE_RO_COMPAT_GDT_CSUM)))
+	if (!ext4_has_group_desc_csum(sb))
 		return;
-	gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
+	gdp->bg_checksum = ext4_group_desc_csum(EXT4_SB(sb), block_group, gdp);
 }
 
 /* Called at mount-time, super-block is locked */
@@ -2209,7 +2206,7 @@ static int ext4_check_descriptors(struct super_block *sb,
 			return 0;
 		}
 		ext4_lock_group(sb, i);
-		if (!ext4_group_desc_csum_verify(sbi, i, gdp)) {
+		if (!ext4_group_desc_csum_verify(sb, i, gdp)) {
 			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
 				 "Checksum for group %u failed (%u!=%u)",
 				 i, le16_to_cpu(ext4_group_desc_csum(sbi, i,
@@ -4620,7 +4617,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 				struct ext4_group_desc *gdp =
 					ext4_get_group_desc(sb, g, NULL);
 
-				if (!ext4_group_desc_csum_verify(sbi, g, gdp)) {
+				if (!ext4_group_desc_csum_verify(sb, g, gdp)) {
 					ext4_msg(sb, KERN_ERR,
 	       "ext4_remount: Checksum for group %u failed (%u!=%u)",
 		g, le16_to_cpu(ext4_group_desc_csum(sbi, g, gdp)),


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [RFC] e2fsprogs: Rework metadata_csum/gdt_csum flag handling
  2012-02-29  1:27       ` [RFC] e2fsprogs: Rework metadata_csum/gdt_csum flag handling Darrick J. Wong
@ 2012-02-29  5:40         ` Andreas Dilger
  2012-03-03  3:50           ` [RFC v2] " Darrick J. Wong
  0 siblings, 1 reply; 31+ messages in thread
From: Andreas Dilger @ 2012-02-29  5:40 UTC (permalink / raw)
  To: djwong
  Cc: Ted Ts'o, Sunil Mushran, Martin K Petersen, Greg Freemyer,
	Amir Goldstein, linux-kernel, Andi Kleen, Mingming Cao,
	Joel Becker, linux-fsdevel, linux-ext4, Coly Li

On 2012-02-28, at 6:27 PM, Darrick J. Wong wrote:
> Ok, I've reworked the block group descriptor checksum handling code per this
> email thread.  INCOMPAT_BG_USE_META_CSUM is gone.  METADATA_CSUM implies (and
> in fact overrides) GDT_CSUM.  When both are set, the group descriptor checksum
> uses the same function as all other metadata blocks' checksums (by default
> crc32c).  I created a helper function to determine if group descriptor
> checksums are enabled, and the actual gdt checksum verify/set functions are
> smart enough to use the correct function.
> 
> Below are the changes that I intend to make to e2fsprogs.  I'll integrate these
> changes into the (huge) e2fsprogs patchset, but wanted to aggregate the changes
> here first to avoid overwhelming reviewers.  I'll send a kernel patch shortly.
> 
> Question: What will happen to old kernels when METADATA_CSUM and GDT_CSUM are
> set?

The tools should never allow this combination, and e2fsck should treat it as an error that is fixed by clearing GDT_CSUM and leaving METADATA_CSUM set.

> Should tune2fs/e2fsck change METADATA_CSUM|GDT_CSUM to only METADATA_CSUM
> if they encounter it?

Yes.
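
The fix-up agreed on above could look roughly like the following.  This is only a sketch, not e2fsprogs code: the function name fix_csum_feature_conflict and the bare uint32_t flag word (already in CPU byte order) are made up for illustration.  A real e2fsck would presumably report the problem through its usual prompting machinery and mark the superblock dirty afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined in ext2_fs.h. */
#define RO_COMPAT_GDT_CSUM       0x0010u
#define RO_COMPAT_METADATA_CSUM  0x0400u

/*
 * Hypothetical helper: if both checksum features are set, clear GDT_CSUM
 * and keep METADATA_CSUM.  Returns 1 if the flag word was changed.
 */
static int fix_csum_feature_conflict(uint32_t *ro_compat)
{
	const uint32_t both = RO_COMPAT_GDT_CSUM | RO_COMPAT_METADATA_CSUM;

	assert(ro_compat != NULL);
	if ((*ro_compat & both) != both)
		return 0;	/* zero or one of the flags set: nothing to fix */
	*ro_compat &= ~RO_COMPAT_GDT_CSUM;
	return 1;
}
```

Other feature bits are left untouched; only the GDT_CSUM bit is ever cleared.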

> I'm a little concerned that a pre-METADATA_CSUM kernel will see the GDT_CSUM
> flag and assume it's ok to proceed in ro mode and get confused.

Right, so if tune2fs/mke2fs always disable GDT_CSUM whenever they set METADATA_CSUM, there will be no problem.  e2fsck will correct the combination if it is ever seen in the wild.  That should be rare, since it means the other feature flags are also corrupted, which will probably force the use of a backup superblock or make mincemeat of the filesystem for other reasons (bad checksums cannot themselves corrupt the filesystem).
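
The concern about older kernels can be made concrete with a small model of the feature check.  The sketch below assumes the usual ext2/3/4 rule that an unrecognised RO_COMPAT bit permits a read-only mount but blocks a read-write one.  The names (model_old_kernel, OLD_RO_COMPAT_SUPP) and the one-bit supported mask are invented for illustration and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RO_COMPAT_GDT_CSUM       0x0010u
#define RO_COMPAT_METADATA_CSUM  0x0400u

/* Hypothetical mask: a kernel that knows GDT_CSUM but not METADATA_CSUM. */
#define OLD_RO_COMPAT_SUPP       RO_COMPAT_GDT_CSUM

struct old_kernel_view {
	int can_mount_rw;	/* 1 if no unknown RO_COMPAT bits are set */
	int verifies_crc16;	/* 1 if descriptors get checked with crc16 */
};

/* Model how a pre-METADATA_CSUM kernel would interpret the flag word. */
static void model_old_kernel(uint32_t ro_compat, struct old_kernel_view *v)
{
	assert(v != NULL);
	v->can_mount_rw = (ro_compat & ~OLD_RO_COMPAT_SUPP) == 0;
	v->verifies_crc16 = (ro_compat & RO_COMPAT_GDT_CSUM) != 0;
}
```

In this model, a filesystem with both flags set is read-only and still has its descriptors checked with the old crc16, which would not match checksums written with the new algorithm.  With GDT_CSUM cleared, the old kernel stays read-only and skips the check.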

> Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
> ---

Looks like a net win all around.  One comment inline, but you can add my

Acked-by: Andreas Dilger <adilger@dilger.ca>

> debugfs/debugfs.c        |    3 +--
> e2fsck/pass5.c           |   18 ++++++-----------
> e2fsck/super.c           |    3 +--
> e2fsck/unix.c            |    2 +-
> lib/e2p/feature.c        |    2 --
> lib/ext2fs/alloc.c       |    6 ++----
> lib/ext2fs/alloc_stats.c |    3 +--
> lib/ext2fs/csum.c        |   13 ++++--------
> lib/ext2fs/ext2_fs.h     |    6 +++++-
> lib/ext2fs/ext2fs.h      |   12 ++++++++----
> lib/ext2fs/initialize.c  |    3 +--
> lib/ext2fs/inode.c       |    9 +++------
> lib/ext2fs/openfs.c      |    3 +--
> lib/ext2fs/rw_bitmaps.c  |   12 ++++--------
> misc/dumpe2fs.c          |    4 ++--
> misc/mke2fs.c            |   23 +++++-----------------
> misc/tune2fs.c           |   48 +++-------------------------------------------
> resize/resize2fs.c       |   12 ++++--------
> 18 files changed, 52 insertions(+), 130 deletions(-)
> 
> diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
> index c1cbf06..9c8e48e 100644
> --- a/debugfs/debugfs.c
> +++ b/debugfs/debugfs.c
> @@ -357,8 +357,7 @@ void do_show_super_stats(int argc, char *argv[])
> 		return;
> 	}
> 
> -	gdt_csum = EXT2_HAS_RO_COMPAT_FEATURE(current_fs->super,
> -					      EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
> +	gdt_csum = ext2fs_has_group_desc_csum(current_fs);
> 	for (i = 0; i < current_fs->group_desc_count; i++) {
> 		fprintf(out, " Group %2d: block bitmap at %llu, "
> 		        "inode bitmap at %llu, "
> diff --git a/e2fsck/pass5.c b/e2fsck/pass5.c
> index f1ce6d7..c5dba0b 100644
> --- a/e2fsck/pass5.c
> +++ b/e2fsck/pass5.c
> @@ -88,7 +88,7 @@ static void check_inode_bitmap_checksum(e2fsck_t ctx)
> 	int		nbytes;
> 	ext2_ino_t	ino_itr;
> 	errcode_t	retval;
> -	int		csum_flag = 0;
> +	int		csum_flag;
> 
> 	/* If bitmap is dirty from being fixed, checksum will be corrected */
> 	if (ext2fs_test_ib_dirty(ctx->fs))
> @@ -103,9 +103,7 @@ static void check_inode_bitmap_checksum(e2fsck_t ctx)
> 		fatal_error(ctx, 0);
> 	}
> 
> -	if (EXT2_HAS_RO_COMPAT_FEATURE(ctx->fs->super,
> -				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
> -		csum_flag = 1;
> +	csum_flag = ext2fs_has_group_desc_csum(ctx->fs);
> 
> 	clear_problem_context(&pctx);
> 	for (i = 0; i < ctx->fs->group_desc_count; i++) {
> @@ -149,7 +147,7 @@ static void check_block_bitmap_checksum(e2fsck_t ctx)
> 	int		nbytes;
> 	blk64_t		blk_itr;
> 	errcode_t	retval;
> -	int		csum_flag = 0;
> +	int		csum_flag;
> 
> 	/* If bitmap is dirty from being fixed, checksum will be corrected */
> 	if (ext2fs_test_bb_dirty(ctx->fs))
> @@ -164,9 +162,7 @@ static void check_block_bitmap_checksum(e2fsck_t ctx)
> 		fatal_error(ctx, 0);
> 	}
> 
> -	if (EXT2_HAS_RO_COMPAT_FEATURE(ctx->fs->super,
> -				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
> -		csum_flag = 1;
> +	csum_flag = ext2fs_has_group_desc_csum(ctx->fs);
> 
> 	clear_problem_context(&pctx);
> 	for (i = 0; i < ctx->fs->group_desc_count; i++) {
> @@ -322,8 +318,7 @@ static void check_block_bitmaps(e2fsck_t ctx)
> 		goto errout;
> 	}
> 
> -	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
> +	csum_flag = ext2fs_has_group_desc_csum(fs);
> redo_counts:
> 	had_problem = 0;
> 	save_problem = 0;
> @@ -599,8 +594,7 @@ static void check_inode_bitmaps(e2fsck_t ctx)
> 		goto errout;
> 	}
> 
> -	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
> +	csum_flag = ext2fs_has_group_desc_csum(fs);
> redo_counts:
> 	had_problem = 0;
> 	save_problem = 0;
> diff --git a/e2fsck/super.c b/e2fsck/super.c
> index dbd337c..5f6fb08 100644
> --- a/e2fsck/super.c
> +++ b/e2fsck/super.c
> @@ -583,8 +583,7 @@ void check_super_block(e2fsck_t ctx)
> 	first_block = sb->s_first_data_block;
> 	last_block = ext2fs_blocks_count(sb)-1;
> 
> -	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
> +	csum_flag = ext2fs_has_group_desc_csum(fs);
> 	for (i = 0; i < fs->group_desc_count; i++) {
> 		pctx.group = i;
> 
> diff --git a/e2fsck/unix.c b/e2fsck/unix.c
> index 9319e40..d3fb8f8 100644
> --- a/e2fsck/unix.c
> +++ b/e2fsck/unix.c
> @@ -1658,7 +1658,7 @@ no_journal:
> 	}
> 
> 	if ((run_result & E2F_FLAG_CANCEL) == 0 &&
> -	    sb->s_feature_ro_compat & EXT4_FEATURE_RO_COMPAT_GDT_CSUM &&
> +	    ext2fs_has_group_desc_csum(ctx->fs) &&
> 	    !(ctx->options & E2F_OPT_READONLY)) {
> 		retval = ext2fs_set_gdt_csum(ctx->fs);
> 		if (retval) {
> diff --git a/lib/e2p/feature.c b/lib/e2p/feature.c
> index 9f9c6dd..486f846 100644
> --- a/lib/e2p/feature.c
> +++ b/lib/e2p/feature.c
> @@ -87,8 +87,6 @@ static struct feature feature_list[] = {
> 			"mmp" },
> 	{       E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_FLEX_BG,
> 			"flex_bg"},
> -	{	E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM,
> -			"bg_use_meta_csum"},
> 	{	0, 0, 0 },
> };
> 
> diff --git a/lib/ext2fs/alloc.c b/lib/ext2fs/alloc.c
> index 948a0ec..e62ed68 100644
> --- a/lib/ext2fs/alloc.c
> +++ b/lib/ext2fs/alloc.c
> @@ -36,8 +36,7 @@ static void check_block_uninit(ext2_filsys fs, ext2fs_block_bitmap map,
> 	blk64_t		blk, super_blk, old_desc_blk, new_desc_blk;
> 	int		old_desc_blocks;
> 
> -	if (!(EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -					 EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) ||
> +	if (!ext2fs_has_group_desc_csum(fs) ||
> 	    !(ext2fs_bg_flags_test(fs, group, EXT2_BG_BLOCK_UNINIT)))
> 		return;
> 
> @@ -83,8 +82,7 @@ static void check_inode_uninit(ext2_filsys fs, ext2fs_inode_bitmap map,
> {
> 	ext2_ino_t	i, ino;
> 
> -	if (!(EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -					 EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) ||
> +	if (!ext2fs_has_group_desc_csum(fs) ||
> 	    !(ext2fs_bg_flags_test(fs, group, EXT2_BG_INODE_UNINIT)))
> 		return;
> 
> diff --git a/lib/ext2fs/alloc_stats.c b/lib/ext2fs/alloc_stats.c
> index adec363..4229084 100644
> --- a/lib/ext2fs/alloc_stats.c
> +++ b/lib/ext2fs/alloc_stats.c
> @@ -38,8 +38,7 @@ void ext2fs_inode_alloc_stats2(ext2_filsys fs, ext2_ino_t ino,
> 	/* We don't strictly need to be clearing the uninit flag if inuse < 0
> 	 * (i.e. freeing inodes) but it also means something is bad. */
> 	ext2fs_bg_flags_clear(fs, group, EXT2_BG_INODE_UNINIT);
> -	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
> +	if (ext2fs_has_group_desc_csum(fs)) {
> 		ext2_ino_t first_unused_inode =	fs->super->s_inodes_per_group -
> 			ext2fs_bg_itable_unused(fs, group) +
> 			group * fs->super->s_inodes_per_group + 1;
> diff --git a/lib/ext2fs/csum.c b/lib/ext2fs/csum.c
> index 99ca652..425f736 100644
> --- a/lib/ext2fs/csum.c
> +++ b/lib/ext2fs/csum.c
> @@ -743,9 +743,7 @@ STATIC __u16 ext2fs_group_desc_csum(ext2_filsys fs, dgrp_t group)
> #endif
> 
> 	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) &&
> -	    EXT2_HAS_INCOMPAT_FEATURE(fs->super,
> -			EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)) {
> +			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
> 		/* new metadata csum code */
> 		__u16 old_crc;
> 		__u32 crc32;
> @@ -781,8 +779,7 @@ out:
> 
> int ext2fs_group_desc_csum_verify(ext2_filsys fs, dgrp_t group)
> {
> -	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
> +	if (ext2fs_has_group_desc_csum(fs) &&
> 	    (ext2fs_bg_checksum(fs, group) !=
> 	     ext2fs_group_desc_csum(fs, group)))
> 		return 0;
> @@ -792,8 +789,7 @@ int ext2fs_group_desc_csum_verify(ext2_filsys fs, dgrp_t group)
> 
> void ext2fs_group_desc_csum_set(ext2_filsys fs, dgrp_t group)
> {
> -	if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -					EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
> +	if (!ext2fs_has_group_desc_csum(fs))
> 		return;
> 
> 	/* ext2fs_bg_checksum_set() sets the actual checksum field but
> @@ -827,8 +823,7 @@ errcode_t ext2fs_set_gdt_csum(ext2_filsys fs)
> 	if (!fs->inode_map)
> 		return EXT2_ET_NO_INODE_BITMAP;
> 
> -	if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -					EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
> +	if (!ext2fs_has_group_desc_csum(fs))
> 		return 0;
> 
> 	for (i = 0; i < fs->group_desc_count; i++) {
> diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
> index c2e7fbe..89df977 100644
> --- a/lib/ext2fs/ext2_fs.h
> +++ b/lib/ext2fs/ext2_fs.h
> @@ -729,6 +729,11 @@ struct ext2_super_block {
> #define EXT4_FEATURE_RO_COMPAT_HAS_SNAPSHOT	0x0080
> #define EXT4_FEATURE_RO_COMPAT_QUOTA		0x0100
> #define EXT4_FEATURE_RO_COMPAT_BIGALLOC		0x0200
> +/*
> + * METADATA_CSUM implies GDT_CSUM.  When METADATA_CSUM is set, group

This comment should explicitly state that METADATA_CSUM and GDT_CSUM are
mutually exclusive.

> + * descriptor checksums use the same algorithm as all other data
> + * structures' checksums.
> + */
> #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM	0x0400
> #define EXT4_FEATURE_RO_COMPAT_REPLICA		0x0800
> 
> @@ -743,7 +748,6 @@ struct ext2_super_block {
> #define EXT4_FEATURE_INCOMPAT_FLEX_BG		0x0200
> #define EXT4_FEATURE_INCOMPAT_EA_INODE		0x0400
> #define EXT4_FEATURE_INCOMPAT_DIRDATA		0x1000
> -#define EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM	0x8000
> 
> #define EXT2_FEATURE_COMPAT_SUPP	0
> #define EXT2_FEATURE_INCOMPAT_SUPP    (EXT2_FEATURE_INCOMPAT_FILETYPE| \
> diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> index ff2799a..28cb626 100644
> --- a/lib/ext2fs/ext2fs.h
> +++ b/lib/ext2fs/ext2fs.h
> @@ -579,8 +579,7 @@ typedef struct ext2_icount *ext2_icount_t;
> 					 EXT3_FEATURE_INCOMPAT_EXTENTS|\
> 					 EXT4_FEATURE_INCOMPAT_FLEX_BG|\
> 					 EXT4_FEATURE_INCOMPAT_MMP|\
> -					 EXT4_FEATURE_INCOMPAT_64BIT|\
> -					 EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)
> +					 EXT4_FEATURE_INCOMPAT_64BIT)
> #else
> #define EXT2_LIB_FEATURE_INCOMPAT_SUPP	(EXT2_FEATURE_INCOMPAT_FILETYPE|\
> 					 EXT3_FEATURE_INCOMPAT_JOURNAL_DEV|\
> @@ -589,8 +588,7 @@ typedef struct ext2_icount *ext2_icount_t;
> 					 EXT3_FEATURE_INCOMPAT_EXTENTS|\
> 					 EXT4_FEATURE_INCOMPAT_FLEX_BG|\
> 					 EXT4_FEATURE_INCOMPAT_MMP|\
> -					 EXT4_FEATURE_INCOMPAT_64BIT|\
> -					 EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)
> +					 EXT4_FEATURE_INCOMPAT_64BIT)
> #endif
> #ifdef CONFIG_QUOTA
> #define EXT2_LIB_FEATURE_RO_COMPAT_SUPP	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER|\
> @@ -646,6 +644,12 @@ typedef struct stat ext2fs_struct_stat;
> /*
>  * function prototypes
>  */
> +static inline int ext2fs_has_group_desc_csum(ext2_filsys fs)
> +{
> +	return EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> +			EXT4_FEATURE_RO_COMPAT_GDT_CSUM |
> +			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM);
> +}
> 
> /* alloc.c */
> extern errcode_t ext2fs_new_inode(ext2_filsys fs, ext2_ino_t dir, int mode,
> diff --git a/lib/ext2fs/initialize.c b/lib/ext2fs/initialize.c
> index a63ea18..a22cab4 100644
> --- a/lib/ext2fs/initialize.c
> +++ b/lib/ext2fs/initialize.c
> @@ -435,8 +435,7 @@ ipg_retry:
> 	 * bitmaps will be accounted for when allocated).
> 	 */
> 	free_blocks = 0;
> -	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
> +	csum_flag = ext2fs_has_group_desc_csum(fs);
> 	for (i = 0; i < fs->group_desc_count; i++) {
> 		/*
> 		 * Don't set the BLOCK_UNINIT group for the last group
> diff --git a/lib/ext2fs/inode.c b/lib/ext2fs/inode.c
> index 74703c5..3e6d853 100644
> --- a/lib/ext2fs/inode.c
> +++ b/lib/ext2fs/inode.c
> @@ -157,8 +157,7 @@ errcode_t ext2fs_open_inode_scan(ext2_filsys fs, int buffer_blocks,
> 						     scan->current_group);
> 	scan->inodes_left = EXT2_INODES_PER_GROUP(scan->fs->super);
> 	scan->blocks_left = scan->fs->inode_blocks_per_group;
> -	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
> +	if (ext2fs_has_group_desc_csum(fs)) {
> 		scan->inodes_left -=
> 			ext2fs_bg_itable_unused(fs, scan->current_group);
> 		scan->blocks_left =
> @@ -183,8 +182,7 @@ errcode_t ext2fs_open_inode_scan(ext2_filsys fs, int buffer_blocks,
> 	}
> 	if (scan->fs->badblocks && scan->fs->badblocks->num)
> 		scan->scan_flags |= EXT2_SF_CHK_BADBLOCKS;
> -	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
> +	if (ext2fs_has_group_desc_csum(fs))
> 		scan->scan_flags |= EXT2_SF_DO_LAZY;
> 	*ret_scan = scan;
> 	return 0;
> @@ -250,8 +248,7 @@ static errcode_t get_next_blockgroup(ext2_inode_scan scan)
> 	scan->bytes_left = 0;
> 	scan->inodes_left = EXT2_INODES_PER_GROUP(fs->super);
> 	scan->blocks_left = fs->inode_blocks_per_group;
> -	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
> +	if (ext2fs_has_group_desc_csum(fs)) {
> 		scan->inodes_left -=
> 			ext2fs_bg_itable_unused(fs, scan->current_group);
> 		scan->blocks_left =
> diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
> index d2b64f4..2dc9b94 100644
> --- a/lib/ext2fs/openfs.c
> +++ b/lib/ext2fs/openfs.c
> @@ -382,8 +382,7 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
> 	 * If recovery is from backup superblock, Clear _UNININT flags &
> 	 * reset bg_itable_unused to zero
> 	 */
> -	if (superblock > 1 && EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -					EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
> +	if (superblock > 1 && ext2fs_has_group_desc_csum(fs)) {
> 		dgrp_t group;
> 
> 		for (group = 0; group < fs->group_desc_count; group++) {
> diff --git a/lib/ext2fs/rw_bitmaps.c b/lib/ext2fs/rw_bitmaps.c
> index a5097c1..18e18aa 100644
> --- a/lib/ext2fs/rw_bitmaps.c
> +++ b/lib/ext2fs/rw_bitmaps.c
> @@ -36,7 +36,7 @@ static errcode_t write_bitmaps(ext2_filsys fs, int do_inode, int do_block)
> 	unsigned int	nbits;
> 	errcode_t	retval;
> 	char		*block_buf = NULL, *inode_buf = NULL;
> -	int		csum_flag = 0;
> +	int		csum_flag;
> 	blk64_t		blk;
> 	blk64_t		blk_itr = EXT2FS_B2C(fs, fs->super->s_first_data_block);
> 	ext2_ino_t	ino_itr = 1;
> @@ -46,9 +46,7 @@ static errcode_t write_bitmaps(ext2_filsys fs, int do_inode, int do_block)
> 	if (!(fs->flags & EXT2_FLAG_RW))
> 		return EXT2_ET_RO_FILSYS;
> 
> -	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
> -		csum_flag = 1;
> +	csum_flag = ext2fs_has_group_desc_csum(fs);
> 
> 	inode_nbytes = block_nbytes = 0;
> 	if (do_block) {
> @@ -168,7 +166,7 @@ static errcode_t read_bitmaps(ext2_filsys fs, int do_inode, int do_block)
> 	errcode_t retval;
> 	int block_nbytes = EXT2_CLUSTERS_PER_GROUP(fs->super) / 8;
> 	int inode_nbytes = EXT2_INODES_PER_GROUP(fs->super) / 8;
> -	int csum_flag = 0;
> +	int csum_flag;
> 	int do_image = fs->flags & EXT2_FLAG_IMAGE_FILE;
> 	unsigned int	cnt;
> 	blk64_t	blk;
> @@ -181,9 +179,7 @@ static errcode_t read_bitmaps(ext2_filsys fs, int do_inode, int do_block)
> 
> 	fs->write_bitmaps = ext2fs_write_bitmaps;
> 
> -	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
> -		csum_flag = 1;
> +	csum_flag = ext2fs_has_group_desc_csum(fs);
> 
> 	retval = ext2fs_get_mem(strlen(fs->device_name) + 80, &buf);
> 	if (retval)
> diff --git a/misc/dumpe2fs.c b/misc/dumpe2fs.c
> index b8f386e..3ceb0f8 100644
> --- a/misc/dumpe2fs.c
> +++ b/misc/dumpe2fs.c
> @@ -114,7 +114,7 @@ static void print_bg_opts(ext2_filsys fs, dgrp_t i)
> {
> 	int first = 1, bg_flags = 0;
> 
> -	if (fs->super->s_feature_ro_compat & EXT4_FEATURE_RO_COMPAT_GDT_CSUM)
> +	if (ext2fs_has_group_desc_csum(fs))
> 		bg_flags = ext2fs_bg_flags(fs, i);
> 
> 	print_bg_opt(bg_flags, EXT2_BG_INODE_UNINIT, "INODE_UNINIT",
> @@ -190,7 +190,7 @@ static void list_desc (ext2_filsys fs)
> 		print_range(first_block, last_block);
> 		fputs(")", stdout);
> 		print_bg_opts(fs, i);
> -		if (fs->super->s_feature_ro_compat & EXT4_FEATURE_RO_COMPAT_GDT_CSUM)
> +		if (ext2fs_has_group_desc_csum(fs))
> 			printf(_("  Checksum 0x%04x, unused inodes %u\n"),
> 			       ext2fs_bg_checksum(fs, i),
> 			       ext2fs_bg_itable_unused(fs, i));
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index 8852735..f5d3d3b 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -885,8 +885,7 @@ static __u32 ok_features[3] = {
> 		EXT2_FEATURE_INCOMPAT_META_BG|
> 		EXT4_FEATURE_INCOMPAT_FLEX_BG|
> 		EXT4_FEATURE_INCOMPAT_MMP |
> -		EXT4_FEATURE_INCOMPAT_64BIT |
> -		EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM,
> +		EXT4_FEATURE_INCOMPAT_64BIT,
> 	/* R/O compat */
> 	EXT2_FEATURE_RO_COMPAT_LARGE_FILE|
> 		EXT4_FEATURE_RO_COMPAT_HUGE_FILE|
> @@ -2049,7 +2048,8 @@ static int should_do_undo(const char *name)
> 	int csum_flag, force_undo;
> 
> 	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(&fs_param,
> -					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
> +				EXT4_FEATURE_RO_COMPAT_GDT_CSUM |
> +				EXT4_FEATURE_RO_COMPAT_METADATA_CSUM);
> 	force_undo = get_int_from_profile(fs_types, "force_undo", 0);
> 	if (!force_undo && (!csum_flag || !lazy_itable_init))
> 		return 0;
> @@ -2306,19 +2306,6 @@ int main (int argc, char *argv[])
> 	if (!quiet &&
> 	    EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> 				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
> -		if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -				EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
> -			printf(_("Group descriptor checksums "
> -				 "are not enabled.  This reduces the "
> -				 "coverage of metadata checksumming.  "
> -				 "Pass -O uninit_bg to rectify.\n"));
> -		if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -				EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
> -		    !EXT2_HAS_INCOMPAT_FEATURE(fs->super,
> -				EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM))
> -			printf(_("Group descriptor checksums will not use "
> -				 "the faster metadata_checksum algorithm.  "
> -				 "Pass -O bg_use_meta_csum to rectify.\n"));
> 		if (!EXT2_HAS_INCOMPAT_FEATURE(fs->super,
> 				EXT3_FEATURE_INCOMPAT_EXTENTS))
> 			printf(_("Extents are not enabled.  The file extent "
> @@ -2358,6 +2345,7 @@ int main (int argc, char *argv[])
> 	    (fs_param.s_feature_ro_compat &
> 	     (EXT4_FEATURE_RO_COMPAT_HUGE_FILE|EXT4_FEATURE_RO_COMPAT_GDT_CSUM|
> 	      EXT4_FEATURE_RO_COMPAT_DIR_NLINK|
> +	      EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|
> 	      EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE)))
> 		fs->super->s_kbytes_written = 1;
> 
> @@ -2505,8 +2493,7 @@ int main (int argc, char *argv[])
> 		 * inodes as unused; we want e2fsck to consider all
> 		 * inodes as potentially containing recoverable data.
> 		 */
> -		if (fs->super->s_feature_ro_compat &
> -		    EXT4_FEATURE_RO_COMPAT_GDT_CSUM) {
> +		if (ext2fs_has_group_desc_csum(fs)) {
> 			for (i = 1; i < fs->group_desc_count; i++)
> 				ext2fs_bg_itable_unused_set(fs, i, 0);
> 		}
> diff --git a/misc/tune2fs.c b/misc/tune2fs.c
> index cba4d4c..5a55412 100644
> --- a/misc/tune2fs.c
> +++ b/misc/tune2fs.c
> @@ -92,7 +92,6 @@ static unsigned long new_inode_size;
> static char *ext_mount_opts;
> static int usrquota, grpquota;
> static int rewrite_checksums;
> -static int rewrite_bgs_for_checksum;
> 
> int journal_size, journal_flags;
> char *journal_device;
> @@ -138,8 +137,7 @@ static __u32 ok_features[3] = {
> 	EXT2_FEATURE_INCOMPAT_FILETYPE |
> 		EXT3_FEATURE_INCOMPAT_EXTENTS |
> 		EXT4_FEATURE_INCOMPAT_FLEX_BG |
> -		EXT4_FEATURE_INCOMPAT_MMP |
> -		EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM,
> +		EXT4_FEATURE_INCOMPAT_MMP,
> 	/* R/O compat */
> 	EXT2_FEATURE_RO_COMPAT_LARGE_FILE |
> 		EXT4_FEATURE_RO_COMPAT_HUGE_FILE|
> @@ -159,8 +157,7 @@ static __u32 clear_ok_features[3] = {
> 	/* Incompat */
> 	EXT2_FEATURE_INCOMPAT_FILETYPE |
> 		EXT4_FEATURE_INCOMPAT_FLEX_BG |
> -		EXT4_FEATURE_INCOMPAT_MMP |
> -		EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM,
> +		EXT4_FEATURE_INCOMPAT_MMP,
> 	/* R/O compat */
> 	EXT2_FEATURE_RO_COMPAT_LARGE_FILE |
> 		EXT4_FEATURE_RO_COMPAT_HUGE_FILE|
> @@ -718,29 +715,6 @@ static void rewrite_metadata_checksums(ext2_filsys fs)
> }
> 
> /*
> - * Rewrite just the block group checksums.  Only call this function if
> - * you're _not_ calling rewrite_metadata_checksums; this function exists
> - * to handle the case that you're changing bg_use_meta_csum and NOT changing
> - * either gdt_csum or metadata_csum.
> - */
> -static void rewrite_bg_checksums(ext2_filsys fs)
> -{
> -	int i;
> -
> -	if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -					EXT4_FEATURE_RO_COMPAT_GDT_CSUM) ||
> -	    !EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -					EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
> -		return;
> -
> -	ext2fs_init_csum_seed(fs);
> -	for (i = 0; i < fs->group_desc_count; i++)
> -		ext2fs_group_desc_csum_set(fs, i);
> -	fs->flags &= ~EXT2_FLAG_SUPER_ONLY;
> -	ext2fs_mark_super_dirty(fs);
> -}
> -
> -/*
>  * Update the feature set as provided by the user.
>  */
> static int update_feature_set(ext2_filsys fs, char *features)
> @@ -912,20 +886,6 @@ mmp_error:
> 		}
> 	}
> 
> -	if (FEATURE_ON(E2P_FEATURE_INCOMPAT,
> -		       EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)) {
> -		if (check_fsck_needed(fs))
> -			exit(1);
> -		rewrite_bgs_for_checksum = 1;
> -	}
> -
> -	if (FEATURE_OFF(E2P_FEATURE_INCOMPAT,
> -			EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)) {
> -		if (check_fsck_needed(fs))
> -			exit(1);
> -		rewrite_bgs_for_checksum = 1;
> -	}
> -
> 	if (FEATURE_ON(E2P_FEATURE_RO_INCOMPAT,
> 		       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
> 		if (check_fsck_needed(fs))
> @@ -965,7 +925,7 @@ mmp_error:
> 			}
> 			gd->bg_itable_unused = 0;
> 			gd->bg_flags = 0;
> -			gd->bg_checksum = 0;
> +			ext2fs_group_desc_csum_set(fs, i);
> 		}
> 		fs->flags &= ~EXT2_FLAG_SUPER_ONLY;
> 	}
> @@ -2588,8 +2548,6 @@ retry_open:
> 	}
> 	if (rewrite_checksums)
> 		rewrite_metadata_checksums(fs);
> -	else if (rewrite_bgs_for_checksum)
> -		rewrite_bg_checksums(fs);
> 	if (I_flag) {
> 		if (mount_flags & EXT2_MF_MOUNTED) {
> 			fputs(_("The inode size may only be "
> diff --git a/resize/resize2fs.c b/resize/resize2fs.c
> index dc2805d..8a02ff4 100644
> --- a/resize/resize2fs.c
> +++ b/resize/resize2fs.c
> @@ -191,8 +191,7 @@ static void fix_uninit_block_bitmaps(ext2_filsys fs)
> 	int		old_desc_blocks;
> 	dgrp_t		g;
> 
> -	if (!(EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -					 EXT4_FEATURE_RO_COMPAT_GDT_CSUM)))
> +	if (!ext2fs_has_group_desc_csum(fs))
> 		return;
> 
> 	for (g=0; g < fs->group_desc_count; g++) {
> @@ -482,8 +481,7 @@ retry:
> 	group_block = fs->super->s_first_data_block +
> 		old_fs->group_desc_count * fs->super->s_blocks_per_group;
> 
> -	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
> +	csum_flag = ext2fs_has_group_desc_csum(fs);
> 	adj = old_fs->group_desc_count;
> 	max_group = fs->group_desc_count - adj;
> 	if (fs->super->s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG)
> @@ -743,8 +741,7 @@ static void mark_fs_metablock(ext2_resize_t rfs,
> 	} else if (IS_INODE_TB(fs, group, blk)) {
> 		ext2fs_inode_table_loc_set(fs, group, 0);
> 		rfs->needed_blocks++;
> -	} else if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -					      EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
> +	} else if (ext2fs_has_group_desc_csum(fs) &&
> 		   (ext2fs_bg_flags_test(fs, group, EXT2_BG_BLOCK_UNINIT))) {
> 		/*
> 		 * If the block bitmap is uninitialized, which means
> @@ -804,8 +801,7 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
> 	for (blk = ext2fs_blocks_count(fs->super);
> 	     blk < ext2fs_blocks_count(old_fs->super); blk++) {
> 		g = ext2fs_group_of_blk2(fs, blk);
> -		if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> -					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
> +		if (ext2fs_has_group_desc_csum(fs) &&
> 		    ext2fs_bg_flags_test(old_fs, g, EXT2_BG_BLOCK_UNINIT)) {
> 			/*
> 			 * The block bitmap is uninitialized, so skip
> 


Cheers, Andreas

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC] ext4: Rework metadata_csum/gdt_csum flag handling in kernel
  2012-02-29  1:32       ` [RFC] ext4: Rework metadata_csum/gdt_csum flag handling in kernel Darrick J. Wong
@ 2012-02-29  5:48         ` Andreas Dilger
  2012-03-03  3:56           ` [RFC v2] " Darrick J. Wong
  0 siblings, 1 reply; 31+ messages in thread
From: Andreas Dilger @ 2012-02-29  5:48 UTC (permalink / raw)
  To: djwong
  Cc: Ted Ts'o, Sunil Mushran, Martin K Petersen, Greg Freemyer,
	Amir Goldstein, linux-kernel, Andi Kleen, Mingming Cao,
	Joel Becker, linux-fsdevel, linux-ext4, Coly Li

On 2012-02-28, at 6:32 PM, Darrick J. Wong wrote:
> Ok, I've reworked the block group descriptor checksum handling code per this
> email thread.  INCOMPAT_BG_USE_META_CSUM is gone.  METADATA_CSUM implies (and
> in fact overrides) GDT_CSUM, though the group descriptor checksum uses the same
> function as all other metadata blocks' checksums (by default crc32c).  I
> created a helper function to determine if group descriptor checksums are
> enabled, and the actual gdt checksum verify/set functions are smart enough to
> use the correct function.
> 
> Below are the changes that I intend to make to the kernel.  I'll integrate these
> changes into the (huge) kernel patchset, but wanted to aggregate the changes
> here first to avoid overwhelming reviewers.
> 
> Question: What will happen to old kernels when METADATA_CSUM and GDT_CSUM are
> set?  Should the kernel reject the combination and ask for fsck?  I think it
> will be ok, but older kernels might not be...?

As with the e2fsprogs patch, I think METADATA_CSUM should override GDT_CSUM
completely.  If both are set, then the kernel should ignore GDT_CSUM entirely
and just use the new checksum algorithm for the group descriptors.  It is up
to the user tools not to allow this combination of features to be set, and
there is no value in adding an extra failure case if they are (though if the
superblock checksum is also incorrect, that means the superblock is broken,
and a backup should be used and/or the mount should fail).
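
The override rule described here can be sketched as a small selection function.  This is an illustration under stated assumptions rather than the kernel implementation: the enum and the names pick_gd_csum_alg and has_group_desc_csum (operating on a plain CPU-order flag word) are hypothetical, and crc16 stands for the existing GDT_CSUM algorithm.

```c
#include <assert.h>
#include <stdint.h>

#define RO_COMPAT_GDT_CSUM       0x0010u
#define RO_COMPAT_METADATA_CSUM  0x0400u

enum gd_csum_alg { GD_CSUM_NONE, GD_CSUM_CRC16, GD_CSUM_CRC32C };

/* Group descriptor checksums are enabled if either feature is set. */
static int has_group_desc_csum(uint32_t ro_compat)
{
	return (ro_compat &
		(RO_COMPAT_GDT_CSUM | RO_COMPAT_METADATA_CSUM)) != 0;
}

/* METADATA_CSUM wins; GDT_CSUM only matters when it is set on its own. */
static enum gd_csum_alg pick_gd_csum_alg(uint32_t ro_compat)
{
	enum gd_csum_alg alg = GD_CSUM_NONE;

	if (ro_compat & RO_COMPAT_METADATA_CSUM)
		alg = GD_CSUM_CRC32C;	/* GDT_CSUM is ignored entirely */
	else if (ro_compat & RO_COMPAT_GDT_CSUM)
		alg = GD_CSUM_CRC16;

	/* Invariant: checksums are on exactly when an algorithm was picked. */
	assert(has_group_desc_csum(ro_compat) == (alg != GD_CSUM_NONE));
	return alg;
}
```

Note that the both-flags case is not an error path here; it simply resolves to the new algorithm, which is the "no extra failure case" behaviour argued for above.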

> Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
> ---

One minor comment below, but I think this patch is the right approach.  I was
also going to proffer my Acked-by: for this patch, but I now recall that this
patch is itself short-lived and will be merged into the patch series.

> fs/ext4/balloc.c  |    4 ++--
> fs/ext4/ext4.h    |   20 +++++++++++++++-----
> fs/ext4/ialloc.c  |   19 ++++++++-----------
> fs/ext4/inode.c   |    3 +--
> fs/ext4/mballoc.c |    6 +++---
> fs/ext4/resize.c  |    9 +++------
> fs/ext4/super.c   |   23 ++++++++++-------------
> 7 files changed, 42 insertions(+), 42 deletions(-)
> 
> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> index 6eee0e6..b5a7951 100644
> --- a/fs/ext4/balloc.c
> +++ b/fs/ext4/balloc.c
> @@ -168,7 +168,7 @@ void ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
> 
> 	/* If checksum is bad mark all blocks used to prevent allocation
> 	 * essentially implementing a per-group read-only flag. */
> -	if (!ext4_group_desc_csum_verify(sbi, block_group, gdp)) {
> +	if (!ext4_group_desc_csum_verify(sb, block_group, gdp)) {
> 		ext4_error(sb, "Checksum bad for group %u", block_group);
> 		ext4_free_group_clusters_set(sb, gdp, 0);
> 		ext4_free_inodes_set(sb, gdp, 0);
> @@ -214,7 +214,7 @@ void ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
> 			     sb->s_blocksize * 8, bh->b_data);
> 	ext4_block_bitmap_csum_set(sb, block_group, gdp, bh,
> 				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
> -	ext4_group_desc_csum_set(sbi, block_group, gdp);
> +	ext4_group_desc_csum_set(sb, block_group, gdp);
> }
> 
> /* Return the number of free blocks in a block group.  It is used when
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 70bd236..a518930 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1434,6 +1434,11 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
> #define EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE	0x0040
> #define EXT4_FEATURE_RO_COMPAT_QUOTA		0x0100
> #define EXT4_FEATURE_RO_COMPAT_BIGALLOC		0x0200
> +/*
> + * METADATA_CSUM implies GDT_CSUM.  When METADATA_CSUM is set, group

This should also get an explicit comment that METADATA_CSUM overrides and
is mutually exclusive with GDT_CSUM.

> + * descriptor checksums use the same algorithm as all other data
> + * structures' checksums.
> + */
> #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM	0x0400
> 
> #define EXT4_FEATURE_INCOMPAT_COMPRESSION	0x0001
> @@ -1449,7 +1454,6 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
> #define EXT4_FEATURE_INCOMPAT_DIRDATA		0x1000 /* data in dirent */
> #define EXT4_FEATURE_INCOMPAT_INLINEDATA	0x2000 /* data in inode */
> #define EXT4_FEATURE_INCOMPAT_LARGEDIR		0x4000 /* >2GB or 3-lvl htree */
> -#define EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM	0x8000
> 
> #define EXT2_FEATURE_COMPAT_SUPP	EXT4_FEATURE_COMPAT_EXT_ATTR
> #define EXT2_FEATURE_INCOMPAT_SUPP	(EXT4_FEATURE_INCOMPAT_FILETYPE| \
> @@ -1473,8 +1477,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
> 					 EXT4_FEATURE_INCOMPAT_EXTENTS| \
> 					 EXT4_FEATURE_INCOMPAT_64BIT| \
> 					 EXT4_FEATURE_INCOMPAT_FLEX_BG| \
> -					 EXT4_FEATURE_INCOMPAT_MMP| \
> -					 EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)
> +					 EXT4_FEATURE_INCOMPAT_MMP)
> #define EXT4_FEATURE_RO_COMPAT_SUPP	(EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER| \
> 					 EXT4_FEATURE_RO_COMPAT_LARGE_FILE| \
> 					 EXT4_FEATURE_RO_COMPAT_GDT_CSUM| \
> @@ -2092,11 +2095,18 @@ extern void ext4_used_dirs_set(struct super_block *sb,
> 				struct ext4_group_desc *bg, __u32 count);
> extern void ext4_itable_unused_set(struct super_block *sb,
> 				   struct ext4_group_desc *bg, __u32 count);
> -extern int ext4_group_desc_csum_verify(struct ext4_sb_info *sbi, __u32 group,
> +extern int ext4_group_desc_csum_verify(struct super_block *sb, __u32 group,
> 				       struct ext4_group_desc *gdp);
> -extern void ext4_group_desc_csum_set(struct ext4_sb_info *sbi, __u32 group,
> +extern void ext4_group_desc_csum_set(struct super_block *sb, __u32 group,
> 				     struct ext4_group_desc *gdp);
> 
> +static inline int ext4_has_group_desc_csum(struct super_block *sb)
> +{
> +	return EXT4_HAS_RO_COMPAT_FEATURE(sb,
> +					  EXT4_FEATURE_RO_COMPAT_GDT_CSUM |
> +					  EXT4_FEATURE_RO_COMPAT_METADATA_CSUM);
> +}
> +
> static inline ext4_fsblk_t ext4_blocks_count(struct ext4_super_block *es)
> {
> 	return ((ext4_fsblk_t)le32_to_cpu(es->s_blocks_count_hi) << 32) |
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index b9b6b27..1ade34d 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -70,13 +70,11 @@ static unsigned ext4_init_inode_bitmap(struct super_block *sb,
> 				       ext4_group_t block_group,
> 				       struct ext4_group_desc *gdp)
> {
> -	struct ext4_sb_info *sbi = EXT4_SB(sb);
> -
> 	J_ASSERT_BH(bh, buffer_locked(bh));
> 
> 	/* If checksum is bad mark all blocks and inodes use to prevent
> 	 * allocation, essentially implementing a per-group read-only flag. */
> -	if (!ext4_group_desc_csum_verify(sbi, block_group, gdp)) {
> +	if (!ext4_group_desc_csum_verify(sb, block_group, gdp)) {
> 		ext4_error(sb, "Checksum bad for group %u", block_group);
> 		ext4_free_group_clusters_set(sb, gdp, 0);
> 		ext4_free_inodes_set(sb, gdp, 0);
> @@ -92,7 +90,7 @@ static unsigned ext4_init_inode_bitmap(struct super_block *sb,
> 			bh->b_data);
> 	ext4_inode_bitmap_csum_set(sb, block_group, gdp, bh,
> 				   EXT4_INODES_PER_GROUP(sb) / 8);
> -	ext4_group_desc_csum_set(sbi, block_group, gdp);
> +	ext4_group_desc_csum_set(sb, block_group, gdp);
> 
> 	return EXT4_INODES_PER_GROUP(sb);
> }
> @@ -287,7 +285,7 @@ void ext4_free_inode(handle_t *handle, struct inode *inode)
> 	}
> 	ext4_inode_bitmap_csum_set(sb, block_group, gdp, bitmap_bh,
> 				   EXT4_INODES_PER_GROUP(sb) / 8);
> -	ext4_group_desc_csum_set(sbi, block_group, gdp);
> +	ext4_group_desc_csum_set(sb, block_group, gdp);
> 	ext4_unlock_group(sb, block_group);
> 
> 	percpu_counter_inc(&sbi->s_freeinodes_counter);
> @@ -657,8 +655,7 @@ static int ext4_claim_inode(struct super_block *sb,
> 	}
> 	/* If we didn't allocate from within the initialized part of the inode
> 	 * table then we need to initialize up to this inode. */
> -	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
> -
> +	if (ext4_has_group_desc_csum(sb)) {
> 		if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
> 			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
> 			/* When marking the block group with
> @@ -697,7 +694,7 @@ static int ext4_claim_inode(struct super_block *sb,
> 	}
> 	ext4_inode_bitmap_csum_set(sb, group, gdp, inode_bitmap_bh,
> 				   EXT4_INODES_PER_GROUP(sb) / 8);
> -	ext4_group_desc_csum_set(sbi, group, gdp);
> +	ext4_group_desc_csum_set(sb, group, gdp);
> err_ret:
> 	ext4_unlock_group(sb, group);
> 	up_read(&grp->alloc_sem);
> @@ -832,7 +829,7 @@ repeat_in_this_group:
> 
> got:
> 	/* We may have to initialize the block bitmap if it isn't already */
> -	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
> +	if (ext4_has_group_desc_csum(sb) &&
> 	    gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
> 		struct buffer_head *block_bitmap_bh;
> 
> @@ -858,7 +855,7 @@ got:
> 						   block_bitmap_bh,
> 						   EXT4_BLOCKS_PER_GROUP(sb) /
> 						   8);
> -			ext4_group_desc_csum_set(sbi, group, gdp);
> +			ext4_group_desc_csum_set(sb, group, gdp);
> 		}
> 		ext4_unlock_group(sb, group);
> 
> @@ -1226,7 +1223,7 @@ int ext4_init_inode_table(struct super_block *sb, ext4_group_t group,
> skip_zeroout:
> 	ext4_lock_group(sb, group);
> 	gdp->bg_flags |= cpu_to_le16(EXT4_BG_INODE_ZEROED);
> -	ext4_group_desc_csum_set(sbi, group, gdp);
> +	ext4_group_desc_csum_set(sb, group, gdp);
> 	ext4_unlock_group(sb, group);
> 
> 	BUFFER_TRACE(group_desc_bh,
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index c0200cf..e94ac91 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3573,8 +3573,7 @@ make_io:
> 				b = table;
> 			end = b + EXT4_SB(sb)->s_inode_readahead_blks;
> 			num = EXT4_INODES_PER_GROUP(sb);
> -			if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
> -				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
> +			if (ext4_has_group_desc_csum(sb))
> 				num -= ext4_itable_unused_count(sb, gdp);
> 			table += num / inodes_per_block;
> 			if (end > table)
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 5f2e2ed..d6062e7 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2919,7 +2919,7 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
> 	ext4_free_group_clusters_set(sb, gdp, len);
> 	ext4_block_bitmap_csum_set(sb, ac->ac_b_ex.fe_group, gdp, bitmap_bh,
> 				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
> -	ext4_group_desc_csum_set(sbi, ac->ac_b_ex.fe_group, gdp);
> +	ext4_group_desc_csum_set(sb, ac->ac_b_ex.fe_group, gdp);
> 
> 	ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
> 	percpu_counter_sub(&sbi->s_freeclusters_counter, ac->ac_b_ex.fe_len);
> @@ -4787,7 +4787,7 @@ do_more:
> 	ext4_free_group_clusters_set(sb, gdp, ret);
> 	ext4_block_bitmap_csum_set(sb, block_group, gdp, bitmap_bh,
> 				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
> -	ext4_group_desc_csum_set(sbi, block_group, gdp);
> +	ext4_group_desc_csum_set(sb, block_group, gdp);
> 	ext4_unlock_group(sb, block_group);
> 	percpu_counter_add(&sbi->s_freeclusters_counter, count_clusters);
> 
> @@ -4933,7 +4933,7 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
> 	ext4_free_group_clusters_set(sb, desc, blk_free_count);
> 	ext4_block_bitmap_csum_set(sb, block_group, desc, bitmap_bh,
> 				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
> -	ext4_group_desc_csum_set(sbi, block_group, desc);
> +	ext4_group_desc_csum_set(sb, block_group, desc);
> 	ext4_unlock_group(sb, block_group);
> 	percpu_counter_add(&sbi->s_freeclusters_counter,
> 			   EXT4_B2C(sbi, blocks_freed));
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index 2363532..21ace95 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -1106,7 +1106,7 @@ static int ext4_setup_new_descs(handle_t *handle, struct super_block *sb,
> 					     EXT4_B2C(sbi, group_data->free_blocks_count));
> 		ext4_free_inodes_set(sb, gdp, EXT4_INODES_PER_GROUP(sb));
> 		gdp->bg_flags = cpu_to_le16(*bg_flags);
> -		ext4_group_desc_csum_set(sbi, group, gdp);
> +		ext4_group_desc_csum_set(sb, group, gdp);
> 
> 		err = ext4_handle_dirty_metadata(handle, NULL, gdb_bh);
> 		if (unlikely(err)) {
> @@ -1342,17 +1342,14 @@ static int ext4_setup_next_flex_gd(struct super_block *sb,
> 			   (1 + ext4_bg_num_gdb(sb, group + i) +
> 			    le16_to_cpu(es->s_reserved_gdt_blocks)) : 0;
> 		group_data[i].free_blocks_count = blocks_per_group - overhead;
> -		if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
> -					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
> +		if (ext4_has_group_desc_csum(sb))
> 			flex_gd->bg_flags[i] = EXT4_BG_BLOCK_UNINIT |
> 					       EXT4_BG_INODE_UNINIT;
> 		else
> 			flex_gd->bg_flags[i] = EXT4_BG_INODE_ZEROED;
> 	}
> 
> -	if (last_group == n_group &&
> -	    EXT4_HAS_RO_COMPAT_FEATURE(sb,
> -				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
> +	if (last_group == n_group && ext4_has_group_desc_csum(sb))
> 		/* We need to initialize block bitmap of last group. */
> 		flex_gd->bg_flags[i - 1] &= ~EXT4_BG_BLOCK_UNINIT;
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 2190044..6196bfa 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -2097,9 +2097,7 @@ static __le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 block_group,
> 	__le32 le_group = cpu_to_le32(block_group);
> 
> 	if ((sbi->s_es->s_feature_ro_compat &
> -	     cpu_to_le32(EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) &&
> -	    (sbi->s_es->s_feature_incompat &
> -	     cpu_to_le32(EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM))) {
> +	     cpu_to_le32(EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))) {
> 		/* Use new metadata_csum algorithm */
> 		__u16 old_csum;
> 		__u32 csum32;
> @@ -2135,24 +2133,23 @@ out:
> 	return cpu_to_le16(crc);
> }
> 
> -int ext4_group_desc_csum_verify(struct ext4_sb_info *sbi, __u32 block_group,
> +int ext4_group_desc_csum_verify(struct super_block *sb, __u32 block_group,
> 				struct ext4_group_desc *gdp)
> {
> -	if ((sbi->s_es->s_feature_ro_compat &
> -	     cpu_to_le32(EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) &&
> -	    (gdp->bg_checksum != ext4_group_desc_csum(sbi, block_group, gdp)))
> +	if (ext4_has_group_desc_csum(sb) &&
> +	    (gdp->bg_checksum != ext4_group_desc_csum(EXT4_SB(sb),
> +						      block_group, gdp)))
> 		return 0;
> 
> 	return 1;
> }
> 
> -void ext4_group_desc_csum_set(struct ext4_sb_info *sbi, __u32 block_group,
> +void ext4_group_desc_csum_set(struct super_block *sb, __u32 block_group,
> 			      struct ext4_group_desc *gdp)
> {
> -	if (!(sbi->s_es->s_feature_ro_compat &
> -	     cpu_to_le32(EXT4_FEATURE_RO_COMPAT_GDT_CSUM)))
> +	if (!ext4_has_group_desc_csum(sb))
> 		return;
> -	gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
> +	gdp->bg_checksum = ext4_group_desc_csum(EXT4_SB(sb), block_group, gdp);
> }
> 
> /* Called at mount-time, super-block is locked */
> @@ -2209,7 +2206,7 @@ static int ext4_check_descriptors(struct super_block *sb,
> 			return 0;
> 		}
> 		ext4_lock_group(sb, i);
> -		if (!ext4_group_desc_csum_verify(sbi, i, gdp)) {
> +		if (!ext4_group_desc_csum_verify(sb, i, gdp)) {
> 			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
> 				 "Checksum for group %u failed (%u!=%u)",
> 				 i, le16_to_cpu(ext4_group_desc_csum(sbi, i,
> @@ -4620,7 +4617,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
> 				struct ext4_group_desc *gdp =
> 					ext4_get_group_desc(sb, g, NULL);
> 
> -				if (!ext4_group_desc_csum_verify(sbi, g, gdp)) {
> +				if (!ext4_group_desc_csum_verify(sb, g, gdp)) {
> 					ext4_msg(sb, KERN_ERR,
> 	       "ext4_remount: Checksum for group %u failed (%u!=%u)",
> 		g, le16_to_cpu(ext4_group_desc_csum(sbi, g, gdp)),
> 


Cheers, Andreas







* [RFC v2] e2fsprogs: Rework metadata_csum/gdt_csum flag handling
  2012-02-29  5:40         ` Andreas Dilger
@ 2012-03-03  3:50           ` Darrick J. Wong
  0 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-03-03  3:50 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Ted Ts'o, Sunil Mushran, Martin K Petersen, Greg Freemyer,
	Amir Goldstein, linux-kernel, Andi Kleen, Mingming Cao,
	Joel Becker, linux-fsdevel, linux-ext4, Coly Li

On Tue, Feb 28, 2012 at 10:40:59PM -0700, Andreas Dilger wrote:
> On 2012-02-28, at 6:27 PM, Darrick J. Wong wrote:
> > Ok, I've reworked the block group descriptor checksum handling code per this
> > email thread.  INCOMPAT_BG_USE_META_CSUM is gone.  METADATA_CSUM implies (and
> > in fact overrides) GDT_CSUM.  When both are set, the group descriptor checksum
> > uses the same function as all other metadata blocks' checksums (by default
> > crc32c).  I created a helper function to determine if group descriptor
> > checksums are enabled, and the actual gdt checksum verify/set functions are
> > smart enough to use the correct function.
> > 
> > Below are the changes that I intend to make to e2fsprogs.  I'll integrate these
> > changes into the (huge) e2fsprogs patchset, but wanted to aggregate the changes
> > here first to avoid overwhelming reviewers.  I'll send a kernel patch shortly.
> > 
> > Question: What will happen to old kernels when METADATA_CSUM and GDT_CSUM are
> > set?
> 
> This should never be allowed by the tools, and should be treated by e2fsck as
> an error, that is fixed by clearing GDT_CSUM and leaving METADATA_CSUM set.
> 
> > Should tune2fs/e2fsck change METADATA_CSUM|GDT_CSUM to only METADATA_CSUM
> > if they encounter it?
> 
> Yes.
> 
> > I'm a little concerned that a pre-METADATA_CSUM kernel will see the GDT_CSUM
> > flag and assume it's ok to proceed in ro mode and get confused.
> 
> Right, so if tune2fs/mke2fs set METADATA_CSUM and always disable GDT_CSUM at
> the same time there will be no problem.  e2fsck will correct this in case it
> is seen in the wild.  This should be rare, since it means the other feature
> flags are also corrupted, and that will probably force the use of a backup
> superblock, or make mincemeat of the filesystem for other reasons (bad
> checksums cannot themselves corrupt the filesystem).
> 
> > Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
> > ---
> 
> Looks like a net win all around.  One comment inline, but you can add my
> 
> Acked-by: Andreas Dilger <adilger@dilger.ca>

I fixed the comment, and modified the tools so that metadata_csum and uninit_bg
can't both be set at the same time.  This made tune2fs handling a bit trickier,
because now I had to deal with transitioning the filesystem between
metadata_csum, uninit_bg, and neither flag being set.  I think I covered all
possible transitions of those flags in my testing matrix. :)
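
As a rough sketch of the three states and the transitions between them (this
is not the tune2fs code; the enum and both function names are hypothetical,
and the flag values are local copies of the RO_COMPAT bits), the invariant
"never both bits at once" could be expressed like this:

```c
#include <stdint.h>

/* Local copies of the ro_compat feature bits so the sketch builds alone. */
#define RO_COMPAT_GDT_CSUM	0x0010u
#define RO_COMPAT_METADATA_CSUM	0x0400u

/* Hypothetical names for the three legal checksum states. */
enum csum_state { CSUM_NEITHER, CSUM_UNINIT_BG, CSUM_METADATA };

/* Classify the flags; both bits set is read as metadata_csum (override). */
static enum csum_state get_csum_state(uint32_t ro_compat)
{
	if (ro_compat & RO_COMPAT_METADATA_CSUM)
		return CSUM_METADATA;
	if (ro_compat & RO_COMPAT_GDT_CSUM)
		return CSUM_UNINIT_BG;
	return CSUM_NEITHER;
}

/* Clear both bits, then set at most one, so the result is always legal. */
static uint32_t set_csum_state(uint32_t ro_compat, enum csum_state target)
{
	ro_compat &= ~(RO_COMPAT_GDT_CSUM | RO_COMPAT_METADATA_CSUM);
	if (target == CSUM_UNINIT_BG)
		ro_compat |= RO_COMPAT_GDT_CSUM;
	else if (target == CSUM_METADATA)
		ro_compat |= RO_COMPAT_METADATA_CSUM;
	return ro_compat;
}
```

The sketch only covers the feature bits.  Any change of state would also
require rewriting the group descriptor checksums, which is omitted here.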

Question: What do we do when clearing metadata_csum?  Right now the code can
handle transitions to either ^metadata_csum,uninit_bg (mostly a matter of
rewriting the gdt checksums and bitwise operations) or
^metadata_csum,^uninit_bg.  If, however, the user doesn't explicitly specify an
uninit_bg setting, what do we default to?  Defaulting to metadata_csum ->
^uninit_bg is least surprising (command line args work as expected) but then
all the group descriptors have to be rewritten, and uninit bitmaps have to be
initialized.  On the other hand, metadata_csum -> uninit_bg causes fewer
changes to the fs.

Also it turns out that the old code to turn off uninit_bg is broken -- in the
case of a group with uninit bitmaps, it will zero out the group flags (clearing
the uninit bitmap status) but does not zero out the bitmap.  This causes the
next fs driver to see garbage in the bitmaps.
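
A toy model of the ordering problem described above, offered as a sketch only:
struct toy_group and toy_clear_block_uninit() are hypothetical and do not
correspond to libext2fs structures, and the plain memset() merely stands in for
whatever bitmap initialization the real tools need to perform.

```c
#include <stdint.h>
#include <string.h>

/* Local copies of the group descriptor flag bits. */
#define BG_INODE_UNINIT	0x0001
#define BG_BLOCK_UNINIT	0x0002

/* Hypothetical in-memory view of one block group. */
struct toy_group {
	uint16_t flags;
	uint8_t block_bitmap[8];	/* contents undefined while UNINIT */
};

/*
 * Clearing BLOCK_UNINIT tells the next reader that the bitmap is valid,
 * so the bitmap has to be initialized before the flag is dropped.
 */
static void toy_clear_block_uninit(struct toy_group *g)
{
	if (!(g->flags & BG_BLOCK_UNINIT))
		return;		/* bitmap already holds real data */
	memset(g->block_bitmap, 0, sizeof(g->block_bitmap));
	g->flags &= (uint16_t)~BG_BLOCK_UNINIT;
}
```

Dropping the flag without the initialization step is the failure mode described
above: the on-disk garbage would then be read as a valid bitmap.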

I also fixed an e2fsck problem code that I hadn't specified in problem.c.

So, here's v2 (which I will integrate into the main patch series when the dust
settles).

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---

 debugfs/debugfs.c        |    3 -
 e2fsck/pass5.c           |   18 ++----
 e2fsck/problem.c         |   20 ++++++
 e2fsck/problem.h         |    5 ++
 e2fsck/super.c           |   16 +++++
 e2fsck/unix.c            |    2 -
 lib/e2p/feature.c        |    2 -
 lib/ext2fs/alloc.c       |    6 +-
 lib/ext2fs/alloc_stats.c |    3 -
 lib/ext2fs/csum.c        |   13 +---
 lib/ext2fs/ext2_fs.h     |    6 ++
 lib/ext2fs/ext2fs.h      |   12 +++-
 lib/ext2fs/initialize.c  |    3 -
 lib/ext2fs/inode.c       |    9 +--
 lib/ext2fs/openfs.c      |    3 -
 lib/ext2fs/rw_bitmaps.c  |   12 +---
 misc/dumpe2fs.c          |    4 +
 misc/mke2fs.c            |   29 ++++-----
 misc/tune2fs.c           |  142 ++++++++++++++++++++++++++--------------------
 resize/resize2fs.c       |   12 +---
 20 files changed, 170 insertions(+), 150 deletions(-)

diff --git a/debugfs/debugfs.c b/debugfs/debugfs.c
index c1cbf06..9c8e48e 100644
--- a/debugfs/debugfs.c
+++ b/debugfs/debugfs.c
@@ -357,8 +357,7 @@ void do_show_super_stats(int argc, char *argv[])
 		return;
 	}
 
-	gdt_csum = EXT2_HAS_RO_COMPAT_FEATURE(current_fs->super,
-					      EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
+	gdt_csum = ext2fs_has_group_desc_csum(current_fs);
 	for (i = 0; i < current_fs->group_desc_count; i++) {
 		fprintf(out, " Group %2d: block bitmap at %llu, "
 		        "inode bitmap at %llu, "
diff --git a/e2fsck/pass5.c b/e2fsck/pass5.c
index f1ce6d7..c5dba0b 100644
--- a/e2fsck/pass5.c
+++ b/e2fsck/pass5.c
@@ -88,7 +88,7 @@ static void check_inode_bitmap_checksum(e2fsck_t ctx)
 	int		nbytes;
 	ext2_ino_t	ino_itr;
 	errcode_t	retval;
-	int		csum_flag = 0;
+	int		csum_flag;
 
 	/* If bitmap is dirty from being fixed, checksum will be corrected */
 	if (ext2fs_test_ib_dirty(ctx->fs))
@@ -103,9 +103,7 @@ static void check_inode_bitmap_checksum(e2fsck_t ctx)
 		fatal_error(ctx, 0);
 	}
 
-	if (EXT2_HAS_RO_COMPAT_FEATURE(ctx->fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
-		csum_flag = 1;
+	csum_flag = ext2fs_has_group_desc_csum(ctx->fs);
 
 	clear_problem_context(&pctx);
 	for (i = 0; i < ctx->fs->group_desc_count; i++) {
@@ -149,7 +147,7 @@ static void check_block_bitmap_checksum(e2fsck_t ctx)
 	int		nbytes;
 	blk64_t		blk_itr;
 	errcode_t	retval;
-	int		csum_flag = 0;
+	int		csum_flag;
 
 	/* If bitmap is dirty from being fixed, checksum will be corrected */
 	if (ext2fs_test_bb_dirty(ctx->fs))
@@ -164,9 +162,7 @@ static void check_block_bitmap_checksum(e2fsck_t ctx)
 		fatal_error(ctx, 0);
 	}
 
-	if (EXT2_HAS_RO_COMPAT_FEATURE(ctx->fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
-		csum_flag = 1;
+	csum_flag = ext2fs_has_group_desc_csum(ctx->fs);
 
 	clear_problem_context(&pctx);
 	for (i = 0; i < ctx->fs->group_desc_count; i++) {
@@ -322,8 +318,7 @@ static void check_block_bitmaps(e2fsck_t ctx)
 		goto errout;
 	}
 
-	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
+	csum_flag = ext2fs_has_group_desc_csum(fs);
 redo_counts:
 	had_problem = 0;
 	save_problem = 0;
@@ -599,8 +594,7 @@ static void check_inode_bitmaps(e2fsck_t ctx)
 		goto errout;
 	}
 
-	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
+	csum_flag = ext2fs_has_group_desc_csum(fs);
 redo_counts:
 	had_problem = 0;
 	save_problem = 0;
diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index d3d0ee5..d7be5aa 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -428,6 +428,15 @@ static struct e2fsck_problem problem_table[] = {
 	  N_("@S MMP block checksum does not match MMP block.  "),
 	  PROMPT_FIX, PR_PREEN_OK | PR_NO_OK},
 
+	/*
+	 * metadata_csum implies uninit_bg; both feature bits cannot
+	 * be set simultaneously.
+	 */
+	{ PR_0_META_AND_GDT_CSUM_SET,
+	  N_("@S metadata_csum supersedes uninit_bg; both feature "
+	     "bits cannot be set simultaneously."),
+	  PROMPT_FIX, PR_PREEN_OK | PR_NO_OK},
+
 	/* Pass 1 errors */
 
 	/* Pass 1: Checking inodes, blocks, and sizes */
@@ -1423,10 +1432,15 @@ static struct e2fsck_problem problem_table[] = {
 	  N_("@d @i %i, %B, offset %N: @d has no checksum\n"),
 	  PROMPT_FIX, PR_PREEN_OK },
 
-	/* leaf node passes checks, but fails checksum */
+	/* leaf node fails checksum */
 	{ PR_2_LEAF_NODE_CSUM_INVALID,
-	  N_("@d @i %i, %B, offset %N: @d passes checks, but fails checksum\n"),
-	  PROMPT_FIX, 0 },
+	  N_("@d @i %i, %B, offset %N: @d fails checksum\n"),
+	  PROMPT_CLEAR, PR_PREEN_OK },
+
+	/* leaf node passes checks but fails checksum */
+	{ PR_2_LEAF_NODE_ONLY_CSUM_INVALID,
+	  N_("@d @i %i, %B, offset %N: @d passes checks but fails checksum\n"),
+	  PROMPT_FIX, PR_PREEN_OK },
 
 	/* Pass 3 errors */
 
diff --git a/e2fsck/problem.h b/e2fsck/problem.h
index 01d8377..5126c57 100644
--- a/e2fsck/problem.h
+++ b/e2fsck/problem.h
@@ -245,6 +245,11 @@ struct problem_context {
 /* Superblock has invalid MMP checksum. */
 #define PR_0_MMP_CSUM_INVALID			0x000044
 
+/*
+ * metadata_csum supersedes uninit_bg; both feature bits cannot be set
+ * simultaneously.
+ */
+#define PR_0_META_AND_GDT_CSUM_SET		0x000045
 
 /*
  * Pass 1 errors
diff --git a/e2fsck/super.c b/e2fsck/super.c
index dbd337c..d70947b 100644
--- a/e2fsck/super.c
+++ b/e2fsck/super.c
@@ -577,14 +577,26 @@ void check_super_block(e2fsck_t ctx)
 		}
 	}
 
+	/* Are meta_csum and gdt_csum both set? */
+	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
+				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) &&
+	    EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
+				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
+	    fix_problem(ctx, PR_0_META_AND_GDT_CSUM_SET, &pctx)) {
+		fs->super->s_feature_ro_compat &=
+			~EXT4_FEATURE_RO_COMPAT_GDT_CSUM;
+		ext2fs_mark_super_dirty(fs);
+		for (i = 0; i < fs->group_desc_count; i++)
+			ext2fs_group_desc_csum_set(fs, i);
+	}
+
 	/*
 	 * Verify the group descriptors....
 	 */
 	first_block = sb->s_first_data_block;
 	last_block = ext2fs_blocks_count(sb)-1;
 
-	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
+	csum_flag = ext2fs_has_group_desc_csum(fs);
 	for (i = 0; i < fs->group_desc_count; i++) {
 		pctx.group = i;
 
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index 9319e40..d3fb8f8 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -1658,7 +1658,7 @@ no_journal:
 	}
 
 	if ((run_result & E2F_FLAG_CANCEL) == 0 &&
-	    sb->s_feature_ro_compat & EXT4_FEATURE_RO_COMPAT_GDT_CSUM &&
+	    ext2fs_has_group_desc_csum(ctx->fs) &&
 	    !(ctx->options & E2F_OPT_READONLY)) {
 		retval = ext2fs_set_gdt_csum(ctx->fs);
 		if (retval) {
diff --git a/lib/e2p/feature.c b/lib/e2p/feature.c
index 9f9c6dd..486f846 100644
--- a/lib/e2p/feature.c
+++ b/lib/e2p/feature.c
@@ -87,8 +87,6 @@ static struct feature feature_list[] = {
 			"mmp" },
 	{       E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_FLEX_BG,
 			"flex_bg"},
-	{	E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM,
-			"bg_use_meta_csum"},
 	{	0, 0, 0 },
 };
 
diff --git a/lib/ext2fs/alloc.c b/lib/ext2fs/alloc.c
index 948a0ec..e62ed68 100644
--- a/lib/ext2fs/alloc.c
+++ b/lib/ext2fs/alloc.c
@@ -36,8 +36,7 @@ static void check_block_uninit(ext2_filsys fs, ext2fs_block_bitmap map,
 	blk64_t		blk, super_blk, old_desc_blk, new_desc_blk;
 	int		old_desc_blocks;
 
-	if (!(EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					 EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) ||
+	if (!ext2fs_has_group_desc_csum(fs) ||
 	    !(ext2fs_bg_flags_test(fs, group, EXT2_BG_BLOCK_UNINIT)))
 		return;
 
@@ -83,8 +82,7 @@ static void check_inode_uninit(ext2_filsys fs, ext2fs_inode_bitmap map,
 {
 	ext2_ino_t	i, ino;
 
-	if (!(EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					 EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) ||
+	if (!ext2fs_has_group_desc_csum(fs) ||
 	    !(ext2fs_bg_flags_test(fs, group, EXT2_BG_INODE_UNINIT)))
 		return;
 
diff --git a/lib/ext2fs/alloc_stats.c b/lib/ext2fs/alloc_stats.c
index adec363..4229084 100644
--- a/lib/ext2fs/alloc_stats.c
+++ b/lib/ext2fs/alloc_stats.c
@@ -38,8 +38,7 @@ void ext2fs_inode_alloc_stats2(ext2_filsys fs, ext2_ino_t ino,
 	/* We don't strictly need to be clearing the uninit flag if inuse < 0
 	 * (i.e. freeing inodes) but it also means something is bad. */
 	ext2fs_bg_flags_clear(fs, group, EXT2_BG_INODE_UNINIT);
-	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
+	if (ext2fs_has_group_desc_csum(fs)) {
 		ext2_ino_t first_unused_inode =	fs->super->s_inodes_per_group -
 			ext2fs_bg_itable_unused(fs, group) +
 			group * fs->super->s_inodes_per_group + 1;
diff --git a/lib/ext2fs/csum.c b/lib/ext2fs/csum.c
index 99ca652..425f736 100644
--- a/lib/ext2fs/csum.c
+++ b/lib/ext2fs/csum.c
@@ -743,9 +743,7 @@ STATIC __u16 ext2fs_group_desc_csum(ext2_filsys fs, dgrp_t group)
 #endif
 
 	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) &&
-	    EXT2_HAS_INCOMPAT_FEATURE(fs->super,
-			EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)) {
+			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
 		/* new metadata csum code */
 		__u16 old_crc;
 		__u32 crc32;
@@ -781,8 +779,7 @@ out:
 
 int ext2fs_group_desc_csum_verify(ext2_filsys fs, dgrp_t group)
 {
-	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
+	if (ext2fs_has_group_desc_csum(fs) &&
 	    (ext2fs_bg_checksum(fs, group) !=
 	     ext2fs_group_desc_csum(fs, group)))
 		return 0;
@@ -792,8 +789,7 @@ int ext2fs_group_desc_csum_verify(ext2_filsys fs, dgrp_t group)
 
 void ext2fs_group_desc_csum_set(ext2_filsys fs, dgrp_t group)
 {
-	if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+	if (!ext2fs_has_group_desc_csum(fs))
 		return;
 
 	/* ext2fs_bg_checksum_set() sets the actual checksum field but
@@ -827,8 +823,7 @@ errcode_t ext2fs_set_gdt_csum(ext2_filsys fs)
 	if (!fs->inode_map)
 		return EXT2_ET_NO_INODE_BITMAP;
 
-	if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+	if (!ext2fs_has_group_desc_csum(fs))
 		return 0;
 
 	for (i = 0; i < fs->group_desc_count; i++) {
diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index c2e7fbe..89df977 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -729,6 +729,11 @@ struct ext2_super_block {
 #define EXT4_FEATURE_RO_COMPAT_HAS_SNAPSHOT	0x0080
 #define EXT4_FEATURE_RO_COMPAT_QUOTA		0x0100
 #define EXT4_FEATURE_RO_COMPAT_BIGALLOC		0x0200
+/*
+ * METADATA_CSUM implies GDT_CSUM.  When METADATA_CSUM is set, group
+ * descriptor checksums use the same algorithm as all other data
+ * structures' checksums.
+ */
 #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM	0x0400
 #define EXT4_FEATURE_RO_COMPAT_REPLICA		0x0800
 
@@ -743,7 +748,6 @@ struct ext2_super_block {
 #define EXT4_FEATURE_INCOMPAT_FLEX_BG		0x0200
 #define EXT4_FEATURE_INCOMPAT_EA_INODE		0x0400
 #define EXT4_FEATURE_INCOMPAT_DIRDATA		0x1000
-#define EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM	0x8000
 
 #define EXT2_FEATURE_COMPAT_SUPP	0
 #define EXT2_FEATURE_INCOMPAT_SUPP    (EXT2_FEATURE_INCOMPAT_FILETYPE| \
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index ff2799a..28cb626 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -579,8 +579,7 @@ typedef struct ext2_icount *ext2_icount_t;
 					 EXT3_FEATURE_INCOMPAT_EXTENTS|\
 					 EXT4_FEATURE_INCOMPAT_FLEX_BG|\
 					 EXT4_FEATURE_INCOMPAT_MMP|\
-					 EXT4_FEATURE_INCOMPAT_64BIT|\
-					 EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)
+					 EXT4_FEATURE_INCOMPAT_64BIT)
 #else
 #define EXT2_LIB_FEATURE_INCOMPAT_SUPP	(EXT2_FEATURE_INCOMPAT_FILETYPE|\
 					 EXT3_FEATURE_INCOMPAT_JOURNAL_DEV|\
@@ -589,8 +588,7 @@ typedef struct ext2_icount *ext2_icount_t;
 					 EXT3_FEATURE_INCOMPAT_EXTENTS|\
 					 EXT4_FEATURE_INCOMPAT_FLEX_BG|\
 					 EXT4_FEATURE_INCOMPAT_MMP|\
-					 EXT4_FEATURE_INCOMPAT_64BIT|\
-					 EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)
+					 EXT4_FEATURE_INCOMPAT_64BIT)
 #endif
 #ifdef CONFIG_QUOTA
 #define EXT2_LIB_FEATURE_RO_COMPAT_SUPP	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER|\
@@ -646,6 +644,12 @@ typedef struct stat ext2fs_struct_stat;
 /*
  * function prototypes
  */
+static inline int ext2fs_has_group_desc_csum(ext2_filsys fs)
+{
+	return EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
+			EXT4_FEATURE_RO_COMPAT_GDT_CSUM |
+			EXT4_FEATURE_RO_COMPAT_METADATA_CSUM);
+}
 
 /* alloc.c */
 extern errcode_t ext2fs_new_inode(ext2_filsys fs, ext2_ino_t dir, int mode,
diff --git a/lib/ext2fs/initialize.c b/lib/ext2fs/initialize.c
index a63ea18..a22cab4 100644
--- a/lib/ext2fs/initialize.c
+++ b/lib/ext2fs/initialize.c
@@ -435,8 +435,7 @@ ipg_retry:
 	 * bitmaps will be accounted for when allocated).
 	 */
 	free_blocks = 0;
-	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
+	csum_flag = ext2fs_has_group_desc_csum(fs);
 	for (i = 0; i < fs->group_desc_count; i++) {
 		/*
 		 * Don't set the BLOCK_UNINIT group for the last group
diff --git a/lib/ext2fs/inode.c b/lib/ext2fs/inode.c
index 74703c5..3e6d853 100644
--- a/lib/ext2fs/inode.c
+++ b/lib/ext2fs/inode.c
@@ -157,8 +157,7 @@ errcode_t ext2fs_open_inode_scan(ext2_filsys fs, int buffer_blocks,
 						     scan->current_group);
 	scan->inodes_left = EXT2_INODES_PER_GROUP(scan->fs->super);
 	scan->blocks_left = scan->fs->inode_blocks_per_group;
-	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
+	if (ext2fs_has_group_desc_csum(fs)) {
 		scan->inodes_left -=
 			ext2fs_bg_itable_unused(fs, scan->current_group);
 		scan->blocks_left =
@@ -183,8 +182,7 @@ errcode_t ext2fs_open_inode_scan(ext2_filsys fs, int buffer_blocks,
 	}
 	if (scan->fs->badblocks && scan->fs->badblocks->num)
 		scan->scan_flags |= EXT2_SF_CHK_BADBLOCKS;
-	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+	if (ext2fs_has_group_desc_csum(fs))
 		scan->scan_flags |= EXT2_SF_DO_LAZY;
 	*ret_scan = scan;
 	return 0;
@@ -250,8 +248,7 @@ static errcode_t get_next_blockgroup(ext2_inode_scan scan)
 	scan->bytes_left = 0;
 	scan->inodes_left = EXT2_INODES_PER_GROUP(fs->super);
 	scan->blocks_left = fs->inode_blocks_per_group;
-	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
+	if (ext2fs_has_group_desc_csum(fs)) {
 		scan->inodes_left -=
 			ext2fs_bg_itable_unused(fs, scan->current_group);
 		scan->blocks_left =
diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
index d2b64f4..2dc9b94 100644
--- a/lib/ext2fs/openfs.c
+++ b/lib/ext2fs/openfs.c
@@ -382,8 +382,7 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
 	 * If recovery is from backup superblock, Clear _UNININT flags &
 	 * reset bg_itable_unused to zero
 	 */
-	if (superblock > 1 && EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
+	if (superblock > 1 && ext2fs_has_group_desc_csum(fs)) {
 		dgrp_t group;
 
 		for (group = 0; group < fs->group_desc_count; group++) {
diff --git a/lib/ext2fs/rw_bitmaps.c b/lib/ext2fs/rw_bitmaps.c
index a5097c1..18e18aa 100644
--- a/lib/ext2fs/rw_bitmaps.c
+++ b/lib/ext2fs/rw_bitmaps.c
@@ -36,7 +36,7 @@ static errcode_t write_bitmaps(ext2_filsys fs, int do_inode, int do_block)
 	unsigned int	nbits;
 	errcode_t	retval;
 	char		*block_buf = NULL, *inode_buf = NULL;
-	int		csum_flag = 0;
+	int		csum_flag;
 	blk64_t		blk;
 	blk64_t		blk_itr = EXT2FS_B2C(fs, fs->super->s_first_data_block);
 	ext2_ino_t	ino_itr = 1;
@@ -46,9 +46,7 @@ static errcode_t write_bitmaps(ext2_filsys fs, int do_inode, int do_block)
 	if (!(fs->flags & EXT2_FLAG_RW))
 		return EXT2_ET_RO_FILSYS;
 
-	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
-		csum_flag = 1;
+	csum_flag = ext2fs_has_group_desc_csum(fs);
 
 	inode_nbytes = block_nbytes = 0;
 	if (do_block) {
@@ -168,7 +166,7 @@ static errcode_t read_bitmaps(ext2_filsys fs, int do_inode, int do_block)
 	errcode_t retval;
 	int block_nbytes = EXT2_CLUSTERS_PER_GROUP(fs->super) / 8;
 	int inode_nbytes = EXT2_INODES_PER_GROUP(fs->super) / 8;
-	int csum_flag = 0;
+	int csum_flag;
 	int do_image = fs->flags & EXT2_FLAG_IMAGE_FILE;
 	unsigned int	cnt;
 	blk64_t	blk;
@@ -181,9 +179,7 @@ static errcode_t read_bitmaps(ext2_filsys fs, int do_inode, int do_block)
 
 	fs->write_bitmaps = ext2fs_write_bitmaps;
 
-	if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
-		csum_flag = 1;
+	csum_flag = ext2fs_has_group_desc_csum(fs);
 
 	retval = ext2fs_get_mem(strlen(fs->device_name) + 80, &buf);
 	if (retval)
diff --git a/misc/dumpe2fs.c b/misc/dumpe2fs.c
index b8f386e..3ceb0f8 100644
--- a/misc/dumpe2fs.c
+++ b/misc/dumpe2fs.c
@@ -114,7 +114,7 @@ static void print_bg_opts(ext2_filsys fs, dgrp_t i)
 {
 	int first = 1, bg_flags = 0;
 
-	if (fs->super->s_feature_ro_compat & EXT4_FEATURE_RO_COMPAT_GDT_CSUM)
+	if (ext2fs_has_group_desc_csum(fs))
 		bg_flags = ext2fs_bg_flags(fs, i);
 
 	print_bg_opt(bg_flags, EXT2_BG_INODE_UNINIT, "INODE_UNINIT",
@@ -190,7 +190,7 @@ static void list_desc (ext2_filsys fs)
 		print_range(first_block, last_block);
 		fputs(")", stdout);
 		print_bg_opts(fs, i);
-		if (fs->super->s_feature_ro_compat & EXT4_FEATURE_RO_COMPAT_GDT_CSUM)
+		if (ext2fs_has_group_desc_csum(fs))
 			printf(_("  Checksum 0x%04x, unused inodes %u\n"),
 			       ext2fs_bg_checksum(fs, i),
 			       ext2fs_bg_itable_unused(fs, i));
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 8852735..3d3b1d3 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -885,8 +885,7 @@ static __u32 ok_features[3] = {
 		EXT2_FEATURE_INCOMPAT_META_BG|
 		EXT4_FEATURE_INCOMPAT_FLEX_BG|
 		EXT4_FEATURE_INCOMPAT_MMP |
-		EXT4_FEATURE_INCOMPAT_64BIT |
-		EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM,
+		EXT4_FEATURE_INCOMPAT_64BIT,
 	/* R/O compat */
 	EXT2_FEATURE_RO_COMPAT_LARGE_FILE|
 		EXT4_FEATURE_RO_COMPAT_HUGE_FILE|
@@ -1937,6 +1936,12 @@ profile_error:
 	if (extended_opts)
 		parse_extended_opts(&fs_param, extended_opts);
 
+	/* Don't allow user to set both metadata_csum and uninit_bg bits. */
+	if ((fs_param.s_feature_ro_compat &
+	     EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) &&
+	    (fs_param.s_feature_ro_compat & EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+		fs_param.s_feature_ro_compat &= ~EXT4_FEATURE_RO_COMPAT_GDT_CSUM;
+
 	/* Since sparse_super is the default, we would only have a problem
 	 * here if it was explicitly disabled.
 	 */
@@ -2049,7 +2054,8 @@ static int should_do_undo(const char *name)
 	int csum_flag, force_undo;
 
 	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(&fs_param,
-					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
+				EXT4_FEATURE_RO_COMPAT_GDT_CSUM |
+				EXT4_FEATURE_RO_COMPAT_METADATA_CSUM);
 	force_undo = get_int_from_profile(fs_types, "force_undo", 0);
 	if (!force_undo && (!csum_flag || !lazy_itable_init))
 		return 0;
@@ -2306,19 +2312,6 @@ int main (int argc, char *argv[])
 	if (!quiet &&
 	    EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
 				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
-		if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
-			printf(_("Group descriptor checksums "
-				 "are not enabled.  This reduces the "
-				 "coverage of metadata checksumming.  "
-				 "Pass -O uninit_bg to rectify.\n"));
-		if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-				EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
-		    !EXT2_HAS_INCOMPAT_FEATURE(fs->super,
-				EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM))
-			printf(_("Group descriptor checksums will not use "
-				 "the faster metadata_checksum algorithm.  "
-				 "Pass -O bg_use_meta_csum to rectify.\n"));
 		if (!EXT2_HAS_INCOMPAT_FEATURE(fs->super,
 				EXT3_FEATURE_INCOMPAT_EXTENTS))
 			printf(_("Extents are not enabled.  The file extent "
@@ -2358,6 +2351,7 @@ int main (int argc, char *argv[])
 	    (fs_param.s_feature_ro_compat &
 	     (EXT4_FEATURE_RO_COMPAT_HUGE_FILE|EXT4_FEATURE_RO_COMPAT_GDT_CSUM|
 	      EXT4_FEATURE_RO_COMPAT_DIR_NLINK|
+	      EXT4_FEATURE_RO_COMPAT_METADATA_CSUM|
 	      EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE)))
 		fs->super->s_kbytes_written = 1;
 
@@ -2505,8 +2499,7 @@ int main (int argc, char *argv[])
 		 * inodes as unused; we want e2fsck to consider all
 		 * inodes as potentially containing recoverable data.
 		 */
-		if (fs->super->s_feature_ro_compat &
-		    EXT4_FEATURE_RO_COMPAT_GDT_CSUM) {
+		if (ext2fs_has_group_desc_csum(fs)) {
 			for (i = 1; i < fs->group_desc_count; i++)
 				ext2fs_bg_itable_unused_set(fs, i, 0);
 		}
diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index cba4d4c..241694f 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -92,7 +92,6 @@ static unsigned long new_inode_size;
 static char *ext_mount_opts;
 static int usrquota, grpquota;
 static int rewrite_checksums;
-static int rewrite_bgs_for_checksum;
 
 int journal_size, journal_flags;
 char *journal_device;
@@ -138,8 +137,7 @@ static __u32 ok_features[3] = {
 	EXT2_FEATURE_INCOMPAT_FILETYPE |
 		EXT3_FEATURE_INCOMPAT_EXTENTS |
 		EXT4_FEATURE_INCOMPAT_FLEX_BG |
-		EXT4_FEATURE_INCOMPAT_MMP |
-		EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM,
+		EXT4_FEATURE_INCOMPAT_MMP,
 	/* R/O compat */
 	EXT2_FEATURE_RO_COMPAT_LARGE_FILE |
 		EXT4_FEATURE_RO_COMPAT_HUGE_FILE|
@@ -159,8 +157,7 @@ static __u32 clear_ok_features[3] = {
 	/* Incompat */
 	EXT2_FEATURE_INCOMPAT_FILETYPE |
 		EXT4_FEATURE_INCOMPAT_FLEX_BG |
-		EXT4_FEATURE_INCOMPAT_MMP |
-		EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM,
+		EXT4_FEATURE_INCOMPAT_MMP,
 	/* R/O compat */
 	EXT2_FEATURE_RO_COMPAT_LARGE_FILE |
 		EXT4_FEATURE_RO_COMPAT_HUGE_FILE|
@@ -717,27 +714,47 @@ static void rewrite_metadata_checksums(ext2_filsys fs)
 	ext2fs_mark_super_dirty(fs);
 }
 
-/*
- * Rewrite just the block group checksums.  Only call this function if
- * you're _not_ calling rewrite_metadata_checksums; this function exists
- * to handle the case that you're changing bg_use_meta_csum and NOT changing
- * either gdt_csum or metadata_csum.
- */
-static void rewrite_bg_checksums(ext2_filsys fs)
+static void enable_uninit_bg(ext2_filsys fs)
 {
+	struct ext2_group_desc *gd;
 	int i;
 
-	if (!EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					EXT4_FEATURE_RO_COMPAT_GDT_CSUM) ||
-	    !EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
-		return;
+	for (i = 0; i < fs->group_desc_count; i++) {
+		gd = ext2fs_group_desc(fs, fs->group_desc, i);
+		gd->bg_itable_unused = 0;
+		gd->bg_flags = EXT2_BG_INODE_ZEROED;
+		ext2fs_group_desc_csum_set(fs, i);
+	}
+	fs->flags &= ~EXT2_FLAG_SUPER_ONLY;
+}
 
-	ext2fs_init_csum_seed(fs);
-	for (i = 0; i < fs->group_desc_count; i++)
+static void disable_uninit_bg(ext2_filsys fs, __u32 csum_feature_flag)
+{
+	struct ext2_group_desc *gd;
+	int i;
+
+	/* Load bitmaps to ensure that the uninit ones get written out */
+	fs->super->s_feature_ro_compat |= csum_feature_flag;
+	ext2fs_read_bitmaps(fs);
+	ext2fs_mark_ib_dirty(fs);
+	ext2fs_mark_bb_dirty(fs);
+	fs->super->s_feature_ro_compat &= ~csum_feature_flag;
+
+	for (i = 0; i < fs->group_desc_count; i++) {
+		gd = ext2fs_group_desc(fs, fs->group_desc, i);
+		if ((gd->bg_flags & EXT2_BG_INODE_ZEROED) == 0) {
+			/*
+			 * XXX what we really should do is zap
+			 * uninitialized inode tables instead.
+			 */
+			request_fsck_afterwards(fs);
+			break;
+		}
+		gd->bg_itable_unused = 0;
+		gd->bg_flags = 0;
 		ext2fs_group_desc_csum_set(fs, i);
+	}
 	fs->flags &= ~EXT2_FLAG_SUPER_ONLY;
-	ext2fs_mark_super_dirty(fs);
 }
 
 /*
@@ -912,25 +929,26 @@ mmp_error:
 		}
 	}
 
-	if (FEATURE_ON(E2P_FEATURE_INCOMPAT,
-		       EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)) {
-		if (check_fsck_needed(fs))
-			exit(1);
-		rewrite_bgs_for_checksum = 1;
-	}
-
-	if (FEATURE_OFF(E2P_FEATURE_INCOMPAT,
-			EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)) {
-		if (check_fsck_needed(fs))
-			exit(1);
-		rewrite_bgs_for_checksum = 1;
-	}
-
 	if (FEATURE_ON(E2P_FEATURE_RO_INCOMPAT,
 		       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) {
 		if (check_fsck_needed(fs))
 			exit(1);
 		rewrite_checksums = 1;
+		/* metadata_csum supersedes uninit_bg */
+		fs->super->s_feature_ro_compat &=
+			~EXT4_FEATURE_RO_COMPAT_GDT_CSUM;
+
+		/* if uninit_bg was previously off, rewrite group desc */
+		if (!(old_features[E2P_FEATURE_RO_INCOMPAT] &
+		      EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+			enable_uninit_bg(fs);
+
+		/*
+		 * Since metadata_csum supersedes uninit_bg, pretend like
+		 * uninit_bg has been off all along.
+		 */
+		old_features[E2P_FEATURE_RO_INCOMPAT] &=
+			~EXT4_FEATURE_RO_COMPAT_GDT_CSUM;
 	}
 
 	if (FEATURE_OFF(E2P_FEATURE_RO_INCOMPAT,
@@ -938,37 +956,40 @@ mmp_error:
 		if (check_fsck_needed(fs))
 			exit(1);
 		rewrite_checksums = 1;
+		/*
+		 * If we're turning off metadata_csum and not turning on
+		 * uninit_bg, rewrite group desc.
+		 */
+		if (!(fs->super->s_feature_ro_compat &
+		      EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+			disable_uninit_bg(fs,
+				EXT4_FEATURE_RO_COMPAT_METADATA_CSUM);
+		else
+			/*
+			 * metadata_csum previously provided uninit_bg, so if
+			 * we're also setting the uninit_bg feature bit,
+			 * pretend like it was previously enabled.  Checksums
+			 * will be rewritten with crc16 later.
+			 */
+			old_features[E2P_FEATURE_RO_INCOMPAT] |=
+				EXT4_FEATURE_RO_COMPAT_GDT_CSUM;
 	}
 
 	if (FEATURE_ON(E2P_FEATURE_RO_INCOMPAT,
 		       EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
-		for (i = 0; i < fs->group_desc_count; i++) {
-			gd = ext2fs_group_desc(fs, fs->group_desc, i);
-			gd->bg_itable_unused = 0;
-			gd->bg_flags = EXT2_BG_INODE_ZEROED;
-			ext2fs_group_desc_csum_set(fs, i);
-		}
-		fs->flags &= ~EXT2_FLAG_SUPER_ONLY;
+		/* Do not enable uninit_bg when metadata_csum enabled */
+		if (fs->super->s_feature_ro_compat &
+		    EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
+			fs->super->s_feature_ro_compat &=
+				~EXT4_FEATURE_RO_COMPAT_GDT_CSUM;
+		else
+			enable_uninit_bg(fs);
 	}
 
 	if (FEATURE_OFF(E2P_FEATURE_RO_INCOMPAT,
-			EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
-		for (i = 0; i < fs->group_desc_count; i++) {
-			gd = ext2fs_group_desc(fs, fs->group_desc, i);
-			if ((gd->bg_flags & EXT2_BG_INODE_ZEROED) == 0) {
-				/* 
-				 * XXX what we really should do is zap
-				 * uninitialized inode tables instead.
-				 */
-				request_fsck_afterwards(fs);
-				break;
-			}
-			gd->bg_itable_unused = 0;
-			gd->bg_flags = 0;
-			gd->bg_checksum = 0;
-		}
-		fs->flags &= ~EXT2_FLAG_SUPER_ONLY;
-	}
+			EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+		disable_uninit_bg(fs,
+				EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
 
 	if (FEATURE_ON(E2P_FEATURE_RO_INCOMPAT,
 				EXT4_FEATURE_RO_COMPAT_QUOTA)) {
@@ -2550,8 +2571,7 @@ retry_open:
 				exit(1);
 		}
 
-		if (sb->s_feature_ro_compat &
-		    EXT4_FEATURE_RO_COMPAT_GDT_CSUM) {
+		if (ext2fs_has_group_desc_csum(fs)) {
 			/*
 			 * Determine if the block group checksums are
 			 * correct so we know whether or not to set
@@ -2588,8 +2608,6 @@ retry_open:
 	}
 	if (rewrite_checksums)
 		rewrite_metadata_checksums(fs);
-	else if (rewrite_bgs_for_checksum)
-		rewrite_bg_checksums(fs);
 	if (I_flag) {
 		if (mount_flags & EXT2_MF_MOUNTED) {
 			fputs(_("The inode size may only be "
diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index dc2805d..8a02ff4 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -191,8 +191,7 @@ static void fix_uninit_block_bitmaps(ext2_filsys fs)
 	int		old_desc_blocks;
 	dgrp_t		g;
 
-	if (!(EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					 EXT4_FEATURE_RO_COMPAT_GDT_CSUM)))
+	if (!ext2fs_has_group_desc_csum(fs))
 		return;
 
 	for (g=0; g < fs->group_desc_count; g++) {
@@ -482,8 +481,7 @@ retry:
 	group_block = fs->super->s_first_data_block +
 		old_fs->group_desc_count * fs->super->s_blocks_per_group;
 
-	csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
+	csum_flag = ext2fs_has_group_desc_csum(fs);
 	adj = old_fs->group_desc_count;
 	max_group = fs->group_desc_count - adj;
 	if (fs->super->s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG)
@@ -743,8 +741,7 @@ static void mark_fs_metablock(ext2_resize_t rfs,
 	} else if (IS_INODE_TB(fs, group, blk)) {
 		ext2fs_inode_table_loc_set(fs, group, 0);
 		rfs->needed_blocks++;
-	} else if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					      EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
+	} else if (ext2fs_has_group_desc_csum(fs) &&
 		   (ext2fs_bg_flags_test(fs, group, EXT2_BG_BLOCK_UNINIT))) {
 		/*
 		 * If the block bitmap is uninitialized, which means
@@ -804,8 +801,7 @@ static errcode_t blocks_to_move(ext2_resize_t rfs)
 	for (blk = ext2fs_blocks_count(fs->super);
 	     blk < ext2fs_blocks_count(old_fs->super); blk++) {
 		g = ext2fs_group_of_blk2(fs, blk);
-		if (EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
-					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
+		if (ext2fs_has_group_desc_csum(fs) &&
 		    ext2fs_bg_flags_test(old_fs, g, EXT2_BG_BLOCK_UNINIT)) {
 			/*
 			 * The block bitmap is uninitialized, so skip
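
For anyone skimming the e2fsprogs changes above, the rule they enforce is small: either feature bit turns on group descriptor checksums, and if metadata_csum is requested the uninit_bg (GDT_CSUM) bit is dropped so the two are never stored together. Here is a minimal standalone sketch of that rule. The two bit values are the ones from the ext4 headers; has_group_desc_csum() and normalize_csum_features() are hypothetical names made up for this illustration and are not e2fsprogs functions.

```c
#include <stdint.h>

/* Feature bit values as defined in the ext4 headers. */
#define RO_COMPAT_GDT_CSUM      0x0010u
#define RO_COMPAT_METADATA_CSUM 0x0400u

/*
 * Hypothetical helper: nonzero if group descriptor checksums are in
 * effect, i.e. if either of the two feature bits is set.
 */
int has_group_desc_csum(uint32_t ro_compat)
{
	return (ro_compat &
		(RO_COMPAT_GDT_CSUM | RO_COMPAT_METADATA_CSUM)) != 0;
}

/*
 * Hypothetical helper: metadata_csum supersedes uninit_bg, so clear
 * GDT_CSUM whenever METADATA_CSUM is requested.  All other bits are
 * left untouched.
 */
uint32_t normalize_csum_features(uint32_t ro_compat)
{
	if (ro_compat & RO_COMPAT_METADATA_CSUM)
		ro_compat &= ~RO_COMPAT_GDT_CSUM;
	return ro_compat;
}
```

With this shape, callers only ever ask has_group_desc_csum() and never test GDT_CSUM directly, which is roughly what the mechanical replacements in the diff accomplish.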



* [RFC v2] ext4: Rework metadata_csum/gdt_csum flag handling in kernel
  2012-02-29  5:48         ` Andreas Dilger
@ 2012-03-03  3:56           ` Darrick J. Wong
  0 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2012-03-03  3:56 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Ted Ts'o, Sunil Mushran, Martin K Petersen, Greg Freemyer,
	Amir Goldstein, linux-kernel, Andi Kleen, Mingming Cao,
	Joel Becker, linux-fsdevel, linux-ext4, Coly Li

On Tue, Feb 28, 2012 at 10:48:16PM -0700, Andreas Dilger wrote:
> On 2012-02-28, at 6:32 PM, Darrick J. Wong wrote:
> > Ok, I've reworked the block group descriptor checksum handling code per this
> > email thread.  INCOMPAT_BG_USE_META_CSUM is gone.  METADATA_CSUM implies (and
> > in fact overrides) GDT_CSUM, though the group descriptor checksum uses the same
> > function as all other metadata blocks' checksums (by default crc32c).  I
> > created a helper function to determine if group descriptor checksums are
> > enabled, and the actual gdt checksum verify/set functions are smart enough to
> > use the correct function.
> > 
> > Below are the changes that I intend to make to the kernel.  I'll integrate these
> > changes into the (huge) kernel patchset, but wanted to aggregate the changes
> > here first to avoid overwhelming reviewers.
> > 
> > Question: What will happen to old kernels when METADATA_CSUM and GDT_CSUM are
> > set?  Should the kernel reject the combination and ask for fsck?  I think it
> > will be ok, but older kernels might not be...?
> 
> As with the e2fsprogs patch, I think METADATA_CSUM should override GDT_CSUM
> completely.  If both are set, then the kernel should ignore GDT_CSUM entirely
> and just use the new checksum algorithm for the group descriptors.  It is up
> to the user tools not to allow this combination of features to be set, and
> there is no value in adding an extra failure case if they are (though if the
> superblock checksum is also incorrect, that means the superblock is broken
> and a backup should be used and/or the mount should fail).
> 
> > Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
> > ---
> 
> One minor comment below, but I think this patch is the right approach.  I was
> also going to proffer my Acked-by: for this patch, but I now recall that this
> patch is itself short lived and will be merged into the patch series.

I merged that comment into the kernel patch, along with an extra hunk that warns
if the metadata_csum and uninit_bg feature bits are both set.  I wonder whether
the kernel should react more forcefully to this supposedly "impossible"
situation, though I might just be unnecessarily paranoid.  At worst, I suppose,
the checksums on all the group descriptors will look "wrong" and the fs simply
won't allow writes.

(Hm, maybe it should clear the clean flag?)
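
To make the intended precedence concrete, here is a rough sketch of how the two bits map to a group descriptor checksum algorithm, and how the "impossible" both-bits-set case can be detected. It assumes only what is described in this thread (metadata_csum overrides uninit_bg; uninit_bg alone means crc16; metadata_csum means crc32c by default). The enum and both function names are hypothetical and do not appear in the kernel patch.

```c
#include <stdint.h>

/* Feature bit values as defined in the ext4 headers. */
#define RO_COMPAT_GDT_CSUM      0x0010u
#define RO_COMPAT_METADATA_CSUM 0x0400u

/* Hypothetical labels for the three possible outcomes. */
enum gd_csum_algo { GD_CSUM_NONE, GD_CSUM_CRC16, GD_CSUM_CRC32C };

/*
 * METADATA_CSUM wins whenever it is set, even if GDT_CSUM is also
 * set; GDT_CSUM on its own selects the old crc16 checksum.
 */
enum gd_csum_algo pick_gd_csum_algo(uint32_t ro_compat)
{
	if (ro_compat & RO_COMPAT_METADATA_CSUM)
		return GD_CSUM_CRC32C;
	if (ro_compat & RO_COMPAT_GDT_CSUM)
		return GD_CSUM_CRC16;
	return GD_CSUM_NONE;
}

/* Nonzero for the redundant case that the mount path warns about. */
int csum_flags_redundant(uint32_t ro_compat)
{
	const uint32_t both = RO_COMPAT_GDT_CSUM | RO_COMPAT_METADATA_CSUM;

	return (ro_compat & both) == both;
}
```

Note that the redundant case still resolves to crc32c here, which matches the "ignore GDT_CSUM entirely" behaviour Andreas asked for; the warning is purely advisory.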

Here's v2, to go with the e2fsprogs v2 patch.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---

 fs/ext4/balloc.c  |    4 ++--
 fs/ext4/ext4.h    |   21 ++++++++++++++++-----
 fs/ext4/ialloc.c  |   19 ++++++++-----------
 fs/ext4/inode.c   |    3 +--
 fs/ext4/mballoc.c |    6 +++---
 fs/ext4/resize.c  |    9 +++------
 fs/ext4/super.c   |   30 +++++++++++++++++-------------
 7 files changed, 50 insertions(+), 42 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 6eee0e6..b5a7951 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -168,7 +168,7 @@ void ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
 
 	/* If checksum is bad mark all blocks used to prevent allocation
 	 * essentially implementing a per-group read-only flag. */
-	if (!ext4_group_desc_csum_verify(sbi, block_group, gdp)) {
+	if (!ext4_group_desc_csum_verify(sb, block_group, gdp)) {
 		ext4_error(sb, "Checksum bad for group %u", block_group);
 		ext4_free_group_clusters_set(sb, gdp, 0);
 		ext4_free_inodes_set(sb, gdp, 0);
@@ -214,7 +214,7 @@ void ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
 			     sb->s_blocksize * 8, bh->b_data);
 	ext4_block_bitmap_csum_set(sb, block_group, gdp, bh,
 				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
-	ext4_group_desc_csum_set(sbi, block_group, gdp);
+	ext4_group_desc_csum_set(sb, block_group, gdp);
 }
 
 /* Return the number of free blocks in a block group.  It is used when
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 70bd236..dd75078 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1434,6 +1434,12 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
 #define EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE	0x0040
 #define EXT4_FEATURE_RO_COMPAT_QUOTA		0x0100
 #define EXT4_FEATURE_RO_COMPAT_BIGALLOC		0x0200
+/*
+ * METADATA_CSUM also enables group descriptor checksums (GDT_CSUM).  When
+ * METADATA_CSUM is set, group descriptor checksums use the same algorithm as
+ * all other data structures' checksums.  However, the METADATA_CSUM and
+ * GDT_CSUM bits are mutually exclusive.
+ */
 #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM	0x0400
 
 #define EXT4_FEATURE_INCOMPAT_COMPRESSION	0x0001
@@ -1449,7 +1455,6 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
 #define EXT4_FEATURE_INCOMPAT_DIRDATA		0x1000 /* data in dirent */
 #define EXT4_FEATURE_INCOMPAT_INLINEDATA	0x2000 /* data in inode */
 #define EXT4_FEATURE_INCOMPAT_LARGEDIR		0x4000 /* >2GB or 3-lvl htree */
-#define EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM	0x8000
 
 #define EXT2_FEATURE_COMPAT_SUPP	EXT4_FEATURE_COMPAT_EXT_ATTR
 #define EXT2_FEATURE_INCOMPAT_SUPP	(EXT4_FEATURE_INCOMPAT_FILETYPE| \
@@ -1473,8 +1478,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
 					 EXT4_FEATURE_INCOMPAT_EXTENTS| \
 					 EXT4_FEATURE_INCOMPAT_64BIT| \
 					 EXT4_FEATURE_INCOMPAT_FLEX_BG| \
-					 EXT4_FEATURE_INCOMPAT_MMP| \
-					 EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM)
+					 EXT4_FEATURE_INCOMPAT_MMP)
 #define EXT4_FEATURE_RO_COMPAT_SUPP	(EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER| \
 					 EXT4_FEATURE_RO_COMPAT_LARGE_FILE| \
 					 EXT4_FEATURE_RO_COMPAT_GDT_CSUM| \
@@ -2092,11 +2096,18 @@ extern void ext4_used_dirs_set(struct super_block *sb,
 				struct ext4_group_desc *bg, __u32 count);
 extern void ext4_itable_unused_set(struct super_block *sb,
 				   struct ext4_group_desc *bg, __u32 count);
-extern int ext4_group_desc_csum_verify(struct ext4_sb_info *sbi, __u32 group,
+extern int ext4_group_desc_csum_verify(struct super_block *sb, __u32 group,
 				       struct ext4_group_desc *gdp);
-extern void ext4_group_desc_csum_set(struct ext4_sb_info *sbi, __u32 group,
+extern void ext4_group_desc_csum_set(struct super_block *sb, __u32 group,
 				     struct ext4_group_desc *gdp);
 
+static inline int ext4_has_group_desc_csum(struct super_block *sb)
+{
+	return EXT4_HAS_RO_COMPAT_FEATURE(sb,
+					  EXT4_FEATURE_RO_COMPAT_GDT_CSUM |
+					  EXT4_FEATURE_RO_COMPAT_METADATA_CSUM);
+}
+
 static inline ext4_fsblk_t ext4_blocks_count(struct ext4_super_block *es)
 {
 	return ((ext4_fsblk_t)le32_to_cpu(es->s_blocks_count_hi) << 32) |
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index b9b6b27..1ade34d 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -70,13 +70,11 @@ static unsigned ext4_init_inode_bitmap(struct super_block *sb,
 				       ext4_group_t block_group,
 				       struct ext4_group_desc *gdp)
 {
-	struct ext4_sb_info *sbi = EXT4_SB(sb);
-
 	J_ASSERT_BH(bh, buffer_locked(bh));
 
 	/* If checksum is bad mark all blocks and inodes use to prevent
 	 * allocation, essentially implementing a per-group read-only flag. */
-	if (!ext4_group_desc_csum_verify(sbi, block_group, gdp)) {
+	if (!ext4_group_desc_csum_verify(sb, block_group, gdp)) {
 		ext4_error(sb, "Checksum bad for group %u", block_group);
 		ext4_free_group_clusters_set(sb, gdp, 0);
 		ext4_free_inodes_set(sb, gdp, 0);
@@ -92,7 +90,7 @@ static unsigned ext4_init_inode_bitmap(struct super_block *sb,
 			bh->b_data);
 	ext4_inode_bitmap_csum_set(sb, block_group, gdp, bh,
 				   EXT4_INODES_PER_GROUP(sb) / 8);
-	ext4_group_desc_csum_set(sbi, block_group, gdp);
+	ext4_group_desc_csum_set(sb, block_group, gdp);
 
 	return EXT4_INODES_PER_GROUP(sb);
 }
@@ -287,7 +285,7 @@ void ext4_free_inode(handle_t *handle, struct inode *inode)
 	}
 	ext4_inode_bitmap_csum_set(sb, block_group, gdp, bitmap_bh,
 				   EXT4_INODES_PER_GROUP(sb) / 8);
-	ext4_group_desc_csum_set(sbi, block_group, gdp);
+	ext4_group_desc_csum_set(sb, block_group, gdp);
 	ext4_unlock_group(sb, block_group);
 
 	percpu_counter_inc(&sbi->s_freeinodes_counter);
@@ -657,8 +655,7 @@ static int ext4_claim_inode(struct super_block *sb,
 	}
 	/* If we didn't allocate from within the initialized part of the inode
 	 * table then we need to initialize up to this inode. */
-	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
-
+	if (ext4_has_group_desc_csum(sb)) {
 		if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
 			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
 			/* When marking the block group with
@@ -697,7 +694,7 @@ static int ext4_claim_inode(struct super_block *sb,
 	}
 	ext4_inode_bitmap_csum_set(sb, group, gdp, inode_bitmap_bh,
 				   EXT4_INODES_PER_GROUP(sb) / 8);
-	ext4_group_desc_csum_set(sbi, group, gdp);
+	ext4_group_desc_csum_set(sb, group, gdp);
 err_ret:
 	ext4_unlock_group(sb, group);
 	up_read(&grp->alloc_sem);
@@ -832,7 +829,7 @@ repeat_in_this_group:
 
 got:
 	/* We may have to initialize the block bitmap if it isn't already */
-	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
+	if (ext4_has_group_desc_csum(sb) &&
 	    gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
 		struct buffer_head *block_bitmap_bh;
 
@@ -858,7 +855,7 @@ got:
 						   block_bitmap_bh,
 						   EXT4_BLOCKS_PER_GROUP(sb) /
 						   8);
-			ext4_group_desc_csum_set(sbi, group, gdp);
+			ext4_group_desc_csum_set(sb, group, gdp);
 		}
 		ext4_unlock_group(sb, group);
 
@@ -1226,7 +1223,7 @@ int ext4_init_inode_table(struct super_block *sb, ext4_group_t group,
 skip_zeroout:
 	ext4_lock_group(sb, group);
 	gdp->bg_flags |= cpu_to_le16(EXT4_BG_INODE_ZEROED);
-	ext4_group_desc_csum_set(sbi, group, gdp);
+	ext4_group_desc_csum_set(sb, group, gdp);
 	ext4_unlock_group(sb, group);
 
 	BUFFER_TRACE(group_desc_bh,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c0200cf..e94ac91 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3573,8 +3573,7 @@ make_io:
 				b = table;
 			end = b + EXT4_SB(sb)->s_inode_readahead_blks;
 			num = EXT4_INODES_PER_GROUP(sb);
-			if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+			if (ext4_has_group_desc_csum(sb))
 				num -= ext4_itable_unused_count(sb, gdp);
 			table += num / inodes_per_block;
 			if (end > table)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5f2e2ed..d6062e7 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2919,7 +2919,7 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 	ext4_free_group_clusters_set(sb, gdp, len);
 	ext4_block_bitmap_csum_set(sb, ac->ac_b_ex.fe_group, gdp, bitmap_bh,
 				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
-	ext4_group_desc_csum_set(sbi, ac->ac_b_ex.fe_group, gdp);
+	ext4_group_desc_csum_set(sb, ac->ac_b_ex.fe_group, gdp);
 
 	ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
 	percpu_counter_sub(&sbi->s_freeclusters_counter, ac->ac_b_ex.fe_len);
@@ -4787,7 +4787,7 @@ do_more:
 	ext4_free_group_clusters_set(sb, gdp, ret);
 	ext4_block_bitmap_csum_set(sb, block_group, gdp, bitmap_bh,
 				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
-	ext4_group_desc_csum_set(sbi, block_group, gdp);
+	ext4_group_desc_csum_set(sb, block_group, gdp);
 	ext4_unlock_group(sb, block_group);
 	percpu_counter_add(&sbi->s_freeclusters_counter, count_clusters);
 
@@ -4933,7 +4933,7 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 	ext4_free_group_clusters_set(sb, desc, blk_free_count);
 	ext4_block_bitmap_csum_set(sb, block_group, desc, bitmap_bh,
 				   EXT4_BLOCKS_PER_GROUP(sb) / 8);
-	ext4_group_desc_csum_set(sbi, block_group, desc);
+	ext4_group_desc_csum_set(sb, block_group, desc);
 	ext4_unlock_group(sb, block_group);
 	percpu_counter_add(&sbi->s_freeclusters_counter,
 			   EXT4_B2C(sbi, blocks_freed));
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 2363532..21ace95 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1106,7 +1106,7 @@ static int ext4_setup_new_descs(handle_t *handle, struct super_block *sb,
 					     EXT4_B2C(sbi, group_data->free_blocks_count));
 		ext4_free_inodes_set(sb, gdp, EXT4_INODES_PER_GROUP(sb));
 		gdp->bg_flags = cpu_to_le16(*bg_flags);
-		ext4_group_desc_csum_set(sbi, group, gdp);
+		ext4_group_desc_csum_set(sb, group, gdp);
 
 		err = ext4_handle_dirty_metadata(handle, NULL, gdb_bh);
 		if (unlikely(err)) {
@@ -1342,17 +1342,14 @@ static int ext4_setup_next_flex_gd(struct super_block *sb,
 			   (1 + ext4_bg_num_gdb(sb, group + i) +
 			    le16_to_cpu(es->s_reserved_gdt_blocks)) : 0;
 		group_data[i].free_blocks_count = blocks_per_group - overhead;
-		if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
-					       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+		if (ext4_has_group_desc_csum(sb))
 			flex_gd->bg_flags[i] = EXT4_BG_BLOCK_UNINIT |
 					       EXT4_BG_INODE_UNINIT;
 		else
 			flex_gd->bg_flags[i] = EXT4_BG_INODE_ZEROED;
 	}
 
-	if (last_group == n_group &&
-	    EXT4_HAS_RO_COMPAT_FEATURE(sb,
-				       EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+	if (last_group == n_group && ext4_has_group_desc_csum(sb))
 		/* We need to initialize block bitmap of last group. */
 		flex_gd->bg_flags[i - 1] &= ~EXT4_BG_BLOCK_UNINIT;
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2190044..e9b01e4 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2097,9 +2097,7 @@ static __le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 block_group,
 	__le32 le_group = cpu_to_le32(block_group);
 
 	if ((sbi->s_es->s_feature_ro_compat &
-	     cpu_to_le32(EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)) &&
-	    (sbi->s_es->s_feature_incompat &
-	     cpu_to_le32(EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM))) {
+	     cpu_to_le32(EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))) {
 		/* Use new metadata_csum algorithm */
 		__u16 old_csum;
 		__u32 csum32;
@@ -2135,24 +2133,23 @@ out:
 	return cpu_to_le16(crc);
 }
 
-int ext4_group_desc_csum_verify(struct ext4_sb_info *sbi, __u32 block_group,
+int ext4_group_desc_csum_verify(struct super_block *sb, __u32 block_group,
 				struct ext4_group_desc *gdp)
 {
-	if ((sbi->s_es->s_feature_ro_compat &
-	     cpu_to_le32(EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) &&
-	    (gdp->bg_checksum != ext4_group_desc_csum(sbi, block_group, gdp)))
+	if (ext4_has_group_desc_csum(sb) &&
+	    (gdp->bg_checksum != ext4_group_desc_csum(EXT4_SB(sb),
+						      block_group, gdp)))
 		return 0;
 
 	return 1;
 }
 
-void ext4_group_desc_csum_set(struct ext4_sb_info *sbi, __u32 block_group,
+void ext4_group_desc_csum_set(struct super_block *sb, __u32 block_group,
 			      struct ext4_group_desc *gdp)
 {
-	if (!(sbi->s_es->s_feature_ro_compat &
-	     cpu_to_le32(EXT4_FEATURE_RO_COMPAT_GDT_CSUM)))
+	if (!ext4_has_group_desc_csum(sb))
 		return;
-	gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
+	gdp->bg_checksum = ext4_group_desc_csum(EXT4_SB(sb), block_group, gdp);
 }
 
 /* Called at mount-time, super-block is locked */
@@ -2209,7 +2206,7 @@ static int ext4_check_descriptors(struct super_block *sb,
 			return 0;
 		}
 		ext4_lock_group(sb, i);
-		if (!ext4_group_desc_csum_verify(sbi, i, gdp)) {
+		if (!ext4_group_desc_csum_verify(sb, i, gdp)) {
 			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
 				 "Checksum for group %u failed (%u!=%u)",
 				 i, le16_to_cpu(ext4_group_desc_csum(sbi, i,
@@ -3289,6 +3286,13 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		goto cantfind_ext4;
 	sbi->s_kbytes_written = le64_to_cpu(es->s_kbytes_written);
 
+	/* Warn if metadata_csum and gdt_csum are both set. */
+	if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
+				       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) &&
+	    EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+		ext4_warning(sb, "metadata_csum and uninit_bg are "
+			     "redundant flags; please run fsck.");
+
 	/* Check for a known checksum algorithm */
 	if (!ext4_verify_csum_type(sb, es)) {
 		ext4_msg(sb, KERN_ERR, "VFS: Found ext4 filesystem with "
@@ -4620,7 +4624,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 				struct ext4_group_desc *gdp =
 					ext4_get_group_desc(sb, g, NULL);
 
-				if (!ext4_group_desc_csum_verify(sbi, g, gdp)) {
+				if (!ext4_group_desc_csum_verify(sb, g, gdp)) {
 					ext4_msg(sb, KERN_ERR,
 	       "ext4_remount: Checksum for group %u failed (%u!=%u)",
 		g, le16_to_cpu(ext4_group_desc_csum(sbi, g, gdp)),
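
The verify/set pair reworked in fs/ext4/super.c above follows a simple gating pattern: if no group descriptor checksum feature is enabled, set is a no-op and verify always succeeds. Here is a standalone sketch of that pattern under loud assumptions: the toy_* structs and functions are hypothetical stand-ins, and toy_gd_csum() is a placeholder arithmetic mix, not crc16 or crc32c.

```c
#include <stdint.h>

/* Feature bit values as defined in the ext4 headers. */
#define RO_COMPAT_GDT_CSUM      0x0010u
#define RO_COMPAT_METADATA_CSUM 0x0400u

/* Hypothetical, cut-down stand-ins for the real structures. */
struct toy_sb { uint32_t ro_compat; };
struct toy_gd { uint16_t flags; uint16_t itable_unused; uint16_t checksum; };

int toy_has_group_desc_csum(const struct toy_sb *sb)
{
	return (sb->ro_compat &
		(RO_COMPAT_GDT_CSUM | RO_COMPAT_METADATA_CSUM)) != 0;
}

/* Placeholder checksum for illustration only (NOT crc16/crc32c). */
uint16_t toy_gd_csum(uint32_t group, const struct toy_gd *gd)
{
	return (uint16_t)(group * 31u + gd->flags * 7u + gd->itable_unused);
}

/* Returns 1 if the checksum is acceptable, 0 if it is bad. */
int toy_gd_csum_verify(const struct toy_sb *sb, uint32_t group,
		       const struct toy_gd *gd)
{
	if (toy_has_group_desc_csum(sb) &&
	    gd->checksum != toy_gd_csum(group, gd))
		return 0;
	return 1;
}

/* Leaves the descriptor untouched when checksums are disabled. */
void toy_gd_csum_set(const struct toy_sb *sb, uint32_t group,
		     struct toy_gd *gd)
{
	if (!toy_has_group_desc_csum(sb))
		return;
	gd->checksum = toy_gd_csum(group, gd);
}
```

Passing the superblock rather than the sb_info, as the patch does, lets both functions share the one feature helper instead of open-coding the bit test.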



end of thread, other threads:[~2012-03-03  3:56 UTC | newest]

Thread overview: 31+ messages
2012-01-07  8:27 [PATCH v2.3 00/23] ext4: Add metadata checksumming Darrick J. Wong
2012-01-07  8:27 ` [PATCH 01/23] ext4: Create a new BH_Verified flag to avoid unnecessary metadata validation Darrick J. Wong
2012-01-07  8:28 ` [PATCH 02/23] ext4: Change on-disk layout to support extended metadata checksumming Darrick J. Wong
2012-01-07  8:28 ` [PATCH 03/23] ext4: Record the checksum algorithm in use in the superblock Darrick J. Wong
2012-01-07  8:28 ` [PATCH 04/23] ext4: Only call out to crc32c if necessary Darrick J. Wong
2012-01-07  8:28 ` [PATCH 05/23] ext4: Calculate and verify superblock checksum Darrick J. Wong
2012-01-07  8:28 ` [PATCH 06/23] ext4: Calculate and verify inode checksums Darrick J. Wong
2012-01-07  8:28 ` [PATCH 07/23] ext4: Calculate and verify checksums for inode bitmaps Darrick J. Wong
2012-01-07  8:28 ` [PATCH 08/23] ext4: Calculate and verify block bitmap checksum Darrick J. Wong
2012-01-07  8:28 ` [PATCH 09/23] ext4: Verify and calculate checksums for extent tree blocks Darrick J. Wong
2012-01-07  8:28 ` [PATCH 10/23] ext4: Calculate and verify checksums for htree nodes Darrick J. Wong
2012-01-07  8:29 ` [PATCH 11/23] ext4: Calculate and verify checksums of directory leaf blocks Darrick J. Wong
2012-01-07  8:29 ` [PATCH 12/23] ext4: Calculate and verify checksums of extended attribute blocks Darrick J. Wong
2012-01-07  8:29 ` [PATCH 13/23] ext4: Add new feature to make block group checksums use metadata_csum algorithm Darrick J. Wong
     [not found]   ` <8ED6E1F9-DB56-4D31-BCA8-2A3A8D514BD5@dilger.ca>
2012-02-13 22:28     ` Ted Ts'o
2012-02-29  1:27       ` [RFC] e2fsprogs: Rework metadata_csum/gdt_csum flag handling Darrick J. Wong
2012-02-29  5:40         ` Andreas Dilger
2012-03-03  3:50           ` [RFC v2] " Darrick J. Wong
2012-02-29  1:32       ` [RFC] ext4: Rework metadata_csum/gdt_csum flag handling in kernel Darrick J. Wong
2012-02-29  5:48         ` Andreas Dilger
2012-03-03  3:56           ` [RFC v2] " Darrick J. Wong
2012-01-07  8:29 ` [PATCH 14/23] ext4: Add checksums to the MMP block Darrick J. Wong
2012-01-07  8:29 ` [PATCH 15/23] jbd2: Change disk layout for metadata checksumming Darrick J. Wong
2012-01-07  8:29 ` [PATCH 16/23] jbd2: Enable journal clients to enable v2 checksumming Darrick J. Wong
2012-01-07  8:29 ` [PATCH 17/23] jbd2: Grab a reference to the crc32c driver only when necessary Darrick J. Wong
2012-01-07  8:29 ` [PATCH 18/23] jbd2: Checksum journal superblock Darrick J. Wong
2012-01-07  8:30 ` [PATCH 19/23] jbd2: Checksum revocation blocks Darrick J. Wong
2012-01-07  8:30 ` [PATCH 20/23] jbd2: Checksum descriptor blocks Darrick J. Wong
2012-01-07  8:30 ` [PATCH 21/23] jbd2: Checksum commit blocks Darrick J. Wong
2012-01-07  8:30 ` [PATCH 22/23] jbd2: Checksum data blocks that are stored in the journal Darrick J. Wong
2012-01-07  8:30 ` [PATCH 23/23] ext4/jbd2: Add metadata checksumming to the list of supported features Darrick J. Wong
