linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@us.ibm.com>
To: Andreas Dilger <adilger.kernel@dilger.ca>, Theodore Tso <tytso@mit.edu>
Cc: Sunil Mushran <sunil.mushran@oracle.com>,
	Martin K Petersen <martin.petersen@oracle.com>,
	Greg Freemyer <greg.freemyer@gmail.com>,
	Amir Goldstein <amir73il@gmail.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Andi Kleen <andi@firstfloor.org>, Mingming Cao <cmm@us.ibm.com>,
	Joel Becker <jlbec@evilplan.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-ext4@vger.kernel.org, Coly Li <colyli@gmail.com>
Subject: Re: [PATCH v2.2 00/23] ext4: Add metadata checksumming
Date: Tue, 13 Dec 2011 17:10:28 -0800	[thread overview]
Message-ID: <20111214011028.GA8233@tux1.beaverton.ibm.com> (raw)
In-Reply-To: <20111214004613.28243.54122.stgit@elm3c44.beaverton.ibm.com>

On Tue, Dec 13, 2011 at 04:46:14PM -0800, Darrick J. Wong wrote:
> Hi all,
> 
> This patchset adds crc32c checksums to most of the ext4 metadata objects.  A
> full design document is on the ext4 wiki[1] but I will summarize that document here.
> 
> As much as we wish our storage hardware was totally reliable, it is still
> quite possible for data to be corrupted on disk, corrupted during transfer over
> a wire, or written to the wrong places.  To protect against this sort of
> non-hostile corruption, it is desirable to store checksums of metadata objects
> on the filesystem to prevent broken metadata from shredding the filesystem.
> 
> The crc32c polynomial was chosen for its improved error detection capabilities
> over crc32 and crc16, and because of its hardware acceleration on current and
> upcoming Intel and Sparc chips.
> 
> Each type of metadata object has been retrofitted to store a checksum as follows:
> 
> - The superblock stores a crc32c of itself.
> - Each inode stores crc32c(fs_uuid + inode_num + inode_gen + inode +
>   slack_space_after_inode)
> - Block and inode bitmaps each get their own crc32c(fs_uuid + group_num +
>   bitmap), stored in the block group descriptor.
> - Each extent tree block stores a crc32c(fs_uuid + inode_num + inode_gen +
>   extent_entries) in unused space at the end of the block.
> - Each directory leaf block has an unused-looking directory entry big enough to
>   store a crc32c(fs_uuid + inode_num + inode_gen + block) at the end of the
>   block.
> - Each directory htree block is shortened to contain a crc32c(fs_uuid +
>   inode_num + inode_gen + block) at the end of the block.
> - Extended attribute blocks store crc32c(fs_uuid + id + ea_block) in the
>   header, where id is, depending on the refcount, either the inode_num and
>   inode_gen; or the block number.
> - MMP blocks store crc32c(fs_uuid + mmpblock) at the end of the MMP block.
> - Block groups can now use crc32c instead of crc16.
> - The journal now has a v2 checksum feature flag.
> - crc32c(j_uuid + block) checksums have been inserted into descriptor blocks,
>   commit blocks, revoke blocks, and the journal superblock.
> - Each block tag in a descriptor block has a checksum of the related data block.
> 
> The patchset for e2fsprogs will be sent under separate cover only to linux-ext4
> as it is quite lengthy (~48 patches).
> 
> As far as performance impact goes, I see nearly no change with a standard mail
> server ffsb simulation.  On a test that involves only file creation and
> deletion and extent tree modifications, I see a drop of about 50 percent with
> the current kernel crc32c implementation; this improves to a drop of about 20
> percent with the enclosed crc32c implementation.  However, given that metadata
> is usually a small fraction of total IO, it doesn't seem like the cost of
> enabling this feature is unreasonable.
> 
> There are of course unresolved issues:
> 
> - I haven't fixed it up to checksum the exclude bitmap yet.  I'll probably
>   submit that as an add-on to the snapshot patchset.
> 
> - Using the journal commit hooks to delay crc32c calculation until dirty
>   buffers are actually being written to disk.
> 
> - Interaction with online resize code.  Yongqiang seems to be in the process of
>   rewriting this not to use custom metadata block write functions, but I haven't
>   looked at it very closely yet.
> 
> Please have a look at the design document and patches, and please feel free to
> suggest any changes.
> 
> v2: Checksum the MMP block, store the checksum type in the superblock, include
> the inode generation in file checksums, and finally solve the problem of limited
> space in block groups by splitting the checksum into two halves.
> 
> v2.1: Checksum the reserved parts of the htree tail structure.  Fix some flag
> handling bugs with the mb cache init routine wherein bitmaps could fail to be
> checksummed at read time.
> 
> v2.2: Reincorporate the FS UUID in the bitmap checksum calcuations.  Move all
> disk layout changes to the front and the feature flag enablement to the end of
> the patch set.  Fail journal recovery if revoke block fails checksum.
> 
> This patchset has been tested on 3.2.0-rc5 on x64, i386, ppc64, and ppc32.  The
> patches seems to work fine on all four platforms.

OH COME ON!!!!

stgit helpfully changed the From lines, screwing everything up.  Awesome.  I
apologize, I wasn't expecting stgit to change the From: lines when I migrated
the disk format changes to the front of the set.

I don't really want to respam the list just to fix this one little thing.  If
people want more code changes I'll gladly make them and re-send.  If not, then
Ted, I can just send you the whole mess as a big mbox file, or post them on a
webserver somewhere, with correct attribution.

--D
> 
> --D
> 
> [1] https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


  parent reply	other threads:[~2011-12-14  1:10 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-12-14  0:46 [PATCH v2.2 00/23] ext4: Add metadata checksumming Darrick J. Wong
2011-12-14  0:46 ` [PATCH 01/23] ext4: Create a new BH_Verified flag to avoid unnecessary metadata validation Darrick J. Wong
2011-12-14  0:46 ` [PATCH 03/23] ext4: Record the checksum algorithm in use in the superblock Darrick J. Wong
2011-12-14  0:46 ` [PATCH 04/23] ext4: Only call out to crc32c if necessary Darrick J. Wong
2011-12-14  0:47 ` [PATCH 13/23] ext4: Add new feature to make block group checksums use metadata_csum algorithm Darrick J. Wong
2011-12-14  0:47 ` [PATCH 14/23] ext4: Add checksums to the MMP block Darrick J. Wong
2011-12-14  0:47 ` [PATCH 15/23] jbd2: Change disk layout for metadata checksumming Darrick J. Wong
2011-12-14  0:48 ` [PATCH 16/23] jbd2: Enable journal clients to enable v2 checksumming Darrick J. Wong
2011-12-14  0:48 ` [PATCH 17/23] jbd2: Grab a reference to the crc32c driver only when necessary Darrick J. Wong
2011-12-14  0:48 ` [PATCH 18/23] jbd2: Update structure definitions and flags to support extended checksumming Darrick J. Wong
2011-12-14  0:48 ` [PATCH 19/23] jbd2: Checksum revocation blocks Darrick J. Wong
2011-12-14  0:48 ` [PATCH 20/23] jbd2: Checksum descriptor blocks Darrick J. Wong
2011-12-14  0:48 ` [PATCH 21/23] jbd2: Checksum commit blocks Darrick J. Wong
2011-12-14  0:48 ` [PATCH 22/23] jbd2: Checksum data blocks that are stored in the journal Darrick J. Wong
2011-12-14  0:48 ` [PATCH 23/23] ext4/jbd2: Add metadata checksumming to the list of supported features Darrick J. Wong
2011-12-14  1:10 ` Darrick J. Wong [this message]
2011-12-14  1:31   ` [PATCH v2.2 00/23] ext4: Add metadata checksumming Darrick J. Wong
2012-01-06 18:46     ` Andreas Dilger
2012-01-06 18:57       ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111214011028.GA8233@tux1.beaverton.ibm.com \
    --to=djwong@us.ibm.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=amir73il@gmail.com \
    --cc=andi@firstfloor.org \
    --cc=cmm@us.ibm.com \
    --cc=colyli@gmail.com \
    --cc=greg.freemyer@gmail.com \
    --cc=jlbec@evilplan.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=sunil.mushran@oracle.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).