From: Facebook <clm@fb.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: <dsterba@suse.cz>, <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH RFC] btrfs: csum: Introduce partial csum for tree block.
Date: Thu, 18 Jun 2015 11:57:46 -0400
Message-ID: <1434643066.28534.0@mail.thefacebook.com>
In-Reply-To: <55822008.1090305@cn.fujitsu.com>



On Wed, Jun 17, 2015 at 9:34 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> 
wrote:
> Ping?
> 
> Any new comments?

As our block sizes get bigger, it makes sense to think about 
finer-grained checksums.  We're using crcs for:

1) memory corruption on the way down to the storage.  This could be very 
small (bitflips) or larger chunks (dma corrupting the whole bio).  In the 
places I've seen this in production, the partial crcs might have helped 
save a percentage of the blocks, but overall the corruptions were just 
too pervasive to get the data back.

2) incomplete writes.  We're sending down btree blocks of up to 64K, and 
the storage might only write part of one.

3) IO errors from the drive.  These are likely to fail in much bigger 
chunks and the partial csums probably won't help at all.

I think the best way to repair all of these is with replication, either 
RAID5/6 or some number of mirrored copies.  It's more reliable than 
trying to stitch together streams from multiple copies, and the code 
complexity is much lower.

But where I do find the partial crcs interesting is in their ability to 
detect those three failure modes more accurately with our larger block 
sizes.  That's pure statistics based on the crc we've chosen and the 
size of the block.  The right answer might just be a different crc, but 
I'm more than open to data here.
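
To make the detection side concrete, here is a rough sketch of per-chunk 
crcs over a 64K tree block.  It is not the layout from the patch: the 4K 
chunk size and the helper names (partial_csums, bad_chunks) are made up 
for illustration, and the crc32c here is a plain table-driven version 
rather than the kernel's implementation.

```python
# Illustrative only: CHUNK_SIZE and the helper names are assumptions,
# not taken from the patch under discussion.
CHUNK_SIZE = 4096          # assumed sub-block size
BLOCK_SIZE = 64 * 1024     # largest btree block size mentioned above


def _make_table():
    # Table for CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data):
    """Byte-at-a-time CRC-32C of a bytes-like object."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def partial_csums(block, chunk_size=CHUNK_SIZE):
    """One crc per chunk_size slice of the block."""
    return [crc32c(block[off:off + chunk_size])
            for off in range(0, len(block), chunk_size)]


def bad_chunks(block, csums, chunk_size=CHUNK_SIZE):
    """Indexes of chunks whose crc no longer matches the stored one."""
    now = partial_csums(block, chunk_size)
    return [i for i, (want, got) in enumerate(zip(csums, now)) if want != got]


if __name__ == "__main__":
    good = bytes((i * 131 + 7) % 251 for i in range(BLOCK_SIZE))
    csums = partial_csums(good)

    # Failure mode 1: a single bitflip inside chunk 5.
    flipped = bytearray(good)
    flipped[5 * CHUNK_SIZE + 17] ^= 0x01
    print(bad_chunks(bytes(flipped), csums))   # [5]

    # Failure mode 2: only the first 32K reached the disk; the rest
    # still holds stale contents (zeros here).
    torn = good[:32 * 1024] + bytes(32 * 1024)
    print(bad_chunks(torn, csums))
```

This only shows where a mismatch lands, not how often a corruption slips 
past the crc.  That part still depends on the crc and the number of 
bytes each one covers.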

-chris


