From: Chris Murphy <lists@colorremedies.com>
To: Roy Sigurd Karlsbakk <roy@karlsbakk.net>
Cc: linux-raid Raid <linux-raid@vger.kernel.org>
Subject: Re: Chances of silent errors?
Date: Mon, 21 Jan 2013 16:00:17 -0700
Message-ID: <2F4E0D43-D487-4722-B582-1F424B7BFF10@colorremedies.com>
In-Reply-To: <28222077.12.1358798287217.JavaMail.root@zimbra>


On Jan 21, 2013, at 12:58 PM, Roy Sigurd Karlsbakk <roy@karlsbakk.net> wrote:

> Hi all
> 
> Coming from the zfs world, I've heard a few talk about the chances of "silent errors", meaning the checksum on the drives match, but the data being bad because of matching checksum (aka collisions). Does anyone in here know the relative chance of something like that happening with the checksums of current harddisks? Is the 1:10^14 or 1:10^15 chances for a URE in regard to this, or is that when the drive reports an error, or those two combined?

It's fun trying to pin down what a URE, a UER, and a BER each are. I don't see even SNIA using one term consistently. WDC says "non-recoverable" rather than "unrecoverable"; linguistically these are the same, but if WDC is effectively using a different term than SNIA, it might also be defining "error" differently. All of these, though, are disk errors.
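
For a sense of scale on the 1:10^14 and 1:10^15 figures quoted above, here is a rough Python sketch. It takes the figure at face value as an average of one unrecoverable error per that many bits read, and it assumes errors are independent of each other. Both are my simplifications, not something the drive specs state, and the function names are made up for illustration.

```python
import math

def expected_errors(bytes_read: float, bits_per_error: float) -> float:
    """Expected number of unrecoverable read errors for a given volume read.

    Assumption: the spec figure is an average rate of one error per
    `bits_per_error` bits read.
    """
    return bytes_read * 8 / bits_per_error

def prob_at_least_one(bytes_read: float, bits_per_error: float) -> float:
    """Probability of one or more errors, assuming independent errors
    (Poisson approximation): 1 - exp(-expected)."""
    return -math.expm1(-expected_errors(bytes_read, bits_per_error))

TB = 1e12  # decimal terabyte, as drive vendors count it

for bits_per_error in (1e14, 1e15):
    for size_tb in (2, 12.5):
        n = expected_errors(size_tb * TB, bits_per_error)
        p = prob_at_least_one(size_tb * TB, bits_per_error)
        print(f"1 per {bits_per_error:.0e} bits, {size_tb} TB read: "
              f"expected {n:.3f} errors, P(at least one) = {p:.3f}")
```

Under these assumptions, 10^14 bits is 12.5 TB, so reading 12.5 TB from a 1:10^14 drive gives an expectation of one error. This says nothing about whether such an error is reported or silent.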

SDC (silent data corruption) is not necessarily a disk-only error. It can occur in the disk, in the cable between disk and controller, in the controller, or between controller and memory. So there are many more places where SDC can occur than the disk alone.

To count as SDC, an error must either go undetected, or be detected and then improperly corrected. In either case the result is error propagation without the constituent components in the storage stack notifying the OS.

Anyway, I think what you're asking about, ZFS being spoofed because SDC produces a checksum collision, is a really remote possibility. First, SDC is a very rare case to start out with. Second, SDC is not understood well enough to have established probabilities. Third, the surface area is remarkably small: to get the collision you're suggesting, the SDC would have to affect both the data and the metadata in exactly such a way that the corrupt data's checksum indicates valid data. That sort of collision is next to impossible, even for MD5. Deliberately constructed collisions have been demonstrated, but accidental ones aren't expected in the wild. What checksum method does ZFS default to?
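
To put a rough number on "next to impossible", here is a small Python sketch. It assumes an idealized checksum whose output is uniformly distributed, so that a randomly corrupted block matches the stored n-bit checksum with probability 2^-n. That model is my assumption, not a property of any particular ZFS checksum, and the function names and checksum widths are purely illustrative.

```python
from fractions import Fraction

def undetected_probability(checksum_bits: int) -> Fraction:
    """Chance that one randomly corrupted block still matches its stored
    checksum, assuming an idealized uniform n-bit checksum: 2**-n."""
    return Fraction(1, 2 ** checksum_bits)

def expected_undetected(corrupt_blocks: int, checksum_bits: int) -> Fraction:
    """Expected number of corrupt blocks that slip past the checksum,
    given how many corrupt blocks were read in total."""
    return corrupt_blocks * undetected_probability(checksum_bits)

# Hypothetical workload: one million corrupted blocks reaching the filesystem.
corrupt_blocks = 10 ** 6

for bits in (32, 64, 128, 256):
    p = undetected_probability(bits)
    e = expected_undetected(corrupt_blocks, bits)
    print(f"{bits:3d}-bit checksum: per-block miss chance {float(p):.3e}, "
          f"expected misses in {corrupt_blocks} corrupt blocks {float(e):.3e}")
```

Even this overstates the risk, because it counts only blocks that are already silently corrupted, which is the rare event to begin with.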

Chris Murphy
