All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alan <alan@lxorguk.ukuu.org.uk>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: "Moore, Eric" <Eric.Moore@lsi.com>,
	ric@emc.com, Theodore Tso <tytso@mit.edu>,
	Neil Brown <neilb@suse.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Linux-ide <linux-ide@vger.kernel.org>,
	linux-scsi <linux-scsi@vger.kernel.org>,
	linux-raid@vger.kernel.org, Tejun Heo <htejun@gmail.com>,
	James Bottomley <James.Bottomley@SteelEye.com>,
	Mark Lord <mlord@pobox.com>, Jens Axboe <jens.axboe@oracle.com>,
	"Clark, Nathan" <Clark_Nathan@emc.com>,
	"Singh, Arvinder" <Singh_Arvinder@emc.com>,
	"De Smet, Jochen" <DeSmet_Jochen@emc.com>,
	"Farmer, Matt" <Farmer_Matt@emc.com>,
	linux-fsdevel@vger.kernel.org, "Mizar,
	Sunita" <Mizar_Sunita@emc.com>
Subject: Re: end to end error recovery musings
Date: Tue, 27 Feb 2007 19:02:36 +0000	[thread overview]
Message-ID: <20070227190236.58323a40@lxorguk.ukuu.org.uk> (raw)
In-Reply-To: <yq14pp78rze.fsf@sermon.lab.mkp.net>

> These features make the most sense in terms of WRITE.  Disks already
> have plenty of CRC on the data so if a READ fails on a regular drive
> we already know about it.

Don't bet on it. If you want to do this seriously you need an end to end
(media to host ram) checksum. We do see bizarre and quite evil things
happen to people occasionally because they rely on bus level protection -
both faulty network cards and faulty disk or controller RAM can cause very
bad things to happen in a critical environment and are very very hard to
detect and test for.

IDE has another hideously evil feature in this area. Command blocks are
sent by PIO cycles, and are therefore unprotected from corruption. So
while a data burst with corruption will error and retry and command which
corrupts the block number although very very much less likely (less bits
and much lower speed) will not be caught on a PATA system for read or for
write and will hit the wrong block.

With networking you can turn off hardware IP checksumming (and many
cluster people do) with disks we don't yet have a proper end to end
checksum to media system in the fs or block layers.

> It would be great if the app tag was more than 16 bits.  Ted mentioned
> that ideally he'd like to store the inode number in the app tag.  But
> as it stands there isn't room.

The lowest few bits are the most important with ext2/ext3 because you
normally lose a sector of inodes which means you've got dangly bits
associated with a sequence of inodes with the same upper bits. More
problematic is losing indirect blocks, and being able to keep some kind
of [inode low bits/block index] would help put stuff back together.

Alan

WARNING: multiple messages have this Message-ID (diff)
From: Alan <alan@lxorguk.ukuu.org.uk>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: "Moore, Eric" <Eric.Moore@lsi.com>, <ric@emc.com>,
	"Theodore Tso" <tytso@mit.edu>, "Neil Brown" <neilb@suse.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Linux-ide" <linux-ide@vger.kernel.org>,
	"linux-scsi" <linux-scsi@vger.kernel.org>,
	<linux-raid@vger.kernel.org>, "Tejun Heo" <htejun@gmail.com>,
	"James Bottomley" <James.Bottomley@SteelEye.com>,
	"Mark Lord" <mlord@pobox.com>,
	"Jens Axboe" <jens.axboe@oracle.com>,
	"Clark, Nathan" <Clark_Nathan@emc.com>,
	"Singh, Arvinder" <Singh_Arvinder@emc.com>,
	"De Smet, Jochen" <DeSmet_Jochen@emc.com>,
	"Farmer, Matt" <Farmer_Matt@emc.com>,
	<linux-fsdevel@vger.kernel.org>,
	"Mizar, Sunita" <Mizar_Sunita@emc.com>
Subject: Re: end to end error recovery musings
Date: Tue, 27 Feb 2007 19:02:36 +0000	[thread overview]
Message-ID: <20070227190236.58323a40@lxorguk.ukuu.org.uk> (raw)
In-Reply-To: <yq14pp78rze.fsf@sermon.lab.mkp.net>

> These features make the most sense in terms of WRITE.  Disks already
> have plenty of CRC on the data so if a READ fails on a regular drive
> we already know about it.

Don't bet on it. If you want to do this seriously you need an end to end
(media to host ram) checksum. We do see bizarre and quite evil things
happen to people occasionally because they rely on bus level protection -
both faulty network cards and faulty disk or controller RAM can cause very
bad things to happen in a critical environment and are very very hard to
detect and test for.

IDE has another hideously evil feature in this area. Command blocks are
sent by PIO cycles, and are therefore unprotected from corruption. So
while a data burst with corruption will error and retry and command which
corrupts the block number although very very much less likely (less bits
and much lower speed) will not be caught on a PATA system for read or for
write and will hit the wrong block.

With networking you can turn off hardware IP checksumming (and many
cluster people do) with disks we don't yet have a proper end to end
checksum to media system in the fs or block layers.

> It would be great if the app tag was more than 16 bits.  Ted mentioned
> that ideally he'd like to store the inode number in the app tag.  But
> as it stands there isn't room.

The lowest few bits are the most important with ext2/ext3 because you
normally lose a sector of inodes which means you've got dangly bits
associated with a sequence of inodes with the same upper bits. More
problematic is losing indirect blocks, and being able to keep some kind
of [inode low bits/block index] would help put stuff back together.

Alan

  parent reply	other threads:[~2007-02-27 19:02 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-02-27  1:10 end to end error recovery musings Moore, Eric
2007-02-27  1:10 ` Moore, Eric
2007-02-27 16:50 ` Martin K. Petersen
2007-02-27 16:50   ` Martin K. Petersen
2007-02-27 18:51   ` Ric Wheeler
2007-02-27 19:02   ` Alan [this message]
2007-02-27 19:02     ` Alan
2007-02-27 18:39     ` Andreas Dilger
2007-02-27 19:07     ` Martin K. Petersen
2007-02-27 19:07       ` Martin K. Petersen
2007-02-27 23:39       ` Alan
2007-02-27 23:39         ` Alan
2007-02-27 22:51         ` Martin K. Petersen
2007-02-27 22:51           ` Martin K. Petersen
2007-02-28 13:46           ` Douglas Gilbert
2007-02-28 17:16             ` Martin K. Petersen
2007-02-28 17:30               ` James Bottomley
2007-02-28 17:42                 ` Martin K. Petersen
2007-02-28 17:52                   ` James Bottomley
2007-03-01  1:28                     ` H. Peter Anvin
2007-03-01 14:25                       ` James Bottomley
2007-03-01 17:19                         ` H. Peter Anvin
2007-02-28 15:19       ` Moore, Eric
2007-02-28 15:19         ` Moore, Eric
2007-02-28 17:27         ` Martin K. Petersen
  -- strict thread matches above, loose matches on Subject: below --
2007-02-23 14:15 Ric Wheeler
2007-02-23 14:15 ` Ric Wheeler
2007-02-24  0:03 ` H. Peter Anvin
2007-02-24  0:37   ` Andreas Dilger
2007-02-24  2:05     ` H. Peter Anvin
2007-02-24  2:32     ` Theodore Tso
2007-02-24 18:39       ` Chris Wedgwood
2007-02-26  5:33       ` Neil Brown
2007-02-26 13:25         ` Theodore Tso
2007-02-26 15:15           ` Alan
2007-02-26 15:18             ` Ric Wheeler
2007-02-26 17:01               ` Alan
2007-02-26 16:42                 ` Ric Wheeler
2007-02-26 15:17           ` James Bottomley
2007-02-26 18:59           ` H. Peter Anvin
2007-02-26 22:46           ` Jeff Garzik
2007-02-26 22:53             ` Ric Wheeler
2007-02-27  1:19               ` Alan
2007-02-26  6:01   ` Douglas Gilbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070227190236.58323a40@lxorguk.ukuu.org.uk \
    --to=alan@lxorguk.ukuu.org.uk \
    --cc=Clark_Nathan@emc.com \
    --cc=DeSmet_Jochen@emc.com \
    --cc=Eric.Moore@lsi.com \
    --cc=Farmer_Matt@emc.com \
    --cc=James.Bottomley@SteelEye.com \
    --cc=Mizar_Sunita@emc.com \
    --cc=Singh_Arvinder@emc.com \
    --cc=hpa@zytor.com \
    --cc=htejun@gmail.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=mlord@pobox.com \
    --cc=neilb@suse.de \
    --cc=ric@emc.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.