From: Alan <alan@lxorguk.ukuu.org.uk>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: "Moore, Eric" <Eric.Moore@lsi.com>, ric@emc.com,
	Theodore Tso <tytso@mit.edu>, Neil Brown <neilb@suse.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Linux-ide <linux-ide@vger.kernel.org>,
	linux-scsi <linux-scsi@vger.kernel.org>,
	linux-raid@vger.kernel.org, Tejun Heo <htejun@gmail.com>,
	James Bottomley <James.Bottomley@SteelEye.com>,
	Mark Lord <mlord@pobox.com>, Jens Axboe <jens.axboe@oracle.com>,
	"Clark, Nathan" <Clark_Nathan@emc.com>,
	"Singh, Arvinder" <Singh_Arvinder@emc.com>,
	"De Smet, Jochen" <DeSmet_Jochen@emc.com>,
	"Farmer, Matt" <Farmer_Matt@emc.com>,
	linux-fsdevel@vger.kernel.org,
	"Mizar, Sunita" <Mizar_Sunita@emc.com>
Subject: Re: end to end error recovery musings
Date: Tue, 27 Feb 2007 19:02:36 +0000
Message-ID: <20070227190236.58323a40@lxorguk.ukuu.org.uk>
In-Reply-To: <yq14pp78rze.fsf@sermon.lab.mkp.net>

> These features make the most sense in terms of WRITE.  Disks already
> have plenty of CRC on the data so if a READ fails on a regular drive
> we already know about it.

Don't bet on it. If you want to do this seriously you need an end to end
(media to host RAM) checksum. We do see bizarre and quite evil things
happen to people occasionally because they rely on bus level protection:
both faulty network cards and faulty disk or controller RAM can cause
very bad things to happen in a critical environment, and are very, very
hard to detect and test for.

IDE has another hideously evil feature in this area. Command blocks are
sent by PIO cycles, and are therefore unprotected from corruption. So
while a data burst with corruption will error and retry, a command which
corrupts the block number, although very much less likely (fewer bits
and much lower speed), will not be caught on a PATA system for read or
for write, and will hit the wrong block.

With networking you can turn off hardware IP checksumming (and many
cluster people do); with disks we don't yet have a proper end to end
checksum-to-media system in the fs or block layers.

> It would be great if the app tag was more than 16 bits.  Ted mentioned
> that ideally he'd like to store the inode number in the app tag.  But
> as it stands there isn't room.

The lowest few bits are the most important with ext2/ext3, because you
normally lose a sector of inodes, which means you've got dangly bits
associated with a sequence of inodes with the same upper bits. More
problematic is losing indirect blocks, and being able to keep some kind
of [inode low bits/block index] would help put stuff back together.

Alan
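The [inode low bits/block index] idea above could be sketched roughly as
follows. This is only an illustration: the 6/10 bit split, the macro
names and the helper functions are all assumptions made up for the
example and are not taken from the thread, ext2/ext3, or any kernel
interface. The only thing given in the discussion is that the app tag is
16 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of the 16-bit app tag (an assumption, not a spec):
 *   bits 15..10  low 6 bits of the inode number
 *   bits  9..0   low 10 bits of the block index within the file
 */
#define INO_BITS 6u
#define IDX_BITS 10u
#define INO_MASK ((1u << INO_BITS) - 1u)
#define IDX_MASK ((1u << IDX_BITS) - 1u)

static_assert(INO_BITS + IDX_BITS == 16, "fields must fill the 16-bit app tag");

/* Pack the low bits of the inode number and block index into one tag. */
static uint16_t app_tag_pack(uint64_t ino, uint64_t blk_index)
{
	return (uint16_t)(((ino & INO_MASK) << IDX_BITS) |
			  (blk_index & IDX_MASK));
}

/* Recover the low inode bits from a tag. */
static unsigned int app_tag_ino_low(uint16_t tag)
{
	return (tag >> IDX_BITS) & INO_MASK;
}

/* Recover the (truncated) block index from a tag. */
static unsigned int app_tag_blk_index(uint16_t tag)
{
	return tag & IDX_MASK;
}
```

With this split, inode 0x12345 and block index 1027 pack to 0x1403
(inode low bits 5, block index 3). Both fields are truncated, so the tag
is a recovery hint for telling apart neighbouring inodes in a lost
sector, not a unique identifier.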