From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alan
Subject: Re: end to end error recovery musings
Date: Tue, 27 Feb 2007 19:02:36 +0000
Message-ID: <20070227190236.58323a40@lxorguk.ukuu.org.uk>
References: <664A4EBB07F29743873A87CF62C26D705D6DDB@NAMAIL4.ad.lsil.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Sender: linux-scsi-owner@vger.kernel.org
To: "Martin K. Petersen"
Cc: "Moore, Eric", ric@emc.com, Theodore Tso, Neil Brown,
 "H. Peter Anvin", Linux-ide, linux-scsi, linux-raid@vger.kernel.org,
 Tejun Heo, James Bottomley, Mark Lord, Jens Axboe, "Clark, Nathan",
 "Singh, Arvinder", "De Smet, Jochen", "Farmer, Matt",
 linux-fsdevel@vger.kernel.org, "Mizar, Sunita"
List-Id: linux-raid.ids

> These features make the most sense in terms of WRITE. Disks already
> have plenty of CRC on the data so if a READ fails on a regular drive
> we already know about it.

Don't bet on it. If you want to do this seriously you need an end to
end (media to host ram) checksum. We do see bizarre and quite evil
things happen to people occasionally because they rely on bus level
protection - both faulty network cards and faulty disk or controller
RAM can cause very bad things to happen in a critical environment and
are very very hard to detect and test for.

IDE has another hideously evil feature in this area. Command blocks are
sent by PIO cycles, and are therefore unprotected from corruption. So
while a data burst with corruption will error and retry, a command
which corrupts the block number, although very very much less likely
(fewer bits and much lower speed), will not be caught on a PATA system
for read or for write and will hit the wrong block.

With networking you can turn off hardware IP checksumming (and many
cluster people do). With disks we don't yet have a proper end to end
checksum to media system in the fs or block layers.

> It would be great if the app tag was more than 16 bits.
> Ted mentioned that ideally he'd like to store the inode number in the
> app tag. But as it stands there isn't room.

The lowest few bits are the most important with ext2/ext3 because you
normally lose a sector of inodes, which means you've got dangly bits
associated with a sequence of inodes with the same upper bits. More
problematic is losing indirect blocks, and being able to keep some kind
of [inode low bits/block index] would help put stuff back together.

Alan
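
A minimal sketch of the [inode low bits/block index] idea, assuming the
16-bit app tag is split into 8 bits of inode number and 8 bits of block
index within the file. The 8/8 split and the names app_tag_pack() and
app_tag_may_belong() are hypothetical, picked only for illustration;
nothing here comes from a spec or from ext2/ext3 itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of the 16-bit app tag (an assumption, not a spec):
 * upper 8 bits = low bits of the inode number,
 * lower 8 bits = low bits of the block index within the file. */
enum { APP_TAG_INO_BITS = 8, APP_TAG_IDX_BITS = 8 };
static_assert(APP_TAG_INO_BITS + APP_TAG_IDX_BITS == 16,
              "app tag is 16 bits");

/* Build the tag that would be written alongside a data block. */
static uint16_t app_tag_pack(uint64_t ino, uint64_t blk_index)
{
    uint16_t ino_low = (uint16_t)(ino & ((1u << APP_TAG_INO_BITS) - 1));
    uint16_t idx_low = (uint16_t)(blk_index & ((1u << APP_TAG_IDX_BITS) - 1));

    return (uint16_t)((ino_low << APP_TAG_IDX_BITS) | idx_low);
}

/* Recovery-time check: could this block be block blk_index of inode ino?
 * Only low bits are stored, so a match is a hint and not proof. */
static int app_tag_may_belong(uint16_t tag, uint64_t ino, uint64_t blk_index)
{
    return tag == app_tag_pack(ino, blk_index);
}
```

Because only the low bits survive, different inodes can alias to the
same tag, so a recovery tool could use a match only to narrow down
candidates. Within one lost sector of inodes the upper bits are shared
anyway, which is why the low bits are the ones worth keeping.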