From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: Alan <alan@lxorguk.ukuu.org.uk>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
    "Moore, Eric" <Eric.Moore@lsi.com>, ric@emc.com,
    Theodore Tso <tytso@mit.edu>, Neil Brown <neilb@suse.de>,
    "H. Peter Anvin" <hpa@zytor.com>,
    Linux-ide <linux-ide@vger.kernel.org>,
    linux-scsi <linux-scsi@vger.kernel.org>,
    linux-raid@vger.kernel.org, Tejun Heo <htejun@gmail.com>,
    James Bottomley <James.Bottomley@SteelEye.com>,
    Mark Lord <mlord@pobox.com>, Jens Axboe <jens.axboe@oracle.com>,
    "Clark, Nathan" <Clark_Nathan@emc.com>,
    "Singh, Arvinder" <Singh_Arvinder@emc.com>,
    "De Smet, Jochen" <DeSmet_Jochen@emc.com>,
    "Farmer, Matt" <Farmer_Matt@emc.com>,
    linux-fsdevel@vger.kernel.org,
    "Mizar, Sunita" <Mizar_Sunita@emc.com>
Subject: Re: end to end error recovery musings
Date: Tue, 27 Feb 2007 14:07:12 -0500
Message-ID: <yq1r6sb7733.fsf@sermon.lab.mkp.net>
In-Reply-To: <20070227190236.58323a40@lxorguk.ukuu.org.uk> (alan@lxorguk.ukuu.org.uk's message of "Tue, 27 Feb 2007 19:02:36 +0000")

>>>>> "Alan" == Alan <alan@lxorguk.ukuu.org.uk> writes:

>> These features make the most sense in terms of WRITE. Disks
>> already have plenty of CRC on the data so if a READ fails on a
>> regular drive we already know about it.

Alan> Don't bet on it.

This is why I mentioned that I want to expose the protection data to
the host. As written, DIF only protects the path between initiator
and target. See below...

Alan> If you want to do this seriously you need an end to end (media
Alan> to host ram) checksum. We do see bizarre and quite evil things
Alan> happen to people occasionally because they rely on bus level
Alan> protection - both faulty network cards and faulty disk or
Alan> controller RAM can cause very bad things to happen in a critical
Alan> environment and are very very hard to detect and test for.

Not sure you're up-to-date on the T10 data integrity feature.
Essentially it's an extension of the 520-byte sectors common in disk
arrays. For each 512-byte sector (or 4KB ditto) you get 8 bytes of
protection data. There's a 2-byte CRC (GUARD tag), a 2-byte
user-defined tag (APP) and a 4-byte reference tag (REF). Depending on
how the drive is formatted, the REF tag usually needs to match the
lower 32 bits of the target sector #.

For each sector coming in, the disk firmware verifies that the CRC and
the reference tag agree with the contents of the sector and with the
CDB start sector + offset. If they don't match, the drive will reject
the request.

If an HBA is capable of exposing the protection tuples to the host, we
can precalculate the checksum and the LBA when submitting a WRITE.

My current proposal involves passing them down in two separate buffers
to minimize the risk of in-memory corruption. (Besides, it would suck
if you had to interleave data and protection data. The scatterlists
would become long and twisted.)

And that's when the READ case becomes interesting, because then the fs
can verify that the checksum of the in-buffer matches the GUARD tag.
In that case we'll know there's been no corruption in the middle.

And of course this also opens up using the APP field to tag sector
contents.

-- 
Martin K. Petersen      Oracle Linux Engineering
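The tuple layout and the two checks described above can be sketched in a few lines. This is only an illustration under stated assumptions, not code from the proposal: the message says the GUARD tag is a 2-byte CRC but does not name the polynomial, so the sketch assumes the CRC-16/T10-DIF generator 0x8BB7 with a zero initial value, and it assumes the three fields are packed big-endian. The function names (make_tuple, check_tuple, protect_write, verify_read) are hypothetical. Data and protection tuples are kept in two separate buffers, as in the proposal.

```python
import struct

SECTOR_SIZE = 512
TUPLE_SIZE = 8
# Assumption: CRC-16/T10-DIF generator polynomial; not stated in the message.
T10_DIF_POLY = 0x8BB7


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16, MSB first, zero initial value, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def make_tuple(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Build the 8-byte tuple: GUARD (2), APP (2), REF (4)."""
    guard = crc16_t10dif(sector)
    ref = lba & 0xFFFFFFFF  # REF carries the lower 32 bits of the sector number
    # Assumption: big-endian field order on the wire.
    return struct.pack(">HHI", guard, app_tag, ref)


def check_tuple(sector: bytes, prot: bytes, lba: int) -> str:
    """Check one sector against its tuple, as the drive or the fs would."""
    guard, _app, ref = struct.unpack(">HHI", prot)
    if guard != crc16_t10dif(sector):
        return "guard mismatch"
    if ref != (lba & 0xFFFFFFFF):
        return "ref mismatch"
    return "ok"


def protect_write(data: bytes, start_lba: int) -> bytes:
    """Precalculate tuples for a WRITE; returned as a separate buffer."""
    sectors = len(data) // SECTOR_SIZE
    return b"".join(
        make_tuple(data[n * SECTOR_SIZE:(n + 1) * SECTOR_SIZE], start_lba + n)
        for n in range(sectors)
    )


def verify_read(data: bytes, prot: bytes, start_lba: int) -> list:
    """Verify each sector of a READ buffer against the protection buffer."""
    sectors = len(data) // SECTOR_SIZE
    return [
        check_tuple(
            data[n * SECTOR_SIZE:(n + 1) * SECTOR_SIZE],
            prot[n * TUPLE_SIZE:(n + 1) * TUPLE_SIZE],
            start_lba + n,
        )
        for n in range(sectors)
    ]
```

In this sketch a flipped bit in the data buffer shows up as a guard mismatch for that sector, and a request that lands on the wrong LBA shows up as a ref mismatch, which are the two failure cases the firmware check above is meant to catch. The APP tag is carried along but not checked, since the message leaves its use open.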