From: Hannes Reinecke <hare@suse.de> To: Mike Snitzer <snitzer@redhat.com> Cc: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>, Tejun Heo <tj@kernel.org>, tytso@mit.edu, linux-scsi@vger.kernel.org, jaxboe@fusionio.com, jack@suse.cz, linux-kernel@vger.kernel.org, swhiteho@redhat.com, linux-raid@vger.kernel.org, linux-ide@vger.kernel.org, James.Bottomley@suse.de, konishi.ryusuke@lab.ntt.co.jp, linux-fsdevel@vger.kernel.org, vst@vlnb.net, rwheeler@redhat.com, Christoph Hellwig <hch@lst.de>, chris.mason@oracle.com, dm-devel@redhat.com, Frederick.Knight@netapp.com Subject: Re: safety of retrying SYNCHRONIZE CACHE [was: Re: [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush] Date: Wed, 01 Sep 2010 09:32:20 +0200 [thread overview] Message-ID: <4C7E0184.9030502@suse.de> (raw) In-Reply-To: <20100901005537.GA21466@redhat.com> Mike Snitzer wrote: > Hi Kiyoshi, > > On Mon, Aug 30 2010 at 2:13am -0400, > Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote: > >>> That does seem like a valid concern. But I'm not seeing why its unique >>> to SYNCHRONIZE CACHE. Any IO that fails on the target side should be >>> passed up once the error gets to DM. >> See the Tejun's explanation again: >> http://marc.info/?l=linux-kernel&m=128267361813859&w=2 >> What I'm concerning is whether the same thing as Tejun explained >> for ATA can happen on other types of devices. >> >> >> Normal write command has data and no data loss happens on error. >> So it can be retried cleanly, and if the result of the retry is >> success, it's really success, no implicit data loss. >> >> Normal read command has a sector to read. If the sector is broken, >> all retries will fail and the error will be reported upwards. >> So it can be retried cleanly as well. > > I reached out to Fred Knight on this, to get a more insight from a pure > SCSI SBC perspective, and he shared the following: > > ----- Forwarded message from "Knight, Frederick" <Frederick.Knight@netapp.com> ----- > >> Date: Tue, 31 Aug 2010 13:24:15 -0400 >> From: "Knight, Frederick" <Frederick.Knight@netapp.com> >> To: Mike Snitzer <snitzer@redhat.com> >> Subject: RE: safety of retrying SYNCHRONIZE CACHE? >> >> There are requirements in SBC to maintain data integrity. If you WRITE >> a block and READ that block, you must get the data you sent in the >> WRITE. This will be synchronized around the completion of the WRITE. >> Before the WRITE completes, who knows what a READ will return. Maybe >> all the old data, maybe all the new data, maybe some mix of old and new >> data. Once the WRITE ends successful, all READs of those LBAs (from any >> port) will always get the same data. >> >> As for errors, SBC describes how the deferred errors are reported (like >> when a CACHE tries to flush but fails). So if a write from cache to >> media does have problems, the device would tell you via a CHECK >> CONDITION (with the first byte of the sense data set to 71h or 73h. SBC >> clause 4.12 and 4.13 cover a lot of this information. It is these error >> codes that prevent silent loss of data. And, in this case, when the >> CHECK CONDITION is delivered, it will have nothing to do with the >> command that was issued (the victim command). If you look into the >> sense data, you will see the deferred error flag, and all the additional >> information fields will relate to the original I/O >> >> SYNCHRONIZE CACHE is not substantially different than a WRITE (it puts >> data on the media). So issuing it multiple times wouldn't be any >> different than issuing multiple WRITES (it might put a temporary dent in >> performance as everything flushes out to media). If it or any other >> commands fail with 71h/73h, then you have to dig down into the sense >> data buffer to find out what happened. For example, if you issue a >> WRITE command, and it completes into write back cache but later (before >> being written to the media), some of the cache breaks and looses data, >> then the device must signal a deferred error to tell the host, and cause >> a forced error on the LBA in question. >> >> Does that help? >> >> Fred > ----- End forwarded message ----- > > Seems like verifying/improving the handling of CHECK CONDITION is a more > pressing concern than silent data loss purely due to SYNCHRONIZE CACHE > retries. Without proper handling we could completely miss these > deferred errors. > Yes. > But how to effectively report such errors to upper layers is unclear to > me given that a particular SCSI command can carry error information for > IO that was already acknowledged successful (e.g. to the FS). > > drivers/scsi/scsi_error.c's various calls to scsi_check_sense() > illustrate Linux's current CHECK CONDITION handling. I need to look > closer at how deferred errors propagate to upper layers. After an > initial look it seems scsi_error.c does handle retrying commands where > appropriate. > > I believe Hannes has concerns/insight here. > Quite. We _should_ be handling deferred errors correctly; if you check drivers/scsi/scsi_lib.c:scsi_io_completion() you'll find this: if (host_byte(result) == DID_RESET) { /* Third party bus reset or reset for error recovery * reasons. Just retry the command and see what * happens. */ action = ACTION_RETRY; } else if (sense_valid && !sense_deferred) { ... } else { description = "Unhandled error code"; action = ACTION_FAIL; } ie for deferred errors we're already aborting the command. Not sure if I agree with this bit in drivers/scsi/scsi_lib.c: static int scsi_check_sense(struct scsi_cmnd *scmd) { struct scsi_device *sdev = scmd->device; struct scsi_sense_hdr sshdr; if (! scsi_command_normalize_sense(scmd, &sshdr)) return FAILED; /* no valid sense data */ if (scsi_sense_is_deferred(&sshdr)) return NEEDS_RETRY; I doubt we can resolve the situation by retrying the command, which will be the wrong command to retry anyway. I would rather have those retry 'SUCCESS' and add another case in scsi_io_completion() to notify us about the deferred error. I'll be sending a patch. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Markus Rex, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
WARNING: multiple messages have this Message-ID (diff)
From: Hannes Reinecke <hare@suse.de> To: Mike Snitzer <snitzer@redhat.com> Cc: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>, Tejun Heo <tj@kernel.org>, tytso@mit.edu, linux-scsi@vger.kernel.org, jaxboe@fusionio.com, jack@suse.cz, linux-kernel@vger.kernel.org, swhiteho@redhat.com, linux-raid@vger.kernel.org, linux-ide@vger.kernel.org, James.Bottomley@suse.de, konishi.ryusuke@lab.ntt.co.jp, linux-fsdevel@vger.kernel.org, vst@vlnb.net, rwheeler@redhat.com, Christoph Hellwig <hch@lst.de>, chris.mason@oracle.com, dm-devel@redhat.com, Frederick.Knight@netapp.com Subject: Re: safety of retrying SYNCHRONIZE CACHE [was: Re: [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush] Date: Wed, 01 Sep 2010 09:32:20 +0200 [thread overview] Message-ID: <4C7E0184.9030502@suse.de> (raw) In-Reply-To: <20100901005537.GA21466@redhat.com> Mike Snitzer wrote: > Hi Kiyoshi, > > On Mon, Aug 30 2010 at 2:13am -0400, > Kiyoshi Ueda <k-ueda@ct.jp.nec.com> wrote: > >>> That does seem like a valid concern. But I'm not seeing why its unique >>> to SYNCHRONIZE CACHE. Any IO that fails on the target side should be >>> passed up once the error gets to DM. >> See the Tejun's explanation again: >> http://marc.info/?l=linux-kernel&m=128267361813859&w=2 >> What I'm concerning is whether the same thing as Tejun explained >> for ATA can happen on other types of devices. >> >> >> Normal write command has data and no data loss happens on error. >> So it can be retried cleanly, and if the result of the retry is >> success, it's really success, no implicit data loss. >> >> Normal read command has a sector to read. If the sector is broken, >> all retries will fail and the error will be reported upwards. >> So it can be retried cleanly as well. > > I reached out to Fred Knight on this, to get a more insight from a pure > SCSI SBC perspective, and he shared the following: > > ----- Forwarded message from "Knight, Frederick" <Frederick.Knight@netapp.com> ----- > >> Date: Tue, 31 Aug 2010 13:24:15 -0400 >> From: "Knight, Frederick" <Frederick.Knight@netapp.com> >> To: Mike Snitzer <snitzer@redhat.com> >> Subject: RE: safety of retrying SYNCHRONIZE CACHE? >> >> There are requirements in SBC to maintain data integrity. If you WRITE >> a block and READ that block, you must get the data you sent in the >> WRITE. This will be synchronized around the completion of the WRITE. >> Before the WRITE completes, who knows what a READ will return. Maybe >> all the old data, maybe all the new data, maybe some mix of old and new >> data. Once the WRITE ends successful, all READs of those LBAs (from any >> port) will always get the same data. >> >> As for errors, SBC describes how the deferred errors are reported (like >> when a CACHE tries to flush but fails). So if a write from cache to >> media does have problems, the device would tell you via a CHECK >> CONDITION (with the first byte of the sense data set to 71h or 73h. SBC >> clause 4.12 and 4.13 cover a lot of this information. It is these error >> codes that prevent silent loss of data. And, in this case, when the >> CHECK CONDITION is delivered, it will have nothing to do with the >> command that was issued (the victim command). If you look into the >> sense data, you will see the deferred error flag, and all the additional >> information fields will relate to the original I/O >> >> SYNCHRONIZE CACHE is not substantially different than a WRITE (it puts >> data on the media). So issuing it multiple times wouldn't be any >> different than issuing multiple WRITES (it might put a temporary dent in >> performance as everything flushes out to media). If it or any other >> commands fail with 71h/73h, then you have to dig down into the sense >> data buffer to find out what happened. For example, if you issue a >> WRITE command, and it completes into write back cache but later (before >> being written to the media), some of the cache breaks and looses data, >> then the device must signal a deferred error to tell the host, and cause >> a forced error on the LBA in question. >> >> Does that help? >> >> Fred > ----- End forwarded message ----- > > Seems like verifying/improving the handling of CHECK CONDITION is a more > pressing concern than silent data loss purely due to SYNCHRONIZE CACHE > retries. Without proper handling we could completely miss these > deferred errors. > Yes. > But how to effectively report such errors to upper layers is unclear to > me given that a particular SCSI command can carry error information for > IO that was already acknowledged successful (e.g. to the FS). > > drivers/scsi/scsi_error.c's various calls to scsi_check_sense() > illustrate Linux's current CHECK CONDITION handling. I need to look > closer at how deferred errors propagate to upper layers. After an > initial look it seems scsi_error.c does handle retrying commands where > appropriate. > > I believe Hannes has concerns/insight here. > Quite. We _should_ be handling deferred errors correctly; if you check drivers/scsi/scsi_lib.c:scsi_io_completion() you'll find this: if (host_byte(result) == DID_RESET) { /* Third party bus reset or reset for error recovery * reasons. Just retry the command and see what * happens. */ action = ACTION_RETRY; } else if (sense_valid && !sense_deferred) { ... } else { description = "Unhandled error code"; action = ACTION_FAIL; } ie for deferred errors we're already aborting the command. Not sure if I agree with this bit in drivers/scsi/scsi_lib.c: static int scsi_check_sense(struct scsi_cmnd *scmd) { struct scsi_device *sdev = scmd->device; struct scsi_sense_hdr sshdr; if (! scsi_command_normalize_sense(scmd, &sshdr)) return FAILED; /* no valid sense data */ if (scsi_sense_is_deferred(&sshdr)) return NEEDS_RETRY; I doubt we can resolve the situation by retrying the command, which will be the wrong command to retry anyway. I would rather have those retry 'SUCCESS' and add another case in scsi_io_completion() to notify us about the deferred error. I'll be sending a patch. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Markus Rex, HRB 16746 (AG Nürnberg)
next prev parent reply other threads:[~2010-09-01 7:32 UTC|newest] Thread overview: 155+ messages / expand[flat|nested] mbox.gz Atom feed top 2010-08-12 12:41 [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-12 12:41 ` [PATCH 01/11] block/loop: queue ordered mode should be DRAIN_FLUSH Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-12 12:41 ` [PATCH 02/11] block: kill QUEUE_ORDERED_BY_TAG Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-13 12:56 ` Vladislav Bolkhovitin 2010-08-13 13:06 ` Christoph Hellwig 2010-08-12 12:41 ` [PATCH 03/11] block: deprecate barrier and replace blk_queue_ordered() with blk_queue_flush() Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-14 1:07 ` Jeremy Fitzhardinge 2010-08-14 1:07 ` Jeremy Fitzhardinge 2010-08-14 9:42 ` hch 2010-08-16 20:38 ` Jeremy Fitzhardinge 2010-08-12 12:41 ` [PATCH 04/11] block: remove spurious uses of REQ_HARDBARRIER Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-12 12:41 ` [PATCH 05/11] block: misc cleanups in barrier code Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-12 12:41 ` [PATCH 06/11] block: drop barrier ordering by queue draining Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-12 12:41 ` [PATCH 07/11] block: rename blk-barrier.c to blk-flush.c Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-12 12:41 ` [PATCH 08/11] block: rename barrier/ordered to flush Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-17 13:26 ` Christoph Hellwig 2010-08-17 16:23 ` Tejun Heo 2010-08-17 17:08 ` Christoph Hellwig 2010-08-18 6:23 ` Tejun Heo 2010-08-12 12:41 ` [PATCH 09/11] block: implement REQ_FLUSH/FUA based interface for FLUSH/FUA requests Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-12 12:41 ` [PATCH 10/11] fs, block: propagate REQ_FLUSH/FUA interface to upper layers Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-12 21:24 ` Jan Kara 2010-08-13 7:19 ` Tejun Heo 2010-08-13 7:47 ` Christoph Hellwig 2010-08-16 16:33 ` [PATCH UPDATED " Tejun Heo 2010-08-12 12:41 ` [PATCH " Tejun Heo 2010-08-12 12:41 ` [PATCH 11/11] block: use REQ_FLUSH in blkdev_issue_flush() Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-12 12:41 ` Tejun Heo 2010-08-13 11:48 ` [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush Christoph Hellwig 2010-08-13 13:48 ` Tejun Heo 2010-08-13 14:38 ` Christoph Hellwig 2010-08-13 14:51 ` Tejun Heo 2010-08-14 10:36 ` Christoph Hellwig 2010-08-17 9:59 ` Tejun Heo 2010-08-17 13:19 ` Christoph Hellwig 2010-08-17 16:41 ` Tejun Heo 2010-08-17 16:59 ` Christoph Hellwig 2010-08-18 6:35 ` Tejun Heo 2010-08-18 8:11 ` Tejun Heo 2010-08-20 8:26 ` Kiyoshi Ueda 2010-08-23 12:14 ` Tejun Heo 2010-08-23 14:17 ` Mike Snitzer 2010-08-24 10:24 ` Kiyoshi Ueda 2010-08-24 16:59 ` Tejun Heo 2010-08-24 17:52 ` Mike Snitzer 2010-08-24 18:14 ` Tejun Heo 2010-08-25 8:00 ` Kiyoshi Ueda 2010-08-25 15:28 ` Mike Snitzer 2010-08-27 9:47 ` Kiyoshi Ueda 2010-08-27 9:47 ` Kiyoshi Ueda 2010-08-27 13:49 ` Mike Snitzer 2010-08-30 6:13 ` Kiyoshi Ueda 2010-09-01 0:55 ` safety of retrying SYNCHRONIZE CACHE [was: Re: [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush] Mike Snitzer 2010-09-01 7:32 ` Hannes Reinecke [this message] 2010-09-01 7:32 ` Hannes Reinecke 2010-09-01 7:38 ` Hannes Reinecke 2010-09-01 7:38 ` Hannes Reinecke 2010-12-08 21:14 ` [PATCH] scsi: improve description for deferred error Mike Snitzer 2010-12-28 21:45 ` Brett Russ 2010-08-25 15:59 ` [RFC] training mpath to discern between SCSI errors (was: Re: [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush) Mike Snitzer 2010-08-25 19:15 ` [RFC] training mpath to discern between SCSI errors Mike Christie 2010-08-30 11:38 ` Hannes Reinecke 2010-08-30 12:07 ` Sergei Shtylyov 2010-08-30 12:39 ` Hannes Reinecke 2010-08-30 12:51 ` Christophe Varoqui 2010-08-30 13:10 ` Hannes Reinecke 2010-08-30 14:52 ` [dm-devel] " Hannes Reinecke 2010-08-30 14:52 ` Hannes Reinecke 2010-10-18 8:09 ` Jun'ichi Nomura 2010-10-18 11:55 ` Hannes Reinecke 2010-10-19 4:03 ` Jun'ichi Nomura 2010-11-19 3:11 ` [dm-devel] " Malahal Naineni 2010-11-30 22:59 ` Mike Snitzer 2010-12-07 23:16 ` [RFC PATCH 0/3] differentiate between I/O errors Mike Snitzer 2010-12-07 23:16 ` [RFC PATCH v2 1/3] scsi: Detailed " Mike Snitzer 2010-12-07 23:16 ` [RFC PATCH v2 2/3] dm mpath: propagate target errors immediately Mike Snitzer 2010-12-07 23:16 ` Mike Snitzer 2010-12-07 23:16 ` [RFC PATCH 3/3] block: improve detail in I/O error messages Mike Snitzer 2010-12-08 11:28 ` Sergei Shtylyov 2010-12-08 15:05 ` [PATCH v2 " Mike Snitzer 2010-12-10 23:40 ` [RFC PATCH 0/3] differentiate between I/O errors Malahal Naineni 2011-01-14 1:15 ` Mike Snitzer 2011-01-14 1:15 ` Mike Snitzer 2011-01-14 1:15 ` Mike Snitzer 2011-01-14 1:15 ` Mike Snitzer 2011-01-14 1:15 ` Mike Snitzer 2010-12-17 9:47 ` training mpath to discern between SCSI errors Hannes Reinecke 2010-12-17 14:06 ` Mike Snitzer 2010-12-17 14:06 ` Mike Snitzer 2011-01-14 1:09 ` Mike Snitzer 2011-01-14 7:45 ` Hannes Reinecke 2011-01-14 13:59 ` Mike Snitzer 2010-08-24 17:11 ` [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush Vladislav Bolkhovitin 2010-08-24 23:14 ` Alan Cox 2010-08-24 23:14 ` Alan Cox 2010-08-13 12:55 ` Vladislav Bolkhovitin 2010-08-13 13:17 ` Christoph Hellwig 2010-08-18 19:29 ` Vladislav Bolkhovitin 2010-08-13 13:21 ` Tejun Heo 2010-08-18 19:30 ` Vladislav Bolkhovitin 2010-08-19 9:51 ` Tejun Heo 2010-08-30 9:54 ` Hannes Reinecke 2010-08-30 20:34 ` Vladislav Bolkhovitin 2010-08-18 9:46 ` Christoph Hellwig 2010-08-19 9:57 ` Tejun Heo 2010-08-19 10:20 ` Christoph Hellwig 2010-08-19 10:22 ` Tejun Heo 2010-08-20 13:22 ` Christoph Hellwig 2010-08-20 15:18 ` Ric Wheeler 2010-08-20 16:00 ` Chris Mason 2010-08-20 16:02 ` Ric Wheeler 2010-08-20 16:02 ` Ric Wheeler 2010-08-20 16:02 ` Ric Wheeler 2010-08-20 16:02 ` Ric Wheeler 2010-08-20 16:02 ` Ric Wheeler 2010-08-20 16:02 ` Ric Wheeler 2010-08-23 12:30 ` Tejun Heo 2010-08-23 12:48 ` Christoph Hellwig 2010-08-23 13:58 ` Ric Wheeler 2010-08-23 14:01 ` Jens Axboe 2010-08-23 14:08 ` Christoph Hellwig 2010-08-23 14:13 ` Tejun Heo 2010-08-23 14:19 ` Christoph Hellwig 2010-08-25 11:31 ` Jens Axboe 2010-08-30 10:04 ` Hannes Reinecke 2010-08-23 15:19 ` Ric Wheeler 2010-08-23 16:45 ` Sergey Vlasov 2010-08-23 16:45 ` [dm-devel] " Sergey Vlasov 2010-08-23 16:49 ` Ric Wheeler 2010-08-23 16:49 ` Ric Wheeler 2010-08-23 16:49 ` Ric Wheeler 2010-08-23 16:49 ` Ric Wheeler 2010-08-23 16:49 ` Ric Wheeler 2010-08-23 12:36 ` Tejun Heo 2010-08-23 14:05 ` Christoph Hellwig 2010-08-23 14:15 ` [PATCH] block: simplify queue_next_fseq Christoph Hellwig 2010-08-23 16:28 ` OT grammar nit " John Robinson
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=4C7E0184.9030502@suse.de \ --to=hare@suse.de \ --cc=Frederick.Knight@netapp.com \ --cc=James.Bottomley@suse.de \ --cc=chris.mason@oracle.com \ --cc=dm-devel@redhat.com \ --cc=hch@lst.de \ --cc=jack@suse.cz \ --cc=jaxboe@fusionio.com \ --cc=k-ueda@ct.jp.nec.com \ --cc=konishi.ryusuke@lab.ntt.co.jp \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-ide@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-raid@vger.kernel.org \ --cc=linux-scsi@vger.kernel.org \ --cc=rwheeler@redhat.com \ --cc=snitzer@redhat.com \ --cc=swhiteho@redhat.com \ --cc=tj@kernel.org \ --cc=tytso@mit.edu \ --cc=vst@vlnb.net \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.