If I have a single bad sector, how many failed reads should simple dd report?

All of lore.kernel.org
 help / color / mirror / Atom feed

* If I have a single bad sector, how many failed reads should simple dd report?
@ 2010-07-08 17:14 Greg Freemyer
  2010-07-09 19:04 ` Greg Freemyer
  0 siblings, 1 reply; 7+ messages in thread
From: Greg Freemyer @ 2010-07-08 17:14 UTC (permalink / raw)
  To: IDE/ATA development list

All,

I just ran a test against a IDE drive (/dev/sdb) and slightly older
kernel (2.6.32).

Similar to:

dd if=/dev/sdb of=/dev/null conv=noerror,sync bs=4k

With a good drive it ran fine.

Then I used hdparm --make-bad-sector to intentionally corrupt a sector
on the drive.

When I re-ran it, /var/log/messages reported 10 bad logical blocks.
And even worse, dd reported 20 bad blocks.  I examined the data dd
read and it had 80KB of zero'ed out data.  So that's 160 sectors worth
of data lost because of a single bad sector.  At most I was expecting
4KB of zero'ed out data.

I haven't started troubleshooting, but I want to know if this is
expected behavior due to read-ahead or something.  (Is there
read-ahead on the raw device, or just if a file system is involved.)

I can redo my test with 2.6.34 and get logs if that is a bug.

And if not a bug, is there a hdparm command I can issue to eliminate
this behaviour.

Thanks
Greg

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: If I have a single bad sector, how many failed reads should simple dd report?
  2010-07-08 17:14 If I have a single bad sector, how many failed reads should simple dd report? Greg Freemyer
@ 2010-07-09 19:04 ` Greg Freemyer
  2010-07-10  1:19   ` Mark Lord
  0 siblings, 1 reply; 7+ messages in thread
From: Greg Freemyer @ 2010-07-09 19:04 UTC (permalink / raw)
  To: IDE/ATA development list, Mark Lord

Adding Mark Lord to CC because hdparm may be part of the problem.

On Thu, Jul 8, 2010 at 1:14 PM, Greg Freemyer <greg.freemyer@gmail.com> wrote:
> All,
>
> I just ran a test against a IDE drive (/dev/sdb) and slightly older
> kernel (2.6.32).
>
> Similar to:
>
> dd if=/dev/sdb of=/dev/null conv=noerror,sync bs=4k
>
> With a good drive it ran fine.
>
> Then I used hdparm --make-bad-sector to intentionally corrupt a sector
> on the drive.
>
> When I re-ran it, /var/log/messages reported 10 bad logical blocks.
> And even worse, dd reported 20 bad blocks.  I examined the data dd
> read and it had 80KB of zero'ed out data.  So that's 160 sectors worth
> of data lost because of a single bad sector.  At most I was expecting
> 4KB of zero'ed out data.
>
> I haven't started troubleshooting, but I want to know if this is
> expected behavior due to read-ahead or something.  (Is there
> read-ahead on the raw device, or just if a file system is involved.)
>
> I can redo my test with 2.6.34 and get logs if that is a bug.
>
> And if not a bug, is there a hdparm command I can issue to eliminate
> this behaviour.
>
> Thanks
> Greg

I retested with 2.6.25 and see the same behavior, so if its a bug, its
not a recently introduced one.  But 2.6.25 is showing a lot more media
errors than I recall from 2.6.32.

Even stranger, I'm now introducing the bad sector as sector 100,000
(ie. about 50 MB from the start of the drive).

Then doing a dd to capture just the first 100 MB to a file.

ie.
dd if=/dev/sdb of=clean.dd conv=noerror,sync bs=4K count=25600
hdparm --make-bad-sector 100000 --yes-i-know-what-i-am-doing /dev/sdb
dd if=/dev/sdb of=corrupt.dd conv=noerror,sync bs=4K count=25600
hdparm --write-sector 100000 --yes-i-know-what-i-am-doing /dev/sdb

(clearly don't do that to a disk you care about.  The data at sector
100,000 is permanently lost.)

cmp -bl clean.dd corrupt.dd > delta.log

Looking at delta.log, the bytes start disagreeing 4 bytes prior to the
sector boundary.  How can that be?

I guess hdparm --make-bad-sector may be corrupting an extra 4 bytes,
but that would surprise me.

That seems like a definite bug in either the kernel or hdparm.

Mark, do you have any input about either the early start of the
corruption, or the very large number of sectors that seem corrupted by
making a single sector bad.

Greg

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: If I have a single bad sector, how many failed reads should simple dd report?
  2010-07-09 19:04 ` Greg Freemyer
@ 2010-07-10  1:19   ` Mark Lord
  2010-07-10  1:24     ` Mark Lord
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Lord @ 2010-07-10  1:19 UTC (permalink / raw)
  To: Greg Freemyer; +Cc: IDE/ATA development list, Mark Lord

On 09/07/10 03:04 PM, Greg Freemyer wrote:
..
>> When I re-ran it, /var/log/messages reported 10 bad logical blocks.
>> And even worse, dd reported 20 bad blocks.  I examined the data dd
>> read and it had 80KB of zero'ed out data.  So that's 160 sectors worth
>> of data lost because of a single bad sector.  At most I was expecting
>> 4KB of zero'ed out data.
..

That's just the standard, undesirable result of the current SCSI EH
when used with libata for (mainly) desktop computers.

I have patches (against older kernels) to fix it, but have yet to
get both myself and James B. interested enough simultaneously to
actually get the kernel fixed.  :)
..
> cmp -bl clean.dd corrupt.dd>  delta.log
>
> Looking at delta.log, the bytes start disagreeing 4 bytes prior to the
> sector boundary.  How can that be?
..

Dunno.  That could be a different problem.
I've seen the odd scattered report of data corruption on ext3/ext4
with large data sets, but don't know what the root causes were.

Cheers

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: If I have a single bad sector, how many failed reads should simple dd report?
  2010-07-10  1:19   ` Mark Lord
@ 2010-07-10  1:24     ` Mark Lord
  2010-07-10 14:14       ` James Bottomley
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Lord @ 2010-07-10  1:24 UTC (permalink / raw)
  To: Greg Freemyer; +Cc: IDE/ATA development list, Mark Lord

[-- Attachment #1: Type: text/plain, Size: 4127 bytes --]

On 09/07/10 09:19 PM, Mark Lord wrote:
> On 09/07/10 03:04 PM, Greg Freemyer wrote:
> ..
>>> When I re-ran it, /var/log/messages reported 10 bad logical blocks.
>>> And even worse, dd reported 20 bad blocks. I examined the data dd
>>> read and it had 80KB of zero'ed out data. So that's 160 sectors worth
>>> of data lost because of a single bad sector. At most I was expecting
>>> 4KB of zero'ed out data.
> ..
>
> That's just the standard, undesirable result of the current SCSI EH
> when used with libata for (mainly) desktop computers.
>
> I have patches (against older kernels) to fix it, but have yet to
> get both myself and James B. interested enough simultaneously to
> actually get the kernel fixed. :)
..

Here (attached and inline below) are my most recent patches for this.
Still outdated, though.  These are against the SLES11 2.6.27.19 kernel:

-----------------------------snip----------------------------

Stop the SCSI EH from performing tons of retries on unrecoverable medium errors,
so that error-handling fails more quickly and we (EMC) avoid unneeded node resets.

The ugliness of this patch matches the ugliness of SCSI EH.
Does *anyone* actually understand this code completely?

Signed-off-by: Mark Lord <mlord@pobox.com>

--- old/drivers/scsi/scsi_error.c	2009-06-04 09:46:55.000000000 -0400
+++ linux/drivers/scsi/scsi_error.c	2009-06-04 12:08:48.000000000 -0400
@@ -423,6 +423,52 @@
  	}
  }
  
+/*
+ * The problem with scsi_check_sense(), is that it is (designed to be)
+ * called only after retries are exhausted.  But for MEDIUM_ERRORs (only)
+ * we don't want any retries here at all.
+ *
+ * So this function below is a clone of the necessary parts from scsi_check_sense(),
+ * to check for unrecoverable MEDIUM_ERRORs when deciding whether to retry or not.
+ */
+static int scsi_unrecoverable_medium_error(struct scsi_cmnd *scmd)
+{
+	struct scsi_sense_hdr sshdr;
+
+	if (! scsi_command_normalize_sense(scmd, &sshdr))
+		return 0;	/* no valid sense data */
+
+	if (scsi_sense_is_deferred(&sshdr))
+		return 0;
+
+	if (sshdr.response_code == 0x70) {
+		/* fixed format */
+		if (scmd->sense_buffer[2] & 0xe0)
+			return 0;
+	} else {
+		/*
+		 * descriptor format: look for "stream commands sense data
+		 * descriptor" (see SSC-3). Assume single sense data
+		 * descriptor. Ignore ILI from SBC-2 READ LONG and WRITE LONG.
+		 */
+		if ((sshdr.additional_length > 3) &&
+		    (scmd->sense_buffer[8] == 0x4) &&
+		    (scmd->sense_buffer[11] & 0xe0))
+			return 0;
+	}
+
+	switch (sshdr.sense_key) {
+	case MEDIUM_ERROR:
+		if (sshdr.asc == 0x11 || /* UNRECOVERED READ ERR */
+		    sshdr.asc == 0x13 || /* AMNF DATA FIELD */
+		    sshdr.asc == 0x14) { /* RECORD NOT FOUND */
+			//printk(KERN_WARNING "%s: MEDIUM_ERROR\n", __func__);
+			return 1;
+		}
+	}
+	return 0;
+}
+
  /**
   * scsi_eh_completed_normally - Disposition a eh cmd on return from LLD.
   * @scmd:	SCSI cmd to examine.
@@ -1334,6 +1380,8 @@
  
  	switch (status_byte(scmd->result)) {
  	case CHECK_CONDITION:
+		if (scsi_unrecoverable_medium_error(scmd))
+			return 1;
  		/*
  		 * assume caller has checked sense and determinted
  		 * the check condition was retryable.
-----------------------------snip----------------------------

On encountering a bad sector, report and skip over it,
then continue with the remainder of the request.
Otherwise we would fail perfectly good sectors,
making a bad situation even worse.

Signed-off-by: Mark Lord <mlord@pobox.com>

--- old/drivers/scsi/scsi_lib.c	2009-06-04 12:26:52.000000000 -0400
+++ linux/drivers/scsi/scsi_lib.c	2009-06-04 14:40:11.000000000 -0400
@@ -952,6 +952,12 @@
  	 */
  	if (sense_valid && !sense_deferred) {
  		switch (sshdr.sense_key) {
+		case MEDIUM_ERROR:
+		/* Bad sector.  Fail it, and then continue the rest of the request. */
+		if (this_count && scsi_end_request(cmd, -EIO, cmd->device->sector_size, 1) == NULL) {
+			cmd->retries = 0;       // go around again..
+			return;
+		}
  		case UNIT_ATTENTION:
  			if (cmd->device->removable) {
  				/* Detected disc change.  Set a bit
-----------------------------snip----------------------------

[-- Attachment #2: 01_scsi_no_retry_medium_errors-sles11.patch --]
[-- Type: text/x-diff, Size: 2252 bytes --]

sles11:
Stop the SCSI EH from performing tons of retries on unrecoverable medium errors,
so that error-handling fails more quickly and we (EMC) avoid unneeded node resets.

The ugliness of this patch matches the ugliness of SCSI EH.
Does *anyone* actually understand this code completely?

Signed-off-by: Mark Lord <mlord@pobox.com>

--- old/drivers/scsi/scsi_error.c	2009-06-04 09:46:55.000000000 -0400
+++ linux/drivers/scsi/scsi_error.c	2009-06-04 12:08:48.000000000 -0400
@@ -423,6 +423,52 @@
 	}
 }
 
+/*
+ * The problem with scsi_check_sense(), is that it is (designed to be)
+ * called only after retries are exhausted.  But for MEDIUM_ERRORs (only)
+ * we don't want any retries here at all.
+ *
+ * So this function below is a clone of the necessary parts from scsi_check_sense(),
+ * to check for unrecoverable MEDIUM_ERRORs when deciding whether to retry or not.
+ */
+static int scsi_unrecoverable_medium_error(struct scsi_cmnd *scmd)
+{
+	struct scsi_sense_hdr sshdr;
+
+	if (! scsi_command_normalize_sense(scmd, &sshdr))
+		return 0;	/* no valid sense data */
+
+	if (scsi_sense_is_deferred(&sshdr))
+		return 0;
+
+	if (sshdr.response_code == 0x70) {
+		/* fixed format */
+		if (scmd->sense_buffer[2] & 0xe0)
+			return 0;
+	} else {
+		/*
+		 * descriptor format: look for "stream commands sense data
+		 * descriptor" (see SSC-3). Assume single sense data
+		 * descriptor. Ignore ILI from SBC-2 READ LONG and WRITE LONG.
+		 */
+		if ((sshdr.additional_length > 3) &&
+		    (scmd->sense_buffer[8] == 0x4) &&
+		    (scmd->sense_buffer[11] & 0xe0))
+			return 0;
+	}
+
+	switch (sshdr.sense_key) {
+	case MEDIUM_ERROR:
+		if (sshdr.asc == 0x11 || /* UNRECOVERED READ ERR */
+		    sshdr.asc == 0x13 || /* AMNF DATA FIELD */
+		    sshdr.asc == 0x14) { /* RECORD NOT FOUND */
+			//printk(KERN_WARNING "%s: MEDIUM_ERROR\n", __func__);
+			return 1;
+		}
+	}
+	return 0;
+}
+
 /**
  * scsi_eh_completed_normally - Disposition a eh cmd on return from LLD.
  * @scmd:	SCSI cmd to examine.
@@ -1334,6 +1380,8 @@
 
 	switch (status_byte(scmd->result)) {
 	case CHECK_CONDITION:
+		if (scsi_unrecoverable_medium_error(scmd))
+			return 1;
 		/*
 		 * assume caller has checked sense and determinted
 		 * the check condition was retryable.

[-- Attachment #3: 02_scsi_eh_continue-sles11.patch --]
[-- Type: text/x-diff, Size: 825 bytes --]

sles11:
On encountering a bad sector, report and skip over it,
then continue with the remainder of the request.
Otherwise we would fail perfectly good sectors,
making a bad situation even worse.

Signed-off-by: Mark Lord <mlord@pobox.com>

--- old/drivers/scsi/scsi_lib.c	2009-06-04 12:26:52.000000000 -0400
+++ linux/drivers/scsi/scsi_lib.c	2009-06-04 14:40:11.000000000 -0400
@@ -952,6 +952,12 @@
 	 */
 	if (sense_valid && !sense_deferred) {
 		switch (sshdr.sense_key) {
+		case MEDIUM_ERROR:
+		/* Bad sector.  Fail it, and then continue the rest of the request. */
+		if (this_count && scsi_end_request(cmd, -EIO, cmd->device->sector_size, 1) == NULL) {
+			cmd->retries = 0;       // go around again..
+			return;
+		}
 		case UNIT_ATTENTION:
 			if (cmd->device->removable) {
 				/* Detected disc change.  Set a bit

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: If I have a single bad sector, how many failed reads should simple dd report?
  2010-07-10  1:24     ` Mark Lord
@ 2010-07-10 14:14       ` James Bottomley
  2010-07-11 12:58         ` Greg Freemyer
  2010-07-13 19:07         ` Mark Lord
  0 siblings, 2 replies; 7+ messages in thread
From: James Bottomley @ 2010-07-10 14:14 UTC (permalink / raw)
  To: Mark Lord; +Cc: Greg Freemyer, IDE/ATA development list, Mark Lord

On Fri, 2010-07-09 at 21:24 -0400, Mark Lord wrote:
> On 09/07/10 09:19 PM, Mark Lord wrote:
> > On 09/07/10 03:04 PM, Greg Freemyer wrote:
> > ..
> >>> When I re-ran it, /var/log/messages reported 10 bad logical blocks.
> >>> And even worse, dd reported 20 bad blocks. I examined the data dd
> >>> read and it had 80KB of zero'ed out data. So that's 160 sectors worth
> >>> of data lost because of a single bad sector. At most I was expecting
> >>> 4KB of zero'ed out data.
> > ..
> >
> > That's just the standard, undesirable result of the current SCSI EH
> > when used with libata for (mainly) desktop computers.
> >
> > I have patches (against older kernels) to fix it, but have yet to
> > get both myself and James B. interested enough simultaneously to
> > actually get the kernel fixed. :)
> ..
> 
> Here (attached and inline below) are my most recent patches for this.
> Still outdated, though.  These are against the SLES11 2.6.27.19 kernel:
> 
> -----------------------------snip----------------------------
> 
> Stop the SCSI EH from performing tons of retries on unrecoverable medium errors,
> so that error-handling fails more quickly and we (EMC) avoid unneeded node resets.
> 
> The ugliness of this patch matches the ugliness of SCSI EH.
> Does *anyone* actually understand this code completely?
> 
> Signed-off-by: Mark Lord <mlord@pobox.com>
> 
> --- old/drivers/scsi/scsi_error.c	2009-06-04 09:46:55.000000000 -0400
> +++ linux/drivers/scsi/scsi_error.c	2009-06-04 12:08:48.000000000 -0400
> @@ -423,6 +423,52 @@
>   	}
>   }
>   
> +/*
> + * The problem with scsi_check_sense(), is that it is (designed to be)
> + * called only after retries are exhausted.  But for MEDIUM_ERRORs (only)
> + * we don't want any retries here at all.
> + *
> + * So this function below is a clone of the necessary parts from scsi_check_sense(),
> + * to check for unrecoverable MEDIUM_ERRORs when deciding whether to retry or not.
> + */
> +static int scsi_unrecoverable_medium_error(struct scsi_cmnd *scmd)
> +{
> +	struct scsi_sense_hdr sshdr;
> +
> +	if (! scsi_command_normalize_sense(scmd, &sshdr))
> +		return 0;	/* no valid sense data */
> +
> +	if (scsi_sense_is_deferred(&sshdr))
> +		return 0;
> +
> +	if (sshdr.response_code == 0x70) {
> +		/* fixed format */
> +		if (scmd->sense_buffer[2] & 0xe0)
> +			return 0;
> +	} else {
> +		/*
> +		 * descriptor format: look for "stream commands sense data
> +		 * descriptor" (see SSC-3). Assume single sense data
> +		 * descriptor. Ignore ILI from SBC-2 READ LONG and WRITE LONG.
> +		 */
> +		if ((sshdr.additional_length > 3) &&
> +		    (scmd->sense_buffer[8] == 0x4) &&
> +		    (scmd->sense_buffer[11] & 0xe0))
> +			return 0;
> +	}
> +
> +	switch (sshdr.sense_key) {
> +	case MEDIUM_ERROR:
> +		if (sshdr.asc == 0x11 || /* UNRECOVERED READ ERR */
> +		    sshdr.asc == 0x13 || /* AMNF DATA FIELD */
> +		    sshdr.asc == 0x14) { /* RECORD NOT FOUND */
> +			//printk(KERN_WARNING "%s: MEDIUM_ERROR\n", __func__);
> +			return 1;
> +		}
> +	}
> +	return 0;
> +}


> +
>   /**
>    * scsi_eh_completed_normally - Disposition a eh cmd on return from LLD.
>    * @scmd:	SCSI cmd to examine.
> @@ -1334,6 +1380,8 @@
>   
>   	switch (status_byte(scmd->result)) {
>   	case CHECK_CONDITION:
> +		if (scsi_unrecoverable_medium_error(scmd))
> +			return 1;
>   		/*
>   		 * assume caller has checked sense and determinted
>   		 * the check condition was retryable.

The check is redundant:  scsi_decide_disposition is where we check for
retries or error handling.  If you look, it already picks out your three
cases and returns success for them (meaning pass straight through).  in
sd_done() MEDIUM_ERROR means complete immediately.


> 
> On encountering a bad sector, report and skip over it,
> then continue with the remainder of the request.
> Otherwise we would fail perfectly good sectors,
> making a bad situation even worse.
> 
> Signed-off-by: Mark Lord <mlord@pobox.com>
> 
> --- old/drivers/scsi/scsi_lib.c	2009-06-04 12:26:52.000000000 -0400
> +++ linux/drivers/scsi/scsi_lib.c	2009-06-04 14:40:11.000000000 -0400
> @@ -952,6 +952,12 @@
>   	 */
>   	if (sense_valid && !sense_deferred) {
>   		switch (sshdr.sense_key) {
> +		case MEDIUM_ERROR:
> +		/* Bad sector.  Fail it, and then continue the rest of the request. */
> +		if (this_count && scsi_end_request(cmd, -EIO, cmd->device->sector_size, 1) == NULL) {
> +			cmd->retries = 0;       // go around again..
> +			return;
> +		}
>   		case UNIT_ATTENTION:
>   			if (cmd->device->removable) {
>   				/* Detected disc change.  Set a bit

This one's in the wrong place.  Normal MEDIUM_ERRORS complete above
this.

I also don't think the skip is right.  We're supposed to communicate to
block what we've done, and all we have to do that with is good bytes.
If we skip over a bad sector and try to complete the rest, we've lost
the error position.

The failure return from Medium error is defined to be the number of
bytes we actually wrote, which is everything up to the medium error.

James



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: If I have a single bad sector, how many failed reads should simple dd report?
  2010-07-10 14:14       ` James Bottomley
@ 2010-07-11 12:58         ` Greg Freemyer
  2010-07-13 19:07         ` Mark Lord
  1 sibling, 0 replies; 7+ messages in thread
From: Greg Freemyer @ 2010-07-11 12:58 UTC (permalink / raw)
  To: James Bottomley; +Cc: Mark Lord, IDE/ATA development list, Mark Lord

On Sat, Jul 10, 2010 at 10:14 AM, James Bottomley
<James.Bottomley@suse.de> wrote:
> On Fri, 2010-07-09 at 21:24 -0400, Mark Lord wrote:
>> On 09/07/10 09:19 PM, Mark Lord wrote:
>> > On 09/07/10 03:04 PM, Greg Freemyer wrote:
>> > ..
>> >>> When I re-ran it, /var/log/messages reported 10 bad logical blocks.
>> >>> And even worse, dd reported 20 bad blocks. I examined the data dd
>> >>> read and it had 80KB of zero'ed out data. So that's 160 sectors worth
>> >>> of data lost because of a single bad sector. At most I was expecting
>> >>> 4KB of zero'ed out data.
>> > ..
>> >
>> > That's just the standard, undesirable result of the current SCSI EH
>> > when used with libata for (mainly) desktop computers.
>> >
>> > I have patches (against older kernels) to fix it, but have yet to
>> > get both myself and James B. interested enough simultaneously to
>> > actually get the kernel fixed. :)
>> ..
>>
>> Here (attached and inline below) are my most recent patches for this.
>> Still outdated, though.  These are against the SLES11 2.6.27.19 kernel:
>>
>> -----------------------------snip----------------------------
>>
>> Stop the SCSI EH from performing tons of retries on unrecoverable medium errors,
>> so that error-handling fails more quickly and we (EMC) avoid unneeded node resets.
>>
>> The ugliness of this patch matches the ugliness of SCSI EH.
>> Does *anyone* actually understand this code completely?
>>
>> Signed-off-by: Mark Lord <mlord@pobox.com>
>>
>> --- old/drivers/scsi/scsi_error.c     2009-06-04 09:46:55.000000000 -0400
>> +++ linux/drivers/scsi/scsi_error.c   2009-06-04 12:08:48.000000000 -0400
>> @@ -423,6 +423,52 @@
>>       }
>>   }
>>
>> +/*
>> + * The problem with scsi_check_sense(), is that it is (designed to be)
>> + * called only after retries are exhausted.  But for MEDIUM_ERRORs (only)
>> + * we don't want any retries here at all.
>> + *
>> + * So this function below is a clone of the necessary parts from scsi_check_sense(),
>> + * to check for unrecoverable MEDIUM_ERRORs when deciding whether to retry or not.
>> + */
>> +static int scsi_unrecoverable_medium_error(struct scsi_cmnd *scmd)
>> +{
>> +     struct scsi_sense_hdr sshdr;
>> +
>> +     if (! scsi_command_normalize_sense(scmd, &sshdr))
>> +             return 0;       /* no valid sense data */
>> +
>> +     if (scsi_sense_is_deferred(&sshdr))
>> +             return 0;
>> +
>> +     if (sshdr.response_code == 0x70) {
>> +             /* fixed format */
>> +             if (scmd->sense_buffer[2] & 0xe0)
>> +                     return 0;
>> +     } else {
>> +             /*
>> +              * descriptor format: look for "stream commands sense data
>> +              * descriptor" (see SSC-3). Assume single sense data
>> +              * descriptor. Ignore ILI from SBC-2 READ LONG and WRITE LONG.
>> +              */
>> +             if ((sshdr.additional_length > 3) &&
>> +                 (scmd->sense_buffer[8] == 0x4) &&
>> +                 (scmd->sense_buffer[11] & 0xe0))
>> +                     return 0;
>> +     }
>> +
>> +     switch (sshdr.sense_key) {
>> +     case MEDIUM_ERROR:
>> +             if (sshdr.asc == 0x11 || /* UNRECOVERED READ ERR */
>> +                 sshdr.asc == 0x13 || /* AMNF DATA FIELD */
>> +                 sshdr.asc == 0x14) { /* RECORD NOT FOUND */
>> +                     //printk(KERN_WARNING "%s: MEDIUM_ERROR\n", __func__);
>> +                     return 1;
>> +             }
>> +     }
>> +     return 0;
>> +}
>
>
>> +
>>   /**
>>    * scsi_eh_completed_normally - Disposition a eh cmd on return from LLD.
>>    * @scmd:   SCSI cmd to examine.
>> @@ -1334,6 +1380,8 @@
>>
>>       switch (status_byte(scmd->result)) {
>>       case CHECK_CONDITION:
>> +             if (scsi_unrecoverable_medium_error(scmd))
>> +                     return 1;
>>               /*
>>                * assume caller has checked sense and determinted
>>                * the check condition was retryable.
>
> The check is redundant:  scsi_decide_disposition is where we check for
> retries or error handling.  If you look, it already picks out your three
> cases and returns success for them (meaning pass straight through).  in
> sd_done() MEDIUM_ERROR means complete immediately.
>
>
>>
>> On encountering a bad sector, report and skip over it,
>> then continue with the remainder of the request.
>> Otherwise we would fail perfectly good sectors,
>> making a bad situation even worse.
>>
>> Signed-off-by: Mark Lord <mlord@pobox.com>
>>
>> --- old/drivers/scsi/scsi_lib.c       2009-06-04 12:26:52.000000000 -0400
>> +++ linux/drivers/scsi/scsi_lib.c     2009-06-04 14:40:11.000000000 -0400
>> @@ -952,6 +952,12 @@
>>        */
>>       if (sense_valid && !sense_deferred) {
>>               switch (sshdr.sense_key) {
>> +             case MEDIUM_ERROR:
>> +             /* Bad sector.  Fail it, and then continue the rest of the request. */
>> +             if (this_count && scsi_end_request(cmd, -EIO, cmd->device->sector_size, 1) == NULL) {
>> +                     cmd->retries = 0;       // go around again..
>> +                     return;
>> +             }
>>               case UNIT_ATTENTION:
>>                       if (cmd->device->removable) {
>>                               /* Detected disc change.  Set a bit
>
> This one's in the wrong place.  Normal MEDIUM_ERRORS complete above
> this.
>
> I also don't think the skip is right.  We're supposed to communicate to
> block what we've done, and all we have to do that with is good bytes.
> If we skip over a bad sector and try to complete the rest, we've lost
> the error position.
>
> The failure return from Medium error is defined to be the number of
> bytes we actually wrote, which is everything up to the medium error.
>
> James

If you guys have patches you want me to test, I'm happy to do so, but
I urge the team to work on this simple test case.

It may have been this way since day one of libata, but it certainly
looks like a collection of bugs to me.

Again, assuming /dev/sdb is a test drive without valuable data the
test case can be as simple as just:

==============
#create a 1000 sectors of random data starting at sector 1000
dd if=/dev/urandom of=/dev/sdb bs=512 seek=1000 count=1000

#make a clean copy of the data, starting at sector 0 and going well
past. 4MB total
dd if=/dev/sdb of=clean.dd conv=noerror,sync bs=4K count=1000

#corrupt a sector in the middle of the test data
hdparm --make-bad-sector 1500 --yes-i-know-what-i-am-doing /dev/sdb

#make another copy of the first 4MB
dd if=/dev/sdb of=corrupt.dd conv=noerror,sync bs=4K count=1000

#restore the drive sector so it is not permanently lost
hdparm --write-sector 1500 --yes-i-know-what-i-am-doing /dev/sdb

(I choose sector 1500 this time to make the testing process faster.)

cmp -bl clean.dd corrupt.dd > delta.log

Review delta.log for disagrees.  And /var/log/warn for logical block
failures as well as extraneous media errors.
==============

I found:

1) Corruption stated several bytes prior to the corrupt sector
2) With dd using 512 byte blocks, 64 sectors worth of corrupt data in corrupt.dd
3) With dd using 4K blocks, 160 sectors worth of corrupt data in
corrupt.dd.   (I don't know why the difference and I was using a
customized version of dd called dcfldd for this test.)
4) a 50% reduction in throughput based on one bad sector.  I'm not
sure that's a bug, but it is clearly an issue that can be looked at.
You need a much bigger dd run to see that impact.

Thanks
Greg

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: If I have a single bad sector, how many failed reads should simple dd report?
  2010-07-10 14:14       ` James Bottomley
  2010-07-11 12:58         ` Greg Freemyer
@ 2010-07-13 19:07         ` Mark Lord
  1 sibling, 0 replies; 7+ messages in thread
From: Mark Lord @ 2010-07-13 19:07 UTC (permalink / raw)
  To: James Bottomley; +Cc: Greg Freemyer, IDE/ATA development list, Mark Lord

On 10/07/10 10:14 AM, James Bottomley wrote:
..
> The check is redundant:  scsi_decide_disposition is where we check for
> retries or error handling.  If you look, it already picks out your three
> cases and returns success for them (meaning pass straight through).  in
> sd_done() MEDIUM_ERROR means complete immediately.
..
> This one's in the wrong place.  Normal MEDIUM_ERRORS complete above
> this.
>
> I also don't think the skip is right.  We're supposed to communicate to
> block what we've done, and all we have to do that with is good bytes.
> If we skip over a bad sector and try to complete the rest, we've lost
> the error position.
..

No, on 2.6.27, those patches work exactly as they should.
Failing (to the block layer) the bad sector(s), and successfully reading all others.

Try it.  It works.
But horribly out of date for the newer kernels,
which is why it's not in your patch queue from me.  :)

Cheers

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-07-13 19:07 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-08 17:14 If I have a single bad sector, how many failed reads should simple dd report? Greg Freemyer
2010-07-09 19:04 ` Greg Freemyer
2010-07-10  1:19   ` Mark Lord
2010-07-10  1:24     ` Mark Lord
2010-07-10 14:14       ` James Bottomley
2010-07-11 12:58         ` Greg Freemyer
2010-07-13 19:07         ` Mark Lord

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.