linux-kernel.vger.kernel.org archive mirror
* LibPATA code issues / 2.6.15.4
From: Justin Piszcz @ 2006-02-14  9:48 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel

Jeff,

I'd have to double-check, but I do not recall getting these errors before
the pass-thru code was introduced in 2.6.15. I also was not running the
SMART daemon on SATA drives until 2.6.15, as it was not supported before.

I had a few issues before that I posted to LKML; those were due to too
many SATA devices etc. Everything is back to normal for the most part.

Speed, etc, all is well again, almost...

/dev/sdc:
  Timing buffered disk reads:  154 MB in  3.02 seconds =  50.97 MB/sec
/dev/sdc:
  Timing buffered disk reads:  162 MB in  3.00 seconds =  53.94 MB/sec

The only issue I have is that when I copy a lot of files to a WD 400GB
drive, I get these pesky errors in dmesg:

  ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
  ata3: status=0x51 { DriveReady SeekComplete Error }
  ata3: error=0x04 { DriveStatusError }

Yet everything (226GB or so) copied to the 400GB drive without a single
I/O error that I am aware of. So my question is: why do I get these
errors in dmesg if they are not critical?
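The status and error bytes in these lines are bitmasks from the ATA status and error registers: 0x51 is DRDY (0x40) | DSC (0x10) | ERR (0x01), and error bit 0x04 is ABRT, meaning the drive aborted the command it was given. A minimal userspace sketch that reproduces the labels dmesg prints; the macro and function names are hypothetical, not libata's:

```c
#include <stdio.h>
#include <string.h>

/* ATA status register bits, labelled as dmesg labels them. */
#define ST_BSY  0x80	/* Busy: when set, the other bits are not valid */
#define ST_DRDY 0x40	/* DriveReady */
#define ST_DF   0x20	/* DeviceFault */
#define ST_DSC  0x10	/* SeekComplete */
#define ST_DRQ  0x08	/* DataRequest */
#define ST_CORR 0x04	/* CorrectedError */
#define ST_IDX  0x02	/* Index */
#define ST_ERR  0x01	/* Error: details are in the error register */

/* ATA error register bit that dmesg labels "DriveStatusError". */
#define ER_ABRT 0x04	/* command aborted by the drive */

/* Append name to buf, space-separated, when bit is set in val. */
static void add_label(char *buf, size_t len, unsigned val,
		      unsigned bit, const char *name)
{
	if (!(val & bit))
		return;
	if (buf[0] != '\0')
		strncat(buf, " ", len - strlen(buf) - 1);
	strncat(buf, name, len - strlen(buf) - 1);
}

/* Decode a status byte into the labels shown in "status=0x.. { ... }". */
void decode_ata_status(unsigned char stat, char *buf, size_t len)
{
	buf[0] = '\0';
	if (stat & ST_BSY) {
		/* With BSY set the remaining bits are meaningless. */
		snprintf(buf, len, "Busy");
		return;
	}
	add_label(buf, len, stat, ST_DRDY, "DriveReady");
	add_label(buf, len, stat, ST_DF, "DeviceFault");
	add_label(buf, len, stat, ST_DSC, "SeekComplete");
	add_label(buf, len, stat, ST_DRQ, "DataRequest");
	add_label(buf, len, stat, ST_CORR, "CorrectedError");
	add_label(buf, len, stat, ST_IDX, "Index");
	add_label(buf, len, stat, ST_ERR, "Error");
}

/* Non-zero if the error byte says the drive aborted the command. */
int ata_cmd_aborted(unsigned char err)
{
	return (err & ER_ABRT) != 0;
}
```

Called with 0x51 it yields "DriveReady SeekComplete Error"; called with 0xd0 it yields just "Busy", because BSY (0x80) is set and the other bits are then not valid.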

Thanks,

Justin.


* Re: LibPATA code issues / 2.6.15.4
From: Mark Lord @ 2006-02-14 14:50 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Jeff Garzik, linux-kernel, IDE/ATA development list

Justin Piszcz wrote:
..
>  ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>  ata3: status=0x51 { DriveReady SeekComplete Error }
>  ata3: error=0x04 { DriveStatusError }

I wonder if the FUA logic is inserting cache-flush commands
and perhaps the drive is rejecting those?

Jeff, we really ought to be including the failed ATA opcode
in those error messages!!
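Printing the failed command would make this hypothesis easy to check, because cache flushes and FUA writes have their own ATA opcodes. A minimal sketch of a name lookup an error path could use; the table is a hand-picked subset and the struct and function names are hypothetical:

```c
#include <stddef.h>

/* A few ATA command opcodes relevant to the FUA / cache-flush question. */
struct ata_cmd_entry {
	unsigned char opcode;
	const char *name;
};

static const struct ata_cmd_entry ata_cmds[] = {
	{ 0x25, "READ DMA EXT" },
	{ 0x35, "WRITE DMA EXT" },
	{ 0x3D, "WRITE DMA FUA EXT" },
	{ 0xC8, "READ DMA" },
	{ 0xCA, "WRITE DMA" },
	{ 0xCE, "WRITE MULTIPLE FUA EXT" },
	{ 0xE7, "FLUSH CACHE" },
	{ 0xEA, "FLUSH CACHE EXT" },
};

/* Return a printable name for an ATA opcode, or "unknown". */
const char *ata_cmd_name(unsigned char opcode)
{
	size_t i;

	for (i = 0; i < sizeof(ata_cmds) / sizeof(ata_cmds[0]); i++)
		if (ata_cmds[i].opcode == opcode)
			return ata_cmds[i].name;
	return "unknown";
}

/* Non-zero if the opcode is a cache flush or a FUA write. */
int ata_cmd_is_flush_or_fua(unsigned char opcode)
{
	return opcode == 0xE7 || opcode == 0xEA ||
	       opcode == 0x3D || opcode == 0xCE;
}
```

With something like this, an error line naming FLUSH CACHE EXT (0xEA) or WRITE DMA FUA EXT (0x3D) would point straight at the FUA path.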

Cheers


* Re: LibPATA code issues / 2.6.15.4
From: David Greaves @ 2006-02-14 16:27 UTC (permalink / raw)
  To: Mark Lord
  Cc: Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list

Mark Lord wrote:

> Justin Piszcz wrote:
> ..
>
>>  ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>>  ata3: status=0x51 { DriveReady SeekComplete Error }
>>  ata3: error=0x04 { DriveStatusError }
>
>
> I wonder if the FUA logic is inserting cache-flush commands
> and perhaps the drive is rejecting those?
>
> Jeff, we really ought to be including the failed ATA opcode
> in those error messages!!
>
If such a thing were available as a patch then I too would apply it and
hopefully could provide useful feedback.

David
PS My problems:

http://marc.theaimsgroup.com/?l=linux-kernel&m=113769509617034&w=2
http://marc.theaimsgroup.com/?l=linux-ide&m=113828551519727&w=2
http://marc.theaimsgroup.com/?l=linux-ide&m=113829573105369&w=2
http://marc.theaimsgroup.com/?l=linux-ide&m=113933732903205&w=2


* Re: LibPATA code issues / 2.6.15.4
From: Justin Piszcz @ 2006-02-14 17:12 UTC (permalink / raw)
  To: David Greaves
  Cc: Mark Lord, Jeff Garzik, linux-kernel, IDE/ATA development list

I would like to try the patch too, if available.

I got these errors when nothing apparent was going on.

[25158.676998] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[25158.677005] ata3: status=0x51 { DriveReady SeekComplete Error }
[25158.677009] ata3: error=0x04 { DriveStatusError }
[27306.663556] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[27306.663563] ata3: status=0x51 { DriveReady SeekComplete Error }
[27306.663567] ata3: error=0x04 { DriveStatusError }


* Re: LibPATA code issues / 2.6.15.4
From: Mark Lord @ 2006-02-14 18:00 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list

On Tuesday 14 February 2006 12:12, Justin Piszcz wrote:
> I would like to try the patch too, if available.

Something like this:  (for 2.6.16-rc3-git2, but should be okay on 2.6.15 also).

Untested:  include the original SCSI opcode in printk's for libata SCSI errors,
to help understand where the errors are coming from.

Signed-Off-By:  Mark Lord <mlord@pobox.com>

--- linux/drivers/scsi/libata-scsi.c.orig	2006-02-12 19:27:25.000000000 -0500
+++ linux/drivers/scsi/libata-scsi.c	2006-02-14 12:54:17.000000000 -0500
@@ -420,6 +420,7 @@
  *	@sk: the sense key we'll fill out
  *	@asc: the additional sense code we'll fill out
  *	@ascq: the additional sense code qualifier we'll fill out
+ *	@opcode: the original SCSI command opcode byte
  *
  *	Converts an ATA error into a SCSI error.  Fill out pointers to
  *	SK, ASC, and ASCQ bytes for later use in fixed or descriptor
@@ -429,7 +430,7 @@
  *	spin_lock_irqsave(host_set lock)
  */
 void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc, 
-			u8 *ascq)
+			u8 *ascq, u8 opcode)
 {
 	int i;
 
@@ -508,8 +509,8 @@
 		}
 	}
 	/* No error?  Undecoded? */
-	printk(KERN_WARNING "ata%u: no sense translation for status: 0x%02x\n", 
-	       id, drv_stat);
+	printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x status: 0x%02x\n", 
+	       id, opcode, drv_stat);
 
 	/* For our last chance pick, use medium read error because
 	 * it's much more common than an ATA drive telling you a write
@@ -520,8 +521,8 @@
 	*ascq = 0x04; /*  "auto-reallocation failed" */
 
  translate_done:
-	printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to "
-	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err,
+	printk(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to "
+	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err,
 	       *sk, *asc, *ascq);
 	return;
 }
@@ -562,7 +563,7 @@
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[1], &sb[2], &sb[3]);
+				   &sb[1], &sb[2], &sb[3], cmd->cmnd[0]);
 		sb[1] &= 0x0f;
 	}
 
@@ -637,7 +638,7 @@
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[2], &sb[12], &sb[13]);
+				   &sb[2], &sb[12], &sb[13], cmd->cmnd[0]);
 		sb[2] &= 0x0f;
 	}
 

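The patch only threads the SCSI opcode through to the two printk()s; the translation itself is table-driven. Below is a much-simplified userspace sketch of that idea, assuming only the two mappings visible in this thread (0x51/04 to 0xb/00/00 and 0xd0/00 to 0xb/47/00) plus the medium-error fallback shown in the patch context. It matches error-register bits first, then status bits; the struct and function names are hypothetical and this is not the real libata table:

```c
#include <stdio.h>
#include <stddef.h>

/* One row of a hypothetical ATA -> SCSI sense translation table. */
struct sense_map {
	unsigned char bits;		/* ATA register bits to match */
	unsigned char sk, asc, ascq;	/* resulting SCSI sense data */
};

/* Only the mappings seen in this thread; the real tables are larger. */
static const struct sense_map err_map[] = {
	{ 0x04, 0x0b, 0x00, 0x00 },	/* error ABRT -> ABORTED COMMAND */
};
static const struct sense_map stat_map[] = {
	{ 0x80, 0x0b, 0x47, 0x00 },	/* status BSY -> ABORTED COMMAND, 47/00 */
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* First table row whose bits are all set in v (v == 0 never matches). */
static const struct sense_map *match(const struct sense_map *t, size_t n,
				     unsigned char v)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (v && (v & t[i].bits) == t[i].bits)
			return &t[i];
	return NULL;
}

/* Translate ATA status/error to sense data and log it with the opcode. */
void translate(unsigned id, unsigned char opcode, unsigned char stat,
	       unsigned char err, unsigned char *sk, unsigned char *asc,
	       unsigned char *ascq)
{
	/* With BSY set, the error register contents are not valid. */
	unsigned char eff_err = (stat & 0x80) ? 0 : err;
	const struct sense_map *m = match(err_map, ARRAY_LEN(err_map), eff_err);

	if (!m)
		m = match(stat_map, ARRAY_LEN(stat_map), stat);
	if (m) {
		*sk = m->sk;
		*asc = m->asc;
		*ascq = m->ascq;
	} else {
		*sk = 0x03;	/* MEDIUM ERROR */
		*asc = 0x11;	/* unrecovered read error */
		*ascq = 0x04;	/* auto-reallocation failed */
	}
	printf("ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to "
	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n",
	       id, opcode, stat, err, *sk, *asc, *ascq);
}
```

Sense key 0x0b is ABORTED COMMAND, and ASC/ASCQ 47/00 is the SCSI parity-error code, which the sketch simply copies from the log.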

* Re: LibPATA code issues / 2.6.15.4
From: Justin Piszcz @ 2006-02-14 18:06 UTC (permalink / raw)
  To: Mark Lord
  Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list

Thanks, I will reboot later tonight and see what type of error codes it 
gives me.

Against 2.6.15.4:

# patch -p1 < /tmp/a
patching file drivers/scsi/libata-scsi.c
Hunk #1 succeeded at 404 (offset -16 lines).
Hunk #2 succeeded at 414 (offset -16 lines).
Hunk #3 succeeded at 493 (offset -16 lines).
Hunk #4 succeeded at 505 (offset -16 lines).
Hunk #5 succeeded at 547 (offset -16 lines).
Hunk #6 succeeded at 622 (offset -16 lines).
#


* Re: LibPATA code issues / 2.6.15.4
From: Justin Piszcz @ 2006-02-14 23:58 UTC (permalink / raw)
  To: Mark Lord; +Cc: Jeff Garzik, linux-kernel, IDE/ATA development list

FYI:

I made a 100GB file, md5summed it, copied it to the 'problem' drive and
md5summed it again: the MD5 sums are identical.

box:/x8# /usr/bin/time dd if=/dev/zero of=100gb bs=1M count=100000 ; /usr/bin/time md5sum 100gb; /usr/bin/time cp 100gb /x4 ; cd /x4 ; /usr/bin/time md5sum 100gb
100000+0 records in
100000+0 records out
104857600000 bytes transferred in 4735.034107 seconds (22145057 bytes/sec)
0.29user 245.59system 1:18:55elapsed 5%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+210minor)pagefaults 0swaps
1e95cd44e2cb773f483ea7b2f676258d  100gb
248.24user 98.17system 32:50.97elapsed 17%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+188minor)pagefaults 0swaps
14.75user 341.92system 35:25.25elapsed 16%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (4major+183minor)pagefaults 0swaps
1e95cd44e2cb773f483ea7b2f676258d  100gb
246.95user 110.41system 28:06.49elapsed 21%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+190minor)pagefaults 0swaps
box:/x4#

Also, all SMART tests passed with flying colors.

(FYI)


* Re: LibPATA code issues / 2.6.15.4
From: Jeff Garzik @ 2006-02-17  8:45 UTC (permalink / raw)
  To: Mark Lord; +Cc: Justin Piszcz, linux-kernel, IDE/ATA development list

Mark Lord wrote:
> Jeff, we really ought to be including the failed ATA opcode
> in those error messages!!

Submit a patch...

	Jeff


* Re: LibPATA code issues / 2.6.15.4
From: Mark Lord @ 2006-02-17 14:59 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Justin Piszcz, linux-kernel, IDE/ATA development list

On Friday 17 February 2006 03:45, Jeff Garzik wrote:
>Submit a patch... 

You mean, something like this one?
Untested at present, as I was hoping to hear
back from one of the original problem reporters
after they tested it.

Cheers!



-------- Original Message --------
Subject: Re: LibPATA code issues / 2.6.15.4
Date: Tue, 14 Feb 2006 13:00:36 -0500
From: Mark Lord <lkml@rtr.ca>
To: Justin Piszcz <jpiszcz@lucidpixels.com>
CC: David Greaves <david@dgreaves.com>,	Jeff Garzik <jgarzik@pobox.com>, 
linux-kernel@vger.kernel.org,	IDE/ATA development list 
<linux-ide@vger.kernel.org>
References: <Pine.LNX.4.64.0602140439580.3567@p34> 
<43F2050B.8020006@dgreaves.com> <Pine.LNX.4.64.0602141211350.10793@p34>

On Tuesday 14 February 2006 12:12, Justin Piszcz wrote:
> I would like to try the patch too, if available.

Something like this:  (for 2.6.16-rc3-git2, but should be okay on 2.6.15 also).

Untested:  include the original SCSI opcode in printk's for libata SCSI errors,
to help understand where the errors are coming from.

Signed-Off-By:  Mark Lord <mlord@pobox.com>

--- linux/drivers/scsi/libata-scsi.c.orig	2006-02-12 19:27:25.000000000 -0500
+++ linux/drivers/scsi/libata-scsi.c	2006-02-14 12:54:17.000000000 -0500
@@ -420,6 +420,7 @@
  *	@sk: the sense key we'll fill out
  *	@asc: the additional sense code we'll fill out
  *	@ascq: the additional sense code qualifier we'll fill out
+ *	@opcode: the original SCSI command opcode byte
  *
  *	Converts an ATA error into a SCSI error.  Fill out pointers to
  *	SK, ASC, and ASCQ bytes for later use in fixed or descriptor
@@ -429,7 +430,7 @@
  *	spin_lock_irqsave(host_set lock)
  */
 void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc, 
-			u8 *ascq)
+			u8 *ascq, u8 opcode)
 {
 	int i;
 
@@ -508,8 +509,8 @@
 		}
 	}
 	/* No error?  Undecoded? */
-	printk(KERN_WARNING "ata%u: no sense translation for status: 0x%02x\n", 
-	       id, drv_stat);
+	printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x status: 0x%02x\n", 
+	       id, opcode, drv_stat);
 
 	/* For our last chance pick, use medium read error because
 	 * it's much more common than an ATA drive telling you a write
@@ -520,8 +521,8 @@
 	*ascq = 0x04; /*  "auto-reallocation failed" */
 
  translate_done:
-	printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to "
-	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err,
+	printk(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to "
+	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err,
 	       *sk, *asc, *ascq);
 	return;
 }
@@ -562,7 +563,7 @@
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[1], &sb[2], &sb[3]);
+				   &sb[1], &sb[2], &sb[3], cmd->cmnd[0]);
 		sb[1] &= 0x0f;
 	}
 
@@ -637,7 +638,7 @@
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[2], &sb[12], &sb[13]);
+				   &sb[2], &sb[12], &sb[13], cmd->cmnd[0]);
 		sb[2] &= 0x0f;
 	}
 

* Re: LibPATA code issues / 2.6.15.4
From: Justin Piszcz @ 2006-02-17 15:00 UTC (permalink / raw)
  To: Mark Lord; +Cc: Jeff Garzik, linux-kernel, IDE/ATA development list

I have patched the kernel and rebooted it with your patch, but, of course, 
with my luck it has not given me any errors since, even when repeating 
major file copies, bonnie++ and iozone!! :(


* Re: LibPATA code issues / 2.6.15.4
From: Sander @ 2006-02-18 20:43 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list

Mark Lord wrote (ao):
> On Friday 17 February 2006 03:45, Jeff Garzik wrote:
> >Submit a patch... 
> 
> You mean, something like this one?
> Untested at present, as I was hoping to hear
> back from one of the original problem reporters
> after they tested it.

Not the original reporter, but your patch Works For Me.
I get these:

[  633.449961] md: md1: sync done.
[  633.456070] RAID5 conf printout:
[  633.456117]  --- rd:9 wd:9 fd:0
[  633.456164]  disk 0, o:1, dev:sda2
[  633.456208]  disk 1, o:1, dev:sdb2
[  633.456250]  disk 2, o:1, dev:sdc2
[  633.456298]  disk 3, o:1, dev:sdd2
[  633.456340]  disk 4, o:1, dev:sde2
[  633.456383]  disk 5, o:1, dev:sdf2
[  633.456427]  disk 6, o:1, dev:sdg2
[  633.456470]  disk 7, o:1, dev:sdh2
[  633.456514]  disk 8, o:1, dev:sdi2
[  787.639858] kjournald starting.  Commit interval 5 seconds
[  787.657991] EXT3 FS on md1, internal journal
[  787.658023] EXT3-fs: mounted filesystem with writeback data mode.
[ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
[ 1872.338239] ata6: status=0xd0 { Busy }
[ 5749.285084] ata8: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
[ 5749.285138] ata8: status=0xd0 { Busy }
[ 5906.008461] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
[ 5906.008515] ata6: status=0xd0 { Busy }
[ 9892.904205] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
[ 9892.904259] ata6: status=0xd0 { Busy }
[10146.084687] ata5: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
[10146.084740] ata5: status=0xd0 { Busy }
[10293.949040] ata5: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
[10293.949093] ata5: status=0xd0 { Busy }

Can you tell from this what they mean?
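To see at a glance which ports and commands are involved, the new lines can be split into fields. A minimal sketch assuming exactly the message format added by Mark's patch; the struct and function names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Fields of one patched libata "translated op=..." log line. */
struct xlat_line {
	unsigned port, op, stat, err, sk, asc, ascq;
};

/*
 * Parse a dmesg line such as
 * "[ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00".
 * Returns 1 if all seven fields were found, 0 otherwise.
 */
int parse_xlat_line(const char *line, struct xlat_line *x)
{
	/* Skip any "[timestamp] " prefix by jumping to the "ataN:" port tag. */
	const char *p = strstr(line, "ata");

	if (p == NULL)
		return 0;
	return sscanf(p, "ata%u: translated op=0x%x ATA stat/err 0x%x/%x "
		      "to SCSI SK/ASC/ASCQ 0x%x/%x/%x",
		      &x->port, &x->op, &x->stat, &x->err,
		      &x->sk, &x->asc, &x->ascq) == 7;
}
```

Tallying the parsed lines by port and op over the log above would show that every error is op 0x2a, on ata5, ata6 or ata8.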

This is with 2.6.16-rc3 plus your patch, running nine Maxtor disks on the
onboard nForce4 and an MV88SX6081 8-port SATA II PCI-X Controller (rev 09).

for i in `seq 10`
do dd if=/dev/zero of=bigfile.$i bs=1024k count=10000
done
md5sum bigfile.*

The errors mostly seem to happen during the md5sum (not during the dd).

I do not see data corruption or slowdown.

I do need a chunksize of 512k for the raid5. With anything lower (I tried
the default 64k, 128k, 256k, 512k and 4096k) I get data corruption and
the errors reported in:
http://marc.theaimsgroup.com/?l=linux-ide&m=114016903530007&w=2

Thanks!

	Sander

-- 
Humilis IT Services and Solutions
http://www.humilis.net


* Re: LibPATA code issues / 2.6.15.4
From: Mark Lord @ 2006-02-18 21:42 UTC (permalink / raw)
  To: sander; +Cc: Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list

Sander wrote:
> Mark Lord wrote (ao):
>> On Friday 17 February 2006 03:45, Jeff Garzik wrote:
>>> Submit a patch... 
>> You mean, something like this one?
...
> [  633.449961] md: md1: sync done.
> [  633.456070] RAID5 conf printout:
> [  633.456117]  --- rd:9 wd:9 fd:0
...
> [ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> [ 1872.338239] ata6: status=0xd0 { Busy }
> [ 5749.285084] ata8: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> [ 5749.285138] ata8: status=0xd0 { Busy }
> [ 5906.008461] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> [ 5906.008515] ata6: status=0xd0 { Busy }
...
> This is with 2.6.16-rc3, your patch, and running nine Maxtors disks
> over onboard nForce4 and MV88SX6081 8-port SATA II PCI-X Controller (rev 09).
> 
> for i in `seq 10`
> do dd if=/dev/zero of=bigfile.$i bs=1024k count=10000
> done
> md5sum bigfile.*
> 
> The errors mostly seem to happen during the md5sum (not during the dd).

SCSI opcode 0x2a is WRITE_10, so the errors are being reported
in response to the writes to bigfile.$i.  But these are different
from the previously reported error status values -- I wonder why
it's getting "Busy" back as a status here ??
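Note that op= carries the SCSI CDB opcode that libata translated, not the ATA command. A minimal lookup sketch for a handful of common opcodes; the table is a subset and the struct and function names are hypothetical:

```c
#include <stddef.h>

/* A SCSI CDB opcode as it appears in the "op=0x.." field. */
struct scsi_op_entry {
	unsigned char op;
	const char *name;
};

static const struct scsi_op_entry scsi_ops[] = {
	{ 0x00, "TEST_UNIT_READY" },
	{ 0x08, "READ_6" },
	{ 0x0a, "WRITE_6" },
	{ 0x12, "INQUIRY" },
	{ 0x28, "READ_10" },
	{ 0x2a, "WRITE_10" },
	{ 0x35, "SYNCHRONIZE_CACHE" },	/* cache flush */
	{ 0x85, "ATA_16" },		/* ATA pass-through, 16-byte CDB */
	{ 0x88, "READ_16" },
	{ 0x8a, "WRITE_16" },
	{ 0xa1, "ATA_12" },		/* ATA pass-through, 12-byte CDB */
	{ 0xa8, "READ_12" },
	{ 0xaa, "WRITE_12" },
};

/* Return a printable name for a SCSI opcode, or "unknown". */
const char *scsi_op_name(unsigned char op)
{
	size_t i;

	for (i = 0; i < sizeof(scsi_ops) / sizeof(scsi_ops[0]); i++)
		if (scsi_ops[i].op == op)
			return scsi_ops[i].name;
	return "unknown";
}

/* Non-zero for the plain WRITE commands in the table above. */
int scsi_op_is_write(unsigned char op)
{
	return op == 0x0a || op == 0x2a || op == 0x8a || op == 0xaa;
}
```

Under the FUA theory one would expect op=0x35 (SYNCHRONIZE_CACHE), which libata turns into an ATA cache flush; ATA pass-through traffic, such as a SMART tool may generate, would show up as 0x85 or 0xa1.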


* Re: LibPATA code issues / 2.6.15.4
From: Justin Piszcz @ 2006-02-18 21:51 UTC (permalink / raw)
  To: Mark Lord; +Cc: sander, Jeff Garzik, linux-kernel, IDE/ATA development list

$ for i in `seq 10`
> do dd if=/dev/zero of=bigfile.$i bs=1024k count=10000
> done
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 190.997693 seconds (54899930 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 212.242724 seconds (49404568 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 189.324450 seconds (55385134 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 190.280352 seconds (55106898 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 191.567239 seconds (54736708 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 183.640928 seconds (57099254 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 179.974098 seconds (58262606 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 190.126087 seconds (55151611 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 192.227807 seconds (54548612 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 185.309607 seconds (56585086 bytes/sec)
war@p34:/x4$ md5sum bigfile.*
26f56024ac39cdc54b228820107f040d  bigfile.1
26f56024ac39cdc54b228820107f040d  bigfile.10
26f56024ac39cdc54b228820107f040d  bigfile.2
26f56024ac39cdc54b228820107f040d  bigfile.3
26f56024ac39cdc54b228820107f040d  bigfile.4
26f56024ac39cdc54b228820107f040d  bigfile.5
26f56024ac39cdc54b228820107f040d  bigfile.6
26f56024ac39cdc54b228820107f040d  bigfile.7
26f56024ac39cdc54b228820107f040d  bigfile.8
26f56024ac39cdc54b228820107f040d  bigfile.9

No errors in dmesg yet (for my issue).


* Re: LibPATA code issues / 2.6.15.4
From: Sander @ 2006-02-19  7:14 UTC (permalink / raw)
  To: Mark Lord
  Cc: sander, Jeff Garzik, Justin Piszcz, linux-kernel,
	IDE/ATA development list

Mark Lord wrote (ao):
> Sander wrote:
> >Mark Lord wrote (ao):
> >>On Friday 17 February 2006 03:45, Jeff Garzik wrote:
> >>>Submit a patch... 
> >>You mean, something like this one?
> ...
> >[  633.449961] md: md1: sync done.
> >[  633.456070] RAID5 conf printout:
> >[  633.456117]  --- rd:9 wd:9 fd:0
> ...
> >[ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> >[ 1872.338239] ata6: status=0xd0 { Busy }
> >[ 5749.285084] ata8: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> >[ 5749.285138] ata8: status=0xd0 { Busy }
> >[ 5906.008461] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> >[ 5906.008515] ata6: status=0xd0 { Busy }
> ...
> >This is with 2.6.16-rc3, your patch, and running nine Maxtors disks
> >over onboard nForce4 and MV88SX6081 8-port SATA II PCI-X Controller (rev 09).
> >
> >for i in `seq 10`
> >do dd if=/dev/zero of=bigfile.$i bs=1024k count=10000
> >done
> >md5sum bigfile.*
> >
> >The errors mostly seem to happen during the md5sum (not during the dd).
> 
> SCSI opcode 0x2a is WRITE_10, so the errors are being reported
> in response to the writes to bigfile.$i.

Ah, my bad then.

> But these are different from the previously reported error status
> values -- I wonder why it's getting "Busy" back as a status here ??

Well, as I wrote, I am not the original reporter whose thread you
responded to with your patch. I just thought I could use it to get
better error messages for my bug reports.

I am using the sata_mv driver, which is beta. That might explain why it
does not behave quite as you would expect. I have no clue anyway :-)

I hope my reports are of some use to Jeff regarding the sata_mv driver.

Thank you for your response.

	Sander


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.15.4
  2006-02-19  7:14           ` Sander
@ 2006-02-19 15:30             ` Mark Lord
  2006-02-19 17:16               ` Sander
  0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-02-19 15:30 UTC (permalink / raw)
  To: sander; +Cc: Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list

Sander wrote:
> Mark Lord wrote (ao):
>> Sander wrote:
>>> Mark Lord wrote (ao):
>>>> On Friday 17 February 2006 03:45, Jeff Garzik wrote:
>>>>> Submit a patch... 
>>>> You mean, something like this one?
>> ...
>>> [  633.449961] md: md1: sync done.
>>> [  633.456070] RAID5 conf printout:
>>> [  633.456117]  --- rd:9 wd:9 fd:0
>> ...
>>> [ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI 
>>> SK/ASC/ASCQ 0xb/47/00
>>> [ 1872.338239] ata6: status=0xd0 { Busy }
>>> [ 5749.285084] ata8: translated op=0x2a ATA stat/err 0xd0/00 to SCSI 
>>> SK/ASC/ASCQ 0xb/47/00
>>> [ 5749.285138] ata8: status=0xd0 { Busy }
>>> [ 5906.008461] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI 
>>> SK/ASC/ASCQ 0xb/47/00
>>> [ 5906.008515] ata6: status=0xd0 { Busy }
...
>> SCSI opcode 0x2a is WRITE_10, so the errors are being reported
>> in response to the writes to bigfile.$i.
...
> I am using the sata_mv driver, which is beta. That might explain why it
> behaves not totally as expected in your eyes. I have no clue anyway :-)

Ahh.. that's useful to know.  I expect to be taking a long hard look
at the innards of the sata_mv code in the near future, so whatever is
wrong here just might get fixed soon.

Cheers

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.15.4
  2006-02-19 15:30             ` Mark Lord
@ 2006-02-19 17:16               ` Sander
  2006-07-06 23:08                 ` Justin Piszcz
  0 siblings, 1 reply; 147+ messages in thread
From: Sander @ 2006-02-19 17:16 UTC (permalink / raw)
  To: Mark Lord
  Cc: sander, Jeff Garzik, Justin Piszcz, linux-kernel,
	IDE/ATA development list

Mark Lord wrote (ao):
> Sander wrote:
> >Mark Lord wrote (ao):
> >>Sander wrote:
> >>>Mark Lord wrote (ao):
> >>>>On Friday 17 February 2006 03:45, Jeff Garzik wrote:
> >>>>>Submit a patch... 
> >>>>You mean, something like this one?
> >>...
> >>>[  633.449961] md: md1: sync done.
> >>>[  633.456070] RAID5 conf printout:
> >>>[  633.456117]  --- rd:9 wd:9 fd:0
> >>...
> >>>[ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI 
> >>>SK/ASC/ASCQ 0xb/47/00
> >>>[ 1872.338239] ata6: status=0xd0 { Busy }
> >>>[ 5749.285084] ata8: translated op=0x2a ATA stat/err 0xd0/00 to SCSI 
> >>>SK/ASC/ASCQ 0xb/47/00
> >>>[ 5749.285138] ata8: status=0xd0 { Busy }
> >>>[ 5906.008461] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI 
> >>>SK/ASC/ASCQ 0xb/47/00
> >>>[ 5906.008515] ata6: status=0xd0 { Busy }
> ...
> >>SCSI opcode 0x2a is WRITE_10, so the errors are being reported
> >>in response to the writes to bigfile.$i.
> ...
> >I am using the sata_mv driver, which is beta. That might explain why it
> >behaves not totally as expected in your eyes. I have no clue anyway :-)
> 
> Ahh.. that's useful to know.

I'm sorry for omitting that information in my previous mail.

> I expect to be taking a long hard look at the innards of the sata_mv
> code in the near future, so whatever is wrong here just might get
> fixed soon.

Consider me your happy and willing patch test victim :-)

I can easily reproduce data corruption with sata_mv.

FWIW, I like this card very much. It is cheap, seems to perform well,
and Marvell seems to be Linux friendly, providing the docs (according to
http://linux-ata.org/sata-status.html#marvell).

I'm not subscribed to linux-ide, but I am to linux-kernel. If you post it
there (or cc me), I'll see it and try it.

	Sander

-- 
Humilis IT Services and Solutions
http://www.humilis.net

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.15.4
  2006-02-14 18:00       ` Mark Lord
  2006-02-14 18:06         ` Justin Piszcz
@ 2006-02-23 23:39         ` Justin Piszcz
  2006-02-25 15:32           ` Mark Lord
  2006-02-25 11:34         ` David Greaves
  2 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-02-23 23:39 UTC (permalink / raw)
  To: Mark Lord
  Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list

I have reproduced the error with the patched kernel!

Here it is:

[263864.109854] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[263864.109861] ata3: status=0x51 { DriveReady SeekComplete Error }
[263864.109866] ata3: error=0x04 { DriveStatusError }

Here is how I got it to error:

$ for i in `seq 1 1000`; do dd if=/dev/zero of=file.$i bs=1M count=$i; 
done

Now, how to fix? :)

On Tue, 14 Feb 2006, Mark Lord wrote:

> On Tuesday 14 February 2006 12:12, Justin Piszcz wrote:
>> I would like to try the patch too, if available.
>
> Something like this:  (for 2.6.16-rc3-git2, but should be okay on 2.6.15 also).
>
> Untested:  include the original SCSI opcode in printk's for libata SCSI errors,
> to help understand where the errors are coming from.
>
> Signed-Off-By:  Mark Lord <mlord@pobox.com>
>
> --- linux/drivers/scsi/libata-scsi.c.orig	2006-02-12 19:27:25.000000000 -0500
> +++ linux/drivers/scsi/libata-scsi.c	2006-02-14 12:54:17.000000000 -0500
> @@ -420,6 +420,7 @@
>  *	@sk: the sense key we'll fill out
>  *	@asc: the additional sense code we'll fill out
>  *	@ascq: the additional sense code qualifier we'll fill out
> + *	@opcode: the original SCSI command opcode byte
>  *
>  *	Converts an ATA error into a SCSI error.  Fill out pointers to
>  *	SK, ASC, and ASCQ bytes for later use in fixed or descriptor
> @@ -429,7 +430,7 @@
>  *	spin_lock_irqsave(host_set lock)
>  */
> void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc,
> -			u8 *ascq)
> +			u8 *ascq, u8 opcode)
> {
> 	int i;
>
> @@ -508,8 +509,8 @@
> 		}
> 	}
> 	/* No error?  Undecoded? */
> -	printk(KERN_WARNING "ata%u: no sense translation for status: 0x%02x\n",
> -	       id, drv_stat);
> +	printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x status: 0x%02x\n",
> +	       id, opcode, drv_stat);
>
> 	/* For our last chance pick, use medium read error because
> 	 * it's much more common than an ATA drive telling you a write
> @@ -520,8 +521,8 @@
> 	*ascq = 0x04; /*  "auto-reallocation failed" */
>
>  translate_done:
> -	printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to "
> -	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err,
> +	printk(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to "
> +	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err,
> 	       *sk, *asc, *ascq);
> 	return;
> }
> @@ -562,7 +563,7 @@
> 	 */
> 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
> 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
> -				   &sb[1], &sb[2], &sb[3]);
> +				   &sb[1], &sb[2], &sb[3], cmd->cmnd[0]);
> 		sb[1] &= 0x0f;
> 	}
>
> @@ -637,7 +638,7 @@
> 	 */
> 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
> 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
> -				   &sb[2], &sb[12], &sb[13]);
> +				   &sb[2], &sb[12], &sb[13], cmd->cmnd[0]);
> 		sb[2] &= 0x0f;
> 	}
>
>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.15.4
  2006-02-14 18:00       ` Mark Lord
  2006-02-14 18:06         ` Justin Piszcz
  2006-02-23 23:39         ` Justin Piszcz
@ 2006-02-25 11:34         ` David Greaves
  2006-02-25 16:20           ` Mark Lord
  2 siblings, 1 reply; 147+ messages in thread
From: David Greaves @ 2006-02-25 11:34 UTC (permalink / raw)
  To: Mark Lord
  Cc: Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list

Mark Lord wrote:

>On Tuesday 14 February 2006 12:12, Justin Piszcz wrote:
>>I would like to try the patch too, if available.
>
>Something like this:  (for 2.6.16-rc3-git2, but should be okay on 2.6.15 also).
>
>Untested:  include the original SCSI opcode in printk's for libata SCSI errors,
>to help understand where the errors are coming from.
>
>Signed-Off-By:  Mark Lord <mlord@pobox.com>

Thanks Mark - I've finally gotten this patch applied.

With smartd disabled and no SMART commands issued, a read-only badblocks
scan of /dev/sdb2 shows no problems, but the kernel now logs:
Feb 25 10:38:31 haze kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Feb 25 10:38:32 haze kernel: ata2: no sense translation for op=0x28 status: 0x51
Feb 25 10:38:32 haze kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Feb 25 10:38:35 haze kernel: ata2: no sense translation for op=0x28 status: 0x51
hundreds of times.

and during boot I can get:
ata2: no sense translation for op=0x28 status: 0x51
ata2: translated op=0x28 ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata2: status=0x51 { DriveReady SeekComplete Error }
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
ata2: no sense translation for op=0x28 status: 0x51
ata2: translated op=0x28 ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x28 status: 0x51
ata2: translated op=0x28 ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata2: status=0x51 { DriveReady SeekComplete Error }

Subsequently, a
 smartctl -data -a /dev/sdb
shows no errors.
So could this be a faulty disk, even though SMART reports it as OK and
shows no read or write errors?

The other problem I noticed was that
 smartctl -o on -data /dev/sda
still just gives:
Feb 25 10:51:47 haze kernel: ata1: PIO error
Feb 25 10:51:47 haze kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
Feb 25 10:51:47 haze kernel: ata1: error=0x04 { DriveStatusError }
Feb 25 10:51:47 haze kernel: ata1: PIO error
Feb 25 10:51:47 haze kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
Feb 25 10:51:47 haze kernel: ata1: error=0x04 { DriveStatusError }
Feb 25 10:51:47 haze kernel: ata1: PIO error
many times.

I get similar problems for all the drives under both sata_sil and sata_via.


Linux haze 2.6.15patchsata #6 PREEMPT Fri Feb 24 19:15:07 UTC 2006 i686 GNU/Linux


libata version 1.20 loaded.
sata_sil 0000:00:0a.0: version 0.9
ACPI: PCI Interrupt 0000:00:0a.0[A] -> GSI 16 (level, low) -> IRQ 17
ata1: SATA max UDMA/100 cmd 0xF8804080 ctl 0xF880408A bmdma 0xF8804000 irq 17
ata2: SATA max UDMA/100 cmd 0xF88040C0 ctl 0xF88040CA bmdma 0xF8804008 irq 17
ata1: dev 0 cfg 49:2f00 82:7869 83:7d09 84:4043 85:7869 86:3c01 87:4043 88:203f
ata1: dev 0 ATA-7, max UDMA/100, 390721968 sectors: LBA48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sil
ata2: dev 0 cfg 49:2f00 82:7c6b 83:7f09 84:4063 85:7c69 86:3e01 87:4063 88:007f
ata2: dev 0 ATA-7, max UDMA/133, 398297088 sectors: LBA48
ata2: dev 0 configured for UDMA/100
scsi1 : sata_sil
  Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
  Type:   Direct-Access                      ANSI SCSI revision: 05
sata_via 0000:00:0f.0: version 1.1
ACPI: PCI Interrupt 0000:00:0f.0[B] -> GSI 20 (level, low) -> IRQ 16
sata_via 0000:00:0f.0: routed to hard irq line 0
ata3: SATA max UDMA/133 cmd 0x9800 ctl 0x9402 bmdma 0x8400 irq 16
ata4: SATA max UDMA/133 cmd 0x9000 ctl 0x8802 bmdma 0x8408 irq 16
ata3: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3468 86:3c01 87:4003 88:407f
ata3: dev 0 ATA-6, max UDMA/133, 312581808 sectors: LBA48
ata3: dev 0 configured for UDMA/133
scsi2 : sata_via
ata4: dev 0 cfg 49:2f00 82:7c6b 83:7f09 84:4063 85:7c68 86:3e01 87:4063 88:407f
ata4: dev 0 ATA-7, max UDMA/133, 398297088 sectors: LBA48
ata4: dev 0 configured for UDMA/133
scsi3 : sata_via
  Vendor: ATA       Model: ST3160023AS       Rev: 3.18
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
SCSI device sda: drive cache: write back
 sda: sda1
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 398297088 512-byte hdwr sectors (203928 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 398297088 512-byte hdwr sectors (203928 MB)
SCSI device sdb: drive cache: write back
 sdb: sdb1 sdb2
sd 1:0:0:0: Attached scsi disk sdb
SCSI device sdc: 312581808 512-byte hdwr sectors (160042 MB)
SCSI device sdc: drive cache: write back
SCSI device sdc: 312581808 512-byte hdwr sectors (160042 MB)
SCSI device sdc: drive cache: write back
 sdc: sdc1 sdc2 sdc3 sdc4
sd 2:0:0:0: Attached scsi disk sdc
SCSI device sdd: 398297088 512-byte hdwr sectors (203928 MB)
SCSI device sdd: drive cache: write back
SCSI device sdd: 398297088 512-byte hdwr sectors (203928 MB)
SCSI device sdd: drive cache: write back
 sdd: sdd1 sdd2
sd 3:0:0:0: Attached scsi disk sdd
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 1:0:0:0: Attached scsi generic sg1 type 0
sd 2:0:0:0: Attached scsi generic sg2 type 0
sd 3:0:0:0: Attached scsi generic sg3 type 0



David

-- 


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.15.4
  2006-02-23 23:39         ` Justin Piszcz
@ 2006-02-25 15:32           ` Mark Lord
  2006-02-25 15:58             ` Justin Piszcz
  0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-02-25 15:32 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list

Justin Piszcz wrote:
> I have reproduced the error with the patched kernel!
> 
> Here it is:
> 
> [263864.109854] ata3: translated ATA stat/err 0x51/04 to SCSI 
> SK/ASC/ASCQ 0xb/00/00
> [263864.109861] ata3: status=0x51 { DriveReady SeekComplete Error }
> [263864.109866] ata3: error=0x04 { DriveStatusError }

Nope.. patch not present, as otherwise the line above would have
read something like this:

 > [263864.109854] ata3: translated op=0x21 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00

So we didn't get the extra info since the patch wasn't present.

Cheers

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.15.4
  2006-02-25 15:32           ` Mark Lord
@ 2006-02-25 15:58             ` Justin Piszcz
  2006-02-25 16:11               ` Jesper Juhl
  2006-02-25 16:21               ` Mark Lord
  0 siblings, 2 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-02-25 15:58 UTC (permalink / raw)
  To: Mark Lord
  Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list

The kernel is patched. If you did not get what you wanted, maybe the patch
does not work in some cases, or there is a bug?

On Sat, 25 Feb 2006, Mark Lord wrote:

> Justin Piszcz wrote:
>> I have reproduced the error with the patched kernel!
>> 
>> Here it is:
>> 
>> [263864.109854] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 
>> 0xb/00/00
>> [263864.109861] ata3: status=0x51 { DriveReady SeekComplete Error }
>> [263864.109866] ata3: error=0x04 { DriveStatusError }
>
> Nope.. patch not present, as otherwise the line above would have
> read something like this:
>
>> [263864.109854] ata3: translated op=0x21 ATA stat/err 0x51/04 to SCSI 
> SK/ASC/ASCQ 0xb/00/00
>
> So we didn't get the extra info since the patch wasn't present.
>
> Cheers
>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.15.4
  2006-02-25 15:58             ` Justin Piszcz
@ 2006-02-25 16:11               ` Jesper Juhl
  2006-02-25 16:21               ` Mark Lord
  1 sibling, 0 replies; 147+ messages in thread
From: Jesper Juhl @ 2006-02-25 16:11 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Mark Lord, David Greaves, Jeff Garzik, linux-kernel,
	IDE/ATA development list

On 2/25/06, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:

Please don't top-post.

> The kernel is patched, if you did not get what you wanted maybe the patch
> does not work in some instances or there is a bug?
>

You may have patched a kernel source with Mark's patch, but you are
very clearly not running a kernel built from that patched source.

As can be seen from (for example) this bit of Mark's patch:

 translate_done:
-       printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to "
-              "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err,
+       printk(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to "
+              "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err,
              *sk, *asc, *ascq);

the patch changes the text being printed. In this case the text
"ata%u: translated ATA stat/err ..." is changed into "ata%u:
translated op=0x%02x ATA stat/err ..."

And if we look at the output you posted :

> >> Here it is:
> >>
> >> [263864.109854] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ
> >> 0xb/00/00

That string is clearly from an un-patched kernel as Mark also pointed
out in his reply to you.
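A quick way to tell which kernel produced a given log line is to look for the new "op=0x.." text. Below is a minimal C sketch; the function name is hypothetical, and it only recognises the two message formats quoted in this thread (the "translated ..." line and the "no sense translation ..." line, before and after the patch).

```c
#include <string.h>

/* Classify a libata sense-translation message (hypothetical helper):
 *    1  new format, printed by a kernel with the op= patch applied
 *    0  old format, printed by an unpatched kernel
 *   -1  not one of these messages */
static int from_patched_kernel(const char *line)
{
    /* The patch inserts "op=0x.." into both messages. */
    if (strstr(line, "translated op=0x") != NULL ||
        strstr(line, "translation for op=0x") != NULL)
        return 1;
    /* Original wording, as in the unpatched libata-scsi.c. */
    if (strstr(line, "translated ATA stat/err") != NULL ||
        strstr(line, "translation for status:") != NULL)
        return 0;
    return -1;
}
```

Fed the line from the report above ("ata3: translated ATA stat/err 0x51/04 ..."), this returns 0, i.e. the old format.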


--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.15.4
  2006-02-25 11:34         ` David Greaves
@ 2006-02-25 16:20           ` Mark Lord
  2006-02-25 17:45             ` Justin Piszcz
  0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-02-25 16:20 UTC (permalink / raw)
  To: David Greaves
  Cc: Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list

[-- Attachment #1: Type: text/plain, Size: 1443 bytes --]

David Greaves wrote:
..
> Thanks Mark - I've finally gotten this patch applied.
> 
> With smartd disabled and no smart commands issued, a readonly badblocks
> scan of /dev/sdb2 shows no problems and now gives:
> Feb 25 10:38:31 haze kernel: ata2: status=0x51 { DriveReady SeekComplete
> Error }
> Feb 25 10:38:32 haze kernel: ata2: no sense translation for op=0x28
> status: 0x51
> Feb 25 10:38:32 haze kernel: ata2: status=0x51 { DriveReady SeekComplete
> Error }
> Feb 25 10:38:35 haze kernel: ata2: no sense translation for op=0x28
> status: 0x51
> hundreds of times.
..

Mmmm.. okay, it's happening due to a SCSI READ_10 opcode,
which means it isn't being triggered by any of the FUA stuff.

But there's still no obvious reason for the error.
The drive is basically just saying "command rejected",
and libata-scsi is translating that into "medium error"
for some unknown reason.
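For reference, a minimal C sketch of how the bytes in these log lines decode. It assumes the standard ATA Status register bit values and mirrors the "tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)" test from the first patch; the helper names are hypothetical, not libata functions, and the names printed for bits that never appear in this thread's logs are assumed spellings. Status 0x51 comes out as DriveReady SeekComplete Error, 0xd0 as just Busy (the other bits are not valid while BSY is set), and SCSI opcode 0x28 is READ_10 while 0x2a is WRITE_10.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Standard ATA Status register bits. */
enum {
    ST_BSY  = 0x80,  /* ATA_BUSY in libata */
    ST_DRDY = 0x40,  /* drive ready */
    ST_DF   = 0x20,  /* ATA_DF: device fault */
    ST_DSC  = 0x10,  /* seek complete */
    ST_DRQ  = 0x08,  /* ATA_DRQ: data request */
    ST_CORR = 0x04,  /* corrected data */
    ST_IDX  = 0x02,  /* index */
    ST_ERR  = 0x01,  /* ATA_ERR */
};

/* Same test as the patch: sense data is built only if BSY, DF, ERR or DRQ is set. */
static int needs_sense_translation(uint8_t st)
{
    return (st & (ST_BSY | ST_DF | ST_ERR | ST_DRQ)) != 0;
}

/* Name the two SCSI opcodes that show up in the "op=0x.." field. */
static const char *scsi_op_name(uint8_t op)
{
    switch (op) {
    case 0x28: return "READ_10";
    case 0x2a: return "WRITE_10";
    default:   return "other";
    }
}

/* Render st like the kernel's "status=0x.. { ... }" text.
 * When BSY is set only "Busy" is shown. buf should hold at least 96 bytes. */
static void describe_status(uint8_t st, char *buf, size_t len)
{
    static const struct { uint8_t bit; const char *name; } bits[] = {
        { ST_DRDY, "DriveReady" },     { ST_DF, "DeviceFault" },
        { ST_DSC, "SeekComplete" },    { ST_DRQ, "DataRequest" },
        { ST_CORR, "CorrectedError" }, { ST_IDX, "Index" },
        { ST_ERR, "Error" },
    };
    size_t used;
    size_t i;

    if (st & ST_BSY) {
        snprintf(buf, len, "{ Busy }");
        return;
    }
    used = (size_t)snprintf(buf, len, "{");
    for (i = 0; i < sizeof bits / sizeof bits[0] && used < len; i++)
        if (st & bits[i].bit)
            used += (size_t)snprintf(buf + used, len - used, " %s", bits[i].name);
    if (used < len)
        snprintf(buf + used, len - used, " }");
}
```

Both 0x51 and 0xd0 pass needs_sense_translation(), which is why each of them ends up in a "translated ..." message.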

Unfortunately, the design of the current libata is such that
we no longer have access to the actual ATA opcode that was rejected.
It gets overwritten by the returned drive status on completion.
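A hypothetical, cut-down model of the problem: one "command" field carries the ATA opcode when the command is issued and is refilled with the Status register when the taskfile is read back on completion. Copying the opcode into a second field at issue time, as the saved_command member in the patch below does, keeps it around for the error message. The struct and function names here are illustrative only, not libata's.

```c
#include <stdint.h>

/* Cut-down taskfile (not libata's struct ata_taskfile). */
struct mini_tf {
    uint8_t command;       /* ATA opcode on issue; Status register after completion */
    uint8_t saved_command; /* copy of the opcode taken at issue time */
};

/* Issue: keep a copy of the opcode before the device sees it. */
static void mini_issue(struct mini_tf *tf, uint8_t opcode)
{
    tf->command = opcode;
    tf->saved_command = opcode;
}

/* Completion: reading the registers back overwrites command with status. */
static void mini_complete(struct mini_tf *tf, uint8_t status)
{
    tf->command = status;
}
```

After mini_issue(&t, 0x25) and mini_complete(&t, 0x51), t.command holds the status 0x51 while t.saved_command still holds 0x25 (READ DMA EXT, used here only as an example opcode).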

So.. I need to generate another patch for you now, to save/show
the real ATA opcode that was used to cause the errors.
My theory is that we'll discover that it is one that your drive
legitimately is rejecting (unsupported LBA48 or something..).

But we won't know until we see the output.
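For checking the LBA48 theory against a boot log, the relevant bit is IDENTIFY DEVICE word 83, bit 10 (48-bit Address feature set supported), which is what libata's ata_id_has_lba48() tests. libata prints word 83 in its "cfg" lines. A minimal sketch, assuming the " NN:xxxx" layout of those lines as posted earlier in this thread; both helper names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* IDENTIFY DEVICE word 83, bit 10: 48-bit Address feature set supported. */
static int id_supports_lba48(uint16_t word83)
{
    return (word83 & (1u << 10)) != 0;
}

/* Pull word N out of a libata "cfg" line such as
 * "ata1: dev 0 cfg 49:2f00 82:7869 83:7d09 ...".
 * Returns -1 if the word is not present. */
static long cfg_word(const char *line, int word)
{
    char key[16];
    const char *p;

    snprintf(key, sizeof key, " %d:", word);  /* e.g. " 83:" */
    p = strstr(line, key);
    if (p == NULL)
        return -1;
    return strtol(p + strlen(key), NULL, 16);  /* value is printed in hex */
}
```

For the drives in that boot log (83:7d09, 83:7f09, 83:7d01) bit 10 is set, which agrees with the "LBA48" libata printed for each of them.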

Second patch is attached:  apply *in addition* to the first one.

Cheers


[-- Attachment #2: 12_libata_ata_opcode.patch --]
[-- Type: text/x-patch, Size: 5983 bytes --]

--- linux/drivers/scsi/libata-core.c.orig	2006-02-23 16:15:52.000000000 -0500
+++ linux/drivers/scsi/libata-core.c	2006-02-25 11:17:42.000000000 -0500
@@ -253,10 +253,11 @@
  *	spin_lock_irqsave(host_set lock)
  */
 
-static void ata_exec_command_pio(struct ata_port *ap, const struct ata_taskfile *tf)
+static void ata_exec_command_pio(struct ata_port *ap, struct ata_taskfile *tf)
 {
 	DPRINTK("ata%u: cmd 0x%X\n", ap->id, tf->command);
 
+	tf->saved_command = tf->command;
        	outb(tf->command, ap->ioaddr.command_addr);
 	ata_pause(ap);
 }
@@ -274,10 +275,11 @@
  *	spin_lock_irqsave(host_set lock)
  */
 
-static void ata_exec_command_mmio(struct ata_port *ap, const struct ata_taskfile *tf)
+static void ata_exec_command_mmio(struct ata_port *ap, struct ata_taskfile *tf)
 {
 	DPRINTK("ata%u: cmd 0x%X\n", ap->id, tf->command);
 
+	tf->saved_command = tf->command;
        	writeb(tf->command, (void __iomem *) ap->ioaddr.command_addr);
 	ata_pause(ap);
 }
@@ -294,7 +296,7 @@
  *	LOCKING:
  *	spin_lock_irqsave(host_set lock)
  */
-void ata_exec_command(struct ata_port *ap, const struct ata_taskfile *tf)
+void ata_exec_command(struct ata_port *ap, struct ata_taskfile *tf)
 {
 	if (ap->flags & ATA_FLAG_MMIO)
 		ata_exec_command_mmio(ap, tf);
@@ -316,7 +318,7 @@
  */
 
 static inline void ata_tf_to_host(struct ata_port *ap,
-				  const struct ata_taskfile *tf)
+				  struct ata_taskfile *tf)
 {
 	ap->ops->tf_load(ap, tf);
 	ap->ops->exec_command(ap, tf);
@@ -506,12 +508,13 @@
  *	Inherited from caller.
  */
 
-void ata_tf_to_fis(const struct ata_taskfile *tf, u8 *fis, u8 pmp)
+void ata_tf_to_fis(struct ata_taskfile *tf, u8 *fis, u8 pmp)
 {
 	fis[0] = 0x27;	/* Register - Host to Device FIS */
 	fis[1] = (pmp & 0xf) | (1 << 7); /* Port multiplier number,
 					    bit 7 indicates Command FIS */
 	fis[2] = tf->command;
+	tf->saved_command = tf->command;
 	fis[3] = tf->feature;
 
 	fis[4] = tf->lbal;
@@ -631,6 +634,7 @@
 	cmd = ata_rw_cmds[index + fua + lba48 + write];
 	if (cmd) {
 		tf->command = cmd;
+		tf->saved_command = cmd;
 		return 0;
 	}
 	return -1;
--- linux/drivers/scsi/libata-scsi.c.orig	2006-02-25 10:58:41.000000000 -0500
+++ linux/drivers/scsi/libata-scsi.c	2006-02-25 11:16:07.000000000 -0500
@@ -438,7 +438,7 @@
  *	spin_lock_irqsave(host_set lock)
  */
 void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc, 
-			u8 *ascq, u8 opcode)
+			u8 *ascq, u8 opcode, u8 cmd)
 {
 	int i;
 
@@ -517,8 +517,8 @@
 		}
 	}
 	/* No error?  Undecoded? */
-	printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x status: 0x%02x\n", 
-	       id, opcode, drv_stat);
+	printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x cmd=0x%02x status: 0x%02x\n", 
+	       id, opcode, cmd, drv_stat);
 
 	/* For our last chance pick, use medium read error because
 	 * it's much more common than an ATA drive telling you a write
@@ -529,8 +529,8 @@
 	*ascq = 0x04; /*  "auto-reallocation failed" */
 
  translate_done:
-	DPRINTK(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to "
-	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err,
+	DPRINTK(KERN_ERR "ata%u: translated op=0x%02x cmd=0x%02x ATA stat/err 0x%02x/%02x to "
+	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, cmd, drv_stat, drv_err,
 	       *sk, *asc, *ascq);
 	return;
 }
@@ -571,7 +571,7 @@
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[1], &sb[2], &sb[3], cmd->cmnd[0]);
+				   &sb[1], &sb[2], &sb[3], cmd->cmnd[0], tf->saved_command);
 		sb[1] &= 0x0f;
 	}
 
@@ -646,7 +646,7 @@
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[2], &sb[12], &sb[13], cmd->cmnd[0]);
+				   &sb[2], &sb[12], &sb[13], cmd->cmnd[0], tf->saved_command);
 		sb[2] &= 0x0f;
 	}
 
@@ -1337,6 +1337,7 @@
 		goto early_finish;
 
 	/* select device, send command to hardware */
+	qc->tf.saved_command = qc->tf.command;
 	if (ata_qc_issue(qc))
 		goto err_did;
 
--- linux/include/linux/ata.h.orig	2006-02-17 17:23:45.000000000 -0500
+++ linux/include/linux/ata.h	2006-02-25 11:09:53.000000000 -0500
@@ -244,6 +244,7 @@
 	u8			device;
 
 	u8			command;	/* IO operation */
+	u8			saved_command;	/* IO operation */
 };
 
 #define ata_id_is_ata(id)	(((id)[0] & (1 << 15)) == 0)
--- linux/include/linux/libata.h.orig	2006-02-23 16:15:53.000000000 -0500
+++ linux/include/linux/libata.h	2006-02-25 11:17:14.000000000 -0500
@@ -420,7 +420,7 @@
 	void (*tf_load) (struct ata_port *ap, const struct ata_taskfile *tf);
 	void (*tf_read) (struct ata_port *ap, struct ata_taskfile *tf);
 
-	void (*exec_command)(struct ata_port *ap, const struct ata_taskfile *tf);
+	void (*exec_command)(struct ata_port *ap, struct ata_taskfile *tf);
 	u8   (*check_status)(struct ata_port *ap);
 	u8   (*check_altstatus)(struct ata_port *ap);
 	void (*dev_select)(struct ata_port *ap, unsigned int device);
@@ -512,13 +512,13 @@
  */
 extern void ata_tf_load(struct ata_port *ap, const struct ata_taskfile *tf);
 extern void ata_tf_read(struct ata_port *ap, struct ata_taskfile *tf);
-extern void ata_tf_to_fis(const struct ata_taskfile *tf, u8 *fis, u8 pmp);
+extern void ata_tf_to_fis(struct ata_taskfile *tf, u8 *fis, u8 pmp);
 extern void ata_tf_from_fis(const u8 *fis, struct ata_taskfile *tf);
 extern void ata_noop_dev_select (struct ata_port *ap, unsigned int device);
 extern void ata_std_dev_select (struct ata_port *ap, unsigned int device);
 extern u8 ata_check_status(struct ata_port *ap);
 extern u8 ata_altstatus(struct ata_port *ap);
-extern void ata_exec_command(struct ata_port *ap, const struct ata_taskfile *tf);
+extern void ata_exec_command(struct ata_port *ap, struct ata_taskfile *tf);
 extern int ata_port_start (struct ata_port *ap);
 extern void ata_port_stop (struct ata_port *ap);
 extern void ata_host_stop (struct ata_host_set *host_set);

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.15.4
  2006-02-25 15:58             ` Justin Piszcz
  2006-02-25 16:11               ` Jesper Juhl
@ 2006-02-25 16:21               ` Mark Lord
  1 sibling, 0 replies; 147+ messages in thread
From: Mark Lord @ 2006-02-25 16:21 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list

Justin Piszcz wrote:
> The kernel is patched, if you did not get what you wanted maybe the 
> patch does not work in some instances or there is a bug?

No, the output would be there if those messages came from the patched kernel.
(read the patch and see what I mean..).

Cheers

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.15.4
  2006-02-25 16:20           ` Mark Lord
@ 2006-02-25 17:45             ` Justin Piszcz
  2006-02-25 18:28               ` Mark Lord
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-02-25 17:45 UTC (permalink / raw)
  To: Mark Lord
  Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list

Second patch fails for me.

On a clean 2.6.15.4 source tree:

p34:/usr/src# ls -ld linux
lrwxrwxrwx  1 root src 14 2006-02-25 12:41 linux -> linux-2.6.15.4/

The one from your e-mail earlier:
p34:/usr/src/linux# patch -p1 < /tmp/patch1
patching file drivers/scsi/libata-scsi.c
Hunk #1 succeeded at 404 (offset -16 lines).
Hunk #2 succeeded at 414 (offset -16 lines).
Hunk #3 succeeded at 493 (offset -16 lines).
Hunk #4 succeeded at 505 (offset -16 lines).
Hunk #5 succeeded at 547 (offset -16 lines).
Hunk #6 succeeded at 622 (offset -16 lines).

p34:/usr/src/linux# patch -p1 < /tmp/12_libata_ata_opcode.patch
patching file drivers/scsi/libata-core.c
Hunk #1 succeeded at 245 (offset -8 lines).
Hunk #2 succeeded at 267 (offset -8 lines).
Hunk #3 succeeded at 288 (offset -8 lines).
Hunk #4 succeeded at 310 (offset -8 lines).
Hunk #5 succeeded at 500 (offset -8 lines).
Hunk #6 FAILED at 626.
1 out of 6 hunks FAILED -- saving rejects to file drivers/scsi/libata-core.c.rej
patching file drivers/scsi/libata-scsi.c
Hunk #1 succeeded at 414 (offset -24 lines).
Hunk #2 succeeded at 493 (offset -24 lines).
Hunk #3 FAILED at 505.
Hunk #4 succeeded at 547 (offset -24 lines).
Hunk #5 succeeded at 622 (offset -24 lines).
Hunk #6 succeeded at 1308 (offset -29 lines).
1 out of 6 hunks FAILED -- saving rejects to file drivers/scsi/libata-scsi.c.rej
patching file include/linux/ata.h
Hunk #1 succeeded at 239 (offset -5 lines).
patching file include/linux/libata.h
Hunk #1 succeeded at 368 (offset -52 lines).
Hunk #2 succeeded at 452 (offset -60 lines).
p34:/usr/src/linux#


Should I be using 2.6.16-rcX?

On Sat, 25 Feb 2006, Mark Lord wrote:

> David Greaves wrote:
> ..
>> Thanks Mark - I've finally gotten this patch applied.
>> 
>> With smartd disabled and no smart commands issued, a readonly badblocks
>> scan of /dev/sdb2 shows no problems and now gives:
>> Feb 25 10:38:31 haze kernel: ata2: status=0x51 { DriveReady SeekComplete
>> Error }
>> Feb 25 10:38:32 haze kernel: ata2: no sense translation for op=0x28
>> status: 0x51
>> Feb 25 10:38:32 haze kernel: ata2: status=0x51 { DriveReady SeekComplete
>> Error }
>> Feb 25 10:38:35 haze kernel: ata2: no sense translation for op=0x28
>> status: 0x51
>> hundreds of times.
> ..
>
> Mmmm.. okay, it's happening due to a SCSI READ_10 opcode,
> which means it isn't being triggered by any of the FUA stuff.
>
> But there's still no obvious reason for the error.
> The drive is basically just saying "command rejected",
> and libata-scsi is translating that into "medium error"
> for some unknown reason.
>
> Unfortunately, the design of the current libata is such that
> we no longer have access to the actual ATA opcode that was rejected.
> It gets overwritten by the returned drive status on completion.
>
> So.. I need to generate another patch for you now, to save/show
> the real ATA opcode that was used to cause the errors.
> My theory is that we'll discover that it is one that your drive
> legitimately is rejecting (unsupported LBA48 or something..).
>
> But we won't know until we see the output.
>
> Second patch is attached:  apply *in addition* to the first one.
>
> Cheers
>
>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.15.4
  2006-02-25 17:45             ` Justin Piszcz
@ 2006-02-25 18:28               ` Mark Lord
  2006-02-25 18:55                 ` Justin Piszcz
                                   ` (2 more replies)
  0 siblings, 3 replies; 147+ messages in thread
From: Mark Lord @ 2006-02-25 18:28 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Mark Lord, David Greaves, Jeff Garzik, linux-kernel,
	IDE/ATA development list

Justin Piszcz wrote:
> Second patch fails for me.
..
> Should I be using 2.6.16-rcX?

Mmm... that's what I'm using (plus other patches),
so, yes.. give that a try.  2.6.16 does seem to
be shaping up to be a nice kernel.

Cheers

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.15.4
  2006-02-25 18:28               ` Mark Lord
@ 2006-02-25 18:55                 ` Justin Piszcz
  2006-02-25 19:29                 ` Justin Piszcz
  2006-02-25 19:47                 ` David Greaves
  2 siblings, 0 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-02-25 18:55 UTC (permalink / raw)
  To: Mark Lord
  Cc: Mark Lord, David Greaves, Jeff Garzik, linux-kernel,
	IDE/ATA development list

I will give 2.6.16-rcX a try shortly. Here is the error again (with a
freshly patched 2.6.15.4), just to rule out any problems with the first
time that I patched:

[ 1037.451784] ata3: translated op=0x2a ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[ 1037.451791] ata3: status=0x51 { DriveReady SeekComplete Error }
[ 1037.451796] ata3: error=0x04 { DriveStatusError }
[ 1517.050496] ata3: no sense translation for op=0x2a status: 0x51
[ 1517.050504] ata3: translated op=0x2a ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
[ 1517.050506] ata3: status=0x51 { DriveReady SeekComplete Error }


On Sat, 25 Feb 2006, Mark Lord wrote:

> Justin Piszcz wrote:
>> Second patch fails for me.
> ..
>> Should I be using 2.6.16-rcX?
>
> Mmm... that's what I'm using (plus other patches),
> so, yes.. give that a try.  2.6.16 does seem to
> be shaping up to be a nice kernel.
>
> Cheers
>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.15.4
  2006-02-25 18:28               ` Mark Lord
  2006-02-25 18:55                 ` Justin Piszcz
@ 2006-02-25 19:29                 ` Justin Piszcz
  2006-02-25 19:53                   ` David Greaves
  2006-02-25 19:47                 ` David Greaves
  2 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-02-25 19:29 UTC (permalink / raw)
  To: Mark Lord
  Cc: Mark Lord, David Greaves, Jeff Garzik, linux-kernel,
	IDE/ATA development list

Which kernel did you run your patch against?

With 2.6.16-rc4....

First patch looks good..

p34:/usr/src/linux# patch -p1 < /tmp/patch1
patching file drivers/scsi/libata-scsi.c

p34:/usr/src/linux# patch -p1 < /tmp/12_libata_ata_opcode.patch
patching file drivers/scsi/libata-core.c
Hunk #1 succeeded at 245 (offset -8 lines).
Hunk #2 succeeded at 267 (offset -8 lines).
Hunk #3 succeeded at 288 (offset -8 lines).
Hunk #4 succeeded at 310 (offset -8 lines).
Hunk #5 succeeded at 500 (offset -8 lines).
Hunk #6 succeeded at 626 (offset -8 lines).
patching file drivers/scsi/libata-scsi.c
Hunk #1 succeeded at 430 (offset -8 lines).
Hunk #2 succeeded at 509 (offset -8 lines).
Hunk #3 FAILED at 521.
Hunk #4 succeeded at 563 (offset -8 lines).
Hunk #5 succeeded at 638 (offset -8 lines).
Hunk #6 succeeded at 1329 (offset -8 lines).
1 out of 6 hunks FAILED -- saving rejects to file drivers/scsi/libata-scsi.c.rej
patching file include/linux/ata.h
patching file include/linux/libata.h
Hunk #1 succeeded at 373 (offset -47 lines).
Hunk #2 succeeded at 463 (offset -49 lines).
p34:/usr/src/linux# ls -ld /usr/src/linux
lrwxrwxrwx  1 root src 16 2006-02-25 14:24 /usr/src/linux -> linux-2.6.16-rc4/
p34:/usr/src/linux#

Here is the *.rej file:

# cat libata-scsi.c.rej
***************
*** 521,528 ****
         *ascq = 0x04; /*  "auto-reallocation failed" */

    translate_done:
-       DPRINTK(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to "
-              "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err,
                *sk, *asc, *ascq);
         return;
   }
--- 521,528 ----
         *ascq = 0x04; /*  "auto-reallocation failed" */

    translate_done:
+       DPRINTK(KERN_ERR "ata%u: translated op=0x%02x cmd=0x%02x ATA stat/err 0x%02x/%02x to "
+              "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, cmd, drv_stat, drv_err,
                *sk, *asc, *ascq);
         return;
   }




On Sat, 25 Feb 2006, Mark Lord wrote:

> Justin Piszcz wrote:
>> Second patch fails for me.
> ..
>> Should I be using 2.6.16-rcX?
>
> Mmm... that's what I'm using (plus other patches),
> so, yes.. give that a try.  2.6.16 does seem to
> be shaping up to be a nice kernel.
>
> Cheers
>

* Re: LibPATA code issues / 2.6.15.4
  2006-02-25 18:28               ` Mark Lord
  2006-02-25 18:55                 ` Justin Piszcz
  2006-02-25 19:29                 ` Justin Piszcz
@ 2006-02-25 19:47                 ` David Greaves
  2006-02-26  2:27                   ` Mark Lord
  2 siblings, 1 reply; 147+ messages in thread
From: David Greaves @ 2006-02-25 19:47 UTC (permalink / raw)
  To: Mark Lord
  Cc: Justin Piszcz, Mark Lord, Jeff Garzik, linux-kernel,
	IDE/ATA development list

Mark Lord wrote:

> Justin Piszcz wrote:
>
>> Should I be using 2.6.16-rcX?
>
>
> Mmm... that's what I'm using (plus other patches),
> so, yes.. give that a try.  2.6.16 does seem to
> be shaping up to be a nice kernel.

OK, it failed for me too: I updated to 2.6.16-rc4 and it still failed
(despite -F), so I fixed it by hand (printk -> DPRINTK).

anyway:
Linux haze 2.6.16-rc4patched #1 PREEMPT Sat Feb 25 19:29:11 UTC 2006 i686 GNU/Linux

ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x04 { DriveStatusError }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
sd 1:0:0:0: SCSI error: return code = 0x8000002
sdb: Current: sense key: Medium Error
    Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sdb, sector 398283329
raid1: Disk failure on sdb2, disabling device.
        Operation continuing on 1 devices


and later...


device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
XFS mounting filesystem dm-0
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
ata1: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
ata1: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
ata1: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key: Medium Error
    Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sda, sector 390716735
raid5: Disk failure on sda1, disabling device. Operation continuing on 2 devices
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
sd 1:0:0:0: SCSI error: return code = 0x8000002
sdb: Current: sense key: Medium Error
    Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sdb, sector 390716735
raid5: Disk failure on sdb1, disabling device. Operation continuing on 1 devices
RAID5 conf printout:
 --- rd:3 wd:1 fd:2
 disk 0, o:1, dev:sdd1
 disk 1, o:0, dev:sdb1
 disk 2, o:0, dev:sda1
xfs_force_shutdown(dm-0,0x1) called from line 338 of file fs/xfs/xfs_rw.c.  Return address = 0xc020c0e9
Filesystem "dm-0": I/O Error Detected.  Shutting down filesystem: dm-0
Please umount the filesystem, and rectify the problem(s)
I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x640884a       ("xlog_bwrite") error 5 buf count 262144
XFS: failed to locate log tail
XFS: log mount/recovery failed: error 5
XFS: log mount failed
RAID5 conf printout:
 --- rd:3 wd:1 fd:2
 disk 0, o:1, dev:sdd1
 disk 1, o:0, dev:sdb1
RAID5 conf printout:
 --- rd:3 wd:1 fd:2
 disk 0, o:1, dev:sdd1
 disk 1, o:0, dev:sdb1
RAID5 conf printout:
 --- rd:3 wd:1 fd:2
 disk 0, o:1, dev:sdd1


So I guess my raid just blew up too... hope there's no corruption!

David

(PS Hi Mark, this is lbt from the Empeg BBS :) )

-- 


* Re: LibPATA code issues / 2.6.15.4
  2006-02-25 19:29                 ` Justin Piszcz
@ 2006-02-25 19:53                   ` David Greaves
  0 siblings, 0 replies; 147+ messages in thread
From: David Greaves @ 2006-02-25 19:53 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Mark Lord, Mark Lord, Jeff Garzik, linux-kernel,
	IDE/ATA development list

Justin Piszcz wrote:

> Which kernel did you run your patch against?
>
> With 2.6.16-rc4....
>
> First patch looks good..
>
Justin, I'll help you out off-list :)

David


* Re: LibPATA code issues / 2.6.15.4
  2006-02-25 19:47                 ` David Greaves
@ 2006-02-26  2:27                   ` Mark Lord
  2006-02-26  9:56                     ` David Greaves
  2006-02-26 12:27                     ` James Courtier-Dutton
  0 siblings, 2 replies; 147+ messages in thread
From: Mark Lord @ 2006-02-26  2:27 UTC (permalink / raw)
  To: David Greaves
  Cc: Justin Piszcz, Jeff Garzik, linux-kernel,
	IDE/ATA development list, albertcc, axboe, htejun

David Greaves wrote:
>
> Linux haze 2.6.16-rc4patched #1 PREEMPT Sat Feb 25 19:29:11 UTC 2006
> i686 GNU/Linux
> 
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: error=0x04 { DriveStatusError }
> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
> ata2: status=0x51 { DriveReady SeekComplete Error }
> sd 1:0:0:0: SCSI error: return code = 0x8000002
> sdb: Current: sense key: Medium Error
>     Additional sense: Unrecovered read error - auto reallocate failed
> end_request: I/O error, dev sdb, sector 398283329
> raid1: Disk failure on sdb2, disabling device.
>         Operation continuing on 1 devices

Oh good, *now* we've gotten somewhere!!

Albert / Jens / Jeff:

The command failing above is SCSI WRITE_10, which is being
translated into ATA_CMD_WRITE_FUA_EXT by libata.

This command fails -- unrecognized by the drive in question.
But libata reports it (most incorrectly) as a "medium error",
and the drive is taken out of service from its RAID.

Bad, bad, and worse.

Libata should really recover from this, by recognizing that
the command was rejected, and replacing it with a simple
WRITE_EXT instead.  Possibly followed by FLUSH_CACHE.

So.. I've forgotten who put FUA into libata, but hopefully
it's one of the folks on the CC: list, and that nice person
can now generate a patch to fix this bug somehow.

Cheers

* Re: LibPATA code issues / 2.6.15.4
  2006-02-26  2:27                   ` Mark Lord
@ 2006-02-26  9:56                     ` David Greaves
  2006-02-26 14:04                       ` Mark Lord
  2006-02-26 12:27                     ` James Courtier-Dutton
  1 sibling, 1 reply; 147+ messages in thread
From: David Greaves @ 2006-02-26  9:56 UTC (permalink / raw)
  To: Mark Lord
  Cc: Justin Piszcz, Jeff Garzik, linux-kernel,
	IDE/ATA development list, albertcc, axboe, htejun,
	Linus Torvalds

Mark Lord wrote:

>> sdb: Current: sense key: Medium Error
>>     Additional sense: Unrecovered read error - auto reallocate failed
>> end_request: I/O error, dev sdb, sector 398283329
>> raid1: Disk failure on sdb2, disabling device.
>>         Operation continuing on 1 devices
>
>
> Oh good, *now* we've gotten somewhere!!
>
> Albert / Jens / Jeff:
>
> The command failing above is SCSI WRITE_10, which is being
> translated into ATA_CMD_WRITE_FUA_EXT by libata.
>
> This command fails -- unrecognized by the drive in question.
> But libata reports it (most incorrectly) as a "medium error",
> and the drive is taken out of service from its RAID.
>
> Bad, bad, and worse.
>
> Libata should really recover from this, by recognizing that
> the command was rejected, and replacing it with a simple
> WRITE_EXT instead.  Possibly followed by FLUSH_CACHE.
>
> So.. I've forgotten who put FUA into libata, but hopefully
> it's one of the folks on the CC: list, and that nice person
> can now generate a patch to fix this bug somehow.

Thanks Mark

I'm glad it's a bug and not bad hardware.

I am quite concerned that the basic effect of just booting a practically
vanilla 2.6.16-rc4 like this was to fry my raid array.

Luckily it dropped 2 (of 3) disks so quickly that the event counter was
the same, allowing an easy rebuild.

2.6.15 has similar issues but they seem to happen *very* infrequently by
comparison - this hit me several times during a single boot.

Should Linus (cc'ed) hold off on 2.6.16 because of this or not?

David


* Re: LibPATA code issues / 2.6.15.4
  2006-02-26  2:27                   ` Mark Lord
  2006-02-26  9:56                     ` David Greaves
@ 2006-02-26 12:27                     ` James Courtier-Dutton
  2006-02-26 12:55                       ` David Greaves
  2006-02-26 13:56                       ` Mark Lord
  1 sibling, 2 replies; 147+ messages in thread
From: James Courtier-Dutton @ 2006-02-26 12:27 UTC (permalink / raw)
  To: Mark Lord
  Cc: David Greaves, Justin Piszcz, Jeff Garzik, linux-kernel,
	IDE/ATA development list, albertcc, axboe, htejun

Mark Lord wrote:
> David Greaves wrote:
>>
>> Linux haze 2.6.16-rc4patched #1 PREEMPT Sat Feb 25 19:29:11 UTC 2006
>> i686 GNU/Linux
>>
>> ata2: status=0x51 { DriveReady SeekComplete Error }
>> ata2: error=0x04 { DriveStatusError }
>> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
>> ata2: status=0x51 { DriveReady SeekComplete Error }
>> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
>> ata2: status=0x51 { DriveReady SeekComplete Error }
>> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
>> ata2: status=0x51 { DriveReady SeekComplete Error }
>> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
>> ata2: status=0x51 { DriveReady SeekComplete Error }
>> sd 1:0:0:0: SCSI error: return code = 0x8000002
>> sdb: Current: sense key: Medium Error
>>     Additional sense: Unrecovered read error - auto reallocate failed
>> end_request: I/O error, dev sdb, sector 398283329
>> raid1: Disk failure on sdb2, disabling device.
>>         Operation continuing on 1 devices
>
> Oh good, *now* we've gotten somewhere!!
>
> Albert / Jens / Jeff:
>
> The command failing above is SCSI WRITE_10, which is being
> translated into ATA_CMD_WRITE_FUA_EXT by libata.
>
> This command fails -- unrecognized by the drive in question.
> But libata reports it (most incorrectly) as a "medium error",
> and the drive is taken out of service from its RAID.
>
> Bad, bad, and worse.
>

I have what looks like similar problems. The issue I have is that I
don't think the problem is ONLY libata related.
I have two Linux PCs, one called "games" and the other called "localhost".
The problem happens quite quickly on the old "games" machine, but I can
run for days/weeks until I see the problem on the "localhost".
It might be happening on the "localhost" and I am just not noticing.
The difference is that reiserfs cannot recover when it sees this error,
and the "games" machine uses reiserfs, whereas the "localhost" only uses
ext3, which recovers gracefully from this problem.
Can I use libata on this old "games" machine? It is an old Pentium 3
machine.
In any case, the "games" machine is currently switched off until I can
find a kernel that works, so I will happily test different kernels and
patches, if people have suggestions.

I have two desktop Linux machines. One is an old Pentium 3 which shows
the following errors (no libata involved):
Linux version 2.6.15-rc4 (root@games) (gcc version 4.0.3 20051111 (prerelease) (Debian 4.0.2-4)) #1 Sat Dec 3 18:47:19 GMT 2005
Dec 16 22:51:57 games kernel: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=53058185, sector=53057951
Dec 16 22:52:32 games kernel: ide: failed opcode was: unknown
Dec 16 22:52:32 games kernel: end_request: I/O error, dev hdc, sector 53057951
Dec 16 22:52:32 games kernel: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=53058185, sector=53057959
Dec 16 22:52:32 games kernel: ide: failed opcode was: unknown

The other has the following errors:
Linux version 2.6.15.1 (root@localhost) (gcc version 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)) #3 SMP PREEMPT Fri Feb 3 23:19:05 GMT 2006
Feb 10 23:30:07 localhost kernel: ata3: command 0xb0 timeout, stat 0xd0 host_stat 0x0
Feb 10 23:30:07 localhost kernel: ata3: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Feb 10 23:30:07 localhost kernel: ata3: status=0xd0 { Busy }
Feb 10 23:30:07 localhost kernel: ATA: abnormal status 0xD0 on port 0xF880E087
Feb 10 23:30:07 localhost last message repeated 3 times
Feb 10 23:30:10 localhost kernel: ata3: PIO error
Feb 10 23:30:10 localhost kernel: ata3: status=0x50 { DriveReady SeekComplete }
Feb 11 10:18:10 localhost kernel: ata2: command 0xb0 timeout, stat 0xd0 host_stat 0x0
Feb 11 10:18:10 localhost kernel: ata2: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Feb 11 10:18:10 localhost kernel: ata2: status=0xd0 { Busy }
Feb 11 10:18:10 localhost kernel: ATA: abnormal status 0xD0 on port 0x177
Feb 11 10:18:10 localhost last message repeated 3 times



* Re: LibPATA code issues / 2.6.15.4
  2006-02-26 12:27                     ` James Courtier-Dutton
@ 2006-02-26 12:55                       ` David Greaves
  2006-02-26 13:56                       ` Mark Lord
  1 sibling, 0 replies; 147+ messages in thread
From: David Greaves @ 2006-02-26 12:55 UTC (permalink / raw)
  To: James Courtier-Dutton
  Cc: Mark Lord, Justin Piszcz, Jeff Garzik, linux-kernel,
	IDE/ATA development list, albertcc, axboe, htejun

James Courtier-Dutton wrote:

> I have two desktop linux machines. One is an old Pentium 3 which shows
> the following errors(no libata involved):
> Linux version 2.6.15-rc4 (root@games) (gcc version 4.0.3 20051111
> (prerelease) (Debian 4.0.2-4)
> ) #1 Sat Dec 3 18:47:19 GMT 2005
> Dec 16 22:51:57 games kernel: hdc: dma_intr: status=0x51 { DriveReady
> SeekComplete Error }
> Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x40 {
> UncorrectableError }, LBAsect=53058185, sector=53057951
> Dec 16 22:52:32 games kernel: ide: failed opcode was: unknown
> Dec 16 22:52:32 games kernel: end_request: I/O error, dev hdc, sector
> 53057951
> Dec 16 22:52:32 games kernel: hdc: dma_intr: status=0x51 { DriveReady
> SeekComplete Error }
> Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x10 {
> SectorIdNotFound }, LBAsect=53058185, sector=53057959
> Dec 16 22:52:32 games kernel: ide: failed opcode was: unknown

This looks like a simple bad disk drive. Notice that the sectors are
quite close.
If you like, you can move the drive to a working machine and run
badblocks on it.
Read 'man badblocks' before you start.
Is it SMART capable? What does
  smartctl -a /dev/hdc
show?

ddrescue may be your friend if you need to recover data.

Reply offlist if this is the case.

> The other has the following errors:
> Linux version 2.6.15.1 (root@localhost) (gcc version 3.4.5 (Gentoo
> 3.4.5, ssp-3.4.5-1.0, pi
> e-8.7.9)) #3 SMP PREEMPT Fri Feb 3 23:19:05 GMT 2006
> Feb 10 23:30:07 localhost kernel: ata3: command 0xb0 timeout, stat
> 0xd0 host_stat 0x0
> Feb 10 23:30:07 localhost kernel: ata3: translated ATA stat/err
> 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> Feb 10 23:30:07 localhost kernel: ata3: status=0xd0 { Busy }
> Feb 10 23:30:07 localhost kernel: ATA: abnormal status 0xD0 on port
> 0xF880E087
> Feb 10 23:30:07 localhost last message repeated 3 times
> Feb 10 23:30:10 localhost kernel: ata3: PIO error
> Feb 10 23:30:10 localhost kernel: ata3: status=0x50 { DriveReady
> SeekComplete }
> Feb 11 10:18:10 localhost kernel: ata2: command 0xb0 timeout, stat
> 0xd0 host_stat 0x0
> Feb 11 10:18:10 localhost kernel: ata2: translated ATA stat/err
> 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> Feb 11 10:18:10 localhost kernel: ata2: status=0xd0 { Busy }
> Feb 11 10:18:10 localhost kernel: ATA: abnormal status 0xD0 on port 0x177
> Feb 11 10:18:10 localhost last message repeated 3 times

Have you got smartd running?
I get a similar problem running some smartctl commands (-s on and -o on).
I suspect this is a libata ATA passthru problem - but I'm *guessing* :)

Check the last messages in dmesg, then run
smartctl -data -s on /dev/sd...
smartctl -data -o on /dev/sd...
See if there are new messages in dmesg

David

-- 


* Re: LibPATA code issues / 2.6.15.4
  2006-02-26 12:27                     ` James Courtier-Dutton
  2006-02-26 12:55                       ` David Greaves
@ 2006-02-26 13:56                       ` Mark Lord
  2006-02-26 14:30                         ` Kernel SeekCompleteErrors... Different from " James Courtier-Dutton
  1 sibling, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-02-26 13:56 UTC (permalink / raw)
  To: James Courtier-Dutton
  Cc: Mark Lord, David Greaves, Justin Piszcz, Jeff Garzik,
	linux-kernel, IDE/ATA development list, albertcc, axboe, htejun

James Courtier-Dutton wrote:
>
> I have what looks like similar problems. The issue I have is that I 

Nope.  Different issues.

> ) #1 Sat Dec 3 18:47:19 GMT 2005
> Dec 16 22:51:57 games kernel: hdc: dma_intr: status=0x51 { DriveReady 
> SeekComplete Error }
> Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x40 { 
> UncorrectableError }, LBAsect=53058185, sector=53057951

The disk really does have bad sectors in this case (above).

> The other has the following errors:
> Linux version 2.6.15.1 (root@localhost) (gcc version 3.4.5 (Gentoo 
> 3.4.5, ssp-3.4.5-1.0, pi
> e-8.7.9)) #3 SMP PREEMPT Fri Feb 3 23:19:05 GMT 2006
> Feb 10 23:30:07 localhost kernel: ata3: command 0xb0 timeout, stat 0xd0 
> host_stat 0x0
> Feb 10 23:30:07 localhost kernel: ata3: translated ATA stat/err 0xd0/00 
> to SCSI SK/ASC/ASCQ 0xb/47/00
> Feb 10 23:30:07 localhost kernel: ata3: status=0xd0 { Busy }
> Feb 10 23:30:07 localhost kernel: ATA: abnormal status 0xD0 on port 
> 0xF880E087
> Feb 10 23:30:07 localhost last message repeated 3 times
> Feb 10 23:30:10 localhost kernel: ata3: PIO error
> Feb 10 23:30:10 localhost kernel: ata3: status=0x50 { DriveReady 
> SeekComplete }
> Feb 11 10:18:10 localhost kernel: ata2: command 0xb0 timeout, stat 0xd0 
> host_stat 0x0
> Feb 11 10:18:10 localhost kernel: ata2: translated ATA stat/err 0xd0/00 
> to SCSI SK/ASC/ASCQ 0xb/47/00
> Feb 11 10:18:10 localhost kernel: ata2: status=0xd0 { Busy }
> Feb 11 10:18:10 localhost kernel: ATA: abnormal status 0xD0 on port 0x177
> Feb 11 10:18:10 localhost last message repeated 3 times

PIO errors?  Are you using Alan Cox's experimental PATA code for libata?

-ml

* Re: LibPATA code issues / 2.6.15.4
  2006-02-26  9:56                     ` David Greaves
@ 2006-02-26 14:04                       ` Mark Lord
  2006-02-27 21:34                         ` Mark Lord
  0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-02-26 14:04 UTC (permalink / raw)
  To: David Greaves
  Cc: Justin Piszcz, Jeff Garzik, linux-kernel,
	IDE/ATA development list, albertcc, axboe, htejun,
	Linus Torvalds

David Greaves wrote:
> Mark Lord wrote:
> 
>>> sdb: Current: sense key: Medium Error
>>>     Additional sense: Unrecovered read error - auto reallocate failed
>>> end_request: I/O error, dev sdb, sector 398283329
>>> raid1: Disk failure on sdb2, disabling device.
>>>         Operation continuing on 1 devices
..
>> The command failing above is SCSI WRITE_10, which is being
>> translated into ATA_CMD_WRITE_FUA_EXT by libata.
>>
>> This command fails -- unrecognized by the drive in question.
>> But libata reports it (most incorrectly) as a "medium error",
>> and the drive is taken out of service from its RAID.
>>
>> Bad, bad, and worse.
..
> Thanks Mark
> 
> I'm glad it's a bug and not bad hardware.
> 
> I am quite concerned that the basic effect of just booting a practically
> vanilla 2.6.16-rc4 like this was to fry my raid array.
> 
> Luckily it dropped 2 (of  3) disks so quickly that the event counter was
> the same allowing an easy rebuild.
> 
> 2.6.15 has similar issues but they seem to happen *very* infrequently by
> comparison - this hit me several times during a single boot.
> 
> Should Linus (cc'ed) hold off on 2.6.16 because of this or not?

Well, no doubt whatsoever about it being a "regression",
since the FUA code is *new* in 2.6.16 (not present in 2.6.15).

The FUA code should either get fixed, or removed from 2.6.16.

Cheers

* Kernel SeekCompleteErrors... Different from Re: LibPATA code issues / 2.6.15.4
  2006-02-26 13:56                       ` Mark Lord
@ 2006-02-26 14:30                         ` James Courtier-Dutton
  2006-02-26 17:03                           ` Mark Lord
  0 siblings, 1 reply; 147+ messages in thread
From: James Courtier-Dutton @ 2006-02-26 14:30 UTC (permalink / raw)
  To: Mark Lord
  Cc: Mark Lord, David Greaves, Justin Piszcz, Jeff Garzik,
	linux-kernel, IDE/ATA development list, albertcc, axboe, htejun

Mark Lord wrote:
> James Courtier-Dutton wrote:
>>
>> I have what looks like similar problems. The issue I have is that I 
>
> Nope.  Different issues.
I have changed the Subject line to reflect this, so that any future
responses can be told apart.

>
>> ) #1 Sat Dec 3 18:47:19 GMT 2005
>> Dec 16 22:51:57 games kernel: hdc: dma_intr: status=0x51 { DriveReady 
>> SeekComplete Error }
>> Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x40 { 
>> UncorrectableError }, LBAsect=53058185, sector=53057951
>
> The disk really does have bad sectors in this case (above).
The disk has NO bad sectors. It has been checked using two different tests:
1) SeaTools (Seagate's test tool) passed the deep test, which reads all
sectors.
2) dd of the entire HD image onto another HD.
No sector errors were encountered in either case.

>
>
>> The other has the following errors:
>> Linux version 2.6.15.1 (root@localhost) (gcc version 3.4.5 (Gentoo 
>> 3.4.5, ssp-3.4.5-1.0, pi
>> e-8.7.9)) #3 SMP PREEMPT Fri Feb 3 23:19:05 GMT 2006
>> Feb 10 23:30:07 localhost kernel: ata3: command 0xb0 timeout, stat 
>> 0xd0 host_stat 0x0
>> Feb 10 23:30:07 localhost kernel: ata3: translated ATA stat/err 
>> 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
>> Feb 10 23:30:07 localhost kernel: ata3: status=0xd0 { Busy }
>> Feb 10 23:30:07 localhost kernel: ATA: abnormal status 0xD0 on port 
>> 0xF880E087
>> Feb 10 23:30:07 localhost last message repeated 3 times
>> Feb 10 23:30:10 localhost kernel: ata3: PIO error
>> Feb 10 23:30:10 localhost kernel: ata3: status=0x50 { DriveReady 
>> SeekComplete }
>> Feb 11 10:18:10 localhost kernel: ata2: command 0xb0 timeout, stat 
>> 0xd0 host_stat 0x0
>> Feb 11 10:18:10 localhost kernel: ata2: translated ATA stat/err 
>> 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
>> Feb 11 10:18:10 localhost kernel: ata2: status=0xd0 { Busy }
>> Feb 11 10:18:10 localhost kernel: ATA: abnormal status 0xD0 on port 
>> 0x177
>> Feb 11 10:18:10 localhost last message repeated 3 times
>
> PIO errors?  Are you using Alan Cox's experimental PATA code for libata?
>
> -ml
>
No, this is Linux kernel 2.6.15.1 with no patches.

I cut and pasted the Linux version number to the top of each trace 
output in my original email.



* Re: Kernel SeekCompleteErrors... Different from Re: LibPATA code issues / 2.6.15.4
  2006-02-26 14:30                         ` Kernel SeekCompleteErrors... Different from " James Courtier-Dutton
@ 2006-02-26 17:03                           ` Mark Lord
  2006-02-26 17:13                             ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-02-26 17:03 UTC (permalink / raw)
  To: James Courtier-Dutton
  Cc: David Greaves, Justin Piszcz, Jeff Garzik, linux-kernel,
	IDE/ATA development list, albertcc, axboe, htejun

James Courtier-Dutton wrote:
> Mark Lord wrote:
>> James Courtier-Dutton wrote:
>>>
>>> I have what looks like similar problems. The issue I have is that I 
>>
>> Nope.  Different issues.
> I have changed the Subject line to indicate this so any future responses 
> can be indicated.
> 
>>
>>> ) #1 Sat Dec 3 18:47:19 GMT 2005
>>> Dec 16 22:51:57 games kernel: hdc: dma_intr: status=0x51 { DriveReady 
>>> SeekComplete Error }
>>> Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x40 { 
>>> UncorrectableError }, LBAsect=53058185, sector=53057951
>>
>> The disk really does have bad sectors in this case (above).
> The disk has NO bad sectors. It has been checked using two different tests.

The *only* test that matters is to enable S.M.A.R.T.,
and read out the error logs from it.

"smartctl" is the tool.

Cheers

* Re: Kernel SeekCompleteErrors... Different from Re: LibPATA code issues / 2.6.15.4
  2006-02-26 17:03                           ` Mark Lord
@ 2006-02-26 17:13                             ` Dr. David Alan Gilbert
  2006-02-26 17:43                               ` Alan Cox
  0 siblings, 1 reply; 147+ messages in thread
From: Dr. David Alan Gilbert @ 2006-02-26 17:13 UTC (permalink / raw)
  To: Mark Lord
  Cc: James Courtier-Dutton, David Greaves, Justin Piszcz, Jeff Garzik,
	linux-kernel, IDE/ATA development list, albertcc, axboe, htejun

* Mark Lord (liml@rtr.ca) wrote:
> >>James Courtier-Dutton wrote:
> >>>
> >>>I have what looks like similar problems. The issue I have is that I 
> >>
> >>Nope.  Different issues.
> >I have changed the Subject line to indicate this so any future responses 
> >can be indicated.
> >
> >>
> >>>) #1 Sat Dec 3 18:47:19 GMT 2005
> >>>Dec 16 22:51:57 games kernel: hdc: dma_intr: status=0x51 { DriveReady 
> >>>SeekComplete Error }
> >>>Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x40 { 
> >>>UncorrectableError }, LBAsect=53058185, sector=53057951
> >>
> >>The disk really does have bad sectors in this case (above).
> >The disk has NO bad sectors. It has been checked using two different tests.
> 
> The *only* test that matters is to enable S.M.A.R.T.,
> and read out the error logs from it.

I have seen a set of drives that reported UncorrectableErrors
and:
    * Shows the Uncorrectable error in the SMART log
    * Passes a full SMART test
    * Shows no remapped sectors
    * Passes the vendors drive test
    * Now fully passes a dd if=/dev/hdx of=/dev/null with no errors.

They were a set of 250GB SATA drives from the same vendor; I've taken
them out one at a time, as each did the same thing, and replaced them
with another vendor's drives.  They were all in use in a RAID-1 MD
configuration (under heavy load).

I do wonder about the 'uncorrectable error rate' that vendors report;
it doesn't seem very large - but I'll admit to not understanding its
units.  Are soft, non-repeatable uncorrectable errors expected in
principle? (Pointers to a good explanation of what this actually
means would be appreciated.)

I do wonder how often this happens to people who, when the read
succeeds again, just blame it on software.

Dave
--
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    | Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

* Re: Kernel SeekCompleteErrors... Different from Re: LibPATA code issues / 2.6.15.4
  2006-02-26 17:13                             ` Dr. David Alan Gilbert
@ 2006-02-26 17:43                               ` Alan Cox
  2006-02-26 20:36                                 ` Mark Lord
  0 siblings, 1 reply; 147+ messages in thread
From: Alan Cox @ 2006-02-26 17:43 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Mark Lord, James Courtier-Dutton, David Greaves, Justin Piszcz,
	Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc,
	axboe, htejun

On Sun, 2006-02-26 at 17:13 +0000, Dr. David Alan Gilbert wrote:
> > The *only* test that matters is to enable S.M.A.R.T.,
> > and read out the error logs from it.

SMART is unreliable in many cases.

> I have seen a set of drives that has reported UncorrectableErrors
> and that:
>     * Shows the Uncorrectable error in the SMART log
>     * Passes a full SMART test
>     * Shows no remapped sectors
>     * Passes the vendor's drive test
>     * Now fully passes a dd if=/dev/hdx of=/dev/null with no errors.

The very early SATA code didn't fully decode the errors from the drive, so
it could produce bogus reports. The current code decodes them and also
displays the ATA-level diagnostics, so it should be reliable.



* Re: Kernel SeekCompleteErrors... Different from Re: LibPATA code issues / 2.6.15.4
  2006-02-26 17:43                               ` Alan Cox
@ 2006-02-26 20:36                                 ` Mark Lord
  2006-02-27 11:48                                   ` Alan Cox
  0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-02-26 20:36 UTC (permalink / raw)
  To: Alan Cox
  Cc: Dr. David Alan Gilbert, James Courtier-Dutton, David Greaves,
	Justin Piszcz, Jeff Garzik, linux-kernel,
	IDE/ATA development list, albertcc, axboe, htejun

Alan Cox wrote:
>
> The very early SATA code didn't fully decode the errors from the drive, so
> it could produce bogus reports. The current code decodes them and also
> displays the ATA-level diagnostics, so it should be reliable.

It still is unreliable, as being discussed in another thread.

libata wrongly says "medium error" any time it issues a command
that the drive rejects (unsupported, invalid parameters, etc..).

This is biting a few people in 2.6.16-rc*, due to the FUA stuff.


* Re: Kernel SeekCompleteErrors... Different from Re: LibPATA code issues / 2.6.15.4
  2006-02-26 20:36                                 ` Mark Lord
@ 2006-02-27 11:48                                   ` Alan Cox
  2006-02-27 13:40                                     ` Mark Lord
  0 siblings, 1 reply; 147+ messages in thread
From: Alan Cox @ 2006-02-27 11:48 UTC (permalink / raw)
  To: Mark Lord
  Cc: Dr. David Alan Gilbert, James Courtier-Dutton, David Greaves,
	Justin Piszcz, Jeff Garzik, linux-kernel,
	IDE/ATA development list, albertcc, axboe, htejun

On Sul, 2006-02-26 at 15:36 -0500, Mark Lord wrote:
> It still is unreliable, as being discussed in another thread.
> 
> libata wrongly says "medium error" any time it issues a command
> that the drive rejects (unsupported, invalid parameters, etc..).

It seems to still get a single case wrong, but it does still report the
ATA state correctly.

> This is biting a few people in 2.6.16-rc*, due to the FUA stuff.

It is driven by a table in 

libata-scsi.c:ata_to_sense_error()

so if you can figure out the wrong entry and tweak the table, that would
be great.



* Re: Kernel SeekCompleteErrors... Different from Re: LibPATA code issues / 2.6.15.4
  2006-02-27 11:48                                   ` Alan Cox
@ 2006-02-27 13:40                                     ` Mark Lord
  0 siblings, 0 replies; 147+ messages in thread
From: Mark Lord @ 2006-02-27 13:40 UTC (permalink / raw)
  To: Alan Cox
  Cc: Dr. David Alan Gilbert, James Courtier-Dutton, David Greaves,
	Justin Piszcz, Jeff Garzik, linux-kernel,
	IDE/ATA development list, albertcc, axboe, htejun

Alan Cox wrote:
> On Sul, 2006-02-26 at 15:36 -0500, Mark Lord wrote:
>> It still is unreliable, as being discussed in another thread.
>>
>> libata wrongly says "medium error" any time it issues a command
>> that the drive rejects (unsupported, invalid parameters, etc..).
> 
> It seems to still get a single case wrong, but it does still report the
> ATA state correctly.
> 
>> This is biting a few people in 2.6.16-rc*, due to the FUA stuff.
> 
> It is driven by a table in 
> 
> libata-scsi.c:ata_to_sense_error()
> 
> so if you can figure out the wrong entry and tweak the table, that would be great.

It's the fall-through case, where the table is not used.


         /* No error?  Undecoded? */
         printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x status: 0x%02x\n",
                id, opcode, drv_stat);

         /* For our last chance pick, use medium read error because
          * it's much more common than an ATA drive telling you a write
          * has failed.
          */
         *sk = MEDIUM_ERROR;
         *asc = 0x11; /* "unrecovered read error" */
         *ascq = 0x04; /*  "auto-reallocation failed" */
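To illustrate the difference between the table path and the fall-through, here is a minimal table-driven sketch. The struct, table and function names are hypothetical and the entries are illustrative, not libata's real table. The ABRT entry reproduces the "translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00" mapping seen at the start of this thread; the fall-through is shown returning ABORTED COMMAND purely as one possible alternative to the medium-error guess above.

```c
#include <stddef.h>
#include <stdint.h>

/* ATA status and error register bits. */
#define ATA_ERR      0x01  /* status: an error occurred */
#define ATA_ABORTED  0x04  /* error: command aborted (e.g. unsupported) */
#define ATA_UNC      0x40  /* error: uncorrectable data error */

/* SCSI sense keys. */
#define MEDIUM_ERROR     0x03
#define ABORTED_COMMAND  0x0b

struct err_map {
	uint8_t err_mask;      /* bit to look for in the ATA error register */
	uint8_t sk, asc, ascq; /* sense data to report when it is set */
};

/* Illustrative entries only, not libata's real table.  UNC is checked
 * first, so a genuine media error wins if both bits are set. */
static const struct err_map err_table[] = {
	{ ATA_UNC,     MEDIUM_ERROR,    0x11, 0x04 },
	{ ATA_ABORTED, ABORTED_COMMAND, 0x00, 0x00 },
};

void ata_err_to_sense(uint8_t drv_stat, uint8_t drv_err,
		      uint8_t *sk, uint8_t *asc, uint8_t *ascq)
{
	size_t i;

	if (drv_stat & ATA_ERR) {
		for (i = 0; i < sizeof(err_table) / sizeof(err_table[0]); i++) {
			if (drv_err & err_table[i].err_mask) {
				*sk = err_table[i].sk;
				*asc = err_table[i].asc;
				*ascq = err_table[i].ascq;
				return;
			}
		}
	}

	/* Fall-through: nothing matched.  Hypothetical choice for this
	 * sketch: report a generic ABORTED COMMAND instead of a medium
	 * error, so an undecoded failure is not mistaken for a bad sector. */
	*sk = ABORTED_COMMAND;
	*asc = 0x00;
	*ascq = 0x00;
}
```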

Cheers


* Re: LibPATA code issues / 2.6.15.4
  2006-02-26 14:04                       ` Mark Lord
@ 2006-02-27 21:34                         ` Mark Lord
  2006-02-28  1:33                           ` Tejun Heo
  0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-02-27 21:34 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: David Greaves, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe, htejun,
	Linus Torvalds

Mark Lord wrote:
>> Mark Lord wrote:
>>
>>>> sdb: Current: sense key: Medium Error
>>>>     Additional sense: Unrecovered read error - auto reallocate failed
>>>> end_request: I/O error, dev sdb, sector 398283329
>>>> raid1: Disk failure on sdb2, disabling device.
>>>>         Operation continuing on 1 devices
> ..
>>> The command failing above is SCSI WRITE_10, which is being
>>> translated into ATA_CMD_WRITE_FUA_EXT by libata.
>>>
>>> This command fails -- unrecognized by the drive in question.
>>> But libata reports it (most incorrectly) as a "medium error",
>>> and the drive is taken out of service from its RAID.
>>>
>>> Bad, bad, and worse.

.. hold off on 2.6.16 because of this or not?

> 
> Well, no doubt whatsoever about it being a "regression",
> since the FUA code is *new* in 2.6.16 (not present in 2.6.15).
> 
> The FUA code should either get fixed, or removed from 2.6.16.

Actually, now that I've done a little more digging, this FUA stuff
is inherently dangerous as implemented.  At least a few SATA controllers
have pipelines and whatnot that rely upon recognizing the (S)ATA
opcodes being used.  And I sincerely doubt that any of those will
recognize the very newish (and aptly named..) FUA opcodes.

These may be unsafe in general, unless we tag controllers as
FUA-capable and NON-FUA-capable, in addition to tagging the drives.

:/
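For reference, the translation being discussed boils down to an opcode choice like the one below (a minimal sketch; the function name and the combined `fua_usable` flag are hypothetical). The opcodes match what appears later in this thread, where the kernel logs "op=0x2a cmd=0x3d": SCSI WRITE(10) turned into ATA WRITE DMA FUA EXT.

```c
#include <stdbool.h>
#include <stdint.h>

#define WRITE_10               0x2a /* SCSI WRITE(10) opcode */
#define WRITE10_FUA_BIT        0x08 /* FUA flag, bit 3 of CDB byte 1 */
#define ATA_CMD_WRITE_EXT      0x35 /* WRITE DMA EXT */
#define ATA_CMD_WRITE_FUA_EXT  0x3d /* WRITE DMA FUA EXT */

/* Choose the ATA opcode for a SCSI WRITE(10) CDB.  `fua_usable` is a
 * hypothetical flag meaning "both drive and controller are known to
 * handle the FUA opcode".  Returns 0 for any other SCSI command. */
uint8_t pick_write_opcode(const uint8_t *cdb, bool fua_usable)
{
	if (cdb[0] != WRITE_10)
		return 0;
	if ((cdb[1] & WRITE10_FUA_BIT) && fua_usable)
		return ATA_CMD_WRITE_FUA_EXT;
	/* Plain write.  If FUA was requested but is not usable, the
	 * data is only durable after a later FLUSH CACHE EXT, so the
	 * upper layers must not be told that FUA is supported. */
	return ATA_CMD_WRITE_EXT;
}
```

A controller that snoops opcodes and only knows 0x35 would see 0x3d as unknown, which is exactly the hazard described above.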


* Re: LibPATA code issues / 2.6.15.4
  2006-02-27 21:34                         ` Mark Lord
@ 2006-02-28  1:33                           ` Tejun Heo
  2006-02-28  1:46                             ` Linus Torvalds
  2006-02-28  4:16                             ` Mark Lord
  0 siblings, 2 replies; 147+ messages in thread
From: Tejun Heo @ 2006-02-28  1:33 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jeff Garzik, David Greaves, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

Hello, Mark.

Mark Lord wrote:
> 
> .. hold off on 2.6.16 because of this or not?
> 

It certainly is dangerous. I guess we should turn off FUA for the time
being. Barrier auto-fallback was once implemented, but it didn't seem
like a good idea as it was too complex and hid low-level bugs from the
higher level. The consensus seems to be to develop a blacklist of drives
which lie about FUA support (currently only one drive). The official kernel
doesn't seem to be the correct place to grow the blacklist; maybe we
should do it in -mm?

>>
>> Well, no doubt whatsoever about it being a "regression",
>> since the FUA code is *new* in 2.6.16 (not present in 2.6.15).
>>
>> The FUA code should either get fixed, or removed from 2.6.16.
> 
> 
> Actually, now that I've done a little more digging, this FUA stuff
> is inherently dangerous as implemented.  At least a few SATA controllers
> have pipelines and whatnot that rely upon recognizing the (S)ATA
> opcodes being used.  And I sincerely doubt that any of those will
> recognize the very newish (and aptly named..) FUA opcodes.
> 
> These may be unsafe in general, unless we tag controllers as
> FUA-capable and NON-FUA-capable, in addition to tagging the drives.

All sii controllers and piix/ahci seem to handle FUA pretty well. And
yeah, we may have to create a controller blacklist too.

BTW, can you let me know what drive we're talking about now (model name 
and firmware revision)?

-- 
tejun


* Re: LibPATA code issues / 2.6.15.4
  2006-02-28  1:33                           ` Tejun Heo
@ 2006-02-28  1:46                             ` Linus Torvalds
  2006-02-28  2:07                               ` Jeff Garzik
  2006-02-28  8:03                               ` Jens Axboe
  2006-02-28  4:16                             ` Mark Lord
  1 sibling, 2 replies; 147+ messages in thread
From: Linus Torvalds @ 2006-02-28  1:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Mark Lord, Jeff Garzik, David Greaves, Justin Piszcz,
	linux-kernel, IDE/ATA development list, albertcc, axboe



On Tue, 28 Feb 2006, Tejun Heo wrote:

> Hello, Mark.
> 
> Mark Lord wrote:
> > 
> > .. hold off on 2.6.16 because of this or not?
> > 
> 
> It certainly is dangerous. I guess we should turn off FUA for the time being.
> Barrier auto-fallback was once implemented, but it didn't seem like a good idea
> as it was too complex and hid low-level bugs from the higher level. The consensus
> seems to be to develop a blacklist of drives which lie about FUA support
> (currently only one drive). The official kernel doesn't seem to be the correct
> place to grow the blacklist; maybe we should do it in -mm?

For 2.6.16, the only sane solution for now is to just turn it off.

Somebody want to send me a patch that does that, along with an ack from 
Mark (and whoever else sees this) that it fixes his/their problems?

		Linus


* Re: LibPATA code issues / 2.6.15.4
  2006-02-28  1:46                             ` Linus Torvalds
@ 2006-02-28  2:07                               ` Jeff Garzik
  2006-02-28  2:14                                 ` Linus Torvalds
  2006-02-28 10:30                                 ` Alan Cox
  2006-02-28  8:03                               ` Jens Axboe
  1 sibling, 2 replies; 147+ messages in thread
From: Jeff Garzik @ 2006-02-28  2:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tejun Heo, Mark Lord, David Greaves, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe

[-- Attachment #1: Type: text/plain, Size: 312 bytes --]

Linus Torvalds wrote:
> For 2.6.16, the only sane solution for now is to just turn it off.
> 
> Somebody want to send me a patch that does that, along with an ack from 
> Mark (and whoever else sees this) that it fixes his/their problems?

I've had this waiting in the wings, in fact...  [see attached]

	Jeff



[-- Attachment #2: libata.txt --]
[-- Type: text/plain, Size: 1644 bytes --]

Please pull from 'upstream-fixes' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git

to receive the following updates:

 drivers/scsi/libata-core.c |    4 ++++
 drivers/scsi/libata-scsi.c |    2 ++
 drivers/scsi/libata.h      |    1 +
 3 files changed, 7 insertions(+)

Jeff Garzik:
      [libata] Disable FUA by default

diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index 5f1d758..ab3c9a4 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -82,6 +82,10 @@ int atapi_enabled = 0;
 module_param(atapi_enabled, int, 0444);
 MODULE_PARM_DESC(atapi_enabled, "Enable discovery of ATAPI devices (0=off, 1=on)");
 
+int fua = 0;
+module_param(fua, int, 0444);
+MODULE_PARM_DESC(fua, "FUA support (0=off, 1=on)");
+
 MODULE_AUTHOR("Jeff Garzik");
 MODULE_DESCRIPTION("Library module for ATA devices");
 MODULE_LICENSE("GPL");
diff --git a/drivers/scsi/libata-scsi.c b/drivers/scsi/libata-scsi.c
index 07b1e7c..5ce33ae 100644
--- a/drivers/scsi/libata-scsi.c
+++ b/drivers/scsi/libata-scsi.c
@@ -1708,6 +1708,8 @@ static int ata_dev_supports_fua(u16 *id)
 {
 	unsigned char model[41], fw[9];
 
+	if (!fua)
+		return 0;
 	if (!ata_id_has_fua(id))
 		return 0;
 
diff --git a/drivers/scsi/libata.h b/drivers/scsi/libata.h
index e03ce48..abfd18f 100644
--- a/drivers/scsi/libata.h
+++ b/drivers/scsi/libata.h
@@ -41,6 +41,7 @@ struct ata_scsi_args {
 
 /* libata-core.c */
 extern int atapi_enabled;
+extern int fua;
 extern struct ata_queued_cmd *ata_qc_new_init(struct ata_port *ap,
 				      struct ata_device *dev);
 extern int ata_rwcmd_protocol(struct ata_queued_cmd *qc);


* Re: LibPATA code issues / 2.6.15.4
  2006-02-28  2:07                               ` Jeff Garzik
@ 2006-02-28  2:14                                 ` Linus Torvalds
  2006-02-28  2:52                                   ` Jeff Garzik
  2006-02-28  3:36                                   ` Jeff Garzik
  2006-02-28 10:30                                 ` Alan Cox
  1 sibling, 2 replies; 147+ messages in thread
From: Linus Torvalds @ 2006-02-28  2:14 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Tejun Heo, Mark Lord, David Greaves, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe



On Mon, 27 Feb 2006, Jeff Garzik wrote:
> 
> I've had this waiting in the wings, in fact...  [see attached]

I really hate having a _global_ variable called "fua". That's just bad 
taste. I would suggest calling it "atapi_forced_unit_attention_enabled", 
but maybe that is going a bit overboard. It's definitely better than just 
"fua", though.

			Linus


* Re: LibPATA code issues / 2.6.15.4
  2006-02-28  2:14                                 ` Linus Torvalds
@ 2006-02-28  2:52                                   ` Jeff Garzik
  2006-02-28  3:36                                   ` Jeff Garzik
  1 sibling, 0 replies; 147+ messages in thread
From: Jeff Garzik @ 2006-02-28  2:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tejun Heo, Mark Lord, David Greaves, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe

Linus Torvalds wrote:
> 
> On Mon, 27 Feb 2006, Jeff Garzik wrote:
> 
>>I've had this waiting in the wings, in fact...  [see attached]
> 
> 
> I really hate having a _global_ variable called "fua". That's just bad 
> taste. I would suggest calling it "atapi_forced_unit_attention_enabled", 
> but maybe that is going a bit overboard. It's definitely better than just 
> "fua", though.

<shrug>  It will go away when things are fixed, and only users who are 
testing will even bother with it.

Looking over the module subsystem, it looks like one could use 
module_param_named() to achieve proper namespace separation (C versus 
module opt) -- then you could call it libata_fua -- but for a temporary 
module option it seems like more trouble than it's worth.

	Jeff





* Re: LibPATA code issues / 2.6.15.4
  2006-02-28  2:14                                 ` Linus Torvalds
  2006-02-28  2:52                                   ` Jeff Garzik
@ 2006-02-28  3:36                                   ` Jeff Garzik
  2006-02-28  4:11                                     ` Mark Lord
  1 sibling, 1 reply; 147+ messages in thread
From: Jeff Garzik @ 2006-02-28  3:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tejun Heo, Mark Lord, David Greaves, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe

[-- Attachment #1: Type: text/plain, Size: 436 bytes --]

Linus Torvalds wrote:
> 
> On Mon, 27 Feb 2006, Jeff Garzik wrote:
> 
>>I've had this waiting in the wings, in fact...  [see attached]
> 
> 
> I really hate having a _global_ variable called "fua". That's just bad 
> taste. I would suggest calling it "atapi_forced_unit_attention_enabled", 
> but maybe that is going a bit overboard. It's definitely better than just 
> "fua", though.

Here's the cleaner namespace version...

	Jeff




[-- Attachment #2: libata.txt --]
[-- Type: text/plain, Size: 1672 bytes --]

Please pull from 'upstream-fixes' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git

to receive the following updates:

 drivers/scsi/libata-core.c |    4 ++++
 drivers/scsi/libata-scsi.c |    2 ++
 drivers/scsi/libata.h      |    1 +
 3 files changed, 7 insertions(+)

Jeff Garzik:
      [libata] Disable FUA

diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index 5f1d758..4f91b0d 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -82,6 +82,10 @@ int atapi_enabled = 0;
 module_param(atapi_enabled, int, 0444);
 MODULE_PARM_DESC(atapi_enabled, "Enable discovery of ATAPI devices (0=off, 1=on)");
 
+int libata_fua = 0;
+module_param_named(fua, libata_fua, int, 0444);
+MODULE_PARM_DESC(fua, "FUA support (0=off, 1=on)");
+
 MODULE_AUTHOR("Jeff Garzik");
 MODULE_DESCRIPTION("Library module for ATA devices");
 MODULE_LICENSE("GPL");
diff --git a/drivers/scsi/libata-scsi.c b/drivers/scsi/libata-scsi.c
index 07b1e7c..59503c9 100644
--- a/drivers/scsi/libata-scsi.c
+++ b/drivers/scsi/libata-scsi.c
@@ -1708,6 +1708,8 @@ static int ata_dev_supports_fua(u16 *id)
 {
 	unsigned char model[41], fw[9];
 
+	if (!libata_fua)
+		return 0;
 	if (!ata_id_has_fua(id))
 		return 0;
 
diff --git a/drivers/scsi/libata.h b/drivers/scsi/libata.h
index e03ce48..fddaf47 100644
--- a/drivers/scsi/libata.h
+++ b/drivers/scsi/libata.h
@@ -41,6 +41,7 @@ struct ata_scsi_args {
 
 /* libata-core.c */
 extern int atapi_enabled;
+extern int libata_fua;
 extern struct ata_queued_cmd *ata_qc_new_init(struct ata_port *ap,
 				      struct ata_device *dev);
 extern int ata_rwcmd_protocol(struct ata_queued_cmd *qc);


* Re: LibPATA code issues / 2.6.15.4
  2006-02-28  3:36                                   ` Jeff Garzik
@ 2006-02-28  4:11                                     ` Mark Lord
  0 siblings, 0 replies; 147+ messages in thread
From: Mark Lord @ 2006-02-28  4:11 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Linus Torvalds, Tejun Heo, David Greaves, Justin Piszcz,
	linux-kernel, IDE/ATA development list, albertcc, axboe

Jeff Garzik wrote:
> Linus Torvalds wrote:
..
>> I really hate having a _global_ variable called "fua". That's just bad 
>> taste. I would suggest calling it "atapi_forced_unit_attention_enabled"

Heh heh..
It's actually short for "Force Unit Access",
though oddly enough I don't think the patch
mentions that in the MODULE_PARM_DESC().

> Here's the cleaner namespace version...

David, do you want to ack this one for us?

Cheers



* Re: LibPATA code issues / 2.6.15.4
  2006-02-28  1:33                           ` Tejun Heo
  2006-02-28  1:46                             ` Linus Torvalds
@ 2006-02-28  4:16                             ` Mark Lord
  2006-02-28 10:32                               ` Alan Cox
  2006-02-28 10:39                               ` David Greaves
  1 sibling, 2 replies; 147+ messages in thread
From: Mark Lord @ 2006-02-28  4:16 UTC (permalink / raw)
  To: Tejun Heo, David Greaves
  Cc: Mark Lord, Jeff Garzik, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

Tejun Heo wrote:
..
>> These may be unsafe in general, unless we tag controllers as
>> FUA-capable and NON-FUA-capable, in addition to tagging the drives.
> 
> All sii controllers and piix/ahci seem to handle FUA pretty well. And
> yeah, we may have to create a controller blacklist too.

Or maybe a whitelist instead, since nearly all existing hardware
pre-dates FUA commands.

Or maybe just have a libata function to test whether the FUA commands
actually work or not, before enabling them for general use.
*That* could be a much better approach, given the large number of
possible drive/controller combos, and it cuts down on the maintenance
headache of having to list everything on a list somewhere.
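A minimal sketch of such a probe, under loudly stated assumptions: the ops structure and `probe_fua()` are hypothetical, and a real libata version would issue READ DMA EXT and WRITE DMA FUA EXT through its own command path with careful error handling. The idea is to read one sector and write the identical data back with FUA, so a successful probe leaves the medium unchanged and a failed one only disables FUA.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Hypothetical command hooks for this sketch.  Each returns 0 on
 * success, or a negative value if the command failed or was aborted
 * by the drive or the controller. */
struct fua_probe_ops {
	int (*read_sector)(void *ctx, uint64_t lba, uint8_t *buf);
	int (*write_sector_fua)(void *ctx, uint64_t lba, const uint8_t *buf);
};

/* Read one sector and write the identical data back using FUA.  Any
 * failure means "do not use FUA on this drive/controller pair", never
 * "this disk is bad". */
bool probe_fua(const struct fua_probe_ops *ops, void *ctx, uint64_t lba)
{
	uint8_t buf[SECTOR_SIZE];

	if (ops->read_sector(ctx, lba, buf) != 0)
		return false;  /* can't even read the test sector: stay safe */
	return ops->write_sector_fua(ctx, lba, buf) == 0;
}
```

Note that a passing probe only shows the opcode is accepted end to end; it cannot prove the drive actually honours FUA semantics instead of just caching the write.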

> BTW, can you let me know what drive we're talking about now (model name 
> and firmware revision)?

David:  we need to see the output from "hdparm --Istdout /dev/sda"
(or whichever drive it was that was failing on your system).

Cheers


* Re: LibPATA code issues / 2.6.15.4
  2006-02-28  1:46                             ` Linus Torvalds
  2006-02-28  2:07                               ` Jeff Garzik
@ 2006-02-28  8:03                               ` Jens Axboe
  1 sibling, 0 replies; 147+ messages in thread
From: Jens Axboe @ 2006-02-28  8:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tejun Heo, Mark Lord, Jeff Garzik, David Greaves, Justin Piszcz,
	linux-kernel, IDE/ATA development list, albertcc

On Mon, Feb 27 2006, Linus Torvalds wrote:
> 
> 
> On Tue, 28 Feb 2006, Tejun Heo wrote:
> 
> > Hello, Mark.
> > 
> > Mark Lord wrote:
> > > 
> > > .. hold off on 2.6.16 because of this or not?
> > > 
> > 
> > It certainly is dangerous. I guess we should turn off FUA for the
> > time being.  Barrier auto-fallback was once implemented, but it
> > didn't seem like a good idea as it was too complex and hid low-level
> > bugs from the higher level. The consensus seems to be to develop a
> > blacklist of drives which lie about FUA support (currently only one
> > drive). The official kernel doesn't seem to be the correct place to grow
> > the blacklist; maybe we should do it in -mm?
> 
> For 2.6.16, the only sane solution for now is to just turn it off.
> 
> Somebody want to send me a patch that does that, along with an ack from 
> Mark (and whoever else sees this) that it fixes his/their problems?

That's the best solution right now. I guess there's no way around a
blacklist for FUA support and we need time to grow that :-(
And proper fallback to non-FUA writes with disabling FUA based barriers
as well.
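A minimal sketch of how a blacklist could sit behind the new module option. The first two checks follow the order in the patch's `ata_dev_supports_fua()` (option first, then the drive's own claim); the table, its placeholder entry and `fua_allowed()` are hypothetical and do not name any real problem drive.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One blacklist entry: model and firmware strings as decoded from
 * IDENTIFY DEVICE.  A NULL firmware matches every revision. */
struct fua_blacklist_entry {
	const char *model;
	const char *fw;
};

/* Placeholder entry for illustration; not a known-bad drive. */
static const struct fua_blacklist_entry fua_blacklist[] = {
	{ "EXAMPLE-MODEL 123", NULL },
};

bool fua_allowed(int libata_fua, bool id_has_fua,
		 const char *model, const char *fw)
{
	size_t i;

	if (!libata_fua)
		return false;  /* "fua=0", the new default */
	if (!id_has_fua)
		return false;  /* drive does not claim FUA support */
	for (i = 0; i < sizeof(fua_blacklist) / sizeof(fua_blacklist[0]); i++) {
		const struct fua_blacklist_entry *e = &fua_blacklist[i];

		if (strcmp(model, e->model) == 0 &&
		    (e->fw == NULL || strcmp(fw, e->fw) == 0))
			return false;  /* drive is known to lie about FUA */
	}
	return true;
}
```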

Mark, what drive model+firmware are you using?

-- 
Jens Axboe



* Re: LibPATA code issues / 2.6.15.4
  2006-02-28 10:32                               ` Alan Cox
@ 2006-02-28 10:30                                 ` Justin Piszcz
  0 siblings, 0 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-02-28 10:30 UTC (permalink / raw)
  To: Alan Cox
  Cc: Mark Lord, Tejun Heo, David Greaves, Mark Lord, Jeff Garzik,
	linux-kernel, IDE/ATA development list, albertcc, axboe,
	Linus Torvalds



On Tue, 28 Feb 2006, Alan Cox wrote:

> On Llu, 2006-02-27 at 23:16 -0500, Mark Lord wrote:
>> Or maybe a whitelist instead, since nearly all existing hardware
>> pre-dates FUA commands.
>
> For controllers just add it as a host flag and it can be handled the
> same way as LBA48 is right now. It may also be that some hosts can issue FUA
> with a bit of bandaging (state machine resets/pio etc)
>
> Alan
>

While I have not yet been able to reproduce the problem with the verbose 
patch, here is the hdparm -I output:

/dev/sdc:

ATA device, with non-removable media
         Model Number:       WDC WD4000KD-00NAB0
         Serial Number:      WD-WMAMY1020930
         Firmware Revision:  01.06A01
Standards:
         Supported: 7 6 5 4
         Likely used: 7
Configuration:
         Logical         max     current
         cylinders       16383   16383
         heads           16      16
         sectors/track   63      63
         --
         CHS current addressable sectors:   16514064
         LBA    user addressable sectors:  268435455
         LBA48  user addressable sectors:  781422768
         device size with M = 1024*1024:      381554 MBytes
         device size with M = 1000*1000:      400088 MBytes (400 GB)
Capabilities:
         LBA, IORDY(can be disabled)
         Queue depth: 32
         Standby timer values: spec'd by Standard, with device specific minimum
         R/W multiple sector transfer: Max = 16  Current = 0
         Recommended acoustic management value: 128, current value: 254
         DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 udma6
              Cycle time: min=120ns recommended=120ns
         PIO: pio0 pio1 pio2 pio3 pio4
              Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
         Enabled Supported:
            *    NOP cmd
            *    READ BUFFER cmd
            *    WRITE BUFFER cmd
            *    Host Protected Area feature set
            *    Look-ahead
            *    Write cache
            *    Power Management feature set
                 Security Mode feature set
            *    SMART feature set
            *    FLUSH CACHE EXT command
            *    Mandatory FLUSH CACHE command
            *    Device Configuration Overlay feature set
            *    48-bit Address feature set
                 Automatic Acoustic Management feature set
                 SET MAX security extension
            *    DOWNLOAD MICROCODE cmd
            *    General Purpose Logging feature set
            *    SMART self-test
            *    SMART error logging
Security:
                 supported
         not     enabled
         not     locked
         not     frozen
         not     expired: security count
         not     supported: enhanced erase
Checksum: correct



* Re: LibPATA code issues / 2.6.15.4
  2006-02-28  2:07                               ` Jeff Garzik
  2006-02-28  2:14                                 ` Linus Torvalds
@ 2006-02-28 10:30                                 ` Alan Cox
  1 sibling, 0 replies; 147+ messages in thread
From: Alan Cox @ 2006-02-28 10:30 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Linus Torvalds, Tejun Heo, Mark Lord, David Greaves,
	Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc,
	axboe

On Llu, 2006-02-27 at 21:07 -0500, Jeff Garzik wrote:
> led, "Enable discovery of ATAPI devices (0=off, 1=on)");
>  
> +int fua = 0;
> +module_param(fua, int, 0444);
> +MODULE_PARM_DESC(fua, "FUA support (0=off, 1=on)");
> +

Not a good name for a global.




* Re: LibPATA code issues / 2.6.15.4
  2006-02-28  4:16                             ` Mark Lord
@ 2006-02-28 10:32                               ` Alan Cox
  2006-02-28 10:30                                 ` Justin Piszcz
  2006-02-28 10:39                               ` David Greaves
  1 sibling, 1 reply; 147+ messages in thread
From: Alan Cox @ 2006-02-28 10:32 UTC (permalink / raw)
  To: Mark Lord
  Cc: Tejun Heo, David Greaves, Mark Lord, Jeff Garzik, Justin Piszcz,
	linux-kernel, IDE/ATA development list, albertcc, axboe,
	Linus Torvalds

On Llu, 2006-02-27 at 23:16 -0500, Mark Lord wrote:
> Or maybe a whitelist instead, since nearly all existing hardware
> pre-dates FUA commands.

For controllers just add it as a host flag and it can be handled the
same way as LBA48 is right now. It may also be that some hosts can issue FUA
with a bit of bandaging (state machine resets/pio etc)
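A minimal sketch of the host-flag idea: FUA is used only when both the controller and the drive advertise it, in the spirit of the LBA48 comparison. The flag names and structures here are hypothetical and are not libata's actual definitions.

```c
#include <stdbool.h>

/* Hypothetical capability bits for this sketch, not libata's flags. */
#define HOST_CAN_FUA  (1u << 0)  /* controller passes FUA opcodes through */
#define DEV_HAS_FUA   (1u << 0)  /* drive claims (and is trusted for) FUA */

struct host_caps { unsigned int flags; };

struct dev_caps {
	unsigned int flags;
	const struct host_caps *host;  /* controller the drive hangs off */
};

/* FUA opcodes are only issued when both ends agree: a FUA-capable
 * drive behind a controller that does not know the opcodes is treated
 * exactly like a drive without FUA. */
bool dev_use_fua(const struct dev_caps *dev)
{
	return (dev->flags & DEV_HAS_FUA) && (dev->host->flags & HOST_CAN_FUA);
}
```

Controllers needing the "bandaging" Alan mentions could simply leave the flag clear until that work is done.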

Alan



* Re: LibPATA code issues / 2.6.15.4
  2006-02-28  4:16                             ` Mark Lord
  2006-02-28 10:32                               ` Alan Cox
@ 2006-02-28 10:39                               ` David Greaves
  2006-02-28 14:37                                 ` Mark Lord
                                                   ` (2 more replies)
  1 sibling, 3 replies; 147+ messages in thread
From: David Greaves @ 2006-02-28 10:39 UTC (permalink / raw)
  To: Mark Lord
  Cc: Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

Mark Lord wrote:

> Tejun Heo wrote:
>
>> BTW, can you let me know what drive we're talking about now (model
>> name and firmware revision)?
>
>
> David:  we need to see the output from "hdparm --Istdout /dev/sda"
> (or whichever drive it was that was failing on your system).
>
> Cheers
>
So here's the info for sda and sdb (see below for related log data).

/dev/sda:
 IO_support   =  0 (default 16-bit)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 24321/255/63, sectors = 390721968, start = 0
0040 3fff c837 0010 0000 0000 003f 0000
0000 0000 4234 3033 3852 5248 2020 2020
2020 2020 2020 2020 0003 4000 0004 4241
4e43 3139 3830 4d61 7874 6f72 2036 4232
3030 4d30 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 8010
0000 2f00 4000 0200 0000 0007 3fff 0010
003f fc10 00fb 0100 ffff 0fff 0000 0007
0003 0078 0078 0078 0078 0000 0000 0000
0000 0000 0000 0000 0002 0000 0000 0000
00fe 001e 7869 7d09 4043 7869 3c01 4043
203f 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 f1b0 1749 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0113 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 d3a5

/dev/sdb:
 IO_support   =  0 (default 16-bit)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 24792/255/63, sectors = 398297088, start = 0
0040 3fff c837 0010 0000 0000 003f 0000
0000 0000 4234 3152 5641 3148 2020 2020
2020 2020 2020 2020 0003 4000 0004 4241
4e43 3142 5930 4d61 7874 6f72 2036 4232
3030 4d30 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 8010
0000 2f00 4000 0200 0000 0007 3fff 0010
003f fc10 00fb 0100 ffff 0fff 0000 0007
0003 0078 0078 0078 0078 0000 0000 0000
0000 0000 0000 001f 0102 0000 0000 0000
00fe 001e 7c6b 7f09 4063 7c69 3e01 4063
207f 0000 0000 0000 fffe 0000 c0fe 0000
0000 0000 0000 0000 8800 17bd 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0001 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0113 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 d8a5
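The two hex dumps above are the raw 256-word IDENTIFY DEVICE data, eight 16-bit words per row. A minimal sketch of decoding the text fields follows (the function name is hypothetical; the word offsets are from the ATA specification: serial number in words 10-19, firmware revision in 23-26, model in 27-46, two characters per word with the first in the high byte). Applied to sda's dump it gives model "Maxtor 6B200M0" and firmware "BANC1980"; the same decoding of sdb's dump gives firmware "BANC1BY0".

```c
#include <stddef.h>
#include <stdint.h>

/* Word offsets of the text fields in IDENTIFY DEVICE data. */
#define ID_SERIAL_OFS  10  /* serial number, 10 words */
#define ID_FW_OFS      23  /* firmware revision, 4 words */
#define ID_MODEL_OFS   27  /* model number, 20 words */

/* Decode `nwords` words starting at `ofs` into `out` (which needs
 * 2 * nwords + 1 bytes).  Each word carries two ASCII characters with
 * the first one in the high byte, and fields are space-padded. */
void ata_id_text(const uint16_t *id, size_t ofs, size_t nwords, char *out)
{
	size_t i, len = 0;

	for (i = 0; i < nwords; i++) {
		out[len++] = (char)(id[ofs + i] >> 8);
		out[len++] = (char)(id[ofs + i] & 0xff);
	}
	while (len > 0 && out[len - 1] == ' ')
		len--;  /* strip trailing padding */
	out[len] = '\0';
}
```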

The info below is from the log I saved after booting 2.6.16-rc4.
I got these errors:

sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key: Medium Error
    Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sda, sector 390716735
raid5: Disk failure on sda1, disabling device. Operation continuing on 2
devices
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
sd 1:0:0:0: SCSI error: return code = 0x8000002
sdb: Current: sense key: Medium Error
    Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sdb, sector 390716735
raid5: Disk failure on sdb1, disabling device. Operation continuing on 1
devices
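The "return code = 0x8000002" in these errors is the SCSI midlayer's 32-bit result word. A minimal sketch of splitting it into its byte-wide fields follows (the struct and function names are hypothetical). For 0x08000002 the driver byte is 0x08 (DRIVER_SENSE) and the status byte is 0x02 (CHECK CONDITION). In other words, the command failed and the sense data explains why; that sense data is libata's own translation of the ATA error.

```c
#include <stdint.h>

/* The 32-bit SCSI result printed as "return code = 0x...", split into
 * its four byte-wide fields as laid out by the Linux SCSI midlayer. */
struct scsi_result_fields {
	uint8_t status;  /* bits 0-7: SCSI status (0x02 = CHECK CONDITION) */
	uint8_t msg;     /* bits 8-15: message byte */
	uint8_t host;    /* bits 16-23: host/adapter byte (0 = DID_OK) */
	uint8_t driver;  /* bits 24-31: driver byte (0x08 = DRIVER_SENSE) */
};

struct scsi_result_fields split_scsi_result(uint32_t result)
{
	struct scsi_result_fields f;

	f.status = (uint8_t)(result & 0xff);
	f.msg    = (uint8_t)((result >> 8) & 0xff);
	f.host   = (uint8_t)((result >> 16) & 0xff);
	f.driver = (uint8_t)((result >> 24) & 0xff);
	return f;
}
```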

They are both attached to:
libata version 1.20 loaded.
sata_sil 0000:00:0a.0: version 0.9
ACPI: PCI Interrupt 0000:00:0a.0[A] -> GSI 16 (level, low) -> IRQ 17
ata1: SATA max UDMA/100 cmd 0xF8804080 ctl 0xF880408A bmdma 0xF8804000 irq 17
ata2: SATA max UDMA/100 cmd 0xF88040C0 ctl 0xF88040CA bmdma 0xF8804008 irq 17
ata1: SATA link up 1.5 Gbps (SStatus 113)
ata1: dev 0 cfg 49:2f00 82:7869 83:7d09 84:4043 85:7869 86:3c01 87:4043 88:203f
ata1: dev 0 ATA-7, max UDMA/100, 390721968 sectors: LBA48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sil
ata2: SATA link up 1.5 Gbps (SStatus 113)
ata2: dev 0 cfg 49:2f00 82:7c6b 83:7f09 84:4063 85:7c69 86:3e01 87:4063 88:007f
ata2: dev 0 ATA-7, max UDMA/133, 398297088 sectors: LBA48
ata2: dev 0 configured for UDMA/100
scsi1 : sata_sil
  Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
  Type:   Direct-Access                      ANSI SCSI revision: 05

Are there any other tests I should run, such as swapping the disks to the other
controller (sata_via) and seeing what happens, with and without the patch?

David

-- 



* Re: LibPATA code issues / 2.6.15.4
  2006-02-28 10:39                               ` David Greaves
@ 2006-02-28 14:37                                 ` Mark Lord
  2006-02-28 21:04                                   ` Bill Davidsen
  2006-02-28 14:38                                 ` Mark Lord
  2006-02-28 15:31                                 ` Mark Lord
  2 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-02-28 14:37 UTC (permalink / raw)
  To: David Greaves
  Cc: Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

David Greaves wrote:
>
> /dev/sda:
..
> 0040 3fff c837 0010 0000 0000 003f 0000
> 0000 0000 4234 3033 3852 5248 2020 2020
> 2020 2020 2020 2020 0003 4000 0004 4241
> 4e43 3139 3830 4d61 7874 6f72 2036 4232
> 3030 4d30 2020 2020 2020 2020 2020 2020
> 2020 2020 2020 2020 2020 2020 2020 8010
> 0000 2f00 4000 0200 0000 0007 3fff 0010
> 003f fc10 00fb 0100 ffff 0fff 0000 0007
> 0003 0078 0078 0078 0078 0000 0000 0000
> 0000 0000 0000 0000 0002 0000 0000 0000
> 00fe 001e 7869 7d09 4043 7869 3c01 4043
> 203f 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 f1b0 1749 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0113 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 d3a5
..
hdparm-6.4 says:

         Model Number:       Maxtor 6B200M0
         Serial Number:      B4038RRH
         Firmware Revision:  BANC1980

Commands/features:
         Enabled Supported:
            *    NOP cmd
            *    READ BUFFER cmd
            *    WRITE BUFFER cmd
            *    Look-ahead
            *    Write cache
            *    Power Management feature set
            *    SMART feature set
            *    FLUSH_CACHE_EXT
            *    Mandatory FLUSH_CACHE
            *    Device Configuration Overlay feature set
            *    48-bit Address feature set
                 SET_MAX security extension
                 Advanced Power Management feature set
            *    DOWNLOAD_MICROCODE
            *    WRITE_{DMA|MULTIPLE}_FUA_EXT
            *    SMART self-test
            *    SMART error logging

So, yes, the drive is either lying about "* WRITE_{DMA|MULTIPLE}_FUA_EXT",
or it didn't like the parameters it was given, or the SATA/IDE controller
chip didn't like the command.
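
Mark's reading of the feature list can be cross-checked against the raw IDENTIFY words. The sketch below is a minimal set of hypothetical helpers (the names are not from libata or hdparm), assuming the ATA-7 layout: word 84 bit 6 advertises WRITE DMA FUA EXT / WRITE MULTIPLE FUA EXT, word 87 bit 6 repeats that bit, and bits 15:14 of each word must read 01 for the word to be valid. sda's words 84 and 87 (0x4043 in the dump above) and sdb's (0x4063 in its cfg line) both pass.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 15:14 of IDENTIFY words 84 and 87 read 01b when the word is valid. */
bool id_word_valid(uint16_t w)
{
    return (w & 0xC000) == 0x4000;
}

/* Word 84 bit 6: WRITE DMA FUA EXT and WRITE MULTIPLE FUA EXT supported. */
bool id_has_fua(const uint16_t id[256])
{
    return id_word_valid(id[84]) && (id[84] & (1u << 6)) != 0;
}

/* Word 87 bit 6 repeats the FUA bit from word 84. */
bool id_fua_in_word87(const uint16_t id[256])
{
    return id_word_valid(id[87]) && (id[87] & (1u << 6)) != 0;
}
```

This only shows what the drive claims; it cannot tell whether the controller passes the FUA opcode through correctly.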

Cheers


* Re: LibPATA code issues / 2.6.15.4
  2006-02-28 10:39                               ` David Greaves
  2006-02-28 14:37                                 ` Mark Lord
@ 2006-02-28 14:38                                 ` Mark Lord
  2006-02-28 15:16                                   ` Alan Cox
  2006-02-28 15:31                                 ` Mark Lord
  2 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-02-28 14:38 UTC (permalink / raw)
  To: David Greaves
  Cc: Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

David Greaves wrote:
..
> sd 0:0:0:0: SCSI error: return code = 0x8000002
> sda: Current: sense key: Medium Error
>     Additional sense: Unrecovered read error - auto reallocate failed
> end_request: I/O error, dev sda, sector 390716735
> raid5: Disk failure on sda1, disabling device. Operation continuing on 2 devices
> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
> ata2: status=0x51 { DriveReady SeekComplete Error }
> sd 1:0:0:0: SCSI error: return code = 0x8000002
> sdb: Current: sense key: Medium Error
>     Additional sense: Unrecovered read error - auto reallocate failed
> end_request: I/O error, dev sdb, sector 390716735
> raid5: Disk failure on sdb1, disabling device. Operation continuing on 1 devices
..

The error handling still sucks, regardless of FUA.
All of this nonsense about "Medium Error" is pure bogosity here.

Cheers


* Re: LibPATA code issues / 2.6.15.4
  2006-02-28 14:38                                 ` Mark Lord
@ 2006-02-28 15:16                                   ` Alan Cox
  2006-03-01 17:33                                     ` David Greaves
  0 siblings, 1 reply; 147+ messages in thread
From: Alan Cox @ 2006-02-28 15:16 UTC (permalink / raw)
  To: Mark Lord
  Cc: David Greaves, Tejun Heo, Jeff Garzik, Justin Piszcz,
	linux-kernel, IDE/ATA development list, albertcc, axboe,
	Linus Torvalds

On Tue, 2006-02-28 at 09:38 -0500, Mark Lord wrote:
> 
> The error handling still sucks, regardless of FUA.
> All of this nonsense about "Medium Error" is pure bogosity here.

I've flipped my tree to report Aborted Command. Not sure there is a
better SCSI sense match for "it broke and I don't know why"
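
Alan's change concerns what happens when no translation rule matches. The sketch below is a simplified, hypothetical table-driven ATA-error-to-SCSI-sense translation; it is not libata's actual table. The sense key codes (MEDIUM ERROR 0x03, ABORTED COMMAND 0x0B) and ASC/ASCQ pairs (11h/04h "unrecovered read error - auto reallocate failed", 47h/00h parity error) come from SPC, but which error bits map to which sense here is an assumption for illustration. An unmatched error falls back to ABORTED COMMAND instead of MEDIUM ERROR.

```c
#include <stddef.h>
#include <stdint.h>

/* SCSI sense keys (SPC) */
#define SK_MEDIUM_ERROR    0x03
#define SK_ABORTED_COMMAND 0x0B

struct sense { uint8_t key, asc, ascq; };

/* Illustrative table, most specific (multi-bit) entries first.
 * Masks are ATA Error register bits: 0x80 ICRC, 0x40 UNC, 0x04 ABRT. */
static const struct {
    uint8_t err_mask;
    struct sense s;
} sense_table[] = {
    { 0x84, { SK_ABORTED_COMMAND, 0x47, 0x00 } }, /* interface CRC + abort */
    { 0x40, { SK_MEDIUM_ERROR,    0x11, 0x04 } }, /* uncorrectable data */
    { 0x04, { SK_ABORTED_COMMAND, 0x00, 0x00 } }, /* command aborted */
};

struct sense ata_err_to_sense(uint8_t err)
{
    for (size_t i = 0; i < sizeof(sense_table) / sizeof(sense_table[0]); i++) {
        /* An entry matches only if all of its bits are set in err. */
        if ((err & sense_table[i].err_mask) == sense_table[i].err_mask)
            return sense_table[i].s;
    }
    /* No match: "it broke and I don't know why" */
    struct sense fallback = { SK_ABORTED_COMMAND, 0x00, 0x00 };
    return fallback;
}
```

With this table, err=0x04 yields 0xb/00/00, the same translation printed in the ata3 messages in this thread. In David's log the "no sense translation" line is followed by Medium Error, which suggests the old fallback is what produced it.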



* Re: LibPATA code issues / 2.6.15.4
  2006-02-28 10:39                               ` David Greaves
  2006-02-28 14:37                                 ` Mark Lord
  2006-02-28 14:38                                 ` Mark Lord
@ 2006-02-28 15:31                                 ` Mark Lord
  2006-02-28 15:34                                   ` Jeff Garzik
  2 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-02-28 15:31 UTC (permalink / raw)
  To: David Greaves
  Cc: Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

David Greaves wrote:
>
> scsi1 : sata_sil
>   Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
>   Type:   Direct-Access                      ANSI SCSI revision: 05
>   Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
>   Type:   Direct-Access                      ANSI SCSI revision: 05

I wonder if the non-FUA component here is the sata_sil,
rather than the two Maxtor drives.

Also, your drives have different firmware,
but both have trouble with FUA here.

(sdb is slightly newer, and larger, than sda).
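
The size difference is visible in IDENTIFY words 100-103, which hold the 48-bit user-addressable sector count, least significant word first. A minimal sketch with a hypothetical helper name: sda's words 100-101 (0xf1b0, 0x1749) give the 390721968 sectors libata reported for ata1, and sdb's (0x8800, 0x17bd) give 398297088.

```c
#include <stdint.h>

/* IDENTIFY words 100-103: 48-bit sector count, least significant word first. */
uint64_t id_lba48_sectors(const uint16_t id[256])
{
    return (uint64_t)id[100]
         | ((uint64_t)id[101] << 16)
         | ((uint64_t)id[102] << 32)
         | ((uint64_t)id[103] << 48);
}
```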

Cheers




* Re: LibPATA code issues / 2.6.15.4
  2006-02-28 15:31                                 ` Mark Lord
@ 2006-02-28 15:34                                   ` Jeff Garzik
  2006-02-28 16:57                                     ` Eric D. Mudama
  2006-03-01 17:41                                     ` David Greaves
  0 siblings, 2 replies; 147+ messages in thread
From: Jeff Garzik @ 2006-02-28 15:34 UTC (permalink / raw)
  To: Mark Lord
  Cc: David Greaves, Tejun Heo, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

Mark Lord wrote:
> David Greaves wrote:
> 
>>
>> scsi1 : sata_sil
>>   Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
>>   Type:   Direct-Access                      ANSI SCSI revision: 05
>>   Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
>>   Type:   Direct-Access                      ANSI SCSI revision: 05
> 
> 
> I wonder if the non-FUA component here is the sata_sil,
> rather than the two Maxtor drives.
> 
> Also, your drives have different firmware,
> but both have trouble with FUA here.

sata_sil is indeed a piece of hardware that needs to know the opcodes 
ahead of time...
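
In David's failing log line, op=0x2a is the SCSI WRITE(10) opcode and cmd=0x3d is ATA WRITE DMA FUA EXT. A rough sketch of the choice a SCSI-to-ATA translator has to make (the function name is hypothetical and this is not libata's code): FUA is bit 3 of CDB byte 1, and ATA defines FUA variants only for the 48-bit commands, so a forced-unit-access DMA write becomes 0x3D instead of the usual WRITE DMA EXT (0x35). A controller that pre-decodes opcodes would have to recognise 0x3D as a DMA write for this to work.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCSI_WRITE_10          0x2A
#define ATA_CMD_WRITE_DMA      0xCA  /* 28-bit LBA */
#define ATA_CMD_WRITE_DMA_EXT  0x35  /* 48-bit LBA */
#define ATA_CMD_WRITE_FUA_EXT  0x3D  /* 48-bit LBA, forced unit access */

/* Pick the ATA DMA write opcode for a SCSI WRITE(10) CDB.
 * lba48: whether the device is driven with 48-bit commands.
 * Returns 0 if the request cannot be expressed (e.g. FUA without LBA48). */
uint8_t write10_to_ata_cmd(const uint8_t cdb[10], bool lba48)
{
    if (cdb[0] != SCSI_WRITE_10)
        return 0;
    bool fua = cdb[1] & 0x08;          /* FUA is bit 3 of CDB byte 1 */
    if (fua)
        return lba48 ? ATA_CMD_WRITE_FUA_EXT : 0;
    return lba48 ? ATA_CMD_WRITE_DMA_EXT : ATA_CMD_WRITE_DMA;
}
```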

	Jeff





* Re: LibPATA code issues / 2.6.15.4
  2006-02-28 15:34                                   ` Jeff Garzik
@ 2006-02-28 16:57                                     ` Eric D. Mudama
  2006-03-01  1:04                                       ` Mark Lord
  2006-03-01 17:41                                     ` David Greaves
  1 sibling, 1 reply; 147+ messages in thread
From: Eric D. Mudama @ 2006-02-28 16:57 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Mark Lord, David Greaves, Tejun Heo, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

Those drives should support all FUA opcodes properly, both queued and unqueued.

On 2/28/06, Jeff Garzik <jgarzik@pobox.com> wrote:
> Mark Lord wrote:
> > David Greaves wrote:
> >
> >>
> >> scsi1 : sata_sil
> >>   Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
> >>   Type:   Direct-Access                      ANSI SCSI revision: 05
> >>   Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
> >>   Type:   Direct-Access                      ANSI SCSI revision: 05
> >
> >
> > I wonder if the non-FUA component here is the sata_sil,
> > rather than the two Maxtor drives.
> >
> > Also, your drives have different firmware,
> > but both have trouble with FUA here.
>
> sata_sil is indeed a piece of hardware that needs to know the opcodes
> ahead of time...
>
>         Jeff
>
>
>
>


* Re: LibPATA code issues / 2.6.15.4
  2006-02-28 14:37                                 ` Mark Lord
@ 2006-02-28 21:04                                   ` Bill Davidsen
  2006-03-08  2:57                                     ` Mark Lord
  0 siblings, 1 reply; 147+ messages in thread
From: Bill Davidsen @ 2006-02-28 21:04 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe

Mark Lord wrote:
> David Greaves wrote:
>>
>> /dev/sda:
   [...snip...]
> ..
> hdparm-6.4 says:

Is there a version of that which will build on x86? I grabbed the 
version offered at freshmeat, but it won't compile with any x86 distro or 
gcc version I have access to (RH8, RH9, FC1, FC3, FC4, Ubuntu...), 
with or without the suggested alternate header.
> 
>         Model Number:       Maxtor 6B200M0
>         Serial Number:      B4038RRH
>         Firmware Revision:  BANC1980
> 
> Commands/features:
>         Enabled Supported:
>            *    NOP cmd
>            *    READ BUFFER cmd
>            *    WRITE BUFFER cmd
>            *    Look-ahead
>            *    Write cache
>            *    Power Management feature set
>            *    SMART feature set
>            *    FLUSH_CACHE_EXT
>            *    Mandatory FLUSH_CACHE
>            *    Device Configuration Overlay feature set
>            *    48-bit Address feature set
>                 SET_MAX security extension
>                 Advanced Power Management feature set
>            *    DOWNLOAD_MICROCODE
>            *    WRITE_{DMA|MULTIPLE}_FUA_EXT
>            *    SMART self-test
>            *    SMART error logging
> 
> So, yes, the drive is either lying about "* WRITE_{DMA|MULTIPLE}_FUA_EXT",
> or it didn't like the parameters it was given, or the SATA/IDE controller
> chip didn't like the command.



* Re: LibPATA code issues / 2.6.15.4
  2006-02-28 16:57                                     ` Eric D. Mudama
@ 2006-03-01  1:04                                       ` Mark Lord
  2006-03-01 11:37                                         ` Justin Piszcz
  2006-03-01 13:17                                         ` Justin Piszcz
  0 siblings, 2 replies; 147+ messages in thread
From: Mark Lord @ 2006-03-01  1:04 UTC (permalink / raw)
  To: Eric D. Mudama
  Cc: Jeff Garzik, David Greaves, Tejun Heo, Justin Piszcz,
	linux-kernel, IDE/ATA development list, albertcc, axboe,
	Linus Torvalds

Eric D. Mudama wrote:
> those drives should support all FUA opcodes properly, both queued and unqueued

His first drive (sda) does not support queued commands at all,
but the newer firmware in his second drive (sdb) does support NCQ.

Both drives support FUA.
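
The queued-command difference shows up in IDENTIFY words 75 and 76 of the two dumps: sda reports 0x0000/0x0002 and sdb 0x001f/0x0102. A minimal sketch with hypothetical helper names, assuming the SATA definitions: word 76 bit 8 means NCQ is supported, word 75 bits 4:0 hold the maximum queue depth minus one, and word 76 reading 0x0000 or 0xffff means the word is not reported.

```c
#include <stdbool.h>
#include <stdint.h>

/* SATA capabilities, word 76: 0x0000 or 0xffff means "not reported". */
bool id_has_ncq(const uint16_t id[256])
{
    uint16_t w = id[76];
    if (w == 0x0000 || w == 0xffff)
        return false;
    return (w & (1u << 8)) != 0;   /* bit 8: NCQ supported */
}

/* Word 75 bits 4:0 hold (maximum queue depth - 1); no NCQ means depth 1. */
unsigned id_queue_depth(const uint16_t id[256])
{
    return id_has_ncq(id) ? (id[75] & 0x1fu) + 1u : 1u;
}
```

Applied to the dumps, sdb reports NCQ with a depth of 32 and sda reports no NCQ, consistent with Mark's reading.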

cheers


* Re: LibPATA code issues / 2.6.15.4
  2006-03-01  1:04                                       ` Mark Lord
@ 2006-03-01 11:37                                         ` Justin Piszcz
  2006-03-01 13:17                                         ` Justin Piszcz
  1 sibling, 0 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-03-01 11:37 UTC (permalink / raw)
  To: Mark Lord
  Cc: Eric D. Mudama, Jeff Garzik, David Greaves, Tejun Heo,
	linux-kernel, IDE/ATA development list, albertcc, axboe,
	Linus Torvalds

On Tue, 28 Feb 2006, Mark Lord wrote:

> Eric D. Mudama wrote:
>> those drives should support all FUA opcodes properly, both queued and 
>> unqueued
>
> His first drive (sda) does not support queued commands at all,
> but the newer firmware in his second drive (sdb) does support NCQ.
>
> Both drives support FUA.
>
> cheers
>

To trust or not to trust?

I have a 400GB SATA drive: WDC WD4000KD-00N.  Given the dmesg errors 
that have been mentioned throughout the thread, should I trust this drive 
under Linux, or should I remove it and wait until a patch is released to 
address this issue?

Also, in the forums (storagereview.com, I believe), it has been noted that 
these drives do NOT work on the Intel ICH5 controller, and this turned out 
to be true: when I put it on the Intel ICH5, the box stalls for 2-3 
minutes and then does not see the drive.  However, on the Silicon 
Image, Inc. SiI 3112 chipset or the Promise SATA/150 TX2 it works okay, but 
it logs those errors in dmesg.

My question is this: long and short SMART tests show the drive is 
physically OK, yet I suspect I should not use this drive for anything 
important under Linux.  Comments?

Justin.


* Re: LibPATA code issues / 2.6.15.4
  2006-03-01  1:04                                       ` Mark Lord
  2006-03-01 11:37                                         ` Justin Piszcz
@ 2006-03-01 13:17                                         ` Justin Piszcz
  1 sibling, 0 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-03-01 13:17 UTC (permalink / raw)
  To: Mark Lord
  Cc: Eric D. Mudama, Jeff Garzik, David Greaves, Tejun Heo,
	linux-kernel, IDE/ATA development list, albertcc, axboe,
	Linus Torvalds



On Tue, 28 Feb 2006, Mark Lord wrote:

> Eric D. Mudama wrote:
>> those drives should support all FUA opcodes properly, both queued and 
>> unqueued
>
> His first drive (sda) does not support queued commands at all,
> but the newer firmware in his second drive (sdb) does support NCQ.
>
> Both drives support FUA.
>
> cheers
>

Could someone *PLEASE* produce a *unified* patch that is compatible with 
2.6.16-rc5 or 2.6.15.4 so I can reproduce the error?

Mark had two patches, and getting them to apply and build properly has 
been a real pain.

With 2.6.16-rc5:

# make bzImage
   CHK     include/linux/version.h
scripts/kconfig/conf -s arch/i386/Kconfig
#
# using defaults found in .config
#
   SPLIT   include/linux/autoconf.h -> include/config/*
   CHK     include/linux/compile.h
   CHK     usr/initramfs_list
   GEN     .version
   CHK     include/linux/compile.h
   UPD     include/linux/compile.h
   CC      init/version.o
   LD      init/built-in.o
   LD      .tmp_vmlinux1
drivers/built-in.o: In function `ata_to_sense_error': undefined reference to `print'
drivers/built-in.o: In function `ata_to_sense_error': undefined reference to `print'
make: *** [.tmp_vmlinux1] Error 1
Command exited with non-zero status 2




* Re: LibPATA code issues / 2.6.15.4
  2006-02-28 15:16                                   ` Alan Cox
@ 2006-03-01 17:33                                     ` David Greaves
  2006-03-01 18:37                                       ` Alan Cox
  0 siblings, 1 reply; 147+ messages in thread
From: David Greaves @ 2006-03-01 17:33 UTC (permalink / raw)
  To: Alan Cox
  Cc: Mark Lord, Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

Alan Cox wrote:

>On Tue, 2006-02-28 at 09:38 -0500, Mark Lord wrote:
>  
>
>>The error handling still sucks, regardless of FUA.
>>All of this nonsense about "Medium Error" is pure bogosity here.
>>    
>>
>
>I've flipped my tree to report Aborted Command. Not sure there is a
>better SCSI sense match for "it broke and I don't know why"
>  
>
As a user I prefer
  It Broke And I Don't Know Why
to
  Aborted Command

(honesty is the best policy)

I certainly hate Medium Error as modern hard disks seem to be flakier
than ever.

David


* Re: LibPATA code issues / 2.6.15.4
  2006-02-28 15:34                                   ` Jeff Garzik
  2006-02-28 16:57                                     ` Eric D. Mudama
@ 2006-03-01 17:41                                     ` David Greaves
  2006-03-01 17:46                                       ` Mark Lord
  1 sibling, 1 reply; 147+ messages in thread
From: David Greaves @ 2006-03-01 17:41 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Mark Lord, Tejun Heo, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

Jeff Garzik wrote:

> Mark Lord wrote:
>
>> David Greaves wrote:
>>
>>>
>>> scsi1 : sata_sil
>>>   Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
>>>   Type:   Direct-Access                      ANSI SCSI revision: 05
>>>   Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
>>>   Type:   Direct-Access                      ANSI SCSI revision: 05
>>
>>
>>
>> I wonder if the non-FUA component here is the sata_sil,
>> rather than the two Maxtor drives.
>>
>> Also, your drives have different firmware,
>> but both have trouble with FUA here.
>
>
> sata_sil is indeed a piece of hardware that needs to know the opcodes
> ahead of time...
>
>     Jeff
>
I actually have 3 of those drives - one runs through sata_via and
doesn't have the same problem.

(the sata_via ones *do* have:
 ata3: status=0x50 { DriveReady SeekComplete }
 ata3: PIO error
problems with SMART)

David


* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 17:41                                     ` David Greaves
@ 2006-03-01 17:46                                       ` Mark Lord
  2006-03-01 18:12                                         ` David Greaves
  0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-03-01 17:46 UTC (permalink / raw)
  To: David Greaves
  Cc: Jeff Garzik, Tejun Heo, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

David Greaves wrote:
>
> I actually have 3 of those drives - one runs through sata_via and
> doesn't have the same problem.
> 
> (the sata_via ones *do* have :
>  ata3: status=0x50 { DriveReady SeekComplete }
>  ata3: PIO error
> problems with SMART)

And once again, not enough information in the error messages
for anyone to actually do anything about it (not David's fault).

What command do you use to get that bug to pop up?

BTW:
hdparm-6.5 is now available (sourceforge),
and should show all of the fancy features
of your drives for comparison between versions.

Cheers


* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 17:46                                       ` Mark Lord
@ 2006-03-01 18:12                                         ` David Greaves
  2006-03-01 18:30                                           ` Mark Lord
  0 siblings, 1 reply; 147+ messages in thread
From: David Greaves @ 2006-03-01 18:12 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jeff Garzik, Tejun Heo, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

Mark Lord wrote:

> David Greaves wrote:
>
>>
>> I actually have 3 of those drives - one runs through sata_via and
>> doesn't have the same problem.
>>
>> (the sata_via ones *do* have :
>>  ata3: status=0x50 { DriveReady SeekComplete }
>>  ata3: PIO error
>> problems with SMART)
>
>
> And once again, not enough information in the error messages
> for anyone to actually do anything about it (not David's fault).
>
> What command do you use to get that bug to pop up?

(FYI I'm running 2.6.15 with both 'info' patches 'cos I'm scared of
2.6.16-rc4!)

haze:/usr/src# smartctl -data -s on /dev/sdc
smartctl version 5.34 [i686-pc-linux-gnu] Copyright (C) 2002-5 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.

No messages in dmesg

haze:/usr/src# smartctl -data -o on /dev/sdc
smartctl version 5.34 [i686-pc-linux-gnu] Copyright (C) 2002-5 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
Error SMART Enable Automatic Offline failed: Input/output error
Smartctl: SMART Enable Automatic Offline Failed.

dmesg contains this message repeated 31 times:
ata3: PIO error
ata3: status=0x50 { DriveReady SeekComplete }

haze:/usr/src# smartctl -data -o off /dev/sdc
succeeds but gives me:

ata3: status=0x50 { DriveReady SeekComplete }
ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x04 { DriveStatusError }
ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x04 { DriveStatusError }
ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x04 { DriveStatusError }
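
The bracketed names in these messages are simply the set bits of the ATA Status and Error registers. A small decoder sketch, illustrative rather than a copy of the kernel's printing code: the status names are the ones libata prints above, and the error-bit names other than DriveStatusError (the ABRT, command-aborted bit) are descriptive labels assumed for this example. Status 0x51 decodes to DriveReady, SeekComplete and Error.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Index = bit number. When Busy is set the other status bits are not
 * valid; this sketch does not special-case that. */
static const char *const status_bits[8] = {
    "Error", "Index", "CorrectedError", "DataRequest",
    "SeekComplete", "DeviceFault", "DriveReady", "Busy",
};

static const char *const error_bits[8] = {
    "AddrMarkNotFound", "TrackZeroNotFound", "DriveStatusError",
    "MediaChangeRequested", "SectorIdNotFound", "MediaChanged",
    "UncorrectableError", "BadCRC",
};

/* Format "{ Name Name ... }" for the set bits, highest bit first,
 * in the same order as the dmesg lines. */
void decode_reg(uint8_t val, const char *const names[8], char *out, size_t len)
{
    size_t used = (size_t)snprintf(out, len, "{ ");
    for (int bit = 7; bit >= 0; bit--) {
        if ((val & (1u << bit)) && used < len)
            used += (size_t)snprintf(out + used, len - used, "%s ", names[bit]);
    }
    if (used < len)
        snprintf(out + used, len - used, "}");
}
```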

haze:/usr/src# smartctl -data -o on /dev/sdd
smartctl version 5.34 [i686-pc-linux-gnu] Copyright (C) 2002-5 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
Error SMART Enable Automatic Offline failed: Input/output error
Smartctl: SMART Enable Automatic Offline Failed.

ata4: PIO error
ata4: status=0x50 { DriveReady SeekComplete }

# smartctl -data -o off /dev/sdd
ata4: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x51 { DriveReady SeekComplete Error }
ata4: error=0x04 { DriveStatusError }
ata4: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x51 { DriveReady SeekComplete Error }
ata4: error=0x04 { DriveStatusError }
ata4: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x51 { DriveReady SeekComplete Error }
ata4: error=0x04 { DriveStatusError }



haze:/usr/src# hdparm --Istdout /dev/sdc

/dev/sdc:
 IO_support   =  0 (default 16-bit)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 19457/255/63, sectors = 312581808, start = 0
0c5a 3fff c837 0010 0000 0000 003f 0000
0000 0000 334a 5332 4b53 4c33 2020 2020
2020 2020 2020 2020 0000 4000 0004 332e
3138 2020 2020 5354 3331 3630 3032 3341
5320 2020 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 8010
0000 2f00 0000 0200 0200 0007 3fff 0010
003f fc10 00fb 0110 ffff 0fff 0000 0007
0003 0078 0078 00f0 0078 0000 0000 0000
0000 0000 0000 0000 0002 0000 0000 0000
007e 001b 346b 7d01 4003 3468 3c01 4003
407f 0000 0000 fefe 0000 0000 fe00 0000
0000 0000 0000 0000 9eb0 12a1 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0001 9eb0 12a1 9eb0 12a1 2020 0002 42b6
8000 008a 3c06 3c0a ffff 07c6 0100 0800
0ff0 1000 0002 0030 0000 0000 0000 fe06
0000 0002 0050 008a 954f 0000 0023 000b
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 7ea5


haze:/usr/src# hdparm --Istdout /dev/sdd

/dev/sdd:
 IO_support   =  0 (default 16-bit)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 24792/255/63, sectors = 398297088, start = 0
0040 3fff c837 0010 0000 0000 003f 0000
0000 0000 4234 3152 5643 3248 2020 2020
2020 2020 2020 2020 0003 4000 0004 4241
4e43 3142 5930 4d61 7874 6f72 2036 4232
3030 4d30 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 8010
0000 2f00 4000 0200 0000 0007 3fff 0010
003f fc10 00fb 0110 ffff 0fff 0000 0007
0003 0078 0078 0078 0078 0000 0000 0000
0000 0000 0000 001f 0102 0000 0000 0000
00fe 001e 7c6b 7f09 4063 7c68 3e01 4063
407f 0000 0000 0000 fffe 0000 c0fe 0000
0000 0000 0000 0000 8800 17bd 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0001 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0113 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 a6a5



David

>
> BTW:
> hdparm-6.5 is now available (sourceforge),
> and should show all of the fancy features
> of your drives for comparison between versions.

OK - soonish...


* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 18:12                                         ` David Greaves
@ 2006-03-01 18:30                                           ` Mark Lord
  2006-03-01 18:32                                             ` Justin Piszcz
                                                               ` (3 more replies)
  0 siblings, 4 replies; 147+ messages in thread
From: Mark Lord @ 2006-03-01 18:30 UTC (permalink / raw)
  To: David Greaves
  Cc: Jeff Garzik, Tejun Heo, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

David Greaves wrote:
>
> haze:/usr/src# smartctl -data -o off /dev/sdc
> succeeds but gives me:
> 
> ata3: status=0x50 { DriveReady SeekComplete }
> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> ata3: status=0x51 { DriveReady SeekComplete Error }
> ata3: error=0x04 { DriveStatusError }
> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> ata3: status=0x51 { DriveReady SeekComplete Error }
> ata3: error=0x04 { DriveStatusError }
> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> ata3: status=0x51 { DriveReady SeekComplete Error }
> ata3: error=0x04 { DriveStatusError }

"DriveStatusError" is "Command Aborted" in ac-speak.
From the man page for smartctl, we read:

 >-o VALUE  Enables or disables SMART automatic offline test ...
 >Note that the SMART automatic offline test command is listed as "Obsolete" in every
 >version of the ATA and ATA/ATAPI Specifications. It was originally part of the
 >SFF-8035i Revision 2.0 specification, but was never part of any ATA specification.

There's a chance that your drives simply do not fully support this feature,
and are rejecting attempts to use it.
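
For reference, op=0x85 in these messages is the SCSI ATA PASS-THROUGH(16) command and cmd=0xb0 is the ATA SMART command. A minimal sketch of how a "-o on/off" request can be encoded, assuming the T10 SAT byte layout and the SFF-8035i subcommand values (feature 0xDB, sector count 0xF8 to enable or 0x00 to disable, SMART signature 0x4F/0xC2 in LBA mid/high). The function name is hypothetical and this is not smartctl's code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SAT_ATA_PASS_THROUGH_16  0x85
#define ATA_CMD_SMART            0xB0
#define SMART_AUTO_OFFLINE       0xDB  /* obsolete SFF-8035i subcommand */

/* Fill a 16-byte ATA PASS-THROUGH(16) CDB for SMART ENABLE/DISABLE
 * AUTOMATIC OFF-LINE, a non-data command. */
void build_smart_auto_offline(uint8_t cdb[16], bool enable)
{
    memset(cdb, 0, 16);             /* byte 2 stays 0: no data transfer */
    cdb[0]  = SAT_ATA_PASS_THROUGH_16;
    cdb[1]  = 3 << 1;               /* PROTOCOL 3 = non-data, EXTEND 0 */
    cdb[4]  = SMART_AUTO_OFFLINE;   /* FEATURES (7:0) */
    cdb[6]  = enable ? 0xF8 : 0x00; /* SECTOR COUNT (7:0) */
    cdb[10] = 0x4F;                 /* LBA MID (7:0): SMART signature */
    cdb[12] = 0xC2;                 /* LBA HIGH (7:0): SMART signature */
    cdb[14] = ATA_CMD_SMART;        /* COMMAND */
}
```

A drive that does not implement this obsolete subcommand is free to abort it, which would show up as the ABRT bit (error=0x04) seen above.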

By the way, the latest 2.6.16-rc5-git4 is available,
and has FUA turned off by default now.  So it should
work with your drives, and *you* are expected to verify
that for us all now.

Cheers

-ml


* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 18:30                                           ` Mark Lord
@ 2006-03-01 18:32                                             ` Justin Piszcz
  2006-03-01 18:33                                             ` Justin Piszcz
                                                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-03-01 18:32 UTC (permalink / raw)
  To: Mark Lord
  Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

On Wed, 1 Mar 2006, Mark Lord wrote:

> David Greaves wrote:
>> 
>> haze:/usr/src# smartctl -data -o off /dev/sdc
>> succeeds but gives me:
>> 
>> ata3: status=0x50 { DriveReady SeekComplete }
>> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> ata3: status=0x51 { DriveReady SeekComplete Error }
>> ata3: error=0x04 { DriveStatusError }
>> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> ata3: status=0x51 { DriveReady SeekComplete Error }
>> ata3: error=0x04 { DriveStatusError }
>> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> ata3: status=0x51 { DriveReady SeekComplete Error }
>> ata3: error=0x04 { DriveStatusError }
>
> "DriveStatusError" is "Command Aborted" in ac-speak.
> From the man page for smartctl, we read:
>
>> -o VALUE  Enables or disables SMART automatic offline test ...
>> Note that the SMART automatic offline test command is listed as "Obsolete" in every
>> version of the ATA and ATA/ATAPI Specifications. It was originally part of the
>> SFF-8035i Revision 2.0 specification, but was never part of any ATA specification.
>
> There's a chance that your drives simply do not fully support this feature,
> and are rejecting attempts to use it.
>
> By the way, the latest 2.6.16-rc5-git4 is available,
> and has FUA turned off by default now.  So it should
> work with your drives, and *you* are expected to verify
> that for us all now.
>
> Cheers
>
> -ml
>

When running that command, I get it too:

[4294684.510000] ACPI: PCI Interrupt 0000:02:06.0[A] -> GSI 22 (level, low) -> IRQ 17
[4294686.762000] process `syslogd' is using obsolete setsockopt SO_BSDCOMPAT
[4295292.736000] +++PATCH: Original kernel error:
[4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4295292.736000] +++PATCH: Mark Lord's extended verbosity patch:
[4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4295292.736000] ata3: error=0x04 { DriveStatusError }
[4295292.736000] +++PATCH: Original kernel error:
[4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4295292.736000] +++PATCH: Mark Lord's extended verbosity patch:
[4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4295292.736000] ata3: error=0x04 { DriveStatusError }
[4295292.736000] +++PATCH: Original kernel error:
[4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4295292.736000] +++PATCH: Mark Lord's extended verbosity patch:
[4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4295292.736000] ata3: error=0x04 { DriveStatusError }
[4295292.736000] +++PATCH: Original kernel error:
[4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4295292.736000] +++PATCH: Mark Lord's extended verbosity patch:
[4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4295292.736000] ata3: error=0x04 { DriveStatusError }
[4295292.736000] +++PATCH: Original kernel error:
[4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4295292.736000] +++PATCH: Mark Lord's extended verbosity patch:
[4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4295292.736000] ata3: error=0x04 { DriveStatusError }
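
In the verbose lines, op=0x85 is the SCSI ATA PASS-THROUGH (16) opcode and cmd=0xb0 is the ATA SMART command carried inside it. A minimal sketch of pulling those fields out of a 16-byte CDB follows, assuming the SAT layout (ATA command in byte 14, features 7:0 in byte 4, LBA mid/high in bytes 10 and 12). The struct and function names are hypothetical; this is not the code from Mark's patch.

```c
#include <stdint.h>

#define SCSI_ATA_PASS_THRU_16 0x85  /* SCSI opcode seen as op=0x85 */
#define ATA_CMD_SMART         0xb0  /* ATA command seen as cmd=0xb0 */

/* The few ATA register fields we care about (hypothetical struct). */
struct ata16_fields {
    uint8_t command;   /* ATA command register, CDB byte 14 */
    uint8_t features;  /* features 7:0, CDB byte 4 (SMART subcommand) */
    uint8_t lba_mid;   /* LBA mid 7:0, CDB byte 10 (0x4f for SMART) */
    uint8_t lba_high;  /* LBA high 7:0, CDB byte 12 (0xc2 for SMART) */
};

/* Fill *out from an ATA PASS-THROUGH (16) CDB.
 * Returns 0 on success, -1 if the CDB has a different opcode. */
static int decode_ata16(const uint8_t cdb[16], struct ata16_fields *out)
{
    if (cdb[0] != SCSI_ATA_PASS_THRU_16)
        return -1;
    out->command  = cdb[14];
    out->features = cdb[4];
    out->lba_mid  = cdb[10];
    out->lba_high = cdb[12];
    return 0;
}
```

For a CDB whose byte 14 is 0xb0 and byte 4 is 0xdb (SMART ENABLE/DISABLE AUTOMATIC OFF-LINE, presumably the subcommand behind `smartctl -o`), `decode_ata16()` returns 0 with `command == 0xb0` and `features == 0xdb`; any CDB not starting with 0x85 is rejected.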




* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 18:30                                           ` Mark Lord
  2006-03-01 18:32                                             ` Justin Piszcz
@ 2006-03-01 18:33                                             ` Justin Piszcz
  2006-03-01 18:48                                             ` David Greaves
  2006-03-01 19:06                                             ` LibPATA code issues / 2.6.15.4 Justin Piszcz
  3 siblings, 0 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-03-01 18:33 UTC (permalink / raw)
  To: Mark Lord
  Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds



On Wed, 1 Mar 2006, Mark Lord wrote:

> David Greaves wrote:
>> 
>> haze:/usr/src# smartctl -data -o off /dev/sdc
>> succeeds but gives me:
>> 
>> ata3: status=0x50 { DriveReady SeekComplete }
>> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> ata3: status=0x51 { DriveReady SeekComplete Error }
>> ata3: error=0x04 { DriveStatusError }
>> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> ata3: status=0x51 { DriveReady SeekComplete Error }
>> ata3: error=0x04 { DriveStatusError }
>> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> ata3: status=0x51 { DriveReady SeekComplete Error }
>> ata3: error=0x04 { DriveStatusError }
>
> "DriveStatusError" is "Command Aborted" in ac-speak.
> From the man page for smartctl, we read:
>
>> -o VALUE  Enables or disables SMART automatic offline test ...
>> Note that the SMART automatic offline test command is listed as "Obsolete" in every
>> version  of  the  ATA  and ATA/ATAPI Specifications.  It was originally part of the
>> SFF-8035i Revision 2.0 specification, but was never part of any ATA specification.
>
> There's a chance that your drives simply do not fully support this feature,
> and are rejecting attempts to use it.
>
> By the way, the latest 2.6.16-rc5-git4 is available,
> and has FUA turned off by default now.  So it should
> work with your drives, and *you* are expected to verify
> that for us all now.
>
> Cheers
>
> -ml
>

Mark,

After patching to 2.6.16-rc5-git4, we should no longer see these errors,
right?  Then I can use my drive again without worrying about data loss? :)

Justin.


* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 17:33                                     ` David Greaves
@ 2006-03-01 18:37                                       ` Alan Cox
  2006-03-01 20:12                                         ` Phillip Susi
  0 siblings, 1 reply; 147+ messages in thread
From: Alan Cox @ 2006-03-01 18:37 UTC (permalink / raw)
  To: David Greaves
  Cc: Mark Lord, Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

On Wed, 2006-03-01 at 17:33 +0000, David Greaves wrote:
> As a user I prefer
>   It Broke And I Don't Know Why
> to
>   Aborted Command

So what's the SCSI sense encoding for that?



* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 18:30                                           ` Mark Lord
  2006-03-01 18:32                                             ` Justin Piszcz
  2006-03-01 18:33                                             ` Justin Piszcz
@ 2006-03-01 18:48                                             ` David Greaves
  2006-03-01 19:49                                               ` David Greaves
  2006-03-01 19:06                                             ` LibPATA code issues / 2.6.15.4 Justin Piszcz
  3 siblings, 1 reply; 147+ messages in thread
From: David Greaves @ 2006-03-01 18:48 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jeff Garzik, Tejun Heo, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

Mark Lord wrote:

> By the way, the latest 2.6.16-rc5-git4 is available,
> and has FUA turned off by default now.  So it should
> work with your drives, and *you* are expected to verify
> that for us all now.

Yeah, I know - I've got it on the machine... but it's my wife's machine.
I've asked nicely but she's editing a Hercule Poirot video so I'm not
allowed to reboot it for a while...

I've told her I'm not making pancakes until I've tested it so expect a
report Real Soon Now...

David



* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 18:30                                           ` Mark Lord
                                                               ` (2 preceding siblings ...)
  2006-03-01 18:48                                             ` David Greaves
@ 2006-03-01 19:06                                             ` Justin Piszcz
  2006-03-01 19:28                                               ` Mark Lord
  2006-03-01 19:35                                               ` Mark Lord
  3 siblings, 2 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-03-01 19:06 UTC (permalink / raw)
  To: Mark Lord
  Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

On Wed, 1 Mar 2006, Mark Lord wrote:

> David Greaves wrote:
>> 
>> haze:/usr/src# smartctl -data -o off /dev/sdc
>> succeeds but gives me:
>> 
>> ata3: status=0x50 { DriveReady SeekComplete }
>> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> ata3: status=0x51 { DriveReady SeekComplete Error }
>> ata3: error=0x04 { DriveStatusError }
>> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> ata3: status=0x51 { DriveReady SeekComplete Error }
>> ata3: error=0x04 { DriveStatusError }
>> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> ata3: status=0x51 { DriveReady SeekComplete Error }
>> ata3: error=0x04 { DriveStatusError }
>
> "DriveStatusError" is "Command Aborted" in ac-speak.
> From the man page for smartctl, we read:
>
>> -o VALUE  Enables or disables SMART automatic offline test ...
>> Note that the SMART automatic offline test command is listed as "Obsolete" in every
>> version  of  the  ATA  and ATA/ATAPI Specifications.  It was originally part of the
>> SFF-8035i Revision 2.0 specification, but was never part of any ATA specification.
>
> There's a chance that your drives simply do not fully support this feature,
> and are rejecting attempts to use it.
>
> By the way, the latest 2.6.16-rc5-git4 is available,
> and has FUA turned off by default now.  So it should
> work with your drives, and *you* are expected to verify
> that for us all now.
>
> Cheers
>
> -ml
>

I am using 2.6.16-rc5-git4, and after running:

# smartctl -data -o off /dev/sdc

I get:

[4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4294785.192000] ata3: error=0x04 { DriveStatusError }
[4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4294785.192000] ata3: error=0x04 { DriveStatusError }
[4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4294785.192000] ata3: error=0x04 { DriveStatusError }
[4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4294785.192000] ata3: error=0x04 { DriveStatusError }
[4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4294785.192000] ata3: error=0x04 { DriveStatusError }

Did you mean you wanted us to test it like we normally do, i.e., copy
files/md5sum them on the disk and see if we can make it occur again, or?

Justin.



* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 19:06                                             ` LibPATA code issues / 2.6.15.4 Justin Piszcz
@ 2006-03-01 19:28                                               ` Mark Lord
  2006-03-01 19:35                                               ` Mark Lord
  1 sibling, 0 replies; 147+ messages in thread
From: Mark Lord @ 2006-03-01 19:28 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

Justin Piszcz wrote:
>
> I am using 2.6.16-rc5-git4, and after running:
> 
> # smartctl -data -o off /dev/sdc
> 
> I get:
> 
> [4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> [4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error }
> [4294785.192000] ata3: error=0x04 { DriveStatusError }

That's probably just your drive reporting "unsupported sub-command".
Nothing serious -- the man page for smartctl even mentions the possibility.
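
That reading can be checked bit by bit, since status 0x51 and error 0x04 are plain bit masks. A minimal sketch, assuming the standard ATA status and error register bit positions, decodes them into the same kind of brace list the kernel prints. The labels DriveReady, SeekComplete, Error and DriveStatusError match the log above; the other labels and the `ata_decode()` helper are illustrative, not libata's source.

```c
#include <stdio.h>
#include <string.h>

/* One named bit of an ATA task-file register. */
struct ata_bit {
    unsigned char mask;
    const char *name;
};

/* Status register bits, most significant first. Labels are modelled
 * on the kernel log above; they are illustrative, not libata's table. */
static const struct ata_bit ata_status_bits[] = {
    {0x80, "Busy"},        {0x40, "DriveReady"},
    {0x20, "DeviceFault"}, {0x10, "SeekComplete"},
    {0x08, "DataRequest"}, {0x04, "CorrectedError"},
    {0x02, "Index"},       {0x01, "Error"},
};

/* Error register bits; 0x04 is ABRT, i.e. "command aborted". */
static const struct ata_bit ata_error_bits[] = {
    {0x80, "BadCRC"},               {0x40, "UncorrectableError"},
    {0x20, "MediaChanged"},         {0x10, "SectorIdNotFound"},
    {0x08, "MediaChangeRequested"}, {0x04, "DriveStatusError"},
    {0x02, "TrackZeroNotFound"},    {0x01, "AddrMarkNotFound"},
};

/* Render every set bit of val as "{ Name Name ... }" into buf. */
static void ata_decode(const struct ata_bit *bits, size_t nbits,
                       unsigned char val, char *buf, size_t len)
{
    size_t i;

    snprintf(buf, len, "{");
    for (i = 0; i < nbits; i++) {
        if (val & bits[i].mask) {
            strncat(buf, " ", len - strlen(buf) - 1);
            strncat(buf, bits[i].name, len - strlen(buf) - 1);
        }
    }
    strncat(buf, " }", len - strlen(buf) - 1);
}
```

With `char buf[128]`, `ata_decode(ata_status_bits, 8, 0x51, buf, sizeof buf)` yields `{ DriveReady SeekComplete Error }` and `ata_decode(ata_error_bits, 8, 0x04, buf, sizeof buf)` yields `{ DriveStatusError }`, matching the dmesg lines.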

Cheers


* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 19:06                                             ` LibPATA code issues / 2.6.15.4 Justin Piszcz
  2006-03-01 19:28                                               ` Mark Lord
@ 2006-03-01 19:35                                               ` Mark Lord
  2006-03-01 19:38                                                 ` Justin Piszcz
  1 sibling, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-03-01 19:35 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

Justin Piszcz wrote:
>
> Did you mean you wanted us to test it like we normally do, i.e., copy
> files/md5sum them on the disk and see if we can make it occur again, or?

Yes.  The S.M.A.R.T. stuff doesn't matter nearly as much as normal I/O.

And Justin, can you get those S.M.A.R.T. errors to pop up on 2.6.15 as well?
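
The "normal I/O" test Justin describes (copy files, checksum them, compare) can be sketched as follows. This is an illustrative stand-in, not anyone's actual test script: it swaps md5sum for a 64-bit FNV-1a hash to stay dependency-free, and the file path and block count in the usage note are arbitrary assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define FNV64_OFFSET 0xcbf29ce484222325ULL
#define FNV64_PRIME  0x100000001b3ULL

/* 64-bit FNV-1a, used here only as a dependency-free stand-in for md5. */
static uint64_t fnv1a(const unsigned char *p, size_t n, uint64_t h)
{
    while (n--) {
        h ^= *p++;
        h *= FNV64_PRIME;
    }
    return h;
}

/* Write `blocks` 4 KiB blocks of a deterministic pattern to `path`,
 * fsync, then read the file back and compare hashes.
 * Returns 1 on match, 0 on mismatch, -1 on any I/O error. */
static int write_verify(const char *path, size_t blocks)
{
    unsigned char buf[4096];
    uint64_t wsum = FNV64_OFFSET, rsum = FNV64_OFFSET;
    size_t i, j, got;
    FILE *f = fopen(path, "wb");

    if (!f)
        return -1;
    for (i = 0; i < blocks; i++) {
        for (j = 0; j < sizeof buf; j++)
            buf[j] = (unsigned char)(i * 31 + j);  /* varies per block */
        wsum = fnv1a(buf, sizeof buf, wsum);
        if (fwrite(buf, 1, sizeof buf, f) != sizeof buf) {
            fclose(f);
            return -1;
        }
    }
    /* Flush stdio buffers, then ask the kernel to write the file out. */
    if (fflush(f) != 0 || fsync(fileno(f)) != 0) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)
        return -1;

    f = fopen(path, "rb");
    if (!f)
        return -1;
    while ((got = fread(buf, 1, sizeof buf, f)) > 0)
        rsum = fnv1a(buf, got, rsum);
    if (ferror(f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return wsum == rsum;
}
```

For example, `write_verify("/mnt/test/wv.bin", 262144)` writes and checks 1 GiB. Because the read-back may be served from the page cache, a real run would also want the cache dropped (or the file reread after a remount) before comparing.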


* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 19:35                                               ` Mark Lord
@ 2006-03-01 19:38                                                 ` Justin Piszcz
  2006-03-01 19:41                                                   ` Jeff Garzik
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-03-01 19:38 UTC (permalink / raw)
  To: Mark Lord
  Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds



On Wed, 1 Mar 2006, Mark Lord wrote:

> Justin Piszcz wrote:
>> 
>> Did you mean you wanted us to test it like we normally do, i.e., copy
>> files/md5sum them on the disk and see if we can make it occur again, or?
>
> Yes.  The S.M.A.R.T. stuff doesn't matter nearly as much as normal I/O.
>
> And Justin, can you get those S.M.A.R.T. errors to pop up on 2.6.15 as well?
>

I have not tested that yet; I can test later if necessary. I'm running some
I/O tests on the disk to see if I can get it to error again with
2.6.16-rc5-git4, which is probably going to take quite a while.

Justin.


* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 19:38                                                 ` Justin Piszcz
@ 2006-03-01 19:41                                                   ` Jeff Garzik
  0 siblings, 0 replies; 147+ messages in thread
From: Jeff Garzik @ 2006-03-01 19:41 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Mark Lord, David Greaves, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

Justin Piszcz wrote:
> 
> 
> On Wed, 1 Mar 2006, Mark Lord wrote:
> 
>> Justin Piszcz wrote:
>>
>>>
>>> Did you mean you wanted us to test it like we normally do, i.e., copy
>>> files/md5sum them on the disk and see if we can make it occur again, or?
>>
>>
>> Yes.  The S.M.A.R.T. stuff doesn't matter nearly as much as normal I/O.
>>
>> And Justin, can you get those S.M.A.R.T. errors to pop up on 2.6.15 as 
>> well?
>>
> 
> I have not tested that yet; I can test later if necessary. I'm running some
> I/O tests on the disk to see if I can get it to error again with
> 2.6.16-rc5-git4, which is probably going to take quite a while.

If there are FUA problems, it would be immediately apparent on the first 
write...

	Jeff




* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 18:48                                             ` David Greaves
@ 2006-03-01 19:49                                               ` David Greaves
  2006-03-03 19:38                                                 ` Justin Piszcz
  2006-03-05 11:43                                                 ` Justin Piszcz
  0 siblings, 2 replies; 147+ messages in thread
From: David Greaves @ 2006-03-01 19:49 UTC (permalink / raw)
  To: David Greaves
  Cc: Mark Lord, Jeff Garzik, Tejun Heo, Justin Piszcz, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

David Greaves wrote:

>Mark Lord wrote:
>
>  
>
>>By the way, the latest 2.6.16-rc5-git4 is available,
>>and has FUA turned off by default now.  So it should
>>work with your drives, and *you* are expected to verify
>>that for us all now.
>>    
>>
>Yeah, I know - I've got it on the machine... but it's my wife's machine.
>I've asked nicely but she's editing a Hercule Poirot video so I'm not
>allowed to reboot it for a while...
>
>I've told her I'm not making pancakes until I've tested it so expect a
>report Real Soon Now...
>  
>
OK that worked (the pancakes - the kernel's not doing so well...)

haze:~# uname -a
Linux haze 2.6.16-rc5-git4 #2 PREEMPT Wed Mar 1 19:07:58 UTC 2006 i686 GNU/Linux

The boot is pretty clean.
I ran xfs_repair -n on the LVM volume and got the following errors.
The repair reported a clean filesystem and the drive was not kicked out of
the RAID array, so that's a big improvement.

I was not able to trigger similar messages on ata1, but a simple dd
doesn't trigger them on ata2 either. (For various reasons xfs_repair
wouldn't run on ata1; I thought I'd leave it and report this first.)

ata2: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x04 { DriveStatusError }
ata2: no sense translation for status: 0x51
ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for status: 0x51
ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for status: 0x51
ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for status: 0x51
ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for status: 0x51
ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for status: 0x51
ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for status: 0x51
ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata2: status=0x51 { DriveReady SeekComplete Error }
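
Both kinds of translation in this log can be reproduced with a cut-down table. The sketch below covers only the two mappings visible above: error register bit ABRT (0x04) becomes sense key 0xb (ABORTED COMMAND) with ASC/ASCQ 00/00, while an error status with no recognised error bits produces the "no sense translation" message and falls back to MEDIUM ERROR 0x3/11/04 (unrecovered read error, auto-reallocate failed). The function name is hypothetical, and the real libata tables cover many more bits.

```c
#include <stdio.h>

/* SCSI sense keys used below. */
#define SK_MEDIUM_ERROR    0x03
#define SK_ABORTED_COMMAND 0x0b

#define ATA_ERR  0x01   /* status register: error bit */
#define ATA_ABRT 0x04   /* error register: command aborted */

struct sense { unsigned char sk, asc, ascq; };

/* Hypothetical, cut-down translator covering only the cases in the log. */
static struct sense ata_to_sense_sketch(unsigned char stat, unsigned char err,
                                        int port)
{
    struct sense s = {0, 0, 0};

    if (!(stat & ATA_ERR))
        return s;                       /* no error: no sense data */

    if (err & ATA_ABRT) {               /* 0x51/04 -> 0xb/00/00 */
        s.sk = SK_ABORTED_COMMAND;
        return s;
    }

    /* Nothing recognised: log it and fall back to "unrecovered read
     * error, auto-reallocate failed", as the 0x51/00 lines show. */
    printf("ata%d: no sense translation for status: 0x%02x\n", port, stat);
    s.sk = SK_MEDIUM_ERROR;
    s.asc = 0x11;
    s.ascq = 0x04;
    return s;
}
```

Calling `ata_to_sense_sketch(0x51, 0x00, 2)` prints `ata2: no sense translation for status: 0x51` and returns 0x3/11/04, which is the pattern repeated above.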

David

-- 



* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 18:37                                       ` Alan Cox
@ 2006-03-01 20:12                                         ` Phillip Susi
  2006-03-08 16:46                                           ` Alan Cox
  0 siblings, 1 reply; 147+ messages in thread
From: Phillip Susi @ 2006-03-01 20:12 UTC (permalink / raw)
  To: Alan Cox
  Cc: David Greaves, Mark Lord, Tejun Heo, Jeff Garzik, Justin Piszcz,
	linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

Alan Cox wrote:
> On Wed, 2006-03-01 at 17:33 +0000, David Greaves wrote:
>> As a user I prefer
>>   It Broke And I Don't Know Why
>> to
>>   Aborted Command
> 
> So what's the SCSI sense encoding for that?
> 

Wouldn't that just be 0/0/0?  IIRC the standard defines that as "NO
ADDITIONAL SENSE INFORMATION", which sounds to me like another way of saying
"I don't know what went wrong, but that didn't work".
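
A triplet like 0/0/0 or 0xb/00/00 ends up in a sense buffer. As a sketch of where the three numbers live, the helper below builds 18-byte fixed-format SCSI sense data (response code 0x70, sense key in the low nibble of byte 2, ASC and ASCQ in bytes 12 and 13). The helper name is hypothetical; 0xb/00/00 is the ABORTED COMMAND triplet from the logs in this thread.

```c
#include <string.h>

#define SENSE_LEN 18

/* Fill buf with fixed-format sense data for the given triplet. */
static void build_fixed_sense(unsigned char buf[SENSE_LEN],
                              unsigned char sk, unsigned char asc,
                              unsigned char ascq)
{
    memset(buf, 0, SENSE_LEN);
    buf[0]  = 0x70;            /* current error, fixed format */
    buf[2]  = sk & 0x0f;       /* sense key: low nibble of byte 2 */
    buf[7]  = SENSE_LEN - 8;   /* additional sense length */
    buf[12] = asc;             /* additional sense code */
    buf[13] = ascq;            /* additional sense code qualifier */
}
```

Building 0/0/0 this way yields a buffer whose only non-zero bytes are the response code and length, which is why it reads as "nothing to report" rather than "it broke".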




* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 19:49                                               ` David Greaves
@ 2006-03-03 19:38                                                 ` Justin Piszcz
  2006-03-03 22:46                                                   ` David Greaves
  2006-03-05 11:43                                                 ` Justin Piszcz
  1 sibling, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-03-03 19:38 UTC (permalink / raw)
  To: David Greaves
  Cc: Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds



On Wed, 1 Mar 2006, David Greaves wrote:

> David Greaves wrote:
>
>> Mark Lord wrote:
>>
>>
>>
>>> By the way, the latest 2.6.16-rc5-git4 is available,
>>> and has FUA turned off by default now.  So it should
>>> work with your drives, and *you* are expected to verify
>>> that for us all now.
>>>
>>>
>> Yeah, I know - I've got it on the machine... but it's my wife's machine.
>> I've asked nicely but she's editing a Hercule Poirot video so I'm not
>> allowed to reboot it for a while...
>>
>> I've told her I'm not making pancakes until I've tested it so expect a
>> report Real Soon Now...
>>
>>
> OK that worked (the pancakes - the kernel's not doing so well...)
>
> haze:~# uname -a
> Linux haze 2.6.16-rc5-git4 #2 PREEMPT Wed Mar 1 19:07:58 UTC 2006 i686 GNU/Linux
>
> The boot is pretty clean.
> I ran xfs_repair -n on the LVM volume and got the following errors.
> The repair reported a clean filesystem and the drive was not kicked out of
> the RAID array, so that's a big improvement.
>
> I was not able to trigger similar messages on ata1, but a simple dd
> doesn't trigger them on ata2 either. (For various reasons xfs_repair
> wouldn't run on ata1; I thought I'd leave it and report this first.)
>
> ata2: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: error=0x04 { DriveStatusError }
> ata2: no sense translation for status: 0x51
> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for status: 0x51
> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for status: 0x51
> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for status: 0x51
> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for status: 0x51
> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for status: 0x51
> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for status: 0x51
> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
> ata2: status=0x51 { DriveReady SeekComplete Error }
>
> David
>
> -- 
>

Running 2.6.16-rc5-git4, I have written 281GB so far over a period of 48+
hours with no errors yet :)

Will keep you updated if I see any errors, but so far, so good!

Thanks,

Justin.


* Re: LibPATA code issues / 2.6.15.4
  2006-03-03 19:38                                                 ` Justin Piszcz
@ 2006-03-03 22:46                                                   ` David Greaves
  2006-03-04 14:25                                                     ` Mark Lord
  0 siblings, 1 reply; 147+ messages in thread
From: David Greaves @ 2006-03-03 22:46 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

Just FYI - I'm away (in Canada) for 2 weeks so can't do any additional
testing until I return.

David

-- 



* Re: LibPATA code issues / 2.6.15.4
  2006-03-03 22:46                                                   ` David Greaves
@ 2006-03-04 14:25                                                     ` Mark Lord
  2006-03-06  6:13                                                       ` David Greaves
  0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-03-04 14:25 UTC (permalink / raw)
  To: David Greaves
  Cc: Justin Piszcz, Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

David Greaves wrote:
> Just FYI - I'm away (in Canada) for 2 weeks so can't do any additional
> testing until I return.

Am I correct that your last test on rc5-git4 was a failure?

But without the "opcode" display in the error messages,
so we have no idea exactly what caused the errors (again!)?

[Whatcha doin up here?]

Cheers



* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 19:49                                               ` David Greaves
  2006-03-03 19:38                                                 ` Justin Piszcz
@ 2006-03-05 11:43                                                 ` Justin Piszcz
  2006-03-05 12:41                                                   ` Justin Piszcz
  1 sibling, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-03-05 11:43 UTC (permalink / raw)
  To: David Greaves
  Cc: Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

On Wed, 1 Mar 2006, David Greaves wrote:

> David Greaves wrote:
>
>> Mark Lord wrote:
>>
>>
>>
>>> By the way, the latest 2.6.16-rc5-git4 is available,
>>> and has FUA turned off by default now.  So it should
>>> work with your drives, and *you* are expected to verify
>>> that for us all now.
>>>
>>>
>> Yeah, I know - I've got it on the machine... but it's my wife's machine.
>> I've asked nicely but she's editing a Hercule Poirot video so I'm not
>> allowed to reboot it for a while...
>>
>> I've told her I'm not making pancakes until I've tested it so expect a
>> report Real Soon Now...
>>
>>
> OK that worked (the pancakes - the kernel's not doing so well...)
>
> haze:~# uname -a
> Linux haze 2.6.16-rc5-git4 #2 PREEMPT Wed Mar 1 19:07:58 UTC 2006 i686 GNU/Linux
>
> The boot is pretty clean.
> I ran xfs_repair -n on the LVM volume and got the following errors.
> The repair reported a clean filesystem and the drive was not kicked out of
> the RAID array, so that's a big improvement.
>
> I was not able to trigger similar messages on ata1, but a simple dd
> doesn't trigger them on ata2 either. (For various reasons xfs_repair
> wouldn't run on ata1; I thought I'd leave it and report this first.)
>
> ata2: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: error=0x04 { DriveStatusError }
> ata2: no sense translation for status: 0x51
> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for status: 0x51
> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for status: 0x51
> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for status: 0x51
> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for status: 0x51
> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for status: 0x51
> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for status: 0x51
> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
> ata2: status=0x51 { DriveReady SeekComplete Error }
>
> David
>
> -- 
>

Using 2.6.16-rc5-git4 and removing a directory of around 5.0GB of files 
while streaming a 1MB/s video stream on another SATA disk, the I/O
seemed to freeze up for a moment and I got this error:

[4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22

Only one such message in dmesg; any idea what causes this error?
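
For reading that line: stat 0x50 is the ATA status register (DriveReady and SeekComplete, with neither Busy nor Error set), and host_stat appears to be the bus-master IDE (SFF-8038i style) status register of the ICH5. The sketch below decodes host_stat under that assumption, with bit positions 0x01 active, 0x02 error, 0x04 interrupt, 0x20/0x40 drive 0/1 DMA capable. The struct and function names are hypothetical, and the decode only labels bits; it does not explain why the command timed out.

```c
#include <stdbool.h>

#define BMDMA_ACTIVE   0x01  /* bus-master transfer in progress */
#define BMDMA_ERR      0x02  /* DMA error reported */
#define BMDMA_INTR     0x04  /* device asserted its interrupt */
#define BMDMA_DRV0_CAP 0x20  /* drive 0 DMA capable */
#define BMDMA_DRV1_CAP 0x40  /* drive 1 DMA capable */

struct bmdma_status {
    bool active, error, irq, drv0_dma, drv1_dma;
};

/* Split a bus-master IDE status byte into named flags. */
static struct bmdma_status decode_bmdma(unsigned char host_stat)
{
    struct bmdma_status s;

    s.active   = (host_stat & BMDMA_ACTIVE) != 0;
    s.error    = (host_stat & BMDMA_ERR) != 0;
    s.irq      = (host_stat & BMDMA_INTR) != 0;
    s.drv0_dma = (host_stat & BMDMA_DRV0_CAP) != 0;
    s.drv1_dma = (host_stat & BMDMA_DRV1_CAP) != 0;
    return s;
}
```

Under these assumptions `decode_bmdma(0x22)` reports the error bit and the drive-0 DMA-capable bit set, with no transfer active and no interrupt pending.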



* Re: LibPATA code issues / 2.6.15.4
  2006-03-05 11:43                                                 ` Justin Piszcz
@ 2006-03-05 12:41                                                   ` Justin Piszcz
  2006-03-05 22:58                                                     ` Mark Lord
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-03-05 12:41 UTC (permalink / raw)
  To: David Greaves
  Cc: Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds



On Sun, 5 Mar 2006, Justin Piszcz wrote:

> On Wed, 1 Mar 2006, David Greaves wrote:
>
>> David Greaves wrote:
>> 
>>> Mark Lord wrote:
>>> 
>>> 
>>> 
>>>> By the way, the latest 2.6.16-rc5-git4 is available,
>>>> and has FUA turned off by default now.  So it should
>>>> work with your drives, and *you* are expected to verify
>>>> that for us all now.
>>>> 
>>>> 
>>> Yeah, I know - I've got it on the machine... but it's my wife's machine.
>>> I've asked nicely but she's editing a Hercule Poirot video so I'm not
>>> allowed to reboot it for a while...
>>> 
>>> I've told her I'm not making pancakes until I've tested it so expect a
>>> report Real Soon Now...
>>> 
>>> 
>> OK that worked (the pancakes - the kernel's not doing so well...)
>> 
>> haze:~# uname -a
>> Linux haze 2.6.16-rc5-git4 #2 PREEMPT Wed Mar 1 19:07:58 UTC 2006 i686 GNU/Linux
>> 
>> The boot is pretty clean.
>> I ran xfs_repair -n on the LVM volume and got the following errors.
>> The repair reported a clean filesystem and the drive was not kicked out of
>> the RAID array, so that's a big improvement.
>>
>> I was not able to trigger similar messages on ata1, but a simple dd
>> doesn't trigger them on ata2 either. (For various reasons xfs_repair
>> wouldn't run on ata1; I thought I'd leave it and report this first.)
>> 
>> ata2: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> ata2: status=0x51 { DriveReady SeekComplete Error }
>> ata2: error=0x04 { DriveStatusError }
>> ata2: no sense translation for status: 0x51
>> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
>> ata2: status=0x51 { DriveReady SeekComplete Error }
>> ata2: no sense translation for status: 0x51
>> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
>> ata2: status=0x51 { DriveReady SeekComplete Error }
>> ata2: no sense translation for status: 0x51
>> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
>> ata2: status=0x51 { DriveReady SeekComplete Error }
>> ata2: no sense translation for status: 0x51
>> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
>> ata2: status=0x51 { DriveReady SeekComplete Error }
>> ata2: no sense translation for status: 0x51
>> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
>> ata2: status=0x51 { DriveReady SeekComplete Error }
>> ata2: no sense translation for status: 0x51
>> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
>> ata2: status=0x51 { DriveReady SeekComplete Error }
>> ata2: no sense translation for status: 0x51
>> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
>> ata2: status=0x51 { DriveReady SeekComplete Error }
>> 
>> David
>> 
>> -- 
>> 
>
> Using 2.6.16-rc5-git4 and removing a directory of around 5.0GB of files while 
> streaming a 1MB/s video stream on another SATA disk, the I/O seemed to
> freeze up for a moment and I got this error:
>
> [4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22
>
> Only one such message in dmesg; any idea what causes this error?
>
>

The drive it occurred on was a 74GB Raptor on an ICH5 controller.

[4294673.245000]   Vendor: ATA       Model: WDC WD740GD-00FL  Rev: 33.0
0000:00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)




* Re: LibPATA code issues / 2.6.15.4
  2006-03-05 12:41                                                   ` Justin Piszcz
@ 2006-03-05 22:58                                                     ` Mark Lord
  2006-03-05 23:00                                                       ` Mark Lord
  2006-03-05 23:39                                                       ` Jeff Garzik
  0 siblings, 2 replies; 147+ messages in thread
From: Mark Lord @ 2006-03-05 22:58 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

Justin Piszcz wrote:
>
>> Using 2.6.16-rc5-git4 and removing a directory of around 5.0GB of 
>> files while streaming a 1MB/s video stream on another SATA disk, the
>> I/O seemed to freeze up for a moment and I got this error:
>>
>> [4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22
>>
>> Only one such message in dmesg; any idea what causes this error?
> 
> The drive it occurred on was a 74GB Raptor on an ICH5 controller.
> 
> [4294673.245000]   Vendor: ATA       Model: WDC WD740GD-00FL  Rev: 33.0
> 0000:00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)

SCSI opcode 0x35 is SYNCHRONIZE_CACHE.

Pity we don't know exactly what that got translated to by libata.
It would have been either a FLUSH_CACHE of some kind,
or possibly(?) one of the _FUA_ commands.

Cheers



* Re: LibPATA code issues / 2.6.15.4
  2006-03-05 22:58                                                     ` Mark Lord
@ 2006-03-05 23:00                                                       ` Mark Lord
  2006-03-05 23:19                                                         ` Justin Piszcz
  2006-03-05 23:39                                                       ` Jeff Garzik
  1 sibling, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-03-05 23:00 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

Mark Lord wrote:
> Justin Piszcz wrote:
>>
>>> Using 2.6.16-rc5-git4 and removing a directory of around 5.0GB of 
>>> files while streaming a 1MB/s video stream on another (SATA disk), 
>>> the I/O seemed to freeze up for a moment and I got this error:
>>>
>>> [4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22
>>>
>>> Only 1 in dmesg, any idea what causes this error?
>>
>> The drive it occurred on was a 74GB Raptor on an ICH5 controller.
>>
>> [4294673.245000]   Vendor: ATA       Model: WDC WD740GD-00FL  Rev: 33.0
>> 0000:00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
> 
> SCSI opcode 0x35 is SYNCHRONIZE_CACHE.

Oh, wait a sec.. on that path, libata actually does show the ATA opcode,
which would have been WRITE_DMA_EXT.  Not an FUA command.

Dunno what it's complaining about, though.
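Mark's correction turns on the fact that one byte value names different commands in the SCSI command set (byte 0 of the CDB) and in the ATA command register. A small lookup sketch makes that concrete; the tables are hand-written for the opcodes seen in this thread plus the flush and FUA commands mentioned above, and are not taken from libata's source:

```python
# SCSI opcodes: byte 0 of the CDB that the SCSI midlayer hands to libata.
SCSI_OPCODES = {
    0x25: "READ CAPACITY(10)",
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
    0x35: "SYNCHRONIZE CACHE(10)",
}

# ATA commands: the value libata writes to the drive's command register.
ATA_COMMANDS = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0x3D: "WRITE DMA FUA EXT",
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
}


def both_meanings(opcode):
    """Return (scsi_name, ata_name) for one byte; None where a table has no entry."""
    return SCSI_OPCODES.get(opcode), ATA_COMMANDS.get(opcode)


if __name__ == "__main__":
    # Per Mark's correction, "ata1: command 0x35 timeout" shows the ATA command.
    for op in (0x25, 0x28, 0x35):
        scsi, ata = both_meanings(op)
        print(f"0x{op:02x}: SCSI={scsi}  ATA={ata}")
```

With the two namespaces side by side, "command 0x35" read as an ATA command is WRITE DMA EXT, while the same byte as a SCSI opcode would be SYNCHRONIZE CACHE(10).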


* Re: LibPATA code issues / 2.6.15.4
  2006-03-05 23:00                                                       ` Mark Lord
@ 2006-03-05 23:19                                                         ` Justin Piszcz
  0 siblings, 0 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-03-05 23:19 UTC (permalink / raw)
  To: Mark Lord
  Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

On Sun, 5 Mar 2006, Mark Lord wrote:

> Mark Lord wrote:
>> Justin Piszcz wrote:
>>>
>>>> Using 2.6.16-rc5-git4 and removing a directory of around 5.0GB of files
>>>> while streaming a 1MB/s video stream on another (SATA disk), the I/O
>>>> seemed to freeze up for a moment and I got this error:
>>>>
>>>> [4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22
>>>>
>>>> Only 1 in dmesg, any idea what causes this error?
>>>
>>> The drive it occurred on was a 74GB Raptor on an ICH5 controller.
>>>
>>> [4294673.245000]   Vendor: ATA       Model: WDC WD740GD-00FL  Rev: 33.0
>>> 0000:00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
>>
>> SCSI opcode 0x35 is SYNCHRONIZE_CACHE.
>
> Oh, wait a sec.. on that path, libata actually does show the ATA opcode,
> which would have been WRITE_DMA_EXT.  Not an FUA command.
>
> Dunno what it's complaining about, though.
>

Well, I know what it was now...

The hard drive (the 74GB Raptor) failed...

[4294685.928000] process `syslogd' is using obsolete setsockopt SO_BSDCOMPAT
[4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22
[4347012.243000] ata1: command 0x25 timeout, stat 0x50 host_stat 0x20
[4347157.486000] ata1: command 0x25 timeout, stat 0x80 host_stat 0x22
[4347157.486000] ata1: translated ATA stat/err 0x80/00 to SCSI SK/ASC/ASCQ 0xb/47/00
[4347157.486000] ata1: status=0x80 { Busy }
[4347157.486000] sd 0:0:0:0: SCSI error: return code = 0x8000002
[4347157.486000] sda: Current: sense key=0xb
[4347157.486000]     ASC=0x47 ASCQ=0x0
[4347157.486000] end_request: I/O error, dev sda, sector 27646928
[4347157.486000] Buffer I/O error on device sda, logical block 3455866
[4347157.486000] ATA: abnormal status 0x80 on port 0xC007
[4347157.486000] ATA: abnormal status 0x80 on port 0xC007
[4347157.486000] ATA: abnormal status 0x80 on port 0xC007
[4347187.486000] ata1: command 0x25 timeout, stat 0x50 host_stat 0x21
[4347407.657000] ATA: abnormal status 0x80 on port 0xC007
[4347407.657000] ATA: abnormal status 0x80 on port 0xC007
[4347407.657000] ATA: abnormal status 0x80 on port 0xC007
[4347437.656000] ata1: command 0x35 timeout, stat 0x80 host_stat 0x21
[4347437.656000] ata1: translated ATA stat/err 0x80/00 to SCSI SK/ASC/ASCQ 0xb/47/00
[4347437.656000] ata1: status=0x80 { Busy }
[4347437.656000] sd 0:0:0:0: SCSI error: return code = 0x8000002
[4347437.656000] sda: Current: sense key=0xb
[4347437.656000]     ASC=0x47 ASCQ=0x0
[4347437.656000] end_request: I/O error, dev sda, sector 76339746
[4347437.656000] ATA: abnormal status 0x80 on port 0xC007
[4347437.656000] ATA: abnormal status 0x80 on port 0xC007
[4347437.656000] ATA: abnormal status 0x80 on port 0xC007
[4347467.656000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x21
[4347467.656000] Device sda2 - XFS write error in file system meta-data block 0x449af90 in sda2
[4347467.656000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x21
[4347467.656000] Device sda2 - XFS write error in file system meta-data block 0x449af90 in sda2
[4347497.656000] ata1: command 0x25 timeout, stat 0x50 host_stat 0x21
[4347527.663000] ata1: command 0x25 timeout, stat 0x50 host_stat 0x22
[4347527.663000] Unable to handle kernel paging request at virtual address 858f9a70
[4347527.663000]  printing eip:
[4347527.663000] c021ff87
[4347527.663000] *pde = 00000000
[4347527.663000] Oops: 0000 [#1]
[4347527.663000] PREEMPT SMP
[4347527.663000] CPU:    0
[4347527.663000] EIP:    0060:[<c021ff87>]    Not tainted VLI
[4347527.663000] EFLAGS: 00210282   (2.6.16-rc5-git4 #3)
[4347527.663000] EIP is at xfs_dir2_block_lookup_int+0xb0/0x1e9
[4347527.663000] eax: 9b86a560   ebx: 00000000   ecx: cdc352b0   edx: 00000000
[4347527.663000] esi: 177504f0   edi: 5e5cb7f4   ebp: 00000000   esp: f6c8bd18
[4347527.663000] ds: 007b   es: 007b   ss: 0068
[4347527.663000] Process nfsd (pid: 1359, threadinfo=f6c8a000 task=f7c14030)
[4347527.663000] Stack: <0>00000000 c91fa944 00000000 021a0480 00000000 f6c8bd64 00000000 f6c8bd84
[4347527.663000]        f6c8bd88 f6c8bdac c73e7438 f6f916c0 00000004 f7dbc800 00000000 f3aa2000
[4347527.663000]        61a5869b c91fa9ac f7db9380 c73e7438 00000000 c91fa944 f6c8bdac 00000000
[4347527.663000] Call Trace:
[4347527.663000]  [<c02200da>] xfs_dir2_block_lookup+0x1a/0xa1
[4347527.663000]  [<c021f721>] xfs_dir2_lookup+0xd3/0x151
[4347527.663000]  [<c035e9d3>] ip_output+0x171/0x2de
[4347527.663000]  [<c035e1c9>] ip_finish_output+0x0/0x22d
[4347527.663000]  [<c024e836>] xfs_dir_lookup_int+0x40/0x125
[4347527.663000]  [<c0150b0d>] cache_alloc_refill+0xf1/0x50c
[4347527.663000]  [<c0252b39>] xfs_lookup+0x5f/0x88
[4347527.663000]  [<c02613cc>] linvfs_lookup+0x52/0x99
[4347527.663000]  [<c0161563>] __lookup_hash+0xc4/0xf3
[4347527.663000]  [<c016160f>] lookup_one_len+0x7d/0x84
[4347527.663000]  [<c01ad6c7>] nfsd_lookup+0xc0/0x4b2
[4347527.663000]  [<c01b4bcd>] nfsd3_proc_lookup+0xa5/0xf3
[4347527.663000]  [<c01a9497>] nfsd_dispatch+0x9c/0x214
[4347527.663000]  [<c039fb21>] svc_process+0x3bf/0x69e
[4347527.663000]  [<c01a97bc>] nfsd+0x1ad/0x331
[4347527.663000]  [<c01a960f>] nfsd+0x0/0x331
[4347527.663000]  [<c0100e95>] kernel_thread_helper+0x5/0xb
[4347527.663000] Code: 89 44 24 40 89 c2 0f ca 8d 04 d5 00 00 00 00 29 c6 8d 42 ff 8b 4c 24 24 8b 79 14 31 d2 eb 07 8d 51 01 39 c2 7f 17 8d 0c 02 d1 f9 <8b> 1c ce 0f cb 39 df 74 2a 77 e9 8d 41 ff 39 c2 7e e9 8b 74 24
[4347527.663000]
[4347527.663000]  <4>ATA: abnormal status 0x80 on port 0xC007
[4347567.674000] ATA: abnormal status 0x80 on port 0xC007
[4347567.674000] ATA: abnormal status 0x80 on port 0xC007
[4347597.674000] ata1: command 0x35 timeout, stat 0x80 host_stat 0x21
[4347597.674000] ata1: translated ATA stat/err 0x80/00 to SCSI SK/ASC/ASCQ 0xb/47/00
[4347597.674000] ata1: status=0x80 { Busy }
[4347597.674000] sd 0:0:0:0: SCSI error: return code = 0x8000002
[4347597.674000] sda: Current: sense key=0xb
[4347597.674000]     ASC=0x47 ASCQ=0x0
[4347597.674000] end_request: I/O error, dev sda, sector 4401810
[4347597.674000] ATA: abnormal status 0x80 on port 0xC007
[4347597.674000] ATA: abnormal status 0x80 on port 0xC007
[4347597.674000] ATA: abnormal status 0x80 on port 0xC007
[4347627.674000] ata1: command 0x35 timeout, stat 0x80 host_stat 0x21
[4347627.674000] ata1: translated ATA stat/err 0x80/00 to SCSI SK/ASC/ASCQ 0xb/47/00
[4347627.674000] ata1: status=0x80 { Busy }
[4347627.674000] sd 0:0:0:0: SCSI error: return code = 0x8000002
[4347627.674000] sda: Current: sense key=0xb
[4347627.674000]     ASC=0x47 ASCQ=0x0
[4347627.674000] end_request: I/O error, dev sda, sector 110074018
[4347627.674000] ATA: abnormal status 0x80 on port 0xC007
[4347627.674000] ATA: abnormal status 0x80 on port 0xC007
[4347627.674000] ATA: abnormal status 0x80 on port 0xC007
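Each "command 0x.. timeout, stat 0x.. host_stat 0x.." line packs two registers: the ATA status register and, assuming an SFF-8038i style bus-master (BMDMA) controller such as the ICH5, the bus-master status register. A decoding sketch; the status labels mirror the ones libata prints in this log, while the host_stat labels are descriptive names chosen here:

```python
# ATA status register bits, most significant first, with the labels libata
# prints (e.g. "status=0x51 { DriveReady SeekComplete Error }").
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Bus-master IDE (BMDMA) status register bits; bits 3 and 4 are reserved.
BMDMA_STATUS_BITS = [
    (0x01, "active"),
    (0x02, "error"),
    (0x04, "interrupt"),
    (0x20, "drive0 DMA capable"),
    (0x40, "drive1 DMA capable"),
    (0x80, "simplex only"),
]


def decode_ata_status(stat):
    # While BSY is set the other status bits are not valid, so report only Busy.
    if stat & 0x80:
        return ["Busy"]
    return [name for bit, name in ATA_STATUS_BITS if stat & bit]


def decode_host_stat(host_stat):
    return [name for bit, name in BMDMA_STATUS_BITS if host_stat & bit]


if __name__ == "__main__":
    # Values from "ata1: command 0x35 timeout, stat 0x50 host_stat 0x22".
    print(decode_ata_status(0x50), decode_host_stat(0x22))
```

Decoded this way, the first timeout shows the drive reporting ready (0x50) while the bus-master error bit is set (host_stat 0x22); the 0x21 lines have the active bit set instead, suggesting the transfer was still in progress, and the 0x80 lines show a drive stuck busy, which is consistent with the failure Justin describes.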

..

ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x40 { UncorrectableError }
SCSI error : <0 0 0 0> return code = 0x8000002
sda: Current: sense key=0x3
     ASC=0x11 ASCQ=0x4
end_request: I/O error, dev sda, sector 66006018
Buffer I/O error on device sda2, logical block 61604208
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x40 { UncorrectableError }
SCSI error : <0 0 0 0> return code = 0x8000002
sda: Current: sense key=0x3
     ASC=0x11 ASCQ=0x4
end_request: I/O error, dev sda, sector 66006019
Buffer I/O error on device sda2, logical block 61604209
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x40 { UncorrectableError }
SCSI error : <0 0 0 0> return code = 0x8000002
sda: Current: sense key=0x3
     ASC=0x11 ASCQ=0x4
end_request: I/O error, dev sda, sector 66006020
Buffer I/O error on device sda2, logical block 61604210
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x40 { UncorrectableError }
SCSI error : <0 0 0 0> return code = 0x8000002
sda: Current: sense key=0x3
     ASC=0x11 ASCQ=0x4
end_request: I/O error, dev sda, sector 66006021
Buffer I/O error on device sda2, logical block 61604211
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x40 { UncorrectableError }
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x40 { UncorrectableError }
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x40 { UncorrectableError }
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x40 { UncorrectableError }
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x40 { UncorrectableError }
SCSI error : <0 0 0 0> return code = 0x8000002
sda: Current: sense key=0x3
     ASC=0x11 ASCQ=0x4
end_request: I/O error, dev sda, sector 66006018
Buffer I/O error on device sda2, logical block 61604208
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x40 { UncorrectableError }
SCSI error : <0 0 0 0> return code = 0x8000002
sda: Current: sense key=0x3
     ASC=0x11 ASCQ=0x4
end_request: I/O error, dev sda, sector 66006019
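The sector and block numbers in these logs can be cross-checked with a little arithmetic, assuming 512-byte sectors. On sda, sector 27646928 is logical block 3455866, which fits 4096-byte blocks (8 sectors each) counted from the start of the disk. On sda2, consecutive sectors 66006018..66006021 map to consecutive blocks 61604208..61604211, which fits 512-byte blocks counted from a partition starting at sector 4401810. That start offset is an inference from the numbers, not something the logs state:

```python
SECTOR_BYTES = 512


def partition_start(sector, block, block_bytes):
    """Infer the first sector of the device a (disk sector, logical block) pair refers to."""
    sectors_per_block = block_bytes // SECTOR_BYTES
    return sector - block * sectors_per_block


# Whole-disk device sda with 4 KiB blocks: the offset should come out as 0.
print(partition_start(27646928, 3455866, 4096))

# sda2 with 512-byte blocks: every pair should give the same start sector.
pairs = [(66006018, 61604208), (66006019, 61604209),
         (66006020, 61604210), (66006021, 61604211)]
print({partition_start(s, b, 512) for s, b in pairs})
```

As a sanity check, 4401810 is exactly 274 x 16065, i.e. a cylinder boundary in the usual 255-head, 63-sectors-per-track geometry, and it is also the sector reported in the end_request error at 4347597.674000 above.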

..

I later ran mkfs.ext2 -c /dev/sda and it kept returning errors such as 
these:

ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x40 { UncorrectableError }
ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x40 { UncorrectableError }
ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x40 { UncorrectableError }
ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x40 { UncorrectableError }
ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x40 { UncorrectableError }
SCSI error : <2 0 0 0> return code = 0x8000002
sda: Current: sense key=0x3
     ASC=0x11 ASCQ=0x4
end_request: I/O error, dev sda, sector 66006016
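The error byte printed next to status=0x51 is the ATA error register, which is only meaningful when the Error bit (0x01) of the status register is set. A sketch decoding the bits relevant to this thread; the mnemonics are the ATA ones, and libata prints 0x04 as "DriveStatusError" and 0x40 as "UncorrectableError":

```python
# ATA error register bits (subset), valid only when status bit 0x01 (ERR) is set.
ATA_ERROR_BITS = [
    (0x80, "ICRC", "interface CRC error"),
    (0x40, "UNC", "uncorrectable data error"),
    (0x10, "IDNF", "requested sector ID not found"),
    (0x04, "ABRT", "command aborted"),
]


def decode_error(status, error):
    """Return the mnemonics for the error register, or [] if ERR is clear."""
    if not status & 0x01:
        return []
    return [mnemonic for bit, mnemonic, _ in ATA_ERROR_BITS if error & bit]


if __name__ == "__main__":
    # "status=0x51 ... error=0x40": a media read failure (UNC).
    print(decode_error(0x51, 0x40))
    # "status=0x51 ... error=0x04": the drive aborted the command (ABRT).
    print(decode_error(0x51, 0x04))
```

UNC on the same sectors run after run, as in the mkfs.ext2 -c output, points at the medium rather than the driver, which fits what WD's tool reported.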

I ran WD's diagnostic tool on the drive and it confirmed the drive had problems.

Luckily I had a spare Raptor, so I restored from backup and am now back 
up and running with no errors so far.

Justin.



* Re: LibPATA code issues / 2.6.15.4
  2006-03-05 22:58                                                     ` Mark Lord
  2006-03-05 23:00                                                       ` Mark Lord
@ 2006-03-05 23:39                                                       ` Jeff Garzik
  2006-04-21 19:14                                                         ` LibPATA code issues / 2.6.16 (previously, 2.6.15.x) Justin Piszcz
  1 sibling, 1 reply; 147+ messages in thread
From: Jeff Garzik @ 2006-03-05 23:39 UTC (permalink / raw)
  To: Mark Lord
  Cc: Justin Piszcz, David Greaves, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

Mark Lord wrote:
> SCSI opcode 0x35 is SYNCHRONIZE_CACHE.
> 
> Pity we don't know exactly what that got translated to by libata.

Gave up on reading code?  If not, we know exactly what it was translated 
into.

	Jeff



* Re: LibPATA code issues / 2.6.15.4
  2006-03-04 14:25                                                     ` Mark Lord
@ 2006-03-06  6:13                                                       ` David Greaves
  2006-03-21 18:11                                                         ` David Greaves
  0 siblings, 1 reply; 147+ messages in thread
From: David Greaves @ 2006-03-06  6:13 UTC (permalink / raw)
  To: Mark Lord
  Cc: Justin Piszcz, Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

Mark Lord wrote:
> David Greaves wrote:
>> Just FYI - I'm away (in Canada) for 2 weeks so can't do any additional
>> testing until I return.
>
> Am I correct, in that your last test on rc5-git4 was a failure?
It was *much* better than rc4 but it did have an error.
I *think* the problem I'm seeing is likely to be similar to the one I 
originally reported (on 2.6.15 IIRC)
Same sporadic warning/error which didn't usually trigger the 
raid-boot-the-disk behaviour that the FUA code seemed to.
> But without the "opcode" display in the error messages,
> so we have no idea exactly what caused the errors (again!)?
Yes. I thought the/an opcode-verbose patch was in there, but I guess not.
I don't have remote console access to the machine so wouldn't be able to 
carry out reliable kernel tests - sorry.
Of course I'll do this as soon as I return.
>
> [Whatcha doin up here?]
[:) 2 weeks skiing in Whistler this time (10 days canoeing in 
Algonquin last time!). Canada's great!!]

David



* Re: LibPATA code issues / 2.6.15.4
  2006-02-28 21:04                                   ` Bill Davidsen
@ 2006-03-08  2:57                                     ` Mark Lord
  2006-03-08  3:18                                       ` Dave Jones
  2006-03-08 15:37                                       ` Bill Davidsen
  0 siblings, 2 replies; 147+ messages in thread
From: Mark Lord @ 2006-03-08  2:57 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Jeff Garzik, linux-kernel, IDE/ATA development list, axboe, albertcc

Bill Davidsen wrote:
>
> Is there a version of that which will build on x86? I grabbed the 
> version offered at freshmeat, but it won't compile on any x86 distro or 
> gcc version to which I have access. RH8, RH9, FC1, FC3, FC4, ubuntu... 
> with or without using the suggested alternate header.

hdparm-6.5 is the current version now.  Both it, and 6.4,
build/install/run cleanly on Ubuntu-5.10, Debian-Sarge,
and SLES9-SP3.

You seem to be having trouble on only Redhat distros..
I guess they've done something unfriendly again.

Care to be more specific about what Redhat is doing?

Cheers


* Re: LibPATA code issues / 2.6.15.4
  2006-03-08  2:57                                     ` Mark Lord
@ 2006-03-08  3:18                                       ` Dave Jones
  2006-03-08  3:23                                         ` Mark Lord
  2006-03-08 15:37                                       ` Bill Davidsen
  1 sibling, 1 reply; 147+ messages in thread
From: Dave Jones @ 2006-03-08  3:18 UTC (permalink / raw)
  To: Mark Lord
  Cc: Bill Davidsen, Jeff Garzik, linux-kernel,
	IDE/ATA development list, axboe, albertcc

On Tue, Mar 07, 2006 at 09:57:07PM -0500, Mark Lord wrote:
 > Bill Davidsen wrote:
 > >
 > >Is there a version of that which will build on x86? I grabbed the 
 > >version offered at freshmeat, but it won't compile on any x86 distro or 
 > >gcc version to which I have access. RH8, RH9, FC1, FC3, FC4, ubuntu... 
 > >with or without using the suggested alternate header.
 > 
 > hdparm-6.5 is the current version now.  Both it, and 6.4,
 > build/install/run cleanly on Ubuntu-5.10, Debian-Sarge,
 > and SLES9-SP3.
 > 
 > You seem to be having trouble on only Redhat distros..
 > I guess they've done something unfriendly again.
 > 
 > Care to be more specific about what Redhat is doing?

looks like our userspace includes aren't up to date with some of the kernel
changes, so currently they're lacking the ide_task_request_t and related
taskfile bits.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=184349

		Dave

-- 
http://www.codemonkey.org.uk


* Re: LibPATA code issues / 2.6.15.4
  2006-03-08  3:18                                       ` Dave Jones
@ 2006-03-08  3:23                                         ` Mark Lord
  0 siblings, 0 replies; 147+ messages in thread
From: Mark Lord @ 2006-03-08  3:23 UTC (permalink / raw)
  To: Dave Jones, Mark Lord, Bill Davidsen, Jeff Garzik, linux-kernel,
	IDE/ATA development list, axboe, albertcc

Dave Jones wrote:
>
> looks like our userspace includes aren't up to date with some of the kernel
> changes, so currently they're lacking the ide_task_request_t and related
> taskfile bits.
> 
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=184349

Ahh.. Thanks, Dave.

hdparm-6.6 being released *now*, with that stuff #ifdef'd out when
the necessary header structs are missing.

It builds/runs for me, on RHEL4 at least.

Cheers


* Re: LibPATA code issues / 2.6.15.4
  2006-03-08  2:57                                     ` Mark Lord
  2006-03-08  3:18                                       ` Dave Jones
@ 2006-03-08 15:37                                       ` Bill Davidsen
  1 sibling, 0 replies; 147+ messages in thread
From: Bill Davidsen @ 2006-03-08 15:37 UTC (permalink / raw)
  To: Mark Lord
  Cc: Bill Davidsen, Jeff Garzik, linux-kernel,
	IDE/ATA development list, axboe, albertcc

On Tue, 7 Mar 2006, Mark Lord wrote:

> Bill Davidsen wrote:
> >
> > Is there a version of that which will build on x86? I grabbed the 
> > version offered at freshmeat, but it won't compile on any x86 distro or 
> > gcc version to which I have access. RH8, RH9, FC1, FC3, FC4, ubuntu... 
> > with or without using the suggested alternate header.
> 
> hdparm-6.5 is the current version now.  Both it, and 6.4,
> build/install/run cleanly on Ubuntu-5.10, Debian-Sarge,
> and SLES9-SP3.
> 
> You seem to be having trouble on only Redhat distros..
> I guess they've done something unfriendly again.
> 
> Care to be more specific about what Redhat is doing?

I'll mail you the first few hundred errors from the compiler after I go 
find 6.5 and try that. My ubuntu tester reported similar results, so I'm 
not sure what we are doing.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
Doing interesting things with little computers since 1979



* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 20:12                                         ` Phillip Susi
@ 2006-03-08 16:46                                           ` Alan Cox
  0 siblings, 0 replies; 147+ messages in thread
From: Alan Cox @ 2006-03-08 16:46 UTC (permalink / raw)
  To: Phillip Susi
  Cc: David Greaves, Mark Lord, Tejun Heo, Jeff Garzik, Justin Piszcz,
	linux-kernel, IDE/ATA development list, albertcc, axboe, Linus

On Wed, 2006-03-01 at 15:12 -0500, Phillip Susi wrote:
> >>   It Broke And I Don't Know Why
> >> to
> >>   Aborted Command
> > 
> > So whats the SCSI sense encoding for that ?
> > 
> 
> Wouldn't that just be 0/0/0?  IIRC the standard defines that as "NO 
> ADDITIONAL SENSE DATA" which sounds to me like another way of saying "I 
> don't know what went wrong, but that didn't work".

The 0/0/0 sense is already used. The question is which error (sense key)
to use with that sense. At the moment I'm using aborted command.
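For reference, here is a sketch decoding the sense triples libata produced in this thread. The strings follow the standard SPC names, and the tables are a hand-picked subset, not a complete list:

```python
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}

# Additional sense code / qualifier pairs seen in this thread.
ASC_ASCQ = {
    (0x00, 0x00): "NO ADDITIONAL SENSE INFORMATION",
    (0x11, 0x04): "UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED",
    (0x47, 0x00): "SCSI PARITY ERROR",
}


def describe_sense(sk, asc, ascq):
    """Render an SK/ASC/ASCQ triple as text, falling back to raw hex if unknown."""
    key = SENSE_KEYS.get(sk, f"sense key 0x{sk:x}")
    detail = ASC_ASCQ.get((asc, ascq), f"ASC 0x{asc:02x} ASCQ 0x{ascq:02x}")
    return f"{key} / {detail}"


if __name__ == "__main__":
    for triple in [(0xB, 0x00, 0x00), (0x3, 0x11, 0x04), (0xB, 0x47, 0x00)]:
        print(describe_sense(*triple))
```

Read this way, the 0xb/00/00 seen earlier in the thread is Aborted Command with no additional sense information, the kind of generic encoding discussed here, whereas 0x3/11/04 carries specific information about a failed read, and 0xb/47/00 is what libata produced for the busy (0x80) drive in Justin's log.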



* Re: LibPATA code issues / 2.6.15.4
  2006-03-06  6:13                                                       ` David Greaves
@ 2006-03-21 18:11                                                         ` David Greaves
  2006-03-22 15:23                                                           ` David Greaves
  0 siblings, 1 reply; 147+ messages in thread
From: David Greaves @ 2006-03-21 18:11 UTC (permalink / raw)
  To: Mark Lord
  Cc: Justin Piszcz, Jeff Garzik, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

David Greaves wrote:

> Mark Lord wrote:
>
>> David Greaves wrote:
>>
>>> Just FYI - I'm away (in Canada) for 2 weeks so can't do any additional
>>> testing until I return.
>>
>>
>> Am I correct, in that your last test on rc5-git4 was a failure?
>
> It was *much* better than rc4 but it did have an error.
> I *think* the problem I'm seeing is likely to be similar to the one I
> originally reported (on 2.6.15 IIRC)
> Same sporadic warning/error which didn't usually trigger the
> raid-boot-the-disk behaviour that the FUA code seemed to.
>
>> But without the "opcode" display in the error messages,
>> so we have no idea exactly what caused the errors (again!)?
>
> Yes. I thought the/an opcode-verbose patch was in there, but I guess not.
> I don't have remote console access to the machine so wouldn't be able
> to carry out reliable kernel tests - sorry.
> Of course I'll do this as soon as I return.

Hi

Back now :)

I've upgraded to 2.6.16 and applied your verbosity patches.

I've persuaded my array to re-assemble and during the resync I got these
messages

dmesg:
ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
...(18mins later)
ata1: no sense translation for op=0x28 cmd=0x25 status: 0x51
ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata1: status=0x51 { DriveReady SeekComplete Error }
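With the verbosity patches, the translation lines carry the SCSI opcode (op=) and the ATA command (cmd=) as well. For counting such messages over a day of syslog, a regex sketch along these lines may help; it assumes each message sits on one unwrapped line, and it also accepts the older format without op=/cmd=. The function name is hypothetical, not part of any existing tool:

```python
import re

TRANSLATED_RE = re.compile(
    r"(?P<port>ata\d+): translated "
    r"(?:op=0x(?P<op>[0-9a-fA-F]+) cmd=0x(?P<cmd>[0-9a-fA-F]+) )?"
    r"ATA stat/err 0x(?P<stat>[0-9a-fA-F]+)/(?P<err>[0-9a-fA-F]+) "
    r"to SCSI SK/ASC/ASCQ 0x(?P<sk>[0-9a-fA-F]+)/(?P<asc>[0-9a-fA-F]+)/(?P<ascq>[0-9a-fA-F]+)"
)


def parse_translated(line):
    """Return a dict of the decoded fields, or None if the line does not match."""
    m = TRANSLATED_RE.search(line)  # search(), so a syslog prefix is fine
    if m is None:
        return None
    fields = {"port": m.group("port")}
    for name in ("op", "cmd", "stat", "err", "sk", "asc", "ascq"):
        value = m.group(name)
        # All numeric fields are hexadecimal; op/cmd are absent in the old format.
        fields[name] = int(value, 16) if value is not None else None
    return fields


if __name__ == "__main__":
    line = ("ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/04 "
            "to SCSI SK/ASC/ASCQ 0xb/00/00")
    print(parse_translated(line))
```

In these lines op=0x28 is a SCSI READ(10) that libata issued as ATA READ DMA EXT (cmd=0x25), so both errors came from reads.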

smartd is not running.
This did not cause the raid subsystem to boot the disk (thank goodness!).

David



* Re: LibPATA code issues / 2.6.15.4
  2006-03-21 18:11                                                         ` David Greaves
@ 2006-03-22 15:23                                                           ` David Greaves
  0 siblings, 0 replies; 147+ messages in thread
From: David Greaves @ 2006-03-22 15:23 UTC (permalink / raw)
  To: Mark Lord
  Cc: Justin Piszcz, Jeff Garzik, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds

David Greaves wrote:

>I've upgraded to 2.6.16 and applied your verbosity patches.
>
>I've persuaded my array to re-assemble and during the resync I got these
>messages
>
>dmesg:
>ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>ata1: status=0x51 { DriveReady SeekComplete Error }
>ata1: error=0x04 { DriveStatusError }
>...(18mins later)
>ata1: no sense translation for op=0x28 cmd=0x25 status: 0x51
>ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
>ata1: status=0x51 { DriveReady SeekComplete Error }
>
>smartd is not running
>This did not cause the raid subsystem to boot the disk (thank goodness!)
>  
>
Just providing a little more followon information...

I have had a further 52 of these messages over the last day. No obvious
cause.
Mar 22 13:14:55 haze kernel: ata2: no sense translation for op=0x28 cmd=0x25 status: 0x51
Mar 22 13:14:55 haze kernel: ata2: status=0x51 { DriveReady SeekComplete Error }

Most recently this happened:

Mar 22 13:47:09 haze kernel: ata2: no sense translation for op=0x28 cmd=0x25 status: 0x51
Mar 22 13:47:09 haze kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Mar 22 13:47:09 haze kernel: sd 1:0:0:0: SCSI error: return code = 0x8000002
Mar 22 13:47:09 haze kernel: sdb: Current: sense key: Medium Error
Mar 22 13:47:09 haze kernel:     Additional sense: Unrecovered read error - auto reallocate failed
Mar 22 13:47:09 haze kernel: end_request: I/O error, dev sdb, sector 396518289

with dmesg piping up with:
raid1: sdb2: rescheduling sector 5801424
raid1: sdd2: redirecting sector 5801424 to another mirror

No drives were kicked from the array.

David

-- 



* LibPATA code issues / 2.6.16 (previously, 2.6.15.x)
  2006-03-05 23:39                                                       ` Jeff Garzik
@ 2006-04-21 19:14                                                         ` Justin Piszcz
  2006-04-21 19:18                                                           ` Jeff Garzik
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-04-21 19:14 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Mark Lord, David Greaves, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds,
	smartmontools-support

Yet a new problem, under 2.6.16, when I fill up the disk, smartmontools 
reports this:

Apr 21 14:24:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Offline uncorrectable sectors

What made it error under 2.6.16?

$ time dd if=/dev/zero of=file.out
dd: writing to `file.out': No space left on device
781118873+0 records in
781118872+0 records out
399932862464 bytes (400 GB) copied, 8873.06 seconds, 45.1 MB/s

real    147m53.092s
user    8m1.395s
sys     42m4.500s

$
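The dd figures are self-consistent, assuming dd's default 512-byte block size (no bs= was given) and dd's convention that MB and GB mean powers of 10. The one extra record counted in versus out is presumably the last block that was read but could not be written once the filesystem was full:

```python
records_out = 781118872          # full records written, from dd's summary
block_size = 512                 # dd's default bs
elapsed_s = 8873.06              # seconds, as reported by dd

bytes_copied = records_out * block_size
rate_mb_s = bytes_copied / elapsed_s / 1e6   # dd reports decimal megabytes
size_gib = bytes_copied / 2**30              # same amount in binary GiB

print(bytes_copied)          # matches the 399932862464 bytes dd printed
print(round(rate_mb_s, 1))   # matches the 45.1 MB/s dd printed
print(round(size_gib, 1))
```

The "400 GB" is therefore about 372.5 GiB, and dd's timer agrees with the shell's: 147m53.092s is 8873.092 seconds against dd's 8873.06.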

Under 2.6.15.x I did not see this behavior. Is the drive going bad, or is it something else?

Thanks,

Justin.



* Re: LibPATA code issues / 2.6.16 (previously, 2.6.15.x)
  2006-04-21 19:14                                                         ` LibPATA code issues / 2.6.16 (previously, 2.6.15.x) Justin Piszcz
@ 2006-04-21 19:18                                                           ` Jeff Garzik
  2006-04-21 19:28                                                             ` Linus Torvalds
  0 siblings, 1 reply; 147+ messages in thread
From: Jeff Garzik @ 2006-04-21 19:18 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Mark Lord, David Greaves, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, Linus Torvalds,
	smartmontools-support

Justin Piszcz wrote:
> Yet a new problem, under 2.6.16, when I fill up the disk, smartmontools 
> reports this:
> 
> Apr 21 14:24:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
> Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
> Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Offline uncorrectable sectors
> 
> What made it error under 2.6.16?
> 
> $ time dd if=/dev/zero of=file.out
> dd: writing to `file.out': No space left on device
> 781118873+0 records in
> 781118872+0 records out
> 399932862464 bytes (400 GB) copied, 8873.06 seconds, 45.1 MB/s
> 
> real    147m53.092s
> user    8m1.395s
> sys     42m4.500s
> 
> $
> 
> Under 2.6.15.x, I did not see this behavior, is this going bad, or?

That's a disk-level problem.  You've got bad sectors.

You can force the disk to replace the bad sectors by doing a disk-level 
write:

	dd if=/dev/zero of=/dev/sda1 bs=4k

and then test the disk with

	smartctl -d ata -t long /dev/sda

If sectors continue to die, the disk is toast.

	Jeff





* Re: LibPATA code issues / 2.6.16 (previously, 2.6.15.x)
  2006-04-21 19:18                                                           ` Jeff Garzik
@ 2006-04-21 19:28                                                             ` Linus Torvalds
  2006-04-21 22:46                                                               ` Jeff Garzik
  0 siblings, 1 reply; 147+ messages in thread
From: Linus Torvalds @ 2006-04-21 19:28 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Justin Piszcz, Mark Lord, David Greaves, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, smartmontools-support



On Fri, 21 Apr 2006, Jeff Garzik wrote:
> 
> You can force the disk to replace the bad sectors by doing a disk-level write:
> 
> 	dd if=/dev/zero of=/dev/sda1 bs=4k

NOTE! Obviously don't do this before you've backed up the disk.  Depending 
on the filesystem, you might just have overwritten something important, or 
just your pr0n collection ;)

Jeff, please be a little more careful about telling people commands like 
that. Some people might cut-and-paste the command without realizing what 
it's doing as a way to "fix" their problem.
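In the spirit of that warning, a pre-flight check before any raw write is cheap. The sketch below is a hypothetical helper, not part of dd or any existing tool; it lists entries in /proc/mounts that live on a given disk. It only sees mounted filesystems: swap, RAID members such as the md arrays earlier in this thread, and devices mounted under other names (e.g. /dev/mapper) will not show up, so an empty result is not proof that the disk is unused:

```python
import re
import sys


def mounted_on(disk, mounts_text):
    """Return (source, mountpoint) pairs from /proc/mounts text that sit on `disk`.

    Matches the disk itself (e.g. /dev/sda) and numbered partitions (/dev/sda1),
    but not a different disk that merely shares the prefix (/dev/sdaa).
    """
    partition_re = re.compile(re.escape(disk) + r"\d+")
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        source, mountpoint = fields[0], fields[1]
        if source == disk or partition_re.fullmatch(source):
            hits.append((source, mountpoint))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    with open("/proc/mounts") as f:
        busy = mounted_on(sys.argv[1], f.read())
    for source, mountpoint in busy:
        print(f"{source} is mounted on {mountpoint}; do not write to the raw device")
    sys.exit(1 if busy else 0)
```

The fullmatch on digits is deliberate: a plain prefix test would treat /dev/sdaa as part of /dev/sda.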

			Linus


* Re: LibPATA code issues / 2.6.16 (previously, 2.6.15.x)
  2006-04-21 19:28                                                             ` Linus Torvalds
@ 2006-04-21 22:46                                                               ` Jeff Garzik
  2006-04-22  0:05                                                                 ` Linus Torvalds
  0 siblings, 1 reply; 147+ messages in thread
From: Jeff Garzik @ 2006-04-21 22:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Justin Piszcz, Mark Lord, David Greaves, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, smartmontools-support

Linus Torvalds wrote:
> 
> On Fri, 21 Apr 2006, Jeff Garzik wrote:
>> You can force the disk to replace the bad sectors by doing a disk-level write:
>>
>> 	dd if=/dev/zero of=/dev/sda1 bs=4k
> 
> NOTE! Obviously don't do this before you've backed up the disk.  Depending 
> on the filesystem, you might just have overwritten something important, or 
> just your pr0n collection ;)
> 
> Jeff, please be a little more careful about telling people commands like 
> that. Some people might cut-and-paste the command without realizing what 
> it's doing as a way to "fix" their problem.

Agreed, though the original poster had already done a 400GB dd from 
/dev/zero...

	Jeff





* Re: LibPATA code issues / 2.6.16 (previously, 2.6.15.x)
  2006-04-21 22:46                                                               ` Jeff Garzik
@ 2006-04-22  0:05                                                                 ` Linus Torvalds
  2006-05-06 15:09                                                                   ` [smartmontools-support]Re: " Leon Woestenberg
  2006-06-11 11:13                                                                   ` Justin Piszcz
  0 siblings, 2 replies; 147+ messages in thread
From: Linus Torvalds @ 2006-04-22  0:05 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Justin Piszcz, Mark Lord, David Greaves, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, smartmontools-support



On Fri, 21 Apr 2006, Jeff Garzik wrote:

> 
> Agreed, though the original poster had already done a 400GB dd from
> /dev/zero...

Yes, but to a _file_ on the partition (ie he didn't overwrite any existing
data, just the empty parts of a filesystem).

I realize that it's not enough for the "re-allocate on write" behaviour, 
and for that you really _do_ need to re-write the whole disk to get all 
the broken blocks reallocated, but my argument was just that we should 
make sure to _tell_ people when they are overwriting all their old data ;)

		Linus


* Re: [smartmontools-support]Re: LibPATA code issues / 2.6.16 (previously, 2.6.15.x)
  2006-04-22  0:05                                                                 ` Linus Torvalds
@ 2006-05-06 15:09                                                                   ` Leon Woestenberg
  2006-05-07 12:44                                                                     ` Ingo Oeser
  2006-06-11 11:13                                                                   ` Justin Piszcz
  1 sibling, 1 reply; 147+ messages in thread
From: Leon Woestenberg @ 2006-05-06 15:09 UTC (permalink / raw)
  To: Linus Torvalds, smartmontools-support
  Cc: Jeff Garzik, Justin Piszcz, Mark Lord, David Greaves, Tejun Heo,
	linux-kernel, IDE/ATA development list, albertcc, axboe

Hi all,

On Fri, 2006-04-21 at 17:05 -0700, Linus Torvalds wrote:
> 
> On Fri, 21 Apr 2006, Jeff Garzik wrote:
> 
> > 
> > Agreed, though the original poster had already done a 400GB dd from
> > /dev/zero...
> 
> Yes, but to a _file_ on the partition (ie he didn't overwrite any existing
> data, just the empty parts of a filesystem).
> 
> I realize that it's not enough for the "re-allocate on write" behaviour, 
> and for that you really _do_ need to re-write the whole disk to get all 
> the broken blocks reallocated, but my argument was just that we should 
> make sure to _tell_ people when they are overwriting all their old data ;)
> 
I had not realized this before, and asked the badblocks maintainer, Theodore,
whether "badblocks /some/file" was supported (the man page says no); but of
course any filesystem can decide to re-allocate blocks for a file.

However, for large files where parts may be bad sectors, I am still
searching for a way to read, then re-write every physical sector
occupied by the file. 

With the purpose to remap the bad sectors inside large MPEG files (where
I would rather have a few zeroed holes than a read error in them).

Anyone know such tooling exists? I suspect it has to use filesystem
specific IOCTL's to query for the blocks involved.
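The read-then-rewrite pass described above can be sketched in user space as follows. This is a minimal sketch under stated assumptions, not an existing tool: it assumes the filesystem overwrites file data in place (generally the case for ext2/ext3 data blocks, not for copy-on-write designs), so that rewriting an unreadable chunk through the file hands the drive a write to the same physical sectors, which is what lets the drive reallocate them. zero_unreadable_chunks() and CHUNK are hypothetical names; reads can also be served from the page cache, so a real tool would probably want O_DIRECT with aligned buffers.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 4096	/* assumption: matches the filesystem block size */

/* Read the file chunk by chunk; wherever a read fails with EIO, overwrite
 * that chunk with zeros so the drive receives a write to the bad sectors.
 * Returns the number of chunks rewritten, or -1 on any other error. */
long zero_unreadable_chunks(const char *path)
{
	static const char zeros[CHUNK];
	char buf[CHUNK];
	struct stat st;
	off_t off = 0;
	long rewritten = 0;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0)
		goto fail;
	while (off < st.st_size) {
		/* never write past EOF, so the file keeps its size */
		size_t len = st.st_size - off < CHUNK ?
			     (size_t)(st.st_size - off) : CHUNK;
		ssize_t n = pread(fd, buf, len, off);

		if (n > 0) {
			off += n;		/* readable: leave it alone */
			continue;
		}
		if (n == 0)
			break;			/* file shrank underneath us */
		if (errno == EINTR)
			continue;		/* interrupted: retry */
		if (errno != EIO)
			goto fail;
		if (pwrite(fd, zeros, len, off) != (ssize_t)len)
			goto fail;
		rewritten++;
		off += len;
	}
	if (fsync(fd) < 0)
		goto fail;
	close(fd);
	return rewritten;
fail:
	close(fd);
	return -1;
}
```

Each failing chunk becomes a CHUNK-sized run of zeros, i.e. the "few zeroed holes" traded for a clean read; a smaller CHUNK gives smaller holes at the cost of more system calls.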

Regards,

Leon



* Re: [smartmontools-support]Re: LibPATA code issues / 2.6.16 (previously, 2.6.15.x)
  2006-05-06 15:09                                                                   ` [smartmontools-support]Re: " Leon Woestenberg
@ 2006-05-07 12:44                                                                     ` Ingo Oeser
  0 siblings, 0 replies; 147+ messages in thread
From: Ingo Oeser @ 2006-05-07 12:44 UTC (permalink / raw)
  To: Leon Woestenberg
  Cc: Linus Torvalds, smartmontools-support, Jeff Garzik,
	Justin Piszcz, Mark Lord, David Greaves, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe

On Saturday, 6. May 2006 17:09, Leon Woestenberg wrote:
> However, for large files where parts may be bad sectors, I am still
> searching for a way to read, then re-write every physical sector
> occupied by the file. 
> 
> With the purpose to remap the bad sectors inside large MPEG files (where
> I would rather have a few zeroed holes than a read error in them).

This is much easier to solve in the player software:

ssize_t ret;

do {
	ret = read(fd, buffer, size);
	if (ret > 0) {
		playbuffer(buffer, ret);
	} else if (ret < 0) {
		if (errno == EINTR)
			continue;	/* interrupted by a signal: retry */
		if (errno != EIO)
			break;		/* any other error: give up */
		/* unreadable sectors: play a zeroed buffer instead */
		playbuffer(allzeroesbuffer, size);
		/* skip over this frame because of disk problems */
		if (lseek(fd, size, SEEK_CUR) == (off_t)-1)
			break;
	}
} while (ret != 0);

> Anyone know such tooling exists? I suspect it has to use filesystem
> specific IOCTL's to query for the blocks involved.

The (somewhat) portable FIBMAP ioctl() would suffice.
That way you can find out which blocks this file is mapped to,
and could add some of these blocks to the bad blocks list of e2fsck.
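A minimal sketch of the FIBMAP approach, assuming a block-mapped filesystem such as ext2/ext3. FIBMAP and FIGETBSZ come from <linux/fs.h>: FIGETBSZ reports the filesystem block size, and FIBMAP takes a logical block index of the file and returns the physical block number on the filesystem's device (0 for a hole). It requires CAP_SYS_RAWIO, i.e. root. dump_file_sectors() and fs_block_to_sector() are hypothetical helper names, and the output is relative to the start of the partition, not the whole disk.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h>	/* FIBMAP, FIGETBSZ */

/* A filesystem block number (in units of blksz bytes) converted to a
 * 512-byte sector number relative to the start of the partition. */
unsigned long long fs_block_to_sector(unsigned long long fsblock,
				      unsigned int blksz)
{
	return fsblock * (blksz / 512);
}

/* Print the partition-relative sector range behind every block of a
 * file. FIBMAP needs CAP_SYS_RAWIO (run as root). Returns 0 or -1. */
int dump_file_sectors(const char *path)
{
	struct stat st;
	int fd, blksz;
	long i, nblocks;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || ioctl(fd, FIGETBSZ, &blksz) < 0)
		goto fail;
	nblocks = (st.st_size + blksz - 1) / blksz;
	for (i = 0; i < nblocks; i++) {
		int blk = (int)i;	/* in: file block, out: fs block */
		unsigned long long first;

		if (ioctl(fd, FIBMAP, &blk) < 0)
			goto fail;
		if (blk == 0)
			continue;	/* hole: nothing allocated on disk */
		first = fs_block_to_sector((unsigned long long)blk, blksz);
		printf("file block %ld -> sectors %llu-%llu\n", i, first,
		       first + blksz / 512 - 1);
	}
	close(fd);
	return 0;
fail:
	close(fd);
	return -1;
}
```

Adding the partition's start sector (for example from /sys/block/sda/sda1/start) turns these into the absolute disk LBAs that drive error reports refer to.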

Regards

Ingo Oeser


* Re: LibPATA code issues / 2.6.16 (previously, 2.6.15.x)
  2006-04-22  0:05                                                                 ` Linus Torvalds
  2006-05-06 15:09                                                                   ` [smartmontools-support]Re: " Leon Woestenberg
@ 2006-06-11 11:13                                                                   ` Justin Piszcz
  1 sibling, 0 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-06-11 11:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Garzik, Mark Lord, David Greaves, Tejun Heo, linux-kernel,
	IDE/ATA development list, albertcc, axboe, smartmontools-support

[4597362.011000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4597362.011000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4597362.011000] ata3: error=0x04 { DriveStatusError }

Now under 2.6.16.20 (I was doing an rsync from one IDE drive to this
SATA drive).

AFAIK the SATA drive does not have any issues (no bad sectors etc.); it
is the same drive as before, i.e. the replacement from the previous RMA.

Just FYI.


On Fri, 21 Apr 2006, Linus Torvalds wrote:

>
>
> On Fri, 21 Apr 2006, Jeff Garzik wrote:
>
>>
>> Agreed, though the original poster had already done a 400GB dd from
>> /dev/zero...
>
> Yes, but to a _file_ on the partition (ie he didn't overwrite any existing
> data, just the empty parts of a filesystem).
>
> I realize that it's not enough for the "re-allocate on write" behaviour,
> and for that you really _do_ need to re-write the whole disk to get all
> the broken blocks reallocated, but my argument was just that we should
> make sure to _tell_ people when they are overwriting all their old data ;)
>
> 		Linus
>


* Re: LibPATA code issues / 2.6.15.4
  2006-02-19 17:16               ` Sander
@ 2006-07-06 23:08                 ` Justin Piszcz
  2006-07-07 13:08                   ` Mark Lord
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-06 23:08 UTC (permalink / raw)
  To: Sander; +Cc: Mark Lord, Jeff Garzik, linux-kernel, IDE/ATA development list

Look at this:

From smartctl, look at the correspondence:
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       4

[4301946.802000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4301946.802000] ata4: status=0x51 { DriveReady SeekComplete Error }
[4301946.802000] ata4: error=0x04 { DriveStatusError }
[4302380.482000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4302380.482000] ata4: status=0x51 { DriveReady SeekComplete Error }
[4302380.482000] ata4: error=0x04 { DriveStatusError }
[4302493.664000] ata4: no sense translation for status: 0x51
[4302493.664000] ata4: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0xb/00/00
[4302493.664000] ata4: status=0x51 { DriveReady SeekComplete Error }
[4302863.673000] ata4: no sense translation for status: 0x51
[4302863.673000] ata4: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0xb/00/00
[4302863.673000] ata4: status=0x51 { DriveReady SeekComplete Error }

different drive, different cable, same controller, but second port

So that Stat/err = UDMA_CRC_Error_Count!

I'm not sure whether we can fix what is causing it (in Linux), but just FYI.



* Re: LibPATA code issues / 2.6.15.4
  2006-07-06 23:08                 ` Justin Piszcz
@ 2006-07-07 13:08                   ` Mark Lord
  2006-07-07 13:24                     ` Justin Piszcz
  0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-07-07 13:08 UTC (permalink / raw)
  To: Justin Piszcz, Sander; +Cc: Jeff Garzik, linux-kernel, IDE/ATA development list

Justin Piszcz wrote:
> Look at this:
> 
>> From smartctl, look at the correspondence:
> 199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always 
> -       4
> 
> [4301946.802000] ata4: translated ATA stat/err 0x51/04 to SCSI 
> SK/ASC/ASCQ 0xb/00/00
> [4301946.802000] ata4: status=0x51 { DriveReady SeekComplete Error }
> [4301946.802000] ata4: error=0x04 { DriveStatusError }
> [4302380.482000] ata4: translated ATA stat/err 0x51/04 to SCSI 
> SK/ASC/ASCQ 0xb/00/00
> [4302380.482000] ata4: status=0x51 { DriveReady SeekComplete Error }
> [4302380.482000] ata4: error=0x04 { DriveStatusError }
> [4302493.664000] ata4: no sense translation for status: 0x51
> [4302493.664000] ata4: translated ATA stat/err 0x51/00 to SCSI 
> SK/ASC/ASCQ 0xb/00/00
> [4302493.664000] ata4: status=0x51 { DriveReady SeekComplete Error }
> [4302863.673000] ata4: no sense translation for status: 0x51
> [4302863.673000] ata4: translated ATA stat/err 0x51/00 to SCSI 
> SK/ASC/ASCQ 0xb/00/00
> [4302863.673000] ata4: status=0x51 { DriveReady SeekComplete Error }
> 
> different drive, different cable, same controller, but second port
> 
> So that Stat/err = UDMA_CRC_Error_Count!

No, I don't think it is -- there's a bit in the drive status
for indicating CRC errors, and it is not showing up here.

I think it's still just libata sending some command that this
drive does not implement.  You really need to dump out the failed
ATA opcode.
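The status and error bytes in these messages are plain bit masks, which makes it easy to check which bits are actually set. A minimal decoder sketch (struct, table and function names are hypothetical): the status-bit names follow the dmesg output above, while the error-bit names are the ATA specification's abbreviations, so libata's "DriveStatusError" corresponds to ABRT (0x04). In the ATA specification an interface (UDMA) CRC error is flagged by ICRC, bit 7 of the error register, and that bit is clear in every error=0x04 line quoted here.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct ata_bit {
	unsigned char mask;
	const char *name;
};

/* Status register, bit 7 down to bit 0. When Busy is set, the other
 * bits are not valid. */
const struct ata_bit ata_status_bits[] = {
	{ 0x80, "Busy" },        { 0x40, "DriveReady" },
	{ 0x20, "DeviceFault" }, { 0x10, "SeekComplete" },
	{ 0x08, "DataRequest" }, { 0x04, "CorrectedError" },
	{ 0x02, "Index" },       { 0x01, "Error" },
};

/* Error register, using the ATA specification's abbreviations. */
const struct ata_bit ata_error_bits[] = {
	{ 0x80, "ICRC" }, { 0x40, "UNC" },  { 0x20, "MC" },    { 0x10, "IDNF" },
	{ 0x08, "MCR" },  { 0x04, "ABRT" }, { 0x02, "TK0NF" }, { 0x01, "AMNF" },
};

/* Write the names of all bits set in reg into out, space separated. */
void ata_decode(unsigned char reg, const struct ata_bit *tbl, size_t n,
		char *out, size_t outlen)
{
	size_t i;

	if (outlen == 0)
		return;
	out[0] = '\0';
	for (i = 0; i < n; i++) {
		size_t used;

		if (!(reg & tbl[i].mask))
			continue;
		used = strlen(out);
		/* snprintf truncates safely if out is too small */
		snprintf(out + used, outlen - used, "%s%s",
			 used ? " " : "", tbl[i].name);
	}
}

/* An interface CRC error sets ICRC, bit 7 of the error register. */
int ata_is_crc_error(unsigned char err)
{
	return (err & 0x80) != 0;
}
```

Decoding status 0x51 gives "DriveReady SeekComplete Error" and error 0x04 gives only "ABRT" (command aborted). A drive commonly reports ABRT for a command or feature it does not support, which fits the reading that libata is sending something this drive rejects.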

I *think* this (uncompiled, untested) patch may do it for you on 2.6.16/17:

--- linux/drivers/scsi/libata-scsi.c.orig	2006-06-19 10:37:03.000000000 -0400
+++ linux/drivers/scsi/libata-scsi.c	2006-07-07 09:06:57.000000000 -0400
@@ -542,6 +542,7 @@
 	struct ata_taskfile *tf = &qc->tf;
 	unsigned char *sb = cmd->sense_buffer;
 	unsigned char *desc = sb + 8;
+	unsigned char ata_op = tf->command;
 
 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
 
@@ -558,6 +559,7 @@
 	 * onto sense key, asc & ascq.
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
+		printk(KERN_WARN "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
 				   &sb[1], &sb[2], &sb[3]);
 		sb[1] &= 0x0f;


* Re: LibPATA code issues / 2.6.15.4
  2006-07-07 13:08                   ` Mark Lord
@ 2006-07-07 13:24                     ` Justin Piszcz
  2006-07-07 13:43                       ` Mark Lord
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-07 13:24 UTC (permalink / raw)
  To: Mark Lord; +Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list

On Fri, 7 Jul 2006, Mark Lord wrote:

> Justin Piszcz wrote:
>> Look at this:
>>
>>> From smartctl, look at the correspondence:
>> 199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always
>> -       4
>>
>> [4301946.802000] ata4: translated ATA stat/err 0x51/04 to SCSI
>> SK/ASC/ASCQ 0xb/00/00
>> [4301946.802000] ata4: status=0x51 { DriveReady SeekComplete Error }
>> [4301946.802000] ata4: error=0x04 { DriveStatusError }
>> [4302380.482000] ata4: translated ATA stat/err 0x51/04 to SCSI
>> SK/ASC/ASCQ 0xb/00/00
>> [4302380.482000] ata4: status=0x51 { DriveReady SeekComplete Error }
>> [4302380.482000] ata4: error=0x04 { DriveStatusError }
>> [4302493.664000] ata4: no sense translation for status: 0x51
>> [4302493.664000] ata4: translated ATA stat/err 0x51/00 to SCSI
>> SK/ASC/ASCQ 0xb/00/00
>> [4302493.664000] ata4: status=0x51 { DriveReady SeekComplete Error }
>> [4302863.673000] ata4: no sense translation for status: 0x51
>> [4302863.673000] ata4: translated ATA stat/err 0x51/00 to SCSI
>> SK/ASC/ASCQ 0xb/00/00
>> [4302863.673000] ata4: status=0x51 { DriveReady SeekComplete Error }
>>
>> different drive, different cable, same controller, but second port
>>
>> So that Stat/err = UDMA_CRC_Error_Count!
>
> No, I don't think it is -- there's a bit in the drive status
> for indicating CRC errors, and it is not showing up here.
>
> I think it's still just libata sending some command that this
> drive does not implement.  You really need to dump out the failed
> ATA opcode.
>
> I *think* this (uncompiled, untested) patch may do it for you on 2.6.16/17:
>
> --- linux/drivers/scsi/libata-scsi.c.orig	2006-06-19 10:37:03.000000000 -0400
> +++ linux/drivers/scsi/libata-scsi.c	2006-07-07 09:06:57.000000000 -0400
> @@ -542,6 +542,7 @@
> 	struct ata_taskfile *tf = &qc->tf;
> 	unsigned char *sb = cmd->sense_buffer;
> 	unsigned char *desc = sb + 8;
> +	unsigned char ata_op = tf->command;
>
> 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
>
> @@ -558,6 +559,7 @@
> 	 * onto sense key, asc & ascq.
> 	 */
> 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
> +		printk(KERN_WARN "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
> 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
> 				   &sb[1], &sb[2], &sb[3]);
> 		sb[1] &= 0x0f;
>

had to change

KERN_WARN -> KERN_WARNING

then more errors

The patch never worked for me: even when I had gotten it to work on
2.6.15.4, it never showed me what I wanted to see.

drivers/scsi/libata-scsi.c: In function 'ata_gen_fixed_sense':
drivers/scsi/libata-scsi.c:638: error: 'ata_op' undeclared (first use in this function)
drivers/scsi/libata-scsi.c:638: error: (Each undeclared identifier is reported only once
drivers/scsi/libata-scsi.c:638: error: for each function it appears in.)
make[2]: *** [drivers/scsi/libata-scsi.o] Error 1

do you know who wrote the original patch?



* Re: LibPATA code issues / 2.6.15.4
  2006-07-07 13:24                     ` Justin Piszcz
@ 2006-07-07 13:43                       ` Mark Lord
  2006-07-07 13:48                         ` Justin Piszcz
                                           ` (2 more replies)
  0 siblings, 3 replies; 147+ messages in thread
From: Mark Lord @ 2006-07-07 13:43 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list

Justin Piszcz wrote:
> 
> had to change
> 
> KERN_WARN -> KERN_WARNING
> 
> then more errors

Eh?  After fixing the KERN_WARN -> KERN_WARNING part,
the patch compiles / links cleanly here on 2.6.17.
(fixed copy below).   Still untested, though.

> do you know who wrote the original patch?

I did.

Cheers

--- linux/drivers/scsi/libata-scsi.c.orig	2006-06-19 10:37:03.000000000 -0400
+++ linux/drivers/scsi/libata-scsi.c	2006-07-07 09:06:57.000000000 -0400
@@ -542,6 +542,7 @@
 	struct ata_taskfile *tf = &qc->tf;
 	unsigned char *sb = cmd->sense_buffer;
 	unsigned char *desc = sb + 8;
+	unsigned char ata_op = tf->command;
 
 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
 
@@ -558,6 +559,7 @@
 	 * onto sense key, asc & ascq.
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
+		printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
 				   &sb[1], &sb[2], &sb[3]);
 		sb[1] &= 0x0f;


* Re: LibPATA code issues / 2.6.15.4
  2006-07-07 13:43                       ` Mark Lord
@ 2006-07-07 13:48                         ` Justin Piszcz
  2006-07-07 14:01                         ` Justin Piszcz
  2006-07-07 14:35                         ` Justin Piszcz
  2 siblings, 0 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-07-07 13:48 UTC (permalink / raw)
  To: Mark Lord; +Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list



On Fri, 7 Jul 2006, Mark Lord wrote:

> Justin Piszcz wrote:
>>
>> had to change
>>
>> KERN_WARN -> KERN_WARNING
>>
>> then more errors
>
> Eh?  After fixing the KERN_WARN -> KERN_WARNING part,
> the patch compiles / links cleanly here on 2.6.17.
> (fixed copy below).   Still untested, though.
>
>> do you know who wrote the original patch?
>
> I did.
>
> Cheers
>
> --- linux/drivers/scsi/libata-scsi.c.orig	2006-06-19 10:37:03.000000000 -0400
> +++ linux/drivers/scsi/libata-scsi.c	2006-07-07 09:06:57.000000000 -0400
> @@ -542,6 +542,7 @@
> 	struct ata_taskfile *tf = &qc->tf;
> 	unsigned char *sb = cmd->sense_buffer;
> 	unsigned char *desc = sb + 8;
> +	unsigned char ata_op = tf->command;
>
> 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
>
> @@ -558,6 +559,7 @@
> 	 * onto sense key, asc & ascq.
> 	 */
> 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
> +		printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
> 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
> 				   &sb[1], &sb[2], &sb[3]);
> 		sb[1] &= 0x0f;
>

Applied patch, rebooting, waiting to get the error again.



* Re: LibPATA code issues / 2.6.15.4
  2006-07-07 13:43                       ` Mark Lord
  2006-07-07 13:48                         ` Justin Piszcz
@ 2006-07-07 14:01                         ` Justin Piszcz
  2006-07-07 14:35                         ` Justin Piszcz
  2 siblings, 0 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-07-07 14:01 UTC (permalink / raw)
  To: Mark Lord; +Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list



On Fri, 7 Jul 2006, Mark Lord wrote:

> Justin Piszcz wrote:
>>
>> had to change
>>
>> KERN_WARN -> KERN_WARNING
>>
>> then more errors
>
> Eh?  After fixing the KERN_WARN -> KERN_WARNING part,
> the patch compiles / links cleanly here on 2.6.17.
> (fixed copy below).   Still untested, though.
>
>> do you know who wrote the original patch?
>
> I did.
>
> Cheers
>
> --- linux/drivers/scsi/libata-scsi.c.orig	2006-06-19 10:37:03.000000000 -0400
> +++ linux/drivers/scsi/libata-scsi.c	2006-07-07 09:06:57.000000000 -0400
> @@ -542,6 +542,7 @@
> 	struct ata_taskfile *tf = &qc->tf;
> 	unsigned char *sb = cmd->sense_buffer;
> 	unsigned char *desc = sb + 8;
> +	unsigned char ata_op = tf->command;
>
> 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
>
> @@ -558,6 +559,7 @@
> 	 * onto sense key, asc & ascq.
> 	 */
> 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
> +		printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
> 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
> 				   &sb[1], &sb[2], &sb[3]);
> 		sb[1] &= 0x0f;
>

Mark,

I've marked a disk faulty in my SW RAID5 and am rebuilding it now. Note
that in the past two rebuilds I have done (in exactly the same manner, on
the same disk) I got 3-4 of these errors, so if I do not get them this
time, that will be extremely odd.

Justin.



* Re: LibPATA code issues / 2.6.15.4
  2006-07-07 13:43                       ` Mark Lord
  2006-07-07 13:48                         ` Justin Piszcz
  2006-07-07 14:01                         ` Justin Piszcz
@ 2006-07-07 14:35                         ` Justin Piszcz
  2006-07-07 18:53                           ` Justin Piszcz
  2 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-07 14:35 UTC (permalink / raw)
  To: Mark Lord; +Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list



On Fri, 7 Jul 2006, Mark Lord wrote:

> Justin Piszcz wrote:
>>
>> had to change
>>
>> KERN_WARN -> KERN_WARNING
>>
>> then more errors
>
> Eh?  After fixing the KERN_WARN -> KERN_WARNING part,
> the patch compiles / links cleanly here on 2.6.17.
> (fixed copy below).   Still untested, though.
>
>> do you know who wrote the original patch?
>
> I did.
>
> Cheers
>
> --- linux/drivers/scsi/libata-scsi.c.orig	2006-06-19 10:37:03.000000000 -0400
> +++ linux/drivers/scsi/libata-scsi.c	2006-07-07 09:06:57.000000000 -0400
> @@ -542,6 +542,7 @@
> 	struct ata_taskfile *tf = &qc->tf;
> 	unsigned char *sb = cmd->sense_buffer;
> 	unsigned char *desc = sb + 8;
> +	unsigned char ata_op = tf->command;
>
> 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
>
> @@ -558,6 +559,7 @@
> 	 * onto sense key, asc & ascq.
> 	 */
> 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
> +		printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
> 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
> 				   &sb[1], &sb[2], &sb[3]);
> 		sb[1] &= 0x0f;
>

Mark!! It did it again, here you go:

==> /p34/var/log/messages <==
Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { DriveReady SeekComplete Index Error }
Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { DriveStatusError }
==> /p34/var/log/kern.log <==
Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: translated ATA stat/err 0x53/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { DriveReady SeekComplete Index Error }
Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { DriveStatusError }

Does this help?

Can we eliminate the cause of these errors now?



* Re: LibPATA code issues / 2.6.15.4
  2006-07-07 14:35                         ` Justin Piszcz
@ 2006-07-07 18:53                           ` Justin Piszcz
  2006-07-07 19:19                             ` Jeff Garzik
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-07 18:53 UTC (permalink / raw)
  To: Mark Lord
  Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list, Alan Cox



On Fri, 7 Jul 2006, Justin Piszcz wrote:

>
>
> On Fri, 7 Jul 2006, Mark Lord wrote:
>
>> Justin Piszcz wrote:
>>> 
>>> had to change
>>> 
>>> KERN_WARN -> KERN_WARNING
>>> 
>>> then more errors
>> 
>> Eh?  After fixing the KERN_WARN -> KERN_WARNING part,
>> the patch compiles / links cleanly here on 2.6.17.
>> (fixed copy below).   Still untested, though.
>> 
>>> do you know who wrote the original patch?
>> 
>> I did.
>> 
>> Cheers
>> 
>> --- linux/drivers/scsi/libata-scsi.c.orig	2006-06-19 10:37:03.000000000 -0400
>> +++ linux/drivers/scsi/libata-scsi.c	2006-07-07 09:06:57.000000000 -0400
>> @@ -542,6 +542,7 @@
>> 	struct ata_taskfile *tf = &qc->tf;
>> 	unsigned char *sb = cmd->sense_buffer;
>> 	unsigned char *desc = sb + 8;
>> +	unsigned char ata_op = tf->command;
>>
>> 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
>> 
>> @@ -558,6 +559,7 @@
>> 	 * onto sense key, asc & ascq.
>> 	 */
>> 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
>> +		printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
>> 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
>> 				   &sb[1], &sb[2], &sb[3]);
>> 		sb[1] &= 0x0f;
>> 
>
> Mark!! It did it again, here you go:
>
> ==> /p34/var/log/messages <==
> Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { DriveReady 
> SeekComplete Index Error }
> Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { 
> DriveStatusError }
> ==> /p34/var/log/kern.log <==
> Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: translated ATA stat/err 
> 0x53/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { DriveReady 
> SeekComplete Index Error }
> Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { 
> DriveStatusError }
>
> Does this help?
>
> Can we eliminate the cause of these errors now?
>
>

Jeff or Alan,

Does that ATA translation help in determining what *bad* commands are 
being sent to the drive?

This occurs on two separate identical disks.

Justin.



* Re: LibPATA code issues / 2.6.15.4
  2006-07-07 18:53                           ` Justin Piszcz
@ 2006-07-07 19:19                             ` Jeff Garzik
  2006-07-07 19:28                               ` Justin Piszcz
  0 siblings, 1 reply; 147+ messages in thread
From: Jeff Garzik @ 2006-07-07 19:19 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Mark Lord, Sander, linux-kernel, IDE/ATA development list, Alan Cox

Justin Piszcz wrote:
> 
> 
> On Fri, 7 Jul 2006, Justin Piszcz wrote:
> 
>>
>>
>> On Fri, 7 Jul 2006, Mark Lord wrote:
>>
>>> Justin Piszcz wrote:
>>>>
>>>> had to change
>>>>
>>>> KERN_WARN -> KERN_WARNING
>>>>
>>>> then more errors
>>>
>>> Eh?  After fixing the KERN_WARN -> KERN_WARNING part,
>>> the patch compiles / links cleanly here on 2.6.17.
>>> (fixed copy below).   Still untested, though.
>>>
>>>> do you know who wrote the original patch?
>>>
>>> I did.
>>>
>>> Cheers
>>>
>>> --- linux/drivers/scsi/libata-scsi.c.orig    2006-06-19 10:37:03.000000000 -0400
>>> +++ linux/drivers/scsi/libata-scsi.c    2006-07-07 09:06:57.000000000 -0400
>>> @@ -542,6 +542,7 @@
>>>     struct ata_taskfile *tf = &qc->tf;
>>>     unsigned char *sb = cmd->sense_buffer;
>>>     unsigned char *desc = sb + 8;
>>> +    unsigned char ata_op = tf->command;
>>>
>>>     memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
>>>
>>> @@ -558,6 +559,7 @@
>>>      * onto sense key, asc & ascq.
>>>      */
>>>     if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
>>> +        printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
>>>         ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
>>>                    &sb[1], &sb[2], &sb[3]);
>>>         sb[1] &= 0x0f;
>>>
>>
>> Mark!! It did it again, here you go:
>>
>> ==> /p34/var/log/messages <==
>> Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { 
>> DriveReady SeekComplete Index Error }
>> Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { 
>> DriveStatusError }
>> ==> /p34/var/log/kern.log <==
>> Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: translated ATA 
>> stat/err 0x53/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { 
>> DriveReady SeekComplete Index Error }
>> Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { 
>> DriveStatusError }
>>
>> Does this help?
>>
>> Can we eliminate the cause of these errors now?
>>
>>
> 
> Jeff or Alan,
> 
> Does that ATA translation help in determining what *bad* commands are 
> being sent to the drive?

No, it needs the patch that Mark has been posting...

	Jeff





* Re: LibPATA code issues / 2.6.15.4
  2006-07-07 19:19                             ` Jeff Garzik
@ 2006-07-07 19:28                               ` Justin Piszcz
       [not found]                                 ` <200607091224.31451.liml@rtr.ca>
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-07 19:28 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Mark Lord, Sander, linux-kernel, IDE/ATA development list, Alan Cox



On Fri, 7 Jul 2006, Jeff Garzik wrote:

> Justin Piszcz wrote:
>> 
>> 
>> On Fri, 7 Jul 2006, Justin Piszcz wrote:
>> 
>>> 
>>> 
>>> On Fri, 7 Jul 2006, Mark Lord wrote:
>>> 
>>>> Justin Piszcz wrote:
>>>>> 
>>>>> had to change
>>>>> 
>>>>> KERN_WARN -> KERN_WARNING
>>>>> 
>>>>> then more errors
>>>> 
>>>> Eh?  After fixing the KERN_WARN -> KERN_WARNING part,
>>>> the patch compiles / links cleanly here on 2.6.17.
>>>> (fixed copy below).   Still untested, though.
>>>> 
>>>>> do you know who wrote the original patch?
>>>> 
>>>> I did.
>>>> 
>>>> Cheers
>>>> 
>>>> --- linux/drivers/scsi/libata-scsi.c.orig    2006-06-19 10:37:03.000000000 -0400
>>>> +++ linux/drivers/scsi/libata-scsi.c    2006-07-07 09:06:57.000000000 -0400
>>>> @@ -542,6 +542,7 @@
>>>>     struct ata_taskfile *tf = &qc->tf;
>>>>     unsigned char *sb = cmd->sense_buffer;
>>>>     unsigned char *desc = sb + 8;
>>>> +    unsigned char ata_op = tf->command;
>>>>
>>>>     memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
>>>> 
>>>> @@ -558,6 +559,7 @@
>>>>      * onto sense key, asc & ascq.
>>>>      */
>>>>     if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
>>>> +        printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
>>>>         ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
>>>>                    &sb[1], &sb[2], &sb[3]);
>>>>         sb[1] &= 0x0f;
>>>> 
>>> 
>>> Mark!! It did it again, here you go:
>>> 
>>> ==> /p34/var/log/messages <==
>>> Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { 
>>> DriveReady SeekComplete Index Error }
>>> Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { 
>>> DriveStatusError }
>>> ==> /p34/var/log/kern.log <==
>>> Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: translated ATA stat/err 
>>> 0x53/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>>> Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { 
>>> DriveReady SeekComplete Index Error }
>>> Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { 
>>> DriveStatusError }
>>> 
>>> Does this help?
>>> 
>>> Can we eliminate the cause of these errors now?
>>> 
>>> 
>> 
>> Jeff or Alan,
>> 
>> Does that ATA translation help in determining what *bad* commands are being 
>> sent to the drive?
>
> No, it needs the patch that Mark has been posting...
>
> 	Jeff
>
>
>

Jeff, the patch is applied, the box booted the new kernel, and I reproduced
the error messages; THAT is what is produced with the patch.


Without the patch:

Jun 18 07:09:53 p34 kernel: [4297678.777000] ata3: status=0x51 { DriveReady SeekComplete Error }
Jun 18 07:09:53 p34 kernel: [4297678.777000] ata3: error=0x04 { DriveStatusError }
Jun 18 07:20:08 p34 -- MARK --
Jun 18 07:27:31 p34 kernel: [4298736.905000] ata3: status=0x51 { DriveReady SeekComplete Error }
Jun 18 07:27:31 p34 kernel: [4298736.905000] ata3: error=0x04 { DriveStatusError }

With the patch:

Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: translated ATA stat/err 0x53/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { DriveReady SeekComplete Index Error }
Jul  7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { DriveStatusError }
Jul  7 10:49:29 p34 kernel: [4298273.178000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul  7 10:49:29 p34 kernel: [4298273.178000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul  7 10:49:29 p34 kernel: [4298273.178000] ata4: error=0x04 { DriveStatusError }
Jul  7 11:43:02 p34 kernel: [4301488.359000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul  7 11:43:02 p34 kernel: [4301488.359000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul  7 11:43:02 p34 kernel: [4301488.359000] ata4: error=0x04 { DriveStatusError }
Jul  7 12:35:27 p34 kernel: [4304634.600000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul  7 12:35:27 p34 kernel: [4304634.600000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul  7 12:35:27 p34 kernel: [4304634.600000] ata4: error=0x04 { DriveStatusError }
Jul  7 12:44:14 p34 kernel: [4305162.220000] ata4: no sense translation for status: 0x51
Jul  7 12:44:14 p34 kernel: [4305162.220000] ata4: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul  7 12:44:14 p34 kernel: [4305162.220000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul  7 13:03:22 p34 kernel: [4306309.782000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul  7 13:03:22 p34 kernel: [4306309.782000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul  7 13:03:22 p34 kernel: [4306309.782000] ata4: error=0x04 { DriveStatusError }
Jul  7 13:05:12 p34 kernel: [4306419.891000] ata4: no sense translation for status: 0x51
Jul  7 13:05:12 p34 kernel: [4306419.891000] ata4: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul  7 13:05:12 p34 kernel: [4306419.891000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul  7 13:32:20 p34 kernel: [4308048.717000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul  7 13:32:20 p34 kernel: [4308048.717000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul  7 13:32:20 p34 kernel: [4308048.717000] ata4: error=0x04 { DriveStatusError }

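For reference, the stat/err pairs in these logs can be decoded by hand. What follows is a minimal userspace model, assuming the standard ATA status-register bit layout; the ST_*/ER_* names and all three helpers are hypothetical. sense_key() covers only the two cases in this log: error 0x04 (the drive's ABRT bit, which the kernel prints as DriveStatusError) and error 0x00, where ata_to_sense_error() logs "no sense translation for status: 0x51" and falls back. As the log shows, both end up as sense key 0xb (ABORTED COMMAND).

```c
#include <stdio.h>
#include <string.h>

/* ATA status register bits (local names for this sketch). */
#define ST_ERR  0x01  /* Error */
#define ST_DRQ  0x08  /* DataRequest */
#define ST_DSC  0x10  /* SeekComplete */
#define ST_DF   0x20  /* DeviceFault */
#define ST_DRDY 0x40  /* DriveReady */
#define ST_BSY  0x80  /* Busy */

#define ER_ABRT 0x04             /* error register: command aborted */
#define SK_ABORTED_COMMAND 0x0b  /* SCSI sense key */

struct bitname { unsigned char bit; const char *name; };

/* Listed in the order the kernel prints them. */
static const struct bitname status_names[] = {
    { ST_BSY, "Busy" },        { ST_DRDY, "DriveReady" },
    { ST_DF, "DeviceFault" },  { ST_DSC, "SeekComplete" },
    { ST_DRQ, "DataRequest" }, { ST_ERR, "Error" },
};

/* Writes e.g. "DriveReady SeekComplete Error" for 0x51 into buf (len > 0). */
static void decode_status(unsigned char stat, char *buf, size_t len)
{
    size_t used = 0;

    buf[0] = '\0';
    for (size_t i = 0; i < sizeof status_names / sizeof status_names[0]; i++) {
        if (!(stat & status_names[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", status_names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;               /* buffer full: output is truncated */
        used += (size_t)n;
    }
}

/* Same test as the quoted libata code: does this status need sense data? */
static int needs_sense(unsigned char stat)
{
    return (stat & (ST_BSY | ST_DF | ST_ERR | ST_DRQ)) != 0;
}

/* Reduced model: only status 0x51 with err 0x04 or 0x00 is covered.
 * *matched is 1 when the ABRT ("aborted command") row matches and 0
 * when the kernel would log "no sense translation" and use its
 * fallback. Other error/status bits map to other sense keys in the
 * real table and are not modelled here. */
static unsigned char sense_key(unsigned char stat, unsigned char err,
                               int *matched)
{
    if (stat & ST_BSY)
        err = 0;                 /* error bits are invalid while busy */
    *matched = (err & ER_ABRT) != 0;
    return SK_ABORTED_COMMAND;
}
```

With status 0x51, the ERR bit (0x01) is the only one that satisfies the BUSY/DF/ERR/DRQ test, so the drive itself is flagging each of these commands as failed.
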
When I was running it earlier with 2.6.15.x:

Mar  1 13:31:10 p34 kernel: [4295292.736000] +++PATCH: Original kernel error:
Mar  1 13:31:10 p34 kernel: [4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar  1 13:31:10 p34 kernel: [4295292.736000] +++PATCH: Mark Lord's extended verbosity patch:
Mar  1 13:31:10 p34 kernel: [4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar  1 13:31:10 p34 kernel: [4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error }
Mar  1 13:31:10 p34 kernel: [4295292.736000] ata3: error=0x04 { DriveStatusError }
Mar  1 13:31:10 p34 kernel: [4295292.736000] +++PATCH: Original kernel error:
Mar  1 13:31:10 p34 kernel: [4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar  1 13:31:10 p34 kernel: [4295292.736000] +++PATCH: Mark Lord's extended verbosity patch:
Mar  1 13:31:10 p34 kernel: [4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar  1 13:31:10 p34 kernel: [4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error }
Mar  1 13:31:10 p34 kernel: [4295292.736000] ata3: error=0x04 { DriveStatusError }

Perhaps the patch is not printing out the correct error message?

This shows that libata-scsi.c was patched:

         /*
          * Use ata_to_sense_error() to map status register bits
          * onto sense key, asc & ascq.
          */
         if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
                 printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
                 ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
                                    &sb[1], &sb[2], &sb[3]);
                 sb[1] &= 0x0f;
         }


This shows the build number of the kernel tree:
$ cat /usr/src/linux/.version
4

This shows that I am running the patched build (#4 matches .version):
$ uname -a
Linux p34.internal.lan 2.6.17.3 #4 SMP PREEMPT Fri Jul 7 09:47:53 EDT 2006 i686 GNU/Linux
$

Maybe something is blocking the opcode output from showing correctly?

Justin.



* Re: LibPATA code issues / 2.6.15.4
       [not found]                                 ` <200607091224.31451.liml@rtr.ca>
@ 2006-07-09 17:27                                   ` Justin Piszcz
  2006-07-09 20:16                                     ` Justin Piszcz
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-09 17:27 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jeff Garzik, Sander, linux-kernel, IDE/ATA development list, Alan Cox



On Sun, 9 Jul 2006, Mark Lord wrote:

> Mmm.. there are two main paths into those messages,
> and my current patch only caught one of them.
>
> Here's a reworked version that catches the ata_op on both paths.
> Maybe this will dump out the info we need to diagnose Justin's system.
>
> Compiles & links fine on 2.6.17, but not tested.
>
> Cheers
>
> --- linux/drivers/scsi/libata-scsi.c.orig	2006-06-23 13:38:37.000000000 -0400
> +++ linux/drivers/scsi/libata-scsi.c	2006-07-09 12:19:52.000000000 -0400
> @@ -542,6 +542,7 @@
> 	struct ata_taskfile *tf = &qc->tf;
> 	unsigned char *sb = cmd->sense_buffer;
> 	unsigned char *desc = sb + 8;
> +	unsigned char ata_op = tf->command;
>
> 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
>
> @@ -558,6 +559,7 @@
> 	 * onto sense key, asc & ascq.
> 	 */
> 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
> +		printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
> 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
> 				   &sb[1], &sb[2], &sb[3]);
> 		sb[1] &= 0x0f;
> @@ -617,6 +619,7 @@
> 	struct scsi_cmnd *cmd = qc->scsicmd;
> 	struct ata_taskfile *tf = &qc->tf;
> 	unsigned char *sb = cmd->sense_buffer;
> +	unsigned char ata_op = tf->command;
>
> 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
>
> @@ -633,6 +636,7 @@
> 	 * onto sense key, asc & ascq.
> 	 */
> 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
> +		printk(KERN_WARNING "ata_gen_fixed_sense: failed ata_op=0x%02x\n", ata_op);
> 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
> 				   &sb[2], &sb[12], &sb[13]);
> 		sb[2] &= 0x0f;
>

Thanks Mark!

Applying now.



* Re: LibPATA code issues / 2.6.15.4
  2006-07-09 17:27                                   ` Justin Piszcz
@ 2006-07-09 20:16                                     ` Justin Piszcz
  2006-07-09 20:40                                       ` LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! Justin Piszcz
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-09 20:16 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jeff Garzik, Sander, linux-kernel, IDE/ATA development list, Alan Cox

On Sun, 9 Jul 2006, Justin Piszcz wrote:

>
>
> On Sun, 9 Jul 2006, Mark Lord wrote:
>
>> Mmm.. there are two main paths into those messages,
>> and my current patch only caught one of them.
>> 
>> Here's a reworked version that catches the ata_op on both paths.
>> Maybe this will dump out the info we need to diagnose Justin's system.
>> 
>> Compiles & links fine on 2.6.17, but not tested.
>> 
>> Cheers
>> 
>> --- linux/drivers/scsi/libata-scsi.c.orig	2006-06-23 13:38:37.000000000 -0400
>> +++ linux/drivers/scsi/libata-scsi.c	2006-07-09 12:19:52.000000000 -0400
>> @@ -542,6 +542,7 @@
>> 	struct ata_taskfile *tf = &qc->tf;
>> 	unsigned char *sb = cmd->sense_buffer;
>> 	unsigned char *desc = sb + 8;
>> +	unsigned char ata_op = tf->command;
>>
>> 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
>> 
>> @@ -558,6 +559,7 @@
>> 	 * onto sense key, asc & ascq.
>> 	 */
>> 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
>> +		printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
>> 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
>> 				   &sb[1], &sb[2], &sb[3]);
>> 		sb[1] &= 0x0f;
>> @@ -617,6 +619,7 @@
>> 	struct scsi_cmnd *cmd = qc->scsicmd;
>> 	struct ata_taskfile *tf = &qc->tf;
>> 	unsigned char *sb = cmd->sense_buffer;
>> +	unsigned char ata_op = tf->command;
>>
>> 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
>> 
>> @@ -633,6 +636,7 @@
>> 	 * onto sense key, asc & ascq.
>> 	 */
>> 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
>> +		printk(KERN_WARNING "ata_gen_fixed_sense: failed ata_op=0x%02x\n", ata_op);
>> 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
>> 				   &sb[2], &sb[12], &sb[13]);
>> 		sb[2] &= 0x0f;
>> 
>
> Thanks Mark!
>
> Applying now.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

Mark,

Check line 519: this is where the error is printed (I believe), and the
patch does not print the ata_op there.

It is in the ata_to_sense_error() function.

I've already applied the patch and recompiled, as you can see:

# patch -p0 < /tmp/b
patching file linux/drivers/scsi/libata-scsi.c
Reversed (or previously applied) patch detected!  Assume -R? [n]
#

Jul  9 15:22:57 p34 kernel: [4300704.724000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul  9 15:22:57 p34 kernel: [4300704.724000] ata3: status=0x51 { DriveReady SeekComplete Error }
Jul  9 15:22:57 p34 kernel: [4300704.724000] ata3: error=0x04 { DriveStatusError }

This part needs the ata_op:

     519  translate_done:
     520         printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to "
     521                "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err,
     522                *sk, *asc, *ascq);


Justin.





* Re: LibPATA code issues / 2.6.15.4 (found the opcode=0x35)!
  2006-07-09 20:16                                     ` Justin Piszcz
@ 2006-07-09 20:40                                       ` Justin Piszcz
  2006-07-09 20:46                                         ` Justin Piszcz
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-09 20:40 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jeff Garzik, Sander, linux-kernel, IDE/ATA development list, Alan Cox

I made my own patch (following Mark's example) but also added that printk 
in that function.

Jul  9 16:37:52 p34 kernel: [4294810.556000] ata_gen_fixed_sense: failed ata_op=0x35
Jul  9 16:37:52 p34 kernel: [4294810.556000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul  9 16:37:52 p34 kernel: [4294810.556000] ata_gen_ata_desc_sense: failed ata_op=0x51
Jul  9 16:37:52 p34 kernel: [4294810.556000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul  9 16:37:52 p34 kernel: [4294810.556000] ata4: error=0x04 { DriveStatusError }

Now that we have found the ata_op code of 0x35, what does this mean?  Is 
it generated from a bad FUA/unsupported command from the kernel/SATA 
driver?
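
A possible explanation for the 0x51 on the third line, offered as a reading of the libata code quoted in this thread rather than a confirmed diagnosis: by the time the sense data is built, tf->command is tested against status bits (ATA_BUSY, ATA_ERR, ...), which suggests the taskfile has already been read back from the controller and tf->command now holds the status register. The ata_op captured at the top of ata_gen_fixed_sense() still holds the opcode (0x35), but a printk inside ata_to_sense_error() that reads tf->command again would print the status, 0x51, the same value as status=0x51 on the next line, whatever label it carries. The userspace sketch below models that ordering; struct fake_tf, read_back() and capture_opcode() are hypothetical stand-ins, not libata code.

```c
/* Hypothetical stand-in for the relevant part of struct ata_taskfile. */
struct fake_tf {
    unsigned char command;   /* opcode when issued, status after read-back */
    unsigned char feature;   /* features when issued, error after read-back */
};

/* Models the controller read-back: command/feature are overwritten
 * with the drive's status and error registers. */
static void read_back(struct fake_tf *tf, unsigned char status,
                      unsigned char error)
{
    tf->command = status;
    tf->feature = error;
}

/* Returns the opcode captured before read-back; *late receives what
 * tf->command holds afterwards, i.e. what a later printk would see. */
static unsigned char capture_opcode(unsigned char opcode, unsigned char status,
                                    unsigned char error, unsigned char *late)
{
    struct fake_tf tf = { opcode, 0 };
    unsigned char early = tf.command;   /* like: ata_op = tf->command; */

    read_back(&tf, status, error);
    *late = tf.command;
    return early;
}
```

If this reading is right, the ata_gen_fixed_sense lines identify the failed command, and printing the opcode from ata_to_sense_error() would need the early value passed in as an argument rather than re-reading tf->command.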

Justin.



* Re: LibPATA code issues / 2.6.15.4 (found the opcode=0x35)!
  2006-07-09 20:40                                       ` LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! Justin Piszcz
@ 2006-07-09 20:46                                         ` Justin Piszcz
  2006-07-09 21:05                                           ` Justin Piszcz
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-09 20:46 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jeff Garzik, Sander, linux-kernel, IDE/ATA development list, Alan Cox



On Sun, 9 Jul 2006, Justin Piszcz wrote:

> I made my own patch (following Mark's example) but also added that printk in 
> that function.
>
> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata_gen_fixed_sense: failed 
> ata_op=0x35
> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata4: translated ATA stat/err 
> 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata_gen_ata_desc_sense: failed 
> ata_op=0x51
> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata4: status=0x51 { DriveReady 
> SeekComplete Error }
> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata4: error=0x04 { 
> DriveStatusError }
>
> Now that we have found the ata_op code of 0x35, what does this mean?  Is it 
> generated from a bad FUA/unsupported command from the kernel/SATA driver?
>
> Justin.
>
>

In /usr/src/linux/include/linux/ata.h:

   ATA_CMD_WRITE_EXT = 0x35,

Perhaps these drives do not support this command or do not support it 
properly?

Any idea, Jeff/Alan?

Justin.



* Re: LibPATA code issues / 2.6.15.4 (found the opcode=0x35)!
  2006-07-09 20:46                                         ` Justin Piszcz
@ 2006-07-09 21:05                                           ` Justin Piszcz
  2006-07-09 22:03                                             ` Justin Piszcz
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-09 21:05 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jeff Garzik, Sander, linux-kernel, IDE/ATA development list, Alan Cox



On Sun, 9 Jul 2006, Justin Piszcz wrote:

>
>
> On Sun, 9 Jul 2006, Justin Piszcz wrote:
>
>> I made my own patch (following Mark's example) but also added that printk 
>> in that function.
>> 
>> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata_gen_fixed_sense: failed 
>> ata_op=0x35
>> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata4: translated ATA stat/err 
>> 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata_gen_ata_desc_sense: failed 
>> ata_op=0x51
>> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata4: status=0x51 { DriveReady 
>> SeekComplete Error }
>> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata4: error=0x04 { 
>> DriveStatusError }
>> 
>> Now that we have found the ata_op code of 0x35, what does this mean?  Is it 
>> generated from a bad FUA/unsupported command from the kernel/SATA driver?
>> 
>> Justin.
>> 
>> 
>
> In /usr/src/linux/include/linux/ata.h:
>
>  ATA_CMD_WRITE_EXT = 0x35,
>
> Perhaps these drives do not support this command or do not support it 
> properly?
>
> Any idea, Jeff/Alan?
>
> Justin.
>
>

Here are all the errors (when reading/writing heavily):

[4294810.556000] ata_gen_fixed_sense: failed ata_op=0x35
[4294810.556000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4294810.556000] ata_gen_ata_desc_sense: failed ata_op=0x51
[4294810.556000] ata4: status=0x51 { DriveReady SeekComplete Error }
[4294810.556000] ata4: error=0x04 { DriveStatusError }
[4295514.668000] ata_gen_fixed_sense: failed ata_op=0x35
[4295514.668000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4295514.668000] ata_gen_ata_desc_sense: failed ata_op=0x51
[4295514.668000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4295514.668000] ata3: error=0x04 { DriveStatusError }

Jeff/Mark, from these errors can we reach a consensus as to the cause of 
these errors and how to eliminate the problem?
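
To summarise a longer capture, the opcodes can be counted straight from the log text. A minimal sketch, assuming the "failed ata_op=0x.." printk format used by the debug patches in this thread; tally_failed_ops() is a hypothetical helper, and its key argument restricts the count to one of the two printks (for example only the ata_gen_fixed_sense lines).

```c
#include <stdio.h>
#include <string.h>

/* Count how often each two-digit hex opcode follows `key` in a log
 * buffer. counts must have room for 256 entries; it is zeroed first. */
static void tally_failed_ops(const char *log, const char *key,
                             unsigned counts[256])
{
    size_t keylen = strlen(key);
    const char *p = log;

    memset(counts, 0, 256 * sizeof counts[0]);
    if (keylen == 0)
        return;                            /* empty key would never advance */
    while ((p = strstr(p, key)) != NULL) {
        unsigned op;

        p += keylen;                       /* skip to the hex digits */
        if (sscanf(p, "%2x", &op) == 1)
            counts[op & 0xffu]++;
    }
}
```

Feeding it the captured dmesg text with key "ata_gen_fixed_sense: failed ata_op=0x" gives one counter per failing opcode.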

Thanks,

Justin.


* Re: LibPATA code issues / 2.6.15.4 (found the opcode=0x35)!
  2006-07-09 21:05                                           ` Justin Piszcz
@ 2006-07-09 22:03                                             ` Justin Piszcz
  2006-07-10 13:59                                               ` Follow up? " Justin Piszcz
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-09 22:03 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jeff Garzik, Sander, linux-kernel, IDE/ATA development list, Alan Cox

[4294810.556000] ata_gen_ata_desc_sense: failed ata_op=0x51
[4294810.556000] ata4: status=0x51 { DriveReady SeekComplete Error }
[4294810.556000] ata4: error=0x04 { DriveStatusError }
[4295514.668000] ata_gen_fixed_sense: failed ata_op=0x35
[4295514.668000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4295514.668000] ata_gen_ata_desc_sense: failed ata_op=0x51
[4295514.668000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4295514.668000] ata3: error=0x04 { DriveStatusError }
[4297033.649000] ata_gen_fixed_sense: failed ata_op=0xca
[4297033.649000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4297033.649000] ata_gen_ata_desc_sense: failed ata_op=0x51
[4297033.649000] ata4: status=0x51 { DriveReady SeekComplete Error }
[4297033.649000] ata4: error=0x04 { DriveStatusError }
[4297741.057000] ata_gen_fixed_sense: failed ata_op=0x35
[4297741.057000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4297741.057000] ata_gen_ata_desc_sense: failed ata_op=0x51
[4297741.057000] ata4: status=0x51 { DriveReady SeekComplete Error }
[4297741.057000] ata4: error=0x04 { DriveStatusError }

Also got a 0xca.
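
For reference, the opcodes collected so far can be named with a small lookup table. The 0x35 value is ATA_CMD_WRITE_EXT as quoted from include/linux/ata.h; the other entries are the standard ATA command codes (WRITE DMA 0xCA, READ DMA 0xC8, READ DMA EXT 0x25, SMART 0xB0), and the ATA_CMD_* names in the comments should be checked against the header in your own tree. ata_op_name() is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>

struct ata_opname { unsigned char op; const char *name; };

/* Standard ATA command codes; kernel macro names in the comments. */
static const struct ata_opname ata_ops[] = {
    { 0x25, "READ DMA EXT" },   /* ATA_CMD_READ_EXT */
    { 0x35, "WRITE DMA EXT" },  /* ATA_CMD_WRITE_EXT */
    { 0xB0, "SMART" },          /* ATA_CMD_SMART */
    { 0xC8, "READ DMA" },       /* ATA_CMD_READ */
    { 0xCA, "WRITE DMA" },      /* ATA_CMD_WRITE */
};

/* Returns the command name, or "unknown" if the opcode is not listed. */
static const char *ata_op_name(unsigned char op)
{
    for (size_t i = 0; i < sizeof ata_ops / sizeof ata_ops[0]; i++)
        if (ata_ops[i].op == op)
            return ata_ops[i].name;
    return "unknown";
}
```

Both failing opcodes, 0x35 and 0xCA, come out as ordinary DMA writes (the 48-bit and 28-bit LBA forms respectively).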


On Sun, 9 Jul 2006, Justin Piszcz wrote:

>
>
> On Sun, 9 Jul 2006, Justin Piszcz wrote:
>
>> 
>> 
>> On Sun, 9 Jul 2006, Justin Piszcz wrote:
>> 
>>> I made my own patch (following Mark's example) but also added that printk 
>>> in that function.
>>> 
>>> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata_gen_fixed_sense: failed 
>>> ata_op=0x35
>>> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata4: translated ATA stat/err 
>>> 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>>> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata_gen_ata_desc_sense: 
>>> failed ata_op=0x51
>>> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata4: status=0x51 { 
>>> DriveReady SeekComplete Error }
>>> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata4: error=0x04 { 
>>> DriveStatusError }
>>> 
>>> Now that we have found the ata_op code of 0x35, what does this mean?  Is 
>>> it generated from a bad FUA/unsupported command from the kernel/SATA 
>>> driver?
>>> 
>>> Justin.
>>> 
>>> 
>> 
>> In /usr/src/linux/include/linux/ata.h:
>>
>>  ATA_CMD_WRITE_EXT = 0x35,
>> 
>> Perhaps these drives do not support this command or do not support it 
>> properly?
>> 
>> Any idea, Jeff/Alan?
>> 
>> Justin.
>> 
>> 
>
> Here are all the errors (when reading/writing heavily):
>
> [4294810.556000] ata_gen_fixed_sense: failed ata_op=0x35
> [4294810.556000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 
> 0xb/00/00
> [4294810.556000] ata_gen_ata_desc_sense: failed ata_op=0x51
> [4294810.556000] ata4: status=0x51 { DriveReady SeekComplete Error }
> [4294810.556000] ata4: error=0x04 { DriveStatusError }
> [4295514.668000] ata_gen_fixed_sense: failed ata_op=0x35
> [4295514.668000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 
> 0xb/00/00
> [4295514.668000] ata_gen_ata_desc_sense: failed ata_op=0x51
> [4295514.668000] ata3: status=0x51 { DriveReady SeekComplete Error }
> [4295514.668000] ata3: error=0x04 { DriveStatusError }
>
> Jeff/Mark, from these errors can we reach a consensus as to the cause of 
> these errors and how to eliminate the problem?
>
> Thanks,
>
> Justin.
>


* Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)!
  2006-07-09 22:03                                             ` Justin Piszcz
@ 2006-07-10 13:59                                               ` Justin Piszcz
  2006-07-10 15:33                                                 ` Alan Cox
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-10 13:59 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jeff Garzik, Sander, linux-kernel, IDE/ATA development list, Alan Cox

Any follow-up now that we have the opcodes of the failed ATA commands?

On Sun, 9 Jul 2006, Justin Piszcz wrote:

> [4294810.556000] ata_gen_ata_desc_sense: failed ata_op=0x51
> [4294810.556000] ata4: status=0x51 { DriveReady SeekComplete Error }
> [4294810.556000] ata4: error=0x04 { DriveStatusError }
> [4295514.668000] ata_gen_fixed_sense: failed ata_op=0x35
> [4295514.668000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 
> 0xb/00/00
> [4295514.668000] ata_gen_ata_desc_sense: failed ata_op=0x51
> [4295514.668000] ata3: status=0x51 { DriveReady SeekComplete Error }
> [4295514.668000] ata3: error=0x04 { DriveStatusError }
> [4297033.649000] ata_gen_fixed_sense: failed ata_op=0xca
> [4297033.649000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 
> 0xb/00/00
> [4297033.649000] ata_gen_ata_desc_sense: failed ata_op=0x51
> [4297033.649000] ata4: status=0x51 { DriveReady SeekComplete Error }
> [4297033.649000] ata4: error=0x04 { DriveStatusError }
> [4297741.057000] ata_gen_fixed_sense: failed ata_op=0x35
> [4297741.057000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 
> 0xb/00/00
> [4297741.057000] ata_gen_ata_desc_sense: failed ata_op=0x51
> [4297741.057000] ata4: status=0x51 { DriveReady SeekComplete Error }
> [4297741.057000] ata4: error=0x04 { DriveStatusError }
>
> Also got a 0xca.
>
>
> On Sun, 9 Jul 2006, Justin Piszcz wrote:
>
>> 
>> 
>> On Sun, 9 Jul 2006, Justin Piszcz wrote:
>> 
>>> 
>>> 
>>> On Sun, 9 Jul 2006, Justin Piszcz wrote:
>>> 
>>>> I made my own patch (following Mark's example) but also added that printk 
>>>> in that function.
>>>> 
>>>> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata_gen_fixed_sense: failed 
>>>> ata_op=0x35
>>>> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata4: translated ATA 
>>>> stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>>>> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata_gen_ata_desc_sense: 
>>>> failed ata_op=0x51
>>>> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata4: status=0x51 { 
>>>> DriveReady SeekComplete Error }
>>>> Jul  9 16:37:52 p34 kernel: [4294810.556000] ata4: error=0x04 { 
>>>> DriveStatusError }
>>>> 
>>>> Now that we have found the ata_op code of 0x35, what does this mean?  Is 
>>>> it generated from a bad FUA/unsupported command from the kernel/SATA 
>>>> driver?
>>>> 
>>>> Justin.
>>>> 
>>>> 
>>> 
>>> In /usr/src/linux/include/linux/ata.h:
>>>
>>>  ATA_CMD_WRITE_EXT = 0x35,
>>> 
>>> Perhaps these drives do not support this command or do not support it 
>>> properly?
>>> 
>>> Any idea, Jeff/Alan?
>>> 
>>> Justin.
>>> 
>>> 
>> 
>> Here are all the errors (when reading/writing heavily):
>> 
>> [4294810.556000] ata_gen_fixed_sense: failed ata_op=0x35
>> [4294810.556000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 
>> 0xb/00/00
>> [4294810.556000] ata_gen_ata_desc_sense: failed ata_op=0x51
>> [4294810.556000] ata4: status=0x51 { DriveReady SeekComplete Error }
>> [4294810.556000] ata4: error=0x04 { DriveStatusError }
>> [4295514.668000] ata_gen_fixed_sense: failed ata_op=0x35
>> [4295514.668000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 
>> 0xb/00/00
>> [4295514.668000] ata_gen_ata_desc_sense: failed ata_op=0x51
>> [4295514.668000] ata3: status=0x51 { DriveReady SeekComplete Error }
>> [4295514.668000] ata3: error=0x04 { DriveStatusError }
>> 
>> Jeff/Mark, from these errors can we reach a consensus as to the cause of 
>> these errors and how to eliminate the problem?
>> 
>> Thanks,
>> 
>> Justin.
>> 
>


* Re: Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)!
  2006-07-10 13:59                                               ` Follow up? " Justin Piszcz
@ 2006-07-10 15:33                                                 ` Alan Cox
  2006-07-10 15:45                                                   ` Justin Piszcz
  0 siblings, 1 reply; 147+ messages in thread
From: Alan Cox @ 2006-07-10 15:33 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Mark Lord, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list

On Mon, 2006-07-10 at 09:59 -0400, Justin Piszcz wrote:
> > [4297741.057000] ata_gen_fixed_sense: failed ata_op=0x35
> > [4297741.057000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 
> > 0xb/00/00
> > [4297741.057000] ata_gen_ata_desc_sense: failed ata_op=0x51
> > [4297741.057000] ata4: status=0x51 { DriveReady SeekComplete Error }
> > [4297741.057000] ata4: error=0x04 { DriveStatusError }
> >
> > Also got a 0xca.

That's "write", so if that is being reported as an unknown command,
something very odd indeed is happening.




* Re: Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)!
  2006-07-10 15:33                                                 ` Alan Cox
@ 2006-07-10 15:45                                                   ` Justin Piszcz
  2006-07-11 13:28                                                     ` LibPATA code issues / 2.6.17.3 (What is the next step?) Justin Piszcz
  2006-07-14 17:14                                                     ` Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! Mark Lord
  0 siblings, 2 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-07-10 15:45 UTC (permalink / raw)
  To: Alan Cox
  Cc: Mark Lord, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 637 bytes --]

Please verify that I did the patch correctly. Thanks.

On Mon, 10 Jul 2006, Alan Cox wrote:

> On Mon, 2006-07-10 at 09:59 -0400, Justin Piszcz wrote:
>>> [4297741.057000] ata_gen_fixed_sense: failed ata_op=0x35
>>> [4297741.057000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ
>>> 0xb/00/00
>>> [4297741.057000] ata_gen_ata_desc_sense: failed ata_op=0x51
>>> [4297741.057000] ata4: status=0x51 { DriveReady SeekComplete Error }
>>> [4297741.057000] ata4: error=0x04 { DriveStatusError }
>>>
>>> Also got a 0xca.
>
> That's "write", so if that is being reported as an unknown command,
> something very odd indeed is happening.
>
>

[-- Attachment #2: Type: TEXT/PLAIN, Size: 2533 bytes --]

diff -uprN linux-2.6.17.3/drivers/scsi/libata-scsi.c linux-2.6.17.3-diff/drivers/scsi/libata-scsi.c
--- linux-2.6.17.3/drivers/scsi/libata-scsi.c	2006-06-30 13:37:38.000000000 -0400
+++ linux-2.6.17.3-diff/drivers/scsi/libata-scsi.c	2006-07-09 16:31:45.665112000 -0400
@@ -428,10 +428,16 @@ int ata_scsi_device_suspend(struct scsi_
  *	spin_lock_irqsave(host_set lock)
  */
 void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc,
-			u8 *ascq)
+			u8 *ascq, struct ata_queued_cmd *qc)
 {
 	int i;
 
+	struct scsi_cmnd *cmd = qc->scsicmd;
+	struct ata_taskfile *tf = &qc->tf;
+	unsigned char *sb = cmd->sense_buffer;
+	unsigned char *desc = sb + 8;
+	unsigned char ata_op = tf->command;
+	
 	/* Based on the 3ware driver translation table */
 	static const unsigned char sense_table[][4] = {
 		/* BBD|ECC|ID|MAR */
@@ -520,6 +526,7 @@ void ata_to_sense_error(unsigned id, u8 
 	printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to "
 	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err,
 	       *sk, *asc, *ascq);
+	printk(KERN_ERR "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
 	return;
 }
 
@@ -542,6 +549,7 @@ void ata_gen_ata_desc_sense(struct ata_q
 	struct ata_taskfile *tf = &qc->tf;
 	unsigned char *sb = cmd->sense_buffer;
 	unsigned char *desc = sb + 8;
+	unsigned char ata_op = tf->command;
 
 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
 
@@ -558,8 +566,9 @@ void ata_gen_ata_desc_sense(struct ata_q
 	 * onto sense key, asc & ascq.
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
+		printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[1], &sb[2], &sb[3]);
+				   &sb[1], &sb[2], &sb[3],qc);
 		sb[1] &= 0x0f;
 	}
 
@@ -617,6 +626,7 @@ void ata_gen_fixed_sense(struct ata_queu
 	struct scsi_cmnd *cmd = qc->scsicmd;
 	struct ata_taskfile *tf = &qc->tf;
 	unsigned char *sb = cmd->sense_buffer;
+	unsigned char ata_op = tf->command;
 
 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
 
@@ -633,8 +643,9 @@ void ata_gen_fixed_sense(struct ata_queu
 	 * onto sense key, asc & ascq.
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
+		printk(KERN_WARNING "ata_gen_fixed_sense: failed ata_op=0x%02x\n", ata_op);
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[2], &sb[12], &sb[13]);
+				   &sb[2], &sb[12], &sb[13],qc);
 		sb[2] &= 0x0f;
 	}
 


* LibPATA code issues / 2.6.17.3 (What is the next step?)
  2006-07-10 15:45                                                   ` Justin Piszcz
@ 2006-07-11 13:28                                                     ` Justin Piszcz
  2006-07-11 16:12                                                       ` Alan Cox
  2006-07-14 17:16                                                       ` Mark Lord
  2006-07-14 17:14                                                     ` Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! Mark Lord
  1 sibling, 2 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-07-11 13:28 UTC (permalink / raw)
  To: Alan Cox
  Cc: Mark Lord, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list

Alan/Jeff/Mark,

Is there anything else I can do to further troubleshoot this problem now 
that we have the failed opcode(s)?  Again, there is never any corruption 
on these drives, so it is more of an annoyance than anything else.

Other people have the same problem with these drives (a Google search
turns them up), but I am not sure they know where to report it.

opcode=0x35 & opcode=0xca

Justin.


* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
  2006-07-11 13:28                                                     ` LibPATA code issues / 2.6.17.3 (What is the next step?) Justin Piszcz
@ 2006-07-11 16:12                                                       ` Alan Cox
  2006-07-12 22:10                                                         ` David Greaves
  2006-07-14 17:16                                                       ` Mark Lord
  1 sibling, 1 reply; 147+ messages in thread
From: Alan Cox @ 2006-07-11 16:12 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Mark Lord, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list

On Tue, 2006-07-11 at 09:28 -0400, Justin Piszcz wrote:
> Alan/Jeff/Mark,
> 
> Is there anything else I can do to further troubleshoot this problem now 
> that we have the failed opcode(s)?  Again, there is never any corruption 
> on these drives, so it is more of an annoyance than anything else.

Nothing strikes me so far other than the data not making sense. Possibly
it will become clearer later if/when we see other examples.



* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
  2006-07-11 16:12                                                       ` Alan Cox
@ 2006-07-12 22:10                                                         ` David Greaves
  2006-07-12 22:29                                                           ` Justin Piszcz
  2006-07-13 10:55                                                           ` Erik Mouw
  0 siblings, 2 replies; 147+ messages in thread
From: David Greaves @ 2006-07-12 22:10 UTC (permalink / raw)
  To: Alan Cox
  Cc: Justin Piszcz, Mark Lord, Jeff Garzik, Sander, linux-kernel,
	IDE/ATA development list

Alan Cox wrote:
> On Tue, 2006-07-11 at 09:28 -0400, Justin Piszcz wrote:
>> Alan/Jeff/Mark,
>>
>> Is there anything else I can do to further troubleshoot this problem now 
>> that we have the failed opcode(s)?  Again, there is never any corruption 
>> on these drives, so it is more of an annoyance than anything else.
> 
> Nothing strikes me so far other than the data not making sense. Possibly
> it will become clearer later if/when we see other examples.

For me it's SMART related.

smartctl -data -o on /dev/sda reliably gets a similar message.
Justin - does this smartctl command trigger a message for you?

I've been mailing on and off since January-ish.
(http://marc.theaimsgroup.com/?l=linux-ide&w=2&r=7&s=libpata&q=b)

Back in March I was running 2.6.16 (with a different version of Mark's
opcode patch) and I sent an email with the following info:

dmesg:
ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
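
Reading this line: op= appears to be the SCSI command libata was translating (0x28 is READ(10)) and cmd= the ATA command it issued for it (0x25, READ DMA EXT). Below is a simplified sketch of that mapping for the DMA case only; pick_ata_dma_cmd() is hypothetical, PIO and FUA variants are left out, and whether the 48-bit form is used is passed in as a flag because the exact rule libata applies is not shown in this thread.

```c
#include <stdbool.h>

/* SCSI CDB opcodes */
#define SCSI_READ_10       0x28
#define SCSI_WRITE_10      0x2a
/* ATA DMA commands */
#define ATA_READ_DMA       0xc8
#define ATA_READ_DMA_EXT   0x25
#define ATA_WRITE_DMA      0xca
#define ATA_WRITE_DMA_EXT  0x35

/* Pick the ATA DMA command for a SCSI READ(10)/WRITE(10).
 * use_lba48 selects the 48-bit (EXT) form. Returns 0 for any
 * opcode this sketch does not handle. */
static unsigned char pick_ata_dma_cmd(unsigned char scsi_op, bool use_lba48)
{
    switch (scsi_op) {
    case SCSI_READ_10:
        return use_lba48 ? ATA_READ_DMA_EXT : ATA_READ_DMA;
    case SCSI_WRITE_10:
        return use_lba48 ? ATA_WRITE_DMA_EXT : ATA_WRITE_DMA;
    default:
        return 0;
    }
}
```

Under this model, op=0x28 with the 48-bit form selected yields cmd=0x25, matching the logged pair.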

Does that help with the diagnosis?

Also see my emails: SMART on SATA reporting errors?
  http://marc.theaimsgroup.com/?l=linux-ide&m=113933732903205&w=2

I did reply but got no response so I assumed I was just so far off base
that I was being ignored  :)

David


* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
  2006-07-12 22:10                                                         ` David Greaves
@ 2006-07-12 22:29                                                           ` Justin Piszcz
  2006-07-14 15:33                                                             ` David Greaves
  2006-07-13 10:55                                                           ` Erik Mouw
  1 sibling, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-12 22:29 UTC (permalink / raw)
  To: David Greaves
  Cc: Alan Cox, Mark Lord, Jeff Garzik, Sander, linux-kernel,
	IDE/ATA development list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1513 bytes --]

Unfortunately not, the correct patch you need is attached to get the 
ata_op code, against 2.6.17.3.

On Wed, 12 Jul 2006, David Greaves wrote:

> Alan Cox wrote:
>> On Tue, 2006-07-11 at 09:28 -0400, Justin Piszcz wrote:
>>> Alan/Jeff/Mark,
>>>
>>> Is there anything else I can do to further troubleshoot this problem now
>>> that we have the failed opcode(s)?  Again, there is never any corruption
>>> on these drives, so it is more of an annoyance than anything else.
>>
>> Nothing strikes me so far other than the data not making sense. Possibly
>> it will become clearer later if/when we see other examples.
>
> For me it's SMART related.
>
> smartctl -data -o on /dev/sda reliably gets a similar message.
> Justin - does this smartctl command trigger a message for you?
>
> I've been mailing on and off since January-ish.
> (http://marc.theaimsgroup.com/?l=linux-ide&w=2&r=7&s=libpata&q=b)
>
> Back in March I was running 2.6.16 (with a different version of Mark's
> opcode patch) and I sent an email with the following info:
>
> dmesg:
> ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/04 to SCSI
> SK/ASC/ASCQ 0xb/00/00
> ata1: status=0x51 { DriveReady SeekComplete Error }
> ata1: error=0x04 { DriveStatusError }
>
> Does that help with the diagnosis?
>
> Also see my emails: SMART on SATA reporting errors?
>  http://marc.theaimsgroup.com/?l=linux-ide&m=113933732903205&w=2
>
> I did reply but got no response so I assumed I was just so far off base
> that I was being ignored  :)
>
> David
>

[-- Attachment #2: Type: TEXT/PLAIN, Size: 2533 bytes --]

diff -uprN linux-2.6.17.3/drivers/scsi/libata-scsi.c linux-2.6.17.3-diff/drivers/scsi/libata-scsi.c
--- linux-2.6.17.3/drivers/scsi/libata-scsi.c	2006-06-30 13:37:38.000000000 -0400
+++ linux-2.6.17.3-diff/drivers/scsi/libata-scsi.c	2006-07-09 16:31:45.665112000 -0400
@@ -428,10 +428,16 @@ int ata_scsi_device_suspend(struct scsi_
  *	spin_lock_irqsave(host_set lock)
  */
 void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc,
-			u8 *ascq)
+			u8 *ascq, struct ata_queued_cmd *qc)
 {
 	int i;
 
+	struct scsi_cmnd *cmd = qc->scsicmd;
+	struct ata_taskfile *tf = &qc->tf;
+	unsigned char *sb = cmd->sense_buffer;
+	unsigned char *desc = sb + 8;
+	unsigned char ata_op = tf->command;
+	
 	/* Based on the 3ware driver translation table */
 	static const unsigned char sense_table[][4] = {
 		/* BBD|ECC|ID|MAR */
@@ -520,6 +526,7 @@ void ata_to_sense_error(unsigned id, u8 
 	printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to "
 	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err,
 	       *sk, *asc, *ascq);
+	printk(KERN_ERR "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
 	return;
 }
 
@@ -542,6 +549,7 @@ void ata_gen_ata_desc_sense(struct ata_q
 	struct ata_taskfile *tf = &qc->tf;
 	unsigned char *sb = cmd->sense_buffer;
 	unsigned char *desc = sb + 8;
+	unsigned char ata_op = tf->command;
 
 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
 
@@ -558,8 +566,9 @@ void ata_gen_ata_desc_sense(struct ata_q
 	 * onto sense key, asc & ascq.
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
+		printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[1], &sb[2], &sb[3]);
+				   &sb[1], &sb[2], &sb[3],qc);
 		sb[1] &= 0x0f;
 	}
 
@@ -617,6 +626,7 @@ void ata_gen_fixed_sense(struct ata_queu
 	struct scsi_cmnd *cmd = qc->scsicmd;
 	struct ata_taskfile *tf = &qc->tf;
 	unsigned char *sb = cmd->sense_buffer;
+	unsigned char ata_op = tf->command;
 
 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
 
@@ -633,8 +643,9 @@ void ata_gen_fixed_sense(struct ata_queu
 	 * onto sense key, asc & ascq.
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
+		printk(KERN_WARNING "ata_gen_fixed_sense: failed ata_op=0x%02x\n", ata_op);
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[2], &sb[12], &sb[13]);
+				   &sb[2], &sb[12], &sb[13],qc);
 		sb[2] &= 0x0f;
 	}
 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
  2006-07-12 22:10                                                         ` David Greaves
  2006-07-12 22:29                                                           ` Justin Piszcz
@ 2006-07-13 10:55                                                           ` Erik Mouw
  1 sibling, 0 replies; 147+ messages in thread
From: Erik Mouw @ 2006-07-13 10:55 UTC (permalink / raw)
  To: David Greaves
  Cc: Alan Cox, Justin Piszcz, Mark Lord, Jeff Garzik, Sander,
	linux-kernel, IDE/ATA development list

On Wed, Jul 12, 2006 at 11:10:59PM +0100, David Greaves wrote:
> Alan Cox wrote:
> > On Tue, 2006-07-11 at 09:28 -0400, Justin Piszcz wrote:
> >> Alan/Jeff/Mark,
> >>
> >> Is there anything else I can do to further troubleshoot this problem now 
> >> that we have the failed opcode(s)?  Again, there is never any corruption 
> >> on these drives, so it is more of an annoyance than anything else.
> > 
> > Nothing strikes me so far other than the data not making sense. Possibly
> > it will become clearer later if/when we see other examples.
> 
> For me it's SMART related.
> 
> smartctl -data -o on /dev/sda reliably gets a similar message.
> Justin - does this smartctl command trigger a message for you?

In that case SMART just isn't enabled. smartctl -d ata --smart=on
/dev/sda should make those messages go away.
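Both smartctl invocations in this thread become the same ATA opcode, 0xB0 (SMART), with the sub-command carried in the Features register, so an `ata_op=0xb0` in the log only says "some SMART sub-command failed". A minimal sketch of the register values, assuming the commonly documented encodings: SMART ENABLE OPERATIONS (`--smart=on`) is feature D8h; the automatic off-line setting that `-o on|off` toggles is feature DBh with sector count F8h to enable or 00h to disable; both carry the 4Fh/C2h signature in LBA Mid/High. The struct and function names are hypothetical, not smartmontools' API.

```c
#include <stdint.h>

#define ATA_CMD_SMART       0xB0
#define SMART_ENABLE        0xD8  /* smartctl --smart=on */
#define SMART_AUTO_OFFLINE  0xDB  /* smartctl -o on|off */

/* Taskfile registers a SMART command writes (hypothetical layout). */
struct smart_tf {
    uint8_t command;  /* always 0xB0 for SMART */
    uint8_t feature;  /* selects the SMART sub-command */
    uint8_t nsect;    /* sub-command argument, if any */
    uint8_t lbam;     /* SMART signature, 0x4F */
    uint8_t lbah;     /* SMART signature, 0xC2 */
};

static struct smart_tf smart_enable(void)
{
    struct smart_tf tf = { ATA_CMD_SMART, SMART_ENABLE, 0x00, 0x4F, 0xC2 };
    return tf;
}

static struct smart_tf smart_auto_offline(int on)
{
    /* Sector count 0xF8 enables automatic off-line data collection, 0x00 disables it. */
    struct smart_tf tf = { ATA_CMD_SMART, SMART_AUTO_OFFLINE,
                           (uint8_t)(on ? 0xF8 : 0x00), 0x4F, 0xC2 };
    return tf;
}
```

Distinguishing the two failures therefore needs the Features value as well as the opcode.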

Some BIOSes have a setting to enable/disable SMART, though the option
is usually badly documented (hey, what do you expect from BIOS
writers).


Erik

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
  2006-07-12 22:29                                                           ` Justin Piszcz
@ 2006-07-14 15:33                                                             ` David Greaves
  0 siblings, 0 replies; 147+ messages in thread
From: David Greaves @ 2006-07-14 15:33 UTC (permalink / raw)
  To: Alan Cox
  Cc: Justin Piszcz, Mark Lord, Jeff Garzik, Sander, linux-kernel,
	IDE/ATA development list, htejun

Justin Piszcz wrote:
> On Wed, 12 Jul 2006, David Greaves wrote:
> 
>> Alan Cox wrote:
>>> On Tue, 2006-07-11 at 09:28 -0400, Justin Piszcz wrote:
>>>> Alan/Jeff/Mark,
>>>>
>>>> Is there anything else I can do to further troubleshoot this problem now
>>>> that we have the failed opcode(s)?  Again, there is never any corruption
>>>> on these drives, so it is more of an annoyance than anything else.
>>>
>>> Nothing strikes me so far other than the data not making sense. Possibly
>>> it will become clearer later if/when we see other examples.
>>
>> For me it's SMART related.
>>
>> smartctl -data -o on /dev/sda reliably gets a similar message.
>> Justin - does this smartctl command trigger a message for you?
>>
>> I've been mailing on and off since January-ish.
>> (http://marc.theaimsgroup.com/?l=linux-ide&w=2&r=7&s=libpata&q=b)
>>
>> Back in March I was running 2.6.16 (with a different version of Mark's
>> opcode patch) and I sent an email with the following info:
>>
> Unfortunately not, the correct patch you need is attached to get the
> ata_op code, against 2.6.17.3.

[mutter, mutter, getting a teeny bit fed up with applying the same
diagnostic patch (thanks Mark) and reporting this and getting no real
feedback (apart from Erik - ta - who was off base, it doesn't appear to
be BIOS and here's the pair of commands :) ... Ok, added Tejun to the
list since he's been doing EH for libata and this is some kind of E that
needs better H]

2.6.17.3 with op-code patch

smartctl -data --smart=on /dev/sda
no dmesg output
smartctl -data -o on /dev/sda
dmesg:
ata1: PIO error
ata_gen_ata_desc_sense: failed ata_op=0xb0
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata_gen_ata_desc_sense: failed ata_op=0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: PIO error
ata_gen_ata_desc_sense: failed ata_op=0xb0
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata_gen_ata_desc_sense: failed ata_op=0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: PIO error
ata_gen_ata_desc_sense: failed ata_op=0xb0
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata_gen_ata_desc_sense: failed ata_op=0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: PIO error
ata_gen_ata_desc_sense: failed ata_op=0xb0
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata_gen_ata_desc_sense: failed ata_op=0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: PIO error
ata_gen_ata_desc_sense: failed ata_op=0xb0
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata_gen_ata_desc_sense: failed ata_op=0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: PIO error
ata_gen_ata_desc_sense: failed ata_op=0xb0
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata_gen_ata_desc_sense: failed ata_op=0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }

David

-- 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)!
  2006-07-10 15:45                                                   ` Justin Piszcz
  2006-07-11 13:28                                                     ` LibPATA code issues / 2.6.17.3 (What is the next step?) Justin Piszcz
@ 2006-07-14 17:14                                                     ` Mark Lord
  2006-07-14 17:17                                                       ` Justin Piszcz
  1 sibling, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-07-14 17:14 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list

>Jeff/Mark, from these errors can we reach a consensus as to the cause
>of these errors and how to eliminate the problem? 

It is up to the current subsystem maintainer to help investigate this
and come up with a solution, in cooperation with eager testers such
as yourself.  I gave away my kernel subsystem maintainer's duties about
seven years ago, because it just takes too much time to do it really well.

In this case, I'm providing a tiny amount of help, simply because I don't
see anyone else even trying, and there is obviously something wrong here.

Now.. your hacked version of my simple patch is incorrect.  It is frequently
dumping out ata_op=0x51, which is obviously the ATA status value not the
original ATA command byte.
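Why the same variable shows 0xb0 in one place and 0x51 in another: ATA's Command and Status registers share one address, so once the driver reads the taskfile back from the controller, `tf->command` holds the status. The sketch below models that, assuming (as the 2.6.17 code the diff applies to appears to do) that `ata_gen_ata_desc_sense()` re-reads the taskfile via `tf_read` after its declarations and before the status check. The struct and helpers are hypothetical stand-ins, not libata code.

```c
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct ata_taskfile. */
struct taskfile {
    uint8_t command;  /* written: command opcode; read back: Status register */
    uint8_t feature;  /* written: Features; read back: Error register */
};

/* Stand-in for ap->ops->tf_read(): read-back overwrites command with status. */
static void fake_tf_read(struct taskfile *tf, uint8_t status, uint8_t error)
{
    tf->command = status;
    tf->feature = error;
}

struct op_seen {
    uint8_t early;  /* copy taken at declaration time, like the patch's ata_op */
    uint8_t late;   /* what ata_to_sense_error() sees through qc->tf */
};

static struct op_seen capture(uint8_t opcode, uint8_t status, uint8_t error)
{
    struct taskfile tf = { opcode, 0 };
    struct op_seen seen;

    seen.early = tf.command;
    fake_tf_read(&tf, status, error);
    seen.late = tf.command;
    return seen;
}
```

With the values from David's log, `capture(0xb0, 0x51, 0x04)` gives `early == 0xb0` (the SMART opcode) and `late == 0x51` (the status), matching the two alternating `ata_op=` lines.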

But ignoring that, we also see some valid output from where it does trip
the code from my original patches:  ata_op=0x35.

So, the drive is rejecting an LBA48 WRITE operation, which should happen
only if the drive does not have LBA48 support.  Now, I know you posted all
of this nice info months ago, but let's see it again now, for the exact
drive that is generating that specific message.  We need to see the output
from "hdparm --Istdout /dev/sdX" for that drive.
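Whether a drive advertises LBA48 can be read directly off the IDENTIFY words that `hdparm --Istdout` prints, eight 16-bit words per line starting at word 0. A minimal sketch, assuming the ATA-6 layout: word 83 bit 10 means the 48-bit Address feature set is supported (word 83 is only meaningful when bits 15:14 read 01b), word 86 bit 10 means it is enabled, and words 100-103 hold the 48-bit sector count, least significant word first. The function names are hypothetical.

```c
#include <stdint.h>

/* Word 83 is valid only if bit 15 is 0 and bit 14 is 1. */
static int word83_valid(const uint16_t *id)
{
    return (id[83] & 0xC000) == 0x4000;
}

/* Word 83 bit 10: 48-bit Address feature set supported. */
static int lba48_supported(const uint16_t *id)
{
    return word83_valid(id) && ((id[83] >> 10) & 1);
}

/* Word 86 bit 10: 48-bit Address feature set enabled. */
static int lba48_enabled(const uint16_t *id)
{
    return (id[86] >> 10) & 1;
}

/* Words 100..103: number of user-addressable sectors for 48-bit commands. */
static uint64_t lba48_sectors(const uint16_t *id)
{
    return (uint64_t)id[100]
         | ((uint64_t)id[101] << 16)
         | ((uint64_t)id[102] << 32)
         | ((uint64_t)id[103] << 48);
}
```

For example, the /dev/sdc dump Justin posts in reply has word 83 = 0x7d01, word 86 = 0x3c01 and words 100/101 = 0x90b0/0x2e93: both bits are set, and the count comes out to 781422768 sectors, the same figure as hdparm's `sectors =` line.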

Thanks

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
  2006-07-11 13:28                                                     ` LibPATA code issues / 2.6.17.3 (What is the next step?) Justin Piszcz
  2006-07-11 16:12                                                       ` Alan Cox
@ 2006-07-14 17:16                                                       ` Mark Lord
  2006-07-14 17:18                                                         ` Justin Piszcz
  1 sibling, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-07-14 17:16 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list

Justin Piszcz wrote:
>
> opcode=0x35 & opcode=0xca

Those are non-DMA WRITE opcodes.  Using PIO for I/O is pretty rare these days,
so I'm betting that this is not a hard disk device -- compactflash?

-ml

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)!
  2006-07-14 17:14                                                     ` Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! Mark Lord
@ 2006-07-14 17:17                                                       ` Justin Piszcz
  2006-07-14 17:37                                                         ` Mark Lord
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-14 17:17 UTC (permalink / raw)
  To: Mark Lord
  Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list



On Fri, 14 Jul 2006, Mark Lord wrote:

>> Jeff/Mark, from these errors can we reach a consensus as to the cause
>> of these errors and how to eliminate the problem? 
>
> It is up to the current subsystem maintainer to help investigate this
> and come up with a solution, in cooperation with eager testers such
> as yourself.  I gave away my kernel subsystem maintainer's duties about
> seven years ago, because it just takes too much time to do it really well.
>
> In this case, I'm providing a tiny amount of help, simply because I don't
> see anyone else even trying, and there is obviously something wrong here.
>
> Now.. your hacked version of my simple patch is incorrect.  It is frequently
> dumping out ata_op=0x51, which is obviously the ATA status value not the
> original ATA command byte.
>
> But ignoring that, we also see some valid output from where it does trip
> the code from my original patches:  ata_op=0x35.
>
> So, the drive is rejecting an LBA48 WRITE operation, which should happen
> only if the drive does not have LBA48 support.  Now, I know you posted all
> of this nice info months ago, but let's see it again now, for the exact
> drive that is generating that specific message.  We need to see the output
> from "hdparm --Istdout /dev/sdX" for that drive.
>
> Thanks
>

Here it is:

They are identical disks (the WD 400KD), both show up as 373GB 
(formatted):

p34:~# hdparm --Istdout /dev/sdc

/dev/sdc:
  IO_support   =  0 (default 16-bit)
  readonly     =  0 (off)
  readahead    = 256 (on)
  geometry     = 48641/255/63, sectors = 781422768, start = 0
0c5a 3fff c837 0010 0000 0000 003f 0000
0000 0000 2020 2020 2020 2020 2020 2020
334e 4631 514a 3345 0000 8000 0004 332e
4141 4820 2020 5354 3334 3030 3633 3341
5320 2020 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 8010
0000 2f00 4000 0200 0200 0007 3fff 0010
003f fc10 00fb 0010 ffff 0fff 0000 0007
0003 0078 0078 00f0 0078 0000 0000 0000
0000 0000 0000 001f 0502 0000 0040 0040
00fe 0000 346b 7d01 4023 3469 3c01 4023
407f 0000 0000 fefe fffe 0000 fe00 0000
0000 0000 0000 0000 90b0 2e93 0000 0000
0000 0000 4000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0100 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0001 90b0 2e93 90b0 2e93 2020 0002 02b6
0002 008a 3c06 3c0a 0000 07c6 0100 0800
100f 3000 0002 0080 0000 0000 00a0 0202
0000 0404 0000 0000 0000 0000 1000 000b
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 9da5
p34:~# hdparm --Istdout /dev/sdd

/dev/sdd:
  IO_support   =  0 (default 16-bit)
  readonly     =  0 (off)
  readahead    = 256 (on)
  geometry     = 48641/255/63, sectors = 781422768, start = 0
427a 3fff c837 0010 e100 0258 003f 0000
0000 000e 2020 2020 2057 442d 574d 414d
5931 3131 3335 3636 0003 8000 003f 3031
2e30 3641 3031 5744 4320 5744 3430 3030
4b44 2d30 304e 4142 3020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 8010
0000 2f00 4001 0280 0000 0007 3fff 0010
003f fc10 00fb 0100 ffff 0fff 0000 0007
0003 0078 0078 0078 0078 0000 0000 0000
0000 0000 0000 001f 0702 0000 0044 0040
00fe 001d 746b 7f01 4023 7469 3c01 4023
207f 0000 0000 0000 0000 0000 80fe 0000
0000 0000 0000 0000 90b0 2e93 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0001 0141 0000 0000 0000 075a 0000 0000
0000 0000 0000 0000 0000 0000 0002 0001
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0087
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 103f 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 48a5
p34:~#
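The "373GB" figure is consistent with the sector count both dumps report if GB is read as binary gigabytes (2^30 bytes) rather than the decimal gigabytes drive vendors use: 781422768 x 512 = 400088457216 bytes, about 400 decimal GB but about 372.6 GiB. A small sketch of that arithmetic, assuming 512-byte logical sectors; the function names are illustrative.

```c
#include <stdint.h>

/* Raw capacity for 512-byte sectors. */
static uint64_t capacity_bytes(uint64_t sectors)
{
    return sectors * 512u;
}

/* Capacity in binary GiB (2^30 bytes), rounded to the nearest whole unit. */
static unsigned capacity_gib_rounded(uint64_t sectors)
{
    return (unsigned)((capacity_bytes(sectors) + (1ull << 29)) >> 30);
}

/* Capacity in decimal GB (10^9 bytes), rounded to the nearest whole unit. */
static unsigned capacity_gb_rounded(uint64_t sectors)
{
    return (unsigned)((capacity_bytes(sectors) + 500000000ull) / 1000000000ull);
}
```

For 781422768 sectors this yields 373 binary and 400 decimal, so the difference from the label is units rather than anything lost to formatting.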


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
  2006-07-14 17:16                                                       ` Mark Lord
@ 2006-07-14 17:18                                                         ` Justin Piszcz
  2006-07-14 17:39                                                           ` Mark Lord
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-14 17:18 UTC (permalink / raw)
  To: Mark Lord
  Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list

They are Western Digital 400* drives.

[4294678.049000]   Vendor: ATA       Model: WDC WD4000KD-00N  Rev: 01.0
[4294678.050000]   Vendor: ATA       Model: WDC WD4000KD-00N  Rev: 01.0

They are on a SiI controller; it also happens when they are on a Promise
controller.


On Fri, 14 Jul 2006, Mark Lord wrote:

> Justin Piszcz wrote:
>> 
>> opcode=0x35 & opcode=0xca
>
> Those are non-DMA WRITE opcodes.  Using PIO for I/O is pretty rare these days,
> so I'm betting that this is not a hard disk device -- compactflash?
>
> -ml
>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)!
  2006-07-14 17:17                                                       ` Justin Piszcz
@ 2006-07-14 17:37                                                         ` Mark Lord
  2006-07-14 18:17                                                           ` Justin Piszcz
  0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-07-14 17:37 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list

Justin Piszcz wrote:
> 
> 
> On Fri, 14 Jul 2006, Mark Lord wrote:
>
>> So, the drive is rejecting an LBA48 WRITE operation, which should happen
>> only if the drive does not have LBA48 support.  Now, I know you posted all
>> of this nice info months ago, but let's see it again now, for the exact
>> drive that is generating that specific message.  We need to see the output
>> from "hdparm --Istdout /dev/sdX" for that drive.
>>
>> Thanks
>>
> 
> Here it is:
> 
> They are identical disks (the WD 400KD), both show up as 373GB (formatted):


Which *exact* unit generated the WRITE errors you posted about?

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
  2006-07-14 17:18                                                         ` Justin Piszcz
@ 2006-07-14 17:39                                                           ` Mark Lord
  2006-07-14 18:18                                                             ` Justin Piszcz
  2006-07-14 20:02                                                             ` Mark Lord
  0 siblings, 2 replies; 147+ messages in thread
From: Mark Lord @ 2006-07-14 17:39 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list

Justin Piszcz wrote:
> They are Western Digital 400* drives.
> 
> [4294678.049000]   Vendor: ATA       Model: WDC WD4000KD-00N  Rev: 01.0
> [4294678.050000]   Vendor: ATA       Model: WDC WD4000KD-00N  Rev: 01.0
> 
> They are on a SiI controller; it also happens when they are on a Promise
> controller.
> 
> On Fri, 14 Jul 2006, Mark Lord wrote:
> 
>> Justin Piszcz wrote:
>>>
>>> opcode=0x35 & opcode=0xca
>>
>> Those are non-DMA WRITE opcodes.  Using PIO for I/O is pretty rare these days,
>> so I'm betting that this is not a hard disk device -- compactflash?

Okay.  So why are we issuing PIO WRITE commands to drives that
obviously should only be sent DMA commands by libata?

Perhaps that's the bug.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)!
  2006-07-14 17:37                                                         ` Mark Lord
@ 2006-07-14 18:17                                                           ` Justin Piszcz
  0 siblings, 0 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-07-14 18:17 UTC (permalink / raw)
  To: Mark Lord
  Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list



On Fri, 14 Jul 2006, Mark Lord wrote:

> Justin Piszcz wrote:
>> 
>> 
>> On Fri, 14 Jul 2006, Mark Lord wrote:
>> 
>>> So, the drive is rejecting an LBA48 WRITE operation, which should happen
>>> only if the drive does not have LBA48 support.  Now, I know you posted all
>>> of this nice info months ago, but let's see it again now, for the exact
>>> drive that is generating that specific message.  We need to see the output
>>> from "hdparm --Istdout /dev/sdX" for that drive.
>>> 
>>> Thanks
>>> 
>> 
>> Here it is:
>> 
>> They are identical disks (the WD 400KD), both show up as 373GB (formatted):
>
>
> Which *exact* unit generated the WRITE errors you posted about?
>

Both have generated the errors, they are identical drives and firmware.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
  2006-07-14 17:39                                                           ` Mark Lord
@ 2006-07-14 18:18                                                             ` Justin Piszcz
  2006-07-14 20:02                                                             ` Mark Lord
  1 sibling, 0 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-07-14 18:18 UTC (permalink / raw)
  To: Mark Lord
  Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list



On Fri, 14 Jul 2006, Mark Lord wrote:

> Justin Piszcz wrote:
>> They are Western Digital 400* drives.
>> 
>> [4294678.049000]   Vendor: ATA       Model: WDC WD4000KD-00N  Rev: 01.0
>> [4294678.050000]   Vendor: ATA       Model: WDC WD4000KD-00N  Rev: 01.0
>> 
>> They are on a SiI controller; it also happens when they are on a Promise
>> controller.
>> 
>> On Fri, 14 Jul 2006, Mark Lord wrote:
>> 
>>> Justin Piszcz wrote:
>>>> 
>>>> opcode=0x35 & opcode=0xca
>>> 
>>> Those are non-DMA WRITE opcodes.  Using PIO for I/O is pretty rare these days,
>>> so I'm betting that this is not a hard disk device -- compactflash?
>
> Okay.  So why are we issuing PIO WRITE commands to drives that
> obviously should only be sent DMA commands by libata?
>
> Perhaps that's the bug.
>

Jeff/Alan -- ? Could this be it?

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
  2006-07-14 17:39                                                           ` Mark Lord
  2006-07-14 18:18                                                             ` Justin Piszcz
@ 2006-07-14 20:02                                                             ` Mark Lord
  1 sibling, 0 replies; 147+ messages in thread
From: Mark Lord @ 2006-07-14 20:02 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list

Mark Lord wrote:
> Justin Piszcz wrote:
>> They are Western Digital 400* drives.
>>
>> [4294678.049000]   Vendor: ATA       Model: WDC WD4000KD-00N  Rev: 01.0
>> [4294678.050000]   Vendor: ATA       Model: WDC WD4000KD-00N  Rev: 01.0
>>
>> They are on a SiI controller; it also happens when they are on a Promise
>> controller.
>>
>> On Fri, 14 Jul 2006, Mark Lord wrote:
>>
>>> Justin Piszcz wrote:
>>>>
>>>> opcode=0x35 & opcode=0xca
>>>
>>> Those are non-DMA WRITE opcodes.  Using PIO for I/O is pretty rare these days,
>>> so I'm betting that this is not a hard disk device -- compactflash?
> 
> Okay.  So why are we issuing PIO WRITE commands to drives that
> obviously should only be sent DMA commands by libata?
> 
> Perhaps that's the bug.

Oh wait.. I remember this.. No, those are DMA commands,
despite the misleading libata name for them.  We went through
this before last spring..
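A quick lookup table makes the DMA/PIO question easy to settle. The opcodes below are from the ATA command set: 0xCA and 0x35 are WRITE DMA and WRITE DMA EXT, while the PIO writes are 0x30 and 0x34. As far as I can tell, libata's constants for the DMA pair (`ATA_CMD_WRITE` and `ATA_CMD_WRITE_EXT` in include/linux/ata.h) omit the word DMA, which is presumably the misleading naming referred to here. The table also covers 0x25 (READ DMA EXT, what SCSI READ(10) 0x28 was translated to in David's earlier log) and 0xB0 (SMART). The struct and function names are illustrative.

```c
#include <stddef.h>

struct ata_opcode {
    unsigned char op;
    const char *name;  /* command name as in the ATA command set */
    int dma;           /* 1 if the command moves data by DMA */
};

static const struct ata_opcode opcodes[] = {
    { 0x20, "READ SECTORS",      0 },
    { 0x24, "READ SECTORS EXT",  0 },
    { 0x25, "READ DMA EXT",      1 },
    { 0x30, "WRITE SECTORS",     0 },
    { 0x34, "WRITE SECTORS EXT", 0 },
    { 0x35, "WRITE DMA EXT",     1 },
    { 0x3D, "WRITE DMA FUA EXT", 1 },
    { 0xB0, "SMART",             0 },  /* sub-command chosen by Features */
    { 0xC8, "READ DMA",          1 },
    { 0xCA, "WRITE DMA",         1 },
};

/* Return the table entry for op, or NULL if it is not in this subset. */
static const struct ata_opcode *lookup_opcode(unsigned char op)
{
    size_t i;

    for (i = 0; i < sizeof(opcodes) / sizeof(opcodes[0]); i++)
        if (opcodes[i].op == op)
            return &opcodes[i];
    return NULL;
}
```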

Okay.  So I wonder what's really going on.
The next step would be to instrument the interrupt handler,
so that when it sees bad-status, it dumps out the stat/err values
right then and there, before anything else can muck with them.

It might also be good to have it dump out the controller engine's
DMA status/err values, assuming the controller has registers for those.
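A userspace sketch of the instrumentation described above: record Status, Error and the controller's DMA status the moment the interrupt arrives, before error handling has a chance to re-read anything. Everything here is hypothetical (`struct irq_snapshot`, `irq_capture()`, and passing the register values in as arguments); in a real driver the values would be read from the port's registers, and not every controller exposes a DMA status register. The "bad status" mask is the same one the diff uses.

```c
#include <stdint.h>
#include <stdio.h>

#define ST_ERR  0x01
#define ST_DRQ  0x08
#define ST_DF   0x20
#define ST_BSY  0x80

/* What the interrupt handler saw, frozen before anything else touches it. */
struct irq_snapshot {
    uint8_t status;      /* ATA Status register */
    uint8_t error;       /* ATA Error register (meaningful only when ERR is set) */
    uint8_t dma_status;  /* controller DMA engine status, if it has one */
    int valid;           /* 1 if a bad status was recorded */
};

/* Hypothetical hook called first thing in the interrupt handler. */
static int irq_capture(struct irq_snapshot *s, uint8_t status,
                       uint8_t error, uint8_t dma_status)
{
    s->valid = 0;
    if (!(status & (ST_BSY | ST_DF | ST_ERR | ST_DRQ)))
        return 0;  /* good status: nothing to record */

    s->status = status;
    s->error = error;
    s->dma_status = dma_status;
    s->valid = 1;
    fprintf(stderr, "irq: stat=0x%02x err=0x%02x dma=0x%02x\n",
            status, error, dma_status);
    return 1;
}
```

Logging at this point, rather than in the sense-generation path, is what would show whether the drive's Error register and the controller's DMA state disagree.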

Then we should get a better picture of what's going on.
Assuming the drives aren't lying to us (a perfectly good assumption here),
then the controller must be aborting the transfer unexpectedly.

Cheers

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.15.4
  2006-03-02  1:19     ` Eric D. Mudama
@ 2006-03-02  1:39       ` Eric D. Mudama
  0 siblings, 0 replies; 147+ messages in thread
From: Eric D. Mudama @ 2006-03-02  1:39 UTC (permalink / raw)
  To: Nicolas Mailhot; +Cc: Mark Lord, linux-ide, linux-kernel

On 3/1/06, Eric D. Mudama <edmudama@gmail.com> wrote:
> I believe this core should not be part of the FUA whitelist.  If I
> remember correctly, there are other implementations out there with
> similar limitations to opcodes this "new" to ATA.

That being said, I see from

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951

that a blacklisting of some Maxtor drives for this issue has
supposedly occurred or been pushed and accepted "upstream" in git ....
For the obvious (selfish) reasons, I'd like to minimize the number of
Maxtor drives that are blacklisted, as I don't believe this is a drive
issue at all.

If there's a drive model out there reporting support for FUA but
screwing it up, I'm all ears as that's something I need to know about.
If basic adapter functional testing is required for some of these
low-level commands, then that might be something I can help with too
(on a very limited scale), since we have access to ~100 different
chipsets.

--eric

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 23:12   ` Nicolas Mailhot
  2006-03-01 23:31     ` Jeff Garzik
@ 2006-03-02  1:19     ` Eric D. Mudama
  2006-03-02  1:39       ` Eric D. Mudama
  1 sibling, 1 reply; 147+ messages in thread
From: Eric D. Mudama @ 2006-03-02  1:19 UTC (permalink / raw)
  To: Nicolas Mailhot; +Cc: Mark Lord, linux-ide, linux-kernel

On 3/1/06, Nicolas Mailhot <nicolas.mailhot@gmail.com> wrote:
> On Wednesday 01 March 2006 at 14:22 -0500, Mark Lord wrote:
> > Nicolas Mailhot wrote:
> > >>
> > > How about the drives that got blacklisted following :
> > > http://bugzilla.kernel.org/show_bug.cgi?id=5914 ?
> > > and
> > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951 ?
> > >
> > > Device Model:     Maxtor 6L300S0
> > > Firmware Version: BANC1G10
> > >
> > > on Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
> >
> > Mmm.. somebody with one of those controllers should check
> > to see if *any* drives work with FUA, and blacklist the controller
> > instead of the drives if everything is failing.
>
> I'm someone with such a controller (that's my bug here),
> but I only have these drives,
> so I can only confirm the combo is deadly.
> (I could possibly try to plug one into the nForce4 controller, but I'm not sure
> extracting the box from the tangle of cables and hardware it's part of
> is worth it. sata_nv is rev-eng, while the SiI docs are public, right?)
>
> I suspect Eric D. Mudama knows whether the problem is on the hard-drive
> side, though.
>
> Regards,
>
> --
> Nicolas Mailhot
>


I didn't know offhand, so we plugged in a bus analyzer and took a look
here in the lab... We didn't have a 3114 lying around, but issuing the
Write DMA FUA (0x3D) opcode on a 3112 resulted in a D0h soft hang.  I
think the two chips are related (4-port vs 2-port).

Looking at the bus trace, the command is issued on the SATA bus, the
drive generates a DMA Activate FIS which is accepted by the 3112, and
then the 3112 generates a Data Payload FIS (46h) with no contents.

The first DWORD of the payload is a HOLD primitive, to which the
device promptly responds with HOLDA, and the two are in a soft bus
lock and will sit forever.  No data is ever generated by the host
(stopped capture after 4 seconds).
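When reading a trace like this, the first byte of each FIS identifies its type. A small lookup sketch using the FIS type codes from the Serial ATA specification; the function name is made up. In the trace described above, the command goes out in a Register Host-to-Device FIS (27h), the drive answers with DMA Activate (39h), and the host then opens a Data FIS (46h) but only ever sends HOLD inside it.

```c
#include <stddef.h>

/* Name of a SATA FIS given its type byte, or NULL if unknown. */
static const char *fis_type_name(unsigned char type)
{
    switch (type) {
    case 0x27: return "Register Host-to-Device";
    case 0x34: return "Register Device-to-Host";
    case 0x39: return "DMA Activate";
    case 0x41: return "DMA Setup";
    case 0x46: return "Data";
    case 0x58: return "BIST Activate";
    case 0x5F: return "PIO Setup";
    case 0xA1: return "Set Device Bits";
    default:   return NULL;
    }
}
```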

I believe this core should not be part of the FUA whitelist.  If I
remember correctly, there are other implementations out there with
similar limitations to opcodes this "new" to ATA.

--eric

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 23:12   ` Nicolas Mailhot
@ 2006-03-01 23:31     ` Jeff Garzik
  2006-03-02  1:19     ` Eric D. Mudama
  1 sibling, 0 replies; 147+ messages in thread
From: Jeff Garzik @ 2006-03-01 23:31 UTC (permalink / raw)
  To: Nicolas Mailhot; +Cc: Mark Lord, edmudama, linux-ide, linux-kernel

Nicolas Mailhot wrote:
> is worth it. sata_nv is rev-eng, while the SiI docs are public, right?)

sata_nv was written by NVIDIA.

	Jeff



^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 19:22 ` Mark Lord
@ 2006-03-01 23:12   ` Nicolas Mailhot
  2006-03-01 23:31     ` Jeff Garzik
  2006-03-02  1:19     ` Eric D. Mudama
  0 siblings, 2 replies; 147+ messages in thread
From: Nicolas Mailhot @ 2006-03-01 23:12 UTC (permalink / raw)
  To: Mark Lord; +Cc: edmudama, linux-ide, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1119 bytes --]

On Wednesday 01 March 2006 at 14:22 -0500, Mark Lord wrote:
> Nicolas Mailhot wrote:
> >>
> > How about the drives that got blacklisted following :
> > http://bugzilla.kernel.org/show_bug.cgi?id=5914 ?
> > and
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951 ?
> > 
> > Device Model:     Maxtor 6L300S0
> > Firmware Version: BANC1G10
> > 
> > on Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
> 
> Mmm.. somebody with one of those controllers should check
> to see if *any* drives work with FUA, and blacklist the controller
> instead of the drives if everything is failing.

I'm someone with such a controller (that's my bug here).
But I only have these drives,
so I can only confirm the combo is deadly.
(I could possibly try to plug one into the nforce4 controller, not sure if
extracting the box from the tangle of cables and hardware it's part of
is worth it. sata_nv is rev-eng, while the siI docs are public, right?)

I do suspect Eric D. Mudama knows whether the problem is on the hard-drive
side, though.

Regards,

-- 
Nicolas Mailhot

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 199 bytes --]

* Re: LibPATA code issues / 2.6.15.4
  2006-03-01 19:00 LibPATA code issues / 2.6.15.4 Nicolas Mailhot
@ 2006-03-01 19:22 ` Mark Lord
  2006-03-01 23:12   ` Nicolas Mailhot
  0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-03-01 19:22 UTC (permalink / raw)
  To: Nicolas Mailhot; +Cc: edmudama, linux-ide, linux-kernel

Nicolas Mailhot wrote:
>>
> How about the drives that got blacklisted following :
> http://bugzilla.kernel.org/show_bug.cgi?id=5914 ?
> and
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951 ?
> 
> Device Model:     Maxtor 6L300S0
> Firmware Version: BANC1G10
> 
> on Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)

Mmm.. somebody with one of those controllers should check
to see if *any* drives work with FUA, and blacklist the controller
instead of the drives if everything is failing.
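Mark's suggested check can be written as a small decision rule. This sketch assumes hypothetical test results recorded as `(drive_model, fua_ok)` pairs for a single controller; the function name and return format are illustrative. It also encodes the caveat raised later in the thread: when only one drive model has been tested, a failure cannot distinguish a bad drive from a bad controller, so only the combination is implicated.

```python
def fua_blacklist_targets(controller, results):
    """Decide what to blacklist from (drive_model, fua_ok) results on one controller."""
    models = {model for model, _ in results}
    failing = {model for model, ok in results if not ok}
    if not failing:
        return []  # nothing failed, nothing to blacklist
    if failing != models:
        # Some drives work with FUA here, so the controller is not the
        # problem: blacklist only the drive models that failed.
        return [("drive", m) for m in sorted(failing)]
    if len(models) >= 2:
        # Every distinct drive model failed: blame the controller.
        return [("controller", controller)]
    # A single model tested: only the drive/controller pairing is known bad.
    return [("combination", f"{controller} + {m}") for m in sorted(failing)]
```

With a single Maxtor 6L300S0 failing on a SiI 3114, the result is a `"combination"` entry, matching the situation where only one drive model is available for testing.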

Cheers

* Re: LibPATA code issues / 2.6.15.4
@ 2006-03-01 19:00 Nicolas Mailhot
  2006-03-01 19:22 ` Mark Lord
  0 siblings, 1 reply; 147+ messages in thread
From: Nicolas Mailhot @ 2006-03-01 19:00 UTC (permalink / raw)
  To: edmudama; +Cc: linux-ide, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 875 bytes --]


> those drives should support all FUA opcodes properly, both queued and unqueued
> 
> On 2/28/06, Jeff Garzik <jgarzik@pobox.com> wrote:
> > Mark Lord wrote:
> > > David Greaves wrote:
> > >
> > >>
> > >> scsi1 : sata_sil
> > >>   Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
> > >>   Type:   Direct-Access                      ANSI SCSI revision: 05
> > >>   Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
> > >>   Type:   Direct-Access                      ANSI SCSI revision: 05

How about the drives that got blacklisted following :
http://bugzilla.kernel.org/show_bug.cgi?id=5914 ?
and
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951 ?

Device Model:     Maxtor 6L300S0
Firmware Version: BANC1G10

on Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
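Drive-level blacklisting of this kind keys on the model and firmware-revision strings from the IDENTIFY DEVICE data (words 23-26 hold the firmware revision, words 27-46 the model number, two ASCII characters per 16-bit word with the first character in the high byte, space padded). A minimal sketch, assuming a hypothetical table containing only the drive/firmware pair reported above; it is not the kernel's actual blacklist code.

```python
import struct

# Hypothetical FUA drive blacklist of (model, firmware) pairs, taken from
# the report in this thread. Not the kernel's real table.
FUA_DRIVE_BLACKLIST = {("Maxtor 6L300S0", "BANC1G10")}


def id_string(words, start, count):
    """Decode an IDENTIFY string field: each word holds two characters,
    high byte first; strip space and NUL padding."""
    raw = b"".join(struct.pack(">H", w) for w in words[start:start + count])
    return raw.decode("ascii").strip(" \x00")


def drive_fua_blacklisted(identify_words):
    """Return True if this drive/firmware pair must not receive FUA opcodes."""
    fw = id_string(identify_words, 23, 4)      # words 23-26: firmware revision
    model = id_string(identify_words, 27, 20)  # words 27-46: model number
    return (model, fw) in FUA_DRIVE_BLACKLIST
```

Matching on the firmware revision as well as the model keeps a later, fixed firmware from being penalised for an earlier bug.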

Regards,

-- 
Nicolas Mailhot

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 199 bytes --]

Thread overview: 147+ messages
2006-02-14  9:48 LibPATA code issues / 2.6.15.4 Justin Piszcz
2006-02-14 14:50 ` Mark Lord
2006-02-14 16:27   ` David Greaves
2006-02-14 17:12     ` Justin Piszcz
2006-02-14 18:00       ` Mark Lord
2006-02-14 18:06         ` Justin Piszcz
2006-02-23 23:39         ` Justin Piszcz
2006-02-25 15:32           ` Mark Lord
2006-02-25 15:58             ` Justin Piszcz
2006-02-25 16:11               ` Jesper Juhl
2006-02-25 16:21               ` Mark Lord
2006-02-25 11:34         ` David Greaves
2006-02-25 16:20           ` Mark Lord
2006-02-25 17:45             ` Justin Piszcz
2006-02-25 18:28               ` Mark Lord
2006-02-25 18:55                 ` Justin Piszcz
2006-02-25 19:29                 ` Justin Piszcz
2006-02-25 19:53                   ` David Greaves
2006-02-25 19:47                 ` David Greaves
2006-02-26  2:27                   ` Mark Lord
2006-02-26  9:56                     ` David Greaves
2006-02-26 14:04                       ` Mark Lord
2006-02-27 21:34                         ` Mark Lord
2006-02-28  1:33                           ` Tejun Heo
2006-02-28  1:46                             ` Linus Torvalds
2006-02-28  2:07                               ` Jeff Garzik
2006-02-28  2:14                                 ` Linus Torvalds
2006-02-28  2:52                                   ` Jeff Garzik
2006-02-28  3:36                                   ` Jeff Garzik
2006-02-28  4:11                                     ` Mark Lord
2006-02-28 10:30                                 ` Alan Cox
2006-02-28  8:03                               ` Jens Axboe
2006-02-28  4:16                             ` Mark Lord
2006-02-28 10:32                               ` Alan Cox
2006-02-28 10:30                                 ` Justin Piszcz
2006-02-28 10:39                               ` David Greaves
2006-02-28 14:37                                 ` Mark Lord
2006-02-28 21:04                                   ` Bill Davidsen
2006-03-08  2:57                                     ` Mark Lord
2006-03-08  3:18                                       ` Dave Jones
2006-03-08  3:23                                         ` Mark Lord
2006-03-08 15:37                                       ` Bill Davidsen
2006-02-28 14:38                                 ` Mark Lord
2006-02-28 15:16                                   ` Alan Cox
2006-03-01 17:33                                     ` David Greaves
2006-03-01 18:37                                       ` Alan Cox
2006-03-01 20:12                                         ` Phillip Susi
2006-03-08 16:46                                           ` Alan Cox
2006-02-28 15:31                                 ` Mark Lord
2006-02-28 15:34                                   ` Jeff Garzik
2006-02-28 16:57                                     ` Eric D. Mudama
2006-03-01  1:04                                       ` Mark Lord
2006-03-01 11:37                                         ` Justin Piszcz
2006-03-01 13:17                                         ` Justin Piszcz
2006-03-01 17:41                                     ` David Greaves
2006-03-01 17:46                                       ` Mark Lord
2006-03-01 18:12                                         ` David Greaves
2006-03-01 18:30                                           ` Mark Lord
2006-03-01 18:32                                             ` Justin Piszcz
2006-03-01 18:33                                             ` Justin Piszcz
2006-03-01 18:48                                             ` David Greaves
2006-03-01 19:49                                               ` David Greaves
2006-03-03 19:38                                                 ` Justin Piszcz
2006-03-03 22:46                                                   ` David Greaves
2006-03-04 14:25                                                     ` Mark Lord
2006-03-06  6:13                                                       ` David Greaves
2006-03-21 18:11                                                         ` David Greaves
2006-03-22 15:23                                                           ` David Greaves
2006-03-05 11:43                                                 ` Justin Piszcz
2006-03-05 12:41                                                   ` Justin Piszcz
2006-03-05 22:58                                                     ` Mark Lord
2006-03-05 23:00                                                       ` Mark Lord
2006-03-05 23:19                                                         ` Justin Piszcz
2006-03-05 23:39                                                       ` Jeff Garzik
2006-04-21 19:14                                                         ` LibPATA code issues / 2.6.16 (previously, 2.6.15.x) Justin Piszcz
2006-04-21 19:18                                                           ` Jeff Garzik
2006-04-21 19:28                                                             ` Linus Torvalds
2006-04-21 22:46                                                               ` Jeff Garzik
2006-04-22  0:05                                                                 ` Linus Torvalds
2006-05-06 15:09                                                                   ` [smartmontools-support]Re: " Leon Woestenberg
2006-05-07 12:44                                                                     ` Ingo Oeser
2006-06-11 11:13                                                                   ` Justin Piszcz
2006-03-01 19:06                                             ` LibPATA code issues / 2.6.15.4 Justin Piszcz
2006-03-01 19:28                                               ` Mark Lord
2006-03-01 19:35                                               ` Mark Lord
2006-03-01 19:38                                                 ` Justin Piszcz
2006-03-01 19:41                                                   ` Jeff Garzik
2006-02-26 12:27                     ` James Courtier-Dutton
2006-02-26 12:55                       ` David Greaves
2006-02-26 13:56                       ` Mark Lord
2006-02-26 14:30                         ` Kernel SeekCompleteErrors... Different from " James Courtier-Dutton
2006-02-26 17:03                           ` Mark Lord
2006-02-26 17:13                             ` Dr. David Alan Gilbert
2006-02-26 17:43                               ` Alan Cox
2006-02-26 20:36                                 ` Mark Lord
2006-02-27 11:48                                   ` Alan Cox
2006-02-27 13:40                                     ` Mark Lord
2006-02-14 23:58   ` Justin Piszcz
2006-02-17  8:45   ` Jeff Garzik
2006-02-17 14:59     ` Mark Lord
2006-02-17 15:00       ` Justin Piszcz
2006-02-18 20:43       ` Sander
2006-02-18 21:42         ` Mark Lord
2006-02-18 21:51           ` Justin Piszcz
2006-02-19  7:14           ` Sander
2006-02-19 15:30             ` Mark Lord
2006-02-19 17:16               ` Sander
2006-07-06 23:08                 ` Justin Piszcz
2006-07-07 13:08                   ` Mark Lord
2006-07-07 13:24                     ` Justin Piszcz
2006-07-07 13:43                       ` Mark Lord
2006-07-07 13:48                         ` Justin Piszcz
2006-07-07 14:01                         ` Justin Piszcz
2006-07-07 14:35                         ` Justin Piszcz
2006-07-07 18:53                           ` Justin Piszcz
2006-07-07 19:19                             ` Jeff Garzik
2006-07-07 19:28                               ` Justin Piszcz
     [not found]                                 ` <200607091224.31451.liml@rtr.ca>
2006-07-09 17:27                                   ` Justin Piszcz
2006-07-09 20:16                                     ` Justin Piszcz
2006-07-09 20:40                                       ` LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! Justin Piszcz
2006-07-09 20:46                                         ` Justin Piszcz
2006-07-09 21:05                                           ` Justin Piszcz
2006-07-09 22:03                                             ` Justin Piszcz
2006-07-10 13:59                                               ` Follow up? " Justin Piszcz
2006-07-10 15:33                                                 ` Alan Cox
2006-07-10 15:45                                                   ` Justin Piszcz
2006-07-11 13:28                                                     ` LibPATA code issues / 2.6.17.3 (What is the next step?) Justin Piszcz
2006-07-11 16:12                                                       ` Alan Cox
2006-07-12 22:10                                                         ` David Greaves
2006-07-12 22:29                                                           ` Justin Piszcz
2006-07-14 15:33                                                             ` David Greaves
2006-07-13 10:55                                                           ` Erik Mouw
2006-07-14 17:16                                                       ` Mark Lord
2006-07-14 17:18                                                         ` Justin Piszcz
2006-07-14 17:39                                                           ` Mark Lord
2006-07-14 18:18                                                             ` Justin Piszcz
2006-07-14 20:02                                                             ` Mark Lord
2006-07-14 17:14                                                     ` Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! Mark Lord
2006-07-14 17:17                                                       ` Justin Piszcz
2006-07-14 17:37                                                         ` Mark Lord
2006-07-14 18:17                                                           ` Justin Piszcz
2006-03-01 19:00 LibPATA code issues / 2.6.15.4 Nicolas Mailhot
2006-03-01 19:22 ` Mark Lord
2006-03-01 23:12   ` Nicolas Mailhot
2006-03-01 23:31     ` Jeff Garzik
2006-03-02  1:19     ` Eric D. Mudama
2006-03-02  1:39       ` Eric D. Mudama

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).