* LibPATA code issues / 2.6.15.4
@ 2006-02-14  9:48 Justin Piszcz
  2006-02-14 14:50 ` Mark Lord
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-02-14  9:48 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel

Jeff,

I'd have to double check, but I do not recall getting these errors before
the pass-thru code was introduced in 2.6.15. I also was not running the
smart daemon for SATA drives until 2.6.15, as it was not supported.

I had a few issues before that I posted to LKML; those were due to too many
SATA devices etc. Everything is back to normal for the most part. Speed,
etc., all is well again, almost...

/dev/sdc:
 Timing buffered disk reads:  154 MB in  3.02 seconds =  50.97 MB/sec
/dev/sdc:
 Timing buffered disk reads:  162 MB in  3.00 seconds =  53.94 MB/sec

The only issue I have is that when I copy a lot of files to a WD 400GB
drive I get these pesky errors in dmesg:

ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x04 { DriveStatusError }

Yet everything (226GB or so) copied to the 400GB drive without a single
I/O error that I am aware of.

So my question is, why do I get these errors in dmesg if they are not
critical?

Thanks,

Justin.

^ permalink raw reply	[flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
  2006-02-14  9:48 LibPATA code issues / 2.6.15.4 Justin Piszcz
@ 2006-02-14 14:50 ` Mark Lord
  2006-02-14 16:27   ` David Greaves
                     ` (2 more replies)
  0 siblings, 3 replies; 147+ messages in thread
From: Mark Lord @ 2006-02-14 14:50 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Jeff Garzik, linux-kernel, IDE/ATA development list

Justin Piszcz wrote:
..
> ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> ata3: status=0x51 { DriveReady SeekComplete Error }
> ata3: error=0x04 { DriveStatusError }

I wonder if the FUA logic is inserting cache-flush commands
and perhaps the drive is rejecting those?

Jeff, we really ought to be including the failed ATA opcode
in those error messages!!

Cheers
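
The two register values in the log above can be decoded by hand. What follows is a minimal sketch, not libata code: the table and function names are made up for illustration, and it assumes the classic ATA bit names (the log's DriveReady, SeekComplete and Error correspond to DRDY, DSC and ERR, and DriveStatusError to ABRT).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ATA status register bit names, most significant bit first. */
static const char *const ata_status_names[8] = {
    "BSY", "DRDY", "DF", "DSC", "DRQ", "CORR", "IDX", "ERR"
};

/* ATA error register bit names, most significant bit first. */
static const char *const ata_error_names[8] = {
    "ICRC", "UNC", "MC", "IDNF", "MCR", "ABRT", "TK0NF", "AMNF"
};

/*
 * Hypothetical helper: write the names of all bits set in `reg` into
 * `out`, separated by single spaces. `names` is indexed MSB first.
 */
static void decode_reg(uint8_t reg, const char *const names[8],
                       char *out, size_t len)
{
    size_t used = 0;
    int bit;

    if (len == 0)
        return;
    out[0] = '\0';

    for (bit = 7; bit >= 0; bit--) {
        int n;

        if (!(reg & (1u << bit)))
            continue;
        n = snprintf(out + used, len - used, "%s%s",
                     used ? " " : "", names[7 - bit]);
        if (n < 0 || (size_t)n >= len - used)
            break;  /* buffer full: stop and keep what fits */
        used += (size_t)n;
    }
}
```

Fed the values from the log, status 0x51 decodes to DRDY, DSC and ERR, and error 0x04 decodes to ABRT (command aborted). That reading is at least consistent with the guess that the drive is rejecting a command rather than failing a media access.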
* Re: LibPATA code issues / 2.6.15.4
  2006-02-14 14:50 ` Mark Lord
@ 2006-02-14 16:27   ` David Greaves
  2006-02-14 17:12     ` Justin Piszcz
  2006-02-14 23:58   ` Justin Piszcz
  2006-02-17  8:45   ` Jeff Garzik
  2 siblings, 1 reply; 147+ messages in thread
From: David Greaves @ 2006-02-14 16:27 UTC (permalink / raw)
  To: Mark Lord
  Cc: Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list

Mark Lord wrote:

> Justin Piszcz wrote:
> ..
>
>> ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> ata3: status=0x51 { DriveReady SeekComplete Error }
>> ata3: error=0x04 { DriveStatusError }
>
>
> I wonder if the FUA logic is inserting cache-flush commands
> and perhaps the drive is rejecting those?
>
> Jeff, we really ought to be including the failed ATA opcode
> in those error messages!!
>
If such a thing were available as a patch then I too would apply it and
hopefully could provide useful feedback.

David
PS My problems:

http://marc.theaimsgroup.com/?l=linux-kernel&m=113769509617034&w=2
http://marc.theaimsgroup.com/?l=linux-ide&m=113828551519727&w=2
http://marc.theaimsgroup.com/?l=linux-ide&m=113829573105369&w=2
http://marc.theaimsgroup.com/?l=linux-ide&m=113933732903205&w=2
* Re: LibPATA code issues / 2.6.15.4
  2006-02-14 16:27   ` David Greaves
@ 2006-02-14 17:12     ` Justin Piszcz
  2006-02-14 18:00       ` Mark Lord
  0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-02-14 17:12 UTC (permalink / raw)
  To: David Greaves
  Cc: Mark Lord, Jeff Garzik, linux-kernel, IDE/ATA development list

I would like to try the patch too, if available.

I got these errors when nothing (apparent) was going on.

[25158.676998] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[25158.677005] ata3: status=0x51 { DriveReady SeekComplete Error }
[25158.677009] ata3: error=0x04 { DriveStatusError }
[27306.663556] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[27306.663563] ata3: status=0x51 { DriveReady SeekComplete Error }
[27306.663567] ata3: error=0x04 { DriveStatusError }

On Tue, 14 Feb 2006, David Greaves wrote:

> Mark Lord wrote:
>
>> Justin Piszcz wrote:
>> ..
>>
>>> ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>>> ata3: status=0x51 { DriveReady SeekComplete Error }
>>> ata3: error=0x04 { DriveStatusError }
>>
>>
>> I wonder if the FUA logic is inserting cache-flush commands
>> and perhaps the drive is rejecting those?
>>
>> Jeff, we really ought to be including the failed ATA opcode
>> in those error messages!!
>>
> If such a thing were available as a patch then I too would apply it and
> hopefully could provide useful feedback.
>
> David
> PS My problems:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=113769509617034&w=2
> http://marc.theaimsgroup.com/?l=linux-ide&m=113828551519727&w=2
> http://marc.theaimsgroup.com/?l=linux-ide&m=113829573105369&w=2
> http://marc.theaimsgroup.com/?l=linux-ide&m=113933732903205&w=2
>
>
* Re: LibPATA code issues / 2.6.15.4
  2006-02-14 17:12     ` Justin Piszcz
@ 2006-02-14 18:00       ` Mark Lord
  2006-02-14 18:06         ` Justin Piszcz
                           ` (2 more replies)
  0 siblings, 3 replies; 147+ messages in thread
From: Mark Lord @ 2006-02-14 18:00 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list

On Tuesday 14 February 2006 12:12, Justin Piszcz wrote:
> I would like to try the patch too, if available.

Something like this:  (for 2.6.16-rc3-git2, but should be okay on 2.6.15 also).

Untested: include the original SCSI opcode in printk's for libata SCSI errors,
to help understand where the errors are coming from.

Signed-Off-By: Mark Lord <mlord@pobox.com>

--- linux/drivers/scsi/libata-scsi.c.orig	2006-02-12 19:27:25.000000000 -0500
+++ linux/drivers/scsi/libata-scsi.c	2006-02-14 12:54:17.000000000 -0500
@@ -420,6 +420,7 @@
  *	@sk: the sense key we'll fill out
  *	@asc: the additional sense code we'll fill out
  *	@ascq: the additional sense code qualifier we'll fill out
+ *	@opcode: the original SCSI command opcode byte
  *
  *	Converts an ATA error into a SCSI error.  Fill out pointers to
  *	SK, ASC, and ASCQ bytes for later use in fixed or descriptor
@@ -429,7 +430,7 @@
  *	spin_lock_irqsave(host_set lock)
  */
 void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc,
-			u8 *ascq)
+			u8 *ascq, u8 opcode)
 {
 	int i;
 
@@ -508,8 +509,8 @@
 		}
 	}
 	/* No error?  Undecoded? */
-	printk(KERN_WARNING "ata%u: no sense translation for status: 0x%02x\n",
-	       id, drv_stat);
+	printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x status: 0x%02x\n",
+	       id, opcode, drv_stat);
 
 	/* For our last chance pick, use medium read error because
 	 * it's much more common than an ATA drive telling you a write
@@ -520,8 +521,8 @@
 	*ascq = 0x04; /*  "auto-reallocation failed" */
 
  translate_done:
-	printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to "
-	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err,
+	printk(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to "
+	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err,
 	       *sk, *asc, *ascq);
 	return;
 }
@@ -562,7 +563,7 @@
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[1], &sb[2], &sb[3]);
+				   &sb[1], &sb[2], &sb[3], cmd->cmnd[0]);
 		sb[1] &= 0x0f;
 	}
 
@@ -637,7 +638,7 @@
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[2], &sb[12], &sb[13]);
+				   &sb[2], &sb[12], &sb[13], cmd->cmnd[0]);
 		sb[2] &= 0x0f;
 	}
 
* Re: LibPATA code issues / 2.6.15.4
  2006-02-14 18:00       ` Mark Lord
@ 2006-02-14 18:06         ` Justin Piszcz
  2006-02-23 23:39         ` Justin Piszcz
  2006-02-25 11:34         ` David Greaves
  2 siblings, 0 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-02-14 18:06 UTC (permalink / raw)
  To: Mark Lord
  Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list

Thanks, I will reboot later tonight and see what type of error codes it
gives me.

Against 2.6.15.4:

# patch -p1 < /tmp/a
patching file drivers/scsi/libata-scsi.c
Hunk #1 succeeded at 404 (offset -16 lines).
Hunk #2 succeeded at 414 (offset -16 lines).
Hunk #3 succeeded at 493 (offset -16 lines).
Hunk #4 succeeded at 505 (offset -16 lines).
Hunk #5 succeeded at 547 (offset -16 lines).
Hunk #6 succeeded at 622 (offset -16 lines).
#

On Tue, 14 Feb 2006, Mark Lord wrote:

> On Tuesday 14 February 2006 12:12, Justin Piszcz wrote:
>> I would like to try the patch too, if available.
>
> Something like this:  (for 2.6.16-rc3-git2, but should be okay on 2.6.15 also).
>
> Untested: include the original SCSI opcode in printk's for libata SCSI errors,
> to help understand where the errors are coming from.
>
> Signed-Off-By: Mark Lord <mlord@pobox.com>
* Re: LibPATA code issues / 2.6.15.4
  2006-02-14 18:00       ` Mark Lord
  2006-02-14 18:06         ` Justin Piszcz
@ 2006-02-23 23:39         ` Justin Piszcz
  2006-02-25 15:32           ` Mark Lord
  2006-02-25 11:34         ` David Greaves
  2 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-02-23 23:39 UTC (permalink / raw)
  To: Mark Lord
  Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list

I have reproduced the error with the patched kernel!

Here it is:

[263864.109854] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[263864.109861] ata3: status=0x51 { DriveReady SeekComplete Error }
[263864.109866] ata3: error=0x04 { DriveStatusError }

Here is how I got it to error:

$ for i in `seq 1 1000`; do dd if=/dev/zero of=file.$i bs=1M count=$i; done

Now, how to fix? :)

On Tue, 14 Feb 2006, Mark Lord wrote:

> On Tuesday 14 February 2006 12:12, Justin Piszcz wrote:
>> I would like to try the patch too, if available.
>
> Something like this:  (for 2.6.16-rc3-git2, but should be okay on 2.6.15 also).
>
> Untested: include the original SCSI opcode in printk's for libata SCSI errors,
> to help understand where the errors are coming from.
>
> Signed-Off-By: Mark Lord <mlord@pobox.com>
* Re: LibPATA code issues / 2.6.15.4
  2006-02-23 23:39         ` Justin Piszcz
@ 2006-02-25 15:32           ` Mark Lord
  2006-02-25 15:58             ` Justin Piszcz
  0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-02-25 15:32 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list

Justin Piszcz wrote:
> I have reproduced the error with the patched kernel!
>
> Here it is:
>
> [263864.109854] ata3: translated ATA stat/err 0x51/04 to SCSI
> SK/ASC/ASCQ 0xb/00/00
> [263864.109861] ata3: status=0x51 { DriveReady SeekComplete Error }
> [263864.109866] ata3: error=0x04 { DriveStatusError }

Nope.. patch not present, as otherwise the line above would have
read something like this:

 > [263864.109854] ata3: translated op=0x21 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00

So we didn't get the extra info since the patch wasn't present.

Cheers
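
The check described above, whether a log line came from a kernel built with the first patch, comes down to looking for the op= field that the patch adds to the two messages. A small sketch of that check follows; the helper name is hypothetical and the two search strings are taken from the printk format strings in the patch.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if a kernel log line carries the
 * "op=0x.." field that the first patch adds to the libata
 * sense-translation messages.
 */
static bool line_has_opcode_field(const char *line)
{
    return strstr(line, "translated op=0x") != NULL ||
           strstr(line, "no sense translation for op=0x") != NULL;
}
```

A line such as "ata3: translated ATA stat/err 0x51/04 ..." fails this check, which is the basis for concluding that the running kernel was not built from the patched source.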
* Re: LibPATA code issues / 2.6.15.4
  2006-02-25 15:32           ` Mark Lord
@ 2006-02-25 15:58             ` Justin Piszcz
  2006-02-25 16:11               ` Jesper Juhl
  2006-02-25 16:21               ` Mark Lord
  0 siblings, 2 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-02-25 15:58 UTC (permalink / raw)
  To: Mark Lord
  Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list

The kernel is patched, if you did not get what you wanted maybe the patch
does not work in some instances or there is a bug?

On Sat, 25 Feb 2006, Mark Lord wrote:

> Justin Piszcz wrote:
>> I have reproduced the error with the patched kernel!
>>
>> Here it is:
>>
>> [263864.109854] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ
>> 0xb/00/00
>> [263864.109861] ata3: status=0x51 { DriveReady SeekComplete Error }
>> [263864.109866] ata3: error=0x04 { DriveStatusError }
>
> Nope.. patch not present, as otherwise the line above would have
> read something like this:
>
>> [263864.109854] ata3: translated op=0x21 ATA stat/err 0x51/04 to SCSI
> SK/ASC/ASCQ 0xb/00/00
>
> So we didn't get the extra info since the patch wasn't present.
>
> Cheers
>
* Re: LibPATA code issues / 2.6.15.4
  2006-02-25 15:58             ` Justin Piszcz
@ 2006-02-25 16:11               ` Jesper Juhl
  2006-02-25 16:21               ` Mark Lord
  1 sibling, 0 replies; 147+ messages in thread
From: Jesper Juhl @ 2006-02-25 16:11 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Mark Lord, David Greaves, Jeff Garzik, linux-kernel,
	IDE/ATA development list

On 2/25/06, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:

Please don't top-post.

> The kernel is patched, if you did not get what you wanted maybe the patch
> does not work in some instances or there is a bug?
>
You may have patched a kernel source with Mark's patch, but you are
very clearly not running a kernel built from that patched source.

As can be seen from (for example) this bit from Mark's patch

  translate_done:
-	printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to "
-	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err,
+	printk(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to "
+	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err,
 	       *sk, *asc, *ascq);

the patch changes the text being printed. In this case the text
"ata%u: translated ATA stat/err ..." is changed into
"ata%u: translated op=0x%02x ATA stat/err ..."

And if we look at the output you posted :

> >> Here it is:
> >>
> >> [263864.109854] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ
> >> 0xb/00/00

That string is clearly from an un-patched kernel as Mark also pointed
out in his reply to you.

--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html
* Re: LibPATA code issues / 2.6.15.4
  2006-02-25 15:58             ` Justin Piszcz
  2006-02-25 16:11               ` Jesper Juhl
@ 2006-02-25 16:21               ` Mark Lord
  1 sibling, 0 replies; 147+ messages in thread
From: Mark Lord @ 2006-02-25 16:21 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list

Justin Piszcz wrote:
> The kernel is patched, if you did not get what you wanted maybe the
> patch does not work in some instances or there is a bug?

No, the output would be there if those messages came from
the patched kernel.  (read the patch and see what I mean..).

Cheers
* Re: LibPATA code issues / 2.6.15.4
  2006-02-14 18:00       ` Mark Lord
  2006-02-14 18:06         ` Justin Piszcz
  2006-02-23 23:39         ` Justin Piszcz
@ 2006-02-25 11:34         ` David Greaves
  2006-02-25 16:20           ` Mark Lord
  2 siblings, 1 reply; 147+ messages in thread
From: David Greaves @ 2006-02-25 11:34 UTC (permalink / raw)
  To: Mark Lord
  Cc: Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list

Mark Lord wrote:

>On Tuesday 14 February 2006 12:12, Justin Piszcz wrote:
>
>
>>I would like to try the patch too, if available.
>>
>>
>
>Something like this:  (for 2.6.16-rc3-git2, but should be okay on 2.6.15 also).
>
>Untested: include the original SCSI opcode in printk's for libata SCSI errors,
>to help understand where the errors are coming from.
>
>Signed-Off-By: Mark Lord <mlord@pobox.com>
>
>
Thanks Mark - I've finally gotten this patch applied.

With smartd disabled and no smart commands issued, a readonly badblocks
scan of /dev/sdb2 shows no problems and now gives:
Feb 25 10:38:31 haze kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Feb 25 10:38:32 haze kernel: ata2: no sense translation for op=0x28 status: 0x51
Feb 25 10:38:32 haze kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Feb 25 10:38:35 haze kernel: ata2: no sense translation for op=0x28 status: 0x51
hundreds of times.

and during boot I can get:
ata2: no sense translation for op=0x28 status: 0x51
ata2: translated op=0x28 ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata2: status=0x51 { DriveReady SeekComplete Error }
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
ata2: no sense translation for op=0x28 status: 0x51
ata2: translated op=0x28 ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x28 status: 0x51
ata2: translated op=0x28 ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata2: status=0x51 { DriveReady SeekComplete Error }

Subsequently a smartctl -data -a /dev/sdb shows no errors.

So could this be a faulty disk that smart shows is OK and shows no read
or write errors?

The other problem I noticed was that
  smartctl -o on -data /dev/sda
still just gives:
Feb 25 10:51:47 haze kernel: ata1: PIO error
Feb 25 10:51:47 haze kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
Feb 25 10:51:47 haze kernel: ata1: error=0x04 { DriveStatusError }
Feb 25 10:51:47 haze kernel: ata1: PIO error
Feb 25 10:51:47 haze kernel: ata1: status=0x51 { DriveReady SeekComplete Error }
Feb 25 10:51:47 haze kernel: ata1: error=0x04 { DriveStatusError }
Feb 25 10:51:47 haze kernel: ata1: PIO error
many times.

I get similar problems for all the drives under both sata_sil and sata_via.

Linux haze 2.6.15patchsata #6 PREEMPT Fri Feb 24 19:15:07 UTC 2006 i686 GNU/Linux

libata version 1.20 loaded.
sata_sil 0000:00:0a.0: version 0.9
ACPI: PCI Interrupt 0000:00:0a.0[A] -> GSI 16 (level, low) -> IRQ 17
ata1: SATA max UDMA/100 cmd 0xF8804080 ctl 0xF880408A bmdma 0xF8804000 irq 17
ata2: SATA max UDMA/100 cmd 0xF88040C0 ctl 0xF88040CA bmdma 0xF8804008 irq 17
ata1: dev 0 cfg 49:2f00 82:7869 83:7d09 84:4043 85:7869 86:3c01 87:4043 88:203f
ata1: dev 0 ATA-7, max UDMA/100, 390721968 sectors: LBA48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sil
ata2: dev 0 cfg 49:2f00 82:7c6b 83:7f09 84:4063 85:7c69 86:3e01 87:4063 88:007f
ata2: dev 0 ATA-7, max UDMA/133, 398297088 sectors: LBA48
ata2: dev 0 configured for UDMA/100
scsi1 : sata_sil
  Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
  Type:   Direct-Access                      ANSI SCSI revision: 05
sata_via 0000:00:0f.0: version 1.1
ACPI: PCI Interrupt 0000:00:0f.0[B] -> GSI 20 (level, low) -> IRQ 16
sata_via 0000:00:0f.0: routed to hard irq line 0
ata3: SATA max UDMA/133 cmd 0x9800 ctl 0x9402 bmdma 0x8400 irq 16
ata4: SATA max UDMA/133 cmd 0x9000 ctl 0x8802 bmdma 0x8408 irq 16
ata3: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3468 86:3c01 87:4003 88:407f
ata3: dev 0 ATA-6, max UDMA/133, 312581808 sectors: LBA48
ata3: dev 0 configured for UDMA/133
scsi2 : sata_via
ata4: dev 0 cfg 49:2f00 82:7c6b 83:7f09 84:4063 85:7c68 86:3e01 87:4063 88:407f
ata4: dev 0 ATA-7, max UDMA/133, 398297088 sectors: LBA48
ata4: dev 0 configured for UDMA/133
scsi3 : sata_via
  Vendor: ATA       Model: ST3160023AS       Rev: 3.18
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: Maxtor 6B200M0    Rev: BANC
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
SCSI device sda: drive cache: write back
 sda: sda1
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 398297088 512-byte hdwr sectors (203928 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 398297088 512-byte hdwr sectors (203928 MB)
SCSI device sdb: drive cache: write back
 sdb: sdb1 sdb2
sd 1:0:0:0: Attached scsi disk sdb
SCSI device sdc: 312581808 512-byte hdwr sectors (160042 MB)
SCSI device sdc: drive cache: write back
SCSI device sdc: 312581808 512-byte hdwr sectors (160042 MB)
SCSI device sdc: drive cache: write back
 sdc: sdc1 sdc2 sdc3 sdc4
sd 2:0:0:0: Attached scsi disk sdc
SCSI device sdd: 398297088 512-byte hdwr sectors (203928 MB)
SCSI device sdd: drive cache: write back
SCSI device sdd: 398297088 512-byte hdwr sectors (203928 MB)
SCSI device sdd: drive cache: write back
 sdd: sdd1 sdd2
sd 3:0:0:0: Attached scsi disk sdd
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 1:0:0:0: Attached scsi generic sg1 type 0
sd 2:0:0:0: Attached scsi generic sg2 type 0
sd 3:0:0:0: Attached scsi generic sg3 type 0

David

--
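
The SK and op= values in the output above can be read off standard SCSI tables. The sketch below is only an illustration: the function names are made up, the sense key names follow the SCSI Primary Commands definitions, and the opcode table lists just a handful of common block commands rather than anything libata-specific.

```c
#include <stdint.h>

/* Hypothetical helper: name of a 4-bit SCSI sense key. */
static const char *sense_key_name(uint8_t sk)
{
    static const char *const names[16] = {
        "NO SENSE", "RECOVERED ERROR", "NOT READY", "MEDIUM ERROR",
        "HARDWARE ERROR", "ILLEGAL REQUEST", "UNIT ATTENTION",
        "DATA PROTECT", "BLANK CHECK", "VENDOR SPECIFIC",
        "COPY ABORTED", "ABORTED COMMAND",
        "(obsolete)",       /* 0xC: meaning dropped in later revisions */
        "VOLUME OVERFLOW", "MISCOMPARE",
        "(reserved)"        /* 0xF: varies by SPC revision */
    };

    return names[sk & 0x0f];
}

/* Hypothetical helper: a few SCSI block command opcodes. */
static const char *scsi_opcode_name(uint8_t op)
{
    switch (op) {
    case 0x08: return "READ(6)";
    case 0x0a: return "WRITE(6)";
    case 0x28: return "READ(10)";
    case 0x2a: return "WRITE(10)";
    case 0x35: return "SYNCHRONIZE CACHE(10)";
    default:   return "(not in this table)";
    }
}
```

Read this way, op=0x28 is READ(10), the 0x3 in 0x3/11/04 is MEDIUM ERROR (the fallback that the patch comment labels "auto-reallocation failed"), and the 0xb in the earlier 0xb/00/00 reports is ABORTED COMMAND.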
* Re: LibPATA code issues / 2.6.15.4
  2006-02-25 11:34         ` David Greaves
@ 2006-02-25 16:20           ` Mark Lord
  2006-02-25 17:45             ` Justin Piszcz
  0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-02-25 16:20 UTC (permalink / raw)
  To: David Greaves
  Cc: Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list

[-- Attachment #1: Type: text/plain, Size: 1443 bytes --]

David Greaves wrote:
..
> Thanks Mark - I've finally gotten this patch applied.
>
> With smartd disabled and no smart commands issued, a readonly badblocks
> scan of /dev/sdb2 shows no problems and now gives:
> Feb 25 10:38:31 haze kernel: ata2: status=0x51 { DriveReady SeekComplete
> Error }
> Feb 25 10:38:32 haze kernel: ata2: no sense translation for op=0x28
> status: 0x51
> Feb 25 10:38:32 haze kernel: ata2: status=0x51 { DriveReady SeekComplete
> Error }
> Feb 25 10:38:35 haze kernel: ata2: no sense translation for op=0x28
> status: 0x51
> hundreds of times.
..

Mmmm.. okay, it's happening due to a SCSI READ_10 opcode,
which means it isn't being triggered by any of the FUA stuff.

But there's still no obvious reason for the error.
The drive is basically just saying "command rejected",
and libata-scsi is translating that into "medium error"
for some unknown reason.

Unfortunately, the design of the current libata is such that
we no longer have access to the actual ATA opcode that was rejected.
It gets overwritten by the returned drive status on completion.

So.. I need to generate another patch for you now, to save/show
the real ATA opcode that was used to cause the errors.
My theory is that we'll discover that it is one that your drive
legitimately is rejecting (unsupported LBA48 or something..).

But we won't know until we see the output.

Second patch is attached:  apply *in addition* to the first one.

Cheers

[-- Attachment #2: 12_libata_ata_opcode.patch --]
[-- Type: text/x-patch, Size: 5983 bytes --]

--- linux/drivers/scsi/libata-core.c.orig	2006-02-23 16:15:52.000000000 -0500
+++ linux/drivers/scsi/libata-core.c	2006-02-25 11:17:42.000000000 -0500
@@ -253,10 +253,11 @@
  *	spin_lock_irqsave(host_set lock)
  */
 
-static void ata_exec_command_pio(struct ata_port *ap, const struct ata_taskfile *tf)
+static void ata_exec_command_pio(struct ata_port *ap, struct ata_taskfile *tf)
 {
 	DPRINTK("ata%u: cmd 0x%X\n", ap->id, tf->command);
 
+	tf->saved_command = tf->command;
 	outb(tf->command, ap->ioaddr.command_addr);
 	ata_pause(ap);
 }
@@ -274,10 +275,11 @@
  *	spin_lock_irqsave(host_set lock)
  */
 
-static void ata_exec_command_mmio(struct ata_port *ap, const struct ata_taskfile *tf)
+static void ata_exec_command_mmio(struct ata_port *ap, struct ata_taskfile *tf)
 {
 	DPRINTK("ata%u: cmd 0x%X\n", ap->id, tf->command);
 
+	tf->saved_command = tf->command;
 	writeb(tf->command, (void __iomem *) ap->ioaddr.command_addr);
 	ata_pause(ap);
 }
@@ -294,7 +296,7 @@
  *	LOCKING:
  *	spin_lock_irqsave(host_set lock)
  */
-void ata_exec_command(struct ata_port *ap, const struct ata_taskfile *tf)
+void ata_exec_command(struct ata_port *ap, struct ata_taskfile *tf)
 {
 	if (ap->flags & ATA_FLAG_MMIO)
 		ata_exec_command_mmio(ap, tf);
@@ -316,7 +318,7 @@
  */
 
 static inline void ata_tf_to_host(struct ata_port *ap,
-				  const struct ata_taskfile *tf)
+				  struct ata_taskfile *tf)
 {
 	ap->ops->tf_load(ap, tf);
 	ap->ops->exec_command(ap, tf);
@@ -506,12 +508,13 @@
  *	Inherited from caller.
  */
 
-void ata_tf_to_fis(const struct ata_taskfile *tf, u8 *fis, u8 pmp)
+void ata_tf_to_fis(struct ata_taskfile *tf, u8 *fis, u8 pmp)
 {
 	fis[0] = 0x27;	/* Register - Host to Device FIS */
 	fis[1] = (pmp & 0xf) | (1 << 7); /* Port multiplier number,
 					    bit 7 indicates Command FIS */
 	fis[2] = tf->command;
+	tf->saved_command = tf->command;
 	fis[3] = tf->feature;
 
 	fis[4] = tf->lbal;
@@ -631,6 +634,7 @@
 	cmd = ata_rw_cmds[index + fua + lba48 + write];
 	if (cmd) {
 		tf->command = cmd;
+		tf->saved_command = cmd;
 		return 0;
 	}
 	return -1;
--- linux/drivers/scsi/libata-scsi.c.orig	2006-02-25 10:58:41.000000000 -0500
+++ linux/drivers/scsi/libata-scsi.c	2006-02-25 11:16:07.000000000 -0500
@@ -438,7 +438,7 @@
  *	spin_lock_irqsave(host_set lock)
  */
 void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc,
-			u8 *ascq, u8 opcode)
+			u8 *ascq, u8 opcode, u8 cmd)
 {
 	int i;
 
@@ -517,8 +517,8 @@
 		}
 	}
 	/* No error?  Undecoded? */
-	printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x status: 0x%02x\n",
-	       id, opcode, drv_stat);
+	printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x cmd=0x%02x status: 0x%02x\n",
+	       id, opcode, cmd, drv_stat);
 
 	/* For our last chance pick, use medium read error because
 	 * it's much more common than an ATA drive telling you a write
@@ -529,8 +529,8 @@
 	*ascq = 0x04; /*  "auto-reallocation failed" */
 
  translate_done:
-	DPRINTK(KERN_ERR "ata%u: translated op=0x%02x cmd=0x%02x ATA stat/err 0x%02x/%02x to "
-	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, cmd, drv_stat, drv_err,
 	       *sk, *asc, *ascq);
 	return;
 }
@@ -571,7 +571,7 @@
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[1], &sb[2], &sb[3], cmd->cmnd[0]);
+				   &sb[1], &sb[2], &sb[3], cmd->cmnd[0], tf->saved_command);
 		sb[1] &= 0x0f;
 	}
 
@@ -646,7 +646,7 @@
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[2], &sb[12], &sb[13], cmd->cmnd[0]);
+				   &sb[2], &sb[12], &sb[13], cmd->cmnd[0], tf->saved_command);
 		sb[2] &= 0x0f;
 	}
 
@@ -1337,6 +1337,7 @@
 		goto early_finish;
 
 	/* select device, send command to hardware */
+	qc->tf.saved_command = qc->tf.command;
 	if (ata_qc_issue(qc))
 		goto err_did;
 
--- linux/include/linux/ata.h.orig	2006-02-17 17:23:45.000000000 -0500
+++ linux/include/linux/ata.h	2006-02-25 11:09:53.000000000 -0500
@@ -244,6 +244,7 @@
 	u8			device;
 
 	u8			command;	/* IO operation */
+	u8			saved_command;	/* IO operation */
 };
 
 #define ata_id_is_ata(id)	(((id)[0] & (1 << 15)) == 0)
--- linux/include/linux/libata.h.orig	2006-02-23 16:15:53.000000000 -0500
+++ linux/include/linux/libata.h	2006-02-25 11:17:14.000000000 -0500
@@ -420,7 +420,7 @@
 	void (*tf_load) (struct ata_port *ap, const struct ata_taskfile *tf);
 	void (*tf_read) (struct ata_port *ap, struct ata_taskfile *tf);
 
-	void (*exec_command)(struct ata_port *ap, const struct ata_taskfile *tf);
+	void (*exec_command)(struct ata_port *ap, struct ata_taskfile *tf);
 	u8   (*check_status)(struct ata_port *ap);
 	u8   (*check_altstatus)(struct ata_port *ap);
 	void (*dev_select)(struct ata_port *ap, unsigned int device);
@@ -512,13 +512,13 @@
  */
 extern void ata_tf_load(struct ata_port *ap, const struct ata_taskfile *tf);
 extern void ata_tf_read(struct ata_port *ap, struct ata_taskfile *tf);
-extern void ata_tf_to_fis(const struct ata_taskfile *tf, u8 *fis, u8 pmp);
+extern void ata_tf_to_fis(struct ata_taskfile *tf, u8 *fis, u8 pmp);
 extern void ata_tf_from_fis(const u8 *fis, struct ata_taskfile *tf);
 extern void ata_noop_dev_select (struct ata_port *ap, unsigned int device);
 extern void ata_std_dev_select (struct ata_port *ap, unsigned int device);
 extern u8 ata_check_status(struct ata_port *ap);
 extern u8 ata_altstatus(struct ata_port *ap);
-extern void ata_exec_command(struct ata_port *ap, const struct ata_taskfile *tf);
+extern void ata_exec_command(struct ata_port *ap, struct ata_taskfile *tf);
 extern int ata_port_start (struct ata_port *ap);
 extern void ata_port_stop (struct ata_port *ap);
 extern void ata_host_stop (struct ata_host_set *host_set);
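
The idea behind the second patch, keeping a copy of the issued ATA command because the command byte is later overwritten by the returned drive status, can be shown in isolation. The sketch below uses a toy structure and toy function names that are purely hypothetical; only the saved_command field name is borrowed from the patch, and none of this is the real libata data path.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the taskfile: one byte holds the command
 * on the way out and the drive status on the way back.
 */
struct toy_taskfile {
    uint8_t command;        /* command when issued, status afterwards */
    uint8_t saved_command;  /* copy of the issued command, for logging */
};

/* Issue path: remember the opcode before it can be overwritten. */
static void toy_issue(struct toy_taskfile *tf, uint8_t ata_cmd)
{
    tf->command = ata_cmd;
    tf->saved_command = ata_cmd;
}

/* Completion path: the returned status replaces the command byte. */
static void toy_complete(struct toy_taskfile *tf, uint8_t drv_status)
{
    tf->command = drv_status;
}
```

After issuing, say, 0x25 (READ DMA EXT, chosen here only as an example value) and completing with status 0x51, the command field holds 0x51 while saved_command still holds 0x25, which is the value the extended printk would report as cmd=.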
* Re: LibPATA code issues / 2.6.15.4 2006-02-25 16:20 ` Mark Lord @ 2006-02-25 17:45 ` Justin Piszcz 2006-02-25 18:28 ` Mark Lord 0 siblings, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-02-25 17:45 UTC (permalink / raw) To: Mark Lord Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list Second patch fails for me. On a clean 2.6.15.4 source tree: p34:/usr/src# ls -ld linux lrwxrwxrwx 1 root src 14 2006-02-25 12:41 linux -> linux-2.6.15.4/ The one from your e-mail earlier: p34:/usr/src/linux# patch -p1 < /tmp/patch1 patching file drivers/scsi/libata-scsi.c Hunk #1 succeeded at 404 (offset -16 lines). Hunk #2 succeeded at 414 (offset -16 lines). Hunk #3 succeeded at 493 (offset -16 lines). Hunk #4 succeeded at 505 (offset -16 lines). Hunk #5 succeeded at 547 (offset -16 lines). Hunk #6 succeeded at 622 (offset -16 lines). p34:/usr/src/linux# patch -p1 < /tmp/12_libata_ata_opcode.patch patching file drivers/scsi/libata-core.c Hunk #1 succeeded at 245 (offset -8 lines). Hunk #2 succeeded at 267 (offset -8 lines). Hunk #3 succeeded at 288 (offset -8 lines). Hunk #4 succeeded at 310 (offset -8 lines). Hunk #5 succeeded at 500 (offset -8 lines). Hunk #6 FAILED at 626. 1 out of 6 hunks FAILED -- saving rejects to file drivers/scsi/libata-core.c.rej patching file drivers/scsi/libata-scsi.c Hunk #1 succeeded at 414 (offset -24 lines). Hunk #2 succeeded at 493 (offset -24 lines). Hunk #3 FAILED at 505. Hunk #4 succeeded at 547 (offset -24 lines). Hunk #5 succeeded at 622 (offset -24 lines). Hunk #6 succeeded at 1308 (offset -29 lines). 1 out of 6 hunks FAILED -- saving rejects to file drivers/scsi/libata-scsi.c.rej patching file include/linux/ata.h Hunk #1 succeeded at 239 (offset -5 lines). patching file include/linux/libata.h Hunk #1 succeeded at 368 (offset -52 lines). Hunk #2 succeeded at 452 (offset -60 lines). p34:/usr/src/linux# Should I be using 2.6.16-rcX? On Sat, 25 Feb 2006, Mark Lord wrote: > David Greaves wrote: > .. 
>> Thanks Mark - I've finally gotten this patch applied. >> >> With smartd disabled and no smart commands issued, a readonly badblocks >> scan of /dev/sdb2 shows no problems and now gives: >> Feb 25 10:38:31 haze kernel: ata2: status=0x51 { DriveReady SeekComplete >> Error } >> Feb 25 10:38:32 haze kernel: ata2: no sense translation for op=0x28 >> status: 0x51 >> Feb 25 10:38:32 haze kernel: ata2: status=0x51 { DriveReady SeekComplete >> Error } >> Feb 25 10:38:35 haze kernel: ata2: no sense translation for op=0x28 >> status: 0x51 >> hundreds of times. > .. > > Mmmm.. okay, it's happening due to a SCSI READ_10 opcode, > which means it isn't being triggered by any of the FUA stuff. > > But there's still no obvious reason for the error. > The drive is basically just saying "command rejected", > and libata-scsi is translating that into "medium error" > for some unknown reason. > > Unfortunately, the design of the current libata is such that > we no longer have access to the actual ATA opcode that was rejected. > It gets overwritten by the returned drive status on completion. > > So.. I need to generate another patch for you now, to save/show > the real ATA opcode that was used to cause the errors. > My theory is that we'll discover that it is one that your drive > legitimately is rejecting (unsupported LBA48 or something..). > > But we won't know until we see the output. > > Second patch is attached: apply *in addition* to the first one. > > Cheers > > ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-25 17:45 ` Justin Piszcz @ 2006-02-25 18:28 ` Mark Lord 2006-02-25 18:55 ` Justin Piszcz ` (2 more replies) 0 siblings, 3 replies; 147+ messages in thread From: Mark Lord @ 2006-02-25 18:28 UTC (permalink / raw) To: Justin Piszcz Cc: Mark Lord, David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list Justin Piszcz wrote: > Second patch fails for me. .. > Should I be using 2.6.16-rcX? Mmm... that's what I'm using (plus other patches), so, yes.. give that a try. 2.6.16 does seem to be shaping up to be a nice kernel. Cheers ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-25 18:28 ` Mark Lord @ 2006-02-25 18:55 ` Justin Piszcz 2006-02-25 19:29 ` Justin Piszcz 2006-02-25 19:47 ` David Greaves 2 siblings, 0 replies; 147+ messages in thread From: Justin Piszcz @ 2006-02-25 18:55 UTC (permalink / raw) To: Mark Lord Cc: Mark Lord, David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list I will give 2.6.16-rcX a try shortly, here is the error again (with a freshly patched 2.6.15.4) just to rule out any problems with the first time that I patched: [ 1037.451784] ata3: translated op=0x2a ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [ 1037.451791] ata3: status=0x51 { DriveReady SeekComplete Error } [ 1037.451796] ata3: error=0x04 { DriveStatusError } [ 1517.050496] ata3: no sense translation for op=0x2a status: 0x51 [ 1517.050504] ata3: translated op=0x2a ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 [ 1517.050506] ata3: status=0x51 { DriveReady SeekComplete Error } On Sat, 25 Feb 2006, Mark Lord wrote: > Justin Piszcz wrote: >> Second patch fails for me. > .. >> Should I be using 2.6.16-rcX? > > Mmm... that's what I'm using (plus other patches), > so, yes.. give that a try. 2.6.16 does seem to > be shaping up to be a nice kernel. > > Cheers > ^ permalink raw reply [flat|nested] 147+ messages in thread
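For anyone following the logs, the status and error bytes can be decoded by hand. Below is a minimal userspace sketch, not libata code: the bit values are the standard ATA status/error register bits, while the helper names (`decode_ata_status`, `ata_command_aborted`) and the `ATA_DSC` macro name are made up for illustration. It shows that 0x51 is DRDY | DSC | ERR, and that error 0x04 is the ABRT bit, i.e. the drive aborted (rejected) the command.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Standard ATA status register bits (macro names chosen locally). */
#define ATA_BUSY    (1 << 7)
#define ATA_DRDY    (1 << 6)
#define ATA_DF      (1 << 5)
#define ATA_DSC     (1 << 4)   /* seek complete */
#define ATA_DRQ     (1 << 3)
#define ATA_ERR     (1 << 0)

/* ATA error register: command aborted. */
#define ATA_ABORTED (1 << 2)

/* Hypothetical helper: append a flag name to buf, space separated. */
static void append_flag(char *buf, size_t len, const char *name)
{
    size_t used = strlen(buf);

    if (used + 1 < len)
        snprintf(buf + used, len - used, "%s%s", used ? " " : "", name);
}

/* Hypothetical helper: render a status byte the way the dmesg lines do. */
void decode_ata_status(uint8_t stat, char *buf, size_t len)
{
    if (len == 0)
        return;
    buf[0] = '\0';
    if (stat & ATA_BUSY) append_flag(buf, len, "Busy");
    if (stat & ATA_DRDY) append_flag(buf, len, "DriveReady");
    if (stat & ATA_DF)   append_flag(buf, len, "DeviceFault");
    if (stat & ATA_DSC)  append_flag(buf, len, "SeekComplete");
    if (stat & ATA_DRQ)  append_flag(buf, len, "DataRequest");
    if (stat & ATA_ERR)  append_flag(buf, len, "Error");
}

/* True when the drive flagged an error and set ABRT in the error register. */
int ata_command_aborted(uint8_t stat, uint8_t err)
{
    return (stat & ATA_ERR) && (err & ATA_ABORTED);
}
```

Feeding it the values from the log, `decode_ata_status(0x51, ...)` yields the same three flag names that dmesg prints, and `ata_command_aborted(0x51, 0x04)` is true.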
* Re: LibPATA code issues / 2.6.15.4 2006-02-25 18:28 ` Mark Lord 2006-02-25 18:55 ` Justin Piszcz @ 2006-02-25 19:29 ` Justin Piszcz 2006-02-25 19:53 ` David Greaves 2006-02-25 19:47 ` David Greaves 2 siblings, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-02-25 19:29 UTC (permalink / raw) To: Mark Lord Cc: Mark Lord, David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list Which kernel did you run your patch against? With 2.6.16-rc4.... First patch looks good.. p34:/usr/src/linux# patch -p1 < /tmp/patch1 patching file drivers/scsi/libata-scsi.c p34:/usr/src/linux# patch -p1 < /tmp/12_libata_ata_opcode.patch patching file drivers/scsi/libata-core.c Hunk #1 succeeded at 245 (offset -8 lines). Hunk #2 succeeded at 267 (offset -8 lines). Hunk #3 succeeded at 288 (offset -8 lines). Hunk #4 succeeded at 310 (offset -8 lines). Hunk #5 succeeded at 500 (offset -8 lines). Hunk #6 succeeded at 626 (offset -8 lines). patching file drivers/scsi/libata-scsi.c Hunk #1 succeeded at 430 (offset -8 lines). Hunk #2 succeeded at 509 (offset -8 lines). Hunk #3 FAILED at 521. Hunk #4 succeeded at 563 (offset -8 lines). Hunk #5 succeeded at 638 (offset -8 lines). Hunk #6 succeeded at 1329 (offset -8 lines). 1 out of 6 hunks FAILED -- saving rejects to file drivers/scsi/libata-scsi.c.rej patching file include/linux/ata.h patching file include/linux/libata.h Hunk #1 succeeded at 373 (offset -47 lines). Hunk #2 succeeded at 463 (offset -49 lines). 
p34:/usr/src/linux# ls -ld /usr/src/linux lrwxrwxrwx 1 root src 16 2006-02-25 14:24 /usr/src/linux -> linux-2.6.16-rc4/ p34:/usr/src/linux# Here is the *.rej file: # cat libata-scsi.c.rej *************** *** 521,528 **** *ascq = 0x04; /* "auto-reallocation failed" */ translate_done: - DPRINTK(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to " - "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err, *sk, *asc, *ascq); return; } --- 521,528 ---- *ascq = 0x04; /* "auto-reallocation failed" */ translate_done: + DPRINTK(KERN_ERR "ata%u: translated op=0x%02x cmd=0x%02x ATA stat/err 0x%02x/%02x to " + "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, cmd, drv_stat, drv_err, *sk, *asc, *ascq); return; } On Sat, 25 Feb 2006, Mark Lord wrote: > Justin Piszcz wrote: >> Second patch fails for me. > .. >> Should I be using 2.6.16-rcX? > > Mmm... that's what I'm using (plus other patches), > so, yes.. give that a try. 2.6.16 does seem to > be shaping up to be a nice kernel. > > Cheers > ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-25 19:29 ` Justin Piszcz @ 2006-02-25 19:53 ` David Greaves 0 siblings, 0 replies; 147+ messages in thread From: David Greaves @ 2006-02-25 19:53 UTC (permalink / raw) To: Justin Piszcz Cc: Mark Lord, Mark Lord, Jeff Garzik, linux-kernel, IDE/ATA development list Justin Piszcz wrote: > Which kernel did you run your patch against? > > With 2.6.16-rc4.... > > First patch looks good.. > Justin, I'll help you out off-list :) David ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-25 18:28 ` Mark Lord 2006-02-25 18:55 ` Justin Piszcz 2006-02-25 19:29 ` Justin Piszcz @ 2006-02-25 19:47 ` David Greaves 2006-02-26 2:27 ` Mark Lord 2 siblings, 1 reply; 147+ messages in thread From: David Greaves @ 2006-02-25 19:47 UTC (permalink / raw) To: Mark Lord Cc: Justin Piszcz, Mark Lord, Jeff Garzik, linux-kernel, IDE/ATA development list Mark Lord wrote: > Justin Piszcz wrote: > >> Should I be using 2.6.16-rcX? > > > Mmm... that's what I'm using (plus other patches), > so, yes.. give that a try. 2.6.16 does seem to > be shaping up to be a nice kernel. OK, failed for me too - I updated to 2.6.16-rc4 and it still failed (despite -F) so I fixed by hand. (printk -> DPRINTK anyway: Linux haze 2.6.16-rc4patched #1 PREEMPT Sat Feb 25 19:29:11 UTC 2006 i686 GNU/Linux ata2: status=0x51 { DriveReady SeekComplete Error } ata2: error=0x04 { DriveStatusError } ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 ata2: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 ata2: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 ata2: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 ata2: status=0x51 { DriveReady SeekComplete Error } sd 1:0:0:0: SCSI error: return code = 0x8000002 sdb: Current: sense key: Medium Error Additional sense: Unrecovered read error - auto reallocate failed end_request: I/O error, dev sdb, sector 398283329 raid1: Disk failure on sdb2, disabling device. Operation continuing on 1 devices and later... 
device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com XFS mounting filesystem dm-0 ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x04 { DriveStatusError } ata1: no sense translation for op=0x2a cmd=0x3d status: 0x51 ata1: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 ata2: status=0x51 { DriveReady SeekComplete Error } ata1: no sense translation for op=0x2a cmd=0x3d status: 0x51 ata1: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 ata2: status=0x51 { DriveReady SeekComplete Error } ata1: no sense translation for op=0x2a cmd=0x3d status: 0x51 ata1: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 ata2: status=0x51 { DriveReady SeekComplete Error } ata1: no sense translation for op=0x2a cmd=0x3d status: 0x51 ata1: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 ata2: status=0x51 { DriveReady SeekComplete Error } sd 0:0:0:0: SCSI error: return code = 0x8000002 sda: Current: sense key: Medium Error Additional sense: Unrecovered read error - auto reallocate failed end_request: I/O error, dev sda, sector 390716735 raid5: Disk failure on sda1, disabling device. Operation continuing on 2 devices ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 ata2: status=0x51 { DriveReady SeekComplete Error } sd 1:0:0:0: SCSI error: return code = 0x8000002 sdb: Current: sense key: Medium Error Additional sense: Unrecovered read error - auto reallocate failed end_request: I/O error, dev sdb, sector 390716735 raid5: Disk failure on sdb1, disabling device. Operation continuing on 1 devices RAID5 conf printout: --- rd:3 wd:1 fd:2 disk 0, o:1, dev:sdd1 disk 1, o:0, dev:sdb1 disk 2, o:0, dev:sda1 xfs_force_shutdown(dm-0,0x1) called from line 338 of file fs/xfs/xfs_rw.c. 
Return address = 0xc020c0e9 Filesystem "dm-0": I/O Error Detected. Shutting down filesystem: dm-0 Please umount the filesystem, and rectify the problem(s) I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x640884a ("xlog_bwrite") error 5 buf count 262144 XFS: failed to locate log tail XFS: log mount/recovery failed: error 5 XFS: log mount failed RAID5 conf printout: --- rd:3 wd:1 fd:2 disk 0, o:1, dev:sdd1 disk 1, o:0, dev:sdb1 RAID5 conf printout: --- rd:3 wd:1 fd:2 disk 0, o:1, dev:sdd1 disk 1, o:0, dev:sdb1 RAID5 conf printout: --- rd:3 wd:1 fd:2 disk 0, o:1, dev:sdd1 So I guess my raid just blew up too... hope there's no corruption! David (PS Hi Mark, this is lbt from the Empeg BBS :) ) -- ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-25 19:47 ` David Greaves @ 2006-02-26 2:27 ` Mark Lord 2006-02-26 9:56 ` David Greaves 2006-02-26 12:27 ` James Courtier-Dutton 0 siblings, 2 replies; 147+ messages in thread From: Mark Lord @ 2006-02-26 2:27 UTC (permalink / raw) To: David Greaves Cc: Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun David Greaves wrote: > > Linux haze 2.6.16-rc4patched #1 PREEMPT Sat Feb 25 19:29:11 UTC 2006 > i686 GNU/Linux > > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: error=0x04 { DriveStatusError } > ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 > ata2: status=0x51 { DriveReady SeekComplete Error } > sd 1:0:0:0: SCSI error: return code = 0x8000002 > sdb: Current: sense key: Medium Error > Additional sense: Unrecovered read error - auto reallocate failed > end_request: I/O error, dev sdb, sector 398283329 > raid1: Disk failure on sdb2, disabling device. > Operation continuing on 1 devices Oh good, *now* we've gotten somewhere!! Albert / Jens / Jeff: The command failing above is SCSI WRITE_10, which is being translated into ATA_CMD_WRITE_FUA_EXT by libata. This command fails -- unrecognized by the drive in question. But libata reports it (most incorrectly) as a "medium error", and the drive is taken out of service from its RAID. Bad, bad, and worse. Libata should really recover from this, by recognizing that the command was rejected, and replacing it with a simple WRITE_EXT instead. Possibly followed by FLUSH_CACHE. So.. 
I've forgotten who put FUA into libata, but hopefully it's one of the folks on the CC: list, and that nice person can now generate a patch to fix this bug somehow. Cheers ^ permalink raw reply [flat|nested] 147+ messages in thread
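The recovery Mark describes (retry a rejected FUA write as a plain WRITE_EXT, possibly followed by a cache flush) could be sketched roughly as below. This is only an illustration of the decision, under stated assumptions: it is not what libata does, `struct fua_retry_plan` and `plan_fua_retry()` are hypothetical names, and treating "ERR set with ABRT in the error register" as "command rejected" is an assumption about how the rejection would be detected. The opcode values are the standard ATA ones (0x3D is the `cmd=0x3d` seen in David's log).

```c
#include <stdint.h>

/* Standard ATA opcodes and register bits. */
#define ATA_CMD_WRITE_EXT      0x35   /* WRITE DMA EXT */
#define ATA_CMD_WRITE_FUA_EXT  0x3D   /* WRITE DMA FUA EXT */
#define ATA_CMD_FLUSH_EXT      0xEA   /* FLUSH CACHE EXT */
#define ATA_ERR                (1 << 0)
#define ATA_ABORTED            (1 << 2)

/* Hypothetical result type: what to issue instead of the failed command. */
struct fua_retry_plan {
    int     retry;      /* 1 if the write should be reissued without FUA */
    uint8_t write_cmd;  /* replacement write opcode */
    uint8_t flush_cmd;  /* flush to issue after the write completes */
};

/*
 * Hypothetical policy: if a FUA write was aborted by the drive, plan a
 * non-FUA write followed by a cache flush. Anything else is left alone
 * so that genuine errors still propagate upward.
 */
struct fua_retry_plan plan_fua_retry(uint8_t cmd, uint8_t stat, uint8_t err)
{
    struct fua_retry_plan plan = { 0, 0, 0 };

    if (cmd == ATA_CMD_WRITE_FUA_EXT &&
        (stat & ATA_ERR) && (err & ATA_ABORTED)) {
        plan.retry = 1;
        plan.write_cmd = ATA_CMD_WRITE_EXT;
        plan.flush_cmd = ATA_CMD_FLUSH_EXT;
    }
    return plan;
}
```

A real fix would presumably also have to remember that the device rejected FUA so that later writes skip it, which this sketch does not attempt.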
* Re: LibPATA code issues / 2.6.15.4 2006-02-26 2:27 ` Mark Lord @ 2006-02-26 9:56 ` David Greaves 2006-02-26 14:04 ` Mark Lord 2006-02-26 12:27 ` James Courtier-Dutton 1 sibling, 1 reply; 147+ messages in thread From: David Greaves @ 2006-02-26 9:56 UTC (permalink / raw) To: Mark Lord Cc: Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun, Linus Torvalds Mark Lord wrote: >> sdb: Current: sense key: Medium Error >> Additional sense: Unrecovered read error - auto reallocate failed >> end_request: I/O error, dev sdb, sector 398283329 >> raid1: Disk failure on sdb2, disabling device. >> Operation continuing on 1 devices > > > Oh good, *now* we've gotten somewhere!! > > Albert / Jens / Jeff: > > The command failing above is SCSI WRITE_10, which is being > translated into ATA_CMD_WRITE_FUA_EXT by libata. > > This command fails -- unrecognized by the drive in question. > But libata reports it (most incorrectly) as a "medium error", > and the drive is taken out of service from its RAID. > > Bad, bad, and worse. > > Libata should really recover from this, by recognizing that > the command was rejected, and replacing it with a simple > WRITE_EXT instead. Possibly followed by FLUSH_CACHE. > > So.. I've forgotten who put FUA into libata, but hopefully > it's one of the folks on the CC: list, and that nice person > can now generate a patch to fix this bug somehow. Thanks Mark I'm glad it's a bug and not bad hardware. I am quite concerned that the basic effect of just booting a practically vanilla 2.6.16-rc4 like this was to fry my raid array. Luckily it dropped 2 (of 3) disks so quickly that the event counter was the same allowing an easy rebuild. 2.6.15 has similar issues but they seem to happen *very* infrequently by comparison - this hit me several times during a single boot. Should Linus (cc'ed) hold off on 2.6.16 because of this or not? David ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-26 9:56 ` David Greaves @ 2006-02-26 14:04 ` Mark Lord 2006-02-27 21:34 ` Mark Lord 0 siblings, 1 reply; 147+ messages in thread From: Mark Lord @ 2006-02-26 14:04 UTC (permalink / raw) To: David Greaves Cc: Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun, Linus Torvalds David Greaves wrote: > Mark Lord wrote: > >>> sdb: Current: sense key: Medium Error >>> Additional sense: Unrecovered read error - auto reallocate failed >>> end_request: I/O error, dev sdb, sector 398283329 >>> raid1: Disk failure on sdb2, disabling device. >>> Operation continuing on 1 devices .. >> The command failing above is SCSI WRITE_10, which is being >> translated into ATA_CMD_WRITE_FUA_EXT by libata. >> >> This command fails -- unrecognized by the drive in question. >> But libata reports it (most incorrectly) as a "medium error", >> and the drive is taken out of service from its RAID. >> >> Bad, bad, and worse. .. > Thanks Mark > > I'm glad it's a bug and not bad hardware. > > I am quite concerned that the basic effect of just booting a practically > vanilla 2.6.16-rc4 like this was to fry my raid array. > > Luckily it dropped 2 (of 3) disks so quickly that the event counter was > the same allowing an easy rebuild. > > 2.6.15 has similar issues but they seem to happen *very* infrequently by > comparison - this hit me several times during a single boot. > > Should Linus (cc'ed) hold off on 2.6.16 because of this or not? Well, no doubt whatsoever about it being a "regression", since the FUA code is *new* in 2.6.16 (not present in 2.6.15). The FUA code should either get fixed, or removed from 2.6.16. Cheers ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-26 14:04 ` Mark Lord @ 2006-02-27 21:34 ` Mark Lord 2006-02-28 1:33 ` Tejun Heo 0 siblings, 1 reply; 147+ messages in thread From: Mark Lord @ 2006-02-27 21:34 UTC (permalink / raw) To: Jeff Garzik Cc: David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun, Linus Torvalds Mark Lord wrote: >> Mark Lord wrote: >> >>>> sdb: Current: sense key: Medium Error >>>> Additional sense: Unrecovered read error - auto reallocate failed >>>> end_request: I/O error, dev sdb, sector 398283329 >>>> raid1: Disk failure on sdb2, disabling device. >>>> Operation continuing on 1 devices > .. >>> The command failing above is SCSI WRITE_10, which is being >>> translated into ATA_CMD_WRITE_FUA_EXT by libata. >>> >>> This command fails -- unrecognized by the drive in question. >>> But libata reports it (most incorrectly) as a "medium error", >>> and the drive is taken out of service from its RAID. >>> >>> Bad, bad, and worse. .. hold off on 2.6.16 because of this or not? > > Well, no doubt whatsoever about it being a "regression", > since the FUA code is *new* in 2.6.16 (not present in 2.6.15). > > The FUA code should either get fixed, or removed from 2.6.16. Actually, now that I've done a little more digging, this FUA stuff is inherently dangerous as implemented. At least a few SATA controllers include pipelines and whatnot that rely upon recognizing the (S)ATA opcodes being used. And I sincerely doubt that any of those will recognize the very newish (and aptly named..) FUA opcodes. These may be unsafe in general, unless we tag controllers as FUA-capable and NON-FUA-capable, in addition to tagging the drives. :/ ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-27 21:34 ` Mark Lord @ 2006-02-28 1:33 ` Tejun Heo 2006-02-28 1:46 ` Linus Torvalds 2006-02-28 4:16 ` Mark Lord 0 siblings, 2 replies; 147+ messages in thread From: Tejun Heo @ 2006-02-28 1:33 UTC (permalink / raw) To: Mark Lord Cc: Jeff Garzik, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds Hello, Mark. Mark Lord wrote: > > .. hold off on 2.6.16 because of this or not? > It certainly is dangerous. I guess we should turn off FUA for the time being. Barrier auto-fallback was once implemented but it didn't seem like a good idea as it was too complex and hid low-level bugs from higher levels. The consensus seems to be developing a blacklist of drives which lie about FUA support (currently only one drive). The official kernel doesn't seem to be the correct place to grow the blacklist; maybe we should do it from -mm? >> >> Well, no doubt whatsoever about it being a "regression", >> since the FUA code is *new* in 2.6.16 (not present in 2.6.15). >> >> The FUA code should either get fixed, or removed from 2.6.16. > > > Actually, now that I've done a little more digging, this FUA stuff > is inherently dangerous as implemented. At least a few SATA controllers > include pipelines and whatnot that rely upon recognizing the (S)ATA > opcodes being used. And I sincerely doubt that any of those will > recognize the very newish (and aptly named..) FUA opcodes. > > These may be unsafe in general, unless we tag controllers as > FUA-capable and NON-FUA-capable, in addition to tagging the drives. All sii controllers and piix/ahci seem to handle FUA pretty OK. And yeah, we may have to create a controller blacklist too. BTW, can you let me know what drive we're talking about now (model name and firmware revision)? -- tejun ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 1:33 ` Tejun Heo @ 2006-02-28 1:46 ` Linus Torvalds 2006-02-28 2:07 ` Jeff Garzik 2006-02-28 8:03 ` Jens Axboe 2006-02-28 4:16 ` Mark Lord 1 sibling, 2 replies; 147+ messages in thread From: Linus Torvalds @ 2006-02-28 1:46 UTC (permalink / raw) To: Tejun Heo Cc: Mark Lord, Jeff Garzik, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe On Tue, 28 Feb 2006, Tejun Heo wrote: > Hello, Mark. > > Mark Lord wrote: > > > > .. hold off on 2.6.16 because of this or not? > > > > It certainly is dangerous. I guess we should turn off FUA for the time being. > Barrier auto-fallback was once implemented but it didn't seem like a good idea > as it was too complex and hid low-level bugs from higher levels. The consensus > seems to be developing a blacklist of drives which lie about FUA support > (currently only one drive). The official kernel doesn't seem to be the correct > place to grow the blacklist; maybe we should do it from -mm? For 2.6.16, the only sane solution for now is to just turn it off. Somebody want to send me a patch that does that, along with an ack from Mark (and whoever else sees this) that it fixes his/their problems? Linus ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 1:46 ` Linus Torvalds @ 2006-02-28 2:07 ` Jeff Garzik 2006-02-28 2:14 ` Linus Torvalds 2006-02-28 10:30 ` Alan Cox 2006-02-28 8:03 ` Jens Axboe 1 sibling, 2 replies; 147+ messages in thread From: Jeff Garzik @ 2006-02-28 2:07 UTC (permalink / raw) To: Linus Torvalds Cc: Tejun Heo, Mark Lord, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe [-- Attachment #1: Type: text/plain, Size: 312 bytes --] Linus Torvalds wrote: > For 2.6.16, the only sane solution for now is to just turn it off. > > Somebody want to send me a patch that does that, along with an ack from > Mark (and whoever else sees this) that it fixes his/their problems? I've had this waiting in the wings, in fact... [see attached] Jeff [-- Attachment #2: libata.txt --] [-- Type: text/plain, Size: 1644 bytes --] Please pull from 'upstream-fixes' branch of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git to receive the following updates: drivers/scsi/libata-core.c | 4 ++++ drivers/scsi/libata-scsi.c | 2 ++ drivers/scsi/libata.h | 1 + 3 files changed, 7 insertions(+) Jeff Garzik: [libata] Disable FUA by default diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c index 5f1d758..ab3c9a4 100644 --- a/drivers/scsi/libata-core.c +++ b/drivers/scsi/libata-core.c @@ -82,6 +82,10 @@ int atapi_enabled = 0; module_param(atapi_enabled, int, 0444); MODULE_PARM_DESC(atapi_enabled, "Enable discovery of ATAPI devices (0=off, 1=on)"); +int fua = 0; +module_param(fua, int, 0444); +MODULE_PARM_DESC(fua, "FUA support (0=off, 1=on)"); + MODULE_AUTHOR("Jeff Garzik"); MODULE_DESCRIPTION("Library module for ATA devices"); MODULE_LICENSE("GPL"); diff --git a/drivers/scsi/libata-scsi.c b/drivers/scsi/libata-scsi.c index 07b1e7c..5ce33ae 100644 --- a/drivers/scsi/libata-scsi.c +++ b/drivers/scsi/libata-scsi.c @@ -1708,6 +1708,8 @@ static int ata_dev_supports_fua(u16 *id) { unsigned char model[41], fw[9]; + if (!fua) + 
return 0; if (!ata_id_has_fua(id)) return 0; diff --git a/drivers/scsi/libata.h b/drivers/scsi/libata.h index e03ce48..abfd18f 100644 --- a/drivers/scsi/libata.h +++ b/drivers/scsi/libata.h @@ -41,6 +41,7 @@ struct ata_scsi_args { /* libata-core.c */ extern int atapi_enabled; +extern int fua; extern struct ata_queued_cmd *ata_qc_new_init(struct ata_port *ap, struct ata_device *dev); extern int ata_rwcmd_protocol(struct ata_queued_cmd *qc); ^ permalink raw reply related [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 2:07 ` Jeff Garzik @ 2006-02-28 2:14 ` Linus Torvalds 2006-02-28 2:52 ` Jeff Garzik 2006-02-28 3:36 ` Jeff Garzik 2006-02-28 10:30 ` Alan Cox 1 sibling, 2 replies; 147+ messages in thread From: Linus Torvalds @ 2006-02-28 2:14 UTC (permalink / raw) To: Jeff Garzik Cc: Tejun Heo, Mark Lord, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe On Mon, 27 Feb 2006, Jeff Garzik wrote: > > I've had this waiting in the wings, in fact... [see attached] I really hate having a _global_ variable called "fua". That's just bad taste. I would suggest calling it "atapi_forced_unit_attention_enabled", but maybe that is going a bit overboard. It's definitely better than just "fua", though. Linus ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 2:14 ` Linus Torvalds @ 2006-02-28 2:52 ` Jeff Garzik 2006-02-28 3:36 ` Jeff Garzik 1 sibling, 0 replies; 147+ messages in thread From: Jeff Garzik @ 2006-02-28 2:52 UTC (permalink / raw) To: Linus Torvalds Cc: Tejun Heo, Mark Lord, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe Linus Torvalds wrote: > > On Mon, 27 Feb 2006, Jeff Garzik wrote: > >>I've had this waiting in the wings, in fact... [see attached] > > > I really hate having a _global_ variable called "fua". That's just bad > taste. I would suggest calling it "atapi_forced_unit_attention_enabled", > but maybe that is going a bit overboard. It's definitely better than just > "fua", though. <shrug> It will go away when things are fixed, and only users who are testing will even bother with it. Looking over the module subsystem, it looks like one could use module_param_named() to achieve proper namespace separation (C versus module opt) -- then you could call it libata_fua -- but for a temporary module option it seems like more trouble than it's worth. Jeff ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 2:14 ` Linus Torvalds 2006-02-28 2:52 ` Jeff Garzik @ 2006-02-28 3:36 ` Jeff Garzik 2006-02-28 4:11 ` Mark Lord 1 sibling, 1 reply; 147+ messages in thread From: Jeff Garzik @ 2006-02-28 3:36 UTC (permalink / raw) To: Linus Torvalds Cc: Tejun Heo, Mark Lord, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe [-- Attachment #1: Type: text/plain, Size: 436 bytes --] Linus Torvalds wrote: > > On Mon, 27 Feb 2006, Jeff Garzik wrote: > >>I've had this waiting in the wings, in fact... [see attached] > > > I really hate having a _global_ variable called "fua". That's just bad > taste. I would suggest calling it "atapi_forced_unit_attention_enabled", > but maybe that is going a bit overboard. It's definitely better than just > "fua", though. Here's the cleaner namespace version... Jeff [-- Attachment #2: libata.txt --] [-- Type: text/plain, Size: 1672 bytes --] Please pull from 'upstream-fixes' branch of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git to receive the following updates: drivers/scsi/libata-core.c | 4 ++++ drivers/scsi/libata-scsi.c | 2 ++ drivers/scsi/libata.h | 1 + 3 files changed, 7 insertions(+) Jeff Garzik: [libata] Disable FUA diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c index 5f1d758..4f91b0d 100644 --- a/drivers/scsi/libata-core.c +++ b/drivers/scsi/libata-core.c @@ -82,6 +82,10 @@ int atapi_enabled = 0; module_param(atapi_enabled, int, 0444); MODULE_PARM_DESC(atapi_enabled, "Enable discovery of ATAPI devices (0=off, 1=on)"); +int libata_fua = 0; +module_param_named(fua, libata_fua, int, 0444); +MODULE_PARM_DESC(fua, "FUA support (0=off, 1=on)"); + MODULE_AUTHOR("Jeff Garzik"); MODULE_DESCRIPTION("Library module for ATA devices"); MODULE_LICENSE("GPL"); diff --git a/drivers/scsi/libata-scsi.c b/drivers/scsi/libata-scsi.c index 07b1e7c..59503c9 100644 --- a/drivers/scsi/libata-scsi.c +++ b/drivers/scsi/libata-scsi.c @@ -1708,6 
+1708,8 @@ static int ata_dev_supports_fua(u16 *id) { unsigned char model[41], fw[9]; + if (!libata_fua) + return 0; if (!ata_id_has_fua(id)) return 0; diff --git a/drivers/scsi/libata.h b/drivers/scsi/libata.h index e03ce48..fddaf47 100644 --- a/drivers/scsi/libata.h +++ b/drivers/scsi/libata.h @@ -41,6 +41,7 @@ struct ata_scsi_args { /* libata-core.c */ extern int atapi_enabled; +extern int libata_fua; extern struct ata_queued_cmd *ata_qc_new_init(struct ata_port *ap, struct ata_device *dev); extern int ata_rwcmd_protocol(struct ata_queued_cmd *qc); ^ permalink raw reply related [flat|nested] 147+ messages in thread
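The gating order in the patched `ata_dev_supports_fua()` can be modelled in userspace as follows. This is a rough sketch, not the kernel function: `dev_supports_fua()` and `id_has_fua()` are illustrative stand-ins, the model/firmware blacklist check from the real function is omitted, and the IDENTIFY DEVICE test (word 84, bit 6) reflects my reading of the ATA-7 spec rather than anything quoted in this thread. The point is simply that with the global switch off (the default in the patch), the drive's own claim of FUA support is never consulted.

```c
#include <stdint.h>

/* Mirrors the patch's module-level switch; 0 (off) is the default. */
int libata_fua = 0;

/*
 * Stand-in for ata_id_has_fua(): assumes IDENTIFY DEVICE word 84, bit 6
 * advertises the WRITE DMA FUA EXT family of commands.
 */
static int id_has_fua(const uint16_t *id)
{
    return (id[84] & (1 << 6)) != 0;
}

/*
 * Simplified model of the patched check: the global switch is tested
 * first, then the drive's IDENTIFY data. The real function goes on to
 * compare model and firmware strings against a blacklist, left out here.
 */
int dev_supports_fua(const uint16_t *id)
{
    if (!libata_fua)
        return 0;
    if (!id_has_fua(id))
        return 0;
    return 1;
}
```

Because the parameter is registered with `module_param_named(fua, libata_fua, ...)`, the user-visible option name is `fua` while the C symbol stays `libata_fua`.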
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 3:36 ` Jeff Garzik @ 2006-02-28 4:11 ` Mark Lord 0 siblings, 0 replies; 147+ messages in thread From: Mark Lord @ 2006-02-28 4:11 UTC (permalink / raw) To: Jeff Garzik Cc: Linus Torvalds, Tejun Heo, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe Jeff Garzik wrote: > Linus Torvalds wrote: .. >> I really hate having a _global_ variable called "fua". That's just bad >> taste. I would suggest calling it "atapi_forced_unit_attention_enabled" Heh heh.. It's actually short for "Force Unit Access", though oddly enough I don't think the patch mentions that in the MODULE_PARM_DESC(). > Here's the cleaner namespace version... David, do you want to ack this one for us? Cheers ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 2:07 ` Jeff Garzik 2006-02-28 2:14 ` Linus Torvalds @ 2006-02-28 10:30 ` Alan Cox 1 sibling, 0 replies; 147+ messages in thread From: Alan Cox @ 2006-02-28 10:30 UTC (permalink / raw) To: Jeff Garzik Cc: Linus Torvalds, Tejun Heo, Mark Lord, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe On Llu, 2006-02-27 at 21:07 -0500, Jeff Garzik wrote: > led, "Enable discovery of ATAPI devices (0=off, 1=on)"); > > +int fua = 0; > +module_param(fua, int, 0444); > +MODULE_PARM_DESC(fua, "FUA support (0=off, 1=on)"); > + Not a good name for a global. ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 1:46 ` Linus Torvalds 2006-02-28 2:07 ` Jeff Garzik @ 2006-02-28 8:03 ` Jens Axboe 1 sibling, 0 replies; 147+ messages in thread From: Jens Axboe @ 2006-02-28 8:03 UTC (permalink / raw) To: Linus Torvalds Cc: Tejun Heo, Mark Lord, Jeff Garzik, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc On Mon, Feb 27 2006, Linus Torvalds wrote: > > > On Tue, 28 Feb 2006, Tejun Heo wrote: > > > Hello, Mark. > > > > Mark Lord wrote: > > > > > > .. hold off on 2.6.16 because of this or not? > > > > > > > It certainly is dangerous. I guess we should turn off FUA for the > > time being. Barrier auto-fallback was once implemented but it > > didn't seem like a good idea as it was too complex and hides low > > level bug from higher level. The consensus seems to be developing > > blacklist of drives which lie about FUA support (currently only one > > drive). Official kernel doesn't seem to be the correct place to grow > > the blacklist, Maybe we should do it from -mm? > > For 2.6.16, the only sane solution for now is to just turn it off. > > Somebody want to send me a patch that does that, along with an ack from > Mark (and whoever else sees this) that it fixes his/their problems? That's the best solution right now. I guess there's no way around a blacklist for FUA support and we need time to grow that :-( And proper fallback to non-FUA writes with disabling FUA based barriers as well. Mark, what drive model+firmware are you using? -- Jens Axboe ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 1:33 ` Tejun Heo 2006-02-28 1:46 ` Linus Torvalds @ 2006-02-28 4:16 ` Mark Lord 2006-02-28 10:32 ` Alan Cox 2006-02-28 10:39 ` David Greaves 1 sibling, 2 replies; 147+ messages in thread From: Mark Lord @ 2006-02-28 4:16 UTC (permalink / raw) To: Tejun Heo, David Greaves Cc: Mark Lord, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds Tejun Heo wrote: .. >> These may be unsafe in general, unless we tag controllers as >> FUA-capable and NON-FUA-capable, in addition to tagging the drives. > > All sii controllers and piix/ahci seem to handle FUA pretty ok. And > yeah, we may have to create controller blacklist too. Or maybe a whitelist instead, since nearly all existing hardware pre-dates FUA commands. Or maybe just have a libata function to test whether the FUA commands actually work or not, before enabling them for general use. *That* could be a much better approach, given the large number of possible drive/controller combos, and it cuts down on the maintenance headache of having to list everything on a list somewhere. > BTW, can you let me know what drive we're talking about now (model name > and firmware revision)? David: we need to see the output from "hdparm --Istdout /dev/sda" (or whichever drive it was that was failing on your system). Cheers ^ permalink raw reply [flat|nested] 147+ messages in thread
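The "test whether the FUA commands actually work before enabling them" idea above could be sketched roughly as follows. This is only a userspace-style illustration under stated assumptions: `struct fua_dev`, `fua_probe_fn`, `fua_decide()` and the two example probes are hypothetical names, not libata API, and the probe callback merely stands in for issuing a real trial FUA write through the controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; not a libata structure. */
struct fua_dev {
	bool claims_fua;   /* IDENTIFY data advertises FUA support */
	bool fua_enabled;  /* outcome of the decision below */
};

/* Hypothetical probe: returns 0 if a trial FUA write completed cleanly. */
typedef int (*fua_probe_fn)(void *ctx);

/*
 * Enable FUA only if the drive claims it AND a trial command succeeds
 * on this particular drive/controller combination. Any failure leaves
 * FUA disabled, so the caller falls back to non-FUA writes.
 */
static bool fua_decide(struct fua_dev *dev, fua_probe_fn probe, void *ctx)
{
	assert(dev != NULL);

	dev->fua_enabled = false;
	if (!dev->claims_fua || probe == NULL)
		return false;
	if (probe(ctx) != 0)
		return false;

	dev->fua_enabled = true;
	return true;
}

/* Example probes, for illustration only. */
static int probe_ok(void *ctx)   { (void)ctx; return 0; }
static int probe_fail(void *ctx) { (void)ctx; return -1; }
```

The point of the design is that neither a drive list nor a controller list has to be maintained: the combination is judged by what it actually does.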
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 4:16 ` Mark Lord @ 2006-02-28 10:32 ` Alan Cox 2006-02-28 10:30 ` Justin Piszcz 2006-02-28 10:39 ` David Greaves 1 sibling, 1 reply; 147+ messages in thread From: Alan Cox @ 2006-02-28 10:32 UTC (permalink / raw) To: Mark Lord Cc: Tejun Heo, David Greaves, Mark Lord, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Llu, 2006-02-27 at 23:16 -0500, Mark Lord wrote: > Or maybe a whitelist instead, since nearly all existing hardware > pre-dates FUA commands. For controllers just add it as a host flag and it can be handled the same way as LBA48 is right now. It may also be some hosts can issue FUA with a bit of bandaging (state machine resets/pio etc) Alan ^ permalink raw reply [flat|nested] 147+ messages in thread
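A minimal sketch of the host-flag approach described above, assuming a per-host flag word checked alongside the drive's own IDENTIFY data. The flag names (`HOST_FLAG_NO_LBA48`, `HOST_FLAG_NO_FUA`) and both structures are hypothetical and chosen only for illustration; they are not the real libata constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits, loosely modelled on expressing an LBA48
 * limitation as a per-host flag; not the real libata constants. */
#define HOST_FLAG_NO_LBA48  (1u << 0)
#define HOST_FLAG_NO_FUA    (1u << 1)

struct host_info { unsigned int flags; };
struct dev_info  { bool id_has_lba48; bool id_has_fua; };

/* FUA is usable only when both the controller and the drive allow it. */
static bool can_use_fua(const struct host_info *host,
			const struct dev_info *dev)
{
	assert(host != NULL && dev != NULL);

	/* The FUA write opcodes are 48-bit (EXT) commands, so a host or
	 * drive without LBA48 cannot use them either. */
	if ((host->flags & HOST_FLAG_NO_LBA48) || !dev->id_has_lba48)
		return false;
	if (host->flags & HOST_FLAG_NO_FUA)
		return false;
	return dev->id_has_fua;
}
```

With this shape, a controller driver that cannot pass the FUA opcodes would set one flag at registration time, and no per-drive table is needed for it.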
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 10:32 ` Alan Cox @ 2006-02-28 10:30 ` Justin Piszcz 0 siblings, 0 replies; 147+ messages in thread From: Justin Piszcz @ 2006-02-28 10:30 UTC (permalink / raw) To: Alan Cox Cc: Mark Lord, Tejun Heo, David Greaves, Mark Lord, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

On Tue, 28 Feb 2006, Alan Cox wrote:

> On Llu, 2006-02-27 at 23:16 -0500, Mark Lord wrote:
>> Or maybe a whitelist instead, since nearly all existing hardware
>> pre-dates FUA commands.
>
> For controllers just add it as a host flag and it can be handled the
> same way as LBA48 is right now. It may also be some hosts can issue FUA
> with a bit of bandaging (state machine resets/pio etc)
>
> Alan
>

While I have not yet been able to reproduce the problem with the verbose
patch, here is the hdparm -I:

/dev/sdc:

ATA device, with non-removable media
    Model Number: WDC WD4000KD-00NAB0
    Serial Number: WD-WMAMY1020930
    Firmware Revision: 01.06A01
Standards:
    Supported: 7 6 5 4
    Likely used: 7
Configuration:
    Logical         max     current
    cylinders       16383   16383
    heads           16      16
    sectors/track   63      63
    --
    CHS current addressable sectors: 16514064
    LBA user addressable sectors: 268435455
    LBA48 user addressable sectors: 781422768
    device size with M = 1024*1024: 381554 MBytes
    device size with M = 1000*1000: 400088 MBytes (400 GB)
Capabilities:
    LBA, IORDY(can be disabled)
    Queue depth: 32
    Standby timer values: spec'd by Standard, with device specific minimum
    R/W multiple sector transfer: Max = 16 Current = 0
    Recommended acoustic management value: 128, current value: 254
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 udma6
         Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4
         Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
    Enabled Supported:
       *    NOP cmd
       *    READ BUFFER cmd
       *    WRITE BUFFER cmd
       *    Host Protected Area feature set
       *    Look-ahead
       *    Write cache
       *    Power Management feature set
            Security Mode feature set
       *    SMART feature set
       *    FLUSH CACHE EXT command
       *    Mandatory FLUSH CACHE command
       *    Device Configuration Overlay feature set
       *    48-bit Address feature set
            Automatic Acoustic Management feature set
            SET MAX security extension
       *    DOWNLOAD MICROCODE cmd
       *    General Purpose Logging feature set
       *    SMART self-test
       *    SMART error logging
Security:
            supported
    not     enabled
    not     locked
    not     frozen
    not     expired: security count
    not     supported: enhanced erase
Checksum: correct

^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 4:16 ` Mark Lord 2006-02-28 10:32 ` Alan Cox @ 2006-02-28 10:39 ` David Greaves 2006-02-28 14:37 ` Mark Lord ` (2 more replies) 1 sibling, 3 replies; 147+ messages in thread From: David Greaves @ 2006-02-28 10:39 UTC (permalink / raw) To: Mark Lord Cc: Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

Mark Lord wrote:

> Tejun Heo wrote:
>
>> BTW, can you let me know what drive we're talking about now (model
>> name and firmware revision)?
>
>
> David: we need to see the output from "hdparm --Istdout /dev/sda"
> (or whichever drive it was that was failing on your system).
>
> Cheers
>

So here's the info for sda and sdb (see below for related log data).

/dev/sda:
 IO_support = 0 (default 16-bit)
 readonly = 0 (off)
 readahead = 256 (on)
 geometry = 24321/255/63, sectors = 390721968, start = 0

0040 3fff c837 0010 0000 0000 003f 0000
0000 0000 4234 3033 3852 5248 2020 2020
2020 2020 2020 2020 0003 4000 0004 4241
4e43 3139 3830 4d61 7874 6f72 2036 4232
3030 4d30 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 8010
0000 2f00 4000 0200 0000 0007 3fff 0010
003f fc10 00fb 0100 ffff 0fff 0000 0007
0003 0078 0078 0078 0078 0000 0000 0000
0000 0000 0000 0000 0002 0000 0000 0000
00fe 001e 7869 7d09 4043 7869 3c01 4043
203f 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 f1b0 1749 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0113 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 d3a5

/dev/sdb:
 IO_support = 0 (default 16-bit)
 readonly = 0 (off)
 readahead = 256 (on)
 geometry = 24792/255/63, sectors = 398297088, start = 0

0040 3fff c837 0010 0000 0000 003f 0000
0000 0000 4234 3152 5641 3148 2020 2020
2020 2020 2020 2020 0003 4000 0004 4241
4e43 3142 5930 4d61 7874 6f72 2036 4232
3030 4d30 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 8010
0000 2f00 4000 0200 0000 0007 3fff 0010
003f fc10 00fb 0100 ffff 0fff 0000 0007
0003 0078 0078 0078 0078 0000 0000 0000
0000 0000 0000 001f 0102 0000 0000 0000
00fe 001e 7c6b 7f09 4063 7c69 3e01 4063
207f 0000 0000 0000 fffe 0000 c0fe 0000
0000 0000 0000 0000 8800 17bd 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0001 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0113 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 d8a5

The info below is from the log I saved.

Booted with 2.6.16-rc4, I got these errors:

sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key: Medium Error
    Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sda, sector 390716735
raid5: Disk failure on sda1, disabling device. Operation continuing on 2 devices
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
sd 1:0:0:0: SCSI error: return code = 0x8000002
sdb: Current: sense key: Medium Error
    Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sdb, sector 390716735
raid5: Disk failure on sdb1, disabling device. Operation continuing on 1 devices

They are both attached to:

libata version 1.20 loaded.
sata_sil 0000:00:0a.0: version 0.9
ACPI: PCI Interrupt 0000:00:0a.0[A] -> GSI 16 (level, low) -> IRQ 17
ata1: SATA max UDMA/100 cmd 0xF8804080 ctl 0xF880408A bmdma 0xF8804000 irq 17
ata2: SATA max UDMA/100 cmd 0xF88040C0 ctl 0xF88040CA bmdma 0xF8804008 irq 17
ata1: SATA link up 1.5 Gbps (SStatus 113)
ata1: dev 0 cfg 49:2f00 82:7869 83:7d09 84:4043 85:7869 86:3c01 87:4043 88:203f
ata1: dev 0 ATA-7, max UDMA/100, 390721968 sectors: LBA48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sil
ata2: SATA link up 1.5 Gbps (SStatus 113)
ata2: dev 0 cfg 49:2f00 82:7c6b 83:7f09 84:4063 85:7c69 86:3e01 87:4063 88:007f
ata2: dev 0 ATA-7, max UDMA/133, 398297088 sectors: LBA48
ata2: dev 0 configured for UDMA/100
scsi1 : sata_sil
  Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC
  Type: Direct-Access ANSI SCSI revision: 05
  Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC
  Type: Direct-Access ANSI SCSI revision: 05

Are there any other tests I should run, like swapping the disks to the
other controller (sata_via) and seeing what happens, with and without
the patch?

David

--

^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 10:39 ` David Greaves @ 2006-02-28 14:37 ` Mark Lord 2006-02-28 21:04 ` Bill Davidsen 2006-02-28 14:38 ` Mark Lord 2006-02-28 15:31 ` Mark Lord 2 siblings, 1 reply; 147+ messages in thread From: Mark Lord @ 2006-02-28 14:37 UTC (permalink / raw) To: David Greaves Cc: Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds David Greaves wrote: > > /dev/sda: .. > 0040 3fff c837 0010 0000 0000 003f 0000 > 0000 0000 4234 3033 3852 5248 2020 2020 > 2020 2020 2020 2020 0003 4000 0004 4241 > 4e43 3139 3830 4d61 7874 6f72 2036 4232 > 3030 4d30 2020 2020 2020 2020 2020 2020 > 2020 2020 2020 2020 2020 2020 2020 8010 > 0000 2f00 4000 0200 0000 0007 3fff 0010 > 003f fc10 00fb 0100 ffff 0fff 0000 0007 > 0003 0078 0078 0078 0078 0000 0000 0000 > 0000 0000 0000 0000 0002 0000 0000 0000 > 00fe 001e 7869 7d09 4043 7869 3c01 4043 > 203f 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 f1b0 1749 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0113 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 d3a5 .. 

hdparm-6.4 says:

    Model Number: Maxtor 6B200M0
    Serial Number: B4038RRH
    Firmware Revision: BANC1980

Commands/features:
    Enabled Supported:
       *    NOP cmd
       *    READ BUFFER cmd
       *    WRITE BUFFER cmd
       *    Look-ahead
       *    Write cache
       *    Power Management feature set
       *    SMART feature set
       *    FLUSH_CACHE_EXT
       *    Mandatory FLUSH_CACHE
       *    Device Configuration Overlay feature set
       *    48-bit Address feature set
            SET_MAX security extension
            Advanced Power Management feature set
       *    DOWNLOAD_MICROCODE
       *    WRITE_{DMA|MULTIPLE}_FUA_EXT
       *    SMART self-test
       *    SMART error logging

So, yes, the drive is either lying about "* WRITE_{DMA|MULTIPLE}_FUA_EXT",
or it didn't like the parameters it was given, or the SATA/IDE controller
chip didn't like the command.

Cheers

^ permalink raw reply [flat|nested] 147+ messages in thread
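For readers following the raw IDENTIFY dump, here is a small sketch of how the two facts above (the model string and the FUA capability bit) can be read out of the 256 words. It assumes the usual ATA layout: the model string occupies words 27-46 with two ASCII characters per word, high byte first, and word 84 bit 6 advertises the FUA write commands when bits 15:14 of that word read 01. The helper names `id_has_fua()` and `id_string()` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Word 84 is only meaningful when bit 15 is 0 and bit 14 is 1;
 * bit 6 then advertises WRITE DMA FUA EXT / WRITE MULTIPLE FUA EXT. */
static int id_has_fua(const uint16_t *id)
{
	assert(id != NULL);

	if ((id[84] & 0xC000) != 0x4000)
		return 0;
	return (id[84] & (1u << 6)) != 0;
}

/* Copy an IDENTIFY string field into out, which must hold
 * 2 * nwords + 1 bytes. Trailing space padding is trimmed. */
static void id_string(const uint16_t *id, size_t first, size_t nwords,
		      char *out)
{
	size_t i, n;

	assert(id != NULL && out != NULL);

	for (i = 0; i < nwords; i++) {
		out[2 * i]     = (char)(id[first + i] >> 8);
		out[2 * i + 1] = (char)(id[first + i] & 0xff);
	}
	out[2 * nwords] = '\0';

	n = strlen(out);
	while (n > 0 && out[n - 1] == ' ')
		out[--n] = '\0';
}
```

Applied to the sda dump, word 84 is 4043 and words 27 onward are 4d61 7874 6f72 2036 4232 3030 4d30, which is where the FUA line and "Maxtor 6B200M0" in the decoded output come from.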
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 14:37 ` Mark Lord @ 2006-02-28 21:04 ` Bill Davidsen 2006-03-08 2:57 ` Mark Lord 0 siblings, 1 reply; 147+ messages in thread From: Bill Davidsen @ 2006-02-28 21:04 UTC (permalink / raw) To: Mark Lord Cc: Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe Mark Lord wrote: > David Greaves wrote: >> >> /dev/sda: [...snip...] > .. > hdparm-6.4 says: Is there a version of that which will build on x86? I grabbed the version offered at freshmeat, but it won't compile on any x86 distro or gcc version to which I have access. RH8, RH9, FC1, FC3, FC4, ubuntu... with or without using the suggested alternate header. > > Model Number: Maxtor 6B200M0 > Serial Number: B4038RRH > Firmware Revision: BANC1980 > > Commands/features: > Enabled Supported: > * NOP cmd > * READ BUFFER cmd > * WRITE BUFFER cmd > * Look-ahead > * Write cache > * Power Management feature set > * SMART feature set > * FLUSH_CACHE_EXT > * Mandatory FLUSH_CACHE > * Device Configuration Overlay feature set > * 48-bit Address feature set > SET_MAX security extension > Advanced Power Management feature set > * DOWNLOAD_MICROCODE > * WRITE_{DMA|MULTIPLE}_FUA_EXT > * SMART self-test > * SMART error logging > > So, yes, the drive is either lying about "* WRITE_{DMA|MULTIPLE}_FUA_EXT", > or it didn't like the parameters it was given, or the SATA/IDE controller > chip didn't like the command. ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 21:04 ` Bill Davidsen @ 2006-03-08 2:57 ` Mark Lord 2006-03-08 3:18 ` Dave Jones 2006-03-08 15:37 ` Bill Davidsen 0 siblings, 2 replies; 147+ messages in thread From: Mark Lord @ 2006-03-08 2:57 UTC (permalink / raw) To: Bill Davidsen Cc: Jeff Garzik, linux-kernel, IDE/ATA development list, axboe, albertcc Bill Davidsen wrote: > > Is there a version of that which will build on x86? I grabbed the > version offered at freshmeat, but it won't compile on any x86 distro or > gcc version to which I have access. RH8, RH9, FC1, FC3, FC4, ubuntu... > with or without using the suggested alternate header. hdparm-6.5 is the current version now. Both it, and 6.4, build/install/run cleanly on Ubuntu-5.10, Debian-Sarge, and SLES9-SP3. You seem to be having trouble on only Redhat distros.. I guess they've done something unfriendly again. Care to be more specific about what Redhat is doing? Cheers ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-08 2:57 ` Mark Lord @ 2006-03-08 3:18 ` Dave Jones 2006-03-08 3:23 ` Mark Lord 2006-03-08 15:37 ` Bill Davidsen 1 sibling, 1 reply; 147+ messages in thread From: Dave Jones @ 2006-03-08 3:18 UTC (permalink / raw) To: Mark Lord Cc: Bill Davidsen, Jeff Garzik, linux-kernel, IDE/ATA development list, axboe, albertcc On Tue, Mar 07, 2006 at 09:57:07PM -0500, Mark Lord wrote: > Bill Davidsen wrote: > > > >Is there a version of that which will build on x86? I grabbed the > >version offered at freshmeat, but it won't compile on any x86 distro or > >gcc version to which I have access. RH8, RH9, FC1, FC3, FC4, ubuntu... > >with or without using the suggested alternate header. > > hdparm-6.5 is the current version now. Both it, and 6.4, > build/install/run cleanly on Ubuntu-5.10, Debian-Sarge, > and SLES9-SP3. > > You seem to be having trouble on only Redhat distros.. > I guess they've done something unfriendly again. > > Care to be more specific about what Redhat is doing? Looks like our userspace includes aren't up to date with some of the kernel changes, so currently they're lacking the ide_task_request_t and related taskfile bits. https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=184349 Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-08 3:18 ` Dave Jones @ 2006-03-08 3:23 ` Mark Lord 0 siblings, 0 replies; 147+ messages in thread From: Mark Lord @ 2006-03-08 3:23 UTC (permalink / raw) To: Dave Jones, Mark Lord, Bill Davidsen, Jeff Garzik, linux-kernel, IDE/ATA development list, axboe, albertcc Dave Jones wrote: > > looks like our userspace includes aren't up to date with some of the kernel > changes, so currently they're lacking the ide_task_request_t and related > taskfile bits. > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=184349 Ahh.. Thanks, Dave. hdparm-6.6 being released *now*, with that stuff #ifdef'd out when the necessary header structs are missing. It builds/runs for me, on RHEL4 at least. Cheers ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-08 2:57 ` Mark Lord 2006-03-08 3:18 ` Dave Jones @ 2006-03-08 15:37 ` Bill Davidsen 1 sibling, 0 replies; 147+ messages in thread From: Bill Davidsen @ 2006-03-08 15:37 UTC (permalink / raw) To: Mark Lord Cc: Bill Davidsen, Jeff Garzik, linux-kernel, IDE/ATA development list, axboe, albertcc On Tue, 7 Mar 2006, Mark Lord wrote: > Bill Davidsen wrote: > > > > Is there a version of that which will build on x86? I grabbed the > > version offered at freshmeat, but it won't compile on any x86 distro or > > gcc version to which I have access. RH8, RH9, FC1, FC3, FC4, ubuntu... > > with or without using the suggested alternate header. > > hdparm-6.5 is the current version now. Both it, and 6.4, > build/install/run cleanly on Ubuntu-5.10, Debian-Sarge, > and SLES9-SP3. > > You seem to be having trouble on only Redhat distros.. > I guess they've done something unfriendly again. > > Care to be more specific about what Redhat is doing? I'll mail you the first few hundred errors from the compiler after I go find 6.5 and try that. My ubuntu tester reported similar results, so I'm not sure what we are doing. -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with little computers since 1979 ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 10:39 ` David Greaves 2006-02-28 14:37 ` Mark Lord @ 2006-02-28 14:38 ` Mark Lord 2006-02-28 15:16 ` Alan Cox 2006-02-28 15:31 ` Mark Lord 2 siblings, 1 reply; 147+ messages in thread From: Mark Lord @ 2006-02-28 14:38 UTC (permalink / raw) To: David Greaves Cc: Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds David Greaves wrote: .. > sd 0:0:0:0: SCSI error: return code = 0x8000002 > sda: Current: sense key: Medium Error > Additional sense: Unrecovered read error - auto reallocate failed > end_request: I/O error, dev sda, sector 390716735 > raid5: Disk failure on sda1, disabling device. Operation continuing on 2 > devices > ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 > ata2: status=0x51 { DriveReady SeekComplete Error } > sd 1:0:0:0: SCSI error: return code = 0x8000002 > sdb: Current: sense key: Medium Error > Additional sense: Unrecovered read error - auto reallocate failed > end_request: I/O error, dev sdb, sector 390716735 > raid5: Disk failure on sdb1, disabling device. Operation continuing on 1 > devices .. The error handling still sucks, regardless of FUA. All of this nonsense about "Medium Error" is pure bogosity here. Cheers ^ permalink raw reply [flat|nested] 147+ messages in thread
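As a reading aid for the `status=0x51 { DriveReady SeekComplete Error }` lines quoted above, here is a minimal sketch of how an ATA status byte breaks down into named bits. The bit positions are the standard ATA status register layout; the function name `ata_status_str()` is invented for this example, and the names for bits that do not appear in the logs above are a guess at the same naming style rather than a copy of any kernel table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Render an ATA status byte as "{ Name Name ... }" into buf. */
static void ata_status_str(uint8_t status, char *buf, size_t len)
{
	static const struct {
		uint8_t bit;
		const char *name;
	} bits[] = {
		{ 0x80, "Busy" },
		{ 0x40, "DriveReady" },
		{ 0x20, "DeviceFault" },
		{ 0x10, "SeekComplete" },
		{ 0x08, "DataRequest" },
		{ 0x04, "CorrectedError" },
		{ 0x02, "Index" },
		{ 0x01, "Error" },
	};
	size_t i, used;

	assert(buf != NULL && len > 2);

	snprintf(buf, len, "{ ");
	for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
		if (!(status & bits[i].bit))
			continue;
		/* snprintf always NUL-terminates, so strlen(buf) < len. */
		used = strlen(buf);
		snprintf(buf + used, len - used, "%s ", bits[i].name);
	}
	used = strlen(buf);
	snprintf(buf + used, len - used, "}");
}
```

So 0x51 is 0x40 | 0x10 | 0x01: the drive is ready, the seek completed, and the Error bit says the command itself was rejected. Nothing in the status byte alone indicates a media problem.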
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 14:38 ` Mark Lord @ 2006-02-28 15:16 ` Alan Cox 2006-03-01 17:33 ` David Greaves 0 siblings, 1 reply; 147+ messages in thread From: Alan Cox @ 2006-02-28 15:16 UTC (permalink / raw) To: Mark Lord Cc: David Greaves, Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Maw, 2006-02-28 at 09:38 -0500, Mark Lord wrote: > > The error handling still sucks, regardless of FUA. > All of this nonsense about "Medium Error" is pure bogosity here. I've flipped my tree to report Aborted Command. Not sure there is a better scsi sense match for "it broke and I dont know why" ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 15:16 ` Alan Cox @ 2006-03-01 17:33 ` David Greaves 2006-03-01 18:37 ` Alan Cox 0 siblings, 1 reply; 147+ messages in thread From: David Greaves @ 2006-03-01 17:33 UTC (permalink / raw) To: Alan Cox Cc: Mark Lord, Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds Alan Cox wrote: >On Maw, 2006-02-28 at 09:38 -0500, Mark Lord wrote: > > >>The error handling still sucks, regardless of FUA. >>All of this nonsense about "Medium Error" is pure bogosity here. >> >> > >I've flipped my tree to report Aborted Command. Not sure there is a >better scsi sense match for "it broke and I dont know why" > > As a user I prefer It Broke And I Dont Know Why to Aborted Command (honesty is the best policy) I certainly hate Medium Error as modern hard disks seem to be flakier than ever. David ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 17:33 ` David Greaves @ 2006-03-01 18:37 ` Alan Cox 2006-03-01 20:12 ` Phillip Susi 0 siblings, 1 reply; 147+ messages in thread From: Alan Cox @ 2006-03-01 18:37 UTC (permalink / raw) To: David Greaves Cc: Mark Lord, Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Mer, 2006-03-01 at 17:33 +0000, David Greaves wrote: > As a user I prefer > It Broke And I Dont Know Why > to > Aborted Command So whats the SCSI sense encoding for that ? ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 18:37 ` Alan Cox @ 2006-03-01 20:12 ` Phillip Susi 2006-03-08 16:46 ` Alan Cox 0 siblings, 1 reply; 147+ messages in thread From: Phillip Susi @ 2006-03-01 20:12 UTC (permalink / raw) To: Alan Cox Cc: David Greaves, Mark Lord, Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Alan Cox wrote: > On Mer, 2006-03-01 at 17:33 +0000, David Greaves wrote: >> As a user I prefer >> It Broke And I Dont Know Why >> to >> Aborted Command > > So whats the SCSI sense encoding for that ? > Wouldn't that just be 0/0/0? IIRC the standard defines that as "NO ADDITIONAL SENSE DATA" which sounds to me like another way of saying "I don't know what went wrong, but that didn't work". ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 20:12 ` Phillip Susi @ 2006-03-08 16:46 ` Alan Cox 0 siblings, 0 replies; 147+ messages in thread From: Alan Cox @ 2006-03-08 16:46 UTC (permalink / raw) To: Phillip Susi Cc: David Greaves, Mark Lord, Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus On Mer, 2006-03-01 at 15:12 -0500, Phillip Susi wrote: > >> It Broke And I Dont Know Why > >> to > >> Aborted Command > > > > So whats the SCSI sense encoding for that ? > > > > Wouldn't that just be 0/0/0? IIRC the standard defines that as "NO > ADDITIONAL SENSE DATA" which sounds to me like another way of saying "I > don't know what went wrong, but that didn't work". The 0/0/0 sense is already used. The question is what error do you use with that sense. At the moment I'm using aborted command. ^ permalink raw reply [flat|nested] 147+ messages in thread
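The sense-encoding question above can be made concrete with a small sketch. It is not the actual libata translation table: `struct sense` and `ata_to_sense_sketch()` are hypothetical, and the policy shown (report Medium Error only when the drive sets the uncorrectable-data bit, otherwise fall back to Aborted Command with ASC/ASCQ 00/00) is just one reading of the approach described in this subthread. The numeric values are standard: sense key 0x03 is MEDIUM ERROR, 0x0b is ABORTED COMMAND, and ASC/ASCQ 0x11/0x04 is "Unrecovered read error - auto reallocate failed".

```c
#include <assert.h>
#include <stdint.h>

/* ATA status/error register bits used below. */
#define ATA_STAT_ERR   0x01  /* status: command ended in error */
#define ATA_ERR_ABRT   0x04  /* error: command aborted */
#define ATA_ERR_UNC    0x40  /* error: uncorrectable data error */

/* SCSI sense keys. */
#define SK_NO_SENSE         0x00
#define SK_MEDIUM_ERROR     0x03
#define SK_ABORTED_COMMAND  0x0b

struct sense {
	uint8_t sk, asc, ascq;
};

/* Hypothetical translation; see the lead-in for the assumptions. */
static struct sense ata_to_sense_sketch(uint8_t status, uint8_t error)
{
	struct sense s = { SK_NO_SENSE, 0x00, 0x00 };

	if (!(status & ATA_STAT_ERR))
		return s;               /* no error reported at all */

	if (error & ATA_ERR_UNC) {
		/* The drive itself says the data could not be read. */
		s.sk = SK_MEDIUM_ERROR;
		s.asc = 0x11;
		s.ascq = 0x04;
		return s;
	}

	/* ABRT, or anything we cannot classify: "it broke and I
	 * don't know why" becomes Aborted Command, 00/00. */
	s.sk = SK_ABORTED_COMMAND;
	return s;
}
```

With this policy the stat/err pair 0x51/0x04 from the original report maps to 0xb/00/00, which matches the "translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00" line at the start of the thread, and a rejected FUA write would no longer be reported to md as a medium error.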
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 10:39 ` David Greaves 2006-02-28 14:37 ` Mark Lord 2006-02-28 14:38 ` Mark Lord @ 2006-02-28 15:31 ` Mark Lord 2006-02-28 15:34 ` Jeff Garzik 2 siblings, 1 reply; 147+ messages in thread From: Mark Lord @ 2006-02-28 15:31 UTC (permalink / raw) To: David Greaves Cc: Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds David Greaves wrote: > > scsi1 : sata_sil > Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC > Type: Direct-Access ANSI SCSI revision: 05 > Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC > Type: Direct-Access ANSI SCSI revision: 05 I wonder if the non-FUA component here is the sata_sil, rather than the two Maxtor drives. Also, your drives have different firmware, but both have trouble with FUA here. (sdb is slightly newer, and larger, than sda). Cheers ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 15:31 ` Mark Lord @ 2006-02-28 15:34 ` Jeff Garzik 2006-02-28 16:57 ` Eric D. Mudama 2006-03-01 17:41 ` David Greaves 0 siblings, 2 replies; 147+ messages in thread From: Jeff Garzik @ 2006-02-28 15:34 UTC (permalink / raw) To: Mark Lord Cc: David Greaves, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds Mark Lord wrote: > David Greaves wrote: > >> >> scsi1 : sata_sil >> Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC >> Type: Direct-Access ANSI SCSI revision: 05 >> Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC >> Type: Direct-Access ANSI SCSI revision: 05 > > > I wonder if the non-FUA component here is the sata_sil, > rather than the two Maxtor drives. > > Also, your drives have different firmware, > but both have trouble with FUA here. sata_sil is indeed a piece of hardware that needs to know the opcodes ahead of time... Jeff ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 15:34 ` Jeff Garzik @ 2006-02-28 16:57 ` Eric D. Mudama 2006-03-01 1:04 ` Mark Lord 2006-03-01 17:41 ` David Greaves 1 sibling, 1 reply; 147+ messages in thread From: Eric D. Mudama @ 2006-02-28 16:57 UTC (permalink / raw) To: Jeff Garzik Cc: Mark Lord, David Greaves, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds those drives should support all FUA opcodes properly, both queued and unqueued On 2/28/06, Jeff Garzik <jgarzik@pobox.com> wrote: > Mark Lord wrote: > > David Greaves wrote: > > > >> > >> scsi1 : sata_sil > >> Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC > >> Type: Direct-Access ANSI SCSI revision: 05 > >> Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC > >> Type: Direct-Access ANSI SCSI revision: 05 > > > > > > I wonder if the non-FUA component here is the sata_sil, > > rather than the two Maxtor drives. > > > > Also, your drives have different firmware, > > but both have trouble with FUA here. > > sata_sil is indeed a piece of hardware that needs to know the opcodes > ahead of time... > > Jeff ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 16:57 ` Eric D. Mudama @ 2006-03-01 1:04 ` Mark Lord 2006-03-01 11:37 ` Justin Piszcz 2006-03-01 13:17 ` Justin Piszcz 0 siblings, 2 replies; 147+ messages in thread From: Mark Lord @ 2006-03-01 1:04 UTC (permalink / raw) To: Eric D. Mudama Cc: Jeff Garzik, David Greaves, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds Eric D. Mudama wrote: > those drives should support all FUA opcodes properly, both queued and unqueued His first drive (sda) does not support queued commands at all, but the newer firmware in his second drive (sdb) does support NCQ. Both drives support FUA. cheers ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 1:04 ` Mark Lord @ 2006-03-01 11:37 ` Justin Piszcz 2006-03-01 13:17 ` Justin Piszcz 1 sibling, 0 replies; 147+ messages in thread From: Justin Piszcz @ 2006-03-01 11:37 UTC (permalink / raw) To: Mark Lord Cc: Eric D. Mudama, Jeff Garzik, David Greaves, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Tue, 28 Feb 2006, Mark Lord wrote: > Eric D. Mudama wrote: >> those drives should support all FUA opcodes properly, both queued and >> unqueued > > His first drive (sda) does not support queued commands at all, > but the newer firmware in his second drive (sdb) does support NCQ. > > Both drives support FUA. > > cheers > To trust or not to trust? I have a 400GB SATA drive: WDC WD4000KD-00N. With these errors in dmesg that have been mentioned throughout the thread, should I trust Linux using this drive, or should I remove it/wait until a patch is released to address this issue? Also, in the forums (storagereview.com I believe), it has been noted that these drives do NOT work on the Intel ICH5 controller, and this turned out to be true, when I put it on the Intel ICH5, the box stalls for 2-3 minutes and then it does not see the drive. However, on the Silicon Image, Inc. SiI 3112 chipset or Promise SATA/150 TX2 it works okay but it has those errors in dmesg. My question is, performing long and short smart tests, everything is physically ok with the drive; however, I probably should not use this drive for anything important in Linux, comments? Justin. ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 1:04 ` Mark Lord 2006-03-01 11:37 ` Justin Piszcz @ 2006-03-01 13:17 ` Justin Piszcz 1 sibling, 0 replies; 147+ messages in thread From: Justin Piszcz @ 2006-03-01 13:17 UTC (permalink / raw) To: Mark Lord Cc: Eric D. Mudama, Jeff Garzik, David Greaves, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

On Tue, 28 Feb 2006, Mark Lord wrote:

> Eric D. Mudama wrote:
>> those drives should support all FUA opcodes properly, both queued and
>> unqueued
>
> His first drive (sda) does not support queued commands at all,
> but the newer firmware in his second drive (sdb) does support NCQ.
>
> Both drives support FUA.
>
> cheers
>

Could someone *PLEASE* produce a *unified* patch that is compatible with
2.6.16-rc5 or 2.6.15.4 so I can reproduce the error?

Mark had two patches, I have had the most PIA time getting them to work,
patch properly, etc..

With 2.6.16-rc5:

# make bzImage
  CHK include/linux/version.h
scripts/kconfig/conf -s arch/i386/Kconfig
#
# using defaults found in .config
#
  SPLIT include/linux/autoconf.h -> include/config/*
  CHK include/linux/compile.h
  CHK usr/initramfs_list
  GEN .version
  CHK include/linux/compile.h
  UPD include/linux/compile.h
  CC init/version.o
  LD init/built-in.o
  LD .tmp_vmlinux1
drivers/built-in.o: In function `ata_to_sense_error':
undefined reference to `print'
drivers/built-in.o: In function `ata_to_sense_error':
undefined reference to `print'
make: *** [.tmp_vmlinux1] Error 1
Command exited with non-zero status 2

^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 15:34 ` Jeff Garzik 2006-02-28 16:57 ` Eric D. Mudama @ 2006-03-01 17:41 ` David Greaves 2006-03-01 17:46 ` Mark Lord 1 sibling, 1 reply; 147+ messages in thread From: David Greaves @ 2006-03-01 17:41 UTC (permalink / raw) To: Jeff Garzik Cc: Mark Lord, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds Jeff Garzik wrote: > Mark Lord wrote: > >> David Greaves wrote: >> >>> >>> scsi1 : sata_sil >>> Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC >>> Type: Direct-Access ANSI SCSI revision: 05 >>> Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC >>> Type: Direct-Access ANSI SCSI revision: 05 >> >> >> >> I wonder if the non-FUA component here is the sata_sil, >> rather than the two Maxtor drives. >> >> Also, your drives have different firmware, >> but both have trouble with FUA here. > > > sata_sil is indeed a piece of hardware that needs to know the opcodes > ahead of time... > > Jeff > I actually have 3 of those drives - one runs through sata_via and doesn't have the same problem. (the sata_via ones *do* have : ata3: status=0x50 { DriveReady SeekComplete } ata3: PIO error problems with SMART) David ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 17:41 ` David Greaves @ 2006-03-01 17:46 ` Mark Lord 2006-03-01 18:12 ` David Greaves 0 siblings, 1 reply; 147+ messages in thread From: Mark Lord @ 2006-03-01 17:46 UTC (permalink / raw) To: David Greaves Cc: Jeff Garzik, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds David Greaves wrote: > > I actually have 3 of those drives - one runs through sata_via and > doesn't have the same problem. > > (the sata_via ones *do* have : > ata3: status=0x50 { DriveReady SeekComplete } > ata3: PIO error > problems with SMART) And once again, not enough information in the error messages for anyone to actually do anything about it (not David's fault). What command do you use to get that bug to pop up? BTW: hdparm-6.5 is now available (sourceforge), and should show all of the fancy features of your drives for comparison between versions. Cheers ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 17:46 ` Mark Lord @ 2006-03-01 18:12 ` David Greaves 2006-03-01 18:30 ` Mark Lord 0 siblings, 1 reply; 147+ messages in thread From: David Greaves @ 2006-03-01 18:12 UTC (permalink / raw) To: Mark Lord Cc: Jeff Garzik, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds Mark Lord wrote: > David Greaves wrote: > >> >> I actually have 3 of those drives - one runs through sata_via and >> doesn't have the same problem. >> >> (the sata_via ones *do* have : >> ata3: status=0x50 { DriveReady SeekComplete } >> ata3: PIO error >> problems with SMART) > > > And once again, not enough information in the error messages > for anyone to actually do anything about it (not David's fault). > > What command do you use to get that bug to pop up? (FYI I'm running 2.6.15 with both 'info' patches 'cos I'm scared of 2.6.16-rc4!) haze:/usr/src# smartctl -data -s on /dev/sdc smartctl version 5.34 [i686-pc-linux-gnu] Copyright (C) 2002-5 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF ENABLE/DISABLE COMMANDS SECTION === SMART Enabled. No messages in dmesg haze:/usr/src# smartctl -data -o on /dev/sdc smartctl version 5.34 [i686-pc-linux-gnu] Copyright (C) 2002-5 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF ENABLE/DISABLE COMMANDS SECTION === Error SMART Enable Automatic Offline failed: Input/output error Smartctl: SMART Enable Automatic Offline Failed. 
dmesg contains this message repeated 31 times: ata3: PIO error ata3: status=0x50 { DriveReady SeekComplete } haze:/usr/src# smartctl -data -o off /dev/sdc succeeds but gives me: ata3: status=0x50 { DriveReady SeekComplete } ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 ata3: status=0x51 { DriveReady SeekComplete Error } ata3: error=0x04 { DriveStatusError } ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 ata3: status=0x51 { DriveReady SeekComplete Error } ata3: error=0x04 { DriveStatusError } ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 ata3: status=0x51 { DriveReady SeekComplete Error } ata3: error=0x04 { DriveStatusError } haze:/usr/src# smartctl -data -o on /dev/sdd smartctl version 5.34 [i686-pc-linux-gnu] Copyright (C) 2002-5 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF ENABLE/DISABLE COMMANDS SECTION === Error SMART Enable Automatic Offline failed: Input/output error Smartctl: SMART Enable Automatic Offline Failed. 
ata4: PIO error ata4: status=0x50 { DriveReady SeekComplete } # smartctl -data -o off /dev/sdd ata4: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 ata4: status=0x51 { DriveReady SeekComplete Error } ata4: error=0x04 { DriveStatusError } ata4: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 ata4: status=0x51 { DriveReady SeekComplete Error } ata4: error=0x04 { DriveStatusError } ata4: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 ata4: status=0x51 { DriveReady SeekComplete Error } ata4: error=0x04 { DriveStatusError } haze:/usr/src# hdparm --Istdout /dev/sdc /dev/sdc: IO_support = 0 (default 16-bit) readonly = 0 (off) readahead = 256 (on) geometry = 19457/255/63, sectors = 312581808, start = 0 0c5a 3fff c837 0010 0000 0000 003f 0000 0000 0000 334a 5332 4b53 4c33 2020 2020 2020 2020 2020 2020 0000 4000 0004 332e 3138 2020 2020 5354 3331 3630 3032 3341 5320 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 8010 0000 2f00 0000 0200 0200 0007 3fff 0010 003f fc10 00fb 0110 ffff 0fff 0000 0007 0003 0078 0078 00f0 0078 0000 0000 0000 0000 0000 0000 0000 0002 0000 0000 0000 007e 001b 346b 7d01 4003 3468 3c01 4003 407f 0000 0000 fefe 0000 0000 fe00 0000 0000 0000 0000 0000 9eb0 12a1 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 9eb0 12a1 9eb0 12a1 2020 0002 42b6 8000 008a 3c06 3c0a ffff 07c6 0100 0800 0ff0 1000 0002 0030 0000 0000 0000 fe06 0000 0002 0050 008a 954f 0000 0023 000b 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 7ea5 haze:/usr/src# hdparm --Istdout /dev/sdd /dev/sdd: IO_support = 0 (default 16-bit) readonly = 0 (off) readahead = 256 (on) geometry = 24792/255/63, sectors = 398297088, start = 0 0040 3fff c837 0010 0000 0000 003f 0000 0000 0000 4234 3152 5643 3248 2020 2020 2020 2020 2020 2020 0003 4000 0004 4241 4e43 3142 5930 4d61 7874 6f72 2036 4232 3030 4d30 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 8010 0000 2f00 4000 0200 0000 0007 3fff 0010 003f fc10 00fb 0110 ffff 0fff 0000 0007 0003 0078 0078 0078 0078 0000 0000 0000 0000 0000 0000 001f 0102 0000 0000 0000 00fe 001e 7c6b 7f09 4063 7c68 3e01 4063 407f 0000 0000 0000 fffe 0000 c0fe 0000 0000 0000 0000 0000 8800 17bd 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0113 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 a6a5 David > > BTW: > hdparm-6.5 is now available (sourceforge), > and should show all of the fancy features > of your drives for comparism between versions. OK - soonish... ^ permalink raw reply [flat|nested] 147+ messages in thread
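The hex words printed by hdparm --Istdout above are the drive's raw IDENTIFY DEVICE data. As a minimal sketch (not part of hdparm; the helper names parse_words and ata_string are made up for illustration), the model string can be recovered by hand, assuming the usual ATA layout in which words 27-46 hold the model number, two ASCII characters per word with the first character in the high byte:

```python
def parse_words(text):
    """Turn whitespace-separated 4-digit hex words into a list of ints."""
    return [int(token, 16) for token in text.split()]


def ata_string(words):
    """Decode an ATA string field: two ASCII chars per word, high byte first."""
    raw = b"".join(word.to_bytes(2, "big") for word in words)
    return raw.decode("ascii", errors="replace").strip()


# Leading words of the model field (word 27 onwards), copied from the
# /dev/sdc and /dev/sdd dumps above; the remaining words are 0x2020 padding.
SDC_MODEL_WORDS = parse_words("5354 3331 3630 3032 3341 5320")
SDD_MODEL_WORDS = parse_words("4d61 7874 6f72 2036 4232 3030 4d30 2020")

print("sdc model:", ata_string(SDC_MODEL_WORDS))
print("sdd model:", ata_string(SDD_MODEL_WORDS))
```

Given a complete 256-word dump in a list called words, the same helper would be applied as ata_string(words[27:47]). The sdd result should agree with the "Maxtor 6B200M0" model string that the kernel printed earlier in the thread.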
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 18:12 ` David Greaves @ 2006-03-01 18:30 ` Mark Lord 2006-03-01 18:32 ` Justin Piszcz ` (3 more replies) 0 siblings, 4 replies; 147+ messages in thread From: Mark Lord @ 2006-03-01 18:30 UTC (permalink / raw) To: David Greaves Cc: Jeff Garzik, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds David Greaves wrote: > > haze:/usr/src# smartctl -data -o off /dev/sdc > succeeds but gives me: > > ata3: status=0x50 { DriveReady SeekComplete } > ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 > ata3: status=0x51 { DriveReady SeekComplete Error } > ata3: error=0x04 { DriveStatusError } > ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 > ata3: status=0x51 { DriveReady SeekComplete Error } > ata3: error=0x04 { DriveStatusError } > ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 > ata3: status=0x51 { DriveReady SeekComplete Error } > ata3: error=0x04 { DriveStatusError } "DriveStatusError" is "Command Aborted" in ac-speak. From the man page for smartctl, we read: >-o VALUE Enables or disables SMART automatic offline test ... >Note that the SMART automatic offline test command is listed as "Obsolete" in every >version of the ATA and ATA/ATAPI Specifications. It was originally part of the >SFF-8035i Revision 2.0 specification, but was never part of any ATA specification. There's a chance that your drives simply do not fully support this feature, and are rejecting attempts to use it. By the way, the latest 2.6.16-rc5-git4 is available, and has FUA turned off by default now. So it should work with your drives, and *you* are expected to verify that for us all now. Cheers -ml ^ permalink raw reply [flat|nested] 147+ messages in thread
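The status/error pairs in these logs can be decoded bit by bit. The following is only a sketch, not the kernel's code: it assumes the standard ATA status register bit positions, uses the labels that appear in the logs above (DriveReady, SeekComplete, Error, DriveStatusError), and fills in the remaining status labels (Busy, DeviceFault, DataRequest, CorrectedError, Index) as assumptions. Error register bits other than bit 2 (ABRT, "Command Aborted") are deliberately left unnamed.

```python
# Status register bits, highest first. Only DriveReady, SeekComplete and
# Error are confirmed by the logs in this thread; the rest are assumed labels.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Error register: bit 2 is ABRT, which the logs print as DriveStatusError.
ERROR_BITS = [(0x04, "DriveStatusError")]


def decode_bits(value, table):
    """Return the labels of all set bits; unnamed bits are shown as hex masks."""
    names = [name for mask, name in table if value & mask]
    known = 0
    for mask, _ in table:
        known |= mask
    for bit in range(7, -1, -1):
        mask = 1 << bit
        if value & mask and not known & mask:
            names.append(f"bit 0x{mask:02x}")
    return names


def decode_status(status):
    return decode_bits(status, STATUS_BITS)


def decode_error(error):
    return decode_bits(error, ERROR_BITS)


print("status=0x51", decode_status(0x51))
print("status=0x50", decode_status(0x50))
print("error=0x04 ", decode_error(0x04))
```

Read this way, 0x51/04 says the drive is ready, is not busy, and has aborted the command it was given, which fits the suggestion that the drive is simply rejecting a command it does not support.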
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 18:30 ` Mark Lord @ 2006-03-01 18:32 ` Justin Piszcz 2006-03-01 18:33 ` Justin Piszcz ` (2 subsequent siblings) 3 siblings, 0 replies; 147+ messages in thread From: Justin Piszcz @ 2006-03-01 18:32 UTC (permalink / raw) To: Mark Lord Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Wed, 1 Mar 2006, Mark Lord wrote: > David Greaves wrote: >> >> haze:/usr/src# smartctl -data -o off /dev/sdc >> succeeds but gives me: >> >> ata3: status=0x50 { DriveReady SeekComplete } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } > > "DriveStatusError" is "Command Aborted" in ac-speak. > From the man page for smartctl, we read: > >> -o VALUE Enables or disables SMART automatic offline test ... >> Note that the SMART automatic offline test command is listed as "Obsolete" > in every >> version of the ATA and ATA/ATAPI Specifications. It was originally part > of the >> SFF-8035i Revision 2.0 specification, but was never part of any ATA > specification. > > There's a chance that your drives simply do not fully support this feature, > and are rejecting attempts to use it. > > By the way, the latest 2.6.16-rc5-git4 is available, > and has FUA turned off by default now. So it should > work with your drives, and *you* are expected to verify > that for us all now. 
> > Cheers > > -ml > When running that command, I get it too: [4294684.510000] ACPI: PCI Interrupt 0000:02:06.0[A] -> GSI 22 (level, low) -> IRQ 17 [4294686.762000] process `syslogd' is using obsolete setsockopt SO_BSDCOMPAT [4295292.736000] +++PATCH: Original kernel error: [4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4295292.736000] +++PATCH: Mark Lord's extended verbosity patch: [4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error } [4295292.736000] ata3: error=0x04 { DriveStatusError } [4295292.736000] +++PATCH: Original kernel error: [4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4295292.736000] +++PATCH: Mark Lord's extended verbosity patch: [4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error } [4295292.736000] ata3: error=0x04 { DriveStatusError } [4295292.736000] +++PATCH: Original kernel error: [4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4295292.736000] +++PATCH: Mark Lord's extended verbosity patch: [4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error } [4295292.736000] ata3: error=0x04 { DriveStatusError } [4295292.736000] +++PATCH: Original kernel error: [4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4295292.736000] +++PATCH: Mark Lord's extended verbosity patch: [4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error } [4295292.736000] ata3: error=0x04 { DriveStatusError } [4295292.736000] +++PATCH: 
Original kernel error: [4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4295292.736000] +++PATCH: Mark Lord's extended verbosity patch: [4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error } [4295292.736000] ata3: error=0x04 { DriveStatusError } ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 18:30 ` Mark Lord 2006-03-01 18:32 ` Justin Piszcz @ 2006-03-01 18:33 ` Justin Piszcz 2006-03-01 18:48 ` David Greaves 2006-03-01 19:06 ` LibPATA code issues / 2.6.15.4 Justin Piszcz 3 siblings, 0 replies; 147+ messages in thread From: Justin Piszcz @ 2006-03-01 18:33 UTC (permalink / raw) To: Mark Lord Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Wed, 1 Mar 2006, Mark Lord wrote: > David Greaves wrote: >> >> haze:/usr/src# smartctl -data -o off /dev/sdc >> succeeds but gives me: >> >> ata3: status=0x50 { DriveReady SeekComplete } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } > > "DriveStatusError" is "Command Aborted" in ac-speak. > From the man page for smartctl, we read: > >> -o VALUE Enables or disables SMART automatic offline test ... >> Note that the SMART automatic offline test command is listed as "Obsolete" > in every >> version of the ATA and ATA/ATAPI Specifications. It was originally part > of the >> SFF-8035i Revision 2.0 specification, but was never part of any ATA > specification. > > There's a chance that your drives simply do not fully support this feature, > and are rejecting attempts to use it. > > By the way, the latest 2.6.16-rc5-git4 is available, > and has FUA turned off by default now. So it should > work with your drives, and *you* are expected to verify > that for us all now. 
> > Cheers > > -ml > Mark, After patching to 2.6.16-rc5-git4, we should no longer see these errors right? Then I can use my drive again without worrying about data loss? :) Justin. ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 18:30 ` Mark Lord 2006-03-01 18:32 ` Justin Piszcz 2006-03-01 18:33 ` Justin Piszcz @ 2006-03-01 18:48 ` David Greaves 2006-03-01 19:49 ` David Greaves 2006-03-01 19:06 ` LibPATA code issues / 2.6.15.4 Justin Piszcz 3 siblings, 1 reply; 147+ messages in thread From: David Greaves @ 2006-03-01 18:48 UTC (permalink / raw) To: Mark Lord Cc: Jeff Garzik, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds Mark Lord wrote: > By the way, the latest 2.6.16-rc5-git4 is available, > and has FUA turned off by default now. So it should > work with your drives, and *you* are expected to verify > that for us all now. Yeah, I know - I've got it on the machine... but it's my wife's machine. I've asked nicely but she's editing a Hercule Poirot video so I'm not allowed to reboot it for a while... I've told her I'm not making pancakes until I've tested it so expect a report Real Soon Now... David ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 18:48 ` David Greaves @ 2006-03-01 19:49 ` David Greaves 2006-03-03 19:38 ` Justin Piszcz 2006-03-05 11:43 ` Justin Piszcz 0 siblings, 2 replies; 147+ messages in thread From: David Greaves @ 2006-03-01 19:49 UTC (permalink / raw) To: David Greaves Cc: Mark Lord, Jeff Garzik, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds David Greaves wrote: >Mark Lord wrote: > > > >>By the way, the latest 2.6.16-rc5-git4 is available, >>and has FUA turned off by default now. So it should >>work with your drives, and *you* are expected to verify >>that for us all now. >> >> >Yeah, I know - I've got it on the machine... but it's my wife's machine. >I've asked nicely but she's editing a Hercule Poirot video so I'm not >allowed to reboot it for a while... > >I've told her I'm not making pancakes until I've tested it so expect a >report Real Soon Now... > > OK that worked (the pancakes - the kernel's not doing so well...) haze:~# uname -a Linux haze 2.6.16-rc5-git4 #2 PREEMPT Wed Mar 1 19:07:58 UTC 2006 i686 GNU/Linux The boot is pretty clean. I ran an xfs_repair -n on the lvm volume and got the following errors. The repair reported a clean filesystem and the drive was not booted from the raid so that's a big improvement. 
I was not able to trigger similar messages on ata1 but a simple dd doesn't trigger the messages on ata2 either (and for various reasons, xfs_repair wouldn't run on ata1 - I thought I'd leave it and report this first) ata2: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 ata2: status=0x51 { DriveReady SeekComplete Error } ata2: error=0x04 { DriveStatusError } ata2: no sense translation for status: 0x51 ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 ata2: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for status: 0x51 ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 ata2: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for status: 0x51 ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 ata2: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for status: 0x51 ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 ata2: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for status: 0x51 ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 ata2: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for status: 0x51 ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 ata2: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for status: 0x51 ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 ata2: status=0x51 { DriveReady SeekComplete Error } David -- ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 19:49 ` David Greaves @ 2006-03-03 19:38 ` Justin Piszcz 2006-03-03 22:46 ` David Greaves 2006-03-05 11:43 ` Justin Piszcz 1 sibling, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-03-03 19:38 UTC (permalink / raw) To: David Greaves Cc: Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Wed, 1 Mar 2006, David Greaves wrote: > David Greaves wrote: > >> Mark Lord wrote: >> >> >> >>> By the way, the latest 2.6.16-rc5-git4 is available, >>> and has FUA turned off by default now. So it should >>> work with your drives, and *you* are expected to verify >>> that for us all now. >>> >>> >> Yeah, I know - I've got it on the machine... but it's my wife's machine. >> I've asked nicely but she's editing a Hercule Poirot video so I'm not >> allowed to reboot it for a while... >> >> I've told her I'm not making pancakes until I've tested it so expect a >> report Real Soon Now... >> >> > OK that worked (the pancakes - the kernel's not doing so well...) > > haze:~# uname -a > Linux haze 2.6.16-rc5-git4 #2 PREEMPT Wed Mar 1 19:07:58 UTC 2006 i686 > GNU/Linux > > The boot is pretty clean. > I ran an xfs_repair -n on the lvm volume and got the following errors. > The repair reported a clean filesystem and the drive was not booted from > the raid so that's a big improvement. 
> > I was not able to trigger similar messages on ata1 but a simple dd > doesn't trigger the messages on ata2 either (and for various reasons, > xfs_repair wouldn't run on ata1 - I thought I'd leave it and report this > first) > > ata2: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: error=0x04 { DriveStatusError } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > > David > > -- > As of 2.6.16-rc5-git4, I have written 281GB so far over a period of 48+ hours with no errors yet :) Will keep you updated if I see any errors, but so far, so good! Thanks, Justin. ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-03 19:38 ` Justin Piszcz @ 2006-03-03 22:46 ` David Greaves 2006-03-04 14:25 ` Mark Lord 0 siblings, 1 reply; 147+ messages in thread From: David Greaves @ 2006-03-03 22:46 UTC (permalink / raw) To: Justin Piszcz Cc: Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds Just FYI - I'm away (in Canada) for 2 weeks so can't do any additional testing until I return. David -- ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-03 22:46 ` David Greaves @ 2006-03-04 14:25 ` Mark Lord 2006-03-06 6:13 ` David Greaves 0 siblings, 1 reply; 147+ messages in thread From: Mark Lord @ 2006-03-04 14:25 UTC (permalink / raw) To: David Greaves Cc: Justin Piszcz, Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds David Greaves wrote: > Just FYI - I'm away (in Canada) for 2 weeks so can't do any additional > testing until I return. Am I correct, in that your last test on rc5-git4 was a failure? But without the "opcode" display in the error messages, so we have no idea exactly what caused the errors (again!)? [Whatcha doin up here?] Cheers ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-04 14:25 ` Mark Lord @ 2006-03-06 6:13 ` David Greaves 2006-03-21 18:11 ` David Greaves 0 siblings, 1 reply; 147+ messages in thread From: David Greaves @ 2006-03-06 6:13 UTC (permalink / raw) To: Mark Lord Cc: Justin Piszcz, Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds Mark Lord wrote: > David Greaves wrote: >> Just FYI - I'm away (in Canada) for 2 weeks so can't do any additional >> testing until I return. > > Am I correct, in that your last test on rc5-git4 was a failure? It was *much* better than rc4 but it did have an error. I *think* the problem I'm seeing is likely to be similar to the one I originally reported (on 2.6.15 IIRC): the same sporadic warning/error, which didn't usually trigger the raid-boot-the-disk behaviour that the FUA code seemed to. > But without the "opcode" display in the error messages, > so we have no idea exactly what caused the errors (again!)? Yes. I thought the/a opcode-verbose patch was in there but I guess not. I don't have remote console access to the machine so wouldn't be able to carry out reliable kernel tests - sorry. Of course I'll do this as soon as I return. > > [Whatcha doin up here?] [:) 2 weeks skiing in Whistler (this time - 10 days Canadian canoeing in Algonquin last time!) Canada's great !!] David ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-06 6:13 ` David Greaves @ 2006-03-21 18:11 ` David Greaves 2006-03-22 15:23 ` David Greaves 0 siblings, 1 reply; 147+ messages in thread From: David Greaves @ 2006-03-21 18:11 UTC (permalink / raw) To: Mark Lord Cc: Justin Piszcz, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds David Greaves wrote: > Mark Lord wrote: > >> David Greaves wrote: >> >>> Just FYI - I'm away (in Canada) for 2 weeks so can't do any additional >>> testing until I return. >> >> >> Am I correct, in that your last test on rc5-git4 was a failure? > > It was *much* better than rc4 but it did have an error. > I *think* the problem I'm seeing is likely to be similar to the one I > orginally reported (on 2.6.15 IIRC) > Same sporadic warning/error which didn't usually trigger the > raid-boot-the-disk behaviour that the FUA code seemed to. > >> But without the "opcode" display in the error messages, >> so we have no idea exactly what caused the errors (again!)? > > Yes. I thought the/a opcode-verbose patch was in there but I guess not. > I don't have remote console access to the machine so wouldn't be able > to carry out reliable kernel tests - sorry. > Of course I'll do this as soon as I return. Hi Back now :) I've upgraded to 2.6.16 and applied your verbosity patches. I've persuaded my array to re-assemble and during the resync I got these messages dmesg: ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x04 { DriveStatusError } ...(18mins later) ata1: no sense translation for op=0x28 cmd=0x25 status: 0x51 ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 ata1: status=0x51 { DriveReady SeekComplete Error } smartd is not running This did not cause the raid subsystem to boot the disk (thank goodness!) David ^ permalink raw reply [flat|nested] 147+ messages in thread
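With the verbosity patches applied, each message carries both the SCSI opcode (op=) and the ATA command it was translated to (cmd=). A small lookup, sketched here under the assumption that the log format stays exactly as shown in this thread, turns those numbers into names. The tables cover only the values that appear in the thread (0x28 and 0x85 as SCSI opcodes; 0x25, 0x35 and 0xb0 as ATA commands), and the function name describe_op_cmd is made up for illustration.

```python
import re

# Only the opcodes seen in this thread; anything else is reported as unknown.
SCSI_OPCODES = {
    0x28: "READ(10)",
    0x85: "ATA PASS-THROUGH(16)",
}

ATA_COMMANDS = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0xB0: "SMART",
}

OP_CMD_RE = re.compile(r"op=0x([0-9a-fA-F]+)\s+cmd=0x([0-9a-fA-F]+)")


def describe_op_cmd(line):
    """Return (scsi_name, ata_name) for a patched libata log line, or None."""
    match = OP_CMD_RE.search(line)
    if match is None:
        return None
    op, cmd = (int(group, 16) for group in match.groups())
    return (
        SCSI_OPCODES.get(op, f"unknown SCSI opcode 0x{op:02x}"),
        ATA_COMMANDS.get(cmd, f"unknown ATA command 0x{cmd:02x}"),
    )


resync_line = ("ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/04 "
               "to SCSI SK/ASC/ASCQ 0xb/00/00")
smart_line = ("ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 "
              "to SCSI SK/ASC/ASCQ 0xb/00/00")

print(describe_op_cmd(resync_line))
print(describe_op_cmd(smart_line))
```

On this reading the resync errors come from ordinary reads, while the earlier smartctl errors came from a SMART command sent through the pass-through path.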
* Re: LibPATA code issues / 2.6.15.4 2006-03-21 18:11 ` David Greaves @ 2006-03-22 15:23 ` David Greaves 0 siblings, 0 replies; 147+ messages in thread From: David Greaves @ 2006-03-22 15:23 UTC (permalink / raw) To: Mark Lord Cc: Justin Piszcz, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds David Greaves wrote: >I've upgraded to 2.6.16 and applied your verbosity patches. > >I've persuaded my array to re-assemble and during the resync I got these >messages > >dmesg: >ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/04 to SCSI >SK/ASC/ASCQ 0xb/00/00 >ata1: status=0x51 { DriveReady SeekComplete Error } >ata1: error=0x04 { DriveStatusError } >...(18mins later) >ata1: no sense translation for op=0x28 cmd=0x25 status: 0x51 >ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/00 to SCSI >SK/ASC/ASCQ 0x3/11/04 >ata1: status=0x51 { DriveReady SeekComplete Error } > >smartd is not running >This did not cause the raid subsystem to boot the disk (thank goodness!) > > Just providing a little more follow-on information... I have had a further 52 of these messages over the last day. No obvious cause. 
Mar 22 13:14:55 haze kernel: ata2: no sense translation for op=0x28 cmd=0x25 status: 0x51 Mar 22 13:14:55 haze kernel: ata2: status=0x51 { DriveReady SeekComplete Error } Most recently this happened: Mar 22 13:47:09 haze kernel: ata2: no sense translation for op=0x28 cmd=0x25 status: 0x51 Mar 22 13:47:09 haze kernel: ata2: status=0x51 { DriveReady SeekComplete Error } Mar 22 13:47:09 haze kernel: sd 1:0:0:0: SCSI error: return code = 0x8000002 Mar 22 13:47:09 haze kernel: sdb: Current: sense key: Medium Error Mar 22 13:47:09 haze kernel: Additional sense: Unrecovered read error - auto reallocate failed Mar 22 13:47:09 haze kernel: end_request: I/O error, dev sdb, sector 396518289 with dmesg piping up with: raid1: sdb2: rescheduling sector 5801424 raid1: sdd2: redirecting sector 5801424 to another mirror no drives were kicked from the array. David -- ^ permalink raw reply [flat|nested] 147+ messages in thread
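The "Medium Error" and "Unrecovered read error - auto reallocate failed" text above is the SCSI layer's rendering of the SK/ASC/ASCQ triple that libata reported (0x3/11/04). As a rough sketch, assuming the "SK/ASC/ASCQ 0x.../../.." format seen throughout this thread, the triples can be decoded as follows. The tables hold only the two combinations that occur in the thread, and the name decode_sense is hypothetical.

```python
import re

# Sense keys and additional sense codes limited to those seen in the thread.
SENSE_KEYS = {
    0x3: "Medium Error",
    0xB: "Aborted Command",
}

ADDITIONAL_SENSE = {
    (0x00, 0x00): "No additional sense information",
    (0x11, 0x04): "Unrecovered read error - auto reallocate failed",
}

TRIPLE_RE = re.compile(
    r"SK/ASC/ASCQ 0x([0-9a-fA-F]+)/([0-9a-fA-F]{2})/([0-9a-fA-F]{2})"
)


def decode_sense(line):
    """Return (sense_key_text, additional_sense_text) for a log line, or None."""
    match = TRIPLE_RE.search(line)
    if match is None:
        return None
    sk, asc, ascq = (int(group, 16) for group in match.groups())
    return (
        SENSE_KEYS.get(sk, f"sense key 0x{sk:x}"),
        ADDITIONAL_SENSE.get((asc, ascq), f"ASC/ASCQ 0x{asc:02x}/0x{ascq:02x}"),
    )


aborted = "ata2: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00"
medium = "ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04"

print(decode_sense(aborted))
print(decode_sense(medium))
```

The two cases differ in the ATA error register: 0x51/04 carries the abort bit and maps to Aborted Command, while 0x51/00 has no error bits set, which is where the "no sense translation" message and the Medium Error report come from.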
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 19:49 ` David Greaves 2006-03-03 19:38 ` Justin Piszcz @ 2006-03-05 11:43 ` Justin Piszcz 2006-03-05 12:41 ` Justin Piszcz 1 sibling, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-03-05 11:43 UTC (permalink / raw) To: David Greaves Cc: Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Wed, 1 Mar 2006, David Greaves wrote: > David Greaves wrote: > >> Mark Lord wrote: >> >> >> >>> By the way, the latest 2.6.16-rc5-git4 is available, >>> and has FUA turned off by default now. So it should >>> work with your drives, and *you* are expected to verify >>> that for us all now. >>> >>> >> Yeah, I know - I've got it on the machine... but it's my wife's machine. >> I've asked nicely but she's editing a Hercule Poirot video so I'm not >> allowed to reboot it for a while... >> >> I've told her I'm not making pancakes until I've tested it so expect a >> report Real Soon Now... >> >> > OK that worked (the pancakes - the kernel's not doing so well...) > > haze:~# uname -a > Linux haze 2.6.16-rc5-git4 #2 PREEMPT Wed Mar 1 19:07:58 UTC 2006 i686 > GNU/Linux > > The boot is pretty clean. > I ran an xfs_repair -n on the lvm volume and got the following errors. > The repair reported a clean filesystem and the drive was not booted from > the raid so that's a big improvement. 
> > I was not able to trigger similar messages on ata1 but a simple dd > doesn't trigger the messages on ata2 either (and for various reasons, > xfs_repair wouldn't run on ata1 - I thought I'd leave it and report this > first) > > ata2: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: error=0x04 { DriveStatusError } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > > David > > -- > Using 2.6.16-rc5-git4 and removing a directory of around 5.0GB of files while streaming a 1MB/s video stream on another (SATA disk), the I/O seemed to freeze up for a moment and I got this error: [4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22 Only 1 in dmesg, any idea what causes this error? ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-05 11:43 ` Justin Piszcz @ 2006-03-05 12:41 ` Justin Piszcz 2006-03-05 22:58 ` Mark Lord 0 siblings, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-03-05 12:41 UTC (permalink / raw) To: David Greaves Cc: Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Sun, 5 Mar 2006, Justin Piszcz wrote: > On Wed, 1 Mar 2006, David Greaves wrote: > >> David Greaves wrote: >> >>> Mark Lord wrote: >>> >>> >>> >>>> By the way, the latest 2.6.16-rc5-git4 is available, >>>> and has FUA turned off by default now. So it should >>>> work with your drives, and *you* are expected to verify >>>> that for us all now. >>>> >>>> >>> Yeah, I know - I've got it on the machine... but it's my wife's machine. >>> I've asked nicely but she's editing a Hercule Poirot video so I'm not >>> allowed to reboot it for a while... >>> >>> I've told her I'm not making pancakes until I've tested it so expect a >>> report Real Soon Now... >>> >>> >> OK that worked (the pancakes - the kernel's not doing so well...) >> >> haze:~# uname -a >> Linux haze 2.6.16-rc5-git4 #2 PREEMPT Wed Mar 1 19:07:58 UTC 2006 i686 >> GNU/Linux >> >> The boot is pretty clean. >> I ran an xfs_repair -n on the lvm volume and got the following errors. >> The repair reported a clean filesystem and the drive was not booted from >> the raid so that's a big improvement. 
>> >> I was not able to trigger similar messages on ata1 but a simple dd >> doesn't trigger the messages on ata2 either (and for various reasons, >> xfs_repair wouldn't run on ata1 - I thought I'd leave it and report this >> first) >> >> ata2: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: error=0x04 { DriveStatusError } >> ata2: no sense translation for status: 0x51 >> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for status: 0x51 >> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for status: 0x51 >> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for status: 0x51 >> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for status: 0x51 >> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for status: 0x51 >> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for status: 0x51 >> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> >> David >> >> -- >> > > Using 2.6.16-rc5-git4 and removing a directory of around 5.0GB of files while > streaming a 1MB/s video stream on another (SATA disk), the I/O seemed to > freeze up for a moment and I got this error: > > [4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22 > > Only 1 in dmesg, any idea what causes this error? 
>
>

The drive it occurred on was a 74GB raptor on an ICH5 controller.

[4294673.245000] Vendor: ATA Model: WDC WD740GD-00FL Rev: 33.0
0000:00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)

* Re: LibPATA code issues / 2.6.15.4
From: Mark Lord @ 2006-03-05 22:58 UTC
To: Justin Piszcz
Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

Justin Piszcz wrote:
>
>> Using 2.6.16-rc5-git4 and removing a directory of around 5.0GB of
>> files while streaming a 1MB/s video stream on another (SATA disk), the
>> I/O seemed to freeze up for a moment and I got this error:
>>
>> [4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22
>>
>> Only 1 in dmesg, any idea what causes this error?
>
> The drive it occurred on was a 74GB raptor on an ICH5 controller.
>
> [4294673.245000] Vendor: ATA Model: WDC WD740GD-00FL Rev: 33.0
> 0000:00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA
> Controller (rev 02)

SCSI opcode 0x35 is SYNCHRONIZE_CACHE.

Pity we don't know exactly what that got translated to by libata.
It would have been either a FLUSH_CACHE of some kind,
or possibly(?) one of the _FUA_ commands.

Cheers

* Re: LibPATA code issues / 2.6.15.4
From: Mark Lord @ 2006-03-05 23:00 UTC
To: Justin Piszcz
Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

Mark Lord wrote:
> Justin Piszcz wrote:
>>
>>> Using 2.6.16-rc5-git4 and removing a directory of around 5.0GB of
>>> files while streaming a 1MB/s video stream on another (SATA disk),
>>> the I/O seemed to freeze up for a moment and I got this error:
>>>
>>> [4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22
>>>
>>> Only 1 in dmesg, any idea what causes this error?
>>
>> The drive it occurred on was a 74GB raptor on an ICH5 controller.
>>
>> [4294673.245000] Vendor: ATA Model: WDC WD740GD-00FL Rev: 33.0
>> 0000:00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA
>> Controller (rev 02)
>
> SCSI opcode 0x35 is SYNCHRONIZE_CACHE.

Oh, wait a sec.. on that path, libata actually does show the ATA opcode,
which would have been WRITE_DMA_EXT. Not an FUA command.

Dunno what it's complaining about, though.

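The mix-up in the two messages above arises because the same byte, 0x35, names different commands in the SCSI and ATA command sets. The following is a minimal C sketch of that distinction, not libata code: the table layout and the function names (scsi_opcode_name, ata_opcode_name) are assumptions made for illustration, and only a few opcodes seen or discussed in this thread are listed.

```c
#include <assert.h>
#include <stddef.h>

/* One opcode byte and a human-readable name for it. */
struct opcode_name {
    unsigned char opcode;
    const char *name;
};

/* A few SCSI opcodes (the "op=" values in the logs of this thread). */
static const struct opcode_name scsi_opcodes[] = {
    { 0x28, "READ(10)" },
    { 0x2a, "WRITE(10)" },
    { 0x35, "SYNCHRONIZE CACHE(10)" },
    { 0x85, "ATA PASS-THROUGH(16)" },
};

/* A few ATA opcodes (the "command 0x.. timeout" values in the logs). */
static const struct opcode_name ata_opcodes[] = {
    { 0x25, "READ DMA EXT" },
    { 0x35, "WRITE DMA EXT" },
    { 0x3d, "WRITE DMA FUA EXT" },
    { 0xe7, "FLUSH CACHE" },
    { 0xea, "FLUSH CACHE EXT" },
};

/* Linear search; returns "unknown" for opcodes not in the table. */
static const char *lookup(const struct opcode_name *table, size_t count,
                          unsigned char opcode)
{
    assert(table != NULL);
    for (size_t i = 0; i < count; i++) {
        if (table[i].opcode == opcode)
            return table[i].name;
    }
    return "unknown";
}

/* Hypothetical helper: name of a SCSI opcode byte. */
const char *scsi_opcode_name(unsigned char opcode)
{
    return lookup(scsi_opcodes, sizeof scsi_opcodes / sizeof scsi_opcodes[0],
                  opcode);
}

/* Hypothetical helper: name of an ATA opcode byte. */
const char *ata_opcode_name(unsigned char opcode)
{
    return lookup(ata_opcodes, sizeof ata_opcodes / sizeof ata_opcodes[0],
                  opcode);
}
```

With these tables, scsi_opcode_name(0x35) and ata_opcode_name(0x35) give different answers, which is why it matters whether a log line prints the SCSI opcode or the translated ATA one.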
* Re: LibPATA code issues / 2.6.15.4
From: Justin Piszcz @ 2006-03-05 23:19 UTC
To: Mark Lord
Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

On Sun, 5 Mar 2006, Mark Lord wrote:

> Mark Lord wrote:
>> Justin Piszcz wrote:
>>>
>>>> Using 2.6.16-rc5-git4 and removing a directory of around 5.0GB of files
>>>> while streaming a 1MB/s video stream on another (SATA disk), the I/O
>>>> seemed to freeze up for a moment and I got this error:
>>>>
>>>> [4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22
>>>>
>>>> Only 1 in dmesg, any idea what causes this error?
>>>
>>> The drive it occurred on was a 74GB raptor on an ICH5 controller.
>>>
>>> [4294673.245000] Vendor: ATA Model: WDC WD740GD-00FL Rev: 33.0
>>> 0000:00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA
>>> Controller (rev 02)
>>
>> SCSI opcode 0x35 is SYNCHRONIZE_CACHE.
>
> Oh, wait a sec.. on that path, libata actually does show the ATA opcode,
> which would have been WRITE_DMA_EXT. Not an FUA command.
>
> Dunno what it's complaining about, though.
>

Well, I know what it was now... The hard drive (Raptor 74GB) failed...

[4294685.928000] process `syslogd' is using obsolete setsockopt SO_BSDCOMPAT [4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22 [4347012.243000] ata1: command 0x25 timeout, stat 0x50 host_stat 0x20 [4347157.486000] ata1: command 0x25 timeout, stat 0x80 host_stat 0x22 [4347157.486000] ata1: translated ATA stat/err 0x80/00 to SCSI SK/ASC/ASCQ 0xb/4 7/00 [4347157.486000] ata1: status=0x80 { Busy } [4347157.486000] sd 0:0:0:0: SCSI error: return code = 0x8000002 [4347157.486000] sda: Current: sense key=0xb [4347157.486000] ASC=0x47 ASCQ=0x0 [4347157.486000] end_request: I/O error, dev sda, sector 27646928 [4347157.486000] Buffer I/O error on device sda, logical block 3455866 [4347157.486000] ATA: abnormal status 0x80 on port 0xC007 [4347157.486000] ATA: abnormal status 0x80 on port 0xC007 [4347157.486000] ATA: abnormal status 0x80 on port 0xC007 [4347187.486000] ata1: command 0x25 timeout, stat 0x50 host_stat 0x21 [4347407.657000] ATA: abnormal status 0x80 on port 0xC007 [4347407.657000] ATA: abnormal status 0x80 on port 0xC007 [4347407.657000] ATA: abnormal status 0x80 on port 0xC007 [4347437.656000] ata1: command 0x35 timeout, stat 0x80 host_stat 0x21 [4347437.656000] ata1: translated ATA stat/err 0x80/00 to SCSI SK/ASC/ASCQ 0xb/4 7/00 [4347437.656000] ata1: status=0x80 { Busy } [4347437.656000] sd 0:0:0:0: SCSI error: return code = 0x8000002 [4347437.656000] sda: Current: sense key=0xb [4347437.656000] ASC=0x47 ASCQ=0x0 [4347437.656000] end_request: I/O error, dev sda, sector 76339746 [4347437.656000] ATA: abnormal status 0x80 on port 0xC007 [4347437.656000] ATA: abnormal status 0x80 on port 0xC007 [4347437.656000] ATA: abnormal status 0x80 on port 0xC007 [4347467.656000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x21 [4347467.656000] Device sda2 - XFS write error in file system meta-data block 0x 449af90 in sda2 [4347467.656000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x21 [4347467.656000] Device sda2 - XFS write error in file 
system meta-data block 0x 449af90 in sda2 [4347497.656000] ata1: command 0x25 timeout, stat 0x50 host_stat 0x21 [4347527.663000] ata1: command 0x25 timeout, stat 0x50 host_stat 0x22 [4347527.663000] Unable to handle kernel paging request at virtual address 858f9 a70 [4347527.663000] printing eip: [4347527.663000] c021ff87 [4347527.663000] *pde = 00000000 [4347527.663000] Oops: 0000 [#1] [4347527.663000] PREEMPT SMP [4347527.663000] CPU: 0 [4347527.663000] EIP: 0060:[<c021ff87>] Not tainted VLI [4347527.663000] EFLAGS: 00210282 (2.6.16-rc5-git4 #3) [4347527.663000] EIP is at xfs_dir2_block_lookup_int+0xb0/0x1e9 [4347527.663000] eax: 9b86a560 ebx: 00000000 ecx: cdc352b0 edx: 00000000 [4347527.663000] esi: 177504f0 edi: 5e5cb7f4 ebp: 00000000 esp: f6c8bd18 [4347527.663000] ds: 007b es: 007b ss: 0068 [4347527.663000] Process nfsd (pid: 1359, threadinfo=f6c8a000 task=f7c14030) [4347527.663000] Stack: <0>00000000 c91fa944 00000000 021a0480 00000000 f6c8bd64 00000000 f6c8bd84 [4347527.663000] f6c8bd88 f6c8bdac c73e7438 f6f916c0 00000004 f7dbc800 00 000000 f3aa2000 [4347527.663000] 61a5869b c91fa9ac f7db9380 c73e7438 00000000 c91fa944 f6 c8bdac 00000000 [4347527.663000] Call Trace: [4347527.663000] [<c02200da>] xfs_dir2_block_lookup+0x1a/0xa1 [4347527.663000] [<c021f721>] xfs_dir2_lookup+0xd3/0x151 [4347527.663000] [<c035e9d3>] ip_output+0x171/0x2de [4347527.663000] [<c035e1c9>] ip_finish_output+0x0/0x22d [4347527.663000] [<c024e836>] xfs_dir_lookup_int+0x40/0x125 [4347527.663000] [<c0150b0d>] cache_alloc_refill+0xf1/0x50c [4347527.663000] [<c0252b39>] xfs_lookup+0x5f/0x88 [4347527.663000] [<c02613cc>] linvfs_lookup+0x52/0x99 [4347527.663000] [<c0161563>] __lookup_hash+0xc4/0xf3 [4347527.663000] [<c016160f>] lookup_one_len+0x7d/0x84 [4347527.663000] [<c01ad6c7>] nfsd_lookup+0xc0/0x4b2 [4347527.663000] [<c01b4bcd>] nfsd3_proc_lookup+0xa5/0xf3 [4347527.663000] [<c01a9497>] nfsd_dispatch+0x9c/0x214 [4347527.663000] [<c039fb21>] svc_process+0x3bf/0x69e [4347527.663000] 
[<c01a97bc>] nfsd+0x1ad/0x331 [4347527.663000] [<c01a960f>] nfsd+0x0/0x331 [4347527.663000] [<c0100e95>] kernel_thread_helper+0x5/0xb [4347527.663000] Code: 89 44 24 40 89 c2 0f ca 8d 04 d5 00 00 00 00 29 c6 8d 42 ff 8b 4c 24 24 8b 79 14 31 d2 eb 07 8d 51 01 39 c2 7f 17 8d 0c 02 d1 f9 <8b> 1c ce 0f cb 39 df 74 2a 77 e9 8d 41 ff 39 c2 7e e9 8b 74 24 [4347527.663000] [4347527.663000] <4>ATA: abnormal status 0x80 on port 0xC007 [4347567.674000] ATA: abnormal status 0x80 on port 0xC007 [4347567.674000] ATA: abnormal status 0x80 on port 0xC007 [4347597.674000] ata1: command 0x35 timeout, stat 0x80 host_stat 0x21 [4347597.674000] ata1: translated ATA stat/err 0x80/00 to SCSI SK/ASC/ASCQ 0xb/4 7/00 [4347597.674000] ata1: status=0x80 { Busy } [4347597.674000] sd 0:0:0:0: SCSI error: return code = 0x8000002 [4347597.674000] sda: Current: sense key=0xb [4347597.674000] ASC=0x47 ASCQ=0x0 [4347597.674000] end_request: I/O error, dev sda, sector 4401810 [4347597.674000] ATA: abnormal status 0x80 on port 0xC007 [4347597.674000] ATA: abnormal status 0x80 on port 0xC007 [4347597.674000] ATA: abnormal status 0x80 on port 0xC007 [4347627.674000] ata1: command 0x35 timeout, stat 0x80 host_stat 0x21 [4347627.674000] ata1: translated ATA stat/err 0x80/00 to SCSI SK/ASC/ASCQ 0xb/4 7/00 [4347627.674000] ata1: status=0x80 { Busy } [4347627.674000] sd 0:0:0:0: SCSI error: return code = 0x8000002 [4347627.674000] sda: Current: sense key=0xb [4347627.674000] ASC=0x47 ASCQ=0x0 [4347627.674000] end_request: I/O error, dev sda, sector 110074018 [4347627.674000] ATA: abnormal status 0x80 on port 0xC007 [4347627.674000] ATA: abnormal status 0x80 on port 0xC007 [4347627.674000] ATA: abnormal status 0x80 on port 0xC007 .. 
ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } SCSI error : <0 0 0 0> return code = 0x8000002 sda: Current: sense key=0x3 ASC=0x11 ASCQ=0x4 end_request: I/O error, dev sda, sector 66006018 Buffer I/O error on device sda2, logical block 61604208 ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } SCSI error : <0 0 0 0> return code = 0x8000002 sda: Current: sense key=0x3 ASC=0x11 ASCQ=0x4 end_request: I/O error, dev sda, sector 66006019 Buffer I/O error on device sda2, logical block 61604209 ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } SCSI error : <0 0 0 0> return code = 0x8000002 sda: Current: sense key=0x3 ASC=0x11 ASCQ=0x4 end_request: I/O error, dev sda, sector 66006020 Buffer I/O error on device sda2, logical block 61604210 ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } SCSI error : <0 0 0 0> return code = 0x8000002 sda: Current: sense key=0x3 ASC=0x11 ASCQ=0x4 end_request: I/O error, dev sda, sector 66006021 Buffer I/O error on device sda2, logical block 61604211 ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } SCSI error : <0 0 0 0> return code = 0x8000002 sda: Current: sense key=0x3 ASC=0x11 ASCQ=0x4 end_request: I/O error, dev sda, sector 66006018 Buffer I/O error on device sda2, logical block 61604208 ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } SCSI error : <0 0 0 0> return code = 0x8000002 sda: Current: sense key=0x3 ASC=0x11 
ASCQ=0x4 end_request: I/O error, dev sda, sector 66006019 .. I later ran mkfs.ext2 -c /dev/sda and it kept returning errors such as these: ata3: status=0x51 { DriveReady SeekComplete Error } ata3: error=0x40 { UncorrectableError } ata3: status=0x51 { DriveReady SeekComplete Error } ata3: error=0x40 { UncorrectableError } ata3: status=0x51 { DriveReady SeekComplete Error } ata3: error=0x40 { UncorrectableError } ata3: status=0x51 { DriveReady SeekComplete Error } ata3: error=0x40 { UncorrectableError } ata3: status=0x51 { DriveReady SeekComplete Error } ata3: error=0x40 { UncorrectableError } SCSI error : <2 0 0 0> return code = 0x8000002 sda: Current: sense key=0x3 ASC=0x11 ASCQ=0x4 end_request: I/O error, dev sda, sector 66006016 I ran WD's tool on the drive, it confirmed it had problems. Luckily I have a spare raptor and restored from backup and I am now back up and running with no errors yet. Justin. ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
From: Jeff Garzik @ 2006-03-05 23:39 UTC
To: Mark Lord
Cc: Justin Piszcz, David Greaves, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

Mark Lord wrote:
> SCSI opcode 0x35 is SYNCHRONIZE_CACHE.
>
> Pity we don't know exactly what that got translated to by libata.

Gave up on reading code? If not, we know exactly what it was translated into.

	Jeff

* LibPATA code issues / 2.6.16 (previously, 2.6.15.x)
From: Justin Piszcz @ 2006-04-21 19:14 UTC
To: Jeff Garzik
Cc: Mark Lord, David Greaves, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds, smartmontools-support

Yet a new problem, under 2.6.16, when I fill up the disk, smartmontools
reports this:

Apr 21 14:24:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Offline uncorrectable sectors

What made it error under 2.6.16?

$ time dd if=/dev/zero of=file.out
dd: writing to `file.out': No space left on device
781118873+0 records in
781118872+0 records out
399932862464 bytes (400 GB) copied, 8873.06 seconds, 45.1 MB/s

real 147m53.092s
user 8m1.395s
sys 42m4.500s

$

Under 2.6.15.x, I did not see this behavior, is this going bad, or?

Thanks,

Justin.

* Re: LibPATA code issues / 2.6.16 (previously, 2.6.15.x)
From: Jeff Garzik @ 2006-04-21 19:18 UTC
To: Justin Piszcz
Cc: Mark Lord, David Greaves, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds, smartmontools-support

Justin Piszcz wrote:
> Yet a new problem, under 2.6.16, when I fill up the disk, smartmontools
> reports this:
>
> Apr 21 14:24:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable
> (pending) sectors
> Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Currently unreadable
> (pending) sectors
> Apr 21 14:54:20 p34 smartd[1443]: Device: /dev/sdc, 1 Offline uncorrectable
> sectors
>
> What made it error under 2.6.16?
>
> $ time dd if=/dev/zero of=file.out
> dd: writing to `file.out': No space left on device
> 781118873+0 records in
> 781118872+0 records out
> 399932862464 bytes (400 GB) copied, 8873.06 seconds, 45.1 MB/s
>
> real 147m53.092s
> user 8m1.395s
> sys 42m4.500s
>
> $
>
> Under 2.6.15.x, I did not see this behavior, is this going bad, or?

That's a disk-level problem. You've got bad sectors.

You can force the disk to replace the bad sectors by doing a disk-level write:

	dd if=/dev/zero of=/dev/sda1 bs=4k

and then test the disk with

	smartctl -d ata -t long /dev/sda

If sectors continue to die, the disk is toast.

	Jeff

* Re: LibPATA code issues / 2.6.16 (previously, 2.6.15.x)
From: Linus Torvalds @ 2006-04-21 19:28 UTC
To: Jeff Garzik
Cc: Justin Piszcz, Mark Lord, David Greaves, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, smartmontools-support

On Fri, 21 Apr 2006, Jeff Garzik wrote:
>
> You can force the disk to replace the bad sectors by doing a disk-level write:
>
> dd if=/dev/zero of=/dev/sda1 bs=4k

NOTE! Obviously don't do this before you've backed up the disk. Depending
on the filesystem, you might just have overwritten something important, or
just your pr0n collection ;)

Jeff, please be a little more careful about telling people commands like
that. Some people might cut-and-paste the command without realizing what
it's doing as a way to "fix" their problem.

		Linus

* Re: LibPATA code issues / 2.6.16 (previously, 2.6.15.x)
From: Jeff Garzik @ 2006-04-21 22:46 UTC
To: Linus Torvalds
Cc: Justin Piszcz, Mark Lord, David Greaves, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, smartmontools-support

Linus Torvalds wrote:
>
> On Fri, 21 Apr 2006, Jeff Garzik wrote:
>> You can force the disk to replace the bad sectors by doing a disk-level write:
>>
>> dd if=/dev/zero of=/dev/sda1 bs=4k
>
> NOTE! Obviously don't do this before you've backed up the disk. Depending
> on the filesystem, you might just have overwritten something important, or
> just your pr0n collection ;)
>
> Jeff, please be a little more careful about telling people commands like
> that. Some people might cut-and-paste the command without realizing what
> it's doing as a way to "fix" their problem.

Agreed, though the original poster had already done a 400GB dd from
/dev/zero...

	Jeff

* Re: LibPATA code issues / 2.6.16 (previously, 2.6.15.x)
From: Linus Torvalds @ 2006-04-22 0:05 UTC
To: Jeff Garzik
Cc: Justin Piszcz, Mark Lord, David Greaves, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, smartmontools-support

On Fri, 21 Apr 2006, Jeff Garzik wrote:
>
> Agreed, though the original poster had already done a 400GB dd from
> /dev/zero...

Yes, but to a _file_ on the partition (ie he didn't overwrite any existing
data, just the empty parts of a filesystem).

I realize that it's not enough for the "re-allocate on write" behaviour,
and for that you really _do_ need to re-write the whole disk to get all
the broken blocks reallocated, but my argument was just that we should
make sure to _tell_ people when they are overwriting all their old data ;)

		Linus

* Re: [smartmontools-support]Re: LibPATA code issues / 2.6.16 (previously, 2.6.15.x)
From: Leon Woestenberg @ 2006-05-06 15:09 UTC
To: Linus Torvalds, smartmontools-support
Cc: Jeff Garzik, Justin Piszcz, Mark Lord, David Greaves, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe

Hi all,

On Fri, 2006-04-21 at 17:05 -0700, Linus Torvalds wrote:
>
> On Fri, 21 Apr 2006, Jeff Garzik wrote:
> >
> > Agreed, though the original poster had already done a 400GB dd from
> > /dev/zero...
>
> Yes, but to a _file_ on the partition (ie he didn't overwrite any existing
> data, just the empty parts of a filesystem).
>
> I realize that it's not enough for the "re-allocate on write" behaviour,
> and for that you really _do_ need to re-write the whole disk to get all
> the broken blocks reallocated, but my argument was just that we should
> make sure to _tell_ people when they are overwriting all their old data ;)
>

I did not realize this before, and asked badblocks maintainer Theodore
if badblocks /some/file was supported (the man page says no); but of
course any filesystem can decide to re-allocate blocks for a file.

However, for large files where parts may be bad sectors, I am still
searching for a way to read, then re-write every physical sector
occupied by the file.

The purpose is to remap the bad sectors inside large MPEG files (where
I would rather have a few zeroed holes than a read error in them).

Does anyone know whether such tooling exists? I suspect it has to use
filesystem-specific ioctls to query for the blocks involved.

Regards,

Leon

* Re: [smartmontools-support]Re: LibPATA code issues / 2.6.16 (previously, 2.6.15.x)
From: Ingo Oeser @ 2006-05-07 12:44 UTC
To: Leon Woestenberg
Cc: Linus Torvalds, smartmontools-support, Jeff Garzik, Justin Piszcz, Mark Lord, David Greaves, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe

On Saturday, 6. May 2006 17:09, Leon Woestenberg wrote:
> However, for large files where parts may be bad sectors, I am still
> searching for a way to read, then re-write every physical sector
> occupied by the file.
>
> With the purpose to remap the bad sectors inside large MPEG files (where
> I would rather have a few zeroed holes than a read error in them).

This is much easier to solve in the player software:

	do {
		ret = read(fd, buffer, size);
		if (ret > 0) {
			playbuffer(buffer, ret);
		} else if (ret < 0) {
			switch (errno) {
			case EIO:
				playbuffer(allzeroesbuffer, size);
				/* skip over this frame because of disk problems */
				lseek(fd, size, SEEK_CUR);
				/* TODO: Handle return of lseek() here */
				break;
			default:
				/* any other error: stop instead of retrying forever */
				ret = 0;
				break;
			}
		}
	} while (ret != 0);

> Anyone know such tooling exists? I suspect it has to use filesystem
> specific IOCTL's to query for the blocks involved.

The (somewhat) portable ioctl() FIBMAP would suffice.

That way you find out which blocks this file is mapped to,
and could add some of these blocks to the badblock list of e2fsck.

Regards

Ingo Oeser

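As a rough illustration of the FIBMAP suggestion above, here is a minimal C sketch that lists which filesystem block each logical block of a file maps to. It assumes Linux, where FIBMAP and FIGETBSZ are declared in <linux/fs.h> and FIBMAP normally requires root privileges; the function names blocks_for_size and print_block_map are made up for this example, and feeding the result to e2fsck is left out entirely.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FIBMAP, FIGETBSZ */
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Number of filesystem blocks needed to hold `size` bytes (rounds up). */
long blocks_for_size(long long size, int blksz)
{
    assert(size >= 0 && blksz > 0);
    return (long)((size + blksz - 1) / blksz);
}

/*
 * Print "logical -> physical" for every block of the file at `path`.
 * A physical block of 0 means the logical block is not mapped (a hole).
 * Returns 0 on success, -1 on any error.
 */
int print_block_map(const char *path)
{
    struct stat st;
    int blksz = 0;
    long nblocks;
    int rc = -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (fstat(fd, &st) < 0) {
        perror("fstat");
        goto out;
    }
    if (ioctl(fd, FIGETBSZ, &blksz) < 0 || blksz <= 0) {
        perror("FIGETBSZ");
        goto out;
    }

    nblocks = blocks_for_size(st.st_size, blksz);
    for (long i = 0; i < nblocks; i++) {
        /* FIBMAP takes the logical block number in and returns the
         * physical one in the same int, so very large files overflow. */
        int blk = (int)i;
        if (ioctl(fd, FIBMAP, &blk) < 0) {
            perror("FIBMAP");
            goto out;
        }
        printf("%ld -> %d\n", i, blk);
    }
    rc = 0;
out:
    close(fd);
    return rc;
}
```

Calling print_block_map("/path/to/file.mpg") as root would print one line per block; the block numbers are in units of the filesystem block size reported by FIGETBSZ, relative to the start of the filesystem's block device.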
* Re: LibPATA code issues / 2.6.16 (previously, 2.6.15.x)
From: Justin Piszcz @ 2006-06-11 11:13 UTC
To: Linus Torvalds
Cc: Jeff Garzik, Mark Lord, David Greaves, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, smartmontools-support

[4597362.011000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4597362.011000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4597362.011000] ata3: error=0x04 { DriveStatusError }

Now under 2.6.16.20 (I was doing an rsync from one IDE drive to this SATA
drive).

The SATA drive AFAIK does not have any issues, no bad sectors/etc, still
the same drive as before, but this is the new one from the previous RMA.

Just FYI.

On Fri, 21 Apr 2006, Linus Torvalds wrote:

>
>
> On Fri, 21 Apr 2006, Jeff Garzik wrote:
>
>>
>> Agreed, though the original poster had already done a 400GB dd from
>> /dev/zero...
>
> Yes, but to a _file_ on the partition (ie he didn't overwrite any existing
> data, just the empty parts of a filesystem).
>
> I realize that it's not enough for the "re-allocate on write" behaviour,
> and for that you really _do_ need to re-write the whole disk to get all
> the broken blocks reallocated, but my argument was just that we should
> make sure to _tell_ people when they are overwriting all their old data ;)
>
> Linus
>

* Re: LibPATA code issues / 2.6.15.4 2006-03-01 18:30 ` Mark Lord ` (2 preceding siblings ...) 2006-03-01 18:48 ` David Greaves @ 2006-03-01 19:06 ` Justin Piszcz 2006-03-01 19:28 ` Mark Lord 2006-03-01 19:35 ` Mark Lord 3 siblings, 2 replies; 147+ messages in thread From: Justin Piszcz @ 2006-03-01 19:06 UTC (permalink / raw) To: Mark Lord Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Wed, 1 Mar 2006, Mark Lord wrote: > David Greaves wrote: >> >> haze:/usr/src# smartctl -data -o off /dev/sdc >> succeeds but gives me: >> >> ata3: status=0x50 { DriveReady SeekComplete } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } > > "DriveStatusError" is "Command Aborted" in ac-speak. > From the man page for smartctl, we read: > >> -o VALUE Enables or disables SMART automatic offline test ... >> Note that the SMART automatic offline test command is listed as "Obsolete" > in every >> version of the ATA and ATA/ATAPI Specifications. It was originally part > of the >> SFF-8035i Revision 2.0 specification, but was never part of any ATA > specification. > > There's a chance that your drives simply do not fully support this feature, > and are rejecting attempts to use it. > > By the way, the latest 2.6.16-rc5-git4 is available, > and has FUA turned off by default now. So it should > work with your drives, and *you* are expected to verify > that for us all now. 
> > Cheers > > -ml > By the way, the latest 2.6.16-rc5-git4 is available, I am using 2.6.16-rc5-git4, and after running: # smartctl -data -o off /dev/sdc I get: [4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error } [4294785.192000] ata3: error=0x04 { DriveStatusError } [4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error } [4294785.192000] ata3: error=0x04 { DriveStatusError } [4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error } [4294785.192000] ata3: error=0x04 { DriveStatusError } [4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error } [4294785.192000] ata3: error=0x04 { DriveStatusError } [4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error } [4294785.192000] ata3: error=0x04 { DriveStatusError } Did you mean you wanted us to test it like we normally do, ie, copy files/md5sum them on the disk and see if we can make it occur again, or? Justin. ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 19:06 ` LibPATA code issues / 2.6.15.4 Justin Piszcz @ 2006-03-01 19:28 ` Mark Lord 2006-03-01 19:35 ` Mark Lord 1 sibling, 0 replies; 147+ messages in thread
From: Mark Lord @ 2006-03-01 19:28 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

Justin Piszcz wrote:
>
> I am using 2.6.16-rc5-git4, and after running:
>
> # smartctl -data -o off /dev/sdc
>
> I get:
>
> [4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI
> SK/ASC/ASCQ 0xb/00/00
> [4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error }
> [4294785.192000] ata3: error=0x04 { DriveStatusError }

That's probably just your drive reporting "unsupported sub-command".
Nothing serious -- the man page for smartctl even mentions the possibility.

Cheers

^ permalink raw reply [flat|nested] 147+ messages in thread
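For readers decoding the log lines quoted above, here is a minimal sketch of how the status and error bytes map to the names in braces. It is a user-space illustration only, not the kernel's code: the helper names decode_ata_status() and decode_ata_error() are hypothetical, the status labels follow the kernel output quoted in this thread, and the error table lists only the three error bits that actually appear in the thread (0x40, 0x10, 0x04).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One register bit and the label the log lines in this thread use for it. */
struct bit_name {
    unsigned char bit;
    const char *name;
};

/* Append name to buf, space-separated, never writing past len bytes. */
static void append_name(char *buf, size_t len, const char *name)
{
    if (buf[0] != '\0')
        strncat(buf, " ", len - strlen(buf) - 1);
    strncat(buf, name, len - strlen(buf) - 1);
}

/* Hypothetical helper: turn an ATA status byte into label text. */
void decode_ata_status(unsigned char stat, char *buf, size_t len)
{
    static const struct bit_name bits[] = {
        { 0x40, "DriveReady" },
        { 0x20, "DeviceFault" },
        { 0x10, "SeekComplete" },
        { 0x08, "DataRequest" },
        { 0x04, "CorrectedError" },
        { 0x02, "Index" },
        { 0x01, "Error" },
    };
    size_t i;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    /* When BSY (0x80) is set, only "Busy" is reported, as in the 0xd0 logs. */
    if (stat & 0x80) {
        append_name(buf, len, "Busy");
        return;
    }
    for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++)
        if (stat & bits[i].bit)
            append_name(buf, len, bits[i].name);
}

/* Hypothetical helper: decode only the error bits seen in this thread. */
void decode_ata_error(unsigned char err, char *buf, size_t len)
{
    static const struct bit_name bits[] = {
        { 0x40, "UncorrectableError" },
        { 0x10, "SectorIdNotFound" },
        { 0x04, "DriveStatusError" },   /* ABRT, i.e. "Command Aborted" */
    };
    size_t i;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++)
        if (err & bits[i].bit)
            append_name(buf, len, bits[i].name);
}
```

Calling decode_ata_status(0x51, ...) and decode_ata_error(0x04, ...) reproduces the "DriveReady SeekComplete Error" / "DriveStatusError" pair from the smartctl report, which is the aborted-command case described in the reply above.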
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 19:06 ` LibPATA code issues / 2.6.15.4 Justin Piszcz 2006-03-01 19:28 ` Mark Lord @ 2006-03-01 19:35 ` Mark Lord 2006-03-01 19:38 ` Justin Piszcz 1 sibling, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-03-01 19:35 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

Justin Piszcz wrote:
>
> Did you mean you wanted us to test it like we normally do, ie, copy
> files/md5sum them on the disk and see if we can make it occur again, or?

Yes.  The S.M.A.R.T. stuff doesn't matter nearly as much as normal I/O.

And Justin, can you get those S.M.A.R.T. errors to pop up on 2.6.15 as well?

^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 19:35 ` Mark Lord @ 2006-03-01 19:38 ` Justin Piszcz 2006-03-01 19:41 ` Jeff Garzik 0 siblings, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-03-01 19:38 UTC (permalink / raw) To: Mark Lord Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Wed, 1 Mar 2006, Mark Lord wrote: > Justin Piszcz wrote: >> >> Did you mean you wanted us to test it like we normally do, ie, copy >> files/md5sum them on the disk and see if we can make it occur again, or? > > Yes. The S.M.A.R.T. stuff doesn't matter nearly as much as normal I/O. > > And Justin, can you get those S.M.A.R.T. errors to pop up on 2.6.15 as well? > Have not tested, can test later if necessary, running some I/O tests to the disk which is probably going to take quite a while to see if I can get it to error again with 2.6.16-rc5-git4. Justin. ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 19:38 ` Justin Piszcz @ 2006-03-01 19:41 ` Jeff Garzik 0 siblings, 0 replies; 147+ messages in thread From: Jeff Garzik @ 2006-03-01 19:41 UTC (permalink / raw) To: Justin Piszcz Cc: Mark Lord, David Greaves, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds Justin Piszcz wrote: > > > On Wed, 1 Mar 2006, Mark Lord wrote: > >> Justin Piszcz wrote: >> >>> >>> Did you mean you wanted us to test it like we normally do, ie, copy >>> files/md5sum them on the disk and see if we can make it occur again, or? >> >> >> Yes. The S.M.A.R.T. stuff doesn't matter nearly as much as normal I/O. >> >> And Justin, can you get those S.M.A.R.T. errors to pop up on 2.6.15 as >> well? >> > > Have not tested, can test later if necessary, running some I/O tests to > the disk which is probably going to take quite a while to see if I can > get it to error again with 2.6.16-rc5-git4. If there are FUA problems, it would be immediately apparent on the first write... Jeff ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-26 2:27 ` Mark Lord 2006-02-26 9:56 ` David Greaves @ 2006-02-26 12:27 ` James Courtier-Dutton 2006-02-26 12:55 ` David Greaves 2006-02-26 13:56 ` Mark Lord 1 sibling, 2 replies; 147+ messages in thread From: James Courtier-Dutton @ 2006-02-26 12:27 UTC (permalink / raw) To: Mark Lord Cc: David Greaves, Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun Mark Lord wrote: > David Greaves wrote: >> >> Linux haze 2.6.16-rc4patched #1 PREEMPT Sat Feb 25 19:29:11 UTC 2006 >> i686 GNU/Linux >> >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: error=0x04 { DriveStatusError } >> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> sd 1:0:0:0: SCSI error: return code = 0x8000002 >> sdb: Current: sense key: Medium Error >> Additional sense: Unrecovered read error - auto reallocate failed >> end_request: I/O error, dev sdb, sector 398283329 >> raid1: Disk failure on sdb2, disabling device. >> Operation continuing on 1 devices > > Oh good, *now* we've gotten somewhere!! > > Albert / Jens / Jeff: > > The command failing above is SCSI WRITE_10, which is being > translated into ATA_CMD_WRITE_FUA_EXT by libata. > > This command fails -- unrecognized by the drive in question. > But libata reports it (most incorrectly) as a "medium error", > and the drive is taken out of service from its RAID. > > Bad, bad, and worse. > I have what looks like similar problems. The issue I have is that I don't think the problem is ONLY libata related. I have two linux PCs. 
One called "games", the other called "localhost". The problem happens quite quickly on the old "games" machine, but I can run for days/weeks until I see the problem on the "localhost". It might be happening on the "localhost", but I am just not noticing. The difference being that if reiserfs sees this error, it cannot recover, and I have reiserfs on the "games" machine. The "localhost" only uses ext3, and ext3 recovers gracefully from this problem. Can I use libata on this old "games" machine? It is an old Pentium 3 machine. In any case, The "games" machine is currently switched off until I can find a kernel that works, so I will happily test different kernels and patches, if people have suggestions. I have two desktop linux machines. One is an old Pentium 3 which shows the following errors(no libata involved): Linux version 2.6.15-rc4 (root@games) (gcc version 4.0.3 20051111 (prerelease) (Debian 4.0.2-4) ) #1 Sat Dec 3 18:47:19 GMT 2005 Dec 16 22:51:57 games kernel: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error } Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=53058185, sector=53057951 Dec 16 22:52:32 games kernel: ide: failed opcode was: unknown Dec 16 22:52:32 games kernel: end_request: I/O error, dev hdc, sector 53057951 Dec 16 22:52:32 games kernel: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error } Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=53058185, sector=53057959 Dec 16 22:52:32 games kernel: ide: failed opcode was: unknown The other has the following errors: Linux version 2.6.15.1 (root@localhost) (gcc version 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pi e-8.7.9)) #3 SMP PREEMPT Fri Feb 3 23:19:05 GMT 2006 Feb 10 23:30:07 localhost kernel: ata3: command 0xb0 timeout, stat 0xd0 host_stat 0x0 Feb 10 23:30:07 localhost kernel: ata3: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 Feb 10 23:30:07 localhost kernel: ata3: status=0xd0 { Busy } Feb 10 
23:30:07 localhost kernel: ATA: abnormal status 0xD0 on port 0xF880E087 Feb 10 23:30:07 localhost last message repeated 3 times Feb 10 23:30:10 localhost kernel: ata3: PIO error Feb 10 23:30:10 localhost kernel: ata3: status=0x50 { DriveReady SeekComplete } Feb 11 10:18:10 localhost kernel: ata2: command 0xb0 timeout, stat 0xd0 host_stat 0x0 Feb 11 10:18:10 localhost kernel: ata2: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 Feb 11 10:18:10 localhost kernel: ata2: status=0xd0 { Busy } Feb 11 10:18:10 localhost kernel: ATA: abnormal status 0xD0 on port 0x177 Feb 11 10:18:10 localhost last message repeated 3 times ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-26 12:27 ` James Courtier-Dutton @ 2006-02-26 12:55 ` David Greaves 2006-02-26 13:56 ` Mark Lord 1 sibling, 0 replies; 147+ messages in thread From: David Greaves @ 2006-02-26 12:55 UTC (permalink / raw) To: James Courtier-Dutton Cc: Mark Lord, Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun James Courtier-Dutton wrote: > I have two desktop linux machines. One is an old Pentium 3 which shows > the following errors(no libata involved): > Linux version 2.6.15-rc4 (root@games) (gcc version 4.0.3 20051111 > (prerelease) (Debian 4.0.2-4) > ) #1 Sat Dec 3 18:47:19 GMT 2005 > Dec 16 22:51:57 games kernel: hdc: dma_intr: status=0x51 { DriveReady > SeekComplete Error } > Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x40 { > UncorrectableError }, LBAsect=53058185, sector=53057951 > Dec 16 22:52:32 games kernel: ide: failed opcode was: unknown > Dec 16 22:52:32 games kernel: end_request: I/O error, dev hdc, sector > 53057951 > Dec 16 22:52:32 games kernel: hdc: dma_intr: status=0x51 { DriveReady > SeekComplete Error } > Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x10 { > SectorIdNotFound }, LBAsect=53058185, sector=53057959 > Dec 16 22:52:32 games kernel: ide: failed opcode was: unknown This looks like a simple bad disk drive. Notice that the sectors are quite close. If you like you can move the drive to a working machine and run a badblocks on it. do 'man badblocks' before you start. Is it SMART capable? What does smartctl -a /dev/hdc show? ddrescue may be your friend if you need to recover data. Reply offlist if this is the case. 
> The other has the following errors: > Linux version 2.6.15.1 (root@localhost) (gcc version 3.4.5 (Gentoo > 3.4.5, ssp-3.4.5-1.0, pi > e-8.7.9)) #3 SMP PREEMPT Fri Feb 3 23:19:05 GMT 2006 > Feb 10 23:30:07 localhost kernel: ata3: command 0xb0 timeout, stat > 0xd0 host_stat 0x0 > Feb 10 23:30:07 localhost kernel: ata3: translated ATA stat/err > 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 > Feb 10 23:30:07 localhost kernel: ata3: status=0xd0 { Busy } > Feb 10 23:30:07 localhost kernel: ATA: abnormal status 0xD0 on port > 0xF880E087 > Feb 10 23:30:07 localhost last message repeated 3 times > Feb 10 23:30:10 localhost kernel: ata3: PIO error > Feb 10 23:30:10 localhost kernel: ata3: status=0x50 { DriveReady > SeekComplete } > Feb 11 10:18:10 localhost kernel: ata2: command 0xb0 timeout, stat > 0xd0 host_stat 0x0 > Feb 11 10:18:10 localhost kernel: ata2: translated ATA stat/err > 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 > Feb 11 10:18:10 localhost kernel: ata2: status=0xd0 { Busy } > Feb 11 10:18:10 localhost kernel: ATA: abnormal status 0xD0 on port 0x177 > Feb 11 10:18:10 localhost last message repeated 3 times Have you got smartd running? I get a similar problem running some smartcl commands (-s on and -o on) I suspect this is a libata ata passthru problem - but I'm *guessing* :) check the last messages in dmesg then run smartctl -data -s on /dev/sd... smartctl -data -o on /dev/sd... See if there are new messages in dmesg David -- ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-26 12:27 ` James Courtier-Dutton 2006-02-26 12:55 ` David Greaves @ 2006-02-26 13:56 ` Mark Lord 2006-02-26 14:30 ` Kernel SeekCompleteErrors... Different from " James Courtier-Dutton 1 sibling, 1 reply; 147+ messages in thread From: Mark Lord @ 2006-02-26 13:56 UTC (permalink / raw) To: James Courtier-Dutton Cc: Mark Lord, David Greaves, Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun James Courtier-Dutton wrote: > > I have what looks like similar problems. The issue I have is that I Nope. Different issues. > ) #1 Sat Dec 3 18:47:19 GMT 2005 > Dec 16 22:51:57 games kernel: hdc: dma_intr: status=0x51 { DriveReady > SeekComplete Error } > Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x40 { > UncorrectableError }, LBAsect=53058185, sector=53057951 The disk really does have bad sectors in this case (above). > The other has the following errors: > Linux version 2.6.15.1 (root@localhost) (gcc version 3.4.5 (Gentoo > 3.4.5, ssp-3.4.5-1.0, pi > e-8.7.9)) #3 SMP PREEMPT Fri Feb 3 23:19:05 GMT 2006 > Feb 10 23:30:07 localhost kernel: ata3: command 0xb0 timeout, stat 0xd0 > host_stat 0x0 > Feb 10 23:30:07 localhost kernel: ata3: translated ATA stat/err 0xd0/00 > to SCSI SK/ASC/ASCQ 0xb/47/00 > Feb 10 23:30:07 localhost kernel: ata3: status=0xd0 { Busy } > Feb 10 23:30:07 localhost kernel: ATA: abnormal status 0xD0 on port > 0xF880E087 > Feb 10 23:30:07 localhost last message repeated 3 times > Feb 10 23:30:10 localhost kernel: ata3: PIO error > Feb 10 23:30:10 localhost kernel: ata3: status=0x50 { DriveReady > SeekComplete } > Feb 11 10:18:10 localhost kernel: ata2: command 0xb0 timeout, stat 0xd0 > host_stat 0x0 > Feb 11 10:18:10 localhost kernel: ata2: translated ATA stat/err 0xd0/00 > to SCSI SK/ASC/ASCQ 0xb/47/00 > Feb 11 10:18:10 localhost kernel: ata2: status=0xd0 { Busy } > Feb 11 10:18:10 localhost kernel: ATA: abnormal status 0xD0 on port 0x177 > Feb 11 10:18:10 localhost 
last message repeated 3 times PIO errors? Are you using Alan Cox's experimental PATA code for libata? -ml ^ permalink raw reply [flat|nested] 147+ messages in thread
* Kernel SeekCompleteErrors... Different from Re: LibPATA code issues / 2.6.15.4 2006-02-26 13:56 ` Mark Lord @ 2006-02-26 14:30 ` James Courtier-Dutton 2006-02-26 17:03 ` Mark Lord 0 siblings, 1 reply; 147+ messages in thread From: James Courtier-Dutton @ 2006-02-26 14:30 UTC (permalink / raw) To: Mark Lord Cc: Mark Lord, David Greaves, Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun Mark Lord wrote: > James Courtier-Dutton wrote: >> >> I have what looks like similar problems. The issue I have is that I > > Nope. Different issues. I have changed the Subject line to indicate this so any future responses can be indicated. > >> ) #1 Sat Dec 3 18:47:19 GMT 2005 >> Dec 16 22:51:57 games kernel: hdc: dma_intr: status=0x51 { DriveReady >> SeekComplete Error } >> Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x40 { >> UncorrectableError }, LBAsect=53058185, sector=53057951 > > The disk really does have bad sectors in this case (above). The disk has NO bad sectors. It has been checked using two different tests. 1) seatools (The seagate test tool passed the deep test where it reads all sectors.) 2) dd of the entire HD image onto another HD. No sector errors were encountered in either case. 
> > >> The other has the following errors: >> Linux version 2.6.15.1 (root@localhost) (gcc version 3.4.5 (Gentoo >> 3.4.5, ssp-3.4.5-1.0, pi >> e-8.7.9)) #3 SMP PREEMPT Fri Feb 3 23:19:05 GMT 2006 >> Feb 10 23:30:07 localhost kernel: ata3: command 0xb0 timeout, stat >> 0xd0 host_stat 0x0 >> Feb 10 23:30:07 localhost kernel: ata3: translated ATA stat/err >> 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 >> Feb 10 23:30:07 localhost kernel: ata3: status=0xd0 { Busy } >> Feb 10 23:30:07 localhost kernel: ATA: abnormal status 0xD0 on port >> 0xF880E087 >> Feb 10 23:30:07 localhost last message repeated 3 times >> Feb 10 23:30:10 localhost kernel: ata3: PIO error >> Feb 10 23:30:10 localhost kernel: ata3: status=0x50 { DriveReady >> SeekComplete } >> Feb 11 10:18:10 localhost kernel: ata2: command 0xb0 timeout, stat >> 0xd0 host_stat 0x0 >> Feb 11 10:18:10 localhost kernel: ata2: translated ATA stat/err >> 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 >> Feb 11 10:18:10 localhost kernel: ata2: status=0xd0 { Busy } >> Feb 11 10:18:10 localhost kernel: ATA: abnormal status 0xD0 on port >> 0x177 >> Feb 11 10:18:10 localhost last message repeated 3 times > > PIO errors? Are you using Alan Cox's experimental PATA code for libata? > > -ml > No, this is Linux kernel 2.6.15.1 with no patches. I cut and pasted the Linux version number to the top of each trace output in my original email. ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: Kernel SeekCompleteErrors... Different from Re: LibPATA code issues / 2.6.15.4 2006-02-26 14:30 ` Kernel SeekCompleteErrors... Different from " James Courtier-Dutton @ 2006-02-26 17:03 ` Mark Lord 2006-02-26 17:13 ` Dr. David Alan Gilbert 0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-02-26 17:03 UTC (permalink / raw)
  To: James Courtier-Dutton
  Cc: David Greaves, Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun

James Courtier-Dutton wrote:
> Mark Lord wrote:
>> James Courtier-Dutton wrote:
>>>
>>> I have what looks like similar problems. The issue I have is that I
>>
>> Nope. Different issues.
> I have changed the Subject line to indicate this so any future responses
> can be indicated.
>
>>
>>> ) #1 Sat Dec 3 18:47:19 GMT 2005
>>> Dec 16 22:51:57 games kernel: hdc: dma_intr: status=0x51 { DriveReady
>>> SeekComplete Error }
>>> Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x40 {
>>> UncorrectableError }, LBAsect=53058185, sector=53057951
>>
>> The disk really does have bad sectors in this case (above).
> The disk has NO bad sectors. It has been checked using two different tests.

The *only* test that matters is to enable S.M.A.R.T.,
and read out the error logs from it.

"smartctl" is the tool.

Cheers

^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: Kernel SeekCompleteErrors... Different from Re: LibPATA code issues / 2.6.15.4 2006-02-26 17:03 ` Mark Lord @ 2006-02-26 17:13 ` Dr. David Alan Gilbert 2006-02-26 17:43 ` Alan Cox 0 siblings, 1 reply; 147+ messages in thread
From: Dr. David Alan Gilbert @ 2006-02-26 17:13 UTC (permalink / raw)
  To: Mark Lord
  Cc: James Courtier-Dutton, David Greaves, Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun

* Mark Lord (liml@rtr.ca) wrote:
> >>James Courtier-Dutton wrote:
> >>>
> >>>I have what looks like similar problems. The issue I have is that I
> >>
> >>Nope. Different issues.
> >I have changed the Subject line to indicate this so any future responses
> >can be indicated.
> >
> >>
> >>>) #1 Sat Dec 3 18:47:19 GMT 2005
> >>>Dec 16 22:51:57 games kernel: hdc: dma_intr: status=0x51 { DriveReady
> >>>SeekComplete Error }
> >>>Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x40 {
> >>>UncorrectableError }, LBAsect=53058185, sector=53057951
> >>
> >>The disk really does have bad sectors in this case (above).
> >The disk has NO bad sectors. It has been checked using two different tests.
>
> The *only* test that matters is to enable S.M.A.R.T.,
> and read out the error logs from it.

I have seen a set of drives that has reported UncorrectableErrors
and :
  * Shows the Uncorrectable error in the SMART log
  * Passes a full SMART test
  * Shows no remapped sectors
  * Passes the vendors drive test
  * Now fully passes a dd if=/dev/hdx of=/dev/null with no errors.

They were a set of 250GB SATA drives by the same vendor; I've
taken them out one at a time as each did the same thing and replaced
them with another vendors drive.  They were all in use in RAID-1 MD
configuration (under heavy load).

I do wonder about the 'uncorrectable error rate' that vendors
report; it doesn't seem very large - but I'll admit to not
understanding its units.  Are soft non-repeatable
uncorrectable errors expected in principle?
(Pointers to a good explanation of what this actually means would
be appreciated).

I do wonder how often this happens to people and if the read succeeds
again they just blame it on software.

Dave
--
 -----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert    | Running GNU/Linux on Alpha,68K| Happy  \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: Kernel SeekCompleteErrors... Different from Re: LibPATA code issues / 2.6.15.4 2006-02-26 17:13 ` Dr. David Alan Gilbert @ 2006-02-26 17:43 ` Alan Cox 2006-02-26 20:36 ` Mark Lord 0 siblings, 1 reply; 147+ messages in thread
From: Alan Cox @ 2006-02-26 17:43 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Mark Lord, James Courtier-Dutton, David Greaves, Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun

On Sul, 2006-02-26 at 17:13 +0000, Dr. David Alan Gilbert wrote:
> > The *only* test that matters is to enable S.M.A.R.T.,
> > and read out the error logs from it.

SMART is unreliable for many cases

> I have seen a set of drives that has reported UncorrectableErrors
> and :
> * Shows the Uncorrectable error in the SMART log
> * Passes a full SMART test
> * Shows no remapped sectors
> * Passes the vendors drive test
> * Now fully passes a dd if=/dev/hdx of=/dev/null with no errors.

The very early SATA code didn't decode the errors from the drive fully so
could produce bogus reports. The current code decodes it and also
displays the ATA level diagnostics so should be reliable.

^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: Kernel SeekCompleteErrors... Different from Re: LibPATA code issues / 2.6.15.4 2006-02-26 17:43 ` Alan Cox @ 2006-02-26 20:36 ` Mark Lord 2006-02-27 11:48 ` Alan Cox 0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-02-26 20:36 UTC (permalink / raw)
  To: Alan Cox
  Cc: Dr. David Alan Gilbert, James Courtier-Dutton, David Greaves, Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun

Alan Cox wrote:
>
> The very early SATA code didn't decode the errors from the drive fully so
> could produce bogus reports. The current code decodes it and also
> displays the ATA level diagnostics so should be reliable.

It still is unreliable, as being discussed in another thread.

libata wrongly says "medium error" any time it issues a command
that the drive rejects (unsupported, invalid parameters, etc..).

This is biting a few people in 2.6.16-rc*, due to the FUA stuff.

^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: Kernel SeekCompleteErrors... Different from Re: LibPATA code issues / 2.6.15.4 2006-02-26 20:36 ` Mark Lord @ 2006-02-27 11:48 ` Alan Cox 2006-02-27 13:40 ` Mark Lord 0 siblings, 1 reply; 147+ messages in thread
From: Alan Cox @ 2006-02-27 11:48 UTC (permalink / raw)
  To: Mark Lord
  Cc: Dr. David Alan Gilbert, James Courtier-Dutton, David Greaves, Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun

On Sul, 2006-02-26 at 15:36 -0500, Mark Lord wrote:
> It still is unreliable, as being discussed in another thread.
>
> libata wrongly says "medium error" any time it issues a command
> that the drive rejects (unsupported, invalid parameters, etc..).

It seems to still get a single case wrong. But it does the report the
ATA state correctly still.

> This is biting a few people in 2.6.16-rc*, due to the FUA stuff.

It is driven by a table in

        libata-scsi.c:ata_to_sense_error()

so if you can figure out the wrong entry and tweak the table that would be great

^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: Kernel SeekCompleteErrors... Different from Re: LibPATA code issues / 2.6.15.4 2006-02-27 11:48 ` Alan Cox @ 2006-02-27 13:40 ` Mark Lord 0 siblings, 0 replies; 147+ messages in thread
From: Mark Lord @ 2006-02-27 13:40 UTC (permalink / raw)
  To: Alan Cox
  Cc: Dr. David Alan Gilbert, James Courtier-Dutton, David Greaves, Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun

Alan Cox wrote:
> On Sul, 2006-02-26 at 15:36 -0500, Mark Lord wrote:
>> It still is unreliable, as being discussed in another thread.
>>
>> libata wrongly says "medium error" any time it issues a command
>> that the drive rejects (unsupported, invalid parameters, etc..).
>
> It seems to still get a single case wrong. But it does the report the
> ATA state correctly still.
>
>> This is biting a few people in 2.6.16-rc*, due to the FUA stuff.
>
> It is driven by a table in
>
> libata-scsi.c:ata_to_sense_error()
>
> so if you can figure out the wrong entry and tweak the table that would be great

It's the fall-through case, where the table is not used.

        /* No error?  Undecoded? */
        printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x status: 0x%02x\n",
               id, opcode, drv_stat);

        /* For our last chance pick, use medium read error because
         * it's much more common than an ATA drive telling you a write
         * has failed.
         */
        *sk = MEDIUM_ERROR;
        *asc = 0x11; /* "unrecovered read error" */
        *ascq = 0x04; /*  "auto-reallocation failed" */

Cheers

^ permalink raw reply [flat|nested] 147+ messages in thread
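The two outcomes discussed in this sub-thread can be put side by side in a small user-space sketch. This is an assumption-laden illustration, not libata's implementation: sketch_to_sense() and struct sense are hypothetical names, the single "table" entry (error bit 0x04 giving 0xb/00/00) is taken only from the log lines quoted earlier in the thread, and the default branch copies the fall-through values from the snippet above. The real function consults larger tables that are not reproduced here, and the sketch does not try to explain why a given drive ends up in the fall-through.

```c
#include <assert.h>
#include <stddef.h>

/* SCSI sense keys used in this thread's log lines. */
#define SK_MEDIUM_ERROR     0x03
#define SK_ABORTED_COMMAND  0x0b

/* ATA error register bit reported as "DriveStatusError" (command aborted). */
#define ATA_ERR_ABRT        0x04

/* Hypothetical container for the three translated values. */
struct sense {
    unsigned char sk;    /* sense key */
    unsigned char asc;   /* additional sense code */
    unsigned char ascq;  /* additional sense code qualifier */
};

/*
 * Simplified sketch: one decoded case plus the last-chance default.
 * Returns 1 if the error byte was decoded, 0 if the default was used.
 */
int sketch_to_sense(unsigned char drv_err, struct sense *out)
{
    assert(out != NULL);

    if (drv_err & ATA_ERR_ABRT) {
        /* Matches "stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00". */
        out->sk = SK_ABORTED_COMMAND;
        out->asc = 0x00;
        out->ascq = 0x00;
        return 1;
    }

    /* Nothing matched: the fall-through quoted above. */
    out->sk = SK_MEDIUM_ERROR;
    out->asc = 0x11;   /* "unrecovered read error" */
    out->ascq = 0x04;  /* "auto-reallocation failed" */
    return 0;
}
```

The point of the sketch is the asymmetry: whenever the default branch is reached, the upper layers see a medium error, which is what the RAID code in the earlier report acted on, regardless of whether the drive merely rejected a command.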
* Re: LibPATA code issues / 2.6.15.4 2006-02-14 14:50 ` Mark Lord 2006-02-14 16:27 ` David Greaves @ 2006-02-14 23:58 ` Justin Piszcz 2006-02-17 8:45 ` Jeff Garzik 2 siblings, 0 replies; 147+ messages in thread From: Justin Piszcz @ 2006-02-14 23:58 UTC (permalink / raw) To: Mark Lord; +Cc: Jeff Garzik, linux-kernel, IDE/ATA development list FYI: Make a 100GB file, md5sum it, copy it to 'problem' drive and md5sum it, same MD5SUMS. box:/x8# /usr/bin/time dd if=/dev/zero of=100gb bs=1M count=100000 ; /usr/bin/time md5sum 100gb; /usr/bin/time cp 100gb /x4 ; cd /x4 ; /usr/bin/time md5sum 100gb 100000+0 records in 100000+0 records out 104857600000 bytes transferred in 4735.034107 seconds (22145057 bytes/sec) 0.29user 245.59system 1:18:55elapsed 5%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+210minor)pagefaults 0swaps 1e95cd44e2cb773f483ea7b2f676258d 100gb 248.24user 98.17system 32:50.97elapsed 17%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (1major+188minor)pagefaults 0swaps 14.75user 341.92system 35:25.25elapsed 16%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (4major+183minor)pagefaults 0swaps 1e95cd44e2cb773f483ea7b2f676258d 100gb 246.95user 110.41system 28:06.49elapsed 21%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (1major+190minor)pagefaults 0swaps box:/x4# Also, all SMART tests passed with flying colors.. (FYI) On Tue, 14 Feb 2006, Mark Lord wrote: > Justin Piszcz wrote: > .. >> ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } > > I wonder if the FUA logic is inserting cache-flush commands > and perhaps the drive is rejecting those? > > Jeff, we really ought to be including the failed ATA opcode > in those error messages!! > > Cheers > ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-14 14:50 ` Mark Lord 2006-02-14 16:27 ` David Greaves 2006-02-14 23:58 ` Justin Piszcz @ 2006-02-17 8:45 ` Jeff Garzik 2006-02-17 14:59 ` Mark Lord 2 siblings, 1 reply; 147+ messages in thread
From: Jeff Garzik @ 2006-02-17 8:45 UTC (permalink / raw)
  To: Mark Lord; +Cc: Justin Piszcz, linux-kernel, IDE/ATA development list

Mark Lord wrote:
> Jeff, we really ought to be including the failed ATA opcode
> in those error messages!!

Submit a patch...

        Jeff

^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-17 8:45 ` Jeff Garzik @ 2006-02-17 14:59 ` Mark Lord 2006-02-17 15:00 ` Justin Piszcz 2006-02-18 20:43 ` Sander 0 siblings, 2 replies; 147+ messages in thread
From: Mark Lord @ 2006-02-17 14:59 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Justin Piszcz, linux-kernel, IDE/ATA development list

On Friday 17 February 2006 03:45, Jeff Garzik wrote:
>Submit a patch...

You mean, something like this one?
Untested at present, as I was hoping to hear
back from one of the original problem reporters
after they tested it.

Cheers!


-------- Original Message --------
Subject: Re: LibPATA code issues / 2.6.15.4
Date: Tue, 14 Feb 2006 13:00:36 -0500
From: Mark Lord <lkml@rtr.ca>
To: Justin Piszcz <jpiszcz@lucidpixels.com>
CC: David Greaves <david@dgreaves.com>, Jeff Garzik <jgarzik@pobox.com>,
 linux-kernel@vger.kernel.org, IDE/ATA development list <linux-ide@vger.kernel.org>
References: <Pine.LNX.4.64.0602140439580.3567@p34>
 <43F2050B.8020006@dgreaves.com> <Pine.LNX.4.64.0602141211350.10793@p34>

On Tuesday 14 February 2006 12:12, Justin Piszcz wrote:
> I would like to try the patch too, if available.

Something like this: (for 2.6.16-rc3-git2, but should be okay on 2.6.15 also).

Untested: include the original SCSI opcode in printk's for libata SCSI errors,
to help understand where the errors are coming from.

Signed-Off-By: Mark Lord <mlord@pobox.com>

--- linux/drivers/scsi/libata-scsi.c.orig 2006-02-12 19:27:25.000000000 -0500
+++ linux/drivers/scsi/libata-scsi.c 2006-02-14 12:54:17.000000000 -0500
@@ -420,6 +420,7 @@
  *      @sk: the sense key we'll fill out
  *      @asc: the additional sense code we'll fill out
  *      @ascq: the additional sense code qualifier we'll fill out
+ *      @opcode: the original SCSI command opcode byte
  *
  *      Converts an ATA error into a SCSI error.  Fill out pointers to
  *      SK, ASC, and ASCQ bytes for later use in fixed or descriptor
@@ -429,7 +430,7 @@
  *      spin_lock_irqsave(host_set lock)
  */
 void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc,
-                        u8 *ascq)
+                        u8 *ascq, u8 opcode)
 {
         int i;
 
@@ -508,8 +509,8 @@
                 }
         }
         /* No error?  Undecoded? */
-        printk(KERN_WARNING "ata%u: no sense translation for status: 0x%02x\n",
-               id, drv_stat);
+        printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x status: 0x%02x\n",
+               id, opcode, drv_stat);
 
         /* For our last chance pick, use medium read error because
          * it's much more common than an ATA drive telling you a write
@@ -520,8 +521,8 @@
         *ascq = 0x04; /*  "auto-reallocation failed" */
 
  translate_done:
-        printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to "
-               "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err,
+        printk(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to "
+               "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err,
                *sk, *asc, *ascq);
         return;
 }
@@ -562,7 +563,7 @@
          */
         if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
                 ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-                                   &sb[1], &sb[2], &sb[3]);
+                                   &sb[1], &sb[2], &sb[3], cmd->cmnd[0]);
                 sb[1] &= 0x0f;
         }
 
@@ -637,7 +638,7 @@
          */
         if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
                 ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-                                   &sb[2], &sb[12], &sb[13]);
+                                   &sb[2], &sb[12], &sb[13], cmd->cmnd[0]);
                 sb[2] &= 0x0f;
         }
 
-

^ permalink raw reply [flat|nested] 147+ messages in thread
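The op= value this patch adds to the log is the first byte of the SCSI command block, so reading the resulting messages means mapping that byte back to a command name. A minimal lookup sketch follows, covering only the values that appear in this thread (op=0x2a and op=0x85, plus the ATA-level cmd=0x3d and command 0xb0 from other reports). The function names are hypothetical and the tables are deliberately incomplete.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: name the SCSI opcodes that show up as op= in this thread. */
const char *scsi_opcode_name(unsigned char opcode)
{
    switch (opcode) {
    case 0x2a: return "WRITE(10)";             /* normal block writes */
    case 0x85: return "ATA PASS-THROUGH(16)";  /* e.g. smartctl commands */
    default:   return "unknown";
    }
}

/* Hypothetical helper: name the ATA commands that show up in this thread. */
const char *ata_command_name(unsigned char command)
{
    switch (command) {
    case 0x3d: return "WRITE DMA FUA EXT";
    case 0xb0: return "SMART";
    default:   return "unknown";
    }
}

/* Small self-check tying the lookups to the log lines quoted in the thread. */
void opcode_names_self_check(void)
{
    assert(strcmp(scsi_opcode_name(0x2a), "WRITE(10)") == 0);
    assert(strcmp(ata_command_name(0x3d), "WRITE DMA FUA EXT") == 0);
}
```

With such a lookup, "op=0x2a cmd=0x3d" reads as a WRITE(10) issued to the drive as WRITE DMA FUA EXT, and "op=0x85" as a pass-through command of the kind smartctl sends, which separates the FUA reports from the S.M.A.R.T. ones.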
* Re: LibPATA code issues / 2.6.15.4 2006-02-17 14:59 ` Mark Lord @ 2006-02-17 15:00 ` Justin Piszcz 2006-02-18 20:43 ` Sander 1 sibling, 0 replies; 147+ messages in thread From: Justin Piszcz @ 2006-02-17 15:00 UTC (permalink / raw) To: Mark Lord; +Cc: Jeff Garzik, linux-kernel, IDE/ATA development list I have patched the kernel and rebooted it with your patch, but, of course, with my luck it has not given me any errors since, even when repeating major file copies, bonnie++ and iozone!! :( On Fri, 17 Feb 2006, Mark Lord wrote: > On Friday 17 February 2006 03:45, Jeff Garzik wrote: >> Submit a patch... > > You mean, something like this one? > Untested at present, as I was hoping to hear > back from one of the original problem reporters > after they tested it. > > Cheers! > > > > -------- Original Message -------- > Subject: Re: LibPATA code issues / 2.6.15.4 > Date: Tue, 14 Feb 2006 13:00:36 -0500 > From: Mark Lord <lkml@rtr.ca> > To: Justin Piszcz <jpiszcz@lucidpixels.com> > CC: David Greaves <david@dgreaves.com>, Jeff Garzik <jgarzik@pobox.com>, > linux-kernel@vger.kernel.org, IDE/ATA development list > <linux-ide@vger.kernel.org> > References: <Pine.LNX.4.64.0602140439580.3567@p34> > <43F2050B.8020006@dgreaves.com> <Pine.LNX.4.64.0602141211350.10793@p34> > > On Tuesday 14 February 2006 12:12, Justin Piszcz wrote: >> I would like to try the patch too, if available. > > Something like this: (for 2.6.16-rc3-git2, but should be okay on 2.6.15 > also). > > Untested: include the original SCSI opcode in printk's for libata SCSI > errors, > to help understand where the errors are coming from. 
> > Signed-Off-By: Mark Lord <mlord@pobox.com> > > --- linux/drivers/scsi/libata-scsi.c.orig 2006-02-12 19:27:25.000000000 -0500 > +++ linux/drivers/scsi/libata-scsi.c 2006-02-14 12:54:17.000000000 -0500 > @@ -420,6 +420,7 @@ > * @sk: the sense key we'll fill out > * @asc: the additional sense code we'll fill out > * @ascq: the additional sense code qualifier we'll fill out > + * @opcode: the original SCSI command opcode byte > * > * Converts an ATA error into a SCSI error. Fill out pointers to > * SK, ASC, and ASCQ bytes for later use in fixed or descriptor > @@ -429,7 +430,7 @@ > * spin_lock_irqsave(host_set lock) > */ > void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 > *asc, > - u8 *ascq) > + u8 *ascq, u8 opcode) > { > int i; > > @@ -508,8 +509,8 @@ > } > } > /* No error? Undecoded? */ > - printk(KERN_WARNING "ata%u: no sense translation for status: 0x%02x\n", > - id, drv_stat); > + printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x status: > 0x%02x\n", > + id, opcode, drv_stat); > > /* For our last chance pick, use medium read error because > * it's much more common than an ATA drive telling you a write > @@ -520,8 +521,8 @@ > *ascq = 0x04; /* "auto-reallocation failed" */ > > translate_done: > - printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to " > - "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err, > + printk(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to " > + "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err, > *sk, *asc, *ascq); > return; > } > @@ -562,7 +563,7 @@ > */ > if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { > ata_to_sense_error(qc->ap->id, tf->command, tf->feature, > - &sb[1], &sb[2], &sb[3]); > + &sb[1], &sb[2], &sb[3], cmd->cmnd[0]); > sb[1] &= 0x0f; > } > > @@ -637,7 +638,7 @@ > */ > if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { > ata_to_sense_error(qc->ap->id, tf->command, tf->feature, > - &sb[2], &sb[12], &sb[13]); > + 
&sb[2], &sb[12], &sb[13], cmd->cmnd[0]); > sb[2] &= 0x0f; > } > > - > ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-17 14:59 ` Mark Lord 2006-02-17 15:00 ` Justin Piszcz @ 2006-02-18 20:43 ` Sander 2006-02-18 21:42 ` Mark Lord 1 sibling, 1 reply; 147+ messages in thread From: Sander @ 2006-02-18 20:43 UTC (permalink / raw) To: Mark Lord Cc: Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list Mark Lord wrote (ao): > On Friday 17 February 2006 03:45, Jeff Garzik wrote: > >Submit a patch... > > You mean, something like this one? > Untested at present, as I was hoping to hear > back from one of the original problem reporters > after they tested it. Not the original reporter, but your patch Works For Me. I get these: [ 633.449961] md: md1: sync done. [ 633.456070] RAID5 conf printout: [ 633.456117] --- rd:9 wd:9 fd:0 [ 633.456164] disk 0, o:1, dev:sda2 [ 633.456208] disk 1, o:1, dev:sdb2 [ 633.456250] disk 2, o:1, dev:sdc2 [ 633.456298] disk 3, o:1, dev:sdd2 [ 633.456340] disk 4, o:1, dev:sde2 [ 633.456383] disk 5, o:1, dev:sdf2 [ 633.456427] disk 6, o:1, dev:sdg2 [ 633.456470] disk 7, o:1, dev:sdh2 [ 633.456514] disk 8, o:1, dev:sdi2 [ 787.639858] kjournald starting. Commit interval 5 seconds [ 787.657991] EXT3 FS on md1, internal journal [ 787.658023] EXT3-fs: mounted filesystem with writeback data mode. 
[ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 [ 1872.338239] ata6: status=0xd0 { Busy } [ 5749.285084] ata8: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 [ 5749.285138] ata8: status=0xd0 { Busy } [ 5906.008461] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 [ 5906.008515] ata6: status=0xd0 { Busy } [ 9892.904205] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 [ 9892.904259] ata6: status=0xd0 { Busy } [10146.084687] ata5: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 [10146.084740] ata5: status=0xd0 { Busy } [10293.949040] ata5: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 [10293.949093] ata5: status=0xd0 { Busy } Can you tell from this what they mean? This is with 2.6.16-rc3, your patch, and running nine Maxtor disks over onboard nForce4 and MV88SX6081 8-port SATA II PCI-X Controller (rev 09). for i in `seq 10` do dd if=/dev/zero of=bigfile.$i bs=1024k count=10000 done md5sum bigfile.* The errors mostly seem to happen during the md5sum (not during the dd). I do not see data corruption or slowdown. I do need a chunksize of 512k for the raid5. With anything lower (I tried the default 64k, 128k, 256k, 512k and 4096k) I get data corruption and the errors reported in: http://marc.theaimsgroup.com/?l=linux-ide&m=114016903530007&w=2 Thanks! Sander -- Humilis IT Services and Solutions http://www.humilis.net ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-18 20:43 ` Sander @ 2006-02-18 21:42 ` Mark Lord 2006-02-18 21:51 ` Justin Piszcz 2006-02-19 7:14 ` Sander 0 siblings, 2 replies; 147+ messages in thread From: Mark Lord @ 2006-02-18 21:42 UTC (permalink / raw) To: sander; +Cc: Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list Sander wrote: > Mark Lord wrote (ao): >> On Friday 17 February 2006 03:45, Jeff Garzik wrote: >>> Submit a patch... >> You mean, something like this one? ... > [ 633.449961] md: md1: sync done. > [ 633.456070] RAID5 conf printout: > [ 633.456117] --- rd:9 wd:9 fd:0 ... > [ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 > [ 1872.338239] ata6: status=0xd0 { Busy } > [ 5749.285084] ata8: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 > [ 5749.285138] ata8: status=0xd0 { Busy } > [ 5906.008461] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 > [ 5906.008515] ata6: status=0xd0 { Busy } ... > This is with 2.6.16-rc3, your patch, and running nine Maxtors disks > over onboard nForce4 and MV88SX6081 8-port SATA II PCI-X Controller (rev 09). > > for i in `seq 10` > do dd if=/dev/zero of=bigfile.$i bs=1024k count=10000 > done > md5sum bigfile.* > > The errors mostly seem to happen during the md5sum (not during the dd). SCSI opcode 0x2a is WRITE_10, so the errors are being reported in response to the writes to bigfile.$i. But these are different from the previously reported error status values -- I wonder why it's getting "Busy" back as a status here ?? ^ permalink raw reply [flat|nested] 147+ messages in thread
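[Editor's sketch] The two values discussed above, the SCSI opcode (op=0x2a) and the ATA status byte (0xd0), can be decoded by hand. The helper below is a small user-space illustration, not kernel code. The function names `scsi_op_name` and `ata_status_names` are hypothetical. The bit names "Busy", "DriveReady", "SeekComplete", "Index" and "Error" are taken from the dmesg output in this thread; "DeviceFault", "DataRequest" and "CorrectedError" are my assumption about how the remaining bits would be labelled.

```c
#include <stdio.h>
#include <string.h>

/* Name a few common SCSI opcodes; anything else is reported as unknown. */
static const char *scsi_op_name(unsigned char op)
{
    switch (op) {
    case 0x28: return "READ_10";
    case 0x2a: return "WRITE_10";
    case 0x35: return "SYNCHRONIZE_CACHE";
    default:   return "unknown";
    }
}

/*
 * Write the names of the set ATA status bits into buf, space separated.
 * When Busy (0x80) is set the other bits are not reported, which mirrors
 * the "status=0xd0 { Busy }" lines in the logs.
 */
static char *ata_status_names(unsigned char stat, char *buf, size_t len)
{
    static const struct {
        unsigned char bit;
        const char *name;
    } bits[] = {
        { 0x40, "DriveReady" },
        { 0x20, "DeviceFault" },
        { 0x10, "SeekComplete" },
        { 0x08, "DataRequest" },
        { 0x04, "CorrectedError" },
        { 0x02, "Index" },
        { 0x01, "Error" },
    };
    size_t i;

    if (len == 0)
        return buf;
    buf[0] = '\0';

    if (stat & 0x80) {
        snprintf(buf, len, "Busy");
        return buf;
    }
    for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
        if (!(stat & bits[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, bits[i].name, len - strlen(buf) - 1);
    }
    return buf;
}
```

Fed the values from this thread, 0x2a names WRITE_10, 0xd0 decodes to "Busy", 0x51 to "DriveReady SeekComplete Error" and 0x53 to "DriveReady SeekComplete Index Error".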
* Re: LibPATA code issues / 2.6.15.4 2006-02-18 21:42 ` Mark Lord @ 2006-02-18 21:51 ` Justin Piszcz 2006-02-19 7:14 ` Sander 1 sibling, 0 replies; 147+ messages in thread From: Justin Piszcz @ 2006-02-18 21:51 UTC (permalink / raw) To: Mark Lord; +Cc: sander, Jeff Garzik, linux-kernel, IDE/ATA development list $ for i in `seq 10` > do dd if=/dev/zero of=bigfile.$i bs=1024k count=10000 > done 10000+0 records in 10000+0 records out 10485760000 bytes transferred in 190.997693 seconds (54899930 bytes/sec) 10000+0 records in 10000+0 records out 10485760000 bytes transferred in 212.242724 seconds (49404568 bytes/sec) 10000+0 records in 10000+0 records out 10485760000 bytes transferred in 189.324450 seconds (55385134 bytes/sec) 10000+0 records in 10000+0 records out 10485760000 bytes transferred in 190.280352 seconds (55106898 bytes/sec) 10000+0 records in 10000+0 records out 10485760000 bytes transferred in 191.567239 seconds (54736708 bytes/sec) 10000+0 records in 10000+0 records out 10485760000 bytes transferred in 183.640928 seconds (57099254 bytes/sec) 10000+0 records in 10000+0 records out 10485760000 bytes transferred in 179.974098 seconds (58262606 bytes/sec) 10000+0 records in 10000+0 records out 10485760000 bytes transferred in 190.126087 seconds (55151611 bytes/sec) 10000+0 records in 10000+0 records out 10485760000 bytes transferred in 192.227807 seconds (54548612 bytes/sec) 10000+0 records in 10000+0 records out 10485760000 bytes transferred in 185.309607 seconds (56585086 bytes/sec) war@p34:/x4$ md5sum bigfile.* 26f56024ac39cdc54b228820107f040d bigfile.1 26f56024ac39cdc54b228820107f040d bigfile.10 26f56024ac39cdc54b228820107f040d bigfile.2 26f56024ac39cdc54b228820107f040d bigfile.3 26f56024ac39cdc54b228820107f040d bigfile.4 26f56024ac39cdc54b228820107f040d bigfile.5 26f56024ac39cdc54b228820107f040d bigfile.6 26f56024ac39cdc54b228820107f040d bigfile.7 26f56024ac39cdc54b228820107f040d bigfile.8 26f56024ac39cdc54b228820107f040d bigfile.9 No errors in dmesg yet 
(for my issue). On Sat, 18 Feb 2006, Mark Lord wrote: > Sander wrote: >> Mark Lord wrote (ao): >>> On Friday 17 February 2006 03:45, Jeff Garzik wrote: >>>> Submit a patch... >>> You mean, something like this one? > ... >> [ 633.449961] md: md1: sync done. >> [ 633.456070] RAID5 conf printout: >> [ 633.456117] --- rd:9 wd:9 fd:0 > ... >> [ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI >> SK/ASC/ASCQ 0xb/47/00 >> [ 1872.338239] ata6: status=0xd0 { Busy } >> [ 5749.285084] ata8: translated op=0x2a ATA stat/err 0xd0/00 to SCSI >> SK/ASC/ASCQ 0xb/47/00 >> [ 5749.285138] ata8: status=0xd0 { Busy } >> [ 5906.008461] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI >> SK/ASC/ASCQ 0xb/47/00 >> [ 5906.008515] ata6: status=0xd0 { Busy } > ... >> This is with 2.6.16-rc3, your patch, and running nine Maxtors disks >> over onboard nForce4 and MV88SX6081 8-port SATA II PCI-X Controller (rev >> 09). >> >> for i in `seq 10` >> do dd if=/dev/zero of=bigfile.$i bs=1024k count=10000 >> done >> md5sum bigfile.* >> >> The errors mostly seem to happen during the md5sum (not during the dd). > > SCSI opcode 0x2a is WRITE_10, so the errors are being reported > in response to the writes to bigfile.$i. But these are different > from the previously reported error status values -- I wonder why > it's getting "Busy" back as a status here ?? > ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-18 21:42 ` Mark Lord 2006-02-18 21:51 ` Justin Piszcz @ 2006-02-19 7:14 ` Sander 2006-02-19 15:30 ` Mark Lord 1 sibling, 1 reply; 147+ messages in thread From: Sander @ 2006-02-19 7:14 UTC (permalink / raw) To: Mark Lord Cc: sander, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list Mark Lord wrote (ao): > Sander wrote: > >Mark Lord wrote (ao): > >>On Friday 17 February 2006 03:45, Jeff Garzik wrote: > >>>Submit a patch... > >>You mean, something like this one? > ... > >[ 633.449961] md: md1: sync done. > >[ 633.456070] RAID5 conf printout: > >[ 633.456117] --- rd:9 wd:9 fd:0 > ... > >[ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI > >SK/ASC/ASCQ 0xb/47/00 > >[ 1872.338239] ata6: status=0xd0 { Busy } > >[ 5749.285084] ata8: translated op=0x2a ATA stat/err 0xd0/00 to SCSI > >SK/ASC/ASCQ 0xb/47/00 > >[ 5749.285138] ata8: status=0xd0 { Busy } > >[ 5906.008461] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI > >SK/ASC/ASCQ 0xb/47/00 > >[ 5906.008515] ata6: status=0xd0 { Busy } > ... > >This is with 2.6.16-rc3, your patch, and running nine Maxtors disks > >over onboard nForce4 and MV88SX6081 8-port SATA II PCI-X Controller (rev > >09). > > > >for i in `seq 10` > >do dd if=/dev/zero of=bigfile.$i bs=1024k count=10000 > >done > >md5sum bigfile.* > > > >The errors mostly seem to happen during the md5sum (not during the dd). > > SCSI opcode 0x2a is WRITE_10, so the errors are being reported > in response to the writes to bigfile.$i. Ah, my bad then. > But these are different from the previously reported error status > values -- I wonder why it's getting "Busy" back as a status here ?? Well, as I wrote, I am not the original reporter whose thread you responded to with your patch. I just thought I could use it to get better error messages for my bug reports. I am using the sata_mv driver, which is beta. That might explain why it does not behave quite as you expected.
I have no clue anyway :-) I hope my reports are of any use to Jeff wrt the sata_mv driver. Thank you for your response. Sander -- Humilis IT Services and Solutions http://www.humilis.net ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-19 7:14 ` Sander @ 2006-02-19 15:30 ` Mark Lord 2006-02-19 17:16 ` Sander 0 siblings, 1 reply; 147+ messages in thread From: Mark Lord @ 2006-02-19 15:30 UTC (permalink / raw) To: sander; +Cc: Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list Sander wrote: > Mark Lord wrote (ao): >> Sander wrote: >>> Mark Lord wrote (ao): >>>> On Friday 17 February 2006 03:45, Jeff Garzik wrote: >>>>> Submit a patch... >>>> You mean, something like this one? >> ... >>> [ 633.449961] md: md1: sync done. >>> [ 633.456070] RAID5 conf printout: >>> [ 633.456117] --- rd:9 wd:9 fd:0 >> ... >>> [ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI >>> SK/ASC/ASCQ 0xb/47/00 >>> [ 1872.338239] ata6: status=0xd0 { Busy } >>> [ 5749.285084] ata8: translated op=0x2a ATA stat/err 0xd0/00 to SCSI >>> SK/ASC/ASCQ 0xb/47/00 >>> [ 5749.285138] ata8: status=0xd0 { Busy } >>> [ 5906.008461] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI >>> SK/ASC/ASCQ 0xb/47/00 >>> [ 5906.008515] ata6: status=0xd0 { Busy } ... >> SCSI opcode 0x2a is WRITE_10, so the errors are being reported >> in response to the writes to bigfile.$i. ... > I am using the sata_mv driver, which is beta. That might explain why it > behaves not totally as expected in your eyes. I have no clue anyway :-) Ahh.. that's useful to know. I expect to be taking a long hard look at the innards of the sata_mv code in the near future, so whatever is wrong here just might get fixed soon. Cheers ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-19 15:30 ` Mark Lord @ 2006-02-19 17:16 ` Sander 2006-07-06 23:08 ` Justin Piszcz 0 siblings, 1 reply; 147+ messages in thread From: Sander @ 2006-02-19 17:16 UTC (permalink / raw) To: Mark Lord Cc: sander, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list Mark Lord wrote (ao): > Sander wrote: > >Mark Lord wrote (ao): > >>Sander wrote: > >>>Mark Lord wrote (ao): > >>>>On Friday 17 February 2006 03:45, Jeff Garzik wrote: > >>>>>Submit a patch... > >>>>You mean, something like this one? > >>... > >>>[ 633.449961] md: md1: sync done. > >>>[ 633.456070] RAID5 conf printout: > >>>[ 633.456117] --- rd:9 wd:9 fd:0 > >>... > >>>[ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI > >>>SK/ASC/ASCQ 0xb/47/00 > >>>[ 1872.338239] ata6: status=0xd0 { Busy } > >>>[ 5749.285084] ata8: translated op=0x2a ATA stat/err 0xd0/00 to SCSI > >>>SK/ASC/ASCQ 0xb/47/00 > >>>[ 5749.285138] ata8: status=0xd0 { Busy } > >>>[ 5906.008461] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI > >>>SK/ASC/ASCQ 0xb/47/00 > >>>[ 5906.008515] ata6: status=0xd0 { Busy } > ... > >>SCSI opcode 0x2a is WRITE_10, so the errors are being reported > >>in response to the writes to bigfile.$i. > ... > >I am using the sata_mv driver, which is beta. That might explain why it > >behaves not totally as expected in your eyes. I have no clue anyway :-) > > Ahh.. that's useful to know. I'm sorry for omitting that information in my previous mail. > I expect to be taking a long hard look at the innards of the sata_mv > code in the near future, so whatever is wrong here just might get > fixed soon. Consider me your happy and willing patch test victim :-) I can easily reproduce data corruption with sata_mv. FWIW, I like this card very much. It is cheap, seems to perform well, and Marvell seems to be Linux friendly, providing the docs (according to http://linux-ata.org/sata-status.html#marvell). 
I'm not subscribed to linux-ide, but am to linux-kernel. If you post it there (or cc me) I'll see and try it. Sander -- Humilis IT Services and Solutions http://www.humilis.net ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-19 17:16 ` Sander @ 2006-07-06 23:08 ` Justin Piszcz 2006-07-07 13:08 ` Mark Lord 0 siblings, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-07-06 23:08 UTC (permalink / raw) To: Sander; +Cc: Mark Lord, Jeff Garzik, linux-kernel, IDE/ATA development list Look at this: From smartctl, look at the correspondence: 199 UDMA_CRC_Error_Count 0x000a 200 253 000 Old_age Always - 4 [4301946.802000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4301946.802000] ata4: status=0x51 { DriveReady SeekComplete Error } [4301946.802000] ata4: error=0x04 { DriveStatusError } [4302380.482000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4302380.482000] ata4: status=0x51 { DriveReady SeekComplete Error } [4302380.482000] ata4: error=0x04 { DriveStatusError } [4302493.664000] ata4: no sense translation for status: 0x51 [4302493.664000] ata4: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0xb/00/00 [4302493.664000] ata4: status=0x51 { DriveReady SeekComplete Error } [4302863.673000] ata4: no sense translation for status: 0x51 [4302863.673000] ata4: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0xb/00/00 [4302863.673000] ata4: status=0x51 { DriveReady SeekComplete Error } different drive, different cable, same controller, but second port So that Stat/err = UDMA_CRC_Error_Count! Not sure if we can fix what is causing it (in Linux) but just FYI. ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-07-06 23:08 ` Justin Piszcz @ 2006-07-07 13:08 ` Mark Lord 2006-07-07 13:24 ` Justin Piszcz 0 siblings, 1 reply; 147+ messages in thread From: Mark Lord @ 2006-07-07 13:08 UTC (permalink / raw) To: Justin Piszcz, Sander; +Cc: Jeff Garzik, linux-kernel, IDE/ATA development list Justin Piszcz wrote: > Look at this: > >> From smartctl, look at the correspondence: > 199 UDMA_CRC_Error_Count 0x000a 200 253 000 Old_age Always > - 4 > > [4301946.802000] ata4: translated ATA stat/err 0x51/04 to SCSI > SK/ASC/ASCQ 0xb/00/00 > [4301946.802000] ata4: status=0x51 { DriveReady SeekComplete Error } > [4301946.802000] ata4: error=0x04 { DriveStatusError } > [4302380.482000] ata4: translated ATA stat/err 0x51/04 to SCSI > SK/ASC/ASCQ 0xb/00/00 > [4302380.482000] ata4: status=0x51 { DriveReady SeekComplete Error } > [4302380.482000] ata4: error=0x04 { DriveStatusError } > [4302493.664000] ata4: no sense translation for status: 0x51 > [4302493.664000] ata4: translated ATA stat/err 0x51/00 to SCSI > SK/ASC/ASCQ 0xb/00/00 > [4302493.664000] ata4: status=0x51 { DriveReady SeekComplete Error } > [4302863.673000] ata4: no sense translation for status: 0x51 > [4302863.673000] ata4: translated ATA stat/err 0x51/00 to SCSI > SK/ASC/ASCQ 0xb/00/00 > [4302863.673000] ata4: status=0x51 { DriveReady SeekComplete Error } > > different drive, different cable, same controller, but second port > > So that Stat/err = UDMA_CRC_Error_Count! No, I don't think it is -- there's a bit in the drive status for indicating CRC errors, and it is not showing up here. I think it's still just libata sending some command that this drive does not implement. You really need to dump out the failed ATA opcode. 
I *think* this (uncompiled, untested) patch may do it for you on 2.6.16/17: --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-19 10:37:03.000000000 -0400 +++ linux/drivers/scsi/libata-scsi.c 2006-07-07 09:06:57.000000000 -0400 @@ -542,6 +542,7 @@ struct ata_taskfile *tf = &qc->tf; unsigned char *sb = cmd->sense_buffer; unsigned char *desc = sb + 8; + unsigned char ata_op = tf->command; memset(sb, 0, SCSI_SENSE_BUFFERSIZE); @@ -558,6 +559,7 @@ * onto sense key, asc & ascq. */ if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { + printk(KERN_WARN "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op); ata_to_sense_error(qc->ap->id, tf->command, tf->feature, &sb[1], &sb[2], &sb[3]); sb[1] &= 0x0f; ^ permalink raw reply [flat|nested] 147+ messages in thread
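[Editor's sketch] Mark's argument that these are not CRC errors can be checked against the error byte in the logs. As far as I understand the ATA register layout, the interface-CRC flag is bit 7 (0x80) of the error register and the command-aborted flag is bit 2 (0x04); the macro and function names below are hypothetical, chosen only for this illustration.

```c
/* ATA error register bits relevant to this thread (assumed layout). */
#define ERR_ICRC 0x80   /* interface CRC error during an Ultra DMA transfer */
#define ERR_UNC  0x40   /* uncorrectable data error */
#define ERR_IDNF 0x10   /* sector ID not found */
#define ERR_ABRT 0x04   /* command aborted, shown as DriveStatusError in dmesg */

/* Non-zero if the error byte reports an interface CRC error. */
static int ata_err_has_crc(unsigned char err)
{
    return (err & ERR_ICRC) != 0;
}

/* Non-zero if the only thing the drive reported is an aborted command. */
static int ata_err_is_plain_abort(unsigned char err)
{
    return err == ERR_ABRT;
}
```

For the error=0x04 lines quoted above, `ata_err_has_crc()` is false and `ata_err_is_plain_abort()` is true, which fits the reading that the drive rejected a command rather than saw a bad transfer. A value such as 0x84 would be needed for the CRC flag to be set.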
* Re: LibPATA code issues / 2.6.15.4 2006-07-07 13:08 ` Mark Lord @ 2006-07-07 13:24 ` Justin Piszcz 2006-07-07 13:43 ` Mark Lord 0 siblings, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-07-07 13:24 UTC (permalink / raw) To: Mark Lord; +Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list On Fri, 7 Jul 2006, Mark Lord wrote: > Justin Piszcz wrote: >> Look at this: >> >>> From smartctl, look at the correspondence: >> 199 UDMA_CRC_Error_Count 0x000a 200 253 000 Old_age Always >> - 4 >> >> [4301946.802000] ata4: translated ATA stat/err 0x51/04 to SCSI >> SK/ASC/ASCQ 0xb/00/00 >> [4301946.802000] ata4: status=0x51 { DriveReady SeekComplete Error } >> [4301946.802000] ata4: error=0x04 { DriveStatusError } >> [4302380.482000] ata4: translated ATA stat/err 0x51/04 to SCSI >> SK/ASC/ASCQ 0xb/00/00 >> [4302380.482000] ata4: status=0x51 { DriveReady SeekComplete Error } >> [4302380.482000] ata4: error=0x04 { DriveStatusError } >> [4302493.664000] ata4: no sense translation for status: 0x51 >> [4302493.664000] ata4: translated ATA stat/err 0x51/00 to SCSI >> SK/ASC/ASCQ 0xb/00/00 >> [4302493.664000] ata4: status=0x51 { DriveReady SeekComplete Error } >> [4302863.673000] ata4: no sense translation for status: 0x51 >> [4302863.673000] ata4: translated ATA stat/err 0x51/00 to SCSI >> SK/ASC/ASCQ 0xb/00/00 >> [4302863.673000] ata4: status=0x51 { DriveReady SeekComplete Error } >> >> different drive, different cable, same controller, but second port >> >> So that Stat/err = UDMA_CRC_Error_Count! > > No, I don't think it is -- there's a bit in the drive status > for indicating CRC errors, and it is not showing up here. > > I think it's still just libata sending some command that this > drive does not implement. You really need to dump out the failed > ATA opcode. 
> > I *think* this (uncompiled, untested) patch may do it for you on 2.6.16/17: > > --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-19 10:37:03.000000000 -0400 > +++ linux/drivers/scsi/libata-scsi.c 2006-07-07 09:06:57.000000000 -0400 > @@ -542,6 +542,7 @@ > struct ata_taskfile *tf = &qc->tf; > unsigned char *sb = cmd->sense_buffer; > unsigned char *desc = sb + 8; > + unsigned char ata_op = tf->command; > > memset(sb, 0, SCSI_SENSE_BUFFERSIZE); > > @@ -558,6 +559,7 @@ > * onto sense key, asc & ascq. > */ > if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { > + printk(KERN_WARN "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op); > ata_to_sense_error(qc->ap->id, tf->command, tf->feature, > &sb[1], &sb[2], &sb[3]); > sb[1] &= 0x0f; > I had to change KERN_WARN -> KERN_WARNING, and then got more errors. The patch never worked for me; even when I had gotten it to work in 2.6.15.4, it never showed me what I wanted to see. drivers/scsi/libata-scsi.c: In function 'ata_gen_fixed_sense': drivers/scsi/libata-scsi.c:638: error: 'ata_op' undeclared (first use in this function) drivers/scsi/libata-scsi.c:638: error: (Each undeclared identifier is reported only once drivers/scsi/libata-scsi.c:638: error: for each function it appears in.) make[2]: *** [drivers/scsi/libata-scsi.o] Error 1 Do you know who wrote the original patch? ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-07-07 13:24 ` Justin Piszcz @ 2006-07-07 13:43 ` Mark Lord 2006-07-07 13:48 ` Justin Piszcz ` (2 more replies) 0 siblings, 3 replies; 147+ messages in thread From: Mark Lord @ 2006-07-07 13:43 UTC (permalink / raw) To: Justin Piszcz; +Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list Justin Piszcz wrote: > > had to change > > KERN_WARN -> KERN_WARNING > > then more errors Eh? After fixing the KERN_WARN -> KERN_WARNING part, the patch compiles / links cleanly here on 2.6.17. (fixed copy below). Still untested, though. > do you know who wrote the original patch? I did. Cheers --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-19 10:37:03.000000000 -0400 +++ linux/drivers/scsi/libata-scsi.c 2006-07-07 09:06:57.000000000 -0400 @@ -542,6 +542,7 @@ struct ata_taskfile *tf = &qc->tf; unsigned char *sb = cmd->sense_buffer; unsigned char *desc = sb + 8; + unsigned char ata_op = tf->command; memset(sb, 0, SCSI_SENSE_BUFFERSIZE); @@ -558,6 +559,7 @@ * onto sense key, asc & ascq. */ if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { + printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op); ata_to_sense_error(qc->ap->id, tf->command, tf->feature, &sb[1], &sb[2], &sb[3]); sb[1] &= 0x0f; ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-07-07 13:43 ` Mark Lord @ 2006-07-07 13:48 ` Justin Piszcz 2006-07-07 14:01 ` Justin Piszcz 2006-07-07 14:35 ` Justin Piszcz 2 siblings, 0 replies; 147+ messages in thread From: Justin Piszcz @ 2006-07-07 13:48 UTC (permalink / raw) To: Mark Lord; +Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list On Fri, 7 Jul 2006, Mark Lord wrote: > Justin Piszcz wrote: >> >> had to change >> >> KERN_WARN -> KERN_WARNING >> >> then more errors > > Eh? After fixing the KERN_WARN -> KERN_WARNING part, > the patch compiles / links cleanly here on 2.6.17. > (fixed copy below). Still untested, though. > >> do you know who wrote the original patch? > > I did. > > Cheers > > --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-19 10:37:03.000000000 -0400 > +++ linux/drivers/scsi/libata-scsi.c 2006-07-07 09:06:57.000000000 -0400 > @@ -542,6 +542,7 @@ > struct ata_taskfile *tf = &qc->tf; > unsigned char *sb = cmd->sense_buffer; > unsigned char *desc = sb + 8; > + unsigned char ata_op = tf->command; > > memset(sb, 0, SCSI_SENSE_BUFFERSIZE); > > @@ -558,6 +559,7 @@ > * onto sense key, asc & ascq. > */ > if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { > + printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op); > ata_to_sense_error(qc->ap->id, tf->command, tf->feature, > &sb[1], &sb[2], &sb[3]); > sb[1] &= 0x0f; > Applied patch, rebooting, waiting to get the error again. ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-07-07 13:43 ` Mark Lord 2006-07-07 13:48 ` Justin Piszcz @ 2006-07-07 14:01 ` Justin Piszcz 2006-07-07 14:35 ` Justin Piszcz 2 siblings, 0 replies; 147+ messages in thread From: Justin Piszcz @ 2006-07-07 14:01 UTC (permalink / raw) To: Mark Lord; +Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list On Fri, 7 Jul 2006, Mark Lord wrote: > Justin Piszcz wrote: >> >> had to change >> >> KERN_WARN -> KERN_WARNING >> >> then more errors > > Eh? After fixing the KERN_WARN -> KERN_WARNING part, > the patch compiles / links cleanly here on 2.6.17. > (fixed copy below). Still untested, though. > >> do you know who wrote the original patch? > > I did. > > Cheers > > --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-19 10:37:03.000000000 -0400 > +++ linux/drivers/scsi/libata-scsi.c 2006-07-07 09:06:57.000000000 -0400 > @@ -542,6 +542,7 @@ > struct ata_taskfile *tf = &qc->tf; > unsigned char *sb = cmd->sense_buffer; > unsigned char *desc = sb + 8; > + unsigned char ata_op = tf->command; > > memset(sb, 0, SCSI_SENSE_BUFFERSIZE); > > @@ -558,6 +559,7 @@ > * onto sense key, asc & ascq. > */ > if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { > + printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op); > ata_to_sense_error(qc->ap->id, tf->command, tf->feature, > &sb[1], &sb[2], &sb[3]); > sb[1] &= 0x0f; > Mark, I've set a disk faulty in my SW RAID5 and rebuilding it now, note, in the past two rebuilds I have done (in exact same manner & disk) I've gotten 3-4 of these or so, so if I do not get them this time, that will be extremely odd. Justin. ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-07-07 13:43 ` Mark Lord 2006-07-07 13:48 ` Justin Piszcz 2006-07-07 14:01 ` Justin Piszcz @ 2006-07-07 14:35 ` Justin Piszcz 2006-07-07 18:53 ` Justin Piszcz 2 siblings, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-07-07 14:35 UTC (permalink / raw) To: Mark Lord; +Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list On Fri, 7 Jul 2006, Mark Lord wrote: > Justin Piszcz wrote: >> >> had to change >> >> KERN_WARN -> KERN_WARNING >> >> then more errors > > Eh? After fixing the KERN_WARN -> KERN_WARNING part, > the patch compiles / links cleanly here on 2.6.17. > (fixed copy below). Still untested, though. > >> do you know who wrote the original patch? > > I did. > > Cheers > > --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-19 10:37:03.000000000 -0400 > +++ linux/drivers/scsi/libata-scsi.c 2006-07-07 09:06:57.000000000 -0400 > @@ -542,6 +542,7 @@ > struct ata_taskfile *tf = &qc->tf; > unsigned char *sb = cmd->sense_buffer; > unsigned char *desc = sb + 8; > + unsigned char ata_op = tf->command; > > memset(sb, 0, SCSI_SENSE_BUFFERSIZE); > > @@ -558,6 +559,7 @@ > * onto sense key, asc & ascq. > */ > if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { > + printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op); > ata_to_sense_error(qc->ap->id, tf->command, tf->feature, > &sb[1], &sb[2], &sb[3]); > sb[1] &= 0x0f; > Mark!! 
It did it again, here you go: ==> /p34/var/log/messages <== Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { DriveReady SeekComplete Index Error } Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { DriveStatusError } ==> /p34/var/log/kern.log <== Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: translated ATA stat/err 0x53/04 to SCSI SK/ASC/ASCQ 0xb/00/00 Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { DriveReady SeekComplete Index Error } Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { DriveStatusError } Does this help? Can we eliminate the cause of these errors now? ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-07-07 14:35 ` Justin Piszcz @ 2006-07-07 18:53 ` Justin Piszcz 2006-07-07 19:19 ` Jeff Garzik 0 siblings, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-07-07 18:53 UTC (permalink / raw) To: Mark Lord Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list, Alan Cox On Fri, 7 Jul 2006, Justin Piszcz wrote: > > > On Fri, 7 Jul 2006, Mark Lord wrote: > >> Justin Piszcz wrote: >>> >>> had to change >>> >>> KERN_WARN -> KERN_WARNING >>> >>> then more errors >> >> Eh? After fixing the KERN_WARN -> KERN_WARNING part, >> the patch compiles / links cleanly here on 2.6.17. >> (fixed copy below). Still untested, though. >> >>> do you know who wrote the original patch? >> >> I did. >> >> Cheers >> >> --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-19 10:37:03.000000000 >> -0400 >> +++ linux/drivers/scsi/libata-scsi.c 2006-07-07 09:06:57.000000000 -0400 >> @@ -542,6 +542,7 @@ >> struct ata_taskfile *tf = &qc->tf; >> unsigned char *sb = cmd->sense_buffer; >> unsigned char *desc = sb + 8; >> + unsigned char ata_op = tf->command; >> >> memset(sb, 0, SCSI_SENSE_BUFFERSIZE); >> >> @@ -558,6 +559,7 @@ >> * onto sense key, asc & ascq. >> */ >> if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { >> + printk(KERN_WARNING "ata_gen_ata_desc_sense: failed >> ata_op=0x%02x\n", ata_op); >> ata_to_sense_error(qc->ap->id, tf->command, tf->feature, >> &sb[1], &sb[2], &sb[3]); >> sb[1] &= 0x0f; >> > > Mark!! 
It did it again, here you go: > > ==> /p34/var/log/messages <== > Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { DriveReady > SeekComplete Index Error } > Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { > DriveStatusError } > ==> /p34/var/log/kern.log <== > Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: translated ATA stat/err > 0x53/04 to SCSI SK/ASC/ASCQ 0xb/00/00 > Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { DriveReady > SeekComplete Index Error } > Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { > DriveStatusError } > > Does this help? > > Can we eliminate the cause of these errors now? > > Jeff or Alan, Does that ATA translation help in determining what *bad* commands are being sent to the drive? This occurs on two separate identical disks. Justin. ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-07-07 18:53 ` Justin Piszcz @ 2006-07-07 19:19 ` Jeff Garzik 2006-07-07 19:28 ` Justin Piszcz 0 siblings, 1 reply; 147+ messages in thread From: Jeff Garzik @ 2006-07-07 19:19 UTC (permalink / raw) To: Justin Piszcz Cc: Mark Lord, Sander, linux-kernel, IDE/ATA development list, Alan Cox Justin Piszcz wrote: > > > On Fri, 7 Jul 2006, Justin Piszcz wrote: > >> >> >> On Fri, 7 Jul 2006, Mark Lord wrote: >> >>> Justin Piszcz wrote: >>>> >>>> had to change >>>> >>>> KERN_WARN -> KERN_WARNING >>>> >>>> then more errors >>> >>> Eh? After fixing the KERN_WARN -> KERN_WARNING part, >>> the patch compiles / links cleanly here on 2.6.17. >>> (fixed copy below). Still untested, though. >>> >>>> do you know who wrote the original patch? >>> >>> I did. >>> >>> Cheers >>> >>> --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-19 >>> 10:37:03.000000000 -0400 >>> +++ linux/drivers/scsi/libata-scsi.c 2006-07-07 09:06:57.000000000 >>> -0400 >>> @@ -542,6 +542,7 @@ >>> struct ata_taskfile *tf = &qc->tf; >>> unsigned char *sb = cmd->sense_buffer; >>> unsigned char *desc = sb + 8; >>> + unsigned char ata_op = tf->command; >>> >>> memset(sb, 0, SCSI_SENSE_BUFFERSIZE); >>> >>> @@ -558,6 +559,7 @@ >>> * onto sense key, asc & ascq. >>> */ >>> if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { >>> + printk(KERN_WARNING "ata_gen_ata_desc_sense: failed >>> ata_op=0x%02x\n", ata_op); >>> ata_to_sense_error(qc->ap->id, tf->command, tf->feature, >>> &sb[1], &sb[2], &sb[3]); >>> sb[1] &= 0x0f; >>> >> >> Mark!! 
It did it again, here you go: >> >> ==> /p34/var/log/messages <== >> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { >> DriveReady SeekComplete Index Error } >> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { >> DriveStatusError } >> ==> /p34/var/log/kern.log <== >> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: translated ATA >> stat/err 0x53/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { >> DriveReady SeekComplete Index Error } >> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { >> DriveStatusError } >> >> Does this help? >> >> Can we eliminate the cause of these errors now? >> >> > > Jeff or Alan, > > Does that ATA translation help in determining what *bad* commands are > being sent to the drive? No, it needs the patch that Mark has been posting... Jeff ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-07-07 19:19 ` Jeff Garzik @ 2006-07-07 19:28 ` Justin Piszcz [not found] ` <200607091224.31451.liml@rtr.ca> 0 siblings, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-07-07 19:28 UTC (permalink / raw) To: Jeff Garzik Cc: Mark Lord, Sander, linux-kernel, IDE/ATA development list, Alan Cox On Fri, 7 Jul 2006, Jeff Garzik wrote: > Justin Piszcz wrote: >> >> >> On Fri, 7 Jul 2006, Justin Piszcz wrote: >> >>> >>> >>> On Fri, 7 Jul 2006, Mark Lord wrote: >>> >>>> Justin Piszcz wrote: >>>>> >>>>> had to change >>>>> >>>>> KERN_WARN -> KERN_WARNING >>>>> >>>>> then more errors >>>> >>>> Eh? After fixing the KERN_WARN -> KERN_WARNING part, >>>> the patch compiles / links cleanly here on 2.6.17. >>>> (fixed copy below). Still untested, though. >>>> >>>>> do you know who wrote the original patch? >>>> >>>> I did. >>>> >>>> Cheers >>>> >>>> --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-19 >>>> 10:37:03.000000000 -0400 >>>> +++ linux/drivers/scsi/libata-scsi.c 2006-07-07 09:06:57.000000000 >>>> -0400 >>>> @@ -542,6 +542,7 @@ >>>> struct ata_taskfile *tf = &qc->tf; >>>> unsigned char *sb = cmd->sense_buffer; >>>> unsigned char *desc = sb + 8; >>>> + unsigned char ata_op = tf->command; >>>> >>>> memset(sb, 0, SCSI_SENSE_BUFFERSIZE); >>>> >>>> @@ -558,6 +559,7 @@ >>>> * onto sense key, asc & ascq. >>>> */ >>>> if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { >>>> + printk(KERN_WARNING "ata_gen_ata_desc_sense: failed >>>> ata_op=0x%02x\n", ata_op); >>>> ata_to_sense_error(qc->ap->id, tf->command, tf->feature, >>>> &sb[1], &sb[2], &sb[3]); >>>> sb[1] &= 0x0f; >>>> >>> >>> Mark!! 
It did it again, here you go: >>> >>> ==> /p34/var/log/messages <== >>> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { >>> DriveReady SeekComplete Index Error } >>> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { >>> DriveStatusError } >>> ==> /p34/var/log/kern.log <== >>> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: translated ATA stat/err >>> 0x53/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >>> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { >>> DriveReady SeekComplete Index Error } >>> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { >>> DriveStatusError } >>> >>> Does this help? >>> >>> Can we eliminate the cause of these errors now? >>> >>> >> >> Jeff or Alan, >> >> Does that ATA translation help in determining what *bad* commands are being >> sent to the drive? > > No, it needs the patch that Mark has been posting... > > Jeff > > > Jeff, the patch is applied and box booted the new kernel and I reproduced the error messages, THAT is what is produced with the patch. 
Without the patch:

Jun 18 07:09:53 p34 kernel: [4297678.777000] ata3: status=0x51 { DriveReady SeekComplete Error }
Jun 18 07:09:53 p34 kernel: [4297678.777000] ata3: error=0x04 { DriveStatusError }
Jun 18 07:20:08 p34 -- MARK --
Jun 18 07:27:31 p34 kernel: [4298736.905000] ata3: status=0x51 { DriveReady SeekComplete Error }
Jun 18 07:27:31 p34 kernel: [4298736.905000] ata3: error=0x04 { DriveStatusError }

With the patch:

Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: translated ATA stat/err 0x53/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { DriveReady SeekComplete Index Error }
Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { DriveStatusError }
Jul 7 10:49:29 p34 kernel: [4298273.178000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 7 10:49:29 p34 kernel: [4298273.178000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul 7 10:49:29 p34 kernel: [4298273.178000] ata4: error=0x04 { DriveStatusError }
Jul 7 11:43:02 p34 kernel: [4301488.359000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 7 11:43:02 p34 kernel: [4301488.359000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul 7 11:43:02 p34 kernel: [4301488.359000] ata4: error=0x04 { DriveStatusError }
Jul 7 12:35:27 p34 kernel: [4304634.600000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 7 12:35:27 p34 kernel: [4304634.600000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul 7 12:35:27 p34 kernel: [4304634.600000] ata4: error=0x04 { DriveStatusError }
Jul 7 12:44:14 p34 kernel: [4305162.220000] ata4: no sense translation for status: 0x51
Jul 7 12:44:14 p34 kernel: [4305162.220000] ata4: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 7 12:44:14 p34 kernel: [4305162.220000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul 7 13:03:22 p34 kernel: [4306309.782000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 7 13:03:22 p34 kernel: [4306309.782000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul 7 13:03:22 p34 kernel: [4306309.782000] ata4: error=0x04 { DriveStatusError }
Jul 7 13:05:12 p34 kernel: [4306419.891000] ata4: no sense translation for status: 0x51
Jul 7 13:05:12 p34 kernel: [4306419.891000] ata4: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 7 13:05:12 p34 kernel: [4306419.891000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul 7 13:32:20 p34 kernel: [4308048.717000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 7 13:32:20 p34 kernel: [4308048.717000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul 7 13:32:20 p34 kernel: [4308048.717000] ata4: error=0x04 { DriveStatusError }

When I had been running it earlier with 2.6.15.x:

Mar 1 13:31:10 p34 kernel: [4295292.736000] +++PATCH: Original kernel error:
Mar 1 13:31:10 p34 kernel: [4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 1 13:31:10 p34 kernel: [4295292.736000] +++PATCH: Mark Lord's extended verbosity patch:
Mar 1 13:31:10 p34 kernel: [4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 1 13:31:10 p34 kernel: [4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error }
Mar 1 13:31:10 p34 kernel: [4295292.736000] ata3: error=0x04 { DriveStatusError }
Mar 1 13:31:10 p34 kernel: [4295292.736000] +++PATCH: Original kernel error:
Mar 1 13:31:10 p34 kernel: [4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 1 13:31:10 p34 kernel: [4295292.736000] +++PATCH: Mark Lord's extended verbosity patch:
Mar 1 13:31:10 p34 kernel: [4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 1 13:31:10 p34 kernel: [4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error }
Mar 1 13:31:10 p34 kernel: [4295292.736000] ata3: error=0x04 { DriveStatusError }

Perhaps the patch is not printing out the correct error message?

This shows that the source file was patched in libata-scsi.c.

	/*
	 * Use ata_to_sense_error() to map status register bits
	 * onto sense key, asc & ascq.
	 */
	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
		printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
				   &sb[1], &sb[2], &sb[3]);
		sb[1] &= 0x0f;
	}

This shows the kernel version.

$ cat /usr/src/linux/.version
4

This shows I am running the patched version.

$ uname -a
Linux p34.internal.lan 2.6.17.3 #4 SMP PREEMPT Fri Jul 7 09:47:53 EDT 2006 i686 GNU/Linux
$

Maybe something is blocking the opcode output from showing correctly?

Justin.

^ permalink raw reply [flat|nested] 147+ messages in thread
[parent not found: <200607091224.31451.liml@rtr.ca>]
* Re: LibPATA code issues / 2.6.15.4 [not found] ` <200607091224.31451.liml@rtr.ca> @ 2006-07-09 17:27 ` Justin Piszcz 2006-07-09 20:16 ` Justin Piszcz 0 siblings, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-07-09 17:27 UTC (permalink / raw) To: Mark Lord Cc: Jeff Garzik, Sander, linux-kernel, IDE/ATA development list, Alan Cox On Sun, 9 Jul 2006, Mark Lord wrote: > Mmm.. there are two main paths into those messages, > and my current patch only caught one of them. > > Here's a reworked version that catches the ata_op on both paths. > Maybe this will dump out the info we need to diagnose Justin's system. > > Compiles & links fine on 2.6.17, but not tested. > > Cheers > > --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-23 13:38:37.000000000 -0400 > +++ linux/drivers/scsi/libata-scsi.c 2006-07-09 12:19:52.000000000 -0400 > @@ -542,6 +542,7 @@ > struct ata_taskfile *tf = &qc->tf; > unsigned char *sb = cmd->sense_buffer; > unsigned char *desc = sb + 8; > + unsigned char ata_op = tf->command; > > memset(sb, 0, SCSI_SENSE_BUFFERSIZE); > > @@ -558,6 +559,7 @@ > * onto sense key, asc & ascq. > */ > if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { > + printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op); > ata_to_sense_error(qc->ap->id, tf->command, tf->feature, > &sb[1], &sb[2], &sb[3]); > sb[1] &= 0x0f; > @@ -617,6 +619,7 @@ > struct scsi_cmnd *cmd = qc->scsicmd; > struct ata_taskfile *tf = &qc->tf; > unsigned char *sb = cmd->sense_buffer; > + unsigned char ata_op = tf->command; > > memset(sb, 0, SCSI_SENSE_BUFFERSIZE); > > @@ -633,6 +636,7 @@ > * onto sense key, asc & ascq. > */ > if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { > + printk(KERN_WARNING "ata_gen_fixed_sense: failed ata_op=0x%02x\n", ata_op); > ata_to_sense_error(qc->ap->id, tf->command, tf->feature, > &sb[2], &sb[12], &sb[13]); > sb[2] &= 0x0f; > Thanks Mark! Applying now. ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-07-09 17:27 ` Justin Piszcz @ 2006-07-09 20:16 ` Justin Piszcz 2006-07-09 20:40 ` LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! Justin Piszcz 0 siblings, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-07-09 20:16 UTC (permalink / raw) To: Mark Lord Cc: Jeff Garzik, Sander, linux-kernel, IDE/ATA development list, Alan Cox On Sun, 9 Jul 2006, Justin Piszcz wrote: > > > On Sun, 9 Jul 2006, Mark Lord wrote: > >> Mmm.. there are two main paths into those messages, >> and my current patch only caught one of them. >> >> Here's a reworked version that catches the ata_op on both paths. >> Maybe this will dump out the info we need to diagnose Justin's system. >> >> Compiles & links fine on 2.6.17, but not tested. >> >> Cheers >> >> --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-23 13:38:37.000000000 >> -0400 >> +++ linux/drivers/scsi/libata-scsi.c 2006-07-09 12:19:52.000000000 -0400 >> @@ -542,6 +542,7 @@ >> struct ata_taskfile *tf = &qc->tf; >> unsigned char *sb = cmd->sense_buffer; >> unsigned char *desc = sb + 8; >> + unsigned char ata_op = tf->command; >> >> memset(sb, 0, SCSI_SENSE_BUFFERSIZE); >> >> @@ -558,6 +559,7 @@ >> * onto sense key, asc & ascq. >> */ >> if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { >> + printk(KERN_WARNING "ata_gen_ata_desc_sense: failed >> ata_op=0x%02x\n", ata_op); >> ata_to_sense_error(qc->ap->id, tf->command, tf->feature, >> &sb[1], &sb[2], &sb[3]); >> sb[1] &= 0x0f; >> @@ -617,6 +619,7 @@ >> struct scsi_cmnd *cmd = qc->scsicmd; >> struct ata_taskfile *tf = &qc->tf; >> unsigned char *sb = cmd->sense_buffer; >> + unsigned char ata_op = tf->command; >> >> memset(sb, 0, SCSI_SENSE_BUFFERSIZE); >> >> @@ -633,6 +636,7 @@ >> * onto sense key, asc & ascq. 
>> */ >> if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { >> + printk(KERN_WARNING "ata_gen_fixed_sense: failed >> ata_op=0x%02x\n", ata_op); >> ata_to_sense_error(qc->ap->id, tf->command, tf->feature, >> &sb[2], &sb[12], &sb[13]); >> sb[2] &= 0x0f; >> > > Thanks Mark! > > Applying now. > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > Mark, Check line 519, this is where it is printing the error (I believe) and the patch does not print the ata_op here. It is in the ata_to_sense_error() function. I've already patched, as you can see, recompiled, etc.. # patch -p0 < /tmp/b patching file linux/drivers/scsi/libata-scsi.c Reversed (or previously applied) patch detected! Assume -R? [n] # Jul 9 15:22:57 p34 kernel: [4300704.724000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 Jul 9 15:22:57 p34 kernel: [4300704.724000] ata3: status=0x51 { DriveReady SeekComplete Error } Jul 9 15:22:57 p34 kernel: [4300704.724000] ata3: error=0x04 { DriveStatusError } This part needs the ata_op: 519 translate_done: 520 printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to " 521 "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err, 522 *sk, *asc, *ascq); Justin. ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! 2006-07-09 20:16 ` Justin Piszcz @ 2006-07-09 20:40 ` Justin Piszcz 2006-07-09 20:46 ` Justin Piszcz 0 siblings, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-07-09 20:40 UTC (permalink / raw) To: Mark Lord Cc: Jeff Garzik, Sander, linux-kernel, IDE/ATA development list, Alan Cox I made my own patch (following Mark's example) but also added that printk in that function. Jul 9 16:37:52 p34 kernel: [4294810.556000] ata_gen_fixed_sense: failed ata_op=0x35 Jul 9 16:37:52 p34 kernel: [4294810.556000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 Jul 9 16:37:52 p34 kernel: [4294810.556000] ata_gen_ata_desc_sense: failed ata_op=0x51 Jul 9 16:37:52 p34 kernel: [4294810.556000] ata4: status=0x51 { DriveReady SeekComplete Error } Jul 9 16:37:52 p34 kernel: [4294810.556000] ata4: error=0x04 { DriveStatusError } Now that we have found the ata_op code of 0x35, what does this mean? Is it generated from a bad FUA/unsupported command from the kernel/SATA driver? Justin. ^ permalink raw reply [flat|nested] 147+ messages in thread
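For readers following the register values in these logs, here is a minimal sketch in userspace C (not kernel code) of how status=0x51 and status=0x53 break down into the bit names the kernel prints. The bit values are the standard ATA status register bits. The names for bits that never appear in this thread (Busy, DeviceFault, DataRequest, CorrectedError) and the helper decode_status() are illustrative assumptions, not something taken from the thread.

```c
#include <stdio.h>
#include <string.h>

/* ATA error register: 0x04 is the "command aborted" bit, which the
 * kernel log in this thread prints as DriveStatusError. */
#define ERR_ABORTED 0x04

/* ATA status register bits, highest bit first so that the output order
 * matches the order seen in the log lines. */
static const struct {
	unsigned char mask;
	const char *name;
} status_bits[] = {
	{ 0x80, "Busy" },
	{ 0x40, "DriveReady" },
	{ 0x20, "DeviceFault" },
	{ 0x10, "SeekComplete" },
	{ 0x08, "DataRequest" },
	{ 0x04, "CorrectedError" },
	{ 0x02, "Index" },
	{ 0x01, "Error" },
};

/* Hypothetical helper: write the space-separated names of all set
 * status bits into buf and return buf. Output may be truncated if
 * buf is too small. */
static char *decode_status(unsigned char status, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;

	if (len == 0)
		return buf;
	buf[0] = '\0';
	for (i = 0; i < sizeof(status_bits) / sizeof(status_bits[0]); i++) {
		int n;

		if (!(status & status_bits[i].mask))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", status_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* out of room: keep what fits */
		used += (size_t)n;
	}
	return buf;
}
```

Called with 0x51 this yields the three names seen in most of the reports above; 0x53 additionally sets the Index bit. An error register value of 0x04 has only the aborted bit set, which is why the sense translation ends up at SK 0xb (ABORTED COMMAND).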
* Re: LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! 2006-07-09 20:40 ` LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! Justin Piszcz @ 2006-07-09 20:46 ` Justin Piszcz 2006-07-09 21:05 ` Justin Piszcz 0 siblings, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-07-09 20:46 UTC (permalink / raw) To: Mark Lord Cc: Jeff Garzik, Sander, linux-kernel, IDE/ATA development list, Alan Cox On Sun, 9 Jul 2006, Justin Piszcz wrote: > I made my own patch (following Mark's example) but also added that printk in > that function. > > Jul 9 16:37:52 p34 kernel: [4294810.556000] ata_gen_fixed_sense: failed > ata_op=0x35 > Jul 9 16:37:52 p34 kernel: [4294810.556000] ata4: translated ATA stat/err > 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 > Jul 9 16:37:52 p34 kernel: [4294810.556000] ata_gen_ata_desc_sense: failed > ata_op=0x51 > Jul 9 16:37:52 p34 kernel: [4294810.556000] ata4: status=0x51 { DriveReady > SeekComplete Error } > Jul 9 16:37:52 p34 kernel: [4294810.556000] ata4: error=0x04 { > DriveStatusError } > > Now that we have found the ata_op code of 0x35, what does this mean? Is it > generated from a bad FUA/unsupported command from the kernel/SATA driver? > > Justin. > In /usr/src/linux/include/linux/ata.h: ATA_CMD_WRITE_EXT = 0x35, Perhaps these drives do not support this command or do not support it properly? Any idea, Jeff/Alan? Justin. ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! 2006-07-09 20:46 ` Justin Piszcz @ 2006-07-09 21:05 ` Justin Piszcz 2006-07-09 22:03 ` Justin Piszcz 0 siblings, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-07-09 21:05 UTC (permalink / raw) To: Mark Lord Cc: Jeff Garzik, Sander, linux-kernel, IDE/ATA development list, Alan Cox On Sun, 9 Jul 2006, Justin Piszcz wrote: > > > On Sun, 9 Jul 2006, Justin Piszcz wrote: > >> I made my own patch (following Mark's example) but also added that printk >> in that function. >> >> Jul 9 16:37:52 p34 kernel: [4294810.556000] ata_gen_fixed_sense: failed >> ata_op=0x35 >> Jul 9 16:37:52 p34 kernel: [4294810.556000] ata4: translated ATA stat/err >> 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> Jul 9 16:37:52 p34 kernel: [4294810.556000] ata_gen_ata_desc_sense: failed >> ata_op=0x51 >> Jul 9 16:37:52 p34 kernel: [4294810.556000] ata4: status=0x51 { DriveReady >> SeekComplete Error } >> Jul 9 16:37:52 p34 kernel: [4294810.556000] ata4: error=0x04 { >> DriveStatusError } >> >> Now that we have found the ata_op code of 0x35, what does this mean? Is it >> generated from a bad FUA/unsupported command from the kernel/SATA driver? >> >> Justin. >> > > In /usr/src/linux/include/linux/ata.h: > > ATA_CMD_WRITE_EXT = 0x35, > > Perhaps these drives do not support this command or do not support it > properly? > > Any idea, Jeff/Alan? > > Justin. 
> >

Here are all the errors (when reading/writing heavily):

[4294810.556000] ata_gen_fixed_sense: failed ata_op=0x35
[4294810.556000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4294810.556000] ata_gen_ata_desc_sense: failed ata_op=0x51
[4294810.556000] ata4: status=0x51 { DriveReady SeekComplete Error }
[4294810.556000] ata4: error=0x04 { DriveStatusError }
[4295514.668000] ata_gen_fixed_sense: failed ata_op=0x35
[4295514.668000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4295514.668000] ata_gen_ata_desc_sense: failed ata_op=0x51
[4295514.668000] ata3: status=0x51 { DriveReady SeekComplete Error }
[4295514.668000] ata3: error=0x04 { DriveStatusError }

Jeff/Mark, from these errors can we reach a consensus as to the cause of these errors and how to eliminate the problem?

Thanks,

Justin.

^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! 2006-07-09 21:05 ` Justin Piszcz @ 2006-07-09 22:03 ` Justin Piszcz 2006-07-10 13:59 ` Follow up? " Justin Piszcz 0 siblings, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-07-09 22:03 UTC (permalink / raw) To: Mark Lord Cc: Jeff Garzik, Sander, linux-kernel, IDE/ATA development list, Alan Cox [4294810.556000] ata_gen_ata_desc_sense: failed ata_op=0x51 [4294810.556000] ata4: status=0x51 { DriveReady SeekComplete Error } [4294810.556000] ata4: error=0x04 { DriveStatusError } [4295514.668000] ata_gen_fixed_sense: failed ata_op=0x35 [4295514.668000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4295514.668000] ata_gen_ata_desc_sense: failed ata_op=0x51 [4295514.668000] ata3: status=0x51 { DriveReady SeekComplete Error } [4295514.668000] ata3: error=0x04 { DriveStatusError } [4297033.649000] ata_gen_fixed_sense: failed ata_op=0xca [4297033.649000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4297033.649000] ata_gen_ata_desc_sense: failed ata_op=0x51 [4297033.649000] ata4: status=0x51 { DriveReady SeekComplete Error } [4297033.649000] ata4: error=0x04 { DriveStatusError } [4297741.057000] ata_gen_fixed_sense: failed ata_op=0x35 [4297741.057000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4297741.057000] ata_gen_ata_desc_sense: failed ata_op=0x51 [4297741.057000] ata4: status=0x51 { DriveReady SeekComplete Error } [4297741.057000] ata4: error=0x04 { DriveStatusError } Also got a 0xca. On Sun, 9 Jul 2006, Justin Piszcz wrote: > > > On Sun, 9 Jul 2006, Justin Piszcz wrote: > >> >> >> On Sun, 9 Jul 2006, Justin Piszcz wrote: >> >>> I made my own patch (following Mark's example) but also added that printk >>> in that function. 
>>> >>> Jul 9 16:37:52 p34 kernel: [4294810.556000] ata_gen_fixed_sense: failed >>> ata_op=0x35 >>> Jul 9 16:37:52 p34 kernel: [4294810.556000] ata4: translated ATA stat/err >>> 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >>> Jul 9 16:37:52 p34 kernel: [4294810.556000] ata_gen_ata_desc_sense: >>> failed ata_op=0x51 >>> Jul 9 16:37:52 p34 kernel: [4294810.556000] ata4: status=0x51 { >>> DriveReady SeekComplete Error } >>> Jul 9 16:37:52 p34 kernel: [4294810.556000] ata4: error=0x04 { >>> DriveStatusError } >>> >>> Now that we have found the ata_op code of 0x35, what does this mean? Is >>> it generated from a bad FUA/unsupported command from the kernel/SATA >>> driver? >>> >>> Justin. >>> >> >> In /usr/src/linux/include/linux/ata.h: >> >> ATA_CMD_WRITE_EXT = 0x35, >> >> Perhaps these drives do not support this command or do not support it >> properly? >> >> Any idea, Jeff/Alan? >> >> Justin. 
>> >> > > Here are all the errors (when reading/writing heavily): > > [4294810.556000] ata_gen_fixed_sense: failed ata_op=0x35 > [4294810.556000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ > 0xb/00/00 > [4294810.556000] ata_gen_ata_desc_sense: failed ata_op=0x51 > [4294810.556000] ata4: status=0x51 { DriveReady SeekComplete Error } > [4294810.556000] ata4: error=0x04 { DriveStatusError } > [4295514.668000] ata_gen_fixed_sense: failed ata_op=0x35 > [4295514.668000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ > 0xb/00/00 > [4295514.668000] ata_gen_ata_desc_sense: failed ata_op=0x51 > [4295514.668000] ata3: status=0x51 { DriveReady SeekComplete Error } > [4295514.668000] ata3: error=0x04 { DriveStatusError } > > Jeff/Mark, from these errors can we reach a consensus as to the cause of > these errors and how to eliminate the problem? > > Thanks, > > Justin. > ^ permalink raw reply [flat|nested] 147+ messages in thread
* Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! 2006-07-09 22:03 ` Justin Piszcz @ 2006-07-10 13:59 ` Justin Piszcz 2006-07-10 15:33 ` Alan Cox 0 siblings, 1 reply; 147+ messages in thread From: Justin Piszcz @ 2006-07-10 13:59 UTC (permalink / raw) To: Mark Lord Cc: Jeff Garzik, Sander, linux-kernel, IDE/ATA development list, Alan Cox Any follow up now that we have the failed ata-translated op codes? On Sun, 9 Jul 2006, Justin Piszcz wrote: > [4294810.556000] ata_gen_ata_desc_sense: failed ata_op=0x51 > [4294810.556000] ata4: status=0x51 { DriveReady SeekComplete Error } > [4294810.556000] ata4: error=0x04 { DriveStatusError } > [4295514.668000] ata_gen_fixed_sense: failed ata_op=0x35 > [4295514.668000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ > 0xb/00/00 > [4295514.668000] ata_gen_ata_desc_sense: failed ata_op=0x51 > [4295514.668000] ata3: status=0x51 { DriveReady SeekComplete Error } > [4295514.668000] ata3: error=0x04 { DriveStatusError } > [4297033.649000] ata_gen_fixed_sense: failed ata_op=0xca > [4297033.649000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ > 0xb/00/00 > [4297033.649000] ata_gen_ata_desc_sense: failed ata_op=0x51 > [4297033.649000] ata4: status=0x51 { DriveReady SeekComplete Error } > [4297033.649000] ata4: error=0x04 { DriveStatusError } > [4297741.057000] ata_gen_fixed_sense: failed ata_op=0x35 > [4297741.057000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ > 0xb/00/00 > [4297741.057000] ata_gen_ata_desc_sense: failed ata_op=0x51 > [4297741.057000] ata4: status=0x51 { DriveReady SeekComplete Error } > [4297741.057000] ata4: error=0x04 { DriveStatusError } > > Also got a 0xca. > > > On Sun, 9 Jul 2006, Justin Piszcz wrote: > >> >> >> On Sun, 9 Jul 2006, Justin Piszcz wrote: >> >>> >>> >>> On Sun, 9 Jul 2006, Justin Piszcz wrote: >>> >>>> I made my own patch (following Mark's example) but also added that printk >>>> in that function. 
>>>> >>>> Jul 9 16:37:52 p34 kernel: [4294810.556000] ata_gen_fixed_sense: failed >>>> ata_op=0x35 >>>> Jul 9 16:37:52 p34 kernel: [4294810.556000] ata4: translated ATA >>>> stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >>>> Jul 9 16:37:52 p34 kernel: [4294810.556000] ata_gen_ata_desc_sense: >>>> failed ata_op=0x51 >>>> Jul 9 16:37:52 p34 kernel: [4294810.556000] ata4: status=0x51 { >>>> DriveReady SeekComplete Error } >>>> Jul 9 16:37:52 p34 kernel: [4294810.556000] ata4: error=0x04 { >>>> DriveStatusError } >>>> >>>> Now that we have found the ata_op code of 0x35, what does this mean? Is >>>> it generated from a bad FUA/unsupported command from the kernel/SATA >>>> driver? >>>> >>>> Justin. >>>> >>> >>> In /usr/src/linux/include/linux/ata.h: >>> >>> ATA_CMD_WRITE_EXT = 0x35, >>> >>> Perhaps these drives do not support this command or do not support it >>> properly? >>> >>> Any idea, Jeff/Alan? >>> >>> Justin. 
>>> >>> >> >> Here are all the errors (when reading/writing heavily): >> >> [4294810.556000] ata_gen_fixed_sense: failed ata_op=0x35 >> [4294810.556000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ >> 0xb/00/00 >> [4294810.556000] ata_gen_ata_desc_sense: failed ata_op=0x51 >> [4294810.556000] ata4: status=0x51 { DriveReady SeekComplete Error } >> [4294810.556000] ata4: error=0x04 { DriveStatusError } >> [4295514.668000] ata_gen_fixed_sense: failed ata_op=0x35 >> [4295514.668000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ >> 0xb/00/00 >> [4295514.668000] ata_gen_ata_desc_sense: failed ata_op=0x51 >> [4295514.668000] ata3: status=0x51 { DriveReady SeekComplete Error } >> [4295514.668000] ata3: error=0x04 { DriveStatusError } >> >> Jeff/Mark, from these errors can we reach a consensus as to the cause of >> these errors and how to eliminate the problem? >> >> Thanks, >> >> Justin. >> > ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! 2006-07-10 13:59 ` Follow up? " Justin Piszcz @ 2006-07-10 15:33 ` Alan Cox 2006-07-10 15:45 ` Justin Piszcz 0 siblings, 1 reply; 147+ messages in thread From: Alan Cox @ 2006-07-10 15:33 UTC (permalink / raw) To: Justin Piszcz Cc: Mark Lord, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list On Mon, 2006-07-10 at 09:59 -0400, Justin Piszcz wrote: > > [4297741.057000] ata_gen_fixed_sense: failed ata_op=0x35 > > [4297741.057000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ > > 0xb/00/00 > > [4297741.057000] ata_gen_ata_desc_sense: failed ata_op=0x51 > > [4297741.057000] ata4: status=0x51 { DriveReady SeekComplete Error } > > [4297741.057000] ata4: error=0x04 { DriveStatusError } > > > > Also got a 0xca. That's "write", so if that is reporting as an unknown command, something very odd indeed is happening. ^ permalink raw reply [flat|nested] 147+ messages in thread
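As a quick reference for the opcodes mentioned in this thread, the following is a small illustrative lookup in userspace C. The opcode values are the standard ATA command codes (0x35 and 0xca correspond to ATA_CMD_WRITE_EXT and ATA_CMD_WRITE in include/linux/ata.h, 0xb0 is the SMART command seen as cmd=0xb0 in the earlier 2.6.15.x log). The function name ata_op_name() and the choice of which opcodes to list are assumptions made for this sketch only.

```c
#include <stddef.h>

/* Hypothetical helper: map a handful of ATA command opcodes to a
 * human-readable name. Only commands relevant to this thread plus
 * their read/flush counterparts are listed. */
static const char *ata_op_name(unsigned char op)
{
	switch (op) {
	case 0x25: return "READ DMA EXT";
	case 0x35: return "WRITE DMA EXT";	/* ATA_CMD_WRITE_EXT */
	case 0xb0: return "SMART";
	case 0xc8: return "READ DMA";
	case 0xca: return "WRITE DMA";		/* ATA_CMD_WRITE */
	case 0xe7: return "FLUSH CACHE";
	case 0xea: return "FLUSH CACHE EXT";
	default:   return "unknown";
	}
}
```

Both failing opcodes reported so far (0x35 and 0xca) are therefore ordinary DMA writes, the 48-bit and 28-bit LBA variants respectively, rather than cache-flush or FUA commands.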
* Re: Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! 2006-07-10 15:33 ` Alan Cox @ 2006-07-10 15:45 ` Justin Piszcz 2006-07-11 13:28 ` LibPATA code issues / 2.6.17.3 (What is the next step?) Justin Piszcz 2006-07-14 17:14 ` Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! Mark Lord 0 siblings, 2 replies; 147+ messages in thread From: Justin Piszcz @ 2006-07-10 15:45 UTC (permalink / raw) To: Alan Cox Cc: Mark Lord, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list [-- Attachment #1: Type: TEXT/PLAIN, Size: 637 bytes --] Please verify I did the patch correctly, thanks. On Mon, 10 Jul 2006, Alan Cox wrote: > On Mon, 2006-07-10 at 09:59 -0400, Justin Piszcz wrote: >>> [4297741.057000] ata_gen_fixed_sense: failed ata_op=0x35 >>> [4297741.057000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ >>> 0xb/00/00 >>> [4297741.057000] ata_gen_ata_desc_sense: failed ata_op=0x51 >>> [4297741.057000] ata4: status=0x51 { DriveReady SeekComplete Error } >>> [4297741.057000] ata4: error=0x04 { DriveStatusError } >>> >>> Also got a 0xca. > > That's "write", so if that is reporting as an unknown command, something > very odd indeed is happening. 
> > [-- Attachment #2: Type: TEXT/PLAIN, Size: 2533 bytes --]

diff -uprN linux-2.6.17.3/drivers/scsi/libata-scsi.c linux-2.6.17.3-diff/drivers/scsi/libata-scsi.c
--- linux-2.6.17.3/drivers/scsi/libata-scsi.c	2006-06-30 13:37:38.000000000 -0400
+++ linux-2.6.17.3-diff/drivers/scsi/libata-scsi.c	2006-07-09 16:31:45.665112000 -0400
@@ -428,10 +428,16 @@ int ata_scsi_device_suspend(struct scsi_
  *	spin_lock_irqsave(host_set lock)
  */
 void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc,
-			u8 *ascq)
+			u8 *ascq, struct ata_queued_cmd *qc)
 {
 	int i;
 
+	struct scsi_cmnd *cmd = qc->scsicmd;
+	struct ata_taskfile *tf = &qc->tf;
+	unsigned char *sb = cmd->sense_buffer;
+	unsigned char *desc = sb + 8;
+	unsigned char ata_op = tf->command;
+
 	/* Based on the 3ware driver translation table */
 	static const unsigned char sense_table[][4] = {
 		/* BBD|ECC|ID|MAR */
@@ -520,6 +526,7 @@ void ata_to_sense_error(unsigned id, u8 
 	printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to "
 	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err,
 	       *sk, *asc, *ascq);
+	printk(KERN_ERR "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
 	return;
 }
 
@@ -542,6 +549,7 @@ void ata_gen_ata_desc_sense(struct ata_q
 	struct ata_taskfile *tf = &qc->tf;
 	unsigned char *sb = cmd->sense_buffer;
 	unsigned char *desc = sb + 8;
+	unsigned char ata_op = tf->command;
 
 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
 
@@ -558,8 +566,9 @@ void ata_gen_ata_desc_sense(struct ata_q
 	 * onto sense key, asc & ascq.
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
+		printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[1], &sb[2], &sb[3]);
+				   &sb[1], &sb[2], &sb[3],qc);
 		sb[1] &= 0x0f;
 	}
 
@@ -617,6 +626,7 @@ void ata_gen_fixed_sense(struct ata_queu
 	struct scsi_cmnd *cmd = qc->scsicmd;
 	struct ata_taskfile *tf = &qc->tf;
 	unsigned char *sb = cmd->sense_buffer;
+	unsigned char ata_op = tf->command;
 
 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
 
@@ -633,8 +643,9 @@ void ata_gen_fixed_sense(struct ata_queu
 	 * onto sense key, asc & ascq.
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
+		printk(KERN_WARNING "ata_gen_fixed_sense: failed ata_op=0x%02x\n", ata_op);
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[2], &sb[12], &sb[13]);
+				   &sb[2], &sb[12], &sb[13],qc);
 		sb[2] &= 0x0f;
 	}
 

^ permalink raw reply [flat|nested] 147+ messages in thread
* LibPATA code issues / 2.6.17.3 (What is the next step?)
2006-07-10 15:45 ` Justin Piszcz
@ 2006-07-11 13:28 ` Justin Piszcz
2006-07-11 16:12 ` Alan Cox
2006-07-14 17:16 ` Mark Lord
2006-07-14 17:14 ` Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! Mark Lord
1 sibling, 2 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-07-11 13:28 UTC (permalink / raw)
To: Alan Cox
Cc: Mark Lord, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list
Alan/Jeff/Mark,
Is there anything else I can do to further troubleshoot this problem now
that we have the failed opcode(s)? Again, there is never any corruption
on these drives, so it is more of an annoyance than anything else.
Other people also have this problem with these drives if you search
Google but I am not sure they are aware of where to report their
errors/problems.
opcode=0x35 & opcode=0xca
Justin.
^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
2006-07-11 13:28 ` LibPATA code issues / 2.6.17.3 (What is the next step?) Justin Piszcz
@ 2006-07-11 16:12 ` Alan Cox
2006-07-12 22:10 ` David Greaves
2006-07-14 17:16 ` Mark Lord
1 sibling, 1 reply; 147+ messages in thread
From: Alan Cox @ 2006-07-11 16:12 UTC (permalink / raw)
To: Justin Piszcz
Cc: Mark Lord, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list
On Tue, 2006-07-11 at 09:28 -0400, Justin Piszcz wrote:
> Alan/Jeff/Mark,
>
> Is there anything else I can do to further troubleshoot this problem now
> that we have the failed opcode(s)? Again, there is never any corruption
> on these drives, so it is more of an annoyance than anything else.
Nothing strikes me so far other than the data not making sense. Possibly
it will become clearer later if/when we see other examples.
^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
2006-07-11 16:12 ` Alan Cox
@ 2006-07-12 22:10 ` David Greaves
2006-07-12 22:29 ` Justin Piszcz
2006-07-13 10:55 ` Erik Mouw
0 siblings, 2 replies; 147+ messages in thread
From: David Greaves @ 2006-07-12 22:10 UTC (permalink / raw)
To: Alan Cox
Cc: Justin Piszcz, Mark Lord, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list
Alan Cox wrote:
> On Tue, 2006-07-11 at 09:28 -0400, Justin Piszcz wrote:
>> Alan/Jeff/Mark,
>>
>> Is there anything else I can do to further troubleshoot this problem now
>> that we have the failed opcode(s)? Again, there is never any corruption
>> on these drives, so it is more of an annoyance than anything else.
>
> Nothing strikes me so far other than the data not making sense. Possibly
> it will become clearer later if/when we see other examples.
For me it's SMART related.
smartctl -data -o on /dev/sda reliably gets a similar message.
Justin - does this smartctl command trigger a message for you?
I've been mailing on and off since January-ish.
(http://marc.theaimsgroup.com/?l=linux-ide&w=2&r=7&s=libpata&q=b)
Back in March I was running 2.6.16 (with a different version of Mark's
opcode patch) and I sent an email with the following info:
dmesg:
ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
Does that help with the diagnosis?
Also see my emails: SMART on SATA reporting errors?
http://marc.theaimsgroup.com/?l=linux-ide&m=113933732903205&w=2
I did reply but got no response so I assumed I was just so far off base
that I was being ignored :)
David
^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
2006-07-12 22:10 ` David Greaves
@ 2006-07-12 22:29 ` Justin Piszcz
2006-07-14 15:33 ` David Greaves
2006-07-13 10:55 ` Erik Mouw
1 sibling, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-12 22:29 UTC (permalink / raw)
To: David Greaves
Cc: Alan Cox, Mark Lord, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1513 bytes --]
Unfortunately not. The patch you need to get the ata_op code, against
2.6.17.3, is attached.
On Wed, 12 Jul 2006, David Greaves wrote:
> Alan Cox wrote:
>> On Tue, 2006-07-11 at 09:28 -0400, Justin Piszcz wrote:
>>> Alan/Jeff/Mark,
>>>
>>> Is there anything else I can do to further troubleshoot this problem now
>>> that we have the failed opcode(s)? Again, there is never any corruption
>>> on these drives, so it is more of an annoyance than anything else.
>>
>> Nothing strikes me so far other than the data not making sense. Possibly
>> it will become clearer later if/when we see other examples.
>
> For me it's SMART related.
>
> smartctl -data -o on /dev/sda reliably gets a similar message.
> Justin - does this smartctl command trigger a message for you?
>
> I've been mailing on and off since January-ish.
> (http://marc.theaimsgroup.com/?l=linux-ide&w=2&r=7&s=libpata&q=b)
>
> Back in March I was running 2.6.16 (with a different version of Mark's
> opcode patch) and I sent an email with the following info:
>
> dmesg:
> ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> ata1: status=0x51 { DriveReady SeekComplete Error }
> ata1: error=0x04 { DriveStatusError }
>
> Does that help with the diagnosis?
>
> Also see my emails: SMART on SATA reporting errors?
> http://marc.theaimsgroup.com/?l=linux-ide&m=113933732903205&w=2 > > I did reply but got no response so I assumed I was just so far off base > that I was being ignored :) > > David > [-- Attachment #2: Type: TEXT/PLAIN, Size: 2533 bytes --] diff -uprN linux-2.6.17.3/drivers/scsi/libata-scsi.c linux-2.6.17.3-diff/drivers/scsi/libata-scsi.c --- linux-2.6.17.3/drivers/scsi/libata-scsi.c 2006-06-30 13:37:38.000000000 -0400 +++ linux-2.6.17.3-diff/drivers/scsi/libata-scsi.c 2006-07-09 16:31:45.665112000 -0400 @@ -428,10 +428,16 @@ int ata_scsi_device_suspend(struct scsi_ * spin_lock_irqsave(host_set lock) */ void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc, - u8 *ascq) + u8 *ascq, struct ata_queued_cmd *qc) { int i; + struct scsi_cmnd *cmd = qc->scsicmd; + struct ata_taskfile *tf = &qc->tf; + unsigned char *sb = cmd->sense_buffer; + unsigned char *desc = sb + 8; + unsigned char ata_op = tf->command; + /* Based on the 3ware driver translation table */ static const unsigned char sense_table[][4] = { /* BBD|ECC|ID|MAR */ @@ -520,6 +526,7 @@ void ata_to_sense_error(unsigned id, u8 printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to " "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err, *sk, *asc, *ascq); + printk(KERN_ERR "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op); return; } @@ -542,6 +549,7 @@ void ata_gen_ata_desc_sense(struct ata_q struct ata_taskfile *tf = &qc->tf; unsigned char *sb = cmd->sense_buffer; unsigned char *desc = sb + 8; + unsigned char ata_op = tf->command; memset(sb, 0, SCSI_SENSE_BUFFERSIZE); @@ -558,8 +566,9 @@ void ata_gen_ata_desc_sense(struct ata_q * onto sense key, asc & ascq. 
*/ if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { + printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op); ata_to_sense_error(qc->ap->id, tf->command, tf->feature, - &sb[1], &sb[2], &sb[3]); + &sb[1], &sb[2], &sb[3],qc); sb[1] &= 0x0f; } @@ -617,6 +626,7 @@ void ata_gen_fixed_sense(struct ata_queu struct scsi_cmnd *cmd = qc->scsicmd; struct ata_taskfile *tf = &qc->tf; unsigned char *sb = cmd->sense_buffer; + unsigned char ata_op = tf->command; memset(sb, 0, SCSI_SENSE_BUFFERSIZE); @@ -633,8 +643,9 @@ void ata_gen_fixed_sense(struct ata_queu * onto sense key, asc & ascq. */ if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { + printk(KERN_WARNING "ata_gen_fixed_sense: failed ata_op=0x%02x\n", ata_op); ata_to_sense_error(qc->ap->id, tf->command, tf->feature, - &sb[2], &sb[12], &sb[13]); + &sb[2], &sb[12], &sb[13],qc); sb[2] &= 0x0f; } ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
2006-07-12 22:29 ` Justin Piszcz
@ 2006-07-14 15:33 ` David Greaves
0 siblings, 0 replies; 147+ messages in thread
From: David Greaves @ 2006-07-14 15:33 UTC (permalink / raw)
To: Alan Cox
Cc: Justin Piszcz, Mark Lord, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list, htejun
Justin Piszcz wrote:
> On Wed, 12 Jul 2006, David Greaves wrote:
>
>> Alan Cox wrote:
>>> On Tue, 2006-07-11 at 09:28 -0400, Justin Piszcz wrote:
>>>> Alan/Jeff/Mark,
>>>>
>>>> Is there anything else I can do to further troubleshoot this problem now
>>>> that we have the failed opcode(s)? Again, there is never any corruption
>>>> on these drives, so it is more of an annoyance than anything else.
>>>
>>> Nothing strikes me so far other than the data not making sense. Possibly
>>> it will become clearer later if/when we see other examples.
>>
>> For me it's SMART related.
>>
>> smartctl -data -o on /dev/sda reliably gets a similar message.
>> Justin - does this smartctl command trigger a message for you?
>>
>> I've been mailing on and off since January-ish.
>> (http://marc.theaimsgroup.com/?l=linux-ide&w=2&r=7&s=libpata&q=b)
>>
>> Back in March I was running 2.6.16 (with a different version of Mark's
>> opcode patch) and I sent an email with the following info:
>>
> Unfortunately not, the correct patch you need is attached to get the
> ata_op code, against 2.6.17.3.
[mutter, mutter, getting a teeny bit fed up with applying the same
diagnostic patch (thanks Mark) and reporting this and getting no real
feedback (apart from Erik - ta - who was off base, it doesn't appear to
be BIOS and here's the pair of commands :) ...
Ok, added Tejun to the list since he's been doing EH for libata and this
is some kind of E that needs better H]
2.6.17.3 with op-code patch
smartctl -data --smart=on /dev/sda
no dmesg output
smartctl -data -o on /dev/sda
dmesg:
ata1: PIO error
ata_gen_ata_desc_sense: failed ata_op=0xb0
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata_gen_ata_desc_sense: failed ata_op=0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: PIO error
ata_gen_ata_desc_sense: failed ata_op=0xb0
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata_gen_ata_desc_sense: failed ata_op=0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: PIO error
ata_gen_ata_desc_sense: failed ata_op=0xb0
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata_gen_ata_desc_sense: failed ata_op=0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: PIO error
ata_gen_ata_desc_sense: failed ata_op=0xb0
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata_gen_ata_desc_sense: failed ata_op=0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: PIO error
ata_gen_ata_desc_sense: failed ata_op=0xb0
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata_gen_ata_desc_sense: failed ata_op=0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: PIO error
ata_gen_ata_desc_sense: failed ata_op=0xb0
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata_gen_ata_desc_sense: failed ata_op=0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
David
--
^ permalink raw reply [flat|nested] 147+ messages in thread
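The stat/err pair that keeps appearing in these logs (0x51/04) can be decoded by hand. A minimal userspace sketch, assuming the standard ATA status and error register bit layout; the macro and function names are made up for illustration and are not libata's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ATA status register bits, per the ATA specification. */
#define ST_BSY  0x80u  /* busy */
#define ST_DRDY 0x40u  /* device ready ("DriveReady" in the log) */
#define ST_DF   0x20u  /* device fault */
#define ST_DSC  0x10u  /* seek complete ("SeekComplete") */
#define ST_DRQ  0x08u  /* data request */
#define ST_ERR  0x01u  /* error ("Error") */

/* ATA error register bit 2: command aborted ("DriveStatusError"). */
#define ER_ABRT 0x04u

/*
 * Hypothetical helper, not part of libata: write the names of the status
 * bits set in `stat` into `buf` and return how many bits were named.
 */
static int describe_status(uint8_t stat, char *buf, size_t len)
{
    static const struct {
        uint8_t bit;
        const char *name;
    } bits[] = {
        { ST_BSY,  "Busy" },
        { ST_DRDY, "DriveReady" },
        { ST_DF,   "DeviceFault" },
        { ST_DSC,  "SeekComplete" },
        { ST_DRQ,  "DataRequest" },
        { ST_ERR,  "Error" },
    };
    size_t used = 0;
    int count = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof bits / sizeof bits[0]; i++) {
        int n;

        if (!(stat & bits[i].bit))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     count ? " " : "", bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;  /* out of room: stop naming bits */
        used += (size_t)n;
        count++;
    }
    return count;
}

/* Hypothetical helper: did the drive flag an error and abort the command? */
static int command_aborted(uint8_t stat, uint8_t err)
{
    return (stat & ST_ERR) && (err & ER_ABRT);
}
```

For stat=0x51 this names DriveReady, SeekComplete and Error, and err=0x04 is the abort bit, so the drive is reporting that it refused the command rather than that it failed a media access.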
* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
2006-07-12 22:10 ` David Greaves
2006-07-12 22:29 ` Justin Piszcz
@ 2006-07-13 10:55 ` Erik Mouw
1 sibling, 0 replies; 147+ messages in thread
From: Erik Mouw @ 2006-07-13 10:55 UTC (permalink / raw)
To: David Greaves
Cc: Alan Cox, Justin Piszcz, Mark Lord, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list
On Wed, Jul 12, 2006 at 11:10:59PM +0100, David Greaves wrote:
> Alan Cox wrote:
> > On Tue, 2006-07-11 at 09:28 -0400, Justin Piszcz wrote:
> >> Alan/Jeff/Mark,
> >>
> >> Is there anything else I can do to further troubleshoot this problem now
> >> that we have the failed opcode(s)? Again, there is never any corruption
> >> on these drives, so it is more of an annoyance than anything else.
> >
> > Nothing strikes me so far other than the data not making sense. Possibly
> > it will become clearer later if/when we see other examples.
>
> For me it's SMART related.
>
> smartctl -data -o on /dev/sda reliably gets a similar message.
> Justin - does this smartctl command trigger a message for you?
In that case SMART just isn't enabled.
smartctl -d ata --smart=on /dev/sda
should make those messages go away. Some BIOSes have a setting to
enable/disable SMART, though the option is usually badly documented
(hey, what do you expect from BIOS writers).
Erik
--
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
2006-07-11 13:28 ` LibPATA code issues / 2.6.17.3 (What is the next step?) Justin Piszcz
2006-07-11 16:12 ` Alan Cox
@ 2006-07-14 17:16 ` Mark Lord
2006-07-14 17:18 ` Justin Piszcz
1 sibling, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-07-14 17:16 UTC (permalink / raw)
To: Justin Piszcz
Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list
Justin Piszcz wrote:
>
> opcode=0x35 & opcode=0xca
Those are non-DMA WRITE opcodes. Using PIO for I/O is pretty rare these days,
so I'm betting that this is not a hard disk device -- compactflash?
-ml
^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
2006-07-14 17:16 ` Mark Lord
@ 2006-07-14 17:18 ` Justin Piszcz
2006-07-14 17:39 ` Mark Lord
0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-14 17:18 UTC (permalink / raw)
To: Mark Lord
Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list
They are Western Digital 400* drives.
[4294678.049000] Vendor: ATA Model: WDC WD4000KD-00N Rev: 01.0
[4294678.050000] Vendor: ATA Model: WDC WD4000KD-00N Rev: 01.0
They are on a SiL controller; it also happens when they are on a Promise
controller.
On Fri, 14 Jul 2006, Mark Lord wrote:
> Justin Piszcz wrote:
>>
>> opcode=0x35 & opcode=0xca
>
> Those are non-DMA WRITE opcodes. Using PIO for I/O is pretty rare these days,
> so I'm betting that this is not a hard disk device -- compactflash?
>
> -ml
>
^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
2006-07-14 17:18 ` Justin Piszcz
@ 2006-07-14 17:39 ` Mark Lord
2006-07-14 18:18 ` Justin Piszcz
2006-07-14 20:02 ` Mark Lord
0 siblings, 2 replies; 147+ messages in thread
From: Mark Lord @ 2006-07-14 17:39 UTC (permalink / raw)
To: Justin Piszcz
Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list
Justin Piszcz wrote:
> They are Western Digital 400* drives.
>
> [4294678.049000] Vendor: ATA Model: WDC WD4000KD-00N Rev: 01.0
> [4294678.050000] Vendor: ATA Model: WDC WD4000KD-00N Rev: 01.0
>
> They are on a SiL controller; it also happens when they are on a Promise
> controller.
>
> On Fri, 14 Jul 2006, Mark Lord wrote:
>
>> Justin Piszcz wrote:
>>>
>>> opcode=0x35 & opcode=0xca
>>
>> Those are non-DMA WRITE opcodes. Using PIO for I/O is pretty rare these days,
>> so I'm betting that this is not a hard disk device -- compactflash?
Okay. So why are we issuing PIO WRITE commands to drives that
obviously should only be sent DMA commands by libata?
Perhaps that's the bug.
^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
2006-07-14 17:39 ` Mark Lord
@ 2006-07-14 18:18 ` Justin Piszcz
2006-07-14 20:02 ` Mark Lord
1 sibling, 0 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-07-14 18:18 UTC (permalink / raw)
To: Mark Lord
Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list
On Fri, 14 Jul 2006, Mark Lord wrote:
> Justin Piszcz wrote:
>> They are Western Digital 400* drives.
>>
>> [4294678.049000] Vendor: ATA Model: WDC WD4000KD-00N Rev: 01.0
>> [4294678.050000] Vendor: ATA Model: WDC WD4000KD-00N Rev: 01.0
>>
>> They are on a SiL controller; it also happens when they are on a Promise
>> controller.
>>
>> On Fri, 14 Jul 2006, Mark Lord wrote:
>>
>>> Justin Piszcz wrote:
>>>>
>>>> opcode=0x35 & opcode=0xca
>>>
>>> Those are non-DMA WRITE opcodes. Using PIO for I/O is pretty rare these days,
>>> so I'm betting that this is not a hard disk device -- compactflash?
>
> Okay. So why are we issuing PIO WRITE commands to drives that
> obviously should only be sent DMA commands by libata?
>
> Perhaps that's the bug.
>
Jeff/Alan -- ? Could this be it?
^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.17.3 (What is the next step?)
2006-07-14 17:39 ` Mark Lord
2006-07-14 18:18 ` Justin Piszcz
@ 2006-07-14 20:02 ` Mark Lord
1 sibling, 0 replies; 147+ messages in thread
From: Mark Lord @ 2006-07-14 20:02 UTC (permalink / raw)
To: Justin Piszcz
Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list
Mark Lord wrote:
> Justin Piszcz wrote:
>> They are Western Digital 400* drives.
>>
>> [4294678.049000] Vendor: ATA Model: WDC WD4000KD-00N Rev: 01.0
>> [4294678.050000] Vendor: ATA Model: WDC WD4000KD-00N Rev: 01.0
>>
>> They are on a SiL controller; it also happens when they are on a Promise
>> controller.
>>
>> On Fri, 14 Jul 2006, Mark Lord wrote:
>>
>>> Justin Piszcz wrote:
>>>>
>>>> opcode=0x35 & opcode=0xca
>>>
>>> Those are non-DMA WRITE opcodes. Using PIO for I/O is pretty rare these days,
>>> so I'm betting that this is not a hard disk device -- compactflash?
>
> Okay. So why are we issuing PIO WRITE commands to drives that
> obviously should only be sent DMA commands by libata?
>
> Perhaps that's the bug.
Oh wait.. I remember this.. No, those are DMA commands,
despite the misleading libata name for them.
We went through this before last spring..
Okay. So I wonder what's really going on.
The next step would be to instrument the interrupt handler,
so that when it sees bad-status, it dumps out the stat/err values
right then and there, before anything else can muck with them.
It might also be good to have it dump out the controller engine's
DMA status/err values, assuming the controller has registers for those.
Then we should get a better picture of what's going on.
Assuming the drives aren't lying to us (a perfectly good assumption here),
then the controller must be aborting the transfer unexpectedly.
Cheers
^ permalink raw reply [flat|nested] 147+ messages in thread
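To keep the opcodes from this thread straight, here is a small lookup sketch. The command names and values are taken from the ATA command set as far as I know it; the struct, table and function names are hypothetical and are not how libata represents commands. The two PIO write opcodes (0x30, 0x34) do not appear in the thread and are included only for contrast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ata_op_info {
    uint8_t opcode;
    const char *name;  /* command name from the ATA specification */
    int is_dma;        /* 1 if the data phase uses DMA, 0 otherwise */
};

/* Opcodes seen in this thread, plus the PIO writes for contrast. */
static const struct ata_op_info ata_ops[] = {
    { 0x25, "READ DMA EXT",        1 },
    { 0x30, "WRITE SECTOR(S)",     0 },  /* PIO write, 28-bit LBA */
    { 0x34, "WRITE SECTOR(S) EXT", 0 },  /* PIO write, 48-bit LBA */
    { 0x35, "WRITE DMA EXT",       1 },
    { 0x3d, "WRITE DMA FUA EXT",   1 },
    { 0xb0, "SMART",               0 },
    { 0xca, "WRITE DMA",           1 },
};

/* Return the table entry for `opcode`, or NULL if it is not listed. */
static const struct ata_op_info *ata_op_lookup(uint8_t opcode)
{
    for (size_t i = 0; i < sizeof ata_ops / sizeof ata_ops[0]; i++)
        if (ata_ops[i].opcode == opcode)
            return &ata_ops[i];
    return NULL;
}

/* Convenience wrapper: 1 only for a listed opcode that uses DMA. */
static int ata_op_is_dma(uint8_t opcode)
{
    const struct ata_op_info *info = ata_op_lookup(opcode);

    return info != NULL && info->is_dma;
}

/* Usage example: the two failing opcodes reported above are DMA writes. */
void check_thread_opcodes(void)
{
    assert(ata_op_is_dma(0x35));   /* WRITE DMA EXT (LBA48) */
    assert(ata_op_is_dma(0xca));   /* WRITE DMA (LBA28) */
    assert(!ata_op_is_dma(0xb0));  /* SMART, as in the smartctl reports */
}
```

This agrees with the correction in the message above: 0x35 and 0xca are DMA writes, so the earlier PIO theory does not hold.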
* Re: Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)!
2006-07-10 15:45 ` Justin Piszcz
2006-07-11 13:28 ` LibPATA code issues / 2.6.17.3 (What is the next step?) Justin Piszcz
@ 2006-07-14 17:14 ` Mark Lord
2006-07-14 17:17 ` Justin Piszcz
1 sibling, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-07-14 17:14 UTC (permalink / raw)
To: Justin Piszcz
Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list
>Jeff/Mark, from these errors can we reach a consensus as to the cause
>of these errors and how to eliminate the problem?
It is up to the current subsystem maintainer to help investigate this
and come up with a solution, in cooperation with eager testers such
as yourself. I gave away my kernel subsystem maintainer's duties about
seven years ago, because it just takes too much time to do it really well.
In this case, I'm providing a tiny amount of help, simply because I don't
see anyone else even trying, and there is obviously something wrong here.
Now.. your hacked version of my simple patch is incorrect. It is frequently
dumping out ata_op=0x51, which is obviously the ATA status value not the
original ATA command byte.
But ignoring that, we also see some valid output from where it does trip
the code from my original patches: ata_op=0x35.
So, the drive is rejecting an LBA48 WRITE operation, which should happen
only if the drive does not have LBA48 support. Now, I know you posted all
of this nice info months ago, but let's see it again now, for the exact
drive that is generating that specific message. We need to see the output
from "hdparm --Istdout /dev/sdX" for that drive.
Thanks
^ permalink raw reply [flat|nested] 147+ messages in thread
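One way to see how a debug print could report ata_op=0x51: on ATA the command and status registers share one address (as do feature and error), and the posted patch itself passes tf->command and tf->feature to ata_to_sense_error() as drv_stat and drv_err. The toy model below illustrates that overwrite under those assumptions. It is not libata code, and every name in it (toy_taskfile, toy_qc, issued_op, toy_issue, toy_complete) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy taskfile: one slot doubles as command (out) and status (back). */
struct toy_taskfile {
    uint8_t command;  /* opcode when issued, status after read-back */
    uint8_t feature;  /* feature when issued, error after read-back */
};

/* Toy queued command with a private copy of the opcode that was sent. */
struct toy_qc {
    struct toy_taskfile tf;
    uint8_t issued_op;
};

/* Issue a command: record the opcode in both places. */
static void toy_issue(struct toy_qc *qc, uint8_t opcode)
{
    assert(qc != NULL);
    qc->tf.command = opcode;
    qc->tf.feature = 0;
    qc->issued_op = opcode;
}

/* Completion: the read-back overwrites command/feature with stat/err. */
static void toy_complete(struct toy_qc *qc, uint8_t status, uint8_t error)
{
    assert(qc != NULL);
    qc->tf.command = status;
    qc->tf.feature = error;
}
```

After toy_issue(&qc, 0x35) followed by toy_complete(&qc, 0x51, 0x04), qc.tf.command holds 0x51 while qc.issued_op still holds 0x35. A diagnostic that reads the shared slot after completion prints the status byte, which is what the ata_op=0x51 lines look like; logging a copy saved at issue time avoids that.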
* Re: Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)!
2006-07-14 17:14 ` Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)! Mark Lord
@ 2006-07-14 17:17 ` Justin Piszcz
2006-07-14 17:37 ` Mark Lord
0 siblings, 1 reply; 147+ messages in thread
From: Justin Piszcz @ 2006-07-14 17:17 UTC (permalink / raw)
To: Mark Lord
Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list
On Fri, 14 Jul 2006, Mark Lord wrote:
>> Jeff/Mark, from these errors can we reach a consensus as to the cause
>> of these errors and how to eliminate the problem?
>
> It is up to the current subsystem maintainer to help investigate this
> and come up with a solution, in cooperation with eager testers such
> as yourself. I gave away my kernel subsystem maintainer's duties about
> seven years ago, because it just takes too much time to do it really well.
>
> In this case, I'm providing a tiny amount of help, simply because I don't
> see anyone else even trying, and there is obviously something wrong here.
>
> Now.. your hacked version of my simple patch is incorrect. It is frequently
> dumping out ata_op=0x51, which is obviously the ATA status value not the
> original ATA command byte.
>
> But ignoring that, we also see some valid output from where it does trip
> the code from my original patches: ata_op=0x35.
>
> So, the drive is rejecting an LBA48 WRITE operation, which should happen
> only if the drive does not have LBA48 support. Now, I know you posted all
> of this nice info months ago, but let's see it again now, for the exact
> drive that is generating that specific message. We need to see the output
> from "hdparm --Istdout /dev/sdX" for that drive.
> > Thanks > Here it is: They are identical disks (the WD 400KD), both show up as 373GB (formatted): p34:~# hdparm --Istdout /dev/sdc /dev/sdc: IO_support = 0 (default 16-bit) readonly = 0 (off) readahead = 256 (on) geometry = 48641/255/63, sectors = 781422768, start = 0 0c5a 3fff c837 0010 0000 0000 003f 0000 0000 0000 2020 2020 2020 2020 2020 2020 334e 4631 514a 3345 0000 8000 0004 332e 4141 4820 2020 5354 3334 3030 3633 3341 5320 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 8010 0000 2f00 4000 0200 0200 0007 3fff 0010 003f fc10 00fb 0010 ffff 0fff 0000 0007 0003 0078 0078 00f0 0078 0000 0000 0000 0000 0000 0000 001f 0502 0000 0040 0040 00fe 0000 346b 7d01 4023 3469 3c01 4023 407f 0000 0000 fefe fffe 0000 fe00 0000 0000 0000 0000 0000 90b0 2e93 0000 0000 0000 0000 4000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 90b0 2e93 90b0 2e93 2020 0002 02b6 0002 008a 3c06 3c0a 0000 07c6 0100 0800 100f 3000 0002 0080 0000 0000 00a0 0202 0000 0404 0000 0000 0000 0000 1000 000b 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 9da5 p34:~# hdparm --Istdout /dev/sdd /dev/sdd: IO_support = 0 (default 16-bit) readonly = 0 (off) readahead = 256 (on) geometry = 48641/255/63, sectors = 781422768, start = 0 427a 3fff c837 0010 e100 0258 003f 0000 0000 000e 2020 2020 2057 442d 574d 414d 5931 3131 3335 3636 0003 8000 003f 3031 2e30 3641 3031 5744 4320 5744 3430 3030 4b44 2d30 304e 4142 3020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 8010 0000 2f00 4001 0280 0000 0007 3fff 
0010 003f fc10 00fb 0100 ffff 0fff 0000 0007 0003 0078 0078 0078 0078 0000 0000 0000 0000 0000 0000 001f 0702 0000 0044 0040 00fe 001d 746b 7f01 4023 7469 3c01 4023 207f 0000 0000 0000 0000 0000 80fe 0000 0000 0000 0000 0000 90b0 2e93 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0141 0000 0000 0000 075a 0000 0000 0000 0000 0000 0000 0000 0000 0002 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0087 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 103f 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 48a5 p34:~# ^ permalink raw reply [flat|nested] 147+ messages in thread
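The LBA48 question raised earlier can be checked against a dump like the ones above. A minimal sketch, assuming the usual IDENTIFY DEVICE layout (word 83 bit 10 = 48-bit address feature set supported, words 100-103 = maximum 48-bit LBA count, low word first) and assuming the words are counted from zero in the order hdparm prints them. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ID_WORDS 256  /* IDENTIFY DEVICE returns 256 16-bit words */

/* 1 if word 83 bit 10 (48-bit address feature set supported) is set. */
static int identify_has_lba48(const uint16_t id[ID_WORDS])
{
    assert(id != NULL);
    return (id[83] >> 10) & 1;
}

/* Total addressable sectors from words 100..103, low word first. */
static uint64_t identify_lba48_sectors(const uint16_t id[ID_WORDS])
{
    assert(id != NULL);
    return (uint64_t)id[100] |
           ((uint64_t)id[101] << 16) |
           ((uint64_t)id[102] << 32) |
           ((uint64_t)id[103] << 48);
}
```

Reading the /dev/sdd dump that way gives word 83 = 0x7f01 and words 100/101 = 0x90b0/0x2e93. With those values the first function returns 1 and the second returns 781422768, which matches the "sectors = 781422768" line hdparm printed. If that counting is right, the drive advertises LBA48, so a lack of LBA48 support would not explain the rejected 0x35.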
* Re: Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)!
2006-07-14 17:17 ` Justin Piszcz
@ 2006-07-14 17:37 ` Mark Lord
2006-07-14 18:17 ` Justin Piszcz
0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-07-14 17:37 UTC (permalink / raw)
To: Justin Piszcz
Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list
Justin Piszcz wrote:
>
>
> On Fri, 14 Jul 2006, Mark Lord wrote:
>
>> So, the drive is rejecting an LBA48 WRITE operation, which should happen
>> only if the drive does not have LBA48 support. Now, I know you posted all
>> of this nice info months ago, but let's see it again now, for the exact
>> drive that is generating that specific message. We need to see the output
>> from "hdparm --Istdout /dev/sdX" for that drive.
>>
>> Thanks
>>
>
> Here it is:
>
> They are identical disks (the WD 400KD), both show up as 373GB (formatted):
Which *exact* unit generated the WRITE errors you posted about?
^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: Follow up? LibPATA code issues / 2.6.15.4 (found the opcode=0x35)!
2006-07-14 17:37 ` Mark Lord
@ 2006-07-14 18:17 ` Justin Piszcz
0 siblings, 0 replies; 147+ messages in thread
From: Justin Piszcz @ 2006-07-14 18:17 UTC (permalink / raw)
To: Mark Lord
Cc: Alan Cox, Jeff Garzik, Sander, linux-kernel, IDE/ATA development list
On Fri, 14 Jul 2006, Mark Lord wrote:
> Justin Piszcz wrote:
>>
>>
>> On Fri, 14 Jul 2006, Mark Lord wrote:
>>
>>> So, the drive is rejecting an LBA48 WRITE operation, which should happen
>>> only if the drive does not have LBA48 support. Now, I know you posted all
>>> of this nice info months ago, but let's see it again now, for the exact
>>> drive that is generating that specific message. We need to see the output
>>> from "hdparm --Istdout /dev/sdX" for that drive.
>>>
>>> Thanks
>>>
>>
>> Here it is:
>>
>> They are identical disks (the WD 400KD), both show up as 373GB (formatted):
>
>
> Which *exact* unit generated the WRITE errors you posted about?
>
Both have generated the errors, they are identical drives and firmware.
^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
@ 2006-03-01 19:00 Nicolas Mailhot
2006-03-01 19:22 ` Mark Lord
0 siblings, 1 reply; 147+ messages in thread
From: Nicolas Mailhot @ 2006-03-01 19:00 UTC (permalink / raw)
To: edmudama; +Cc: linux-ide, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 875 bytes --]
> those drives should support all FUA opcodes properly, both queued and unqueued
>
> On 2/28/06, Jeff Garzik <jgarzik@pobox.com> wrote:
> > Mark Lord wrote:
> > > David Greaves wrote:
> > >
> > >>
> > >> scsi1 : sata_sil
> > >> Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC
> > >> Type: Direct-Access ANSI SCSI revision: 05
> > >> Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC
> > >> Type: Direct-Access ANSI SCSI revision: 05
How about the drives that got blacklisted following :
http://bugzilla.kernel.org/show_bug.cgi?id=5914 ?
and
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951 ?
Device Model: Maxtor 6L300S0
Firmware Version: BANC1G10
on
Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
Regards,
--
Nicolas Mailhot
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 199 bytes --]
^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-01 19:00 LibPATA code issues / 2.6.15.4 Nicolas Mailhot
@ 2006-03-01 19:22 ` Mark Lord
2006-03-01 23:12 ` Nicolas Mailhot
0 siblings, 1 reply; 147+ messages in thread
From: Mark Lord @ 2006-03-01 19:22 UTC (permalink / raw)
To: Nicolas Mailhot; +Cc: edmudama, linux-ide, linux-kernel
Nicolas Mailhot wrote:
>>
> How about the drives that got blacklisted following :
> http://bugzilla.kernel.org/show_bug.cgi?id=5914 ?
> and
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951 ?
>
> Device Model: Maxtor 6L300S0
> Firmware Version: BANC1G10
>
> on Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
Mmm.. somebody with one of those controllers should check
to see if *any* drives work with FUA, and blacklist the controller
instead of the drives if everything is failing.
Cheers
^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-01 19:22 ` Mark Lord
@ 2006-03-01 23:12 ` Nicolas Mailhot
2006-03-01 23:31 ` Jeff Garzik
2006-03-02 1:19 ` Eric D. Mudama
0 siblings, 2 replies; 147+ messages in thread
From: Nicolas Mailhot @ 2006-03-01 23:12 UTC (permalink / raw)
To: Mark Lord; +Cc: edmudama, linux-ide, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1119 bytes --]
On Wednesday 01 March 2006 at 14:22 -0500, Mark Lord wrote:
> Nicolas Mailhot wrote:
> >>
> > How about the drives that got blacklisted following :
> > http://bugzilla.kernel.org/show_bug.cgi?id=5914 ?
> > and
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951 ?
> >
> > Device Model: Maxtor 6L300S0
> > Firmware Version: BANC1G10
> >
> > on Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
>
> Mmm.. somebody with one of those controllers should check
> to see if *any* drives work with FUA, and blacklist the controller
> instead of the drives if everything is failing.
I'm someone with such a controller (that's my bug there).
But I only have these drives.
So I can only confirm the combo is deadly.
(I could possibly try to plug one into the nforce4 controller, but I'm not
sure extracting the box from the tangle of cables and hardware it's part of
is worth it. sata_nv is reverse-engineered, while the SiI docs are public,
right?)
I do suspect Eric D. Mudama knows if the problem is on the hard-drive
side though
Regards,
--
Nicolas Mailhot
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 199 bytes --]
^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 23:12 ` Nicolas Mailhot @ 2006-03-01 23:31 ` Jeff Garzik 2006-03-02 1:19 ` Eric D. Mudama 1 sibling, 0 replies; 147+ messages in thread From: Jeff Garzik @ 2006-03-01 23:31 UTC (permalink / raw) To: Nicolas Mailhot; +Cc: Mark Lord, edmudama, linux-ide, linux-kernel Nicolas Mailhot wrote: > is worth it. sata_nv is rev-eng, while the siI docs are public, right?) sata_nv was written by NVIDIA. Jeff ^ permalink raw reply [flat|nested] 147+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 23:12 ` Nicolas Mailhot 2006-03-01 23:31 ` Jeff Garzik @ 2006-03-02 1:19 ` Eric D. Mudama 2006-03-02 1:39 ` Eric D. Mudama 1 sibling, 1 reply; 147+ messages in thread From: Eric D. Mudama @ 2006-03-02 1:19 UTC (permalink / raw) To: Nicolas Mailhot; +Cc: Mark Lord, linux-ide, linux-kernel On 3/1/06, Nicolas Mailhot <nicolas.mailhot@gmail.com> wrote: > On Wednesday, 1 March 2006 at 14:22 -0500, Mark Lord wrote: > > Nicolas Mailhot wrote: > > >> > > > How about the drives that got blacklisted following : > > > http://bugzilla.kernel.org/show_bug.cgi?id=5914 ? > > > and > > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951 ? > > > > > > Device Model: Maxtor 6L300S0 > > > Firmware Version: BANC1G10 > > > > > > on Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02) > > > > Mmm.. somebody with one of those controllers should check > > to see if *any* drives work with FUA, and blacklist the controller > > instead of the drives if everything is failing. > > I'm someone with such a controller (that's my boog here) > But I only have these drives. > So I can only confirm the combo is deadly. > (I could possibly try to plug one into the nforce4 controller, not sure if > extracting the box from the tangle of cables and hardware it's part of > is worth it. sata_nv is rev-eng, while the siI docs are public, right?) > > I do suspect Eric D. Mudama knows if the problem is on the hard-drive > side though > > Regards, > > -- > Nicolas Mailhot > I didn't know offhand so we plugged in a bus analyzer and took a look here in the lab... We didn't have a 3114 lying around, but issuing the Write DMA FUA (0x3D) opcode on a 3112 resulted in a D0h soft hang.
I think they're related (4-port vs 2-port). Looking at the bus trace, the command is issued on the SATA bus, the drive generates a DMA Activate FIS which is accepted by the 3112, and then the 3112 generates a Data Payload FIS (46h) with no contents. The first DWORD of the payload is a HOLD primitive, to which the device promptly responds with HOLDA, and the two are in a soft bus lock and will sit forever. No data is ever generated by the host (stopped capture after 4 seconds). I believe this core should not be part of the FUA whitelist. If I remember correctly, there are other implementations out there with similar limitations to opcodes this "new" to ATA. --eric ^ permalink raw reply [flat|nested] 147+ messages in thread
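Editorial sketch for reading the values in the message above: the "D0h" status of the hang, and the 0x51 status from the dmesg lines earlier in the thread, are bitmasks of the ATA status register. The decoder below uses the classic bit names from the ATA specifications; the helper names `decode_status` and `fis_name` are made up for this illustration, and the FIS table lists only the two types that appear in the bus trace.

```python
# Classic ATA status register bit names, most significant bit first.
# CORR and IDX are historical names; later ATA revisions mark them obsolete.
STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),
    (0x02, "IDX"),
    (0x01, "ERR"),   # error; details are in the error register
]

# Serial ATA FIS type codes for the two frames named in the trace.
FIS_TYPES = {
    0x39: "DMA Activate",
    0x46: "Data",
}


def decode_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if status & mask]


def fis_name(fis_type):
    """Name a FIS type code, or flag it as unknown to this small table."""
    return FIS_TYPES.get(fis_type, "unknown (0x%02X)" % fis_type)


if __name__ == "__main__":
    for value in (0xD0, 0x51):
        print("0x%02X -> %s" % (value, " ".join(decode_status(value))))
```

Under this decoding, 0x51 is DRDY, DSC and ERR, matching the "DriveReady SeekComplete Error" text libata printed, while D0h has BSY set with no ERR, which fits a device that is still waiting on a transfer rather than one that rejected the command.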
* Re: LibPATA code issues / 2.6.15.4 2006-03-02 1:19 ` Eric D. Mudama @ 2006-03-02 1:39 ` Eric D. Mudama 0 siblings, 0 replies; 147+ messages in thread From: Eric D. Mudama @ 2006-03-02 1:39 UTC (permalink / raw) To: Nicolas Mailhot; +Cc: Mark Lord, linux-ide, linux-kernel On 3/1/06, Eric D. Mudama <edmudama@gmail.com> wrote: > I believe this core should not be part of the FUA whitelist. If I > remember correctly, there are other implementations out there with > similar limitations to opcodes this "new" to ATA. That being said, I see from https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951 that a blacklisting of some Maxtor drives for this issue has supposedly occurred or been pushed and accepted "upstream" in git .... For the obvious (selfish) reasons, I'd like to minimize the number of Maxtor drives that are blacklisted, as I don't believe this is a drive issue at all. If there's a drive model out there reporting support for FUA but screwing it up, I'm all ears as that's something I need to know about. If basic adapter functional testing is required for some of these low-level commands, then that might be something I can help with too (on a very limited scale), since we have access to ~100 different chipsets. --eric ^ permalink raw reply [flat|nested] 147+ messages in thread
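Editorial sketch of the policy Eric argues for: if the controller is what mishandles the FUA opcode, a controller-level entry covers every drive behind it and the per-drive list can stay short. The code below is not libata's implementation. The table names, the function `fua_allowed`, and the string keys are hypothetical. The SiI 3112 entry reflects the lab result reported in this thread, and the SiI 3114 entry is an assumption based on the hang reported on that controller and Eric's guess that the two are related.

```python
# Hypothetical tables for illustration only; not the kernel's actual lists.
NO_FUA_CONTROLLERS = {
    "SiI 3112",  # soft hang on Write DMA FUA observed on a bus analyzer
    "SiI 3114",  # assumed related to the 3112 (4-port vs 2-port)
}

# (model, firmware) pairs of drives known to mishandle FUA themselves.
# Left empty here: the thread does not identify such a drive.
NO_FUA_DRIVES = set()


def fua_allowed(controller, model, firmware, drive_reports_fua):
    """Return True if FUA writes should be issued to this drive."""
    if not drive_reports_fua:
        return False                      # drive does not advertise FUA
    if controller in NO_FUA_CONTROLLERS:
        return False                      # host side cannot pass the opcode
    if (model, firmware) in NO_FUA_DRIVES:
        return False                      # drive-specific problem
    return True


if __name__ == "__main__":
    print(fua_allowed("SiI 3114", "Maxtor 6L300S0", "BANC1G10", True))
    print(fua_allowed("nForce4", "Maxtor 6L300S0", "BANC1G10", True))
```

The design point is the order of the checks: the same Maxtor drive is refused FUA behind the listed Silicon Image parts but stays eligible behind any other controller, so no drive entry is needed for it.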