* Re: LibPATA code issues / 2.6.15.4
@ 2006-03-01 19:00 Nicolas Mailhot
2006-03-01 19:22 ` Mark Lord
0 siblings, 1 reply; 131+ messages in thread
From: Nicolas Mailhot @ 2006-03-01 19:00 UTC (permalink / raw)
To: edmudama; +Cc: linux-ide, linux-kernel
> those drives should support all FUA opcodes properly, both queued and unqueued
>
> On 2/28/06, Jeff Garzik <jgarzik@pobox.com> wrote:
> > Mark Lord wrote:
> > > David Greaves wrote:
> > >
> > >>
> > >> scsi1 : sata_sil
> > >> Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC
> > >> Type: Direct-Access ANSI SCSI revision: 05
> > >> Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC
> > >> Type: Direct-Access ANSI SCSI revision: 05
How about the drives that got blacklisted following :
http://bugzilla.kernel.org/show_bug.cgi?id=5914 ?
and
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951 ?
Device Model: Maxtor 6L300S0
Firmware Version: BANC1G10
on
Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
Regards,
--
Nicolas Mailhot
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-01 19:00 LibPATA code issues / 2.6.15.4 Nicolas Mailhot
@ 2006-03-01 19:22 ` Mark Lord
2006-03-01 23:12 ` Nicolas Mailhot
0 siblings, 1 reply; 131+ messages in thread
From: Mark Lord @ 2006-03-01 19:22 UTC (permalink / raw)
To: Nicolas Mailhot; +Cc: edmudama, linux-ide, linux-kernel

Nicolas Mailhot wrote:
>>
> How about the drives that got blacklisted following :
> http://bugzilla.kernel.org/show_bug.cgi?id=5914 ?
> and
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951 ?
>
> Device Model: Maxtor 6L300S0
> Firmware Version: BANC1G10
>
> on Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)

Mmm.. somebody with one of those controllers should check
to see if *any* drives work with FUA, and blacklist the controller
instead of the drives if everything is failing.

Cheers
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-01 19:22 ` Mark Lord
@ 2006-03-01 23:12 ` Nicolas Mailhot
2006-03-01 23:31 ` Jeff Garzik
2006-03-02 1:19 ` Eric D. Mudama
0 siblings, 2 replies; 131+ messages in thread
From: Nicolas Mailhot @ 2006-03-01 23:12 UTC (permalink / raw)
To: Mark Lord; +Cc: edmudama, linux-ide, linux-kernel

On Wednesday, 1 March 2006 at 14:22 -0500, Mark Lord wrote:
> Nicolas Mailhot wrote:
> >>
> > How about the drives that got blacklisted following :
> > http://bugzilla.kernel.org/show_bug.cgi?id=5914 ?
> > and
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951 ?
> >
> > Device Model: Maxtor 6L300S0
> > Firmware Version: BANC1G10
> >
> > on Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
>
> Mmm.. somebody with one of those controllers should check
> to see if *any* drives work with FUA, and blacklist the controller
> instead of the drives if everything is failing.

I'm someone with such a controller (that's my bug here).
But I only have these drives.
So I can only confirm the combo is deadly.
(I could possibly try to plug one into the nforce4 controller, but I'm not
sure extracting the box from the tangle of cables and hardware it's part of
is worth it. sata_nv is rev-eng, while the SiI docs are public, right?)

I do suspect Eric D. Mudama knows if the problem is on the hard-drive
side though.

Regards,
--
Nicolas Mailhot
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-01 23:12 ` Nicolas Mailhot
@ 2006-03-01 23:31 ` Jeff Garzik
2006-03-02 1:19 ` Eric D. Mudama
1 sibling, 0 replies; 131+ messages in thread
From: Jeff Garzik @ 2006-03-01 23:31 UTC (permalink / raw)
To: Nicolas Mailhot; +Cc: Mark Lord, edmudama, linux-ide, linux-kernel

Nicolas Mailhot wrote:
> is worth it. sata_nv is rev-eng, while the siI docs are public, right?)

sata_nv was written by NVIDIA.

Jeff
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-01 23:12 ` Nicolas Mailhot
2006-03-01 23:31 ` Jeff Garzik
@ 2006-03-02 1:19 ` Eric D. Mudama
2006-03-02 1:39 ` Eric D. Mudama
2006-03-02 1:56 ` FUA and 311x (was Re: LibPATA code issues / 2.6.15.4) Jeff Garzik
1 sibling, 2 replies; 131+ messages in thread
From: Eric D. Mudama @ 2006-03-02 1:19 UTC (permalink / raw)
To: Nicolas Mailhot; +Cc: Mark Lord, linux-ide, linux-kernel

On 3/1/06, Nicolas Mailhot <nicolas.mailhot@gmail.com> wrote:
> On Wednesday, 1 March 2006 at 14:22 -0500, Mark Lord wrote:
> > Nicolas Mailhot wrote:
> > >>
> > > How about the drives that got blacklisted following :
> > > http://bugzilla.kernel.org/show_bug.cgi?id=5914 ?
> > > and
> > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951 ?
> > >
> > > Device Model: Maxtor 6L300S0
> > > Firmware Version: BANC1G10
> > >
> > > on Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
> >
> > Mmm.. somebody with one of those controllers should check
> > to see if *any* drives work with FUA, and blacklist the controller
> > instead of the drives if everything is failing.
>
> I'm someone with such a controller (that's my bug here).
> But I only have these drives.
> So I can only confirm the combo is deadly.
> (I could possibly try to plug one into the nforce4 controller, but I'm not
> sure extracting the box from the tangle of cables and hardware it's part of
> is worth it. sata_nv is rev-eng, while the SiI docs are public, right?)
>
> I do suspect Eric D. Mudama knows if the problem is on the hard-drive
> side though.
>
> Regards,
>
> --
> Nicolas Mailhot

I didn't know offhand, so we plugged in a bus analyzer and took a look
here in the lab...

We didn't have a 3114 lying around, but issuing the Write DMA FUA (0x3D)
opcode on a 3112 resulted in a D0h soft hang. I think the two chips are
related (4-port vs 2-port).

Looking at the bus trace, the command is issued on the SATA bus, the
drive generates a DMA Activate FIS which is accepted by the 3112, and
then the 3112 generates a Data Payload FIS (46h) with no contents. The
first DWORD of the payload is a HOLD primitive, to which the device
promptly responds with HOLDA, and the two are in a soft bus lock and
will sit forever. No data is ever generated by the host (stopped
capture after 4 seconds).

I believe this core should not be part of the FUA whitelist. If I
remember correctly, there are other implementations out there with
similar limitations to opcodes this "new" to ATA.

--eric
^ permalink raw reply [flat|nested] 131+ messages in thread
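[Editorial sketch, not part of the original message.] The failure Eric describes can be expressed as a check over a captured FIS sequence: after a DMA Activate from the device, the host is expected to send a Data FIS that actually carries a payload. The snippet below is only an illustration of that reasoning. The FIS type codes are the Serial ATA ones (46h is the value quoted in the message); the tuple layout of the trace and the function name `find_stall` are invented for this example and do not correspond to any analyzer's export format.

```python
# FIS type codes as defined by the Serial ATA specification.
FIS_REG_H2D = 0x27       # host-to-device register FIS (carries the command)
FIS_DMA_ACTIVATE = 0x39  # device-to-host: device is ready to receive data
FIS_DATA = 0x46          # data payload FIS


def find_stall(trace):
    """Return the index of the first empty host Data FIS that follows a
    DMA Activate, or None if no such frame is found.

    `trace` is a list of (direction, fis_type, payload_len) tuples.
    This layout is hypothetical and exists only for the example.
    """
    activated = False
    for i, (direction, fis_type, payload_len) in enumerate(trace):
        if direction == "d2h" and fis_type == FIS_DMA_ACTIVATE:
            activated = True
        elif direction == "h2d" and fis_type == FIS_DATA and activated:
            if payload_len == 0:
                return i          # host opened a Data FIS but sent nothing
            activated = False     # payload delivered, wait for next Activate
    return None


# Shape of the trace described above: command, DMA Activate, empty Data FIS.
observed = [
    ("h2d", FIS_REG_H2D, 0),
    ("d2h", FIS_DMA_ACTIVATE, 0),
    ("h2d", FIS_DATA, 0),
]

# What a working controller would be expected to produce instead.
healthy = [
    ("h2d", FIS_REG_H2D, 0),
    ("d2h", FIS_DMA_ACTIVATE, 0),
    ("h2d", FIS_DATA, 512),
]

print(find_stall(observed))  # index of the empty Data FIS
print(find_stall(healthy))   # None
```

The sketch says nothing about the HOLD/HOLDA primitives themselves; it only models the observation that the host never supplies data after the device asked for it.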
* Re: LibPATA code issues / 2.6.15.4
2006-03-02 1:19 ` Eric D. Mudama
@ 2006-03-02 1:39 ` Eric D. Mudama
2006-03-02 1:56 ` FUA and 311x (was Re: LibPATA code issues / 2.6.15.4) Jeff Garzik
1 sibling, 0 replies; 131+ messages in thread
From: Eric D. Mudama @ 2006-03-02 1:39 UTC (permalink / raw)
To: Nicolas Mailhot; +Cc: Mark Lord, linux-ide, linux-kernel

On 3/1/06, Eric D. Mudama <edmudama@gmail.com> wrote:
> I believe this core should not be part of the FUA whitelist. If I
> remember correctly, there are other implementations out there with
> similar limitations to opcodes this "new" to ATA.

That being said, I see from

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951

that a blacklisting of some Maxtor drives for this issue has supposedly
occurred or been pushed and accepted "upstream" in git ....

For the obvious (selfish) reasons, I'd like to minimize the number of
Maxtor drives that are blacklisted, as I don't believe this is a drive
issue at all. If there's a drive model out there reporting support for
FUA but screwing it up, I'm all ears as that's something I need to know
about.

If basic adapter functional testing is required for some of these
low-level commands, then that might be something I can help with too
(on a very limited scale), since we have access to ~100 different
chipsets.

--eric
^ permalink raw reply [flat|nested] 131+ messages in thread
* FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 1:19 ` Eric D. Mudama
2006-03-02 1:39 ` Eric D. Mudama
@ 2006-03-02 1:56 ` Jeff Garzik
2006-03-02 1:58 ` Jeff Garzik
1 sibling, 1 reply; 131+ messages in thread
From: Jeff Garzik @ 2006-03-02 1:56 UTC (permalink / raw)
To: Eric D. Mudama, Tejun Heo; +Cc: Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

Eric D. Mudama wrote:
> I didn't know offhand so we plugged in a bus analzyer and took a look
> here in the lab... We didn't have a 3114 lying around, but issuing the
> Write DMA FUA (0x3D) opcode on a 3112 resulted in a D0h soft hang. I
> think they're related (4-port vs 2-port).

Looking at the public docs posted at
http://gkernel.sourceforge.net/specs/sii/ ...

FUA is not in the list of supported opcodes (Table 10-1).

The 311x does have a facility that allows the driver to specify the
command protocol associated with an unknown-to-the-chip opcode. Someone
sufficiently interested could investigate using the VS Unlock and VS Set
Command Protocol commands to patch in support (section 10.4.*).

For libata, I think an ATA_FLAG_NO_FUA would be appropriate for
situations like this... assume FUA is supported in the controller, and
set a flag where it is not. Most chips will support FUA, either by
design or by sheer luck. The ones that do not support FUA are the
controllers that snoop the ATA command opcode, and internally choose the
protocol based on that opcode. For such hardware, unknown opcodes will
inevitably cause problems.

Jeff
^ permalink raw reply [flat|nested] 131+ messages in thread
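[Editorial sketch, not part of the original message.] Jeff's proposal is an opt-out: FUA is assumed to work unless the controller driver sets a flag saying otherwise. A minimal model of that decision, written in Python rather than kernel C purely for illustration, might look like the following. The name ATA_FLAG_NO_FUA comes from the message above, but its bit value here is made up, and `fua_usable` and `pick_write_opcode` are hypothetical helpers, not libata functions. The two opcode values are the standard ATA ones for WRITE DMA EXT and WRITE DMA FUA EXT.

```python
# Standard ATA opcodes.
ATA_CMD_WRITE_EXT = 0x35      # WRITE DMA EXT
ATA_CMD_WRITE_FUA_EXT = 0x3D  # WRITE DMA FUA EXT

# Flag proposed in the thread; the bit position is an assumption.
ATA_FLAG_NO_FUA = 1 << 0


def fua_usable(host_flags, dev_reports_fua):
    """FUA is usable only if the drive advertises it AND the controller
    driver has not opted out via ATA_FLAG_NO_FUA."""
    return dev_reports_fua and not (host_flags & ATA_FLAG_NO_FUA)


def pick_write_opcode(host_flags, dev_reports_fua, want_fua):
    """Choose the write opcode for one request (hypothetical helper)."""
    if want_fua and fua_usable(host_flags, dev_reports_fua):
        return ATA_CMD_WRITE_FUA_EXT
    # Fallback: a plain write. A real implementation would presumably
    # need a cache flush afterwards to preserve the durability guarantee.
    return ATA_CMD_WRITE_EXT


# A controller that snoops opcodes (as described for the 311x) sets the flag.
sil_flags = ATA_FLAG_NO_FUA
other_flags = 0

print(hex(pick_write_opcode(sil_flags, True, True)))
print(hex(pick_write_opcode(other_flags, True, True)))
```

The point of the design is that the drive no longer needs to be blacklisted: the same Maxtor drive would get FUA writes behind a controller that passes the opcode through and plain writes behind one that does not.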
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 1:56 ` FUA and 311x (was Re: LibPATA code issues / 2.6.15.4) Jeff Garzik
@ 2006-03-02 1:58 ` Jeff Garzik
2006-03-02 2:20 ` Eric D. Mudama
` (2 more replies)
0 siblings, 3 replies; 131+ messages in thread
From: Jeff Garzik @ 2006-03-02 1:58 UTC (permalink / raw)
To: Jens Axboe, Eric D. Mudama, Tejun Heo; +Cc: Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

Jeff Garzik wrote:
> For libata, I think an ATA_FLAG_NO_FUA would be appropriate for
> situations like this... assume FUA is supported in the controller, and
> set a flag where it is not. Most chips will support FUA, either by
> design or by sheer luck. The ones that do not support FUA are the
> controllers that snoop the ATA command opcode, and internally choose the
> protocol based on that opcode. For such hardware, unknown opcodes will
> inevitably cause problems.

This also begs the question... what controller was being used, when the
single Maxtor device listed in the blacklist was added? Perhaps it was
a problem with the controller, not the device.

Jeff
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 1:58 ` Jeff Garzik
@ 2006-03-02 2:20 ` Eric D. Mudama
2006-03-02 2:46 ` Jeff Garzik
2006-03-02 16:05 ` Nicolas Mailhot
2006-03-02 7:22 ` Jens Axboe
2006-03-02 15:59 ` Nicolas Mailhot
2 siblings, 2 replies; 131+ messages in thread
From: Eric D. Mudama @ 2006-03-02 2:20 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Jens Axboe, Tejun Heo, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

On 3/1/06, Jeff Garzik <jgarzik@pobox.com> wrote:
> This also begs the question... what controller was being used, when the
> single Maxtor device listed in the blacklist was added? Perhaps it was
> a problem with the controller, not the device.
>
> Jeff

As reported here:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951

the controller was a 3114, and the bug was "fixed" by blacklisting his
Maxtor drive's FUA support. I'd like Maxtor drives to be
un-blacklisted if possible.

--eric
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 2:20 ` Eric D. Mudama
@ 2006-03-02 2:46 ` Jeff Garzik
2006-03-02 3:00 ` Eric D. Mudama
2006-03-02 16:03 ` Nicolas Mailhot
2006-03-02 16:05 ` Nicolas Mailhot
1 sibling, 2 replies; 131+ messages in thread
From: Jeff Garzik @ 2006-03-02 2:46 UTC (permalink / raw)
To: Eric D. Mudama; +Cc: Jens Axboe, Tejun Heo, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

Eric D. Mudama wrote:
> On 3/1/06, Jeff Garzik <jgarzik@pobox.com> wrote:
>
>>This also begs the question... what controller was being used, when the
>>single Maxtor device listed in the blacklist was added? Perhaps it was
>>a problem with the controller, not the device.
>>
>> Jeff
>
> As reported here:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951
>
> the controller was a 3114, and the bug was "fixed" by blacklisting his
> Maxtor drive's FUA support. I'd like Maxtor drives to be
> un-blacklisted if possible.

If it's a 3114, I agree un-blacklisting is the way to go... but it's not
clear to me whether the problematic configuration included sata_sil or
sata_nv. Since I'm apparently blind :) which part of the bug points
conclusively to sata_sil?

Jeff
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 2:46 ` Jeff Garzik
@ 2006-03-02 3:00 ` Eric D. Mudama
2006-03-02 3:06 ` Jeff Garzik
2006-03-02 16:07 ` Nicolas Mailhot
2006-03-02 16:03 ` Nicolas Mailhot
1 sibling, 2 replies; 131+ messages in thread
From: Eric D. Mudama @ 2006-03-02 3:00 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Jens Axboe, Tejun Heo, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

On 3/1/06, Jeff Garzik <jgarzik@pobox.com> wrote:
> Eric D. Mudama wrote:
> > On 3/1/06, Jeff Garzik <jgarzik@pobox.com> wrote:
> >
> >>This also begs the question... what controller was being used, when the
> >>single Maxtor device listed in the blacklist was added? Perhaps it was
> >>a problem with the controller, not the device.
> >>
> >> Jeff
> >
> > As reported here:
> >
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951
> >
> > the controller was a 3114, and the bug was "fixed" by blacklisting his
> > Maxtor drive's FUA support. I'd like Maxtor drives to be
> > un-blacklisted if possible.
>
> If its 3114 I agree un-blacklisting is the way to go... but its not
> clear to me whether the problematic configuration included sata_sil or
> sata_nv. Since I'm apparently blind :) which part of the bug points
> conclusively to sata_sil?
>
> Jeff

The "failing dmesg" has the plextor connected to sata_nv, and the two
Maxtor drives connected to sata_sil, if I read it correctly. They're
ata5/ata6 ports, mapped as sda/sdb.

Nicolas' comment in the thread "Re: LibPATA code issues / 2.6.15.4"
seemed to say it was the same adapter:

http://marc.theaimsgroup.com/?l=linux-kernel&m=114123989405668&w=2

--eric
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 3:00 ` Eric D. Mudama
@ 2006-03-02 3:06 ` Jeff Garzik
2006-03-02 3:13 ` Tejun Heo
` (2 more replies)
2006-03-02 16:07 ` Nicolas Mailhot
1 sibling, 3 replies; 131+ messages in thread
From: Jeff Garzik @ 2006-03-02 3:06 UTC (permalink / raw)
To: Eric D. Mudama; +Cc: Jens Axboe, Tejun Heo, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

Eric D. Mudama wrote:
> The "failing dmesg" has the plextor connected to sata_nv, and the two
> Maxtor drives connected to sata_sil, if I read it correctly. They're
> ata5/ata6 ports, mapped as sda/sdb.
>
> Nicolas' comment in the thread "Re: LibPATA code issues / 2.6.15.4"
> seemed to say it was the same adapter:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=114123989405668&w=2

Sounds like un-blacklisting the drive, and adding ATA_FLAG_NO_FUA is the
way to go...

Jeff
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 3:06 ` Jeff Garzik
@ 2006-03-02 3:13 ` Tejun Heo
2006-03-02 3:16 ` Mark Lord
2006-03-02 16:12 ` Nicolas Mailhot
2 siblings, 0 replies; 131+ messages in thread
From: Tejun Heo @ 2006-03-02 3:13 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Eric D. Mudama, Jens Axboe, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

Jeff Garzik wrote:
> Eric D. Mudama wrote:
>
>> The "failing dmesg" has the plextor connected to sata_nv, and the two
>> Maxtor drives connected to sata_sil, if I read it correctly. They're
>> ata5/ata6 ports, mapped as sda/sdb.
>>
>> Nicolas' comment in the thread "Re: LibPATA code issues / 2.6.15.4"
>> seemed to say it was the same adapter:
>>
>> http://marc.theaimsgroup.com/?l=linux-kernel&m=114123989405668&w=2
>
> Sounds like un-blacklisting the drive, and adding ATA_FLAG_NO_FUA is the
> way to go...
>

Agreed. I'm currently implementing VDMA on sata_sil and will get to FUA
via explicit protocol soon.

--
tejun
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 3:06 ` Jeff Garzik
2006-03-02 3:13 ` Tejun Heo
@ 2006-03-02 3:16 ` Mark Lord
2006-03-02 3:18 ` Jeff Garzik
2006-03-02 16:12 ` Nicolas Mailhot
2 siblings, 1 reply; 131+ messages in thread
From: Mark Lord @ 2006-03-02 3:16 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Eric D. Mudama, Jens Axboe, Tejun Heo, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

Jeff Garzik wrote:
..
> Sounds like un-blacklisting the drive, and adding ATA_FLAG_NO_FUA is the
> way to go...

Might as well add sata_mv to that blacklist as well.

And while I'm at it, the pdc_adma and sata_qstor controllers/drivers are
fine with FUA.

-ml
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 3:16 ` Mark Lord
@ 2006-03-02 3:18 ` Jeff Garzik
2006-03-02 6:23 ` Eric D. Mudama
` (2 more replies)
0 siblings, 3 replies; 131+ messages in thread
From: Jeff Garzik @ 2006-03-02 3:18 UTC (permalink / raw)
To: Mark Lord; +Cc: Eric D. Mudama, Jens Axboe, Tejun Heo, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

Mark Lord wrote:
> Jeff Garzik wrote:
> ..
>
>> Sounds like un-blacklisting the drive, and adding ATA_FLAG_NO_FUA is
>> the way to go...
>
> Might as well add sata_mv to that blacklist as well.

Have you confirmed that it doesn't work with FUA?

We recently patched sata_mv to add ATA_CMD_WRITE_FUA_EXT, in response to
a nasty bug report, and ISTR the complainer went away.

> And while I'm at it, the pdc_adma and sata_qstor controllers/drivers are
> fine with FUA.

Verified or just guessing?

Jeff
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 3:18 ` Jeff Garzik
@ 2006-03-02 6:23 ` Eric D. Mudama
2006-03-02 9:00 ` Sander
2006-03-02 11:52 ` Jeff Garzik
2006-03-02 8:57 ` Sander
2006-03-03 0:34 ` Mark Lord
2 siblings, 2 replies; 131+ messages in thread
From: Eric D. Mudama @ 2006-03-02 6:23 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Mark Lord, Jens Axboe, Tejun Heo, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

On 3/1/06, Jeff Garzik <jgarzik@pobox.com> wrote:
> Mark Lord wrote:
> > Jeff Garzik wrote:
> > ..
> >
> >> Sounds like un-blacklisting the drive, and adding ATA_FLAG_NO_FUA is
> >> the way to go...
> >
> > Might as well add sata_mv to that blacklist as well.
>
> Have you confirmed that it doesn't work with FUA?

I'll see if I can find one of these around the lab tomorrow and test
the raw command support. If that's fine at a basic level, it might be
a bug in the driver?
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 6:23 ` Eric D. Mudama
@ 2006-03-02 9:00 ` Sander
2006-03-02 11:52 ` Jeff Garzik
1 sibling, 0 replies; 131+ messages in thread
From: Sander @ 2006-03-02 9:00 UTC (permalink / raw)
To: Eric D. Mudama; +Cc: Jeff Garzik, Mark Lord, Jens Axboe, Tejun Heo, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

Eric D. Mudama wrote (ao):
> On 3/1/06, Jeff Garzik <jgarzik@pobox.com> wrote:
> > Mark Lord wrote:
> > > Jeff Garzik wrote:
> > > ..
> > >
> > >> Sounds like un-blacklisting the drive, and adding ATA_FLAG_NO_FUA is
> > >> the way to go...
> > >
> > > Might as well add sata_mv to that blacklist as well.
> >
> > Have you confirmed that it doesn't work with FUA?
>
> I'll see if I can find one of these around the lab tomorrow and test
> the raw command support. If that's fine at a basic level, it might be
> a bug in the driver?

If you tell me what to do (what to type in etc) I can save you from
looking for one. I have a:

Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)

I can connect a Maxtor MaXLine Pro 500, a Maxtor DiamondMax11 and a WD
Raptor 74GB to test if necessary.

Sander

--
Humilis IT Services and Solutions
http://www.humilis.net
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 6:23 ` Eric D. Mudama
2006-03-02 9:00 ` Sander
@ 2006-03-02 11:52 ` Jeff Garzik
1 sibling, 0 replies; 131+ messages in thread
From: Jeff Garzik @ 2006-03-02 11:52 UTC (permalink / raw)
To: Eric D. Mudama; +Cc: Mark Lord, Jens Axboe, Tejun Heo, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

Eric D. Mudama wrote:
> On 3/1/06, Jeff Garzik <jgarzik@pobox.com> wrote:
>
>>Mark Lord wrote:
>>
>>>Jeff Garzik wrote:
>>>..
>>>
>>>>Sounds like un-blacklisting the drive, and adding ATA_FLAG_NO_FUA is
>>>>the way to go...
>>>
>>>Might as well add sata_mv to that blacklist as well.
>>
>>Have you confirmed that it doesn't work with FUA?
>
> I'll see if I can find one of these around the lab tomorrow and test
> the raw command support. If that's fine at a basic level, it might be
> a bug in the driver?

Quite possibly. Anything goes with sata_mv at the moment... I've done
my best to cover most of the errata and get it working, but there are
still some key errata workarounds missing. It's still marked "HIGHLY
EXPERIMENTAL" in the Kconfig ;-)

Jeff
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 3:18 ` Jeff Garzik
2006-03-02 6:23 ` Eric D. Mudama
@ 2006-03-02 8:57 ` Sander
2006-03-03 0:34 ` Mark Lord
2 siblings, 0 replies; 131+ messages in thread
From: Sander @ 2006-03-02 8:57 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Mark Lord, Eric D. Mudama, Jens Axboe, Tejun Heo, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

Jeff Garzik wrote (ao):
> Mark Lord wrote:
> >Jeff Garzik wrote:
> >..
> >
> >>Sounds like un-blacklisting the drive, and adding ATA_FLAG_NO_FUA is
> >>the way to go...
> >
> >Might as well add sata_mv to that blacklist as well.
>
> Have you confirmed that it doesn't work with FUA?
>
> We recently patched sata_mv to add ATA_CMD_WRITE_FUA_EXT, in response to
> a nasty bug report, and ISTR the complainer went away.

That is correct. I was that complainer and reported that the patch
works for me: http://lkml.org/lkml/2006/2/15/175

Also, the patch went into the next -rc kernel at that time.

Sander

PS, can I get you guys interested in the sata_mv driver? I would really
love to use the Marvell controller:
http://www.ussg.iu.edu/hypermail/linux/kernel/0602.2/0914.html

I'd be very happy to test any patches and will report how they do.

--
Humilis IT Services and Solutions
http://www.humilis.net
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 3:18 ` Jeff Garzik
2006-03-02 6:23 ` Eric D. Mudama
2006-03-02 8:57 ` Sander
@ 2006-03-03 0:34 ` Mark Lord
2 siblings, 0 replies; 131+ messages in thread
From: Mark Lord @ 2006-03-03 0:34 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Mark Lord, Eric D. Mudama, Jens Axboe, Tejun Heo, Nicolas Mailhot, linux-ide, linux-kernel, Carlos Pardo

Jeff Garzik wrote:
> Mark Lord wrote:
>> Jeff Garzik wrote:
>> ..
>>
>>> Sounds like un-blacklisting the drive, and adding ATA_FLAG_NO_FUA is
>>> the way to go...
>>
>> Might as well add sata_mv to that blacklist as well.
>
> Have you confirmed that it doesn't work with FUA?

Ooops. Defective memory here.

The Marvell documentation for the 6081/6041 does indeed state that the
FUA DMA commands *are* supported (queued or non-queued).

So it should be okay, at least for those two specific chips.
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 3:06 ` Jeff Garzik
2006-03-02 3:13 ` Tejun Heo
2006-03-02 3:16 ` Mark Lord
@ 2006-03-02 16:12 ` Nicolas Mailhot
2 siblings, 0 replies; 131+ messages in thread
From: Nicolas Mailhot @ 2006-03-02 16:12 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Eric D. Mudama, Jens Axboe, Tejun Heo, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

On Thursday, 2 March 2006 at 04:06, Jeff Garzik wrote:
> Eric D. Mudama wrote:
>> The "failing dmesg" has the plextor connected to sata_nv, and the two
>> Maxtor drives connected to sata_sil, if I read it correctly. They're
>> ata5/ata6 ports, mapped as sda/sdb.
>>
>> Nicolas' comment in the thread "Re: LibPATA code issues / 2.6.15.4"
>> seemed to say it was the same adapter:
>>
>> http://marc.theaimsgroup.com/?l=linux-kernel&m=114123989405668&w=2
>
> Sounds like un-blacklisting the drive, and adding ATA_FLAG_NO_FUA is the
> way to go...

Please add the ATA_FLAG_NO_FUA flag first and only *afterwards*
un-blacklist the drive, as I have no wish to go through the fsck stress
again.

--
Nicolas Mailhot
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 3:00 ` Eric D. Mudama
2006-03-02 3:06 ` Jeff Garzik
@ 2006-03-02 16:07 ` Nicolas Mailhot
1 sibling, 0 replies; 131+ messages in thread
From: Nicolas Mailhot @ 2006-03-02 16:07 UTC (permalink / raw)
To: Eric D. Mudama; +Cc: Jeff Garzik, Jens Axboe, Tejun Heo, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

On Thursday, 2 March 2006 at 04:00, Eric D. Mudama wrote:
> The "failing dmesg" has the plextor connected to sata_nv, and the two
> Maxtor drives connected to sata_sil, if I read it correctly. They're
> ata5/ata6 ports, mapped as sda/sdb.
>
> Nicolas' comment in the thread "Re: LibPATA code issues / 2.6.15.4"
> seemed to say it was the same adapter:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=114123989405668&w=2

Not only is it the same adapter model, we're talking about the same
physical system. I opened the original bug, posted on lkml, etc.

--
Nicolas Mailhot
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 2:46 ` Jeff Garzik
2006-03-02 3:00 ` Eric D. Mudama
@ 2006-03-02 16:03 ` Nicolas Mailhot
1 sibling, 0 replies; 131+ messages in thread
From: Nicolas Mailhot @ 2006-03-02 16:03 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Eric D. Mudama, Jens Axboe, Tejun Heo, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

On Thursday, 2 March 2006 at 03:46, Jeff Garzik wrote:
> Eric D. Mudama wrote:

> If its 3114 I agree un-blacklisting is the way to go... but its not
> clear to me whether the problematic configuration included sata_sil or
> sata_nv. Since I'm apparently blind :) which part of the bug points
> conclusively to sata_sil?

It's sata-sil.
I'm 100% sure; that's how I cabled the system.
sata-nv only has a Plextor drive attached (pata-nv has two PATA drives
on it too).

--
Nicolas Mailhot
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 2:20 ` Eric D. Mudama
2006-03-02 2:46 ` Jeff Garzik
@ 2006-03-02 16:05 ` Nicolas Mailhot
1 sibling, 0 replies; 131+ messages in thread
From: Nicolas Mailhot @ 2006-03-02 16:05 UTC (permalink / raw)
To: Eric D. Mudama; +Cc: Jeff Garzik, Jens Axboe, Tejun Heo, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

On Thursday, 2 March 2006 at 03:20, Eric D. Mudama wrote:
> On 3/1/06, Jeff Garzik <jgarzik@pobox.com> wrote:
>> This also begs the question... what controller was being used, when the
>> single Maxtor device listed in the blacklist was added? Perhaps it was
>> a problem with the controller, not the device.
>>
>> Jeff
>
> As reported here:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951
>
> the controller was a 3114, and the bug was "fixed" by blacklisting his
> Maxtor drive's FUA support. I'd like Maxtor drives to be
> un-blacklisted if possible.

BTW Eric, you should know:
- these specific drives (and the Maxtor PATA drives they replaced) were
  bought because I knew you were hanging out on the lists
- I fully intended to ask you whether the blacklisting was valid once
  the FUA dust had settled a little

Regards,

--
Nicolas Mailhot
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 1:58 ` Jeff Garzik
2006-03-02 2:20 ` Eric D. Mudama
@ 2006-03-02 7:22 ` Jens Axboe
2006-03-02 15:59 ` Nicolas Mailhot
2 siblings, 0 replies; 131+ messages in thread
From: Jens Axboe @ 2006-03-02 7:22 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Eric D. Mudama, Tejun Heo, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

On Wed, Mar 01 2006, Jeff Garzik wrote:
> Jeff Garzik wrote:
> >For libata, I think an ATA_FLAG_NO_FUA would be appropriate for
> >situations like this... assume FUA is supported in the controller, and
> >set a flag where it is not. Most chips will support FUA, either by
> >design or by sheer luck. The ones that do not support FUA are the
> >controllers that snoop the ATA command opcode, and internally choose the
> >protocol based on that opcode. For such hardware, unknown opcodes will
> >inevitably cause problems.
>
> This also begs the question... what controller was being used, when the
> single Maxtor device listed in the blacklist was added? Perhaps it was
> a problem with the controller, not the device.

Yeah which explains it a lot better as well... The FUA drive problem
never made much sense to me.

--
Jens Axboe
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 1:58 ` Jeff Garzik
2006-03-02 2:20 ` Eric D. Mudama
2006-03-02 7:22 ` Jens Axboe
@ 2006-03-02 15:59 ` Nicolas Mailhot
2006-03-02 16:37 ` Jeff Garzik
2 siblings, 1 reply; 131+ messages in thread
From: Nicolas Mailhot @ 2006-03-02 15:59 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Jens Axboe, Eric D. Mudama, Tejun Heo, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo

On Thursday, 2 March 2006 at 02:58, Jeff Garzik wrote:
> Jeff Garzik wrote:
>> For libata, I think an ATA_FLAG_NO_FUA would be appropriate for
>> situations like this... assume FUA is supported in the controller, and
>> set a flag where it is not. Most chips will support FUA, either by
>> design or by sheer luck. The ones that do not support FUA are the
>> controllers that snoop the ATA command opcode, and internally choose the
>> protocol based on that opcode. For such hardware, unknown opcodes will
>> inevitably cause problems.
>
> This also begs the question... what controller was being used, when the
> single Maxtor device listed in the blacklist was added? Perhaps it was
> a problem with the controller, not the device.

The controller in the bugzilla entry is a SiI 3114.
It was a quick fix and I did expect more thorough investigation later
(probably in the 2.6.17 timeframe). Though it seems FUA-related problems
are so numerous that FUA itself will be blacklisted for 2.6.16, so the
limited blacklist is no longer needed.

The thread leading to the blacklist is referenced in the bugzilla entry.

--
Nicolas Mailhot
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: FUA and 311x (was Re: LibPATA code issues / 2.6.15.4)
2006-03-02 15:59 ` Nicolas Mailhot
@ 2006-03-02 16:37 ` Jeff Garzik
0 siblings, 0 replies; 131+ messages in thread
From: Jeff Garzik @ 2006-03-02 16:37 UTC (permalink / raw)
To: Nicolas Mailhot
Cc: Jens Axboe, Eric D. Mudama, Tejun Heo, Nicolas Mailhot, Mark Lord, linux-ide, linux-kernel, Carlos Pardo
Nicolas Mailhot wrote:
> The controller in the bugzilla entry is a SiI 3114.
> It was a quick fix and I did expect more thorough investigation later
> (probably 2.6.17 frame). Though it seems FUA-related problems are so
> numerous FUA itself will be blacklisted for 2.6.16, so the limited
> blacklist is no longer needed.
Well, we're looking for a long term solution :)
Disabling FUA by default in 2.6.16 is a temporary solution.
Jeff
^ permalink raw reply [flat|nested] 131+ messages in thread
* LibPATA code issues / 2.6.15.4
@ 2006-02-14 9:48 Justin Piszcz
2006-02-14 14:50 ` Mark Lord
0 siblings, 1 reply; 131+ messages in thread
From: Justin Piszcz @ 2006-02-14 9:48 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linux-kernel
Jeff,
I'd have to double check but I do not recall getting these errors before
the pass-thru code was introduced in 2.6.15. I also was not running the
smart daemon until 2.6.15 for SATA drives as it was not supported.
I had a few issues before that I posted to LKML; those were due to too
many SATA devices etc. Everything is back to normal for the most part.
Speed, etc, all is well again, almost...
/dev/sdc:
Timing buffered disk reads: 154 MB in 3.02 seconds = 50.97 MB/sec
/dev/sdc:
Timing buffered disk reads: 162 MB in 3.00 seconds = 53.94 MB/sec
The only issue I have is when I copy a lot of files to a WD 400GB drive
I get these pesky errors in dmesg:
ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x04 { DriveStatusError }
Yet, everything copied (226GB or so) to the 400GB drive without a single
I/O error that I am aware of.
So my question is, why do I get these errors in dmesg if they are not
critical?
Thanks,
Justin.
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-14 9:48 LibPATA code issues / 2.6.15.4 Justin Piszcz
@ 2006-02-14 14:50 ` Mark Lord
2006-02-14 16:27 ` David Greaves
` (2 more replies)
0 siblings, 3 replies; 131+ messages in thread
From: Mark Lord @ 2006-02-14 14:50 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Jeff Garzik, linux-kernel, IDE/ATA development list
Justin Piszcz wrote:
..
> ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> ata3: status=0x51 { DriveReady SeekComplete Error }
> ata3: error=0x04 { DriveStatusError }
I wonder if the FUA logic is inserting cache-flush commands
and perhaps the drive is rejecting those?
Jeff, we really ought to be including the failed ATA opcode
in those error messages!!
Cheers
^ permalink raw reply [flat|nested] 131+ messages in thread
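[Editor's sketch] For readers following the log lines quoted above: status 0x51 is the combination of the DRDY (0x40), DSC (0x10) and ERR (0x01) bits of the ATA status register, and error 0x04 is the ABRT bit, which libata prints as "DriveStatusError". The SCSI sense key 0xb it is translated to is ABORTED COMMAND. A small userspace decoder along these lines makes that explicit. The function names are hypothetical and the label strings are chosen to resemble the log output; this is not the kernel's own decoding routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write the names of the set ATA status bits into buf, separated by
 * single spaces, and return the number of characters written
 * (excluding the terminating NUL). Output is truncated if buf is
 * too small.
 */
size_t sketch_decode_status(uint8_t stat, char *buf, size_t len)
{
	static const struct {
		uint8_t bit;
		const char *name;
	} bits[] = {
		{ 0x80, "Busy" },
		{ 0x40, "DriveReady" },
		{ 0x20, "DeviceFault" },
		{ 0x10, "SeekComplete" },
		{ 0x08, "DataRequest" },
		{ 0x04, "CorrectedError" },
		{ 0x02, "Index" },
		{ 0x01, "Error" },
	};
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
		int n;

		if (!(stat & bits[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* buffer full: stop, keep what fits */
		used += (size_t)n;
	}
	return used;
}

/* True when the drive reports ERR in status and ABRT in error,
 * i.e. the command was rejected rather than failing on the media. */
int sketch_is_command_aborted(uint8_t stat, uint8_t err)
{
	return (stat & 0x01) && (err & 0x04);
}
```

Calling sketch_decode_status(0x51, buf, sizeof(buf)) fills buf with the three names seen in the log, and sketch_is_command_aborted(0x51, 0x04) is true, which fits the "command rejected" reading discussed in this thread.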
* Re: LibPATA code issues / 2.6.15.4
2006-02-14 14:50 ` Mark Lord
@ 2006-02-14 16:27 ` David Greaves
2006-02-14 17:12 ` Justin Piszcz
2006-02-14 23:58 ` Justin Piszcz
2006-02-17 8:45 ` Jeff Garzik
2 siblings, 1 reply; 131+ messages in thread
From: David Greaves @ 2006-02-14 16:27 UTC (permalink / raw)
To: Mark Lord
Cc: Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list
Mark Lord wrote:
> Justin Piszcz wrote:
> ..
>
>> ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> ata3: status=0x51 { DriveReady SeekComplete Error }
>> ata3: error=0x04 { DriveStatusError }
>
>
> I wonder if the FUA logic is inserting cache-flush commands
> and perhaps the drive is rejecting those?
>
> Jeff, we really ought to be including the failed ATA opcode
> in those error messages!!
>
If such a thing were available as a patch then I too would apply it and
hopefully could provide useful feedback.
David
PS My problems:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113769509617034&w=2
http://marc.theaimsgroup.com/?l=linux-ide&m=113828551519727&w=2
http://marc.theaimsgroup.com/?l=linux-ide&m=113829573105369&w=2
http://marc.theaimsgroup.com/?l=linux-ide&m=113933732903205&w=2
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-14 16:27 ` David Greaves @ 2006-02-14 17:12 ` Justin Piszcz 2006-02-14 18:00 ` Mark Lord 0 siblings, 1 reply; 131+ messages in thread From: Justin Piszcz @ 2006-02-14 17:12 UTC (permalink / raw) To: David Greaves Cc: Mark Lord, Jeff Garzik, linux-kernel, IDE/ATA development list I would like to try the patch too, if available. I got these errors when nothing (apparent) was going on. [25158.676998] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [25158.677005] ata3: status=0x51 { DriveReady SeekComplete Error } [25158.677009] ata3: error=0x04 { DriveStatusError } [27306.663556] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [27306.663563] ata3: status=0x51 { DriveReady SeekComplete Error } [27306.663567] ata3: error=0x04 { DriveStatusError } On Tue, 14 Feb 2006, David Greaves wrote: > Mark Lord wrote: > >> Justin Piszcz wrote: >> .. >> >>> ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >>> ata3: status=0x51 { DriveReady SeekComplete Error } >>> ata3: error=0x04 { DriveStatusError } >> >> >> I wonder if the FUA logic is inserting cache-flush commands >> and perhaps the drive is rejecting those? >> >> Jeff, we really ought to be including the failed ATA opcode >> in those error messages!! >> > If such a thing were available as a patch then I too would apply it and > hopefully could provide useful feedback. > > David > PS My problems: > > http://marc.theaimsgroup.com/?l=linux-kernel&m=113769509617034&w=2 > http://marc.theaimsgroup.com/?l=linux-ide&m=113828551519727&w=2 > http://marc.theaimsgroup.com/?l=linux-ide&m=113829573105369&w=2 > http://marc.theaimsgroup.com/?l=linux-ide&m=113933732903205&w=2 > > ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-14 17:12 ` Justin Piszcz
@ 2006-02-14 18:00 ` Mark Lord
2006-02-14 18:06 ` Justin Piszcz
` (2 more replies)
0 siblings, 3 replies; 131+ messages in thread
From: Mark Lord @ 2006-02-14 18:00 UTC (permalink / raw)
To: Justin Piszcz
Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list
On Tuesday 14 February 2006 12:12, Justin Piszcz wrote:
> I would like to try the patch too, if available.

Something like this: (for 2.6.16-rc3-git2, but should be okay on 2.6.15 also).

Untested: include the original SCSI opcode in printk's for libata SCSI errors,
to help understand where the errors are coming from.

Signed-Off-By: Mark Lord <mlord@pobox.com>

--- linux/drivers/scsi/libata-scsi.c.orig 2006-02-12 19:27:25.000000000 -0500
+++ linux/drivers/scsi/libata-scsi.c 2006-02-14 12:54:17.000000000 -0500
@@ -420,6 +420,7 @@
  *	@sk: the sense key we'll fill out
  *	@asc: the additional sense code we'll fill out
  *	@ascq: the additional sense code qualifier we'll fill out
+ *	@opcode: the original SCSI command opcode byte
  *
  *	Converts an ATA error into a SCSI error. Fill out pointers to
  *	SK, ASC, and ASCQ bytes for later use in fixed or descriptor
@@ -429,7 +430,7 @@
  *	spin_lock_irqsave(host_set lock)
  */
 void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc,
-			u8 *ascq)
+			u8 *ascq, u8 opcode)
 {
 	int i;
 
@@ -508,8 +509,8 @@
 		}
 	}
 	/* No error? Undecoded? */
-	printk(KERN_WARNING "ata%u: no sense translation for status: 0x%02x\n",
-	       id, drv_stat);
+	printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x status: 0x%02x\n",
+	       id, opcode, drv_stat);
 
 	/* For our last chance pick, use medium read error because
 	 * it's much more common than an ATA drive telling you a write
@@ -520,8 +521,8 @@
 	*ascq = 0x04; /* "auto-reallocation failed" */
 
  translate_done:
-	printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to "
-	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err,
+	printk(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to "
+	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err,
 	       *sk, *asc, *ascq);
 	return;
 }
@@ -562,7 +563,7 @@
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[1], &sb[2], &sb[3]);
+				   &sb[1], &sb[2], &sb[3], cmd->cmnd[0]);
 		sb[1] &= 0x0f;
 	}
 
@@ -637,7 +638,7 @@
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[2], &sb[12], &sb[13]);
+				   &sb[2], &sb[12], &sb[13], cmd->cmnd[0]);
 		sb[2] &= 0x0f;
 	}
 
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-14 18:00 ` Mark Lord @ 2006-02-14 18:06 ` Justin Piszcz 2006-02-23 23:39 ` Justin Piszcz 2006-02-25 11:34 ` David Greaves 2 siblings, 0 replies; 131+ messages in thread From: Justin Piszcz @ 2006-02-14 18:06 UTC (permalink / raw) To: Mark Lord Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list Thanks, I will reboot later tonight and see what type of error codes it gives me. Against 2.6.15.4: # patch -p1 < /tmp/a patching file drivers/scsi/libata-scsi.c Hunk #1 succeeded at 404 (offset -16 lines). Hunk #2 succeeded at 414 (offset -16 lines). Hunk #3 succeeded at 493 (offset -16 lines). Hunk #4 succeeded at 505 (offset -16 lines). Hunk #5 succeeded at 547 (offset -16 lines). Hunk #6 succeeded at 622 (offset -16 lines). # On Tue, 14 Feb 2006, Mark Lord wrote: > On Tuesday 14 February 2006 12:12, Justin Piszcz wrote: >> I would like to try the patch too, if available. > > Something like this: (for 2.6.16-rc3-git2, but should be okay on 2.6.15 also). > > Untested: include the original SCSI opcode in printk's for libata SCSI errors, > to help understand where the errors are coming from. > > Signed-Off-By: Mark Lord <mlord@pobox.com> > > --- linux/drivers/scsi/libata-scsi.c.orig 2006-02-12 19:27:25.000000000 -0500 > +++ linux/drivers/scsi/libata-scsi.c 2006-02-14 12:54:17.000000000 -0500 > @@ -420,6 +420,7 @@ > * @sk: the sense key we'll fill out > * @asc: the additional sense code we'll fill out > * @ascq: the additional sense code qualifier we'll fill out > + * @opcode: the original SCSI command opcode byte > * > * Converts an ATA error into a SCSI error. Fill out pointers to > * SK, ASC, and ASCQ bytes for later use in fixed or descriptor > @@ -429,7 +430,7 @@ > * spin_lock_irqsave(host_set lock) > */ > void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc, > - u8 *ascq) > + u8 *ascq, u8 opcode) > { > int i; > > @@ -508,8 +509,8 @@ > } > } > /* No error? Undecoded? 
*/ > - printk(KERN_WARNING "ata%u: no sense translation for status: 0x%02x\n", > - id, drv_stat); > + printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x status: 0x%02x\n", > + id, opcode, drv_stat); > > /* For our last chance pick, use medium read error because > * it's much more common than an ATA drive telling you a write > @@ -520,8 +521,8 @@ > *ascq = 0x04; /* "auto-reallocation failed" */ > > translate_done: > - printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to " > - "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err, > + printk(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to " > + "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err, > *sk, *asc, *ascq); > return; > } > @@ -562,7 +563,7 @@ > */ > if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { > ata_to_sense_error(qc->ap->id, tf->command, tf->feature, > - &sb[1], &sb[2], &sb[3]); > + &sb[1], &sb[2], &sb[3], cmd->cmnd[0]); > sb[1] &= 0x0f; > } > > @@ -637,7 +638,7 @@ > */ > if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { > ata_to_sense_error(qc->ap->id, tf->command, tf->feature, > - &sb[2], &sb[12], &sb[13]); > + &sb[2], &sb[12], &sb[13], cmd->cmnd[0]); > sb[2] &= 0x0f; > } > > ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-14 18:00 ` Mark Lord 2006-02-14 18:06 ` Justin Piszcz @ 2006-02-23 23:39 ` Justin Piszcz 2006-02-25 15:32 ` Mark Lord 2006-02-25 11:34 ` David Greaves 2 siblings, 1 reply; 131+ messages in thread From: Justin Piszcz @ 2006-02-23 23:39 UTC (permalink / raw) To: Mark Lord Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list I have reproduced the error with the patched kernel! Here it is: [263864.109854] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [263864.109861] ata3: status=0x51 { DriveReady SeekComplete Error } [263864.109866] ata3: error=0x04 { DriveStatusError } Here is how I got it to error: $ for i in `seq 1 1000`; do dd if=/dev/zero of=file.$i bs=1M count=$i; done Now, how to fix? :) On Tue, 14 Feb 2006, Mark Lord wrote: > On Tuesday 14 February 2006 12:12, Justin Piszcz wrote: >> I would like to try the patch too, if available. > > Something like this: (for 2.6.16-rc3-git2, but should be okay on 2.6.15 also). > > Untested: include the original SCSI opcode in printk's for libata SCSI errors, > to help understand where the errors are coming from. > > Signed-Off-By: Mark Lord <mlord@pobox.com> > > --- linux/drivers/scsi/libata-scsi.c.orig 2006-02-12 19:27:25.000000000 -0500 > +++ linux/drivers/scsi/libata-scsi.c 2006-02-14 12:54:17.000000000 -0500 > @@ -420,6 +420,7 @@ > * @sk: the sense key we'll fill out > * @asc: the additional sense code we'll fill out > * @ascq: the additional sense code qualifier we'll fill out > + * @opcode: the original SCSI command opcode byte > * > * Converts an ATA error into a SCSI error. Fill out pointers to > * SK, ASC, and ASCQ bytes for later use in fixed or descriptor > @@ -429,7 +430,7 @@ > * spin_lock_irqsave(host_set lock) > */ > void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc, > - u8 *ascq) > + u8 *ascq, u8 opcode) > { > int i; > > @@ -508,8 +509,8 @@ > } > } > /* No error? Undecoded? 
*/ > - printk(KERN_WARNING "ata%u: no sense translation for status: 0x%02x\n", > - id, drv_stat); > + printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x status: 0x%02x\n", > + id, opcode, drv_stat); > > /* For our last chance pick, use medium read error because > * it's much more common than an ATA drive telling you a write > @@ -520,8 +521,8 @@ > *ascq = 0x04; /* "auto-reallocation failed" */ > > translate_done: > - printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to " > - "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err, > + printk(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to " > + "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err, > *sk, *asc, *ascq); > return; > } > @@ -562,7 +563,7 @@ > */ > if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { > ata_to_sense_error(qc->ap->id, tf->command, tf->feature, > - &sb[1], &sb[2], &sb[3]); > + &sb[1], &sb[2], &sb[3], cmd->cmnd[0]); > sb[1] &= 0x0f; > } > > @@ -637,7 +638,7 @@ > */ > if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { > ata_to_sense_error(qc->ap->id, tf->command, tf->feature, > - &sb[2], &sb[12], &sb[13]); > + &sb[2], &sb[12], &sb[13], cmd->cmnd[0]); > sb[2] &= 0x0f; > } > > ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-23 23:39 ` Justin Piszcz
@ 2006-02-25 15:32 ` Mark Lord
2006-02-25 15:58 ` Justin Piszcz
0 siblings, 1 reply; 131+ messages in thread
From: Mark Lord @ 2006-02-25 15:32 UTC (permalink / raw)
To: Justin Piszcz
Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list
Justin Piszcz wrote:
> I have reproduced the error with the patched kernel!
>
> Here it is:
>
> [263864.109854] ata3: translated ATA stat/err 0x51/04 to SCSI
> SK/ASC/ASCQ 0xb/00/00
> [263864.109861] ata3: status=0x51 { DriveReady SeekComplete Error }
> [263864.109866] ata3: error=0x04 { DriveStatusError }
Nope.. patch not present, as otherwise the line above would have
read something like this:
> [263864.109854] ata3: translated op=0x21 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
So we didn't get the extra info since the patch wasn't present.
Cheers
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-25 15:32 ` Mark Lord
@ 2006-02-25 15:58 ` Justin Piszcz
2006-02-25 16:11 ` Jesper Juhl
2006-02-25 16:21 ` Mark Lord
0 siblings, 2 replies; 131+ messages in thread
From: Justin Piszcz @ 2006-02-25 15:58 UTC (permalink / raw)
To: Mark Lord
Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list
The kernel is patched, if you did not get what you wanted maybe the patch
does not work in some instances or there is a bug?
On Sat, 25 Feb 2006, Mark Lord wrote:
> Justin Piszcz wrote:
>> I have reproduced the error with the patched kernel!
>>
>> Here it is:
>>
>> [263864.109854] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ
>> 0xb/00/00
>> [263864.109861] ata3: status=0x51 { DriveReady SeekComplete Error }
>> [263864.109866] ata3: error=0x04 { DriveStatusError }
>
> Nope.. patch not present, as otherwise the line above would have
> read something like this:
>
>> [263864.109854] ata3: translated op=0x21 ATA stat/err 0x51/04 to SCSI
> SK/ASC/ASCQ 0xb/00/00
>
> So we didn't get the extra info since the patch wasn't present.
>
> Cheers
>
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-25 15:58 ` Justin Piszcz
@ 2006-02-25 16:11 ` Jesper Juhl
2006-02-25 16:21 ` Mark Lord
1 sibling, 0 replies; 131+ messages in thread
From: Jesper Juhl @ 2006-02-25 16:11 UTC (permalink / raw)
To: Justin Piszcz
Cc: Mark Lord, David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list
On 2/25/06, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
Please don't top-post.
> The kernel is patched, if you did not get what you wanted maybe the patch
> does not work in some instances or there is a bug?
>
You may have patched a kernel source with Mark's patch, but you are
very clearly not running a kernel built from that patched source.
As can be seen from (for example) this bit from Mark's patch
 translate_done:
- printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to "
- "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err,
+ printk(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to "
+ "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err,
 *sk, *asc, *ascq);
the patch changes the text being printed. In this case the text
"ata%u: translated ATA stat/err ..." is changed into
"ata%u: translated op=0x%02x ATA stat/err ..."
And if we look at the output you posted :
> >> Here it is:
> >>
> >> [263864.109854] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ
> >> 0xb/00/00
That string is clearly from an un-patched kernel as Mark also pointed
out in his reply to you.
--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html
^ permalink raw reply [flat|nested] 131+ messages in thread
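[Editor's sketch] The check described above (does a "translated" line carry the op= field that the patch adds?) can be done mechanically on a saved dmesg line. The helper below is a hypothetical userspace illustration, not part of any kernel or tool; it relies only on the two message formats shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Classify one kernel log line:
 *    1  "translated" line in the patched format (has op=0x..)
 *    0  "translated" line in the unpatched format
 *   -1  not a "translated" line at all
 */
int sketch_line_is_patched(const char *line)
{
	static const char marker[] = ": translated ";
	const char *p;

	assert(line != NULL);

	p = strstr(line, marker);
	if (p == NULL)
		return -1;

	p += strlen(marker);
	return strncmp(p, "op=0x", 5) == 0 ? 1 : 0;
}
```

Fed the line Justin posted, this returns 0; fed one of the "translated op=0x28 ..." lines from David's later report, it returns 1.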
* Re: LibPATA code issues / 2.6.15.4
2006-02-25 15:58 ` Justin Piszcz
2006-02-25 16:11 ` Jesper Juhl
@ 2006-02-25 16:21 ` Mark Lord
1 sibling, 0 replies; 131+ messages in thread
From: Mark Lord @ 2006-02-25 16:21 UTC (permalink / raw)
To: Justin Piszcz
Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list
Justin Piszcz wrote:
> The kernel is patched, if you did not get what you wanted maybe the
> patch does not work in some instances or there is a bug?
No, the output would be there if those messages came from the patched kernel.
(read the patch and see what I mean..).
Cheers
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-14 18:00 ` Mark Lord
2006-02-14 18:06 ` Justin Piszcz
2006-02-23 23:39 ` Justin Piszcz
@ 2006-02-25 11:34 ` David Greaves
2006-02-25 16:20 ` Mark Lord
2 siblings, 1 reply; 131+ messages in thread
From: David Greaves @ 2006-02-25 11:34 UTC (permalink / raw)
To: Mark Lord
Cc: Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list
Mark Lord wrote:
>On Tuesday 14 February 2006 12:12, Justin Piszcz wrote:
>
>
>>I would like to try the patch too, if available.
>>
>>
>
>Something like this: (for 2.6.16-rc3-git2, but should be okay on 2.6.15 also).
>
>Untested: include the original SCSI opcode in printk's for libata SCSI errors,
>to help understand where the errors are coming from.
>
>Signed-Off-By: Mark Lord <mlord@pobox.com>
>
>
Thanks Mark - I've finally gotten this patch applied.
With smartd disabled and no smart commands issued, a readonly badblocks
scan of /dev/sdb2 shows no problems and now gives:
Feb 25 10:38:31 haze kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Feb 25 10:38:32 haze kernel: ata2: no sense translation for op=0x28 status: 0x51
Feb 25 10:38:32 haze kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Feb 25 10:38:35 haze kernel: ata2: no sense translation for op=0x28 status: 0x51
hundreds of times.
and during boot I can get:
ata2: no sense translation for op=0x28 status: 0x51
ata2: translated op=0x28 ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata2: status=0x51 { DriveReady SeekComplete Error }
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
ata2: no sense translation for op=0x28 status: 0x51
ata2: translated op=0x28 ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x28 status: 0x51
ata2: translated op=0x28 ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
ata2: status=0x51 { DriveReady SeekComplete Error }
Subsequently a smartctl -data -a /dev/sdb shows no errors.
So could this be a faulty disk that smart shows is OK and shows no read or write errors? The other problem I noticed was that smartctl -o on -data /dev/sda still just gives: Feb 25 10:51:47 haze kernel: ata1: PIO error Feb 25 10:51:47 haze kernel: ata1: status=0x51 { DriveReady SeekComplete Error } Feb 25 10:51:47 haze kernel: ata1: error=0x04 { DriveStatusError } Feb 25 10:51:47 haze kernel: ata1: PIO error Feb 25 10:51:47 haze kernel: ata1: status=0x51 { DriveReady SeekComplete Error } Feb 25 10:51:47 haze kernel: ata1: error=0x04 { DriveStatusError } Feb 25 10:51:47 haze kernel: ata1: PIO error many times. I get similar problems for all the drives under both sata_sil and sata_via. Linux haze 2.6.15patchsata #6 PREEMPT Fri Feb 24 19:15:07 UTC 2006 i686 GNU/Linux libata version 1.20 loaded. sata_sil 0000:00:0a.0: version 0.9 ACPI: PCI Interrupt 0000:00:0a.0[A] -> GSI 16 (level, low) -> IRQ 17 ata1: SATA max UDMA/100 cmd 0xF8804080 ctl 0xF880408A bmdma 0xF8804000 irq 17 ata2: SATA max UDMA/100 cmd 0xF88040C0 ctl 0xF88040CA bmdma 0xF8804008 irq 17 ata1: dev 0 cfg 49:2f00 82:7869 83:7d09 84:4043 85:7869 86:3c01 87:4043 88:203f ata1: dev 0 ATA-7, max UDMA/100, 390721968 sectors: LBA48 ata1: dev 0 configured for UDMA/100 scsi0 : sata_sil ata2: dev 0 cfg 49:2f00 82:7c6b 83:7f09 84:4063 85:7c69 86:3e01 87:4063 88:007f ata2: dev 0 ATA-7, max UDMA/133, 398297088 sectors: LBA48 ata2: dev 0 configured for UDMA/100 scsi1 : sata_sil Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC Type: Direct-Access ANSI SCSI revision: 05 Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC Type: Direct-Access ANSI SCSI revision: 05 sata_via 0000:00:0f.0: version 1.1 ACPI: PCI Interrupt 0000:00:0f.0[B] -> GSI 20 (level, low) -> IRQ 16 sata_via 0000:00:0f.0: routed to hard irq line 0 ata3: SATA max UDMA/133 cmd 0x9800 ctl 0x9402 bmdma 0x8400 irq 16 ata4: SATA max UDMA/133 cmd 0x9000 ctl 0x8802 bmdma 0x8408 irq 16 ata3: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3468 86:3c01 87:4003 88:407f ata3: dev 0 
ATA-6, max UDMA/133, 312581808 sectors: LBA48 ata3: dev 0 configured for UDMA/133 scsi2 : sata_via ata4: dev 0 cfg 49:2f00 82:7c6b 83:7f09 84:4063 85:7c68 86:3e01 87:4063 88:407f ata4: dev 0 ATA-7, max UDMA/133, 398297088 sectors: LBA48 ata4: dev 0 configured for UDMA/133 scsi3 : sata_via Vendor: ATA Model: ST3160023AS Rev: 3.18 Type: Direct-Access ANSI SCSI revision: 05 Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC Type: Direct-Access ANSI SCSI revision: 05 SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB) SCSI device sda: drive cache: write back SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB) SCSI device sda: drive cache: write back sda: sda1 sd 0:0:0:0: Attached scsi disk sda SCSI device sdb: 398297088 512-byte hdwr sectors (203928 MB) SCSI device sdb: drive cache: write back SCSI device sdb: 398297088 512-byte hdwr sectors (203928 MB) SCSI device sdb: drive cache: write back sdb: sdb1 sdb2 sd 1:0:0:0: Attached scsi disk sdb SCSI device sdc: 312581808 512-byte hdwr sectors (160042 MB) SCSI device sdc: drive cache: write back SCSI device sdc: 312581808 512-byte hdwr sectors (160042 MB) SCSI device sdc: drive cache: write back sdc: sdc1 sdc2 sdc3 sdc4 sd 2:0:0:0: Attached scsi disk sdc SCSI device sdd: 398297088 512-byte hdwr sectors (203928 MB) SCSI device sdd: drive cache: write back SCSI device sdd: 398297088 512-byte hdwr sectors (203928 MB) SCSI device sdd: drive cache: write back sdd: sdd1 sdd2 sd 3:0:0:0: Attached scsi disk sdd sd 0:0:0:0: Attached scsi generic sg0 type 0 sd 1:0:0:0: Attached scsi generic sg1 type 0 sd 2:0:0:0: Attached scsi generic sg2 type 0 sd 3:0:0:0: Attached scsi generic sg3 type 0 David -- ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-25 11:34 ` David Greaves @ 2006-02-25 16:20 ` Mark Lord 2006-02-25 17:45 ` Justin Piszcz 0 siblings, 1 reply; 131+ messages in thread From: Mark Lord @ 2006-02-25 16:20 UTC (permalink / raw) To: David Greaves Cc: Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list [-- Attachment #1: Type: text/plain, Size: 1443 bytes --] David Greaves wrote: .. > Thanks Mark - I've finally gotten this patch applied. > > With smartd disabled and no smart commands issued, a readonly badblocks > scan of /dev/sdb2 shows no problems and now gives: > Feb 25 10:38:31 haze kernel: ata2: status=0x51 { DriveReady SeekComplete > Error } > Feb 25 10:38:32 haze kernel: ata2: no sense translation for op=0x28 > status: 0x51 > Feb 25 10:38:32 haze kernel: ata2: status=0x51 { DriveReady SeekComplete > Error } > Feb 25 10:38:35 haze kernel: ata2: no sense translation for op=0x28 > status: 0x51 > hundreds of times. .. Mmmm.. okay, it's happening due to a SCSI READ_10 opcode, which means it isn't being triggered by any of the FUA stuff. But there's still no obvious reason for the error. The drive is basically just saying "command rejected", and libata-scsi is translating that into "medium error" for some unknown reason. Unfortunately, the design of the current libata is such that we no longer have access to the actual ATA opcode that was rejected. It gets overwritten by the returned drive status on completion. So.. I need to generate another patch for you now, to save/show the real ATA opcode that was used to cause the errors. My theory is that we'll discover that it is one that your drive legitimately is rejecting (unsupported LBA48 or something..). But we won't know until we see the output. Second patch is attached: apply *in addition* to the first one. 
Cheers [-- Attachment #2: 12_libata_ata_opcode.patch --] [-- Type: text/x-patch, Size: 5983 bytes --] --- linux/drivers/scsi/libata-core.c.orig 2006-02-23 16:15:52.000000000 -0500 +++ linux/drivers/scsi/libata-core.c 2006-02-25 11:17:42.000000000 -0500 @@ -253,10 +253,11 @@ * spin_lock_irqsave(host_set lock) */ -static void ata_exec_command_pio(struct ata_port *ap, const struct ata_taskfile *tf) +static void ata_exec_command_pio(struct ata_port *ap, struct ata_taskfile *tf) { DPRINTK("ata%u: cmd 0x%X\n", ap->id, tf->command); + tf->saved_command = tf->command; outb(tf->command, ap->ioaddr.command_addr); ata_pause(ap); } @@ -274,10 +275,11 @@ * spin_lock_irqsave(host_set lock) */ -static void ata_exec_command_mmio(struct ata_port *ap, const struct ata_taskfile *tf) +static void ata_exec_command_mmio(struct ata_port *ap, struct ata_taskfile *tf) { DPRINTK("ata%u: cmd 0x%X\n", ap->id, tf->command); + tf->saved_command = tf->command; writeb(tf->command, (void __iomem *) ap->ioaddr.command_addr); ata_pause(ap); } @@ -294,7 +296,7 @@ * LOCKING: * spin_lock_irqsave(host_set lock) */ -void ata_exec_command(struct ata_port *ap, const struct ata_taskfile *tf) +void ata_exec_command(struct ata_port *ap, struct ata_taskfile *tf) { if (ap->flags & ATA_FLAG_MMIO) ata_exec_command_mmio(ap, tf); @@ -316,7 +318,7 @@ */ static inline void ata_tf_to_host(struct ata_port *ap, - const struct ata_taskfile *tf) + struct ata_taskfile *tf) { ap->ops->tf_load(ap, tf); ap->ops->exec_command(ap, tf); @@ -506,12 +508,13 @@ * Inherited from caller. 
 */
-void ata_tf_to_fis(const struct ata_taskfile *tf, u8 *fis, u8 pmp)
+void ata_tf_to_fis(struct ata_taskfile *tf, u8 *fis, u8 pmp)
 {
 	fis[0] = 0x27;	/* Register - Host to Device FIS */
 	fis[1] = (pmp & 0xf) | (1 << 7); /* Port multiplier number,
 					    bit 7 indicates Command FIS */
 	fis[2] = tf->command;
+	tf->saved_command = tf->command;
 	fis[3] = tf->feature;

 	fis[4] = tf->lbal;
@@ -631,6 +634,7 @@
 	cmd = ata_rw_cmds[index + fua + lba48 + write];
 	if (cmd) {
 		tf->command = cmd;
+		tf->saved_command = cmd;
 		return 0;
 	}
 	return -1;
--- linux/drivers/scsi/libata-scsi.c.orig 2006-02-25 10:58:41.000000000 -0500
+++ linux/drivers/scsi/libata-scsi.c 2006-02-25 11:16:07.000000000 -0500
@@ -438,7 +438,7 @@
  * spin_lock_irqsave(host_set lock)
  */
 void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc,
-			u8 *ascq, u8 opcode)
+			u8 *ascq, u8 opcode, u8 cmd)
 {
 	int i;

@@ -517,8 +517,8 @@
 		}
 	}
 	/* No error? Undecoded? */
-	printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x status: 0x%02x\n",
-	       id, opcode, drv_stat);
+	printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x cmd=0x%02x status: 0x%02x\n",
+	       id, opcode, cmd, drv_stat);

 	/* For our last chance pick, use medium read error because
 	 * it's much more common than an ATA drive telling you a write
@@ -529,8 +529,8 @@
 	*ascq = 0x04; /* "auto-reallocation failed" */

 translate_done:
-	DPRINTK(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to "
-	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err,
+	DPRINTK(KERN_ERR "ata%u: translated op=0x%02x cmd=0x%02x ATA stat/err 0x%02x/%02x to "
+	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, cmd, drv_stat, drv_err,
 	       *sk, *asc, *ascq);
 	return;
 }
@@ -571,7 +571,7 @@
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[1], &sb[2], &sb[3], cmd->cmnd[0]);
+				   &sb[1], &sb[2], &sb[3], cmd->cmnd[0], tf->saved_command);
 		sb[1] &= 0x0f;
 	}

@@ -646,7 +646,7 @@
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-				   &sb[2], &sb[12], &sb[13], cmd->cmnd[0]);
+				   &sb[2], &sb[12], &sb[13], cmd->cmnd[0], tf->saved_command);
 		sb[2] &= 0x0f;
 	}

@@ -1337,6 +1337,7 @@
 		goto early_finish;

 	/* select device, send command to hardware */
+	qc->tf.saved_command = qc->tf.command;
 	if (ata_qc_issue(qc))
 		goto err_did;

--- linux/include/linux/ata.h.orig 2006-02-17 17:23:45.000000000 -0500
+++ linux/include/linux/ata.h 2006-02-25 11:09:53.000000000 -0500
@@ -244,6 +244,7 @@
 	u8 device;

 	u8 command; /* IO operation */
+	u8 saved_command; /* IO operation */
 };

 #define ata_id_is_ata(id) (((id)[0] & (1 << 15)) == 0)
--- linux/include/linux/libata.h.orig 2006-02-23 16:15:53.000000000 -0500
+++ linux/include/linux/libata.h 2006-02-25 11:17:14.000000000 -0500
@@ -420,7 +420,7 @@
 	void (*tf_load) (struct ata_port *ap, const struct ata_taskfile *tf);
 	void (*tf_read) (struct ata_port *ap, struct ata_taskfile *tf);

-	void (*exec_command)(struct ata_port *ap, const struct ata_taskfile *tf);
+	void (*exec_command)(struct ata_port *ap, struct ata_taskfile *tf);
 	u8 (*check_status)(struct ata_port *ap);
 	u8 (*check_altstatus)(struct ata_port *ap);
 	void (*dev_select)(struct ata_port *ap, unsigned int device);
@@ -512,13 +512,13 @@
  */
 extern void ata_tf_load(struct ata_port *ap, const struct ata_taskfile *tf);
 extern void ata_tf_read(struct ata_port *ap, struct ata_taskfile *tf);
-extern void ata_tf_to_fis(const struct ata_taskfile *tf, u8 *fis, u8 pmp);
+extern void ata_tf_to_fis(struct ata_taskfile *tf, u8 *fis, u8 pmp);
 extern void ata_tf_from_fis(const u8 *fis, struct ata_taskfile *tf);
 extern void ata_noop_dev_select (struct ata_port *ap, unsigned int device);
 extern void ata_std_dev_select (struct ata_port *ap, unsigned int device);
 extern u8 ata_check_status(struct ata_port *ap);
 extern u8 ata_altstatus(struct ata_port *ap);
-extern void ata_exec_command(struct ata_port *ap, const struct ata_taskfile *tf);
+extern void ata_exec_command(struct ata_port *ap, struct ata_taskfile *tf);
 extern int ata_port_start (struct ata_port *ap);
 extern void ata_port_stop (struct ata_port *ap);
 extern void ata_host_stop (struct ata_host_set *host_set);
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-25 16:20 ` Mark Lord
@ 2006-02-25 17:45 ` Justin Piszcz
2006-02-25 18:28 ` Mark Lord
0 siblings, 1 reply; 131+ messages in thread
From: Justin Piszcz @ 2006-02-25 17:45 UTC (permalink / raw)
To: Mark Lord
Cc: David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list
Second patch fails for me.
On a clean 2.6.15.4 source tree:
p34:/usr/src# ls -ld linux
lrwxrwxrwx 1 root src 14 2006-02-25 12:41 linux -> linux-2.6.15.4/
The one from your e-mail earlier:
p34:/usr/src/linux# patch -p1 < /tmp/patch1
patching file drivers/scsi/libata-scsi.c
Hunk #1 succeeded at 404 (offset -16 lines).
Hunk #2 succeeded at 414 (offset -16 lines).
Hunk #3 succeeded at 493 (offset -16 lines).
Hunk #4 succeeded at 505 (offset -16 lines).
Hunk #5 succeeded at 547 (offset -16 lines).
Hunk #6 succeeded at 622 (offset -16 lines).
p34:/usr/src/linux# patch -p1 < /tmp/12_libata_ata_opcode.patch
patching file drivers/scsi/libata-core.c
Hunk #1 succeeded at 245 (offset -8 lines).
Hunk #2 succeeded at 267 (offset -8 lines).
Hunk #3 succeeded at 288 (offset -8 lines).
Hunk #4 succeeded at 310 (offset -8 lines).
Hunk #5 succeeded at 500 (offset -8 lines).
Hunk #6 FAILED at 626.
1 out of 6 hunks FAILED -- saving rejects to file drivers/scsi/libata-core.c.rej
patching file drivers/scsi/libata-scsi.c
Hunk #1 succeeded at 414 (offset -24 lines).
Hunk #2 succeeded at 493 (offset -24 lines).
Hunk #3 FAILED at 505.
Hunk #4 succeeded at 547 (offset -24 lines).
Hunk #5 succeeded at 622 (offset -24 lines).
Hunk #6 succeeded at 1308 (offset -29 lines).
1 out of 6 hunks FAILED -- saving rejects to file drivers/scsi/libata-scsi.c.rej
patching file include/linux/ata.h
Hunk #1 succeeded at 239 (offset -5 lines).
patching file include/linux/libata.h
Hunk #1 succeeded at 368 (offset -52 lines).
Hunk #2 succeeded at 452 (offset -60 lines).
p34:/usr/src/linux#
Should I be using 2.6.16-rcX?
On Sat, 25 Feb 2006, Mark Lord wrote:
> David Greaves wrote:
> ..
>> Thanks Mark - I've finally gotten this patch applied.
>>
>> With smartd disabled and no smart commands issued, a readonly badblocks
>> scan of /dev/sdb2 shows no problems and now gives:
>> Feb 25 10:38:31 haze kernel: ata2: status=0x51 { DriveReady SeekComplete
>> Error }
>> Feb 25 10:38:32 haze kernel: ata2: no sense translation for op=0x28
>> status: 0x51
>> Feb 25 10:38:32 haze kernel: ata2: status=0x51 { DriveReady SeekComplete
>> Error }
>> Feb 25 10:38:35 haze kernel: ata2: no sense translation for op=0x28
>> status: 0x51
>> hundreds of times.
> ..
>
> Mmmm.. okay, it's happening due to a SCSI READ_10 opcode,
> which means it isn't being triggered by any of the FUA stuff.
>
> But there's still no obvious reason for the error.
> The drive is basically just saying "command rejected",
> and libata-scsi is translating that into "medium error"
> for some unknown reason.
>
> Unfortunately, the design of the current libata is such that
> we no longer have access to the actual ATA opcode that was rejected.
> It gets overwritten by the returned drive status on completion.
>
> So.. I need to generate another patch for you now, to save/show
> the real ATA opcode that was used to cause the errors.
> My theory is that we'll discover that it is one that your drive
> legitimately is rejecting (unsupported LBA48 or something..).
>
> But we won't know until we see the output.
>
> Second patch is attached: apply *in addition* to the first one.
>
> Cheers
>
>
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-25 17:45 ` Justin Piszcz
@ 2006-02-25 18:28 ` Mark Lord
2006-02-25 18:55 ` Justin Piszcz
` (2 more replies)
0 siblings, 3 replies; 131+ messages in thread
From: Mark Lord @ 2006-02-25 18:28 UTC (permalink / raw)
To: Justin Piszcz
Cc: Mark Lord, David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list
Justin Piszcz wrote:
> Second patch fails for me.
..
> Should I be using 2.6.16-rcX?
Mmm... that's what I'm using (plus other patches),
so, yes.. give that a try. 2.6.16 does seem to
be shaping up to be a nice kernel.
Cheers
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-25 18:28 ` Mark Lord
@ 2006-02-25 18:55 ` Justin Piszcz
2006-02-25 19:29 ` Justin Piszcz
2006-02-25 19:47 ` David Greaves
2 siblings, 0 replies; 131+ messages in thread
From: Justin Piszcz @ 2006-02-25 18:55 UTC (permalink / raw)
To: Mark Lord
Cc: Mark Lord, David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list
I will give 2.6.16-rcX a try shortly. Here is the error again (with a
freshly patched 2.6.15.4), just to rule out any problems with the first
time that I patched:
[ 1037.451784] ata3: translated op=0x2a ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[ 1037.451791] ata3: status=0x51 { DriveReady SeekComplete Error }
[ 1037.451796] ata3: error=0x04 { DriveStatusError }
[ 1517.050496] ata3: no sense translation for op=0x2a status: 0x51
[ 1517.050504] ata3: translated op=0x2a ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04
[ 1517.050506] ata3: status=0x51 { DriveReady SeekComplete Error }
On Sat, 25 Feb 2006, Mark Lord wrote:
> Justin Piszcz wrote:
>> Second patch fails for me.
> ..
>> Should I be using 2.6.16-rcX?
>
> Mmm... that's what I'm using (plus other patches),
> so, yes.. give that a try. 2.6.16 does seem to
> be shaping up to be a nice kernel.
>
> Cheers
>
^ permalink raw reply [flat|nested] 131+ messages in thread
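The status/error pairs in the log above can be decoded by hand. What follows is a minimal userspace sketch, not libata code: it assumes the standard ATA status-register bit layout and reuses the bit names the kernel log prints, while the table and both helper functions (decode_status, command_was_aborted) are hypothetical names made up for illustration. In the log, stat/err 0x51/04 was translated to sense key 0xb, whereas 0x51/00 fell through to the 0x3/11/04 "medium error" default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Status register bits, highest first; names follow the kernel log output. */
static const struct {
	unsigned char bit;
	const char *name;
} status_bits[] = {
	{ 0x80, "Busy" },         { 0x40, "DriveReady" },
	{ 0x20, "DeviceFault" },  { 0x10, "SeekComplete" },
	{ 0x08, "DataRequest" },  { 0x04, "CorrectedError" },
	{ 0x02, "Index" },        { 0x01, "Error" },
};

#define ST_ERR  0x01	/* status: an error occurred */
#define ER_ABRT 0x04	/* error register: command aborted by the drive */

/* Hypothetical helper: write the names of the set status bits into buf. */
static void decode_status(unsigned char st, char *buf, size_t len)
{
	size_t i, used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(status_bits) / sizeof(status_bits[0]); i++) {
		int n;

		if (!(st & status_bits[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", status_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* out of room: stop decoding */
		used += (size_t)n;
	}
}

/* Hypothetical helper: nonzero when the drive rejected the command itself. */
static int command_was_aborted(unsigned char st, unsigned char err)
{
	return (st & ST_ERR) && (err & ER_ABRT);
}
```

Feeding 0x51 to decode_status() yields the same three names as the log line, and command_was_aborted(0x51, 0x04) is true, which is the "command rejected" case rather than a media failure.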
* Re: LibPATA code issues / 2.6.15.4
2006-02-25 18:28 ` Mark Lord
2006-02-25 18:55 ` Justin Piszcz
@ 2006-02-25 19:29 ` Justin Piszcz
2006-02-25 19:53 ` David Greaves
2006-02-25 19:47 ` David Greaves
2 siblings, 1 reply; 131+ messages in thread
From: Justin Piszcz @ 2006-02-25 19:29 UTC (permalink / raw)
To: Mark Lord
Cc: Mark Lord, David Greaves, Jeff Garzik, linux-kernel, IDE/ATA development list
Which kernel did you run your patch against?
With 2.6.16-rc4....
First patch looks good..
p34:/usr/src/linux# patch -p1 < /tmp/patch1
patching file drivers/scsi/libata-scsi.c
p34:/usr/src/linux# patch -p1 < /tmp/12_libata_ata_opcode.patch
patching file drivers/scsi/libata-core.c
Hunk #1 succeeded at 245 (offset -8 lines).
Hunk #2 succeeded at 267 (offset -8 lines).
Hunk #3 succeeded at 288 (offset -8 lines).
Hunk #4 succeeded at 310 (offset -8 lines).
Hunk #5 succeeded at 500 (offset -8 lines).
Hunk #6 succeeded at 626 (offset -8 lines).
patching file drivers/scsi/libata-scsi.c
Hunk #1 succeeded at 430 (offset -8 lines).
Hunk #2 succeeded at 509 (offset -8 lines).
Hunk #3 FAILED at 521.
Hunk #4 succeeded at 563 (offset -8 lines).
Hunk #5 succeeded at 638 (offset -8 lines).
Hunk #6 succeeded at 1329 (offset -8 lines).
1 out of 6 hunks FAILED -- saving rejects to file drivers/scsi/libata-scsi.c.rej
patching file include/linux/ata.h
patching file include/linux/libata.h
Hunk #1 succeeded at 373 (offset -47 lines).
Hunk #2 succeeded at 463 (offset -49 lines).
p34:/usr/src/linux# ls -ld /usr/src/linux
lrwxrwxrwx 1 root src 16 2006-02-25 14:24 /usr/src/linux -> linux-2.6.16-rc4/
p34:/usr/src/linux#
Here is the *.rej file:
# cat libata-scsi.c.rej
***************
*** 521,528 ****
  	*ascq = 0x04; /* "auto-reallocation failed" */

  translate_done:
- 	DPRINTK(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to "
- 	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err,
  	       *sk, *asc, *ascq);
  	return;
  }
--- 521,528 ----
  	*ascq = 0x04; /* "auto-reallocation failed" */

  translate_done:
+ 	DPRINTK(KERN_ERR "ata%u: translated op=0x%02x cmd=0x%02x ATA stat/err 0x%02x/%02x to "
+ 	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, cmd, drv_stat, drv_err,
  	       *sk, *asc, *ascq);
  	return;
  }
On Sat, 25 Feb 2006, Mark Lord wrote:
> Justin Piszcz wrote:
>> Second patch fails for me.
> ..
>> Should I be using 2.6.16-rcX?
>
> Mmm... that's what I'm using (plus other patches),
> so, yes.. give that a try. 2.6.16 does seem to
> be shaping up to be a nice kernel.
>
> Cheers
>
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-25 19:29 ` Justin Piszcz
@ 2006-02-25 19:53 ` David Greaves
0 siblings, 0 replies; 131+ messages in thread
From: David Greaves @ 2006-02-25 19:53 UTC (permalink / raw)
To: Justin Piszcz
Cc: Mark Lord, Mark Lord, Jeff Garzik, linux-kernel, IDE/ATA development list
Justin Piszcz wrote:
> Which kernel did you run your patch against?
>
> With 2.6.16-rc4....
>
> First patch looks good..
>
Justin, I'll help you out off-list :)
David
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-25 18:28 ` Mark Lord
2006-02-25 18:55 ` Justin Piszcz
2006-02-25 19:29 ` Justin Piszcz
@ 2006-02-25 19:47 ` David Greaves
2006-02-26 2:27 ` Mark Lord
2 siblings, 1 reply; 131+ messages in thread
From: David Greaves @ 2006-02-25 19:47 UTC (permalink / raw)
To: Mark Lord
Cc: Justin Piszcz, Mark Lord, Jeff Garzik, linux-kernel, IDE/ATA development list
Mark Lord wrote:
> Justin Piszcz wrote:
>
>> Should I be using 2.6.16-rcX?
>
>
> Mmm... that's what I'm using (plus other patches),
> so, yes.. give that a try. 2.6.16 does seem to
> be shaping up to be a nice kernel.
OK, failed for me too - I updated to 2.6.16-rc4 and it still failed
(despite -F) so I fixed by hand. (printk -> DPRINTK anyway:
Linux haze 2.6.16-rc4patched #1 PREEMPT Sat Feb 25 19:29:11 UTC 2006 i686 GNU/Linux
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x04 { DriveStatusError }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
sd 1:0:0:0: SCSI error: return code = 0x8000002
sdb: Current: sense key: Medium Error
Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sdb, sector 398283329
raid1: Disk failure on sdb2, disabling device.
Operation continuing on 1 devices
and later...
device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
XFS mounting filesystem dm-0
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
ata1: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
ata1: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
ata1: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key: Medium Error
Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sda, sector 390716735
raid5: Disk failure on sda1, disabling device. Operation continuing on 2 devices
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
sd 1:0:0:0: SCSI error: return code = 0x8000002
sdb: Current: sense key: Medium Error
Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sdb, sector 390716735
raid5: Disk failure on sdb1, disabling device. Operation continuing on 1 devices
RAID5 conf printout:
--- rd:3 wd:1 fd:2
disk 0, o:1, dev:sdd1
disk 1, o:0, dev:sdb1
disk 2, o:0, dev:sda1
xfs_force_shutdown(dm-0,0x1) called from line 338 of file fs/xfs/xfs_rw.c. Return address = 0xc020c0e9
Filesystem "dm-0": I/O Error Detected. Shutting down filesystem: dm-0
Please umount the filesystem, and rectify the problem(s)
I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x640884a ("xlog_bwrite") error 5 buf count 262144
XFS: failed to locate log tail
XFS: log mount/recovery failed: error 5
XFS: log mount failed
RAID5 conf printout:
--- rd:3 wd:1 fd:2
disk 0, o:1, dev:sdd1
disk 1, o:0, dev:sdb1
RAID5 conf printout:
--- rd:3 wd:1 fd:2
disk 0, o:1, dev:sdd1
disk 1, o:0, dev:sdb1
RAID5 conf printout:
--- rd:3 wd:1 fd:2
disk 0, o:1, dev:sdd1
So I guess my raid just blew up too... hope there's no corruption!
David
(PS Hi Mark, this is lbt from the Empeg BBS :) )
--
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-25 19:47 ` David Greaves
@ 2006-02-26 2:27 ` Mark Lord
2006-02-26 9:56 ` David Greaves
2006-02-26 12:27 ` James Courtier-Dutton
0 siblings, 2 replies; 131+ messages in thread
From: Mark Lord @ 2006-02-26 2:27 UTC (permalink / raw)
To: David Greaves
Cc: Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun
David Greaves wrote:
>
> Linux haze 2.6.16-rc4patched #1 PREEMPT Sat Feb 25 19:29:11 UTC 2006
> i686 GNU/Linux
>
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: error=0x04 { DriveStatusError }
> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
> ata2: status=0x51 { DriveReady SeekComplete Error }
> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
> ata2: status=0x51 { DriveReady SeekComplete Error }
> sd 1:0:0:0: SCSI error: return code = 0x8000002
> sdb: Current: sense key: Medium Error
> Additional sense: Unrecovered read error - auto reallocate failed
> end_request: I/O error, dev sdb, sector 398283329
> raid1: Disk failure on sdb2, disabling device.
> Operation continuing on 1 devices
Oh good, *now* we've gotten somewhere!!
Albert / Jens / Jeff:
The command failing above is SCSI WRITE_10, which is being
translated into ATA_CMD_WRITE_FUA_EXT by libata.
This command fails -- unrecognized by the drive in question.
But libata reports it (most incorrectly) as a "medium error",
and the drive is taken out of service from its RAID.
Bad, bad, and worse.
Libata should really recover from this, by recognizing that
the command was rejected, and replacing it with a simple
WRITE_EXT instead. Possibly followed by FLUSH_CACHE.
So.. I've forgotten who put FUA into libata, but hopefully
it's one of the folks on the CC: list, and that nice person
can now generate a patch to fix this bug somehow.
Cheers
^ permalink raw reply [flat|nested] 131+ messages in thread
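The recovery Mark proposes (notice that the FUA write was rejected, reissue it as a plain WRITE_EXT, then flush the cache) can be sketched in standalone C. This is only an illustration under stated assumptions, not libata code: the opcode and bit values mirror include/linux/ata.h, but struct cmd_result, issue_fn, struct fake_drive, fake_issue and write_with_fua_fallback are hypothetical names invented here, and the sketch treats any ABRT on the FUA opcode as "unsupported", which a real driver might not be entitled to assume.

```c
#include <assert.h>
#include <stddef.h>

/* Opcode and register bit values as defined in include/linux/ata.h. */
#define ATA_CMD_WRITE_EXT     0x35
#define ATA_CMD_WRITE_FUA_EXT 0x3D
#define ATA_CMD_FLUSH_EXT     0xEA
#define ATA_ERR               0x01	/* status register: error */
#define ATA_ABORTED           0x04	/* error register: command aborted */

/* Hypothetical: final status and error registers of one command. */
struct cmd_result {
	unsigned char status, error;
};

/* Hypothetical callback that issues one opcode and reports the result. */
typedef struct cmd_result (*issue_fn)(unsigned char opcode, void *ctx);

/* Hypothetical test double: a drive that may or may not accept FUA. */
struct fake_drive {
	int supports_fua;
	unsigned char issued[8];	/* opcodes seen, in order */
	size_t count;
};

static struct cmd_result fake_issue(unsigned char opcode, void *ctx)
{
	struct fake_drive *d = ctx;
	struct cmd_result ok = { 0x50, 0x00 };
	struct cmd_result aborted = { 0x51, 0x04 };

	assert(d->count < sizeof(d->issued));
	d->issued[d->count++] = opcode;
	if (opcode == ATA_CMD_WRITE_FUA_EXT && !d->supports_fua)
		return aborted;
	return ok;
}

/*
 * Try a FUA write; if the drive aborts it, clear *fua_ok so later writes
 * skip FUA, then fall back to WRITE_EXT followed by FLUSH_EXT.
 * Returns 0 on success, -1 on failure.
 */
static int write_with_fua_fallback(issue_fn issue, void *ctx, int *fua_ok)
{
	struct cmd_result r;

	assert(issue != NULL && fua_ok != NULL);
	if (*fua_ok) {
		r = issue(ATA_CMD_WRITE_FUA_EXT, ctx);
		if (!(r.status & ATA_ERR))
			return 0;
		if (!(r.error & ATA_ABORTED))
			return -1;	/* a genuine I/O error: do not mask it */
		*fua_ok = 0;		/* rejected: stop using FUA here */
	}
	r = issue(ATA_CMD_WRITE_EXT, ctx);
	if (r.status & ATA_ERR)
		return -1;
	r = issue(ATA_CMD_FLUSH_EXT, ctx);
	return (r.status & ATA_ERR) ? -1 : 0;
}
```

Against the fake drive with supports_fua set to 0, the first call issues 0x3D, 0x35, 0xEA in that order and clears the flag; subsequent calls go straight to 0x35 and 0xEA. Whether such a fallback belongs in the driver at all is a separate design question.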
* Re: LibPATA code issues / 2.6.15.4
2006-02-26 2:27 ` Mark Lord
@ 2006-02-26 9:56 ` David Greaves
2006-02-26 14:04 ` Mark Lord
2006-02-26 12:27 ` James Courtier-Dutton
1 sibling, 1 reply; 131+ messages in thread
From: David Greaves @ 2006-02-26 9:56 UTC (permalink / raw)
To: Mark Lord
Cc: Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun, Linus Torvalds
Mark Lord wrote:
>> sdb: Current: sense key: Medium Error
>> Additional sense: Unrecovered read error - auto reallocate failed
>> end_request: I/O error, dev sdb, sector 398283329
>> raid1: Disk failure on sdb2, disabling device.
>> Operation continuing on 1 devices
>
>
> Oh good, *now* we've gotten somewhere!!
>
> Albert / Jens / Jeff:
>
> The command failing above is SCSI WRITE_10, which is being
> translated into ATA_CMD_WRITE_FUA_EXT by libata.
>
> This command fails -- unrecognized by the drive in question.
> But libata reports it (most incorrectly) as a "medium error",
> and the drive is taken out of service from its RAID.
>
> Bad, bad, and worse.
>
> Libata should really recover from this, by recognizing that
> the command was rejected, and replacing it with a simple
> WRITE_EXT instead. Possibly followed by FLUSH_CACHE.
>
> So.. I've forgotten who put FUA into libata, but hopefully
> it's one of the folks on the CC: list, and that nice person
> can now generate a patch to fix this bug somehow.
Thanks Mark
I'm glad it's a bug and not bad hardware.
I am quite concerned that the basic effect of just booting a practically
vanilla 2.6.16-rc4 like this was to fry my raid array.
Luckily it dropped 2 (of 3) disks so quickly that the event counter was
the same, allowing an easy rebuild.
2.6.15 has similar issues but they seem to happen *very* infrequently by
comparison - this hit me several times during a single boot.
Should Linus (cc'ed) hold off on 2.6.16 because of this or not?
David
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-26 9:56 ` David Greaves
@ 2006-02-26 14:04 ` Mark Lord
2006-02-27 21:34 ` Mark Lord
0 siblings, 1 reply; 131+ messages in thread
From: Mark Lord @ 2006-02-26 14:04 UTC (permalink / raw)
To: David Greaves
Cc: Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun, Linus Torvalds
David Greaves wrote:
> Mark Lord wrote:
>
>>> sdb: Current: sense key: Medium Error
>>> Additional sense: Unrecovered read error - auto reallocate failed
>>> end_request: I/O error, dev sdb, sector 398283329
>>> raid1: Disk failure on sdb2, disabling device.
>>> Operation continuing on 1 devices
..
>> The command failing above is SCSI WRITE_10, which is being
>> translated into ATA_CMD_WRITE_FUA_EXT by libata.
>>
>> This command fails -- unrecognized by the drive in question.
>> But libata reports it (most incorrectly) as a "medium error",
>> and the drive is taken out of service from its RAID.
>>
>> Bad, bad, and worse.
..
> Thanks Mark
>
> I'm glad it's a bug and not bad hardware.
>
> I am quite concerned that the basic effect of just booting a practically
> vanilla 2.6.16-rc4 like this was to fry my raid array.
>
> Luckily it dropped 2 (of 3) disks so quickly that the event counter was
> the same allowing an easy rebuild.
>
> 2.6.15 has similar issues but they seem to happen *very* infrequently by
> comparison - this hit me several times during a single boot.
>
> Should Linus (cc'ed) hold off on 2.6.16 because of this or not?
Well, no doubt whatsoever about it being a "regression",
since the FUA code is *new* in 2.6.16 (not present in 2.6.15).
The FUA code should either get fixed, or removed from 2.6.16.
Cheers
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-26 14:04 ` Mark Lord
@ 2006-02-27 21:34 ` Mark Lord
2006-02-28 1:33 ` Tejun Heo
0 siblings, 1 reply; 131+ messages in thread
From: Mark Lord @ 2006-02-27 21:34 UTC (permalink / raw)
To: Jeff Garzik
Cc: David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun, Linus Torvalds
Mark Lord wrote:
>> Mark Lord wrote:
>>
>>>> sdb: Current: sense key: Medium Error
>>>> Additional sense: Unrecovered read error - auto reallocate failed
>>>> end_request: I/O error, dev sdb, sector 398283329
>>>> raid1: Disk failure on sdb2, disabling device.
>>>> Operation continuing on 1 devices
> ..
>>> The command failing above is SCSI WRITE_10, which is being
>>> translated into ATA_CMD_WRITE_FUA_EXT by libata.
>>>
>>> This command fails -- unrecognized by the drive in question.
>>> But libata reports it (most incorrectly) as a "medium error",
>>> and the drive is taken out of service from its RAID.
>>>
>>> Bad, bad, and worse.
..
>> hold off on 2.6.16 because of this or not?
>
> Well, no doubt whatsoever about it being a "regression",
> since the FUA code is *new* in 2.6.16 (not present in 2.6.15).
>
> The FUA code should either get fixed, or removed from 2.6.16.
Actually, now that I've done a little more digging, this FUA stuff
is inherently dangerous as implemented. At least a few SATA controllers
include pipelines and whatnot that rely upon recognizing the (S)ATA
opcodes being used. And I sincerely doubt that any of those will
recognize the very newish (and aptly named..) FUA opcodes.
These may be unsafe in general, unless we tag controllers as
FUA-capable and NON-FUA-capable, in addition to tagging the drives.
:/
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-27 21:34 ` Mark Lord
@ 2006-02-28 1:33 ` Tejun Heo
2006-02-28 1:46 ` Linus Torvalds
2006-02-28 4:16 ` Mark Lord
0 siblings, 2 replies; 131+ messages in thread
From: Tejun Heo @ 2006-02-28 1:33 UTC (permalink / raw)
To: Mark Lord
Cc: Jeff Garzik, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds
Hello, Mark.
Mark Lord wrote:
>
> .. hold off on 2.6.16 because of this or not?
>
It certainly is dangerous. I guess we should turn off FUA for the time
being. Barrier auto-fallback was once implemented, but it didn't seem
like a good idea as it was too complex and hid low-level bugs from
higher levels. The consensus seems to be to develop a blacklist of
drives which lie about FUA support (currently only one drive). The
official kernel doesn't seem to be the correct place to grow the
blacklist. Maybe we should do it from -mm?
>>
>> Well, no doubt whatsoever about it being a "regression",
>> since the FUA code is *new* in 2.6.16 (not present in 2.6.15).
>>
>> The FUA code should either get fixed, or removed from 2.6.16.
>
>
> Actually, now that I've done a little more digging, this FUA stuff
> is inherently dangerous as implemented. At least a few SATA controllers
> include pipelines and whatnot that rely upon recognizing the (S)ATA
> opcodes being used. And I sincerely doubt that any of those will
> recognize the very newish (and aptly named..) FUA opcodes.
>
> These may be unsafe in general, unless we tag controllers as
> FUA-capable and NON-FUA-capable, in addition to tagging the drives.
All sii controllers and piix/ahci seem to handle FUA pretty ok. And
yeah, we may have to create a controller blacklist too.
BTW, can you let me know what drive we're talking about now (model name
and firmware revision)?
--
tejun
^ permalink raw reply [flat|nested] 131+ messages in thread
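The gating being discussed here (a global switch, the drive's own capability bit, a drive blacklist, and a per-controller capability tag) can be sketched as a single predicate. This is a rough standalone illustration, not kernel code: struct fua_quirk, on_list, fua_usable and the blacklist entry are all hypothetical, and the model and firmware strings are deliberately made-up placeholders rather than a real drive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk entry: model prefix plus optional firmware revision. */
struct fua_quirk {
	const char *model_prefix;
	const char *fw;		/* NULL matches any firmware */
};

/* Placeholder entry only; not a real drive or firmware string. */
static const struct fua_quirk drive_blacklist[] = {
	{ "EXAMPLE-DRIVE", "FW1.0" },
};

/* Return 1 when model/fw matches an entry in the list, else 0. */
static int on_list(const struct fua_quirk *list, size_t n,
		   const char *model, const char *fw)
{
	size_t i;

	for (i = 0; i < n; i++) {
		size_t plen = strlen(list[i].model_prefix);

		if (strncmp(model, list[i].model_prefix, plen) != 0)
			continue;
		if (list[i].fw == NULL || strcmp(fw, list[i].fw) == 0)
			return 1;
	}
	return 0;
}

/*
 * FUA is used only when every layer agrees: the global switch is on,
 * the drive advertises FUA, the controller is tagged FUA-capable,
 * and the drive is not blacklisted.
 */
static int fua_usable(int fua_enabled, int id_has_fua, int ctrl_fua_capable,
		      const char *model, const char *fw)
{
	assert(model != NULL && fw != NULL);
	if (!fua_enabled || !id_has_fua || !ctrl_fua_capable)
		return 0;
	return !on_list(drive_blacklist,
			sizeof(drive_blacklist) / sizeof(drive_blacklist[0]),
			model, fw);
}
```

Defaulting every input to "off" keeps the failure mode conservative: a drive or controller that nobody has vetted simply gets ordinary writes.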
* Re: LibPATA code issues / 2.6.15.4
2006-02-28 1:33 ` Tejun Heo
@ 2006-02-28 1:46 ` Linus Torvalds
2006-02-28 2:07 ` Jeff Garzik
2006-02-28 8:03 ` Jens Axboe
2006-02-28 4:16 ` Mark Lord
1 sibling, 2 replies; 131+ messages in thread
From: Linus Torvalds @ 2006-02-28 1:46 UTC (permalink / raw)
To: Tejun Heo
Cc: Mark Lord, Jeff Garzik, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe
On Tue, 28 Feb 2006, Tejun Heo wrote:
> Hello, Mark.
>
> Mark Lord wrote:
> >
> > .. hold off on 2.6.16 because of this or not?
> >
>
> It certainly is dangerous. I guess we should turn off FUA for the time being.
> Barrier auto-fallback was once implemented but it didn't seem like a good idea
> as it was too complex and hides low level bug from higher level. The consensus
> seems to be developing blacklist of drives which lie about FUA support
> (currently only one drive). Official kernel doesn't seem to be the correct
> place to grow the blacklist, Maybe we should do it from -mm?
For 2.6.16, the only sane solution for now is to just turn it off.
Somebody want to send me a patch that does that, along with an ack from
Mark (and whoever else sees this) that it fixes his/their problems?
Linus
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-28 1:46 ` Linus Torvalds
@ 2006-02-28 2:07 ` Jeff Garzik
2006-02-28 2:14 ` Linus Torvalds
2006-02-28 10:30 ` Alan Cox
2006-02-28 8:03 ` Jens Axboe
1 sibling, 2 replies; 131+ messages in thread
From: Jeff Garzik @ 2006-02-28 2:07 UTC (permalink / raw)
To: Linus Torvalds
Cc: Tejun Heo, Mark Lord, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe
[-- Attachment #1: Type: text/plain, Size: 312 bytes --]
Linus Torvalds wrote:
> For 2.6.16, the only sane solution for now is to just turn it off.
>
> Somebody want to send me a patch that does that, along with an ack from
> Mark (and whoever else sees this) that it fixes his/their problems?
I've had this waiting in the wings, in fact... [see attached]
Jeff
[-- Attachment #2: libata.txt --]
[-- Type: text/plain, Size: 1644 bytes --]
Please pull from 'upstream-fixes' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git
to receive the following updates:
 drivers/scsi/libata-core.c | 4 ++++
 drivers/scsi/libata-scsi.c | 2 ++
 drivers/scsi/libata.h | 1 +
 3 files changed, 7 insertions(+)
Jeff Garzik:
[libata] Disable FUA by default
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index 5f1d758..ab3c9a4 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -82,6 +82,10 @@ int atapi_enabled = 0;
 module_param(atapi_enabled, int, 0444);
 MODULE_PARM_DESC(atapi_enabled, "Enable discovery of ATAPI devices (0=off, 1=on)");

+int fua = 0;
+module_param(fua, int, 0444);
+MODULE_PARM_DESC(fua, "FUA support (0=off, 1=on)");
+
 MODULE_AUTHOR("Jeff Garzik");
 MODULE_DESCRIPTION("Library module for ATA devices");
 MODULE_LICENSE("GPL");
diff --git a/drivers/scsi/libata-scsi.c b/drivers/scsi/libata-scsi.c
index 07b1e7c..5ce33ae 100644
--- a/drivers/scsi/libata-scsi.c
+++ b/drivers/scsi/libata-scsi.c
@@ -1708,6 +1708,8 @@ static int ata_dev_supports_fua(u16 *id)
 {
 	unsigned char model[41], fw[9];

+	if (!fua)
+		return 0;
 	if (!ata_id_has_fua(id))
 		return 0;

diff --git a/drivers/scsi/libata.h b/drivers/scsi/libata.h
index e03ce48..abfd18f 100644
--- a/drivers/scsi/libata.h
+++ b/drivers/scsi/libata.h
@@ -41,6 +41,7 @@ struct ata_scsi_args {

 /* libata-core.c */
 extern int atapi_enabled;
+extern int fua;
 extern struct ata_queued_cmd *ata_qc_new_init(struct ata_port *ap,
 				      struct ata_device *dev);
 extern int ata_rwcmd_protocol(struct ata_queued_cmd *qc);
^ permalink raw reply related [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-28 2:07 ` Jeff Garzik
@ 2006-02-28 2:14 ` Linus Torvalds
2006-02-28 2:52 ` Jeff Garzik
2006-02-28 3:36 ` Jeff Garzik
2006-02-28 10:30 ` Alan Cox
1 sibling, 2 replies; 131+ messages in thread
From: Linus Torvalds @ 2006-02-28 2:14 UTC (permalink / raw)
To: Jeff Garzik
Cc: Tejun Heo, Mark Lord, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe
On Mon, 27 Feb 2006, Jeff Garzik wrote:
>
> I've had this waiting in the wings, in fact... [see attached]
I really hate having a _global_ variable called "fua". That's just bad
taste. I would suggest calling it "atapi_forced_unit_attention_enabled",
but maybe that is going a bit overboard. It's definitely better than just
"fua", though.
Linus
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-28 2:14 ` Linus Torvalds
@ 2006-02-28 2:52 ` Jeff Garzik
2006-02-28 3:36 ` Jeff Garzik
1 sibling, 0 replies; 131+ messages in thread
From: Jeff Garzik @ 2006-02-28 2:52 UTC (permalink / raw)
To: Linus Torvalds
Cc: Tejun Heo, Mark Lord, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe
Linus Torvalds wrote:
>
> On Mon, 27 Feb 2006, Jeff Garzik wrote:
>
>>I've had this waiting in the wings, in fact... [see attached]
>
>
> I really hate having a _global_ variable called "fua". That's just bad
> taste. I would suggest calling it "atapi_forced_unit_attention_enabled",
> but maybe that is going a bit overboard. It's definitely better than just
> "fua", though.
<shrug> It will go away when things are fixed, and only users who are
testing will even bother with it.
Looking over the module subsystem, it looks like one could use
module_param_named() to achieve proper namespace separation (C versus
module opt) -- then you could call it libata_fua -- but for a temporary
module option it seems like more trouble than it's worth.
Jeff
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-28 2:14 ` Linus Torvalds
2006-02-28 2:52 ` Jeff Garzik
@ 2006-02-28 3:36 ` Jeff Garzik
2006-02-28 4:11 ` Mark Lord
1 sibling, 1 reply; 131+ messages in thread
From: Jeff Garzik @ 2006-02-28 3:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: Tejun Heo, Mark Lord, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe
[-- Attachment #1: Type: text/plain, Size: 436 bytes --]
Linus Torvalds wrote:
>
> On Mon, 27 Feb 2006, Jeff Garzik wrote:
>
>>I've had this waiting in the wings, in fact... [see attached]
>
>
> I really hate having a _global_ variable called "fua". That's just bad
> taste. I would suggest calling it "atapi_forced_unit_attention_enabled",
> but maybe that is going a bit overboard. It's definitely better than just
> "fua", though.
Here's the cleaner namespace version...
Jeff
[-- Attachment #2: libata.txt --]
[-- Type: text/plain, Size: 1672 bytes --]
Please pull from 'upstream-fixes' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git
to receive the following updates:
 drivers/scsi/libata-core.c | 4 ++++
 drivers/scsi/libata-scsi.c | 2 ++
 drivers/scsi/libata.h | 1 +
 3 files changed, 7 insertions(+)
Jeff Garzik:
[libata] Disable FUA
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index 5f1d758..4f91b0d 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -82,6 +82,10 @@ int atapi_enabled = 0;
 module_param(atapi_enabled, int, 0444);
 MODULE_PARM_DESC(atapi_enabled, "Enable discovery of ATAPI devices (0=off, 1=on)");

+int libata_fua = 0;
+module_param_named(fua, libata_fua, int, 0444);
+MODULE_PARM_DESC(fua, "FUA support (0=off, 1=on)");
+
 MODULE_AUTHOR("Jeff Garzik");
 MODULE_DESCRIPTION("Library module for ATA devices");
 MODULE_LICENSE("GPL");
diff --git a/drivers/scsi/libata-scsi.c b/drivers/scsi/libata-scsi.c
index 07b1e7c..59503c9 100644
--- a/drivers/scsi/libata-scsi.c
+++ b/drivers/scsi/libata-scsi.c
@@ -1708,6 +1708,8 @@ static int ata_dev_supports_fua(u16 *id)
 {
 	unsigned char model[41], fw[9];

+	if (!libata_fua)
+		return 0;
 	if (!ata_id_has_fua(id))
 		return 0;

diff --git a/drivers/scsi/libata.h b/drivers/scsi/libata.h
index e03ce48..fddaf47 100644
--- a/drivers/scsi/libata.h
+++ b/drivers/scsi/libata.h
@@ -41,6 +41,7 @@ struct ata_scsi_args {

 /* libata-core.c */
 extern int atapi_enabled;
+extern int libata_fua;
 extern struct ata_queued_cmd *ata_qc_new_init(struct ata_port *ap,
 				      struct ata_device *dev);
 extern int ata_rwcmd_protocol(struct ata_queued_cmd *qc);
^ permalink raw reply related [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-28 3:36 ` Jeff Garzik
@ 2006-02-28 4:11 ` Mark Lord
0 siblings, 0 replies; 131+ messages in thread
From: Mark Lord @ 2006-02-28 4:11 UTC (permalink / raw)
To: Jeff Garzik
Cc: Linus Torvalds, Tejun Heo, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe
Jeff Garzik wrote:
> Linus Torvalds wrote:
..
>> I really hate having a _global_ variable called "fua". That's just bad
>> taste. I would suggest calling it "atapi_forced_unit_attention_enabled"
Heh heh.. It's actually short for "Force Unit Access",
though oddly enough I don't think the patch mentions that
in the MODULE_PARM_DESC().
> Here's the cleaner namespace version...
David, do you want to ack this one for us?
Cheers
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 2:07 ` Jeff Garzik 2006-02-28 2:14 ` Linus Torvalds @ 2006-02-28 10:30 ` Alan Cox 1 sibling, 0 replies; 131+ messages in thread From: Alan Cox @ 2006-02-28 10:30 UTC (permalink / raw) To: Jeff Garzik Cc: Linus Torvalds, Tejun Heo, Mark Lord, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe On Llu, 2006-02-27 at 21:07 -0500, Jeff Garzik wrote: > led, "Enable discovery of ATAPI devices (0=off, 1=on)"); > > +int fua = 0; > +module_param(fua, int, 0444); > +MODULE_PARM_DESC(fua, "FUA support (0=off, 1=on)"); > + Not a good name for a global. ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-28 1:46 ` Linus Torvalds
2006-02-28 2:07 ` Jeff Garzik
@ 2006-02-28 8:03 ` Jens Axboe
1 sibling, 0 replies; 131+ messages in thread
From: Jens Axboe @ 2006-02-28 8:03 UTC (permalink / raw)
To: Linus Torvalds
Cc: Tejun Heo, Mark Lord, Jeff Garzik, David Greaves, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc
On Mon, Feb 27 2006, Linus Torvalds wrote:
>
>
> On Tue, 28 Feb 2006, Tejun Heo wrote:
>
> > Hello, Mark.
> >
> > Mark Lord wrote:
> > >
> > > .. hold off on 2.6.16 because of this or not?
> > >
> >
> > It certainly is dangerous. I guess we should turn off FUA for the
> > time being. Barrier auto-fallback was once implemented but it
> > didn't seem like a good idea as it was too complex and hides low
> > level bug from higher level. The consensus seems to be developing
> > blacklist of drives which lie about FUA support (currently only one
> > drive). Official kernel doesn't seem to be the correct place to grow
> > the blacklist. Maybe we should do it from -mm?
>
> For 2.6.16, the only sane solution for now is to just turn it off.
>
> Somebody want to send me a patch that does that, along with an ack from
> Mark (and whoever else sees this) that it fixes his/their problems?
That's the best solution right now. I guess there's no way around a
blacklist for FUA support and we need time to grow that :-(
And proper fallback to non-FUA writes with disabling FUA based barriers
as well.
Mark, what drive model+firmware are you using?
--
Jens Axboe
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-28 1:33 ` Tejun Heo
2006-02-28 1:46 ` Linus Torvalds
@ 2006-02-28 4:16 ` Mark Lord
2006-02-28 10:32 ` Alan Cox
2006-02-28 10:39 ` David Greaves
1 sibling, 2 replies; 131+ messages in thread
From: Mark Lord @ 2006-02-28 4:16 UTC (permalink / raw)
To: Tejun Heo, David Greaves
Cc: Mark Lord, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds
Tejun Heo wrote:
..
>> These may be unsafe in general, unless we tag controllers as
>> FUA-capable and NON-FUA-capable, in addition to tagging the drives.
>
> All sii controllers and piix/ahci seem to handle FUA pretty ok. And
> yeah, we may have to create controller blacklist too.
Or maybe a whitelist instead, since nearly all existing hardware
pre-dates FUA commands.
Or maybe just have a libata function to test whether the FUA commands
actually work or not, before enabling them for general use.
*That* could be a much better approach, given the large number
of possible drive/controller combos, and it cuts down on the
maintenance headache of having to list everything on a list somewhere.
> BTW, can you let me know what drive we're talking about now (model name
> and firmware revision)?
David: we need to see the output from "hdparm --Istdout /dev/sda"
(or whichever drive it was that was failing on your system).
Cheers
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 4:16 ` Mark Lord @ 2006-02-28 10:32 ` Alan Cox 2006-02-28 10:30 ` Justin Piszcz 2006-02-28 10:39 ` David Greaves 1 sibling, 1 reply; 131+ messages in thread From: Alan Cox @ 2006-02-28 10:32 UTC (permalink / raw) To: Mark Lord Cc: Tejun Heo, David Greaves, Mark Lord, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Llu, 2006-02-27 at 23:16 -0500, Mark Lord wrote: > Or maybe a whitelist instead, since nearly all existing hardware > pre-dates FUA commands. For controllers just add it as a host flag and it can be handled the same way as LBA48 is right now. It may also be some hosts can issue FUA with a bit of bandaging (state machine resets/pio etc) Alan ^ permalink raw reply [flat|nested] 131+ messages in thread
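The host-flag idea above (treat FUA the way LBA48 is treated, as something the controller driver has to opt into) can be pictured roughly as follows. This is only a sketch under stated assumptions: SKETCH_HOST_FUA_OK and sketch_fua_usable() are hypothetical names invented here, not libata symbols, and the drive's own claim is passed in as a plain boolean rather than read from IDENTIFY data.

```c
#include <assert.h>

/* Hypothetical per-host capability bit; the name is made up for this sketch. */
#define SKETCH_HOST_FUA_OK (1u << 0)

/*
 * Decide whether FUA writes may be issued.
 *   fua_param        - the global on/off switch (0 or 1)
 *   host_flags       - capability bits advertised by the controller driver
 *   drive_claims_fua - non-zero if the drive advertises the FUA opcodes
 * Returns 1 only when all three agree, 0 otherwise.
 */
int sketch_fua_usable(int fua_param, unsigned int host_flags, int drive_claims_fua)
{
	/* Only 0 and 1 are meaningful for the switch in this sketch. */
	assert(fua_param == 0 || fua_param == 1);

	if (!fua_param)
		return 0;               /* globally disabled */
	if (!(host_flags & SKETCH_HOST_FUA_OK))
		return 0;               /* controller not known to pass FUA opcodes */
	return drive_claims_fua != 0;   /* finally, trust the drive's claim */
}
```

With the switch on, a flagged host and a drive that advertises FUA, sketch_fua_usable(1, SKETCH_HOST_FUA_OK, 1) returns 1; clearing any one of the three inputs gives 0. A controller whitelist and a controller blacklist differ only in which hosts get the bit by default.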
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 10:32 ` Alan Cox @ 2006-02-28 10:30 ` Justin Piszcz 0 siblings, 0 replies; 131+ messages in thread From: Justin Piszcz @ 2006-02-28 10:30 UTC (permalink / raw) To: Alan Cox Cc: Mark Lord, Tejun Heo, David Greaves, Mark Lord, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Tue, 28 Feb 2006, Alan Cox wrote: > On Llu, 2006-02-27 at 23:16 -0500, Mark Lord wrote: >> Or maybe a whitelist instead, since nearly all existing hardware >> pre-dates FUA commands. > > For controllers just add it as a host flag and it can be handled the > same way as LBA48 is right now. It may also be some hosts can issue FUA > with a bit of bandaging (state machine resets/pio etc) > > Alan > While I have not yet been able to reproduce the problem with the verbose patch, here is the hdparm -I: /dev/sdc: ATA device, with non-removable media Model Number: WDC WD4000KD-00NAB0 Serial Number: WD-WMAMY1020930 Firmware Revision: 01.06A01 Standards: Supported: 7 6 5 4 Likely used: 7 Configuration: Logical max current cylinders 16383 16383 heads 16 16 sectors/track 63 63 -- CHS current addressable sectors: 16514064 LBA user addressable sectors: 268435455 LBA48 user addressable sectors: 781422768 device size with M = 1024*1024: 381554 MBytes device size with M = 1000*1000: 400088 MBytes (400 GB) Capabilities: LBA, IORDY(can be disabled) Queue depth: 32 Standby timer values: spec'd by Standard, with device specific minimum R/W multiple sector transfer: Max = 16 Current = 0 Recommended acoustic management value: 128, current value: 254 DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 udma6 Cycle time: min=120ns recommended=120ns PIO: pio0 pio1 pio2 pio3 pio4 Cycle time: no flow control=120ns IORDY flow control=120ns Commands/features: Enabled Supported: * NOP cmd * READ BUFFER cmd * WRITE BUFFER cmd * Host Protected Area feature set * Look-ahead * Write cache * Power Management feature set Security Mode 
feature set * SMART feature set * FLUSH CACHE EXT command * Mandatory FLUSH CACHE command * Device Configuration Overlay feature set * 48-bit Address feature set Automatic Acoustic Management feature set SET MAX security extension * DOWNLOAD MICROCODE cmd * General Purpose Logging feature set * SMART self-test * SMART error logging Security: supported not enabled not locked not frozen not expired: security count not supported: enhanced erase Checksum: correct ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 4:16 ` Mark Lord 2006-02-28 10:32 ` Alan Cox @ 2006-02-28 10:39 ` David Greaves 2006-02-28 14:37 ` Mark Lord ` (2 more replies) 1 sibling, 3 replies; 131+ messages in thread From: David Greaves @ 2006-02-28 10:39 UTC (permalink / raw) To: Mark Lord Cc: Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds Mark Lord wrote: > Tejun Heo wrote: > >> BTW, can you let me know what drive we're talking about now (model >> name and firmware revision)? > > > David: we need to see the output from "hdparm --Istdout /dev/sda > (or whichever drive it was that was failing on your system). > > Cheers > So here's the info for sda and sdb (see below for related log data). /dev/sda: IO_support = 0 (default 16-bit) readonly = 0 (off) readahead = 256 (on) geometry = 24321/255/63, sectors = 390721968, start = 0 0040 3fff c837 0010 0000 0000 003f 0000 0000 0000 4234 3033 3852 5248 2020 2020 2020 2020 2020 2020 0003 4000 0004 4241 4e43 3139 3830 4d61 7874 6f72 2036 4232 3030 4d30 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 8010 0000 2f00 4000 0200 0000 0007 3fff 0010 003f fc10 00fb 0100 ffff 0fff 0000 0007 0003 0078 0078 0078 0078 0000 0000 0000 0000 0000 0000 0000 0002 0000 0000 0000 00fe 001e 7869 7d09 4043 7869 3c01 4043 203f 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 f1b0 1749 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0113 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 d3a5 /dev/sdb: IO_support = 0 (default 16-bit) readonly = 0 (off) readahead = 256 (on) geometry = 24792/255/63, sectors = 398297088, start = 0 0040 3fff c837 0010 0000 0000 003f 0000 0000 0000 4234 3152 5641 3148 2020 2020 2020 2020 2020 2020 0003 4000 0004 4241 4e43 3142 5930 4d61 7874 6f72 2036 4232 3030 4d30 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 8010 0000 2f00 4000 0200 0000 0007 3fff 0010 003f fc10 00fb 0100 ffff 0fff 0000 0007 0003 0078 0078 0078 0078 0000 0000 0000 0000 0000 0000 001f 0102 0000 0000 0000 00fe 001e 7c6b 7f09 4063 7c69 3e01 4063 207f 0000 0000 0000 fffe 0000 c0fe 0000 0000 0000 0000 0000 8800 17bd 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0113 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 d8a5 The info below is from the log I saved booted with 2.6.16-rc4 I got these errors: sd 0:0:0:0: SCSI error: return code = 0x8000002 sda: Current: sense key: Medium Error Additional sense: Unrecovered read error - auto reallocate failed end_request: I/O error, dev sda, sector 390716735 raid5: Disk failure on sda1, disabling device. 
Operation continuing on 2 devices ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 ata2: status=0x51 { DriveReady SeekComplete Error } sd 1:0:0:0: SCSI error: return code = 0x8000002 sdb: Current: sense key: Medium Error Additional sense: Unrecovered read error - auto reallocate failed end_request: I/O error, dev sdb, sector 390716735 raid5: Disk failure on sdb1, disabling device. Operation continuing on 1 devices They are both attached to: libata version 1.20 loaded. sata_sil 0000:00:0a.0: version 0.9 ACPI: PCI Interrupt 0000:00:0a.0[A] -> GSI 16 (level, low) -> IRQ 17 ata1: SATA max UDMA/100 cmd 0xF8804080 ctl 0xF880408A bmdma 0xF8804000 irq 17 ata2: SATA max UDMA/100 cmd 0xF88040C0 ctl 0xF88040CA bmdma 0xF8804008 irq 17 ata1: SATA link up 1.5 Gbps (SStatus 113) ata1: dev 0 cfg 49:2f00 82:7869 83:7d09 84:4043 85:7869 86:3c01 87:4043 88:203f ata1: dev 0 ATA-7, max UDMA/100, 390721968 sectors: LBA48 ata1: dev 0 configured for UDMA/100 scsi0 : sata_sil ata2: SATA link up 1.5 Gbps (SStatus 113) ata2: dev 0 cfg 49:2f00 82:7c6b 83:7f09 84:4063 85:7c69 86:3e01 87:4063 88:007f ata2: dev 0 ATA-7, max UDMA/133, 398297088 sectors: LBA48 ata2: dev 0 configured for UDMA/100 scsi1 : sata_sil Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC Type: Direct-Access ANSI SCSI revision: 05 Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC Type: Direct-Access ANSI SCSI revision: 05 Are there any other tests; like swapping the disks to the other controller (sata_via) and seeing what happens. With and without the patch? David -- ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 10:39 ` David Greaves @ 2006-02-28 14:37 ` Mark Lord 2006-02-28 21:04 ` Bill Davidsen 2006-02-28 14:38 ` Mark Lord 2006-02-28 15:31 ` Mark Lord 2 siblings, 1 reply; 131+ messages in thread From: Mark Lord @ 2006-02-28 14:37 UTC (permalink / raw) To: David Greaves Cc: Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds David Greaves wrote: > > /dev/sda: .. > 0040 3fff c837 0010 0000 0000 003f 0000 > 0000 0000 4234 3033 3852 5248 2020 2020 > 2020 2020 2020 2020 0003 4000 0004 4241 > 4e43 3139 3830 4d61 7874 6f72 2036 4232 > 3030 4d30 2020 2020 2020 2020 2020 2020 > 2020 2020 2020 2020 2020 2020 2020 8010 > 0000 2f00 4000 0200 0000 0007 3fff 0010 > 003f fc10 00fb 0100 ffff 0fff 0000 0007 > 0003 0078 0078 0078 0078 0000 0000 0000 > 0000 0000 0000 0000 0002 0000 0000 0000 > 00fe 001e 7869 7d09 4043 7869 3c01 4043 > 203f 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 f1b0 1749 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0113 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 d3a5 .. 
hdparm-6.4 says: Model Number: Maxtor 6B200M0 Serial Number: B4038RRH Firmware Revision: BANC1980 Commands/features: Enabled Supported: * NOP cmd * READ BUFFER cmd * WRITE BUFFER cmd * Look-ahead * Write cache * Power Management feature set * SMART feature set * FLUSH_CACHE_EXT * Mandatory FLUSH_CACHE * Device Configuration Overlay feature set * 48-bit Address feature set SET_MAX security extension Advanced Power Management feature set * DOWNLOAD_MICROCODE * WRITE_{DMA|MULTIPLE}_FUA_EXT * SMART self-test * SMART error logging So, yes, the drive is either lying about "* WRITE_{DMA|MULTIPLE}_FUA_EXT", or it didn't like the parameters it was given, or the SATA/IDE controller chip didn't like the command. Cheers ^ permalink raw reply [flat|nested] 131+ messages in thread
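For reference, the "WRITE_{DMA|MULTIPLE}_FUA_EXT" line in that decode comes from IDENTIFY DEVICE word 84, bit 6. The word is only meaningful when bit 15 is 0 and bit 14 is 1. A minimal sketch of the check, using the word 84 values that the boot log in this thread reports for the two drives ("84:4043" and "84:4063"); the function name is made up for illustration and this is not the kernel's own helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if the 256-word IDENTIFY DEVICE block advertises
 * WRITE DMA FUA EXT / WRITE MULTIPLE FUA EXT, 0 otherwise.
 */
int sketch_id_has_fua(const uint16_t *id)
{
	assert(id != NULL);

	/* Word 84 is valid only when bits 15:14 read 01. */
	if ((id[84] & 0xC000) != 0x4000)
		return 0;

	/* Bit 6: the FUA write commands are supported. */
	return (id[84] & (1u << 6)) != 0;
}
```

Both 0x4043 and 0x4063 pass the validity test and have bit 6 set, which agrees with the hdparm output quoted above: each drive claims FUA support, so the claim alone does not explain the failures.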
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 14:37 ` Mark Lord @ 2006-02-28 21:04 ` Bill Davidsen 2006-03-08 2:57 ` Mark Lord 0 siblings, 1 reply; 131+ messages in thread From: Bill Davidsen @ 2006-02-28 21:04 UTC (permalink / raw) To: Mark Lord Cc: Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe Mark Lord wrote: > David Greaves wrote: >> >> /dev/sda: [...snip...] > .. > hdparm-6.4 says: Is there a version of that which will build on x86? I grabbed the version offered at freshmeat, but it won't compile on any x86 distro or gcc version to which I have access. RH8, RH9, FC1, FC3, FC4, ubuntu... with or without using the suggested alternate header. > > Model Number: Maxtor 6B200M0 > Serial Number: B4038RRH > Firmware Revision: BANC1980 > > Commands/features: > Enabled Supported: > * NOP cmd > * READ BUFFER cmd > * WRITE BUFFER cmd > * Look-ahead > * Write cache > * Power Management feature set > * SMART feature set > * FLUSH_CACHE_EXT > * Mandatory FLUSH_CACHE > * Device Configuration Overlay feature set > * 48-bit Address feature set > SET_MAX security extension > Advanced Power Management feature set > * DOWNLOAD_MICROCODE > * WRITE_{DMA|MULTIPLE}_FUA_EXT > * SMART self-test > * SMART error logging > > So, yes, the drive is either lying about "* WRITE_{DMA|MULTIPLE}_FUA_EXT", > or it didn't like the parameters it was given, or the SATA/IDE controller > chip didn't like the command. ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-28 21:04 ` Bill Davidsen
@ 2006-03-08 2:57 ` Mark Lord
2006-03-08 3:18 ` Dave Jones
2006-03-08 15:37 ` Bill Davidsen
0 siblings, 2 replies; 131+ messages in thread
From: Mark Lord @ 2006-03-08 2:57 UTC (permalink / raw)
To: Bill Davidsen
Cc: Jeff Garzik, linux-kernel, IDE/ATA development list, axboe, albertcc
Bill Davidsen wrote:
>
> Is there a version of that which will build on x86? I grabbed the
> version offered at freshmeat, but it won't compile on any x86 distro or
> gcc version to which I have access. RH8, RH9, FC1, FC3, FC4, ubuntu...
> with or without using the suggested alternate header.
hdparm-6.5 is the current version now. Both it, and 6.4,
build/install/run cleanly on Ubuntu-5.10, Debian-Sarge,
and SLES9-SP3.
You seem to be having trouble on only Redhat distros..
I guess they've done something unfriendly again.
Care to be more specific about what Redhat is doing?
Cheers
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-08 2:57 ` Mark Lord
@ 2006-03-08 3:18 ` Dave Jones
2006-03-08 3:23 ` Mark Lord
2006-03-08 15:37 ` Bill Davidsen
1 sibling, 1 reply; 131+ messages in thread
From: Dave Jones @ 2006-03-08 3:18 UTC (permalink / raw)
To: Mark Lord
Cc: Bill Davidsen, Jeff Garzik, linux-kernel, IDE/ATA development list, axboe, albertcc
On Tue, Mar 07, 2006 at 09:57:07PM -0500, Mark Lord wrote:
> Bill Davidsen wrote:
> >
> >Is there a version of that which will build on x86? I grabbed the
> >version offered at freshmeat, but it won't compile on any x86 distro or
> >gcc version to which I have access. RH8, RH9, FC1, FC3, FC4, ubuntu...
> >with or without using the suggested alternate header.
>
> hdparm-6.5 is the current version now. Both it, and 6.4,
> build/install/run cleanly on Ubuntu-5.10, Debian-Sarge,
> and SLES9-SP3.
>
> You seem to be having trouble on only Redhat distros..
> I guess they've done something unfriendly again.
>
> Care to be more specific about what Redhat is doing?
Looks like our userspace includes aren't up to date with some of the kernel
changes, so currently they're lacking the ide_task_request_t and related
taskfile bits.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=184349
Dave
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-08 3:18 ` Dave Jones @ 2006-03-08 3:23 ` Mark Lord 0 siblings, 0 replies; 131+ messages in thread From: Mark Lord @ 2006-03-08 3:23 UTC (permalink / raw) To: Dave Jones, Mark Lord, Bill Davidsen, Jeff Garzik, linux-kernel, IDE/ATA development list, axboe, albertcc Dave Jones wrote: > > looks like our userspace includes aren't up to date with some of the kernel > changes, so currently they're lacking the ide_task_request_t and related > taskfile bits. > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=184349 Ahh.. Thanks, Dave. hdparm-6.6 being released *now*, with that stuff #ifdef'd out when the necessary header structs are missing. It builds/runs for me, on RHEL4 at least. Cheers ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-08 2:57 ` Mark Lord
2006-03-08 3:18 ` Dave Jones
@ 2006-03-08 15:37 ` Bill Davidsen
1 sibling, 0 replies; 131+ messages in thread
From: Bill Davidsen @ 2006-03-08 15:37 UTC (permalink / raw)
To: Mark Lord
Cc: Bill Davidsen, Jeff Garzik, linux-kernel, IDE/ATA development list, axboe, albertcc
On Tue, 7 Mar 2006, Mark Lord wrote:
> Bill Davidsen wrote:
> >
> > Is there a version of that which will build on x86? I grabbed the
> > version offered at freshmeat, but it won't compile on any x86 distro or
> > gcc version to which I have access. RH8, RH9, FC1, FC3, FC4, ubuntu...
> > with or without using the suggested alternate header.
>
> hdparm-6.5 is the current version now. Both it, and 6.4,
> build/install/run cleanly on Ubuntu-5.10, Debian-Sarge,
> and SLES9-SP3.
>
> You seem to be having trouble on only Redhat distros..
> I guess they've done something unfriendly again.
>
> Care to be more specific about what Redhat is doing?
I'll mail you the first few hundred errors from the compiler after I go
find 6.5 and try that. My ubuntu tester reported similar results, so I'm
not sure what we are doing.
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with little computers since 1979
^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 10:39 ` David Greaves 2006-02-28 14:37 ` Mark Lord @ 2006-02-28 14:38 ` Mark Lord 2006-02-28 15:16 ` Alan Cox 2006-02-28 15:31 ` Mark Lord 2 siblings, 1 reply; 131+ messages in thread From: Mark Lord @ 2006-02-28 14:38 UTC (permalink / raw) To: David Greaves Cc: Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds David Greaves wrote: .. > sd 0:0:0:0: SCSI error: return code = 0x8000002 > sda: Current: sense key: Medium Error > Additional sense: Unrecovered read error - auto reallocate failed > end_request: I/O error, dev sda, sector 390716735 > raid5: Disk failure on sda1, disabling device. Operation continuing on 2 > devices > ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 > ata2: status=0x51 { DriveReady SeekComplete Error } > sd 1:0:0:0: SCSI error: return code = 0x8000002 > sdb: Current: sense key: Medium Error > Additional sense: Unrecovered read error - auto reallocate failed > end_request: I/O error, dev sdb, sector 390716735 > raid5: Disk failure on sdb1, disabling device. Operation continuing on 1 > devices .. The error handling still sucks, regardless of FUA. All of this nonsense about "Medium Error" is pure bogosity here. Cheers ^ permalink raw reply [flat|nested] 131+ messages in thread
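Some context for the quoted log lines: op=0x2a is the SCSI WRITE(10) opcode and cmd=0x3d is the ATA WRITE DMA FUA EXT opcode, so the command that failed was a FUA write, not a read, which is why the "Unrecovered read error" text is misleading. The status byte 0x51 is just three bits of the ATA status register. A small sketch of that decode follows; the function and table names are invented for illustration and the bit names copy the spelling used in the log:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* ATA status register bits, listed in the order the log prints them. */
struct sketch_status_bit {
	unsigned char mask;
	const char *name;
};

static const struct sketch_status_bit sketch_status_bits[] = {
	{ 0x80, "Busy" },
	{ 0x40, "DriveReady" },
	{ 0x20, "DeviceFault" },
	{ 0x10, "SeekComplete" },
	{ 0x08, "DataRequest" },
	{ 0x01, "Error" },
};

/*
 * Write the space-separated names of the bits set in 'status' into buf.
 * Returns how many bits were named. Stops early if buf is too small,
 * in which case the text in buf is truncated.
 */
int sketch_decode_status(unsigned char status, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;
	int named = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(sketch_status_bits) / sizeof(sketch_status_bits[0]); i++) {
		int n;

		if (!(status & sketch_status_bits[i].mask))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     named ? " " : "", sketch_status_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;  /* ran out of room */
		used += (size_t)n;
		named++;
	}
	return named;
}
```

For status 0x51 and a 64-byte buffer this yields "DriveReady SeekComplete Error", matching the "{ DriveReady SeekComplete Error }" line in the log. The Error bit says only that the command was rejected; nothing in the status byte itself points at the medium.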
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 14:38 ` Mark Lord @ 2006-02-28 15:16 ` Alan Cox 2006-03-01 17:33 ` David Greaves 0 siblings, 1 reply; 131+ messages in thread From: Alan Cox @ 2006-02-28 15:16 UTC (permalink / raw) To: Mark Lord Cc: David Greaves, Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Maw, 2006-02-28 at 09:38 -0500, Mark Lord wrote: > > The error handling still sucks, regardless of FUA. > All of this nonsense about "Medium Error" is pure bogosity here. I've flipped my tree to report Aborted Command. Not sure there is a better scsi sense match for "it broke and I dont know why" ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 15:16 ` Alan Cox @ 2006-03-01 17:33 ` David Greaves 2006-03-01 18:37 ` Alan Cox 0 siblings, 1 reply; 131+ messages in thread From: David Greaves @ 2006-03-01 17:33 UTC (permalink / raw) To: Alan Cox Cc: Mark Lord, Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds Alan Cox wrote: >On Maw, 2006-02-28 at 09:38 -0500, Mark Lord wrote: > > >>The error handling still sucks, regardless of FUA. >>All of this nonsense about "Medium Error" is pure bogosity here. >> >> > >I've flipped my tree to report Aborted Command. Not sure there is a >better scsi sense match for "it broke and I dont know why" > > As a user I prefer It Broke And I Dont Know Why to Aborted Command (honesty is the best policy) I certainly hate Medium Error as modern hard disks seem to be flakier than ever. David ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 17:33 ` David Greaves @ 2006-03-01 18:37 ` Alan Cox 2006-03-01 20:12 ` Phillip Susi 0 siblings, 1 reply; 131+ messages in thread From: Alan Cox @ 2006-03-01 18:37 UTC (permalink / raw) To: David Greaves Cc: Mark Lord, Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Mer, 2006-03-01 at 17:33 +0000, David Greaves wrote: > As a user I prefer > It Broke And I Dont Know Why > to > Aborted Command So whats the SCSI sense encoding for that ? ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 18:37 ` Alan Cox @ 2006-03-01 20:12 ` Phillip Susi 2006-03-08 16:46 ` Alan Cox 0 siblings, 1 reply; 131+ messages in thread From: Phillip Susi @ 2006-03-01 20:12 UTC (permalink / raw) To: Alan Cox Cc: David Greaves, Mark Lord, Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Alan Cox wrote: > On Mer, 2006-03-01 at 17:33 +0000, David Greaves wrote: >> As a user I prefer >> It Broke And I Dont Know Why >> to >> Aborted Command > > So whats the SCSI sense encoding for that ? > Wouldn't that just be 0/0/0? IIRC the standard defines that as "NO ADDITIONAL SENSE DATA" which sounds to me like another way of saying "I don't know what went wrong, but that didn't work". ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 20:12 ` Phillip Susi @ 2006-03-08 16:46 ` Alan Cox 0 siblings, 0 replies; 131+ messages in thread From: Alan Cox @ 2006-03-08 16:46 UTC (permalink / raw) To: Phillip Susi Cc: David Greaves, Mark Lord, Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus On Mer, 2006-03-01 at 15:12 -0500, Phillip Susi wrote: > >> It Broke And I Dont Know Why > >> to > >> Aborted Command > > > > So whats the SCSI sense encoding for that ? > > > > Wouldn't that just be 0/0/0? IIRC the standard defines that as "NO > ADDITIONAL SENSE DATA" which sounds to me like another way of saying "I > don't know what went wrong, but that didn't work". The 0/0/0 sense is already used. The question is what error do you use with that sense. At the moment I'm using aborted command. ^ permalink raw reply [flat|nested] 131+ messages in thread
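To make the "0/0/0" shorthand concrete: the values being discussed are the sense key, the additional sense code (ASC) and its qualifier (ASCQ), which sit in bytes 2, 12 and 13 of fixed-format sense data. The sketch below fills such a buffer by hand. It is an illustration only, not the code in anyone's tree, and sketch_build_fixed_sense() is a made-up name; the numeric values (response code 0x70, MEDIUM ERROR = 0x03, ABORTED COMMAND = 0x0b) are the standard SCSI ones.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SENSE_LEN          18    /* size of fixed-format sense data */
#define SKETCH_KEY_NO_SENSE       0x00
#define SKETCH_KEY_MEDIUM_ERROR   0x03
#define SKETCH_KEY_ABORTED_CMD    0x0b

/* Fill an 18-byte fixed-format sense buffer with key/ASC/ASCQ. */
void sketch_build_fixed_sense(uint8_t *sb, uint8_t key, uint8_t asc, uint8_t ascq)
{
	assert(sb != NULL);

	memset(sb, 0, SKETCH_SENSE_LEN);
	sb[0]  = 0x70;          /* response code: current error, fixed format */
	sb[2]  = key & 0x0f;    /* sense key is the low nibble of byte 2 */
	sb[7]  = 0x0a;          /* additional length: 10 bytes follow byte 7 */
	sb[12] = asc;           /* additional sense code */
	sb[13] = ascq;          /* additional sense code qualifier */
}
```

Calling sketch_build_fixed_sense(sb, SKETCH_KEY_ABORTED_CMD, 0, 0) gives ABORTED COMMAND with ASC/ASCQ 0/0 ("no additional sense information"), which appears to be the combination described in the message above. Passing SKETCH_KEY_NO_SENSE instead would be the literal 0/0/0 triple.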
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 10:39 ` David Greaves 2006-02-28 14:37 ` Mark Lord 2006-02-28 14:38 ` Mark Lord @ 2006-02-28 15:31 ` Mark Lord 2006-02-28 15:34 ` Jeff Garzik 2 siblings, 1 reply; 131+ messages in thread From: Mark Lord @ 2006-02-28 15:31 UTC (permalink / raw) To: David Greaves Cc: Tejun Heo, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds David Greaves wrote: > > scsi1 : sata_sil > Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC > Type: Direct-Access ANSI SCSI revision: 05 > Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC > Type: Direct-Access ANSI SCSI revision: 05 I wonder if the non-FUA component here is the sata_sil, rather than the two Maxtor drives. Also, your drives have different firmware, but both have trouble with FUA here. (sdb is slightly newer, and larger, than sda). Cheers ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 15:31 ` Mark Lord @ 2006-02-28 15:34 ` Jeff Garzik 2006-02-28 16:57 ` Eric D. Mudama 2006-03-01 17:41 ` David Greaves 0 siblings, 2 replies; 131+ messages in thread From: Jeff Garzik @ 2006-02-28 15:34 UTC (permalink / raw) To: Mark Lord Cc: David Greaves, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds Mark Lord wrote: > David Greaves wrote: > >> >> scsi1 : sata_sil >> Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC >> Type: Direct-Access ANSI SCSI revision: 05 >> Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC >> Type: Direct-Access ANSI SCSI revision: 05 > > > I wonder if the non-FUA component here is the sata_sil, > rather than the two Maxtor drives. > > Also, your drives have different firmware, > but both have trouble with FUA here. sata_sil is indeed a piece of hardware that needs to know the opcodes ahead of time... Jeff ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 15:34 ` Jeff Garzik @ 2006-02-28 16:57 ` Eric D. Mudama 2006-03-01 1:04 ` Mark Lord 2006-03-01 17:41 ` David Greaves 1 sibling, 1 reply; 131+ messages in thread From: Eric D. Mudama @ 2006-02-28 16:57 UTC (permalink / raw) To: Jeff Garzik Cc: Mark Lord, David Greaves, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds those drives should support all FUA opcodes properly, both queued and unqueued On 2/28/06, Jeff Garzik <jgarzik@pobox.com> wrote: > Mark Lord wrote: > > David Greaves wrote: > > > >> > >> scsi1 : sata_sil > >> Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC > >> Type: Direct-Access ANSI SCSI revision: 05 > >> Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC > >> Type: Direct-Access ANSI SCSI revision: 05 > > > > > > I wonder if the non-FUA component here is the sata_sil, > > rather than the two Maxtor drives. > > > > Also, your drives have different firmware, > > but both have trouble with FUA here. > > sata_sil is indeed a piece of hardware that needs to know the opcodes > ahead of time... > > Jeff ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-28 16:57 ` Eric D. Mudama @ 2006-03-01 1:04 ` Mark Lord 2006-03-01 11:37 ` Justin Piszcz 2006-03-01 13:17 ` Justin Piszcz 0 siblings, 2 replies; 131+ messages in thread From: Mark Lord @ 2006-03-01 1:04 UTC (permalink / raw) To: Eric D. Mudama Cc: Jeff Garzik, David Greaves, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds Eric D. Mudama wrote: > those drives should support all FUA opcodes properly, both queued and unqueued His first drive (sda) does not support queued commands at all, but the newer firmware in his second drive (sdb) does support NCQ. Both drives support FUA. cheers ^ permalink raw reply [flat|nested] 131+ messages in thread
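The NCQ difference between the two drives can be read from the raw IDENTIFY dumps posted earlier in the thread. Counting eight words per row, word 76 appears to be 0x0002 for sda and 0x0102 for sdb, and word 75 is 0x0000 and 0x001f respectively; that word position is my own reading of the dump, so treat it as an assumption. Bit 8 of word 76 advertises NCQ, and the low five bits of word 75 hold the queue depth minus one. A minimal sketch with made-up function names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if IDENTIFY word 76 advertises Native Command Queuing. */
int sketch_id_has_ncq(const uint16_t *id)
{
	assert(id != NULL);

	/* 0x0000 and 0xffff mean the word carries no information. */
	if (id[76] == 0x0000 || id[76] == 0xffff)
		return 0;
	return (id[76] & (1u << 8)) != 0;
}

/* Queue depth is stored as (depth - 1) in the low five bits of word 75. */
int sketch_id_queue_depth(const uint16_t *id)
{
	assert(id != NULL);
	return (id[75] & 0x1f) + 1;
}
```

With those values sda reports no NCQ while sdb reports NCQ with a queue depth of 32, which matches the description above, and both still advertise FUA in word 84.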
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 1:04 ` Mark Lord @ 2006-03-01 11:37 ` Justin Piszcz 2006-03-01 13:17 ` Justin Piszcz 1 sibling, 0 replies; 131+ messages in thread From: Justin Piszcz @ 2006-03-01 11:37 UTC (permalink / raw) To: Mark Lord Cc: Eric D. Mudama, Jeff Garzik, David Greaves, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Tue, 28 Feb 2006, Mark Lord wrote: > Eric D. Mudama wrote: >> those drives should support all FUA opcodes properly, both queued and >> unqueued > > His first drive (sda) does not support queued commands at all, > but the newer firmware in his second drive (sdb) does support NCQ. > > Both drives support FUA. > > cheers > To trust or not to trust? I have a 400GB SATA drive: WDC WD4000KD-00N. With these errors in dmesg that have been mentioned throughout the thread, should I trust Linux using this drive, or should I remove it/wait until a patch is released to address this issue? Also, in the forums (storagereview.com I believe), it has been noted that these drives do NOT work on the Intel ICH5 controller, and this turned out to be true, when I put it on the Intel ICH5, the box stalls for 2-3 minutes and then it does not see the drive. However, on the Silicon Image, Inc. SiI 3112 chipset or Promise SATA/150 TX2 it works okay but it has those errors in dmesg. My question is, performing long and short smart tests, everything is physically ok with the drive; however, I probably should not use this drive for anything important in Linux, comments? Justin. ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 1:04 ` Mark Lord 2006-03-01 11:37 ` Justin Piszcz @ 2006-03-01 13:17 ` Justin Piszcz 1 sibling, 0 replies; 131+ messages in thread From: Justin Piszcz @ 2006-03-01 13:17 UTC (permalink / raw) To: Mark Lord Cc: Eric D. Mudama, Jeff Garzik, David Greaves, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Tue, 28 Feb 2006, Mark Lord wrote: > Eric D. Mudama wrote: >> those drives should support all FUA opcodes properly, both queued and >> unqueued > > His first drive (sda) does not support queued commands at all, > but the newer firmware in his second drive (sdb) does support NCQ. > > Both drives support FUA. > > cheers > Could someone *PLEASE* produce a *unified* patch that is compatible with 2.6.16-rc5 or 2.6.15.4 so I can reproduce the error? Mark had two patches, I have had the most PIA time getting them to work, patch properly, etc.. With 2.6.16-rc5: # make bzImage CHK include/linux/version.h scripts/kconfig/conf -s arch/i386/Kconfig # # using defaults found in .config # SPLIT include/linux/autoconf.h -> include/config/* CHK include/linux/compile.h CHK usr/initramfs_list GEN .version CHK include/linux/compile.h UPD include/linux/compile.h CC init/version.o LD init/built-in.o LD .tmp_vmlinux1 drivers/built-in.o: In function `ata_to_sense_error': undefined reference to `print' drivers/built-in.o: In function `ata_to_sense_error': undefined reference to `print' make: *** [.tmp_vmlinux1] Error 1 Command exited with non-zero status 2 ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-28 15:34 ` Jeff Garzik
2006-02-28 16:57 ` Eric D. Mudama
@ 2006-03-01 17:41 ` David Greaves
2006-03-01 17:46 ` Mark Lord
1 sibling, 1 reply; 131+ messages in thread
From: David Greaves @ 2006-03-01 17:41 UTC (permalink / raw)
To: Jeff Garzik
Cc: Mark Lord, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

Jeff Garzik wrote:
> Mark Lord wrote:
>
>> David Greaves wrote:
>>
>>>
>>> scsi1 : sata_sil
>>> Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC
>>> Type: Direct-Access ANSI SCSI revision: 05
>>> Vendor: ATA Model: Maxtor 6B200M0 Rev: BANC
>>> Type: Direct-Access ANSI SCSI revision: 05
>>
>>
>>
>> I wonder if the non-FUA component here is the sata_sil,
>> rather than the two Maxtor drives.
>>
>> Also, your drives have different firmware,
>> but both have trouble with FUA here.
>
>
> sata_sil is indeed a piece of hardware that needs to know the opcodes
> ahead of time...
>
> Jeff
>

I actually have 3 of those drives - one runs through sata_via and
doesn't have the same problem.

(the sata_via ones *do* have :
ata3: status=0x50 { DriveReady SeekComplete }
ata3: PIO error
problems with SMART)

David

^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-01 17:41 ` David Greaves
@ 2006-03-01 17:46 ` Mark Lord
2006-03-01 18:12 ` David Greaves
0 siblings, 1 reply; 131+ messages in thread
From: Mark Lord @ 2006-03-01 17:46 UTC (permalink / raw)
To: David Greaves
Cc: Jeff Garzik, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

David Greaves wrote:
>
> I actually have 3 of those drives - one runs through sata_via and
> doesn't have the same problem.
>
> (the sata_via ones *do* have :
> ata3: status=0x50 { DriveReady SeekComplete }
> ata3: PIO error
> problems with SMART)

And once again, not enough information in the error messages
for anyone to actually do anything about it (not David's fault).

What command do you use to get that bug to pop up?

BTW:
hdparm-6.5 is now available (sourceforge),
and should show all of the fancy features
of your drives for comparison between versions.

Cheers

^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 17:46 ` Mark Lord @ 2006-03-01 18:12 ` David Greaves 2006-03-01 18:30 ` Mark Lord 0 siblings, 1 reply; 131+ messages in thread From: David Greaves @ 2006-03-01 18:12 UTC (permalink / raw) To: Mark Lord Cc: Jeff Garzik, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds Mark Lord wrote: > David Greaves wrote: > >> >> I actually have 3 of those drives - one runs through sata_via and >> doesn't have the same problem. >> >> (the sata_via ones *do* have : >> ata3: status=0x50 { DriveReady SeekComplete } >> ata3: PIO error >> problems with SMART) > > > And once again, not enough information in the error messages > for anyone to actually do anything about it (not David's fault). > > What command do you use to get that bug to pop up? (FYI I'm running 2.6.15 with both 'info' patches 'cos I'm scared of 2.6.16-rc4!) haze:/usr/src# smartctl -data -s on /dev/sdc smartctl version 5.34 [i686-pc-linux-gnu] Copyright (C) 2002-5 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF ENABLE/DISABLE COMMANDS SECTION === SMART Enabled. No messages in dmesg haze:/usr/src# smartctl -data -o on /dev/sdc smartctl version 5.34 [i686-pc-linux-gnu] Copyright (C) 2002-5 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF ENABLE/DISABLE COMMANDS SECTION === Error SMART Enable Automatic Offline failed: Input/output error Smartctl: SMART Enable Automatic Offline Failed. 
dmesg contains this message repeated 31 times: ata3: PIO error ata3: status=0x50 { DriveReady SeekComplete } haze:/usr/src# smartctl -data -o off /dev/sdc succeeds but gives me: ata3: status=0x50 { DriveReady SeekComplete } ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 ata3: status=0x51 { DriveReady SeekComplete Error } ata3: error=0x04 { DriveStatusError } ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 ata3: status=0x51 { DriveReady SeekComplete Error } ata3: error=0x04 { DriveStatusError } ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 ata3: status=0x51 { DriveReady SeekComplete Error } ata3: error=0x04 { DriveStatusError } haze:/usr/src# smartctl -data -o on /dev/sdd smartctl version 5.34 [i686-pc-linux-gnu] Copyright (C) 2002-5 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF ENABLE/DISABLE COMMANDS SECTION === Error SMART Enable Automatic Offline failed: Input/output error Smartctl: SMART Enable Automatic Offline Failed. 
ata4: PIO error ata4: status=0x50 { DriveReady SeekComplete } # smartctl -data -o off /dev/sdd ata4: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 ata4: status=0x51 { DriveReady SeekComplete Error } ata4: error=0x04 { DriveStatusError } ata4: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 ata4: status=0x51 { DriveReady SeekComplete Error } ata4: error=0x04 { DriveStatusError } ata4: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 ata4: status=0x51 { DriveReady SeekComplete Error } ata4: error=0x04 { DriveStatusError } haze:/usr/src# hdparm --Istdout /dev/sdc /dev/sdc: IO_support = 0 (default 16-bit) readonly = 0 (off) readahead = 256 (on) geometry = 19457/255/63, sectors = 312581808, start = 0 0c5a 3fff c837 0010 0000 0000 003f 0000 0000 0000 334a 5332 4b53 4c33 2020 2020 2020 2020 2020 2020 0000 4000 0004 332e 3138 2020 2020 5354 3331 3630 3032 3341 5320 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 8010 0000 2f00 0000 0200 0200 0007 3fff 0010 003f fc10 00fb 0110 ffff 0fff 0000 0007 0003 0078 0078 00f0 0078 0000 0000 0000 0000 0000 0000 0000 0002 0000 0000 0000 007e 001b 346b 7d01 4003 3468 3c01 4003 407f 0000 0000 fefe 0000 0000 fe00 0000 0000 0000 0000 0000 9eb0 12a1 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 9eb0 12a1 9eb0 12a1 2020 0002 42b6 8000 008a 3c06 3c0a ffff 07c6 0100 0800 0ff0 1000 0002 0030 0000 0000 0000 fe06 0000 0002 0050 008a 954f 0000 0023 000b 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 7ea5 haze:/usr/src# hdparm --Istdout /dev/sdd /dev/sdd: IO_support = 0 (default 16-bit) readonly = 0 (off) readahead = 256 (on) geometry = 24792/255/63, sectors = 398297088, start = 0 0040 3fff c837 0010 0000 0000 003f 0000 0000 0000 4234 3152 5643 3248 2020 2020 2020 2020 2020 2020 0003 4000 0004 4241 4e43 3142 5930 4d61 7874 6f72 2036 4232 3030 4d30 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 8010 0000 2f00 4000 0200 0000 0007 3fff 0010 003f fc10 00fb 0110 ffff 0fff 0000 0007 0003 0078 0078 0078 0078 0000 0000 0000 0000 0000 0000 001f 0102 0000 0000 0000 00fe 001e 7c6b 7f09 4063 7c68 3e01 4063 407f 0000 0000 0000 fffe 0000 c0fe 0000 0000 0000 0000 0000 8800 17bd 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0113 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 a6a5 David > > BTW: > hdparm-6.5 is now available (sourceforge), > and should show all of the fancy features > of your drives for comparism between versions. OK - soonish... ^ permalink raw reply [flat|nested] 131+ messages in thread
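[Editor's aside, not part of the original message.] The two `hdparm --Istdout` dumps above are raw ATA IDENTIFY DEVICE data, printed as 16-bit words. As a minimal sketch of how to read them, the Python below decodes the ASCII string fields, assuming the standard IDENTIFY layout (serial number in words 10-19, firmware revision in words 23-26, model number in words 27-46, two characters per word with the first character in the high byte). The function names `parse_identify_dump` and `ata_string` are hypothetical helpers written for this note; only the first 47 words of the /dev/sdd dump are reproduced.

```python
def parse_identify_dump(text):
    """Turn whitespace-separated 4-digit hex words into a list of ints."""
    return [int(tok, 16) for tok in text.split()]


def ata_string(words, first, last):
    """Decode an IDENTIFY string field held in words first..last (inclusive).

    Each 16-bit word carries two ASCII characters, high byte first.
    Fields are space padded, so the result is stripped.
    """
    raw = b"".join(w.to_bytes(2, "big") for w in words[first:last + 1])
    return raw.decode("ascii", errors="replace").strip()


# Words 0..46 of the /dev/sdd dump quoted in the message above.
SDD_WORDS = parse_identify_dump("""
0040 3fff c837 0010 0000 0000 003f 0000
0000 0000 4234 3152 5643 3248 2020 2020
2020 2020 2020 2020 0003 4000 0004 4241
4e43 3142 5930 4d61 7874 6f72 2036 4232
3030 4d30 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020
""")

if __name__ == "__main__":
    print("firmware:", ata_string(SDD_WORDS, 23, 26))
    print("model:   ", ata_string(SDD_WORDS, 27, 46))
```

Decoding the dump this way is a quick cross-check that a given block of hex really belongs to the drive under discussion (here the Maxtor 6B200M0 reported by sata_sil earlier in the thread) before comparing feature words between drives.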
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 18:12 ` David Greaves @ 2006-03-01 18:30 ` Mark Lord 2006-03-01 18:32 ` Justin Piszcz ` (3 more replies) 0 siblings, 4 replies; 131+ messages in thread From: Mark Lord @ 2006-03-01 18:30 UTC (permalink / raw) To: David Greaves Cc: Jeff Garzik, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds David Greaves wrote: > > haze:/usr/src# smartctl -data -o off /dev/sdc > succeeds but gives me: > > ata3: status=0x50 { DriveReady SeekComplete } > ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 > ata3: status=0x51 { DriveReady SeekComplete Error } > ata3: error=0x04 { DriveStatusError } > ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 > ata3: status=0x51 { DriveReady SeekComplete Error } > ata3: error=0x04 { DriveStatusError } > ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 > ata3: status=0x51 { DriveReady SeekComplete Error } > ata3: error=0x04 { DriveStatusError } "DriveStatusError" is "Command Aborted" in ac-speak. From the man page for smartctl, we read: >-o VALUE Enables or disables SMART automatic offline test ... >Note that the SMART automatic offline test command is listed as "Obsolete" in every >version of the ATA and ATA/ATAPI Specifications. It was originally part of the >SFF-8035i Revision 2.0 specification, but was never part of any ATA specification. There's a chance that your drives simply do not fully support this feature, and are rejecting attempts to use it. By the way, the latest 2.6.16-rc5-git4 is available, and has FUA turned off by default now. So it should work with your drives, and *you* are expected to verify that for us all now. Cheers -ml ^ permalink raw reply [flat|nested] 131+ messages in thread
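[Editor's aside, not part of the original message.] The `status=0x50`, `status=0x51` and `error=0x04` values in these logs are bit fields, and the words in braces are just the names of the set bits. The Python below is a minimal sketch of that decoding, not the kernel's own code. The status names follow the spelling seen in the log lines above; the error names use the ATA abbreviations (ABRT is the bit the log prints as "DriveStatusError", i.e. command aborted), and only a few error bits are listed. `decode_status` and `decode_error` are hypothetical helper names.

```python
# ATA status register bits, named as they appear in the log lines.
STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# A subset of ATA error register bits, using the ATA abbreviations.
ERROR_BITS = [
    (0x80, "ICRC"),   # interface CRC error
    (0x40, "UNC"),    # uncorrectable data error
    (0x10, "IDNF"),   # sector ID not found
    (0x04, "ABRT"),   # command aborted ("DriveStatusError" in the logs)
]


def decode_status(stat):
    """Return the names of the bits set in an ATA status byte."""
    if stat & 0x80:
        # While BSY is set the other status bits are not meaningful.
        return ["Busy"]
    return [name for bit, name in STATUS_BITS if stat & bit]


def decode_error(err):
    """Return the names of the known bits set in an ATA error byte."""
    return [name for bit, name in ERROR_BITS if err & bit]


if __name__ == "__main__":
    for stat in (0x50, 0x51):
        print(hex(stat), decode_status(stat))
    print(hex(0x04), decode_error(0x04))
```

Read this way, `0x51/04` means the drive was ready, reported an error, and the error was an aborted command, which is consistent with a drive rejecting a command it does not support.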
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 18:30 ` Mark Lord @ 2006-03-01 18:32 ` Justin Piszcz 2006-03-01 18:33 ` Justin Piszcz ` (2 subsequent siblings) 3 siblings, 0 replies; 131+ messages in thread From: Justin Piszcz @ 2006-03-01 18:32 UTC (permalink / raw) To: Mark Lord Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Wed, 1 Mar 2006, Mark Lord wrote: > David Greaves wrote: >> >> haze:/usr/src# smartctl -data -o off /dev/sdc >> succeeds but gives me: >> >> ata3: status=0x50 { DriveReady SeekComplete } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } > > "DriveStatusError" is "Command Aborted" in ac-speak. > From the man page for smartctl, we read: > >> -o VALUE Enables or disables SMART automatic offline test ... >> Note that the SMART automatic offline test command is listed as "Obsolete" > in every >> version of the ATA and ATA/ATAPI Specifications. It was originally part > of the >> SFF-8035i Revision 2.0 specification, but was never part of any ATA > specification. > > There's a chance that your drives simply do not fully support this feature, > and are rejecting attempts to use it. > > By the way, the latest 2.6.16-rc5-git4 is available, > and has FUA turned off by default now. So it should > work with your drives, and *you* are expected to verify > that for us all now. 
> > Cheers > > -ml > When running that command, I get it too: [4294684.510000] ACPI: PCI Interrupt 0000:02:06.0[A] -> GSI 22 (level, low) -> I RQ 17 [4294686.762000] process `syslogd' is using obsolete setsockopt SO_BSDCOMPAT [4295292.736000] +++PATCH: Original kernel error: [4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/AS CQ 0xb/00/00 [4295292.736000] +++PATCH: Mark Lord's extended verbosity patch: [4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error } [4295292.736000] ata3: error=0x04 { DriveStatusError } [4295292.736000] +++PATCH: Original kernel error: [4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/AS CQ 0xb/00/00 [4295292.736000] +++PATCH: Mark Lord's extended verbosity patch: [4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error } [4295292.736000] ata3: error=0x04 { DriveStatusError } [4295292.736000] +++PATCH: Original kernel error: [4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/AS CQ 0xb/00/00 [4295292.736000] +++PATCH: Mark Lord's extended verbosity patch: [4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error } [4295292.736000] ata3: error=0x04 { DriveStatusError } [4295292.736000] +++PATCH: Original kernel error: [4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/AS CQ 0xb/00/00 [4295292.736000] +++PATCH: Mark Lord's extended verbosity patch: [4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error } [4295292.736000] ata3: error=0x04 { DriveStatusError } [4295292.736000] +++PATCH: 
Original kernel error: [4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/AS CQ 0xb/00/00 [4295292.736000] +++PATCH: Mark Lord's extended verbosity patch: [4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error } [4295292.736000] ata3: error=0x04 { DriveStatusError } ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 18:30 ` Mark Lord 2006-03-01 18:32 ` Justin Piszcz @ 2006-03-01 18:33 ` Justin Piszcz 2006-03-01 18:48 ` David Greaves 2006-03-01 19:06 ` Justin Piszcz 3 siblings, 0 replies; 131+ messages in thread From: Justin Piszcz @ 2006-03-01 18:33 UTC (permalink / raw) To: Mark Lord Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Wed, 1 Mar 2006, Mark Lord wrote: > David Greaves wrote: >> >> haze:/usr/src# smartctl -data -o off /dev/sdc >> succeeds but gives me: >> >> ata3: status=0x50 { DriveReady SeekComplete } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } > > "DriveStatusError" is "Command Aborted" in ac-speak. > From the man page for smartctl, we read: > >> -o VALUE Enables or disables SMART automatic offline test ... >> Note that the SMART automatic offline test command is listed as "Obsolete" > in every >> version of the ATA and ATA/ATAPI Specifications. It was originally part > of the >> SFF-8035i Revision 2.0 specification, but was never part of any ATA > specification. > > There's a chance that your drives simply do not fully support this feature, > and are rejecting attempts to use it. > > By the way, the latest 2.6.16-rc5-git4 is available, > and has FUA turned off by default now. So it should > work with your drives, and *you* are expected to verify > that for us all now. > > Cheers > > -ml > Mark, After patching to 2.6.16-rc5-git4, we should no longer see these errors right? 
Then I can use my drive again without worrying about data loss? :) Justin. ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-01 18:30 ` Mark Lord
2006-03-01 18:32 ` Justin Piszcz
2006-03-01 18:33 ` Justin Piszcz
@ 2006-03-01 18:48 ` David Greaves
2006-03-01 19:49 ` David Greaves
2006-03-01 19:06 ` Justin Piszcz
3 siblings, 1 reply; 131+ messages in thread
From: David Greaves @ 2006-03-01 18:48 UTC (permalink / raw)
To: Mark Lord
Cc: Jeff Garzik, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

Mark Lord wrote:

> By the way, the latest 2.6.16-rc5-git4 is available,
> and has FUA turned off by default now. So it should
> work with your drives, and *you* are expected to verify
> that for us all now.

Yeah, I know - I've got it on the machine... but it's my wife's machine.
I've asked nicely but she's editing a Hercule Poirot video so I'm not
allowed to reboot it for a while...

I've told her I'm not making pancakes until I've tested it so expect a
report Real Soon Now...

David

^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 18:48 ` David Greaves @ 2006-03-01 19:49 ` David Greaves 2006-03-03 19:38 ` Justin Piszcz 2006-03-05 11:43 ` Justin Piszcz 0 siblings, 2 replies; 131+ messages in thread From: David Greaves @ 2006-03-01 19:49 UTC (permalink / raw) To: David Greaves Cc: Mark Lord, Jeff Garzik, Tejun Heo, Justin Piszcz, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds David Greaves wrote: >Mark Lord wrote: > > > >>By the way, the latest 2.6.16-rc5-git4 is available, >>and has FUA turned off by default now. So it should >>work with your drives, and *you* are expected to verify >>that for us all now. >> >> >Yeah, I know - I've got it on the machine... but it's my wife's machine. >I've asked nicely but she's editing a Hercule Poirot video so I'm not >allowed to reboot it for a while... > >I've told her I'm not making pancakes until I've tested it so expect a >report Real Soon Now... > > OK that worked (the pancakes - the kernel's not doing so well...) haze:~# uname -a Linux haze 2.6.16-rc5-git4 #2 PREEMPT Wed Mar 1 19:07:58 UTC 2006 i686 GNU/Linux The boot is pretty clean. I ran an xfs_repair -n on the lvm volume and got the following errors. The repair reported a clean filesystem and the drive was not booted from the raid so that's a big improvement. 
I was not able to trigger similar messages on ata1 but a simple dd doesn't trigger the messages on ata2 either (and for various reasons, xfs_repair wouldn't run on ata1 - I thought I'd leave it and report this first) ata2: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 ata2: status=0x51 { DriveReady SeekComplete Error } ata2: error=0x04 { DriveStatusError } ata2: no sense translation for status: 0x51 ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 ata2: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for status: 0x51 ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 ata2: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for status: 0x51 ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 ata2: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for status: 0x51 ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 ata2: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for status: 0x51 ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 ata2: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for status: 0x51 ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 ata2: status=0x51 { DriveReady SeekComplete Error } ata2: no sense translation for status: 0x51 ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 ata2: status=0x51 { DriveReady SeekComplete Error } David -- ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 19:49 ` David Greaves @ 2006-03-03 19:38 ` Justin Piszcz 2006-03-03 22:46 ` David Greaves 2006-03-05 11:43 ` Justin Piszcz 1 sibling, 1 reply; 131+ messages in thread From: Justin Piszcz @ 2006-03-03 19:38 UTC (permalink / raw) To: David Greaves Cc: Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Wed, 1 Mar 2006, David Greaves wrote: > David Greaves wrote: > >> Mark Lord wrote: >> >> >> >>> By the way, the latest 2.6.16-rc5-git4 is available, >>> and has FUA turned off by default now. So it should >>> work with your drives, and *you* are expected to verify >>> that for us all now. >>> >>> >> Yeah, I know - I've got it on the machine... but it's my wife's machine. >> I've asked nicely but she's editing a Hercule Poirot video so I'm not >> allowed to reboot it for a while... >> >> I've told her I'm not making pancakes until I've tested it so expect a >> report Real Soon Now... >> >> > OK that worked (the pancakes - the kernel's not doing so well...) > > haze:~# uname -a > Linux haze 2.6.16-rc5-git4 #2 PREEMPT Wed Mar 1 19:07:58 UTC 2006 i686 > GNU/Linux > > The boot is pretty clean. > I ran an xfs_repair -n on the lvm volume and got the following errors. > The repair reported a clean filesystem and the drive was not booted from > the raid so that's a big improvement. 
> > I was not able to trigger similar messages on ata1 but a simple dd > doesn't trigger the messages on ata2 either (and for various reasons, > xfs_repair wouldn't run on ata1 - I thought I'd leave it and report this > first) > > ata2: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: error=0x04 { DriveStatusError } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > > David > > -- > As of 2.6.16-rc5-git4, I have written 281GB so far over a period of 48+ hours with no errors yet :) Will keep you updated if I see any errors, but so far, so good! Thanks, Justin. ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-03 19:38 ` Justin Piszcz
@ 2006-03-03 22:46 ` David Greaves
2006-03-04 14:25 ` Mark Lord
0 siblings, 1 reply; 131+ messages in thread
From: David Greaves @ 2006-03-03 22:46 UTC (permalink / raw)
To: Justin Piszcz
Cc: Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

Just FYI - I'm away (in Canada) for 2 weeks so can't do any additional
testing until I return.

David

--

^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-03 22:46 ` David Greaves
@ 2006-03-04 14:25 ` Mark Lord
2006-03-06 6:13 ` David Greaves
0 siblings, 1 reply; 131+ messages in thread
From: Mark Lord @ 2006-03-04 14:25 UTC (permalink / raw)
To: David Greaves
Cc: Justin Piszcz, Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

David Greaves wrote:
> Just FYI - I'm away (in Canada) for 2 weeks so can't do any additional
> testing until I return.

Am I correct, in that your last test on rc5-git4 was a failure?
But without the "opcode" display in the error messages,
so we have no idea exactly what caused the errors (again!)?

[Whatcha doin up here?]

Cheers

^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-04 14:25 ` Mark Lord
@ 2006-03-06 6:13 ` David Greaves
2006-03-21 18:11 ` David Greaves
0 siblings, 1 reply; 131+ messages in thread
From: David Greaves @ 2006-03-06 6:13 UTC (permalink / raw)
To: Mark Lord
Cc: Justin Piszcz, Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

Mark Lord wrote:
> David Greaves wrote:
>> Just FYI - I'm away (in Canada) for 2 weeks so can't do any additional
>> testing until I return.
>
> Am I correct, in that your last test on rc5-git4 was a failure?

It was *much* better than rc4 but it did have an error.
I *think* the problem I'm seeing is likely to be similar to the one I
originally reported (on 2.6.15, IIRC): the same sporadic warning/error,
which didn't usually trigger the raid-boot-the-disk behaviour that the
FUA code seemed to.

> But without the "opcode" display in the error messages,
> so we have no idea exactly what caused the errors (again!)?

Yes. I thought the/a opcode-verbose patch was in there but I guess not.
I don't have remote console access to the machine so wouldn't be able
to carry out reliable kernel tests - sorry.
Of course I'll do this as soon as I return.

>
> [Whatcha doin up here?]

[:) 2 weeks skiing in Whistler (this time - 10 days Canadian canoeing in
Algonquin last time!) Canada's great !!]

David

^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-06 6:13 ` David Greaves @ 2006-03-21 18:11 ` David Greaves 2006-03-22 15:23 ` David Greaves 0 siblings, 1 reply; 131+ messages in thread From: David Greaves @ 2006-03-21 18:11 UTC (permalink / raw) To: Mark Lord Cc: Justin Piszcz, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds David Greaves wrote: > Mark Lord wrote: > >> David Greaves wrote: >> >>> Just FYI - I'm away (in Canada) for 2 weeks so can't do any additional >>> testing until I return. >> >> >> Am I correct, in that your last test on rc5-git4 was a failure? > > It was *much* better than rc4 but it did have an error. > I *think* the problem I'm seeing is likely to be similar to the one I > orginally reported (on 2.6.15 IIRC) > Same sporadic warning/error which didn't usually trigger the > raid-boot-the-disk behaviour that the FUA code seemed to. > >> But without the "opcode" display in the error messages, >> so we have no idea exactly what caused the errors (again!)? > > Yes. I thought the/a opcode-verbose patch was in there but I guess not. > I don't have remote console access to the machine so wouldn't be able > to carry out reliable kernel tests - sorry. > Of course I'll do this as soon as I return. Hi Back now :) I've upgraded to 2.6.16 and applied your verbosity patches. I've persuaded my array to re-assemble and during the resync I got these messages dmesg: ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x04 { DriveStatusError } ...(18mins later) ata1: no sense translation for op=0x28 cmd=0x25 status: 0x51 ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 ata1: status=0x51 { DriveReady SeekComplete Error } smartd is not running This did not cause the raid subsystem to boot the disk (thank goodness!) David ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-21 18:11 ` David Greaves @ 2006-03-22 15:23 ` David Greaves 0 siblings, 0 replies; 131+ messages in thread From: David Greaves @ 2006-03-22 15:23 UTC (permalink / raw) To: Mark Lord Cc: Justin Piszcz, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds David Greaves wrote: >I've upgraded to 2.6.16 and applied your verbosity patches. > >I've persuaded my array to re-assemble and during the resync I got these >messages > >dmesg: >ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/04 to SCSI >SK/ASC/ASCQ 0xb/00/00 >ata1: status=0x51 { DriveReady SeekComplete Error } >ata1: error=0x04 { DriveStatusError } >...(18mins later) >ata1: no sense translation for op=0x28 cmd=0x25 status: 0x51 >ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/00 to SCSI >SK/ASC/ASCQ 0x3/11/04 >ata1: status=0x51 { DriveReady SeekComplete Error } > >smartd is not running >This did not cause the raid subsystem to boot the disk (thank goodness!) > > Just providing a little more followon information... I have had a further 52 of these messages over the last day. No obvious cause. 
Mar 22 13:14:55 haze kernel: ata2: no sense translation for op=0x28 cmd=0x25 status: 0x51 Mar 22 13:14:55 haze kernel: ata2: status=0x51 { DriveReady SeekComplete Error } Most recently this happened: Mar 22 13:47:09 haze kernel: ata2: no sense translation for op=0x28 cmd=0x25 status: 0x51 Mar 22 13:47:09 haze kernel: ata2: status=0x51 { DriveReady SeekComplete Error } Mar 22 13:47:09 haze kernel: sd 1:0:0:0: SCSI error: return code = 0x8000002 Mar 22 13:47:09 haze kernel: sdb: Current: sense key: Medium Error Mar 22 13:47:09 haze kernel: Additional sense: Unrecovered read error - auto reallocate failed Mar 22 13:47:09 haze kernel: end_request: I/O error, dev sdb, sector 396518289 with dmesg piping up with: raid1: sdb2: rescheduling sector 5801424 raid1: sdd2: redirecting sector 5801424 to another mirror no drives were kicked from the array. David -- ^ permalink raw reply [flat|nested] 131+ messages in thread
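[Editor's aside, not part of the original message.] With the verbosity patch applied, each "translated" line carries the SCSI opcode (`op=`), the ATA command (`cmd=`), the ATA status/error pair and the resulting SCSI sense triple. The Python below is a minimal sketch of pulling those fields out of a log line and naming the values that occur in this thread. The lookup tables are deliberately limited to codes seen here (0x28 READ(10), 0x85 ATA PASS-THROUGH(16), 0x25 READ DMA EXT, 0xb0 SMART, 0x35 WRITE DMA EXT, sense keys 0x3 and 0xb); `parse_translated` is a hypothetical helper, and the regular expression is an assumption based only on the lines quoted above.

```python
import re

# Matches both the original message format and the one with op=/cmd= added.
LINE_RE = re.compile(
    r"(?P<port>ata\d+):\s+translated\s+"
    r"(?:op=0x(?P<op>[0-9a-f]+)\s+)?"
    r"(?:cmd=0x(?P<cmd>[0-9a-f]+)\s+)?"
    r"ATA\s+stat/err\s+0x(?P<stat>[0-9a-f]+)/(?P<err>[0-9a-f]+)\s+"
    r"to\s+SCSI\s+SK/ASC/ASCQ\s+"
    r"0x(?P<sk>[0-9a-f]+)/(?P<asc>[0-9a-f]+)/(?P<ascq>[0-9a-f]+)"
)

# Only the codes that appear in this thread.
SCSI_OPS = {0x28: "READ(10)", 0x85: "ATA PASS-THROUGH(16)"}
ATA_CMDS = {0x25: "READ DMA EXT", 0x35: "WRITE DMA EXT", 0xB0: "SMART"}
SENSE_KEYS = {0x3: "MEDIUM ERROR", 0xB: "ABORTED COMMAND"}


def _hex(value):
    """Convert an optional hex string (no 0x prefix) to int or None."""
    return None if value is None else int(value, 16)


def parse_translated(line):
    """Parse one 'translated ...' log line; return a dict, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    op, cmd, sk = _hex(m["op"]), _hex(m["cmd"]), _hex(m["sk"])
    return {
        "port": m["port"],
        "scsi_op": SCSI_OPS.get(op, op),
        "ata_cmd": ATA_CMDS.get(cmd, cmd),
        "stat": _hex(m["stat"]),
        "err": _hex(m["err"]),
        "sense_key": SENSE_KEYS.get(sk, sk),
        "asc": _hex(m["asc"]),
        "ascq": _hex(m["ascq"]),
    }


if __name__ == "__main__":
    sample = ("ata1: translated op=0x28 cmd=0x25 ATA stat/err 0x51/00 "
              "to SCSI SK/ASC/ASCQ 0x3/11/04")
    print(parse_translated(sample))
```

For the resync errors above, this reads as an ordinary READ(10) issued as READ DMA EXT that came back as a medium error with ASC/ASCQ 0x11/0x04, which matches the "Unrecovered read error - auto reallocate failed" text in the syslog excerpt. The earlier SMART failures, by contrast, were op=0x85 cmd=0xb0 with an aborted-command sense key.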
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 19:49 ` David Greaves 2006-03-03 19:38 ` Justin Piszcz @ 2006-03-05 11:43 ` Justin Piszcz 2006-03-05 12:41 ` Justin Piszcz 1 sibling, 1 reply; 131+ messages in thread From: Justin Piszcz @ 2006-03-05 11:43 UTC (permalink / raw) To: David Greaves Cc: Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Wed, 1 Mar 2006, David Greaves wrote: > David Greaves wrote: > >> Mark Lord wrote: >> >> >> >>> By the way, the latest 2.6.16-rc5-git4 is available, >>> and has FUA turned off by default now. So it should >>> work with your drives, and *you* are expected to verify >>> that for us all now. >>> >>> >> Yeah, I know - I've got it on the machine... but it's my wife's machine. >> I've asked nicely but she's editing a Hercule Poirot video so I'm not >> allowed to reboot it for a while... >> >> I've told her I'm not making pancakes until I've tested it so expect a >> report Real Soon Now... >> >> > OK that worked (the pancakes - the kernel's not doing so well...) > > haze:~# uname -a > Linux haze 2.6.16-rc5-git4 #2 PREEMPT Wed Mar 1 19:07:58 UTC 2006 i686 > GNU/Linux > > The boot is pretty clean. > I ran an xfs_repair -n on the lvm volume and got the following errors. > The repair reported a clean filesystem and the drive was not booted from > the raid so that's a big improvement. 
> > I was not able to trigger similar messages on ata1 but a simple dd > doesn't trigger the messages on ata2 either (and for various reasons, > xfs_repair wouldn't run on ata1 - I thought I'd leave it and report this > first) > > ata2: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: error=0x04 { DriveStatusError } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > ata2: no sense translation for status: 0x51 > ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 > ata2: status=0x51 { DriveReady SeekComplete Error } > > David > > -- > Using 2.6.16-rc5-git4 and removing a directory of around 5.0GB of files while streaming a 1MB/s video stream on another (SATA disk), the I/O seemed to freeze up for a moment and I got this error: [4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22 Only 1 in dmesg, any idea what causes this error? ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-05 11:43 ` Justin Piszcz @ 2006-03-05 12:41 ` Justin Piszcz 2006-03-05 22:58 ` Mark Lord 0 siblings, 1 reply; 131+ messages in thread From: Justin Piszcz @ 2006-03-05 12:41 UTC (permalink / raw) To: David Greaves Cc: Mark Lord, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Sun, 5 Mar 2006, Justin Piszcz wrote: > On Wed, 1 Mar 2006, David Greaves wrote: > >> David Greaves wrote: >> >>> Mark Lord wrote: >>> >>> >>> >>>> By the way, the latest 2.6.16-rc5-git4 is available, >>>> and has FUA turned off by default now. So it should >>>> work with your drives, and *you* are expected to verify >>>> that for us all now. >>>> >>>> >>> Yeah, I know - I've got it on the machine... but it's my wife's machine. >>> I've asked nicely but she's editing a Hercule Poirot video so I'm not >>> allowed to reboot it for a while... >>> >>> I've told her I'm not making pancakes until I've tested it so expect a >>> report Real Soon Now... >>> >>> >> OK that worked (the pancakes - the kernel's not doing so well...) >> >> haze:~# uname -a >> Linux haze 2.6.16-rc5-git4 #2 PREEMPT Wed Mar 1 19:07:58 UTC 2006 i686 >> GNU/Linux >> >> The boot is pretty clean. >> I ran an xfs_repair -n on the lvm volume and got the following errors. >> The repair reported a clean filesystem and the drive was not booted from >> the raid so that's a big improvement. 
>> >> I was not able to trigger similar messages on ata1 but a simple dd >> doesn't trigger the messages on ata2 either (and for various reasons, >> xfs_repair wouldn't run on ata1 - I thought I'd leave it and report this >> first) >> >> ata2: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: error=0x04 { DriveStatusError } >> ata2: no sense translation for status: 0x51 >> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for status: 0x51 >> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for status: 0x51 >> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for status: 0x51 >> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for status: 0x51 >> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for status: 0x51 >> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for status: 0x51 >> ata2: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0x3/11/04 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> >> David >> >> -- >> > > Using 2.6.16-rc5-git4 and removing a directory of around 5.0GB of files while > streaming a 1MB/s video stream on another (SATA disk), the I/O seemed to > freeze up for a moment and I got this error: > > [4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22 > > Only 1 in dmesg, any idea what causes this error? 
>
>

The drive it occurred on was a 74GB raptor on an ICH5 controller.

[4294673.245000] Vendor: ATA Model: WDC WD740GD-00FL Rev: 33.0
0000:00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)

^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-05 12:41 ` Justin Piszcz
@ 2006-03-05 22:58 ` Mark Lord
2006-03-05 23:00 ` Mark Lord
2006-03-05 23:39 ` Jeff Garzik
0 siblings, 2 replies; 131+ messages in thread
From: Mark Lord @ 2006-03-05 22:58 UTC (permalink / raw)
To: Justin Piszcz
Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

Justin Piszcz wrote:
>
>> Using 2.6.16-rc5-git4 and removing a directory of around 5.0GB of
>> files while streaming a 1MB/s video stream on another (SATA disk), the
>> I/O seemed to freeze up for a moment and I got this error:
>>
>> [4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22
>>
>> Only 1 in dmesg, any idea what causes this error?
>
> The drive it occurred on was a 74GB raptor on an ICH5 controller.
>
> [4294673.245000] Vendor: ATA Model: WDC WD740GD-00FL Rev: 33.0
> 0000:00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA
> Controller (rev 02)

SCSI opcode 0x35 is SYNCHRONIZE_CACHE.

Pity we don't know exactly what that got translated to by libata.
It would have been either a FLUSH_CACHE of some kind,
or possibly(?) one of the _FUA_ commands.

Cheers

^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-05 22:58 ` Mark Lord
@ 2006-03-05 23:00 ` Mark Lord
2006-03-05 23:19 ` Justin Piszcz
2006-03-05 23:39 ` Jeff Garzik
1 sibling, 1 reply; 131+ messages in thread
From: Mark Lord @ 2006-03-05 23:00 UTC (permalink / raw)
To: Justin Piszcz
Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

Mark Lord wrote:
> Justin Piszcz wrote:
>>
>>> Using 2.6.16-rc5-git4 and removing a directory of around 5.0GB of
>>> files while streaming a 1MB/s video stream on another (SATA disk),
>>> the I/O seemed to freeze up for a moment and I got this error:
>>>
>>> [4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22
>>>
>>> Only 1 in dmesg, any idea what causes this error?
>>
>> The drive it occurred on was a 74GB raptor on an ICH5 controller.
>>
>> [4294673.245000] Vendor: ATA Model: WDC WD740GD-00FL Rev: 33.0
>> 0000:00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA
>> Controller (rev 02)
>
> SCSI opcode 0x35 is SYNCHRONIZE_CACHE.

Oh, wait a sec.. on that path, libata actually does show the ATA opcode,
which would have been WRITE_DMA_EXT. Not an FUA command.

Dunno what it's complaining about, though.

^ permalink raw reply [flat|nested] 131+ messages in thread
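The same byte value means different things depending on which layer printed it: 0x35 is SYNCHRONIZE CACHE as a SCSI opcode but WRITE DMA EXT as an ATA command. A minimal sketch of the two lookups, limited to opcodes that appear in this thread; the table and function names are illustrative and are not taken from libata.

```python
# Sketch: the SCSI opcodes ("op=") and ATA commands ("cmd=" / "command 0x..")
# seen in this thread. Only values quoted in the logs are listed.
SCSI_OPCODES = {
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
    0x35: "SYNCHRONIZE CACHE(10)",
    0x85: "ATA PASS-THROUGH(16)",
}

ATA_COMMANDS = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0x3D: "WRITE DMA FUA EXT",
    0xB0: "SMART",
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
}


def describe(op: int, cmd: int) -> str:
    """Name an 'op=0x.. cmd=0x..' pair (hypothetical helper)."""
    scsi = SCSI_OPCODES.get(op, f"opcode 0x{op:02x}")
    ata = ATA_COMMANDS.get(cmd, f"command 0x{cmd:02x}")
    return f"SCSI {scsi} -> ATA {ata}"


if __name__ == "__main__":
    print(describe(0x28, 0x25))   # the read failures reported during the resync
    print(describe(0x2A, 0x3D))   # the FUA write that the drives rejected
    print("command 0x35 timeout:", ATA_COMMANDS[0x35])
```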
* Re: LibPATA code issues / 2.6.15.4
2006-03-05 23:00 ` Mark Lord
@ 2006-03-05 23:19 ` Justin Piszcz
0 siblings, 0 replies; 131+ messages in thread
From: Justin Piszcz @ 2006-03-05 23:19 UTC (permalink / raw)
To: Mark Lord
Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

On Sun, 5 Mar 2006, Mark Lord wrote:

> Mark Lord wrote:
>> Justin Piszcz wrote:
>>>
>>>> Using 2.6.16-rc5-git4 and removing a directory of around 5.0GB of files
>>>> while streaming a 1MB/s video stream on another (SATA disk), the I/O
>>>> seemed to freeze up for a moment and I got this error:
>>>>
>>>> [4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22
>>>>
>>>> Only 1 in dmesg, any idea what causes this error?
>>>
>>> The drive it occurred on was a 74GB raptor on an ICH5 controller.
>>>
>>> [4294673.245000] Vendor: ATA Model: WDC WD740GD-00FL Rev: 33.0
>>> 0000:00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA
>>> Controller (rev 02)
>>
>> SCSI opcode 0x35 is SYNCHRONIZE_CACHE.
>
> Oh, wait a sec.. on that path, libata actually does show the ATA opcode,
> which would have been WRITE_DMA_EXT. Not an FUA command.
>
> Dunno what it's complaining about, though.
>

Well I know what it was now... The hard drive (RAPTOR/74GB) failed...

[4294685.928000] process `syslogd' is using obsolete setsockopt SO_BSDCOMPAT [4342671.839000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x22 [4347012.243000] ata1: command 0x25 timeout, stat 0x50 host_stat 0x20 [4347157.486000] ata1: command 0x25 timeout, stat 0x80 host_stat 0x22 [4347157.486000] ata1: translated ATA stat/err 0x80/00 to SCSI SK/ASC/ASCQ 0xb/4 7/00 [4347157.486000] ata1: status=0x80 { Busy } [4347157.486000] sd 0:0:0:0: SCSI error: return code = 0x8000002 [4347157.486000] sda: Current: sense key=0xb [4347157.486000] ASC=0x47 ASCQ=0x0 [4347157.486000] end_request: I/O error, dev sda, sector 27646928 [4347157.486000] Buffer I/O error on device sda, logical block 3455866 [4347157.486000] ATA: abnormal status 0x80 on port 0xC007 [4347157.486000] ATA: abnormal status 0x80 on port 0xC007 [4347157.486000] ATA: abnormal status 0x80 on port 0xC007 [4347187.486000] ata1: command 0x25 timeout, stat 0x50 host_stat 0x21 [4347407.657000] ATA: abnormal status 0x80 on port 0xC007 [4347407.657000] ATA: abnormal status 0x80 on port 0xC007 [4347407.657000] ATA: abnormal status 0x80 on port 0xC007 [4347437.656000] ata1: command 0x35 timeout, stat 0x80 host_stat 0x21 [4347437.656000] ata1: translated ATA stat/err 0x80/00 to SCSI SK/ASC/ASCQ 0xb/4 7/00 [4347437.656000] ata1: status=0x80 { Busy } [4347437.656000] sd 0:0:0:0: SCSI error: return code = 0x8000002 [4347437.656000] sda: Current: sense key=0xb [4347437.656000] ASC=0x47 ASCQ=0x0 [4347437.656000] end_request: I/O error, dev sda, sector 76339746 [4347437.656000] ATA: abnormal status 0x80 on port 0xC007 [4347437.656000] ATA: abnormal status 0x80 on port 0xC007 [4347437.656000] ATA: abnormal status 0x80 on port 0xC007 [4347467.656000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x21 [4347467.656000] Device sda2 - XFS write error in file system meta-data block 0x 449af90 in sda2 [4347467.656000] ata1: command 0x35 timeout, stat 0x50 host_stat 0x21 [4347467.656000] Device sda2 - XFS write error in file 
system meta-data block 0x 449af90 in sda2 [4347497.656000] ata1: command 0x25 timeout, stat 0x50 host_stat 0x21 [4347527.663000] ata1: command 0x25 timeout, stat 0x50 host_stat 0x22 [4347527.663000] Unable to handle kernel paging request at virtual address 858f9 a70 [4347527.663000] printing eip: [4347527.663000] c021ff87 [4347527.663000] *pde = 00000000 [4347527.663000] Oops: 0000 [#1] [4347527.663000] PREEMPT SMP [4347527.663000] CPU: 0 [4347527.663000] EIP: 0060:[<c021ff87>] Not tainted VLI [4347527.663000] EFLAGS: 00210282 (2.6.16-rc5-git4 #3) [4347527.663000] EIP is at xfs_dir2_block_lookup_int+0xb0/0x1e9 [4347527.663000] eax: 9b86a560 ebx: 00000000 ecx: cdc352b0 edx: 00000000 [4347527.663000] esi: 177504f0 edi: 5e5cb7f4 ebp: 00000000 esp: f6c8bd18 [4347527.663000] ds: 007b es: 007b ss: 0068 [4347527.663000] Process nfsd (pid: 1359, threadinfo=f6c8a000 task=f7c14030) [4347527.663000] Stack: <0>00000000 c91fa944 00000000 021a0480 00000000 f6c8bd64 00000000 f6c8bd84 [4347527.663000] f6c8bd88 f6c8bdac c73e7438 f6f916c0 00000004 f7dbc800 00 000000 f3aa2000 [4347527.663000] 61a5869b c91fa9ac f7db9380 c73e7438 00000000 c91fa944 f6 c8bdac 00000000 [4347527.663000] Call Trace: [4347527.663000] [<c02200da>] xfs_dir2_block_lookup+0x1a/0xa1 [4347527.663000] [<c021f721>] xfs_dir2_lookup+0xd3/0x151 [4347527.663000] [<c035e9d3>] ip_output+0x171/0x2de [4347527.663000] [<c035e1c9>] ip_finish_output+0x0/0x22d [4347527.663000] [<c024e836>] xfs_dir_lookup_int+0x40/0x125 [4347527.663000] [<c0150b0d>] cache_alloc_refill+0xf1/0x50c [4347527.663000] [<c0252b39>] xfs_lookup+0x5f/0x88 [4347527.663000] [<c02613cc>] linvfs_lookup+0x52/0x99 [4347527.663000] [<c0161563>] __lookup_hash+0xc4/0xf3 [4347527.663000] [<c016160f>] lookup_one_len+0x7d/0x84 [4347527.663000] [<c01ad6c7>] nfsd_lookup+0xc0/0x4b2 [4347527.663000] [<c01b4bcd>] nfsd3_proc_lookup+0xa5/0xf3 [4347527.663000] [<c01a9497>] nfsd_dispatch+0x9c/0x214 [4347527.663000] [<c039fb21>] svc_process+0x3bf/0x69e [4347527.663000] 
[<c01a97bc>] nfsd+0x1ad/0x331 [4347527.663000] [<c01a960f>] nfsd+0x0/0x331 [4347527.663000] [<c0100e95>] kernel_thread_helper+0x5/0xb [4347527.663000] Code: 89 44 24 40 89 c2 0f ca 8d 04 d5 00 00 00 00 29 c6 8d 42 ff 8b 4c 24 24 8b 79 14 31 d2 eb 07 8d 51 01 39 c2 7f 17 8d 0c 02 d1 f9 <8b> 1c ce 0f cb 39 df 74 2a 77 e9 8d 41 ff 39 c2 7e e9 8b 74 24 [4347527.663000] [4347527.663000] <4>ATA: abnormal status 0x80 on port 0xC007 [4347567.674000] ATA: abnormal status 0x80 on port 0xC007 [4347567.674000] ATA: abnormal status 0x80 on port 0xC007 [4347597.674000] ata1: command 0x35 timeout, stat 0x80 host_stat 0x21 [4347597.674000] ata1: translated ATA stat/err 0x80/00 to SCSI SK/ASC/ASCQ 0xb/4 7/00 [4347597.674000] ata1: status=0x80 { Busy } [4347597.674000] sd 0:0:0:0: SCSI error: return code = 0x8000002 [4347597.674000] sda: Current: sense key=0xb [4347597.674000] ASC=0x47 ASCQ=0x0 [4347597.674000] end_request: I/O error, dev sda, sector 4401810 [4347597.674000] ATA: abnormal status 0x80 on port 0xC007 [4347597.674000] ATA: abnormal status 0x80 on port 0xC007 [4347597.674000] ATA: abnormal status 0x80 on port 0xC007 [4347627.674000] ata1: command 0x35 timeout, stat 0x80 host_stat 0x21 [4347627.674000] ata1: translated ATA stat/err 0x80/00 to SCSI SK/ASC/ASCQ 0xb/4 7/00 [4347627.674000] ata1: status=0x80 { Busy } [4347627.674000] sd 0:0:0:0: SCSI error: return code = 0x8000002 [4347627.674000] sda: Current: sense key=0xb [4347627.674000] ASC=0x47 ASCQ=0x0 [4347627.674000] end_request: I/O error, dev sda, sector 110074018 [4347627.674000] ATA: abnormal status 0x80 on port 0xC007 [4347627.674000] ATA: abnormal status 0x80 on port 0xC007 [4347627.674000] ATA: abnormal status 0x80 on port 0xC007 .. 
ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } SCSI error : <0 0 0 0> return code = 0x8000002 sda: Current: sense key=0x3 ASC=0x11 ASCQ=0x4 end_request: I/O error, dev sda, sector 66006018 Buffer I/O error on device sda2, logical block 61604208 ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } SCSI error : <0 0 0 0> return code = 0x8000002 sda: Current: sense key=0x3 ASC=0x11 ASCQ=0x4 end_request: I/O error, dev sda, sector 66006019 Buffer I/O error on device sda2, logical block 61604209 ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } SCSI error : <0 0 0 0> return code = 0x8000002 sda: Current: sense key=0x3 ASC=0x11 ASCQ=0x4 end_request: I/O error, dev sda, sector 66006020 Buffer I/O error on device sda2, logical block 61604210 ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } SCSI error : <0 0 0 0> return code = 0x8000002 sda: Current: sense key=0x3 ASC=0x11 ASCQ=0x4 end_request: I/O error, dev sda, sector 66006021 Buffer I/O error on device sda2, logical block 61604211 ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } SCSI error : <0 0 0 0> return code = 0x8000002 sda: Current: sense key=0x3 ASC=0x11 ASCQ=0x4 end_request: I/O error, dev sda, sector 66006018 Buffer I/O error on device sda2, logical block 61604208 ata1: status=0x51 { DriveReady SeekComplete Error } ata1: error=0x40 { UncorrectableError } SCSI error : <0 0 0 0> return code = 0x8000002 sda: Current: sense key=0x3 ASC=0x11 
ASCQ=0x4
end_request: I/O error, dev sda, sector 66006019
..

I later ran mkfs.ext2 -c /dev/sda and it kept returning errors such as
these:

ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x40 { UncorrectableError }
ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x40 { UncorrectableError }
ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x40 { UncorrectableError }
ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x40 { UncorrectableError }
ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x40 { UncorrectableError }
SCSI error : <2 0 0 0> return code = 0x8000002
sda: Current: sense key=0x3
ASC=0x11 ASCQ=0x4
end_request: I/O error, dev sda, sector 66006016

I ran WD's tool on the drive and it confirmed it had problems. Luckily I
have a spare raptor, so I restored from backup and I am now back up and
running with no errors yet.

Justin.

^ permalink raw reply [flat|nested] 131+ messages in thread
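The "sector" and "logical block" numbers in the logs above describe the same locations in different units. A rough sketch of the arithmetic, assuming 512-byte sectors; the 4096-byte block size and the sda2 start offset of 4401810 sectors are inferred from the quoted numbers themselves and are not stated anywhere in the thread.

```python
# Sketch: relate "end_request ... sector N" to "Buffer I/O error ... logical block M".
SECTOR_SIZE = 512  # bytes; assumed, the usual unit for end_request sector numbers


def sector_to_block(sector: int, block_size: int, partition_start: int = 0) -> int:
    """Convert a whole-disk sector number to a logical block on a device.

    partition_start is the first sector of the partition (0 for the whole disk).
    """
    return (sector - partition_start) * SECTOR_SIZE // block_size


if __name__ == "__main__":
    # Whole-disk case: dev sda, sector 27646928 <-> sda logical block 3455866
    print(sector_to_block(27646928, block_size=4096))
    # Partition case: dev sda, sector 66006018 <-> sda2 logical block 61604208,
    # which fits a 512-byte block size and an inferred start sector of 4401810.
    print(sector_to_block(66006018, block_size=512, partition_start=4401810))
```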
* Re: LibPATA code issues / 2.6.15.4
2006-03-05 22:58 ` Mark Lord
2006-03-05 23:00 ` Mark Lord
@ 2006-03-05 23:39 ` Jeff Garzik
1 sibling, 0 replies; 131+ messages in thread
From: Jeff Garzik @ 2006-03-05 23:39 UTC (permalink / raw)
To: Mark Lord
Cc: Justin Piszcz, David Greaves, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

Mark Lord wrote:
> SCSI opcode 0x35 is SYNCHRONIZE_CACHE.
>
> Pity we don't know exactly what that got translated to by libata.

Gave up on reading code? If not, we know exactly what it was translated
into.

Jeff

^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-03-01 18:30 ` Mark Lord ` (2 preceding siblings ...) 2006-03-01 18:48 ` David Greaves @ 2006-03-01 19:06 ` Justin Piszcz 2006-03-01 19:28 ` Mark Lord 2006-03-01 19:35 ` Mark Lord 3 siblings, 2 replies; 131+ messages in thread From: Justin Piszcz @ 2006-03-01 19:06 UTC (permalink / raw) To: Mark Lord Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds On Wed, 1 Mar 2006, Mark Lord wrote: > David Greaves wrote: >> >> haze:/usr/src# smartctl -data -o off /dev/sdc >> succeeds but gives me: >> >> ata3: status=0x50 { DriveReady SeekComplete } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } >> ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> ata3: status=0x51 { DriveReady SeekComplete Error } >> ata3: error=0x04 { DriveStatusError } > > "DriveStatusError" is "Command Aborted" in ac-speak. > From the man page for smartctl, we read: > >> -o VALUE Enables or disables SMART automatic offline test ... >> Note that the SMART automatic offline test command is listed as "Obsolete" > in every >> version of the ATA and ATA/ATAPI Specifications. It was originally part > of the >> SFF-8035i Revision 2.0 specification, but was never part of any ATA > specification. > > There's a chance that your drives simply do not fully support this feature, > and are rejecting attempts to use it. > > By the way, the latest 2.6.16-rc5-git4 is available, > and has FUA turned off by default now. So it should > work with your drives, and *you* are expected to verify > that for us all now. 
> > Cheers > > -ml > By the way, the latest 2.6.16-rc5-git4 is available, I am using 2.6.16-rc5-git4, and after running: # smartctl -data -o off /dev/sdc I get: [4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error } [4294785.192000] ata3: error=0x04 { DriveStatusError } [4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error } [4294785.192000] ata3: error=0x04 { DriveStatusError } [4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error } [4294785.192000] ata3: error=0x04 { DriveStatusError } [4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error } [4294785.192000] ata3: error=0x04 { DriveStatusError } [4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00 [4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error } [4294785.192000] ata3: error=0x04 { DriveStatusError } Did you mean you wanted us to test it like we normally do, ie, copy files/md5sum them on the disk and see if we can make it occur again, or? Justin. ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-01 19:06 ` Justin Piszcz
@ 2006-03-01 19:28 ` Mark Lord
2006-03-01 19:35 ` Mark Lord
1 sibling, 0 replies; 131+ messages in thread
From: Mark Lord @ 2006-03-01 19:28 UTC (permalink / raw)
To: Justin Piszcz
Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

Justin Piszcz wrote:
>
> I am using 2.6.16-rc5-git4, and after running:
>
> # smartctl -data -o off /dev/sdc
>
> I get:
>
> [4294785.192000] ata3: translated ATA stat/err 0x51/04 to SCSI
> SK/ASC/ASCQ 0xb/00/00
> [4294785.192000] ata3: status=0x51 { DriveReady SeekComplete Error }
> [4294785.192000] ata3: error=0x04 { DriveStatusError }

That's probably just your drive reporting "unsupported sub-command".
Nothing serious -- the man page for smartctl even mentions the possibility.

Cheers

^ permalink raw reply [flat|nested] 131+ messages in thread
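The status and error bytes in these messages are plain bit fields, so the text in braces can be reproduced from the hex value. A minimal sketch, using the bit names as they appear in the logs in this thread; error-register bit 0x80 is left out because its name depends on context, and the function names are illustrative only.

```python
# Sketch: decode ATA status and error register bytes as printed in the logs.
STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

ERROR_BITS = [
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x04, "DriveStatusError"),   # ABRT: the drive aborted the command
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode_status(status: int) -> list[str]:
    """While Busy (0x80) is set the other status bits are not meaningful."""
    if status & 0x80:
        return ["Busy"]
    return [name for bit, name in STATUS_BITS if status & bit]


def decode_error(error: int) -> list[str]:
    return [name for bit, name in ERROR_BITS if error & bit]


if __name__ == "__main__":
    for value in (0x50, 0x51, 0x80, 0xD0):
        print(f"status=0x{value:02x}", "{", " ".join(decode_status(value)), "}")
    for value in (0x04, 0x10, 0x40):
        print(f"error=0x{value:02x}", "{", " ".join(decode_error(value)), "}")
```

So stat/err 0x51/04 is a ready drive that flagged an error and aborted the command, which fits the "unsupported sub-command" reading above, while 0x51/40 is an uncorrectable media error.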
* Re: LibPATA code issues / 2.6.15.4
2006-03-01 19:06 ` Justin Piszcz
2006-03-01 19:28 ` Mark Lord
@ 2006-03-01 19:35 ` Mark Lord
2006-03-01 19:38 ` Justin Piszcz
1 sibling, 1 reply; 131+ messages in thread
From: Mark Lord @ 2006-03-01 19:35 UTC (permalink / raw)
To: Justin Piszcz
Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

Justin Piszcz wrote:
>
> Did you mean you wanted us to test it like we normally do, ie, copy
> files/md5sum them on the disk and see if we can make it occur again, or?

Yes. The S.M.A.R.T. stuff doesn't matter nearly as much as normal I/O.

And Justin, can you get those S.M.A.R.T. errors to pop up on 2.6.15 as well?

^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-01 19:35 ` Mark Lord
@ 2006-03-01 19:38 ` Justin Piszcz
2006-03-01 19:41 ` Jeff Garzik
0 siblings, 1 reply; 131+ messages in thread
From: Justin Piszcz @ 2006-03-01 19:38 UTC (permalink / raw)
To: Mark Lord
Cc: David Greaves, Jeff Garzik, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

On Wed, 1 Mar 2006, Mark Lord wrote:

> Justin Piszcz wrote:
>>
>> Did you mean you wanted us to test it like we normally do, ie, copy
>> files/md5sum them on the disk and see if we can make it occur again, or?
>
> Yes. The S.M.A.R.T. stuff doesn't matter nearly as much as normal I/O.
>
> And Justin, can you get those S.M.A.R.T. errors to pop up on 2.6.15 as well?
>

Have not tested, can test later if necessary, running some I/O tests to
the disk which is probably going to take quite a while to see if I can get
it to error again with 2.6.16-rc5-git4.

Justin.

^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-03-01 19:38 ` Justin Piszcz
@ 2006-03-01 19:41 ` Jeff Garzik
0 siblings, 0 replies; 131+ messages in thread
From: Jeff Garzik @ 2006-03-01 19:41 UTC (permalink / raw)
To: Justin Piszcz
Cc: Mark Lord, David Greaves, Tejun Heo, linux-kernel, IDE/ATA development list, albertcc, axboe, Linus Torvalds

Justin Piszcz wrote:
>
>
> On Wed, 1 Mar 2006, Mark Lord wrote:
>
>> Justin Piszcz wrote:
>>
>>>
>>> Did you mean you wanted us to test it like we normally do, ie, copy
>>> files/md5sum them on the disk and see if we can make it occur again, or?
>>
>>
>> Yes. The S.M.A.R.T. stuff doesn't matter nearly as much as normal I/O.
>>
>> And Justin, can you get those S.M.A.R.T. errors to pop up on 2.6.15 as
>> well?
>>
>
> Have not tested, can test later if necessary, running some I/O tests to
> the disk which is probably going to take quite a while to see if I can
> get it to error again with 2.6.16-rc5-git4.

If there are FUA problems, it would be immediately apparent on the first
write...

Jeff

^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-02-26 2:27 ` Mark Lord 2006-02-26 9:56 ` David Greaves @ 2006-02-26 12:27 ` James Courtier-Dutton 2006-02-26 12:55 ` David Greaves 2006-02-26 13:56 ` Mark Lord 1 sibling, 2 replies; 131+ messages in thread From: James Courtier-Dutton @ 2006-02-26 12:27 UTC (permalink / raw) To: Mark Lord Cc: David Greaves, Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun Mark Lord wrote: > David Greaves wrote: >> >> Linux haze 2.6.16-rc4patched #1 PREEMPT Sat Feb 25 19:29:11 UTC 2006 >> i686 GNU/Linux >> >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: error=0x04 { DriveStatusError } >> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51 >> ata2: status=0x51 { DriveReady SeekComplete Error } >> sd 1:0:0:0: SCSI error: return code = 0x8000002 >> sdb: Current: sense key: Medium Error >> Additional sense: Unrecovered read error - auto reallocate failed >> end_request: I/O error, dev sdb, sector 398283329 >> raid1: Disk failure on sdb2, disabling device. >> Operation continuing on 1 devices > > Oh good, *now* we've gotten somewhere!! > > Albert / Jens / Jeff: > > The command failing above is SCSI WRITE_10, which is being > translated into ATA_CMD_WRITE_FUA_EXT by libata. > > This command fails -- unrecognized by the drive in question. > But libata reports it (most incorrectly) as a "medium error", > and the drive is taken out of service from its RAID. > > Bad, bad, and worse. > I have what looks like similar problems. The issue I have is that I don't think the problem is ONLY libata related. I have two linux PCs. 
One called "games", the other called "localhost". The problem happens quite quickly on the old "games" machine, but I can run for days/weeks until I see the problem on the "localhost". It might be happening on the "localhost", but I am just not noticing. The difference being that if reiserfs sees this error, it cannot recover, and I have reiserfs on the "games" machine. The "localhost" only uses ext3, and ext3 recovers gracefully from this problem. Can I use libata on this old "games" machine? It is an old Pentium 3 machine. In any case, The "games" machine is currently switched off until I can find a kernel that works, so I will happily test different kernels and patches, if people have suggestions. I have two desktop linux machines. One is an old Pentium 3 which shows the following errors(no libata involved): Linux version 2.6.15-rc4 (root@games) (gcc version 4.0.3 20051111 (prerelease) (Debian 4.0.2-4) ) #1 Sat Dec 3 18:47:19 GMT 2005 Dec 16 22:51:57 games kernel: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error } Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=53058185, sector=53057951 Dec 16 22:52:32 games kernel: ide: failed opcode was: unknown Dec 16 22:52:32 games kernel: end_request: I/O error, dev hdc, sector 53057951 Dec 16 22:52:32 games kernel: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error } Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=53058185, sector=53057959 Dec 16 22:52:32 games kernel: ide: failed opcode was: unknown The other has the following errors: Linux version 2.6.15.1 (root@localhost) (gcc version 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pi e-8.7.9)) #3 SMP PREEMPT Fri Feb 3 23:19:05 GMT 2006 Feb 10 23:30:07 localhost kernel: ata3: command 0xb0 timeout, stat 0xd0 host_stat 0x0 Feb 10 23:30:07 localhost kernel: ata3: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 Feb 10 23:30:07 localhost kernel: ata3: status=0xd0 { Busy } Feb 10 
23:30:07 localhost kernel: ATA: abnormal status 0xD0 on port 0xF880E087 Feb 10 23:30:07 localhost last message repeated 3 times Feb 10 23:30:10 localhost kernel: ata3: PIO error Feb 10 23:30:10 localhost kernel: ata3: status=0x50 { DriveReady SeekComplete } Feb 11 10:18:10 localhost kernel: ata2: command 0xb0 timeout, stat 0xd0 host_stat 0x0 Feb 11 10:18:10 localhost kernel: ata2: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 Feb 11 10:18:10 localhost kernel: ata2: status=0xd0 { Busy } Feb 11 10:18:10 localhost kernel: ATA: abnormal status 0xD0 on port 0x177 Feb 11 10:18:10 localhost last message repeated 3 times ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-02-26 12:27 ` James Courtier-Dutton
@ 2006-02-26 12:55 ` David Greaves
2006-02-26 13:56 ` Mark Lord
1 sibling, 0 replies; 131+ messages in thread
From: David Greaves @ 2006-02-26 12:55 UTC
To: James Courtier-Dutton
Cc: Mark Lord, Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun

James Courtier-Dutton wrote:
> I have two desktop linux machines. One is an old Pentium 3 which shows
> the following errors (no libata involved):
> Linux version 2.6.15-rc4 (root@games) (gcc version 4.0.3 20051111
> (prerelease) (Debian 4.0.2-4)
> ) #1 Sat Dec 3 18:47:19 GMT 2005
> Dec 16 22:51:57 games kernel: hdc: dma_intr: status=0x51 { DriveReady
> SeekComplete Error }
> Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x40 {
> UncorrectableError }, LBAsect=53058185, sector=53057951
> Dec 16 22:52:32 games kernel: ide: failed opcode was: unknown
> Dec 16 22:52:32 games kernel: end_request: I/O error, dev hdc, sector
> 53057951
> Dec 16 22:52:32 games kernel: hdc: dma_intr: status=0x51 { DriveReady
> SeekComplete Error }
> Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x10 {
> SectorIdNotFound }, LBAsect=53058185, sector=53057959
> Dec 16 22:52:32 games kernel: ide: failed opcode was: unknown

This looks like a simple bad disk drive. Notice that the sectors are
quite close.

If you like, you can move the drive to a working machine and run
badblocks on it. Do 'man badblocks' before you start.

Is it SMART capable? What does smartctl -a /dev/hdc show?

ddrescue may be your friend if you need to recover data. Reply offlist
if this is the case.

> The other has the following errors:
> Linux version 2.6.15.1 (root@localhost) (gcc version 3.4.5 (Gentoo
> 3.4.5, ssp-3.4.5-1.0, pi
> e-8.7.9)) #3 SMP PREEMPT Fri Feb 3 23:19:05 GMT 2006
> Feb 10 23:30:07 localhost kernel: ata3: command 0xb0 timeout, stat
> 0xd0 host_stat 0x0
> Feb 10 23:30:07 localhost kernel: ata3: translated ATA stat/err
> 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> Feb 10 23:30:07 localhost kernel: ata3: status=0xd0 { Busy }
> Feb 10 23:30:07 localhost kernel: ATA: abnormal status 0xD0 on port
> 0xF880E087
> Feb 10 23:30:07 localhost last message repeated 3 times
> Feb 10 23:30:10 localhost kernel: ata3: PIO error
> Feb 10 23:30:10 localhost kernel: ata3: status=0x50 { DriveReady
> SeekComplete }
> Feb 11 10:18:10 localhost kernel: ata2: command 0xb0 timeout, stat
> 0xd0 host_stat 0x0
> Feb 11 10:18:10 localhost kernel: ata2: translated ATA stat/err
> 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> Feb 11 10:18:10 localhost kernel: ata2: status=0xd0 { Busy }
> Feb 11 10:18:10 localhost kernel: ATA: abnormal status 0xD0 on port 0x177
> Feb 11 10:18:10 localhost last message repeated 3 times

Have you got smartd running?
I get a similar problem running some smartctl commands (-s on and -o on).
I suspect this is a libata ata passthru problem - but I'm *guessing* :)

Check the last messages in dmesg, then run:
smartctl -data -s on /dev/sd...
smartctl -data -o on /dev/sd...

See if there are new messages in dmesg.

David
* Re: LibPATA code issues / 2.6.15.4
2006-02-26 12:27 ` James Courtier-Dutton
2006-02-26 12:55 ` David Greaves
@ 2006-02-26 13:56 ` Mark Lord
1 sibling, 0 replies; 131+ messages in thread
From: Mark Lord @ 2006-02-26 13:56 UTC
To: James Courtier-Dutton
Cc: Mark Lord, David Greaves, Justin Piszcz, Jeff Garzik, linux-kernel, IDE/ATA development list, albertcc, axboe, htejun

James Courtier-Dutton wrote:
>
> I have what looks like similar problems. The issue I have is that I

Nope. Different issues.

> ) #1 Sat Dec 3 18:47:19 GMT 2005
> Dec 16 22:51:57 games kernel: hdc: dma_intr: status=0x51 { DriveReady
> SeekComplete Error }
> Dec 16 22:52:32 games kernel: hdc: dma_intr: error=0x40 {
> UncorrectableError }, LBAsect=53058185, sector=53057951

The disk really does have bad sectors in this case (above).

> The other has the following errors:
> Linux version 2.6.15.1 (root@localhost) (gcc version 3.4.5 (Gentoo
> 3.4.5, ssp-3.4.5-1.0, pi
> e-8.7.9)) #3 SMP PREEMPT Fri Feb 3 23:19:05 GMT 2006
> Feb 10 23:30:07 localhost kernel: ata3: command 0xb0 timeout, stat 0xd0
> host_stat 0x0
> Feb 10 23:30:07 localhost kernel: ata3: translated ATA stat/err 0xd0/00
> to SCSI SK/ASC/ASCQ 0xb/47/00
> Feb 10 23:30:07 localhost kernel: ata3: status=0xd0 { Busy }
> Feb 10 23:30:07 localhost kernel: ATA: abnormal status 0xD0 on port
> 0xF880E087
> Feb 10 23:30:07 localhost last message repeated 3 times
> Feb 10 23:30:10 localhost kernel: ata3: PIO error
> Feb 10 23:30:10 localhost kernel: ata3: status=0x50 { DriveReady
> SeekComplete }
> Feb 11 10:18:10 localhost kernel: ata2: command 0xb0 timeout, stat 0xd0
> host_stat 0x0
> Feb 11 10:18:10 localhost kernel: ata2: translated ATA stat/err 0xd0/00
> to SCSI SK/ASC/ASCQ 0xb/47/00
> Feb 11 10:18:10 localhost kernel: ata2: status=0xd0 { Busy }
> Feb 11 10:18:10 localhost kernel: ATA: abnormal status 0xD0 on port 0x177
> Feb 11 10:18:10 localhost last message repeated 3 times

PIO errors? Are you using Alan Cox's experimental PATA code for libata?

-ml
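[Reader's note] The status bytes quoted in the logs above (0x51, 0xd0,
0x50) are plain bit fields. What follows is a minimal sketch of the
decoding, assuming the classic ATA status register bit assignments
(BSY 0x80, DRDY 0x40, DF 0x20, DSC 0x10, DRQ 0x08, ERR 0x01). The names
decode_ata_status and append_word are made up for illustration and are
not kernel functions. Like the kernel messages in the logs, the sketch
reports only "Busy" when BSY is set.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Classic ATA status register bits (assumed here, not taken from the thread). */
#define ST_BSY  0x80  /* device busy; other bits are not meaningful */
#define ST_DRDY 0x40  /* device ready */
#define ST_DF   0x20  /* device fault */
#define ST_DSC  0x10  /* seek complete */
#define ST_DRQ  0x08  /* data request */
#define ST_ERR  0x01  /* error; see the error register for details */

/* Hypothetical helper: append a word without overflowing buf. */
static void append_word(char *buf, size_t len, const char *word)
{
    strncat(buf, word, len - strlen(buf) - 1);
}

/* Hypothetical helper: render stat in the "{ ... }" style seen in the logs. */
const char *decode_ata_status(unsigned char stat, char *buf, size_t len)
{
    assert(buf != NULL && len >= 4);
    snprintf(buf, len, "{ ");
    if (stat & ST_BSY) {
        append_word(buf, len, "Busy ");
    } else {
        if (stat & ST_DRDY) append_word(buf, len, "DriveReady ");
        if (stat & ST_DF)   append_word(buf, len, "DeviceFault ");
        if (stat & ST_DSC)  append_word(buf, len, "SeekComplete ");
        if (stat & ST_DRQ)  append_word(buf, len, "DataRequest ");
        if (stat & ST_ERR)  append_word(buf, len, "Error ");
    }
    append_word(buf, len, "}");
    return buf;
}
```

Under these assumptions 0x51 comes out as DriveReady, SeekComplete and
Error, while 0xd0 has BSY set, which is why the second machine's log
shows only "Busy".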
* Re: LibPATA code issues / 2.6.15.4
2006-02-14 14:50 ` Mark Lord
2006-02-14 16:27 ` David Greaves
@ 2006-02-14 23:58 ` Justin Piszcz
2006-02-17 8:45 ` Jeff Garzik
2 siblings, 0 replies; 131+ messages in thread
From: Justin Piszcz @ 2006-02-14 23:58 UTC
To: Mark Lord; +Cc: Jeff Garzik, linux-kernel, IDE/ATA development list

FYI:

Make a 100GB file, md5sum it, copy it to 'problem' drive and md5sum it,
same MD5SUMS.

box:/x8# /usr/bin/time dd if=/dev/zero of=100gb bs=1M count=100000 ; /usr/bin/time md5sum 100gb; /usr/bin/time cp 100gb /x4 ; cd /x4 ; /usr/bin/time md5sum 100gb
100000+0 records in
100000+0 records out
104857600000 bytes transferred in 4735.034107 seconds (22145057 bytes/sec)
0.29user 245.59system 1:18:55elapsed 5%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+210minor)pagefaults 0swaps
1e95cd44e2cb773f483ea7b2f676258d 100gb
248.24user 98.17system 32:50.97elapsed 17%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+188minor)pagefaults 0swaps
14.75user 341.92system 35:25.25elapsed 16%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (4major+183minor)pagefaults 0swaps
1e95cd44e2cb773f483ea7b2f676258d 100gb
246.95user 110.41system 28:06.49elapsed 21%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+190minor)pagefaults 0swaps
box:/x4#

Also, all SMART tests passed with flying colors.. (FYI)

On Tue, 14 Feb 2006, Mark Lord wrote:

> Justin Piszcz wrote:
> ..
>> ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> ata3: status=0x51 { DriveReady SeekComplete Error }
>> ata3: error=0x04 { DriveStatusError }
>
> I wonder if the FUA logic is inserting cache-flush commands
> and perhaps the drive is rejecting those?
>
> Jeff, we really ought to be including the failed ATA opcode
> in those error messages!!
>
> Cheers
>
* Re: LibPATA code issues / 2.6.15.4
2006-02-14 14:50 ` Mark Lord
2006-02-14 16:27 ` David Greaves
2006-02-14 23:58 ` Justin Piszcz
@ 2006-02-17 8:45 ` Jeff Garzik
2006-02-17 14:59 ` Mark Lord
2 siblings, 1 reply; 131+ messages in thread
From: Jeff Garzik @ 2006-02-17 8:45 UTC
To: Mark Lord; +Cc: Justin Piszcz, linux-kernel, IDE/ATA development list

Mark Lord wrote:
> Jeff, we really ought to be including the failed ATA opcode
> in those error messages!!

Submit a patch...

Jeff
* Re: LibPATA code issues / 2.6.15.4
2006-02-17 8:45 ` Jeff Garzik
@ 2006-02-17 14:59 ` Mark Lord
2006-02-17 15:00 ` Justin Piszcz
2006-02-18 20:43 ` Sander
0 siblings, 2 replies; 131+ messages in thread
From: Mark Lord @ 2006-02-17 14:59 UTC
To: Jeff Garzik; +Cc: Justin Piszcz, linux-kernel, IDE/ATA development list

On Friday 17 February 2006 03:45, Jeff Garzik wrote:
>Submit a patch...

You mean, something like this one?
Untested at present, as I was hoping to hear
back from one of the original problem reporters
after they tested it.

Cheers!

-------- Original Message --------
Subject: Re: LibPATA code issues / 2.6.15.4
Date: Tue, 14 Feb 2006 13:00:36 -0500
From: Mark Lord <lkml@rtr.ca>
To: Justin Piszcz <jpiszcz@lucidpixels.com>
CC: David Greaves <david@dgreaves.com>, Jeff Garzik <jgarzik@pobox.com>, linux-kernel@vger.kernel.org, IDE/ATA development list <linux-ide@vger.kernel.org>
References: <Pine.LNX.4.64.0602140439580.3567@p34> <43F2050B.8020006@dgreaves.com> <Pine.LNX.4.64.0602141211350.10793@p34>

On Tuesday 14 February 2006 12:12, Justin Piszcz wrote:
> I would like to try the patch too, if available.

Something like this: (for 2.6.16-rc3-git2, but should be okay on 2.6.15 also).

Untested: include the original SCSI opcode in printk's for libata SCSI errors,
to help understand where the errors are coming from.

Signed-Off-By: Mark Lord <mlord@pobox.com>

--- linux/drivers/scsi/libata-scsi.c.orig 2006-02-12 19:27:25.000000000 -0500
+++ linux/drivers/scsi/libata-scsi.c 2006-02-14 12:54:17.000000000 -0500
@@ -420,6 +420,7 @@
  *      @sk: the sense key we'll fill out
  *      @asc: the additional sense code we'll fill out
  *      @ascq: the additional sense code qualifier we'll fill out
+ *      @opcode: the original SCSI command opcode byte
  *
  *      Converts an ATA error into a SCSI error. Fill out pointers to
  *      SK, ASC, and ASCQ bytes for later use in fixed or descriptor
@@ -429,7 +430,7 @@
  *      spin_lock_irqsave(host_set lock)
  */
 void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc,
-                        u8 *ascq)
+                        u8 *ascq, u8 opcode)
 {
         int i;

@@ -508,8 +509,8 @@
                 }
         }
         /* No error? Undecoded? */
-        printk(KERN_WARNING "ata%u: no sense translation for status: 0x%02x\n",
-               id, drv_stat);
+        printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x status: 0x%02x\n",
+               id, opcode, drv_stat);

         /* For our last chance pick, use medium read error because
          * it's much more common than an ATA drive telling you a write
@@ -520,8 +521,8 @@
         *ascq = 0x04; /* "auto-reallocation failed" */

  translate_done:
-        printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to "
-               "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err,
+        printk(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to "
+               "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err,
                *sk, *asc, *ascq);
         return;
 }
@@ -562,7 +563,7 @@
          */
         if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
                 ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-                                   &sb[1], &sb[2], &sb[3]);
+                                   &sb[1], &sb[2], &sb[3], cmd->cmnd[0]);
                 sb[1] &= 0x0f;
         }

@@ -637,7 +638,7 @@
          */
         if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
                 ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
-                                   &sb[2], &sb[12], &sb[13]);
+                                   &sb[2], &sb[12], &sb[13], cmd->cmnd[0]);
                 sb[2] &= 0x0f;
         }
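[Reader's note] The two call sites in the patch above store SK/ASC/ASCQ
at different offsets: sb[1], sb[2], sb[3] in one function and sb[2],
sb[12], sb[13] in the other. That matches the two SCSI sense layouts
(descriptor and fixed format). A minimal sketch of reading the triple
back out of a sense buffer, assuming the usual response codes in byte 0
(0x70/0x71 for fixed, 0x72/0x73 for descriptor). The struct and function
names are hypothetical, not libata API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the three values printed in the logs. */
struct sense_triple {
    unsigned char sk;    /* sense key */
    unsigned char asc;   /* additional sense code */
    unsigned char ascq;  /* additional sense code qualifier */
};

/*
 * Hypothetical helper: extract SK/ASC/ASCQ from a sense buffer.
 * Returns 0 on success, -1 if the buffer is too short or the
 * response code is not one of the assumed values.
 */
int parse_sense(const unsigned char *sb, size_t len, struct sense_triple *out)
{
    assert(sb != NULL && out != NULL);
    if (len < 1)
        return -1;

    switch (sb[0] & 0x7f) {
    case 0x70:
    case 0x71:                      /* fixed format */
        if (len < 14)
            return -1;
        out->sk = sb[2] & 0x0f;
        out->asc = sb[12];
        out->ascq = sb[13];
        return 0;
    case 0x72:
    case 0x73:                      /* descriptor format */
        if (len < 4)
            return -1;
        out->sk = sb[1] & 0x0f;
        out->asc = sb[2];
        out->ascq = sb[3];
        return 0;
    default:
        return -1;
    }
}
```

The 0x0f mask mirrors the "sb[1] &= 0x0f" and "sb[2] &= 0x0f" lines in
the patch context.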
* Re: LibPATA code issues / 2.6.15.4
2006-02-17 14:59 ` Mark Lord
@ 2006-02-17 15:00 ` Justin Piszcz
2006-02-18 20:43 ` Sander
1 sibling, 0 replies; 131+ messages in thread
From: Justin Piszcz @ 2006-02-17 15:00 UTC
To: Mark Lord; +Cc: Jeff Garzik, linux-kernel, IDE/ATA development list

I have patched the kernel and rebooted it with your patch, but, of course,
with my luck it has not given me any errors since, even when repeating
major file copies, bonnie++ and iozone!! :(

On Fri, 17 Feb 2006, Mark Lord wrote:

> On Friday 17 February 2006 03:45, Jeff Garzik wrote:
>> Submit a patch...
>
> You mean, something like this one?
> Untested at present, as I was hoping to hear
> back from one of the original problem reporters
> after they tested it.
>
> Cheers!
>
>
>
> -------- Original Message --------
> Subject: Re: LibPATA code issues / 2.6.15.4
> Date: Tue, 14 Feb 2006 13:00:36 -0500
> From: Mark Lord <lkml@rtr.ca>
> To: Justin Piszcz <jpiszcz@lucidpixels.com>
> CC: David Greaves <david@dgreaves.com>, Jeff Garzik <jgarzik@pobox.com>,
> linux-kernel@vger.kernel.org, IDE/ATA development list
> <linux-ide@vger.kernel.org>
> References: <Pine.LNX.4.64.0602140439580.3567@p34>
> <43F2050B.8020006@dgreaves.com> <Pine.LNX.4.64.0602141211350.10793@p34>
>
> On Tuesday 14 February 2006 12:12, Justin Piszcz wrote:
>> I would like to try the patch too, if available.
>
> Something like this: (for 2.6.16-rc3-git2, but should be okay on 2.6.15
> also).
>
> Untested: include the original SCSI opcode in printk's for libata SCSI
> errors,
> to help understand where the errors are coming from.
>
> Signed-Off-By: Mark Lord <mlord@pobox.com>
>
> --- linux/drivers/scsi/libata-scsi.c.orig 2006-02-12 19:27:25.000000000 -0500
> +++ linux/drivers/scsi/libata-scsi.c 2006-02-14 12:54:17.000000000 -0500
> @@ -420,6 +420,7 @@
>   * @sk: the sense key we'll fill out
>   * @asc: the additional sense code we'll fill out
>   * @ascq: the additional sense code qualifier we'll fill out
> + * @opcode: the original SCSI command opcode byte
>   *
>   * Converts an ATA error into a SCSI error. Fill out pointers to
>   * SK, ASC, and ASCQ bytes for later use in fixed or descriptor
> @@ -429,7 +430,7 @@
>   * spin_lock_irqsave(host_set lock)
>   */
>  void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8
> *asc,
> - u8 *ascq)
> + u8 *ascq, u8 opcode)
>  {
>  int i;
>
> @@ -508,8 +509,8 @@
>  }
>  }
>  /* No error? Undecoded? */
> - printk(KERN_WARNING "ata%u: no sense translation for status: 0x%02x\n",
> - id, drv_stat);
> + printk(KERN_WARNING "ata%u: no sense translation for op=0x%02x status:
> 0x%02x\n",
> + id, opcode, drv_stat);
>
>  /* For our last chance pick, use medium read error because
>  * it's much more common than an ATA drive telling you a write
> @@ -520,8 +521,8 @@
>  *ascq = 0x04; /* "auto-reallocation failed" */
>
>  translate_done:
> - printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to "
> - "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err,
> + printk(KERN_ERR "ata%u: translated op=0x%02x ATA stat/err 0x%02x/%02x to "
> + "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, opcode, drv_stat, drv_err,
>  *sk, *asc, *ascq);
>  return;
>  }
> @@ -562,7 +563,7 @@
>  */
>  if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
>  ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
> - &sb[1], &sb[2], &sb[3]);
> + &sb[1], &sb[2], &sb[3], cmd->cmnd[0]);
>  sb[1] &= 0x0f;
>  }
>
> @@ -637,7 +638,7 @@
>  */
>  if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
>  ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
> - &sb[2], &sb[12], &sb[13]);
> + &sb[2], &sb[12], &sb[13], cmd->cmnd[0]);
>  sb[2] &= 0x0f;
>  }
>
* Re: LibPATA code issues / 2.6.15.4
2006-02-17 14:59 ` Mark Lord
2006-02-17 15:00 ` Justin Piszcz
@ 2006-02-18 20:43 ` Sander
2006-02-18 21:42 ` Mark Lord
1 sibling, 1 reply; 131+ messages in thread
From: Sander @ 2006-02-18 20:43 UTC
To: Mark Lord
Cc: Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list

Mark Lord wrote (ao):
> On Friday 17 February 2006 03:45, Jeff Garzik wrote:
> >Submit a patch...
>
> You mean, something like this one?
> Untested at present, as I was hoping to hear
> back from one of the original problem reporters
> after they tested it.

Not the original reporter, but your patch Works For Me. I get these:

[ 633.449961] md: md1: sync done.
[ 633.456070] RAID5 conf printout:
[ 633.456117] --- rd:9 wd:9 fd:0
[ 633.456164] disk 0, o:1, dev:sda2
[ 633.456208] disk 1, o:1, dev:sdb2
[ 633.456250] disk 2, o:1, dev:sdc2
[ 633.456298] disk 3, o:1, dev:sdd2
[ 633.456340] disk 4, o:1, dev:sde2
[ 633.456383] disk 5, o:1, dev:sdf2
[ 633.456427] disk 6, o:1, dev:sdg2
[ 633.456470] disk 7, o:1, dev:sdh2
[ 633.456514] disk 8, o:1, dev:sdi2
[ 787.639858] kjournald starting. Commit interval 5 seconds
[ 787.657991] EXT3 FS on md1, internal journal
[ 787.658023] EXT3-fs: mounted filesystem with writeback data mode.
[ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
[ 1872.338239] ata6: status=0xd0 { Busy }
[ 5749.285084] ata8: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
[ 5749.285138] ata8: status=0xd0 { Busy }
[ 5906.008461] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
[ 5906.008515] ata6: status=0xd0 { Busy }
[ 9892.904205] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
[ 9892.904259] ata6: status=0xd0 { Busy }
[10146.084687] ata5: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
[10146.084740] ata5: status=0xd0 { Busy }
[10293.949040] ata5: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
[10293.949093] ata5: status=0xd0 { Busy }

Can you tell from this what they mean?

This is with 2.6.16-rc3, your patch, and running nine Maxtors disks
over onboard nForce4 and MV88SX6081 8-port SATA II PCI-X Controller (rev 09).

for i in `seq 10`
do dd if=/dev/zero of=bigfile.$i bs=1024k count=10000
done
md5sum bigfile.*

The errors mostly seem to happen during the md5sum (not during the dd).

I do not see data corruption or slowdown.

I do need a chunksize of 512k for the raid5. With anything lower (I
tried the default 64k, 128k, 256k, 512k and 4096k) I get data corruption
and the errors reported in:
http://marc.theaimsgroup.com/?l=linux-ide&m=114016903530007&w=2

Thanks!

Sander

--
Humilis IT Services and Solutions
http://www.humilis.net
* Re: LibPATA code issues / 2.6.15.4
2006-02-18 20:43 ` Sander
@ 2006-02-18 21:42 ` Mark Lord
2006-02-18 21:51 ` Justin Piszcz
2006-02-19 7:14 ` Sander
0 siblings, 2 replies; 131+ messages in thread
From: Mark Lord @ 2006-02-18 21:42 UTC
To: sander; +Cc: Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list

Sander wrote:
> Mark Lord wrote (ao):
>> On Friday 17 February 2006 03:45, Jeff Garzik wrote:
>>> Submit a patch...
>> You mean, something like this one?
...
> [ 633.449961] md: md1: sync done.
> [ 633.456070] RAID5 conf printout:
> [ 633.456117] --- rd:9 wd:9 fd:0
...
> [ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> [ 1872.338239] ata6: status=0xd0 { Busy }
> [ 5749.285084] ata8: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> [ 5749.285138] ata8: status=0xd0 { Busy }
> [ 5906.008461] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> [ 5906.008515] ata6: status=0xd0 { Busy }
...
> This is with 2.6.16-rc3, your patch, and running nine Maxtors disks
> over onboard nForce4 and MV88SX6081 8-port SATA II PCI-X Controller (rev 09).
>
> for i in `seq 10`
> do dd if=/dev/zero of=bigfile.$i bs=1024k count=10000
> done
> md5sum bigfile.*
>
> The errors mostly seem to happen during the md5sum (not during the dd).

SCSI opcode 0x2a is WRITE_10, so the errors are being reported
in response to the writes to bigfile.$i. But these are different
from the previously reported error status values -- I wonder why
it's getting "Busy" back as a status here ??
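[Reader's note] The op=0x2a value printed by the patched kernel is the
first byte of the SCSI command block. A minimal sketch of a lookup for a
few common block-device opcodes follows. The opcode values are the
standard ones as far as I know, but the function name scsi_opcode_name
and the choice of entries are illustrative only, not a kernel interface.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: name a handful of common SCSI opcodes. */
const char *scsi_opcode_name(unsigned char op)
{
    switch (op) {
    case 0x00: return "TEST_UNIT_READY";
    case 0x08: return "READ_6";
    case 0x0a: return "WRITE_6";
    case 0x12: return "INQUIRY";
    case 0x28: return "READ_10";
    case 0x2a: return "WRITE_10";
    case 0x35: return "SYNCHRONIZE_CACHE";
    case 0x88: return "READ_16";
    case 0x8a: return "WRITE_16";
    default:   return "unknown";
    }
}

/* Sanity check against the opcode seen in the logs above. */
void check_opcode_from_logs(void)
{
    assert(strcmp(scsi_opcode_name(0x2a), "WRITE_10") == 0);
}
```

With a table like this, a read failing during the md5sum pass would be
expected to show up as op=0x28 rather than op=0x2a.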
* Re: LibPATA code issues / 2.6.15.4
2006-02-18 21:42 ` Mark Lord
@ 2006-02-18 21:51 ` Justin Piszcz
2006-02-19 7:14 ` Sander
1 sibling, 0 replies; 131+ messages in thread
From: Justin Piszcz @ 2006-02-18 21:51 UTC
To: Mark Lord; +Cc: sander, Jeff Garzik, linux-kernel, IDE/ATA development list

$ for i in `seq 10`
> do dd if=/dev/zero of=bigfile.$i bs=1024k count=10000
> done
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 190.997693 seconds (54899930 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 212.242724 seconds (49404568 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 189.324450 seconds (55385134 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 190.280352 seconds (55106898 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 191.567239 seconds (54736708 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 183.640928 seconds (57099254 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 179.974098 seconds (58262606 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 190.126087 seconds (55151611 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 192.227807 seconds (54548612 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 185.309607 seconds (56585086 bytes/sec)

war@p34:/x4$ md5sum bigfile.*
26f56024ac39cdc54b228820107f040d bigfile.1
26f56024ac39cdc54b228820107f040d bigfile.10
26f56024ac39cdc54b228820107f040d bigfile.2
26f56024ac39cdc54b228820107f040d bigfile.3
26f56024ac39cdc54b228820107f040d bigfile.4
26f56024ac39cdc54b228820107f040d bigfile.5
26f56024ac39cdc54b228820107f040d bigfile.6
26f56024ac39cdc54b228820107f040d bigfile.7
26f56024ac39cdc54b228820107f040d bigfile.8
26f56024ac39cdc54b228820107f040d bigfile.9

No errors in dmesg yet (for my issue).

On Sat, 18 Feb 2006, Mark Lord wrote:

> Sander wrote:
>> Mark Lord wrote (ao):
>>> On Friday 17 February 2006 03:45, Jeff Garzik wrote:
>>>> Submit a patch...
>>> You mean, something like this one?
> ...
>> [ 633.449961] md: md1: sync done.
>> [ 633.456070] RAID5 conf printout:
>> [ 633.456117] --- rd:9 wd:9 fd:0
> ...
>> [ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI
>> SK/ASC/ASCQ 0xb/47/00
>> [ 1872.338239] ata6: status=0xd0 { Busy }
>> [ 5749.285084] ata8: translated op=0x2a ATA stat/err 0xd0/00 to SCSI
>> SK/ASC/ASCQ 0xb/47/00
>> [ 5749.285138] ata8: status=0xd0 { Busy }
>> [ 5906.008461] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI
>> SK/ASC/ASCQ 0xb/47/00
>> [ 5906.008515] ata6: status=0xd0 { Busy }
> ...
>> This is with 2.6.16-rc3, your patch, and running nine Maxtors disks
>> over onboard nForce4 and MV88SX6081 8-port SATA II PCI-X Controller (rev
>> 09).
>>
>> for i in `seq 10`
>> do dd if=/dev/zero of=bigfile.$i bs=1024k count=10000
>> done
>> md5sum bigfile.*
>>
>> The errors mostly seem to happen during the md5sum (not during the dd).
>
> SCSI opcode 0x2a is WRITE_10, so the errors are being reported
> in response to the writes to bigfile.$i. But these are different
> from the previously reported error status values -- I wonder why
> it's getting "Busy" back as a status here ??
>
* Re: LibPATA code issues / 2.6.15.4
2006-02-18 21:42 ` Mark Lord
2006-02-18 21:51 ` Justin Piszcz
@ 2006-02-19 7:14 ` Sander
2006-02-19 15:30 ` Mark Lord
1 sibling, 1 reply; 131+ messages in thread
From: Sander @ 2006-02-19 7:14 UTC
To: Mark Lord
Cc: sander, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list

Mark Lord wrote (ao):
> Sander wrote:
> >Mark Lord wrote (ao):
> >>On Friday 17 February 2006 03:45, Jeff Garzik wrote:
> >>>Submit a patch...
> >>You mean, something like this one?
> ...
> >[ 633.449961] md: md1: sync done.
> >[ 633.456070] RAID5 conf printout:
> >[ 633.456117] --- rd:9 wd:9 fd:0
> ...
> >[ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI
> >SK/ASC/ASCQ 0xb/47/00
> >[ 1872.338239] ata6: status=0xd0 { Busy }
> >[ 5749.285084] ata8: translated op=0x2a ATA stat/err 0xd0/00 to SCSI
> >SK/ASC/ASCQ 0xb/47/00
> >[ 5749.285138] ata8: status=0xd0 { Busy }
> >[ 5906.008461] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI
> >SK/ASC/ASCQ 0xb/47/00
> >[ 5906.008515] ata6: status=0xd0 { Busy }
> ...
> >This is with 2.6.16-rc3, your patch, and running nine Maxtors disks
> >over onboard nForce4 and MV88SX6081 8-port SATA II PCI-X Controller (rev
> >09).
> >
> >for i in `seq 10`
> >do dd if=/dev/zero of=bigfile.$i bs=1024k count=10000
> >done
> >md5sum bigfile.*
> >
> >The errors mostly seem to happen during the md5sum (not during the dd).
>
> SCSI opcode 0x2a is WRITE_10, so the errors are being reported
> in response to the writes to bigfile.$i.

Ah, my bad then.

> But these are different from the previously reported error status
> values -- I wonder why it's getting "Busy" back as a status here ??

Well, as I wrote, I am not the original reporter whose thread you
responded to with your patch. I just thought I could use it to get
better error messages for my bug reports.

I am using the sata_mv driver, which is beta. That might explain why it
does not behave quite as you expect. I have no clue anyway :-)

I hope my reports are of some use to Jeff wrt the sata_mv driver.

Thank you for your response.

Sander

--
Humilis IT Services and Solutions
http://www.humilis.net
* Re: LibPATA code issues / 2.6.15.4
2006-02-19 7:14 ` Sander
@ 2006-02-19 15:30 ` Mark Lord
2006-02-19 17:16 ` Sander
0 siblings, 1 reply; 131+ messages in thread
From: Mark Lord @ 2006-02-19 15:30 UTC
To: sander; +Cc: Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list

Sander wrote:
> Mark Lord wrote (ao):
>> Sander wrote:
>>> Mark Lord wrote (ao):
>>>> On Friday 17 February 2006 03:45, Jeff Garzik wrote:
>>>>> Submit a patch...
>>>> You mean, something like this one?
>> ...
>>> [ 633.449961] md: md1: sync done.
>>> [ 633.456070] RAID5 conf printout:
>>> [ 633.456117] --- rd:9 wd:9 fd:0
>> ...
>>> [ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI
>>> SK/ASC/ASCQ 0xb/47/00
>>> [ 1872.338239] ata6: status=0xd0 { Busy }
>>> [ 5749.285084] ata8: translated op=0x2a ATA stat/err 0xd0/00 to SCSI
>>> SK/ASC/ASCQ 0xb/47/00
>>> [ 5749.285138] ata8: status=0xd0 { Busy }
>>> [ 5906.008461] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI
>>> SK/ASC/ASCQ 0xb/47/00
>>> [ 5906.008515] ata6: status=0xd0 { Busy }
...
>> SCSI opcode 0x2a is WRITE_10, so the errors are being reported
>> in response to the writes to bigfile.$i.
...
> I am using the sata_mv driver, which is beta. That might explain why it
> behaves not totally as expected in your eyes. I have no clue anyway :-)

Ahh.. that's useful to know.

I expect to be taking a long hard look at the innards of the sata_mv code
in the near future, so whatever is wrong here just might get fixed soon.

Cheers
* Re: LibPATA code issues / 2.6.15.4
2006-02-19 15:30 ` Mark Lord
@ 2006-02-19 17:16 ` Sander
2006-07-06 23:08 ` Justin Piszcz
0 siblings, 1 reply; 131+ messages in thread
From: Sander @ 2006-02-19 17:16 UTC
To: Mark Lord
Cc: sander, Jeff Garzik, Justin Piszcz, linux-kernel, IDE/ATA development list

Mark Lord wrote (ao):
> Sander wrote:
> >Mark Lord wrote (ao):
> >>Sander wrote:
> >>>Mark Lord wrote (ao):
> >>>>On Friday 17 February 2006 03:45, Jeff Garzik wrote:
> >>>>>Submit a patch...
> >>>>You mean, something like this one?
> >>...
> >>>[ 633.449961] md: md1: sync done.
> >>>[ 633.456070] RAID5 conf printout:
> >>>[ 633.456117] --- rd:9 wd:9 fd:0
> >>...
> >>>[ 1872.338185] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI
> >>>SK/ASC/ASCQ 0xb/47/00
> >>>[ 1872.338239] ata6: status=0xd0 { Busy }
> >>>[ 5749.285084] ata8: translated op=0x2a ATA stat/err 0xd0/00 to SCSI
> >>>SK/ASC/ASCQ 0xb/47/00
> >>>[ 5749.285138] ata8: status=0xd0 { Busy }
> >>>[ 5906.008461] ata6: translated op=0x2a ATA stat/err 0xd0/00 to SCSI
> >>>SK/ASC/ASCQ 0xb/47/00
> >>>[ 5906.008515] ata6: status=0xd0 { Busy }
> ...
> >>SCSI opcode 0x2a is WRITE_10, so the errors are being reported
> >>in response to the writes to bigfile.$i.
> ...
> >I am using the sata_mv driver, which is beta. That might explain why it
> >behaves not totally as expected in your eyes. I have no clue anyway :-)
>
> Ahh.. that's useful to know.

I'm sorry for omitting that information in my previous mail.

> I expect to be taking a long hard look at the innards of the sata_mv
> code in the near future, so whatever is wrong here just might get
> fixed soon.

Consider me your happy and willing patch test victim :-)

I can easily reproduce data corruption with sata_mv.

FWIW, I like this card very much. It is cheap, seems to perform well,
and Marvell seems to be Linux friendly, providing the docs (according to
http://linux-ata.org/sata-status.html#marvell).

I'm not subscribed to linux-ide, but am to linux-kernel. If you post it
there (or cc me) I'll see and try it.

Sander

--
Humilis IT Services and Solutions
http://www.humilis.net
* Re: LibPATA code issues / 2.6.15.4
2006-02-19 17:16 ` Sander
@ 2006-07-06 23:08 ` Justin Piszcz
2006-07-07 13:08 ` Mark Lord
0 siblings, 1 reply; 131+ messages in thread
From: Justin Piszcz @ 2006-07-06 23:08 UTC
To: Sander; +Cc: Mark Lord, Jeff Garzik, linux-kernel, IDE/ATA development list

Look at this:

From smartctl, look at the correspondence:
199 UDMA_CRC_Error_Count 0x000a 200 253 000 Old_age Always - 4

[4301946.802000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4301946.802000] ata4: status=0x51 { DriveReady SeekComplete Error }
[4301946.802000] ata4: error=0x04 { DriveStatusError }
[4302380.482000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[4302380.482000] ata4: status=0x51 { DriveReady SeekComplete Error }
[4302380.482000] ata4: error=0x04 { DriveStatusError }
[4302493.664000] ata4: no sense translation for status: 0x51
[4302493.664000] ata4: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0xb/00/00
[4302493.664000] ata4: status=0x51 { DriveReady SeekComplete Error }
[4302863.673000] ata4: no sense translation for status: 0x51
[4302863.673000] ata4: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0xb/00/00
[4302863.673000] ata4: status=0x51 { DriveReady SeekComplete Error }

Different drive, different cable, same controller, but second port.

So that Stat/err = UDMA_CRC_Error_Count!

Not sure if we can fix what is causing it (in Linux) but just FYI.
* Re: LibPATA code issues / 2.6.15.4
2006-07-06 23:08 ` Justin Piszcz
@ 2006-07-07 13:08 ` Mark Lord
2006-07-07 13:24 ` Justin Piszcz
0 siblings, 1 reply; 131+ messages in thread
From: Mark Lord @ 2006-07-07 13:08 UTC
To: Justin Piszcz, Sander; +Cc: Jeff Garzik, linux-kernel, IDE/ATA development list

Justin Piszcz wrote:
> Look at this:
>
>> From smartctl, look at the correspondence:
> 199 UDMA_CRC_Error_Count 0x000a 200 253 000 Old_age Always
> - 4
>
> [4301946.802000] ata4: translated ATA stat/err 0x51/04 to SCSI
> SK/ASC/ASCQ 0xb/00/00
> [4301946.802000] ata4: status=0x51 { DriveReady SeekComplete Error }
> [4301946.802000] ata4: error=0x04 { DriveStatusError }
> [4302380.482000] ata4: translated ATA stat/err 0x51/04 to SCSI
> SK/ASC/ASCQ 0xb/00/00
> [4302380.482000] ata4: status=0x51 { DriveReady SeekComplete Error }
> [4302380.482000] ata4: error=0x04 { DriveStatusError }
> [4302493.664000] ata4: no sense translation for status: 0x51
> [4302493.664000] ata4: translated ATA stat/err 0x51/00 to SCSI
> SK/ASC/ASCQ 0xb/00/00
> [4302493.664000] ata4: status=0x51 { DriveReady SeekComplete Error }
> [4302863.673000] ata4: no sense translation for status: 0x51
> [4302863.673000] ata4: translated ATA stat/err 0x51/00 to SCSI
> SK/ASC/ASCQ 0xb/00/00
> [4302863.673000] ata4: status=0x51 { DriveReady SeekComplete Error }
>
> different drive, different cable, same controller, but second port
>
> So that Stat/err = UDMA_CRC_Error_Count!

No, I don't think it is -- there's a bit in the drive status
for indicating CRC errors, and it is not showing up here.

I think it's still just libata sending some command that this
drive does not implement. You really need to dump out the failed
ATA opcode.

I *think* this (uncompiled, untested) patch may do it for you on 2.6.16/17:

--- linux/drivers/scsi/libata-scsi.c.orig 2006-06-19 10:37:03.000000000 -0400
+++ linux/drivers/scsi/libata-scsi.c 2006-07-07 09:06:57.000000000 -0400
@@ -542,6 +542,7 @@
         struct ata_taskfile *tf = &qc->tf;
         unsigned char *sb = cmd->sense_buffer;
         unsigned char *desc = sb + 8;
+        unsigned char ata_op = tf->command;

         memset(sb, 0, SCSI_SENSE_BUFFERSIZE);

@@ -558,6 +559,7 @@
          * onto sense key, asc & ascq.
          */
         if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
+                printk(KERN_WARN "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
                 ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
                                    &sb[1], &sb[2], &sb[3]);
                 sb[1] &= 0x0f;
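[Reader's note] On the CRC question above: as far as I understand the
ATA spec, an interface CRC error is flagged by its own bit in the error
register (ICRC, 0x80), whereas the logged error=0x04 is the ABRT bit
alone, which the kernel prints as DriveStatusError. A minimal sketch of
that distinction, under those assumed bit values. The function names are
hypothetical, and the 0x40 and 0x10 entries simply reuse the
UncorrectableError and SectorIdNotFound values seen earlier in the
thread.

```c
#include <assert.h>
#include <string.h>

/* ATA error register bits (assumed from the ATA spec and the logs above). */
#define ERR_ICRC 0x80  /* interface CRC error during UDMA transfer */
#define ERR_UNC  0x40  /* uncorrectable data error */
#define ERR_IDNF 0x10  /* sector ID not found */
#define ERR_ABRT 0x04  /* command aborted, e.g. unsupported command */

/* Hypothetical helper: non-zero if the CRC bit is set. */
int ata_err_is_crc(unsigned char err)
{
    return (err & ERR_ICRC) != 0;
}

/* Hypothetical helper: one-line summary, most specific cause first. */
const char *ata_err_summary(unsigned char err)
{
    if (err & ERR_ICRC) return "interface CRC error";
    if (err & ERR_UNC)  return "uncorrectable media error";
    if (err & ERR_IDNF) return "sector ID not found";
    if (err & ERR_ABRT) return "command aborted";
    if (err == 0)       return "no error bits set";
    return "other";
}

/* Sanity check against the error byte from the logs above. */
void check_error_from_logs(void)
{
    assert(!ata_err_is_crc(0x04));
    assert(strcmp(ata_err_summary(0x04), "command aborted") == 0);
}
```

Read this way, 0x04 points at a rejected command rather than a cabling
problem, which is consistent with the suggestion to dump the failed ATA
opcode.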
* Re: LibPATA code issues / 2.6.15.4 2006-07-07 13:08 ` Mark Lord @ 2006-07-07 13:24 ` Justin Piszcz 2006-07-07 13:43 ` Mark Lord 0 siblings, 1 reply; 131+ messages in thread From: Justin Piszcz @ 2006-07-07 13:24 UTC (permalink / raw) To: Mark Lord; +Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list On Fri, 7 Jul 2006, Mark Lord wrote: > Justin Piszcz wrote: >> Look at this: >> >>> From smartctl, look at the correspondence: >> 199 UDMA_CRC_Error_Count 0x000a 200 253 000 Old_age Always >> - 4 >> >> [4301946.802000] ata4: translated ATA stat/err 0x51/04 to SCSI >> SK/ASC/ASCQ 0xb/00/00 >> [4301946.802000] ata4: status=0x51 { DriveReady SeekComplete Error } >> [4301946.802000] ata4: error=0x04 { DriveStatusError } >> [4302380.482000] ata4: translated ATA stat/err 0x51/04 to SCSI >> SK/ASC/ASCQ 0xb/00/00 >> [4302380.482000] ata4: status=0x51 { DriveReady SeekComplete Error } >> [4302380.482000] ata4: error=0x04 { DriveStatusError } >> [4302493.664000] ata4: no sense translation for status: 0x51 >> [4302493.664000] ata4: translated ATA stat/err 0x51/00 to SCSI >> SK/ASC/ASCQ 0xb/00/00 >> [4302493.664000] ata4: status=0x51 { DriveReady SeekComplete Error } >> [4302863.673000] ata4: no sense translation for status: 0x51 >> [4302863.673000] ata4: translated ATA stat/err 0x51/00 to SCSI >> SK/ASC/ASCQ 0xb/00/00 >> [4302863.673000] ata4: status=0x51 { DriveReady SeekComplete Error } >> >> different drive, different cable, same controller, but second port >> >> So that Stat/err = UDMA_CRC_Error_Count! > > No, I don't think it is -- there's a bit in the drive status > for indicating CRC errors, and it is not showing up here. > > I think it's still just libata sending some command that this > drive does not implement. You really need to dump out the failed > ATA opcode. 
>
> I *think* this (uncompiled, untested) patch may do it for you on 2.6.16/17:
>
> --- linux/drivers/scsi/libata-scsi.c.orig	2006-06-19 10:37:03.000000000 -0400
> +++ linux/drivers/scsi/libata-scsi.c	2006-07-07 09:06:57.000000000 -0400
> @@ -542,6 +542,7 @@
>  	struct ata_taskfile *tf = &qc->tf;
>  	unsigned char *sb = cmd->sense_buffer;
>  	unsigned char *desc = sb + 8;
> +	unsigned char ata_op = tf->command;
>  
>  	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
>  
> @@ -558,6 +559,7 @@
>  	 * onto sense key, asc & ascq.
>  	 */
>  	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
> +		printk(KERN_WARN "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
>  		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
>  				   &sb[1], &sb[2], &sb[3]);
>  		sb[1] &= 0x0f;
>

had to change

KERN_WARN -> KERN_WARNING

then more errors

the patch never worked for me even when I had gotten it to work in
2.6.15.4, it never showed me what I wanted to see

drivers/scsi/libata-scsi.c: In function 'ata_gen_fixed_sense':
drivers/scsi/libata-scsi.c:638: error: 'ata_op' undeclared (first use in this function)
drivers/scsi/libata-scsi.c:638: error: (Each undeclared identifier is reported only once
drivers/scsi/libata-scsi.c:638: error: for each function it appears in.)
make[2]: *** [drivers/scsi/libata-scsi.o] Error 1

do you know who wrote the original patch?

^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4
2006-07-07 13:24 ` Justin Piszcz
@ 2006-07-07 13:43 ` Mark Lord
2006-07-07 13:48 ` Justin Piszcz
` (2 more replies)
0 siblings, 3 replies; 131+ messages in thread
From: Mark Lord @ 2006-07-07 13:43 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list

Justin Piszcz wrote:
>
> had to change
>
> KERN_WARN -> KERN_WARNING
>
> then more errors

Eh? After fixing the KERN_WARN -> KERN_WARNING part,
the patch compiles / links cleanly here on 2.6.17.
(fixed copy below). Still untested, though.

> do you know who wrote the original patch?

I did.

Cheers

--- linux/drivers/scsi/libata-scsi.c.orig	2006-06-19 10:37:03.000000000 -0400
+++ linux/drivers/scsi/libata-scsi.c	2006-07-07 09:06:57.000000000 -0400
@@ -542,6 +542,7 @@
 	struct ata_taskfile *tf = &qc->tf;
 	unsigned char *sb = cmd->sense_buffer;
 	unsigned char *desc = sb + 8;
+	unsigned char ata_op = tf->command;
 
 	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
 
@@ -558,6 +559,7 @@
 	 * onto sense key, asc & ascq.
 	 */
 	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
+		printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
 		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
 				   &sb[1], &sb[2], &sb[3]);
 		sb[1] &= 0x0f;

^ permalink raw reply [flat|nested] 131+ messages in thread
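The patch saves ata_op when the local variables are declared, ahead of the status check. A small user-space sketch of why that ordering matters, assuming (as the `tf->command & (ATA_BUSY | ...)` test in the patch suggests) that by the time of the check tf->command holds the status register rather than the opcode that was issued. struct taskfile, struct failure, fake_tf_read() and capture_failure() are hypothetical stand-ins for illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct ata_taskfile. */
struct taskfile {
	unsigned char command;	/* opcode when issued, status after read-back */
	unsigned char feature;	/* feature when issued, error after read-back */
};

struct failure {
	unsigned char ata_op;	/* opcode that was sent to the drive */
	unsigned char status;	/* status register after the failure */
	unsigned char error;	/* error register after the failure */
};

/* Stand-in for a register read-back that overwrites the taskfile. */
static void fake_tf_read(struct taskfile *tf, unsigned char status,
			 unsigned char error)
{
	tf->command = status;
	tf->feature = error;
}

static struct failure capture_failure(struct taskfile *tf,
				      unsigned char status,
				      unsigned char error)
{
	struct failure f;

	assert(tf != NULL);
	f.ata_op = tf->command;		/* save the opcode first */
	fake_tf_read(tf, status, error);	/* command/feature now hold stat/err */
	f.status = tf->command;
	f.error = tf->feature;
	return f;
}
```

If the copy were taken after the read-back instead, the sketch would report the status byte (0x51 here) as the opcode.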
* Re: LibPATA code issues / 2.6.15.4 2006-07-07 13:43 ` Mark Lord @ 2006-07-07 13:48 ` Justin Piszcz 2006-07-07 14:01 ` Justin Piszcz 2006-07-07 14:35 ` Justin Piszcz 2 siblings, 0 replies; 131+ messages in thread From: Justin Piszcz @ 2006-07-07 13:48 UTC (permalink / raw) To: Mark Lord; +Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list On Fri, 7 Jul 2006, Mark Lord wrote: > Justin Piszcz wrote: >> >> had to change >> >> KERN_WARN -> KERN_WARNING >> >> then more errors > > Eh? After fixing the KERN_WARN -> KERN_WARNING part, > the patch compiles / links cleanly here on 2.6.17. > (fixed copy below). Still untested, though. > >> do you know who wrote the original patch? > > I did. > > Cheers > > --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-19 10:37:03.000000000 -0400 > +++ linux/drivers/scsi/libata-scsi.c 2006-07-07 09:06:57.000000000 -0400 > @@ -542,6 +542,7 @@ > struct ata_taskfile *tf = &qc->tf; > unsigned char *sb = cmd->sense_buffer; > unsigned char *desc = sb + 8; > + unsigned char ata_op = tf->command; > > memset(sb, 0, SCSI_SENSE_BUFFERSIZE); > > @@ -558,6 +559,7 @@ > * onto sense key, asc & ascq. > */ > if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { > + printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op); > ata_to_sense_error(qc->ap->id, tf->command, tf->feature, > &sb[1], &sb[2], &sb[3]); > sb[1] &= 0x0f; > Applied patch, rebooting, waiting to get the error again. ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-07-07 13:43 ` Mark Lord 2006-07-07 13:48 ` Justin Piszcz @ 2006-07-07 14:01 ` Justin Piszcz 2006-07-07 14:35 ` Justin Piszcz 2 siblings, 0 replies; 131+ messages in thread From: Justin Piszcz @ 2006-07-07 14:01 UTC (permalink / raw) To: Mark Lord; +Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list On Fri, 7 Jul 2006, Mark Lord wrote: > Justin Piszcz wrote: >> >> had to change >> >> KERN_WARN -> KERN_WARNING >> >> then more errors > > Eh? After fixing the KERN_WARN -> KERN_WARNING part, > the patch compiles / links cleanly here on 2.6.17. > (fixed copy below). Still untested, though. > >> do you know who wrote the original patch? > > I did. > > Cheers > > --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-19 10:37:03.000000000 -0400 > +++ linux/drivers/scsi/libata-scsi.c 2006-07-07 09:06:57.000000000 -0400 > @@ -542,6 +542,7 @@ > struct ata_taskfile *tf = &qc->tf; > unsigned char *sb = cmd->sense_buffer; > unsigned char *desc = sb + 8; > + unsigned char ata_op = tf->command; > > memset(sb, 0, SCSI_SENSE_BUFFERSIZE); > > @@ -558,6 +559,7 @@ > * onto sense key, asc & ascq. > */ > if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { > + printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op); > ata_to_sense_error(qc->ap->id, tf->command, tf->feature, > &sb[1], &sb[2], &sb[3]); > sb[1] &= 0x0f; > Mark, I've set a disk faulty in my SW RAID5 and rebuilding it now, note, in the past two rebuilds I have done (in exact same manner & disk) I've gotten 3-4 of these or so, so if I do not get them this time, that will be extremely odd. Justin. ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-07-07 13:43 ` Mark Lord 2006-07-07 13:48 ` Justin Piszcz 2006-07-07 14:01 ` Justin Piszcz @ 2006-07-07 14:35 ` Justin Piszcz 2006-07-07 18:53 ` Justin Piszcz 2 siblings, 1 reply; 131+ messages in thread From: Justin Piszcz @ 2006-07-07 14:35 UTC (permalink / raw) To: Mark Lord; +Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list On Fri, 7 Jul 2006, Mark Lord wrote: > Justin Piszcz wrote: >> >> had to change >> >> KERN_WARN -> KERN_WARNING >> >> then more errors > > Eh? After fixing the KERN_WARN -> KERN_WARNING part, > the patch compiles / links cleanly here on 2.6.17. > (fixed copy below). Still untested, though. > >> do you know who wrote the original patch? > > I did. > > Cheers > > --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-19 10:37:03.000000000 -0400 > +++ linux/drivers/scsi/libata-scsi.c 2006-07-07 09:06:57.000000000 -0400 > @@ -542,6 +542,7 @@ > struct ata_taskfile *tf = &qc->tf; > unsigned char *sb = cmd->sense_buffer; > unsigned char *desc = sb + 8; > + unsigned char ata_op = tf->command; > > memset(sb, 0, SCSI_SENSE_BUFFERSIZE); > > @@ -558,6 +559,7 @@ > * onto sense key, asc & ascq. > */ > if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { > + printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op); > ata_to_sense_error(qc->ap->id, tf->command, tf->feature, > &sb[1], &sb[2], &sb[3]); > sb[1] &= 0x0f; > Mark!! 
It did it again, here you go:

==> /p34/var/log/messages <==
Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { DriveReady SeekComplete Index Error }
Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { DriveStatusError }

==> /p34/var/log/kern.log <==
Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: translated ATA stat/err 0x53/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { DriveReady SeekComplete Index Error }
Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { DriveStatusError }

Does this help?

Can we eliminate the cause of these errors now?

^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-07-07 14:35 ` Justin Piszcz @ 2006-07-07 18:53 ` Justin Piszcz 2006-07-07 19:19 ` Jeff Garzik 0 siblings, 1 reply; 131+ messages in thread From: Justin Piszcz @ 2006-07-07 18:53 UTC (permalink / raw) To: Mark Lord Cc: Sander, Jeff Garzik, linux-kernel, IDE/ATA development list, Alan Cox On Fri, 7 Jul 2006, Justin Piszcz wrote: > > > On Fri, 7 Jul 2006, Mark Lord wrote: > >> Justin Piszcz wrote: >>> >>> had to change >>> >>> KERN_WARN -> KERN_WARNING >>> >>> then more errors >> >> Eh? After fixing the KERN_WARN -> KERN_WARNING part, >> the patch compiles / links cleanly here on 2.6.17. >> (fixed copy below). Still untested, though. >> >>> do you know who wrote the original patch? >> >> I did. >> >> Cheers >> >> --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-19 10:37:03.000000000 >> -0400 >> +++ linux/drivers/scsi/libata-scsi.c 2006-07-07 09:06:57.000000000 -0400 >> @@ -542,6 +542,7 @@ >> struct ata_taskfile *tf = &qc->tf; >> unsigned char *sb = cmd->sense_buffer; >> unsigned char *desc = sb + 8; >> + unsigned char ata_op = tf->command; >> >> memset(sb, 0, SCSI_SENSE_BUFFERSIZE); >> >> @@ -558,6 +559,7 @@ >> * onto sense key, asc & ascq. >> */ >> if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { >> + printk(KERN_WARNING "ata_gen_ata_desc_sense: failed >> ata_op=0x%02x\n", ata_op); >> ata_to_sense_error(qc->ap->id, tf->command, tf->feature, >> &sb[1], &sb[2], &sb[3]); >> sb[1] &= 0x0f; >> > > Mark!! 
It did it again, here you go: > > ==> /p34/var/log/messages <== > Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { DriveReady > SeekComplete Index Error } > Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { > DriveStatusError } > ==> /p34/var/log/kern.log <== > Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: translated ATA stat/err > 0x53/04 to SCSI SK/ASC/ASCQ 0xb/00/00 > Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { DriveReady > SeekComplete Index Error } > Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { > DriveStatusError } > > Does this help? > > Can we eliminate the cause of these errors now? > > Jeff or Alan, Does that ATA translation help in determining what *bad* commands are being sent to the drive? This occurs on two separate identical disks. Justin. ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-07-07 18:53 ` Justin Piszcz @ 2006-07-07 19:19 ` Jeff Garzik 2006-07-07 19:28 ` Justin Piszcz 0 siblings, 1 reply; 131+ messages in thread From: Jeff Garzik @ 2006-07-07 19:19 UTC (permalink / raw) To: Justin Piszcz Cc: Mark Lord, Sander, linux-kernel, IDE/ATA development list, Alan Cox Justin Piszcz wrote: > > > On Fri, 7 Jul 2006, Justin Piszcz wrote: > >> >> >> On Fri, 7 Jul 2006, Mark Lord wrote: >> >>> Justin Piszcz wrote: >>>> >>>> had to change >>>> >>>> KERN_WARN -> KERN_WARNING >>>> >>>> then more errors >>> >>> Eh? After fixing the KERN_WARN -> KERN_WARNING part, >>> the patch compiles / links cleanly here on 2.6.17. >>> (fixed copy below). Still untested, though. >>> >>>> do you know who wrote the original patch? >>> >>> I did. >>> >>> Cheers >>> >>> --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-19 >>> 10:37:03.000000000 -0400 >>> +++ linux/drivers/scsi/libata-scsi.c 2006-07-07 09:06:57.000000000 >>> -0400 >>> @@ -542,6 +542,7 @@ >>> struct ata_taskfile *tf = &qc->tf; >>> unsigned char *sb = cmd->sense_buffer; >>> unsigned char *desc = sb + 8; >>> + unsigned char ata_op = tf->command; >>> >>> memset(sb, 0, SCSI_SENSE_BUFFERSIZE); >>> >>> @@ -558,6 +559,7 @@ >>> * onto sense key, asc & ascq. >>> */ >>> if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { >>> + printk(KERN_WARNING "ata_gen_ata_desc_sense: failed >>> ata_op=0x%02x\n", ata_op); >>> ata_to_sense_error(qc->ap->id, tf->command, tf->feature, >>> &sb[1], &sb[2], &sb[3]); >>> sb[1] &= 0x0f; >>> >> >> Mark!! 
It did it again, here you go: >> >> ==> /p34/var/log/messages <== >> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { >> DriveReady SeekComplete Index Error } >> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { >> DriveStatusError } >> ==> /p34/var/log/kern.log <== >> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: translated ATA >> stat/err 0x53/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { >> DriveReady SeekComplete Index Error } >> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { >> DriveStatusError } >> >> Does this help? >> >> Can we eliminate the cause of these errors now? >> >> > > Jeff or Alan, > > Does that ATA translation help in determining what *bad* commands are > being sent to the drive? No, it needs the patch that Mark has been posting... Jeff ^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-07-07 19:19 ` Jeff Garzik @ 2006-07-07 19:28 ` Justin Piszcz [not found] ` <200607091224.31451.liml@rtr.ca> 0 siblings, 1 reply; 131+ messages in thread From: Justin Piszcz @ 2006-07-07 19:28 UTC (permalink / raw) To: Jeff Garzik Cc: Mark Lord, Sander, linux-kernel, IDE/ATA development list, Alan Cox On Fri, 7 Jul 2006, Jeff Garzik wrote: > Justin Piszcz wrote: >> >> >> On Fri, 7 Jul 2006, Justin Piszcz wrote: >> >>> >>> >>> On Fri, 7 Jul 2006, Mark Lord wrote: >>> >>>> Justin Piszcz wrote: >>>>> >>>>> had to change >>>>> >>>>> KERN_WARN -> KERN_WARNING >>>>> >>>>> then more errors >>>> >>>> Eh? After fixing the KERN_WARN -> KERN_WARNING part, >>>> the patch compiles / links cleanly here on 2.6.17. >>>> (fixed copy below). Still untested, though. >>>> >>>>> do you know who wrote the original patch? >>>> >>>> I did. >>>> >>>> Cheers >>>> >>>> --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-19 >>>> 10:37:03.000000000 -0400 >>>> +++ linux/drivers/scsi/libata-scsi.c 2006-07-07 09:06:57.000000000 >>>> -0400 >>>> @@ -542,6 +542,7 @@ >>>> struct ata_taskfile *tf = &qc->tf; >>>> unsigned char *sb = cmd->sense_buffer; >>>> unsigned char *desc = sb + 8; >>>> + unsigned char ata_op = tf->command; >>>> >>>> memset(sb, 0, SCSI_SENSE_BUFFERSIZE); >>>> >>>> @@ -558,6 +559,7 @@ >>>> * onto sense key, asc & ascq. >>>> */ >>>> if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { >>>> + printk(KERN_WARNING "ata_gen_ata_desc_sense: failed >>>> ata_op=0x%02x\n", ata_op); >>>> ata_to_sense_error(qc->ap->id, tf->command, tf->feature, >>>> &sb[1], &sb[2], &sb[3]); >>>> sb[1] &= 0x0f; >>>> >>> >>> Mark!! 
It did it again, here you go: >>> >>> ==> /p34/var/log/messages <== >>> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { >>> DriveReady SeekComplete Index Error } >>> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { >>> DriveStatusError } >>> ==> /p34/var/log/kern.log <== >>> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: translated ATA stat/err >>> 0x53/04 to SCSI SK/ASC/ASCQ 0xb/00/00 >>> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { >>> DriveReady SeekComplete Index Error } >>> Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { >>> DriveStatusError } >>> >>> Does this help? >>> >>> Can we eliminate the cause of these errors now? >>> >>> >> >> Jeff or Alan, >> >> Does that ATA translation help in determining what *bad* commands are being >> sent to the drive? > > No, it needs the patch that Mark has been posting... > > Jeff > > > Jeff, the patch is applied and box booted the new kernel and I reproduced the error messages, THAT is what is produced with the patch. 
Without the patch:

Jun 18 07:09:53 p34 kernel: [4297678.777000] ata3: status=0x51 { DriveReady SeekComplete Error }
Jun 18 07:09:53 p34 kernel: [4297678.777000] ata3: error=0x04 { DriveStatusError }
Jun 18 07:20:08 p34 -- MARK --
Jun 18 07:27:31 p34 kernel: [4298736.905000] ata3: status=0x51 { DriveReady SeekComplete Error }
Jun 18 07:27:31 p34 kernel: [4298736.905000] ata3: error=0x04 { DriveStatusError }

With the patch:

Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: translated ATA stat/err 0x53/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: status=0x53 { DriveReady SeekComplete Index Error }
Jul 7 10:26:06 p34 kernel: [4296869.461000] ata4: error=0x04 { DriveStatusError }
Jul 7 10:49:29 p34 kernel: [4298273.178000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 7 10:49:29 p34 kernel: [4298273.178000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul 7 10:49:29 p34 kernel: [4298273.178000] ata4: error=0x04 { DriveStatusError }
Jul 7 11:43:02 p34 kernel: [4301488.359000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 7 11:43:02 p34 kernel: [4301488.359000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul 7 11:43:02 p34 kernel: [4301488.359000] ata4: error=0x04 { DriveStatusError }
Jul 7 12:35:27 p34 kernel: [4304634.600000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 7 12:35:27 p34 kernel: [4304634.600000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul 7 12:35:27 p34 kernel: [4304634.600000] ata4: error=0x04 { DriveStatusError }
Jul 7 12:44:14 p34 kernel: [4305162.220000] ata4: no sense translation for status: 0x51
Jul 7 12:44:14 p34 kernel: [4305162.220000] ata4: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 7 12:44:14 p34 kernel: [4305162.220000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul 7 13:03:22 p34 kernel: [4306309.782000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 7 13:03:22 p34 kernel: [4306309.782000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul 7 13:03:22 p34 kernel: [4306309.782000] ata4: error=0x04 { DriveStatusError }
Jul 7 13:05:12 p34 kernel: [4306419.891000] ata4: no sense translation for status: 0x51
Jul 7 13:05:12 p34 kernel: [4306419.891000] ata4: translated ATA stat/err 0x51/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 7 13:05:12 p34 kernel: [4306419.891000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul 7 13:32:20 p34 kernel: [4308048.717000] ata4: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 7 13:32:20 p34 kernel: [4308048.717000] ata4: status=0x51 { DriveReady SeekComplete Error }
Jul 7 13:32:20 p34 kernel: [4308048.717000] ata4: error=0x04 { DriveStatusError }

When I had been running it earlier with 2.6.15.x:

Mar 1 13:31:10 p34 kernel: [4295292.736000] +++PATCH: Original kernel error:
Mar 1 13:31:10 p34 kernel: [4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 1 13:31:10 p34 kernel: [4295292.736000] +++PATCH: Mark Lord's extended verbosity patch:
Mar 1 13:31:10 p34 kernel: [4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 1 13:31:10 p34 kernel: [4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error }
Mar 1 13:31:10 p34 kernel: [4295292.736000] ata3: error=0x04 { DriveStatusError }
Mar 1 13:31:10 p34 kernel: [4295292.736000] +++PATCH: Original kernel error:
Mar 1 13:31:10 p34 kernel: [4295292.736000] ata3: translated op=0x85 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 1 13:31:10 p34 kernel: [4295292.736000] +++PATCH: Mark Lord's extended verbosity patch:
Mar 1 13:31:10 p34 kernel: [4295292.736000] ata3: translated op=0x85 cmd=0xb0 ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Mar 1 13:31:10 p34 kernel: [4295292.736000] ata3: status=0x51 { DriveReady SeekComplete Error }
Mar 1 13:31:10 p34 kernel: [4295292.736000] ata3: error=0x04 { DriveStatusError }

Perhaps the patch is not printing out the correct error message?

This shows that the source file was patched in libata-scsi.c.

	/*
	 * Use ata_to_sense_error() to map status register bits
	 * onto sense key, asc & ascq.
	 */
	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
		printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
				   &sb[1], &sb[2], &sb[3]);
		sb[1] &= 0x0f;
	}

This shows the kernel version.

$ cat /usr/src/linux/.version
4

This shows I am running the patched version.

$ uname -a
Linux p34.internal.lan 2.6.17.3 #4 SMP PREEMPT Fri Jul 7 09:47:53 EDT 2006 i686 GNU/Linux
$

Maybe something is blocking the opcode output from showing correctly?

Justin.

^ permalink raw reply [flat|nested] 131+ messages in thread
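The 2.6.15.x lines above carry two codes, op=0x85 and cmd=0xb0. Assuming op is the SCSI opcode and cmd is the ATA command byte (the log labels do not say so explicitly), 0x85 is ATA PASS-THROUGH (16) and 0xb0 is SMART in the respective standards. A short lookup sketch for turning a logged ATA command byte into a name; ata_opcode_name() and the table are made up for this illustration, and the table is deliberately incomplete.

```c
#include <stddef.h>

struct ata_opcode {
	unsigned char op;
	const char *name;
};

/* A few ATA command codes; values are from the ATA command set. */
static const struct ata_opcode ata_opcodes[] = {
	{ 0x25, "READ DMA EXT" },
	{ 0x35, "WRITE DMA EXT" },
	{ 0x3d, "WRITE DMA FUA EXT" },
	{ 0xb0, "SMART" },
	{ 0xc8, "READ DMA" },
	{ 0xca, "WRITE DMA" },
	{ 0xe7, "FLUSH CACHE" },
	{ 0xea, "FLUSH CACHE EXT" },
	{ 0xec, "IDENTIFY DEVICE" },
};

/* Return a name for the opcode, or "unknown" if it is not in the table. */
static const char *ata_opcode_name(unsigned char op)
{
	size_t i;

	for (i = 0; i < sizeof(ata_opcodes) / sizeof(ata_opcodes[0]); i++) {
		if (ata_opcodes[i].op == op)
			return ata_opcodes[i].name;
	}
	return "unknown";
}
```

With this table, the cmd=0xb0 value from the March log maps to SMART. Whether the July failures come from the same command is exactly what the missing ata_op output would show.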
[parent not found: <200607091224.31451.liml@rtr.ca>]
* Re: LibPATA code issues / 2.6.15.4
[not found] ` <200607091224.31451.liml@rtr.ca>
@ 2006-07-09 17:27 ` Justin Piszcz
2006-07-09 20:16 ` Justin Piszcz
0 siblings, 1 reply; 131+ messages in thread
From: Justin Piszcz @ 2006-07-09 17:27 UTC (permalink / raw)
To: Mark Lord
Cc: Jeff Garzik, Sander, linux-kernel, IDE/ATA development list, Alan Cox

On Sun, 9 Jul 2006, Mark Lord wrote:

> Mmm.. there are two main paths into those messages,
> and my current patch only caught one of them.
>
> Here's a reworked version that catches the ata_op on both paths.
> Maybe this will dump out the info we need to diagnose Justin's system.
>
> Compiles & links fine on 2.6.17, but not tested.
>
> Cheers
>
> --- linux/drivers/scsi/libata-scsi.c.orig	2006-06-23 13:38:37.000000000 -0400
> +++ linux/drivers/scsi/libata-scsi.c	2006-07-09 12:19:52.000000000 -0400
> @@ -542,6 +542,7 @@
>  	struct ata_taskfile *tf = &qc->tf;
>  	unsigned char *sb = cmd->sense_buffer;
>  	unsigned char *desc = sb + 8;
> +	unsigned char ata_op = tf->command;
>  
>  	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
>  
> @@ -558,6 +559,7 @@
>  	 * onto sense key, asc & ascq.
>  	 */
>  	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
> +		printk(KERN_WARNING "ata_gen_ata_desc_sense: failed ata_op=0x%02x\n", ata_op);
>  		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
>  				   &sb[1], &sb[2], &sb[3]);
>  		sb[1] &= 0x0f;
> @@ -617,6 +619,7 @@
>  	struct scsi_cmnd *cmd = qc->scsicmd;
>  	struct ata_taskfile *tf = &qc->tf;
>  	unsigned char *sb = cmd->sense_buffer;
> +	unsigned char ata_op = tf->command;
>  
>  	memset(sb, 0, SCSI_SENSE_BUFFERSIZE);
>  
> @@ -633,6 +636,7 @@
>  	 * onto sense key, asc & ascq.
>  	 */
>  	if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) {
> +		printk(KERN_WARNING "ata_gen_fixed_sense: failed ata_op=0x%02x\n", ata_op);
>  		ata_to_sense_error(qc->ap->id, tf->command, tf->feature,
>  				   &sb[2], &sb[12], &sb[13]);
>  		sb[2] &= 0x0f;
>

Thanks Mark!

Applying now.

^ permalink raw reply [flat|nested] 131+ messages in thread
* Re: LibPATA code issues / 2.6.15.4 2006-07-09 17:27 ` Justin Piszcz @ 2006-07-09 20:16 ` Justin Piszcz 0 siblings, 0 replies; 131+ messages in thread From: Justin Piszcz @ 2006-07-09 20:16 UTC (permalink / raw) To: Mark Lord Cc: Jeff Garzik, Sander, linux-kernel, IDE/ATA development list, Alan Cox On Sun, 9 Jul 2006, Justin Piszcz wrote: > > > On Sun, 9 Jul 2006, Mark Lord wrote: > >> Mmm.. there are two main paths into those messages, >> and my current patch only caught one of them. >> >> Here's a reworked version that catches the ata_op on both paths. >> Maybe this will dump out the info we need to diagnose Justin's system. >> >> Compiles & links fine on 2.6.17, but not tested. >> >> Cheers >> >> --- linux/drivers/scsi/libata-scsi.c.orig 2006-06-23 13:38:37.000000000 >> -0400 >> +++ linux/drivers/scsi/libata-scsi.c 2006-07-09 12:19:52.000000000 -0400 >> @@ -542,6 +542,7 @@ >> struct ata_taskfile *tf = &qc->tf; >> unsigned char *sb = cmd->sense_buffer; >> unsigned char *desc = sb + 8; >> + unsigned char ata_op = tf->command; >> >> memset(sb, 0, SCSI_SENSE_BUFFERSIZE); >> >> @@ -558,6 +559,7 @@ >> * onto sense key, asc & ascq. >> */ >> if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { >> + printk(KERN_WARNING "ata_gen_ata_desc_sense: failed >> ata_op=0x%02x\n", ata_op); >> ata_to_sense_error(qc->ap->id, tf->command, tf->feature, >> &sb[1], &sb[2], &sb[3]); >> sb[1] &= 0x0f; >> @@ -617,6 +619,7 @@ >> struct scsi_cmnd *cmd = qc->scsicmd; >> struct ata_taskfile *tf = &qc->tf; >> unsigned char *sb = cmd->sense_buffer; >> + unsigned char ata_op = tf->command; >> >> memset(sb, 0, SCSI_SENSE_BUFFERSIZE); >> >> @@ -633,6 +636,7 @@ >> * onto sense key, asc & ascq. >> */ >> if (tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { >> + printk(KERN_WARNING "ata_gen_fixed_sense: failed >> ata_op=0x%02x\n", ata_op); >> ata_to_sense_error(qc->ap->id, tf->command, tf->feature, >> &sb[2], &sb[12], &sb[13]); >> sb[2] &= 0x0f; >> > > Thanks Mark! > > Applying now. 
>

Mark,

Check line 519, this is where it is printing the error (I believe) and
the patch does not print the ata_op here. It is in the
ata_to_sense_error() function.

I've already patched, as you can see, recompiled, etc..

# patch -p0 < /tmp/b
patching file linux/drivers/scsi/libata-scsi.c
Reversed (or previously applied) patch detected! Assume -R? [n]
#

Jul 9 15:22:57 p34 kernel: [4300704.724000] ata3: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Jul 9 15:22:57 p34 kernel: [4300704.724000] ata3: status=0x51 { DriveReady SeekComplete Error }
Jul 9 15:22:57 p34 kernel: [4300704.724000] ata3: error=0x04 { DriveStatusError }

This part needs the ata_op:

519 translate_done:
520 	printk(KERN_ERR "ata%u: translated ATA stat/err 0x%02x/%02x to "
521 	       "SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err,
522 	       *sk, *asc, *ascq);

Justin.

^ permalink raw reply [flat|nested] 131+ messages in thread
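A user-space sketch of what the message at translate_done could look like with the opcode included, as the message above asks for. The extra ata_op argument and the function name format_translation() are assumptions made for this illustration: the ata_to_sense_error() call shown in the patches takes no opcode argument, so the real change would also need a way to pass it in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build the "translated" log line with a hypothetical ata_op field added.
 * Returns the number of characters snprintf() would have written.
 */
static int format_translation(char *buf, size_t len, unsigned int id,
			      unsigned char ata_op,
			      unsigned char drv_stat, unsigned char drv_err,
			      unsigned char sk, unsigned char asc,
			      unsigned char ascq)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
			"ata%u: translated ata_op=0x%02x ATA stat/err 0x%02x/%02x to "
			"SCSI SK/ASC/ASCQ 0x%x/%02x/%02x",
			id, (unsigned int)ata_op,
			(unsigned int)drv_stat, (unsigned int)drv_err,
			(unsigned int)sk, (unsigned int)asc,
			(unsigned int)ascq);
}
```

The format string is the one quoted at lines 520-521 with a single field inserted, so existing log parsing that keys on "stat/err" would still match.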