* Intel ICH9M/M-E SATA error-handling/reset problems
@ 2009-02-14 20:06 Serguei Miridonov
  2009-02-14 20:53 ` Jeff Garzik
  2009-02-14 22:01 ` Robert Hancock
  0 siblings, 2 replies; 11+ messages in thread
From: Serguei Miridonov @ 2009-02-14 20:06 UTC
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1178 bytes --]

I have some problems with SATA in a new notebook PC (HP Pavilion dv5t, 
Intel chipset). Seagate FreeAgent Pro 1TB external drive practically 
can not be used with eSATA in Linux (fresh install from DVD Fedora 10, 
now fully updated), and yesterday I also had problem with DVD 
recording using internal HL-DT-ST BDDVDRW drive.

More details in attachment.

Both devices work with Windows Vista. Seagate external drive even in 
Vista produces "parity error" messages in Windows event log but OS is 
somehow recovering from these errors and continues to use the drive 
with slight slowdown (average speed varies between 60 and 110 MB/s). 
Of course, it could be cable/Seagate issue, but again - Vista can 
handle this.

It appears that Linux kernel has problems with error-handling/reset of 
SATA hardware. I have found a lot of reports regarding SATA problems: 
data transfer failures, CD/DVD recording, waking up from suspend to 
RAM, etc. Aren't they all related? Can Linux SATA chipsets drivers 
properly reset hardware into predictable state? Sure, I could be wrong 
and my issue may have nothing to do with others... Any idea?

Please, CC any reply to my e-mail.
Thank you.


[-- Attachment #2: sata-errors-report.txt --]
[-- Type: text/plain, Size: 18535 bytes --]

1. Problem with Seagate FreeAgent Pro 1TB drive connected with eSATA cable

  SUMMARY

  The drive can be recognized by the system.
  I could copy a large file FROM the external drive to the internal disk.
  When I tried to copy the same file TO the drive, the drive stopped responding.

2. Problem with DVD+R recording

  SUMMARY

  Happened only once, so far...
  k3b recognized the drive and empty media inside.
  Attempt to record DVD+R using k3b failed, Bluray/DVD/CD drive stopped responding.
  Reboot was required in order to use it again.
  Attempt to record DVD+R after reboot was successful.

Hardware: HP Pavilion dv5t Intel(R) Core(TM)2 Duo CPU P8600  @ 2.40GHz, 3GB RAM
# lspci
00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07)
00:01.0 PCI bridge: Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port (rev 07)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03)
00:1c.2 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 3 (rev 03)
00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 03)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 03)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.3 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93)
00:1f.0 ISA bridge: Intel Corporation ICH9M LPC Interface Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03)
00:1f.6 Signal processing controller: Intel Corporation 82801I (ICH9 Family) Thermal Subsystem (rev 03)
01:00.0 VGA compatible controller: nVidia Corporation GeForce 9200M GS (rev a1)
02:00.0 Network controller: Intel Corporation PRO/Wireless 5100 AGN [Shiloh] Network Connection
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
06:00.0 FireWire (IEEE 1394): JMicron Technologies, Inc. IEEE 1394 Host Controller
06:00.1 System peripheral: JMicron Technologies, Inc. SD/MMC Host Controller
06:00.2 SD Host controller: JMicron Technologies, Inc. Standard SD Host Controller
06:00.3 System peripheral: JMicron Technologies, Inc. MS Host Controller
06:00.4 System peripheral: JMicron Technologies, Inc. xD Host Controller

/var/log/messages:

=========== Seagate FreeAgent Pro 1TB drive connected: ===============

Feb  8 18:37:23 localhost kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
Feb  8 18:37:23 localhost kernel: ata6: irq_stat 0x00000040, connection status changed
Feb  8 18:37:23 localhost kernel: ata6: SError: { CommWake DevExch }
Feb  8 18:37:23 localhost kernel: ata6: hard resetting link
Feb  8 18:37:30 localhost kernel: ata6: link is slow to respond, please be patient (ready=0)
Feb  8 18:37:33 localhost kernel: ata6: softreset failed (device not ready)
Feb  8 18:37:33 localhost kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb  8 18:37:33 localhost kernel: ata6: link online but device misclassified, retrying
Feb  8 18:37:33 localhost kernel: ata6: hard resetting link
Feb  8 18:37:37 localhost kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb  8 18:37:37 localhost kernel: ata6.00: ATA-6: Seagate FreeAgent Pro, 4109, max UDMA/133
Feb  8 18:37:37 localhost kernel: ata6.00: 1953525168 sectors, multi 0: LBA48
Feb  8 18:37:37 localhost kernel: ata6.00: configured for UDMA/133
Feb  8 18:37:37 localhost kernel: ata6: EH complete
Feb  8 18:37:37 localhost kernel: scsi 5:0:0:0: Direct-Access     ATA      Seagate FreeAgen 4109 PQ: 0 ANSI: 5
Feb  8 18:37:37 localhost kernel: sd 5:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
Feb  8 18:37:37 localhost kernel: sd 5:0:0:0: [sdb] Write Protect is off
Feb  8 18:37:37 localhost kernel: sd 5:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb  8 18:37:37 localhost kernel: sd 5:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
Feb  8 18:37:37 localhost kernel: sd 5:0:0:0: [sdb] Write Protect is off
Feb  8 18:37:37 localhost kernel: sd 5:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb  8 18:37:38 localhost kernel: sdb: sdb1
Feb  8 18:37:38 localhost kernel: sd 5:0:0:0: [sdb] Attached SCSI disk
Feb  8 18:37:38 localhost kernel: sd 5:0:0:0: Attached scsi generic sg2 type 0
Feb  8 18:37:53 localhost hald: mounted /dev/sdb1 on behalf of uid 0
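
Editorial note: the "SErr 0x4040000" value in the hotplug exception above can be decoded by hand. The sketch below is illustrative only. The bit positions follow the SATA SError register layout, and the names are the labels libata prints in its "SError: { ... }" line as far as I recall them; treat the table as an assumption and check it against your kernel source.

```python
# Hypothetical helper: maps SError bits to the short labels seen in
# libata log lines. Bit numbers follow the SATA SError register layout.
SERR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the labels of all SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # Value taken from the "ata6: exception ... SErr 0x4040000" line.
    print(decode_serror(0x4040000))
```

For 0x4040000 this yields CommWake and DevExch, which agrees with the "SError: { CommWake DevExch }" line logged when the drive was plugged in.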

=========== Attempt to copy large file TO the external drive ================

Feb  8 18:40:41 localhost kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Feb  8 18:40:41 localhost kernel: ata6.00: irq_stat 0x40000001
Feb  8 18:40:41 localhost kernel: ata6.00: cmd 35/00:00:b9:21:cb/00:04:21:00:00/e0 tag 0 dma 524288 out
Feb  8 18:40:41 localhost kernel:         res 51/84:00:b8:25:cb/00:00:21:00:00/e0 Emask 0x10 (ATA bus error)
Feb  8 18:40:41 localhost kernel: ata6.00: status: { DRDY ERR }
Feb  8 18:40:41 localhost kernel: ata6.00: error: { ICRC ABRT }
Feb  8 18:40:41 localhost kernel: ata6: hard resetting link
Feb  8 18:40:46 localhost kernel: ata6: link is slow to respond, please be patient (ready=0)
Feb  8 18:40:51 localhost kernel: ata6: softreset failed (device not ready)
Feb  8 18:40:51 localhost kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb  8 18:40:56 localhost kernel: ata6.00: qc timeout (cmd 0xec)
Feb  8 18:40:56 localhost kernel: ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Feb  8 18:40:56 localhost kernel: ata6.00: revalidation failed (errno=-5)
Feb  8 18:40:56 localhost kernel: ata6: hard resetting link
Feb  8 18:41:02 localhost kernel: ata6: link is slow to respond, please be patient (ready=0)
Feb  8 18:41:06 localhost kernel: ata6: softreset failed (device not ready)
Feb  8 18:41:06 localhost kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb  8 18:41:16 localhost kernel: ata6.00: qc timeout (cmd 0xec)
Feb  8 18:41:16 localhost kernel: ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Feb  8 18:41:16 localhost kernel: ata6.00: revalidation failed (errno=-5)
Feb  8 18:41:16 localhost kernel: ata6: hard resetting link
Feb  8 18:41:22 localhost kernel: ata6: link is slow to respond, please be patient (ready=0)
Feb  8 18:41:26 localhost kernel: ata6: softreset failed (device not ready)
Feb  8 18:41:26 localhost kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb  8 18:41:56 localhost kernel: ata6.00: qc timeout (cmd 0xec)
Feb  8 18:41:56 localhost kernel: ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Feb  8 18:41:56 localhost kernel: ata6.00: revalidation failed (errno=-5)
Feb  8 18:41:56 localhost kernel: ata6.00: disabled
Feb  8 18:41:56 localhost kernel: ata6: hard resetting link
Feb  8 18:42:02 localhost kernel: ata6: link is slow to respond, please be patient (ready=0)
Feb  8 18:42:06 localhost kernel: ata6: softreset failed (device not ready)
Feb  8 18:42:06 localhost kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb  8 18:42:06 localhost kernel: ata6: EH complete
Feb  8 18:42:06 localhost kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Feb  8 18:42:06 localhost kernel: end_request: I/O error, dev sdb, sector 566960569
...
Last pair of lines repeated many times for different sectors. Then

Feb  8 18:42:06 localhost kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Feb  8 18:42:06 localhost kernel: end_request: I/O error, dev sdb, sector 64
Feb  8 18:42:06 localhost kernel: Buffer I/O error on device sdb1, logical block 1
Feb  8 18:42:06 localhost kernel: lost page write due to I/O error on sdb1
...

After that the drive becomes inaccessible and can be accessed again only after a power cycle.
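
Editorial note: the "res 51/84:..." line in the failed write above carries the ATA status and error registers as its first two hex bytes. A minimal sketch of decoding them follows. The function names are hypothetical, and the tables list only the flags that libata of this era printed, as far as I know; other bits of the registers are ignored here.

```python
# ATA status and error register bits, limited to the flags that appear
# in libata's "status: { ... }" and "error: { ... }" lines (assumption).
STATUS_BITS = [(0x80, "Busy"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT")]


def decode(value, table):
    """Return the names of all flags from table that are set in value."""
    return [name for mask, name in table if value & mask]


def parse_res(res):
    """Split a libata 'res' field into (status, error) integers.

    The field starts with 'status/error:' in hex, e.g. '51/84:00:...'.
    """
    head = res.split(":")[0]
    status_hex, error_hex = head.split("/")
    return int(status_hex, 16), int(error_hex, 16)


if __name__ == "__main__":
    status, error = parse_res("51/84:00:b8:25:cb/00:00:21:00:00/e0")
    print(decode(status, STATUS_BITS))
    print(decode(error, ERROR_BITS))
```

Status 0x51 decodes to DRDY and ERR, and error 0x84 to ICRC and ABRT, matching the two log lines that follow the "res" line. ICRC is an interface CRC error, which is consistent with a cable or signal-quality problem on the eSATA link.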

================= Detecting Bluray/DVD/CD drive on boot =======================

Feb 13 16:56:20 localhost kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Feb 13 16:56:20 localhost kernel: ata2.00: ATAPI: HL-DT-ST BDDVDRW CT10L, YC07, max UDMA/100
Feb 13 16:56:20 localhost kernel: ata2.00: configured for UDMA/100
...
Feb 13 16:56:20 localhost kernel: scsi 1:0:0:0: CD-ROM            HL-DT-ST BDDVDRW CT10L    YC07 PQ: 0 ANSI: 5
Feb 13 16:56:20 localhost kernel: sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
Feb 13 16:56:20 localhost kernel: Uniform CD-ROM driver Revision: 3.20
Feb 13 16:56:20 localhost kernel: sr 1:0:0:0: Attached scsi generic sg1 type 5

================= Insert empty DVD+R (Sony) ===================================

Feb 13 19:43:11 localhost kernel: cdrom: This disc doesn't have any tracks I recognize!
Feb 13 19:43:11 localhost kernel: end_request: I/O error, dev sr0, sector 0
Feb 13 19:43:11 localhost kernel: Buffer I/O error on device sr0, logical block 0
Feb 13 19:43:11 localhost kernel: end_request: I/O error, dev sr0, sector 0
Feb 13 19:43:11 localhost kernel: Buffer I/O error on device sr0, logical block 0

================= Attempt to write DVD =========================================

Feb 13 19:48:33 localhost kernel: warning: `growisofs' uses 32-bit capabilities (legacy support in use)
Feb 13 19:49:35 localhost kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 13 19:49:35 localhost kernel: ata2.00: cmd a0/01:00:00:00:80/00:00:00:00:00/a0 tag 0 dma 32768 out
Feb 13 19:49:35 localhost kernel:         cdb 2a 00 00 00 00 00 00 00  10 00 00 00 00 00 00 00
Feb 13 19:49:35 localhost kernel:         res 40/00:03:00:0c:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Feb 13 19:49:35 localhost kernel: ata2.00: status: { DRDY }
Feb 13 19:49:35 localhost kernel: ata2: hard resetting link
Feb 13 19:49:35 localhost kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Feb 13 19:49:35 localhost kernel: ata2.00: configured for UDMA/100
Feb 13 19:49:35 localhost kernel: ata2: EH complete

================= k3b reports error and ejects the disc, I load the same disc back ============

Feb 13 19:50:02 localhost kernel: cdrom: This disc doesn't have any tracks I recognize!
Feb 13 19:50:02 localhost kernel: end_request: I/O error, dev sr0, sector 0
Feb 13 19:50:02 localhost kernel: Buffer I/O error on device sr0, logical block 0
Feb 13 19:50:02 localhost kernel: end_request: I/O error, dev sr0, sector 0
Feb 13 19:50:02 localhost kernel: Buffer I/O error on device sr0, logical block 0
Feb 13 19:51:30 localhost kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 13 19:51:30 localhost kernel: ata2.00: cmd a0/01:00:00:00:80/00:00:00:00:00/a0 tag 0 dma 32768 out
Feb 13 19:51:30 localhost kernel:         cdb 2a 00 00 00 00 00 00 00  10 00 00 00 00 00 00 00
Feb 13 19:51:30 localhost kernel:         res 40/00:03:00:0c:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Feb 13 19:51:30 localhost kernel: ata2.00: status: { DRDY }
Feb 13 19:51:30 localhost kernel: ata2: hard resetting link
Feb 13 19:51:31 localhost kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Feb 13 19:51:31 localhost kernel: ata2.00: configured for UDMA/100
Feb 13 19:51:31 localhost kernel: ata2: EH complete

================= k3b reports error and ejects the disc, I load the same disc back again =====

Feb 13 19:51:54 localhost kernel: cdrom: This disc doesn't have any tracks I recognize!
Feb 13 19:52:58 localhost kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 13 19:52:58 localhost kernel: ata2.00: cmd a0/01:00:00:00:80/00:00:00:00:00/a0 tag 0 dma 32768 out
Feb 13 19:52:58 localhost kernel:         cdb 2a 00 00 00 00 00 00 00  10 00 00 00 00 00 00 00
Feb 13 19:52:58 localhost kernel:         res 40/00:03:00:0c:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Feb 13 19:52:58 localhost kernel: ata2.00: status: { DRDY }
Feb 13 19:52:58 localhost kernel: ata2: hard resetting link
Feb 13 19:52:58 localhost kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Feb 13 19:52:58 localhost kernel: ata2.00: configured for UDMA/100
Feb 13 19:52:58 localhost kernel: ata2: EH complete
Feb 13 19:54:01 localhost kernel: cdrom: This disc doesn't have any tracks I recognize!
Feb 13 19:54:01 localhost kernel: end_request: I/O error, dev sr0, sector 0
Feb 13 19:54:01 localhost kernel: Buffer I/O error on device sr0, logical block 0
Feb 13 19:54:01 localhost kernel: end_request: I/O error, dev sr0, sector 0
Feb 13 19:54:01 localhost kernel: Buffer I/O error on device sr0, logical block 0
Feb 13 19:55:20 localhost kernel: ata2: limiting SATA link speed to 1.5 Gbps
Feb 13 19:55:20 localhost kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 13 19:55:20 localhost kernel: ata2.00: cmd a0/01:00:00:00:80/00:00:00:00:00/a0 tag 0 dma 32768 out
Feb 13 19:55:20 localhost kernel:         cdb 2a 00 00 00 00 00 00 00  10 00 00 00 00 00 00 00
Feb 13 19:55:20 localhost kernel:         res 40/00:03:00:0c:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Feb 13 19:55:20 localhost kernel: ata2.00: status: { DRDY }
Feb 13 19:55:20 localhost kernel: ata2: hard resetting link
Feb 13 19:55:26 localhost kernel: ata2: link is slow to respond, please be patient (ready=0)
Feb 13 19:55:30 localhost kernel: ata2: softreset failed (device not ready)
Feb 13 19:55:30 localhost kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 13 19:55:35 localhost kernel: ata2.00: qc timeout (cmd 0xa1)
Feb 13 19:55:35 localhost kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Feb 13 19:55:35 localhost kernel: ata2.00: revalidation failed (errno=-5)
Feb 13 19:55:35 localhost kernel: ata2: hard resetting link
Feb 13 19:55:41 localhost kernel: ata2: link is slow to respond, please be patient (ready=0)
Feb 13 19:55:45 localhost kernel: ata2: softreset failed (device not ready)
Feb 13 19:55:45 localhost kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 13 19:55:55 localhost kernel: ata2.00: qc timeout (cmd 0xa1)
Feb 13 19:55:55 localhost kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Feb 13 19:55:55 localhost kernel: ata2.00: revalidation failed (errno=-5)
Feb 13 19:55:55 localhost kernel: ata2: hard resetting link
Feb 13 19:56:01 localhost kernel: ata2: link is slow to respond, please be patient (ready=0)
Feb 13 19:56:05 localhost kernel: ata2: softreset failed (device not ready)
Feb 13 19:56:05 localhost kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 13 19:56:35 localhost kernel: ata2.00: qc timeout (cmd 0xa1)
Feb 13 19:56:35 localhost kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Feb 13 19:56:35 localhost kernel: ata2.00: revalidation failed (errno=-5)
Feb 13 19:56:35 localhost kernel: ata2.00: disabled
Feb 13 19:56:35 localhost kernel: ata2: hard resetting link
Feb 13 19:56:41 localhost kernel: ata2: link is slow to respond, please be patient (ready=0)
Feb 13 19:56:45 localhost kernel: ata2: softreset failed (device not ready)
Feb 13 19:56:45 localhost kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 13 19:56:45 localhost kernel: ata2: EH complete

============== The drive does not respond, I can not eject disc ==========================

After reboot everything worked OK...
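
Editorial note: the "cdb 2a 00 ..." line in each timeout above is a SCSI WRITE(10) command. The sketch below unpacks it; the function name is hypothetical and the 2048-byte block size is an assumption (the usual logical block size for DVD media), not something stated in the log.

```python
import struct


def decode_write10(cdb_hex, block_size=2048):
    """Decode a WRITE(10) CDB given as a hex string from a libata log line.

    block_size is an assumption: 2048 bytes is the usual DVD block size.
    """
    cdb = bytes.fromhex(cdb_hex)
    if cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) command: opcode 0x%02x" % cdb[0])
    (lba,) = struct.unpack(">I", cdb[2:6])     # bytes 2-5: start block, big-endian
    (blocks,) = struct.unpack(">H", cdb[7:9])  # bytes 7-8: transfer length
    return {"lba": lba, "blocks": blocks, "bytes": blocks * block_size}


if __name__ == "__main__":
    print(decode_write10("2a 00 00 00 00 00 00 00  10 00 00 00 00 00 00 00"))
```

The command writes 16 blocks starting at block 0, that is 32768 bytes, which matches the "dma 32768 out" in the cmd line. In other words, the very first write of the burn session is the one that timed out each time.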

============== Additional info: ==========================================================

lspci -vv (related to SATA controller):

00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03) (prog-if 01 [AHCI 1.0])
        Subsystem: Hewlett-Packard Company Device 3603
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin B routed to IRQ 19
        Region 0: I/O ports at a108 [size=8]
        Region 1: I/O ports at a114 [size=4]
        Region 2: I/O ports at a100 [size=8]
        Region 3: I/O ports at a110 [size=4]
        Region 4: I/O ports at a020 [size=32]
        Region 5: Memory at df305000 (32-bit, non-prefetchable) [size=2K]
        Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Count=1/16 Enable-
                Address: 00000000  Data: 0000
        Capabilities: [70] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a8] SATA HBA <?>
        Capabilities: [b0] PCIe advanced features <?>
        Kernel driver in use: ahci

# dmesg | egrep '^ata.:|^ahci'
ahci 0000:00:1f.2: version 3.0
ahci 0000:00:1f.2: PCI INT B -> GSI 19 (level, low) -> IRQ 19
ahci 0000:00:1f.2: AHCI 0001.0200 32 slots 4 ports 3 Gbps 0x33 impl SATA mode
ahci 0000:00:1f.2: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part ems
ahci 0000:00:1f.2: setting latency timer to 64
ata1: SATA max UDMA/133 abar m2048@0xdf305000 port 0xdf305100 irq 19
ata2: SATA max UDMA/133 abar m2048@0xdf305000 port 0xdf305180 irq 19
ata3: DUMMY
ata4: DUMMY
ata5: SATA max UDMA/133 abar m2048@0xdf305000 port 0xdf305300 irq 19
ata6: SATA max UDMA/133 abar m2048@0xdf305000 port 0xdf305380 irq 19
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata5: SATA link down (SStatus 0 SControl 300)
ata6: SATA link down (SStatus 0 SControl 300)
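
Editorial note: the SStatus and SControl values in these lines are printed in hex and pack three 4-bit fields each. A small sketch for splitting them follows; the function name and the speed table are hypothetical helpers, while the field layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8) follows the SATA register definitions.

```python
# SPD field values seen in this report; other values are not covered.
SPEEDS = {0: "none/no limit", 1: "1.5 Gbps", 2: "3.0 Gbps"}


def split_scr(value):
    """Split an SStatus or SControl value into its DET, SPD and IPM fields."""
    return {
        "det": value & 0xF,         # device detection state / control
        "spd": (value >> 4) & 0xF,  # negotiated speed / speed limit
        "ipm": (value >> 8) & 0xF,  # interface power management
    }


if __name__ == "__main__":
    for label, raw in [("SStatus", 0x123), ("SStatus", 0x113),
                       ("SControl", 0x300), ("SControl", 0x310)]:
        fields = split_scr(raw)
        print(label, hex(raw), fields, SPEEDS.get(fields["spd"], "unknown"))
```

SStatus 123 and 113 both have DET=3 (device present, link established) with SPD=2 and SPD=1 respectively, matching the reported 3.0 and 1.5 Gbps. SControl changing from 300 to 310 in the DVD log sets SPD=1, which corresponds to the "limiting SATA link speed to 1.5 Gbps" message.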


# uname -a
Linux localhost 2.6.27.12-170.2.5.fc10.i686 #1 SMP Wed Jan 21 02:09:37 EST 2009 i686 i686 i386 GNU/Linux


* Re: Intel ICH9M/M-E SATA error-handling/reset problems
  2009-02-14 20:06 Intel ICH9M/M-E SATA error-handling/reset problems Serguei Miridonov
@ 2009-02-14 20:53 ` Jeff Garzik
  2009-02-14 22:01 ` Robert Hancock
  1 sibling, 0 replies; 11+ messages in thread
From: Jeff Garzik @ 2009-02-14 20:53 UTC
  To: Serguei Miridonov; +Cc: linux-kernel, Linux IDE mailing list

Serguei Miridonov wrote:
> I have some problems with SATA in a new notebook PC (HP Pavilion dv5t, 
> Intel chipset). Seagate FreeAgent Pro 1TB external drive practically 
> can not be used with eSATA in Linux (fresh install from DVD Fedora 10, 
> now fully updated), and yesterday I also had problem with DVD 
> recording using internal HL-DT-ST BDDVDRW drive.

Some eSata fixes went into the more-recent kernels...  Can you try 
2.6.29-rc5?

	Jeff





* Re: Intel ICH9M/M-E SATA error-handling/reset problems
  2009-02-14 20:06 Intel ICH9M/M-E SATA error-handling/reset problems Serguei Miridonov
  2009-02-14 20:53 ` Jeff Garzik
@ 2009-02-14 22:01 ` Robert Hancock
  2009-02-15 18:00   ` Serguei Miridonov
  1 sibling, 1 reply; 11+ messages in thread
From: Robert Hancock @ 2009-02-14 22:01 UTC
  To: Serguei Miridonov; +Cc: linux-kernel

Serguei Miridonov wrote:
> I have some problems with SATA in a new notebook PC (HP Pavilion dv5t, 
> Intel chipset). Seagate FreeAgent Pro 1TB external drive practically 
> can not be used with eSATA in Linux (fresh install from DVD Fedora 10, 
> now fully updated), and yesterday I also had problem with DVD 
> recording using internal HL-DT-ST BDDVDRW drive.
> 
> More details in attachment.
> 
> Both devices work with Windows Vista. Seagate external drive even in 
> Vista produces "parity error" messages in Windows event log but OS is 
> somehow recovering from these errors and continues to use the drive 
> with slight slowdown (average speed varies between 60 and 110 MB/s). 
> Of course, it could be cable/Seagate issue, but again - Vista can 
> handle this.

There are a lot of issues with eSATA drives and cabling. As Jeff 
mentioned, there are some changes in 2.6.29-rc that may improve the 
behavior, but the root cause here is a hardware issue (you should not 
expect very good behavior in Vista either with those errors).

As far as the DVD burning issue, it's hard to say for sure. It looks 
like a write command was timing out. Could be due to your drive not 
working well with that type of media.

> 
> It appears that Linux kernel has problems with error-handling/reset of 
> SATA hardware. I have found a lot of reports regarding SATA problems: 
> data transfer failures, CD/DVD recording, waking up from suspend to 
> RAM, etc. Aren't they all related? Can Linux SATA chipsets drivers 

Not related at all, mostly.. though a lot of people seem to think they 
are. Often times people think problems are related because the error 
messages seem similar, and even the same error can be triggered by 
numerous different problems, often not the fault of the kernel.

> properly reset hardware into predictable state? Sure, I could be wrong 
> and my issue may have nothing to do with others... Any idea?
> 
> Please, CC any reply to my e-mail.
> Thank you.




* Re: Intel ICH9M/M-E SATA error-handling/reset problems
  2009-02-14 22:01 ` Robert Hancock
@ 2009-02-15 18:00   ` Serguei Miridonov
  2009-02-15 18:04     ` Robert Hancock
  0 siblings, 1 reply; 11+ messages in thread
From: Serguei Miridonov @ 2009-02-15 18:00 UTC
  To: Robert Hancock; +Cc: linux-kernel, Jeff Garzik

Hello Robert and Jeff,

Thank you for your replies. 

On Saturday 14 February 2009, Jeff Garzik wrote:
> Serguei Miridonov wrote:
> > I have some problems with SATA in a new notebook PC (HP Pavilion
> > dv5t, Intel chipset). Seagate FreeAgent Pro 1TB external drive
> > practically can not be used with eSATA in Linux (fresh install
> > from DVD Fedora 10, now fully updated), and yesterday I also had
> > problem with DVD recording using internal HL-DT-ST BDDVDRW drive.
>
> Some eSata fixes went into the more-recent kernels...  Can you try
> 2.6.29-rc5?

Unfortunately, right now I cannot provide a good test bed for a new
kernel. I also suspected a bad cable and returned it to the store.
Recording DVDs, as you understand, cannot serve as a test: I don't do
it on a regular basis... I will be looking for a new eSATA cable in a
week or two, so when I have it I'll try to download and build the
kernel for these experiments.

On Saturday 14 February 2009, Robert Hancock wrote:
> Serguei Miridonov wrote:
> > Both devices work with Windows Vista. Seagate external drive even
> > in Vista produces "parity error" messages in Windows event log
> > but OS is somehow recovering from these errors and continues to
> > use the drive with slight slowdown (average speed varies between
> > 60 and 110 MB/s). Of course, it could be cable/Seagate issue, but
> > again - Vista can handle this.
>
> There are a lot of issues with eSATA drives and cabling. As Jeff
> mentioned, there are some changes in 2.6.29-rc that may improve the
> behavior, but the root cause here is a hardware issue (you should
> not expect very good behavior in Vista either with those errors).

I agree with you completely. Nevertheless, something like 10 errors 
per 2GB transfer can not be the reason to give up. Vista, at least, 
recovers and continues the data transfer. Linux simply can not return 
the interface or connected device into operating mode. Do you think it 
is normal?

> As far as the DVD burning issue, it's hard to say for sure. It
> looks like a write command was timing out. Could be due to your
> drive not working well with that type of media.

Well, it could be, though I did not consider Sony DVD+R to be bad
media. My mistake, maybe... Anyway, even if that is true, why can k3b
(or whatever backend is used for recording) no longer establish a
connection with the drive? The kernel must keep the hardware working
even if there were some intermittent interface errors.

> > It appears that Linux kernel has problems with
> > error-handling/reset of SATA hardware. I have found a lot of
> > reports regarding SATA problems: data transfer failures, CD/DVD
> > recording, waking up from suspend to RAM, etc. Aren't they all
> > related? Can Linux SATA chipsets drivers
>
> Not related at all, mostly.. though a lot of people seem to think
> they are. Often times people think problems are related because the
> error messages seem similar, and even the same error can be
> triggered by numerous different problems, often not the fault of
> the kernel.

I'm not talking now about errors triggered by the kernel due to some
bugs. What I see in the logs is the kernel's failure to recover from
errors, not the kernel causing them. I hope that this is fixed already
in newer kernels, though I could not find such information in the
changelogs.

I could be wrong, of course, but it seems to me that if the kernel
could really reset the interface and return it and the connected
devices to an operating state, then most of the issues mentioned above
would become less critical and people could live with them until the
root cause is fixed properly.

Maybe resetting the interface will not help in all cases if a device
is left in some screwed-up state due to earlier poor error handling...
Well, this is another issue which can be device-vendor-dependent...
However, regarding the external Seagate drive, Vista does not have any
special driver to handle its errors, it just works...



* Re: Intel ICH9M/M-E SATA error-handling/reset problems
  2009-02-15 18:00   ` Serguei Miridonov
@ 2009-02-15 18:04     ` Robert Hancock
  2009-02-15 19:41       ` Serguei Miridonov
  2009-02-16  2:11       ` Tejun Heo
  0 siblings, 2 replies; 11+ messages in thread
From: Robert Hancock @ 2009-02-15 18:04 UTC
  To: Serguei Miridonov; +Cc: linux-kernel, Jeff Garzik

Serguei Miridonov wrote:
> Hello Robert and Jeff,
> 
> Thank you for your replies. 
> 
> On Saturday 14 February 2009, Jeff Garzik wrote:
>> Serguei Miridonov wrote:
>>> I have some problems with SATA in a new notebook PC (HP Pavilion
>>> dv5t, Intel chipset). Seagate FreeAgent Pro 1TB external drive
>>> practically can not be used with eSATA in Linux (fresh install
>>> from DVD Fedora 10, now fully updated), and yesterday I also had
>>> problem with DVD recording using internal HL-DT-ST BDDVDRW drive.
>> Some eSata fixes went into the more-recent kernels...  Can you try
>> 2.6.29-rc5?
> 
> Unfortunately, right now I cannot provide a good test bed for a new
> kernel. I also suspected a bad cable and returned it to the store.
> Recording DVDs, as you understand, cannot serve as a test: I don't do
> it on a regular basis... I will be looking for a new eSATA cable in a
> week or two, so when I have it I'll try to download and build the
> kernel for these experiments.
> 
> On Saturday 14 February 2009, Robert Hancock wrote:
>> Serguei Miridonov wrote:
>>> Both devices work with Windows Vista. Seagate external drive even
>>> in Vista produces "parity error" messages in Windows event log
>>> but OS is somehow recovering from these errors and continues to
>>> use the drive with slight slowdown (average speed varies between
>>> 60 and 110 MB/s). Of course, it could be cable/Seagate issue, but
>>> again - Vista can handle this.
>> There are a lot of issues with eSATA drives and cabling. As Jeff
>> mentioned, there are some changes in 2.6.29-rc that may improve the
>> behavior, but the root cause here is a hardware issue (you should
>> not expect very good behavior in Vista either with those errors).
> 
> I agree with you completely. Nevertheless, something like 10 errors 
> per 2GB transfer can not be the reason to give up. Vista, at least, 
> recovers and continues the data transfer. Linux simply can not return 
> the interface or connected device into operating mode. Do you think it 
> is normal?

Could be that Linux is being a bit more aggressive on error handling. In 
your case, it looks like an error occurred, triggering a hard reset of 
the device, and the controller seemed unable to talk to the device 
afterwards. If the command had just been retried, maybe it would have 
worked better. However, doing that in general can cause issues since you 
don't know what the state of the link may be..
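
Editorial note: the reset-and-revalidate cycle described above can be read off the attached log. The sketch below measures how long the kernel waited for IDENTIFY after each "SATA link up" in the eSATA failure. The function name is hypothetical, the embedded lines are copied from the attachment, and the code assumes all timestamps fall on the same day.

```python
from datetime import datetime

# Lines copied from the attached log of the failed write to the eSATA drive.
LOG = """\
Feb  8 18:40:51 localhost kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb  8 18:40:56 localhost kernel: ata6.00: qc timeout (cmd 0xec)
Feb  8 18:41:06 localhost kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb  8 18:41:16 localhost kernel: ata6.00: qc timeout (cmd 0xec)
Feb  8 18:41:26 localhost kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb  8 18:41:56 localhost kernel: ata6.00: qc timeout (cmd 0xec)
"""


def identify_waits(log_text):
    """Return seconds between each 'SATA link up' and the next 'qc timeout'."""
    waits = []
    link_up_at = None
    for line in log_text.splitlines():
        # Third whitespace-separated field is the HH:MM:SS timestamp.
        stamp = datetime.strptime(line.split()[2], "%H:%M:%S")
        if "SATA link up" in line:
            link_up_at = stamp
        elif "qc timeout" in line and link_up_at is not None:
            waits.append(int((stamp - link_up_at).total_seconds()))
            link_up_at = None
    return waits


if __name__ == "__main__":
    print(identify_waits(LOG))
```

The waits come out as 5, 10 and 30 seconds, so the kernel did retry with progressively longer timeouts before marking the device disabled; the drive never answered IDENTIFY (cmd 0xec) after the first hard reset.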

> 
>> As far as the DVD burning issue, it's hard to say for sure. It
>> looks like a write command was timing out. Could be due to your
>> drive not working well with that type of media.
> 
> Well, it could be, though I did not consider Sony DVD+R to be bad
> media. My mistake, maybe... Anyway, even if that is true, why can k3b
> (or whatever backend is used for recording) no longer establish a
> connection with the drive? The kernel must keep the hardware working
> even if there were some intermittent interface errors.
> 
>>> It appears that Linux kernel has problems with
>>> error-handling/reset of SATA hardware. I have found a lot of
>>> reports regarding SATA problems: data transfer failures, CD/DVD
>>> recording, waking up from suspend to RAM, etc. Aren't they all
>>> related? Can Linux SATA chipsets drivers
>> Not related at all, mostly.. though a lot of people seem to think
>> they are. Often times people think problems are related because the
>> error messages seem similar, and even the same error can be
>> triggered by numerous different problems, often not the fault of
>> the kernel.
> 
> I'm not talking now about errors triggered by the kernel due to some 
> bugs. What I see in the logs, this is the kernel fault to recover from 
> errors, not causing it. I hope that this is fixed already in newer 
> kernels, though I could not find such information in changelogs.
> 
> I could be wrong, of course, but it seems to me that if kernel can 
> really reset the interface and return it and connected devices to 
> operating mode, then most of issues mentioned above may become not so 
> critical and people could live with them until root cause is fixed 
> properly.
> 
> Maybe resetting the interface will not help in all cases if a device 
> is left in some screwed up state due to earlier poor error handling... 
> Well, this is another issue which can be device-vendor-dependent... 
> However, regarding external Seagate drive, Vista does not have any 
> special driver to handle its errors, it just works...

See above..


* Re: Intel ICH9M/M-E SATA error-handling/reset problems
  2009-02-15 18:04     ` Robert Hancock
@ 2009-02-15 19:41       ` Serguei Miridonov
  2009-02-15 20:15         ` Robert Hancock
  2009-02-16  2:11       ` Tejun Heo
  1 sibling, 1 reply; 11+ messages in thread
From: Serguei Miridonov @ 2009-02-15 19:41 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel, Jeff Garzik

On Sunday 15 February 2009, Robert Hancock wrote:
> Serguei Miridonov wrote:
> > On Saturday 14 February 2009, Robert Hancock wrote:
> >> Serguei Miridonov wrote:
> > ... something like 10
> > errors per 2GB transfer can not be the reason to give up. Vista,
> > at least, recovers and continues the data transfer. Linux simply
> > can not return the interface or connected device into operating
> > mode. Do you think it is normal?
>
> Could be that Linux is being a bit more aggressive on error
> handling. In your case, it looks like an error occurred, triggering
> a hard reset of the device, and the controller seemed unable to
> talk to the device afterwards. If the command had just been
> retried, maybe it would have worked better. However, doing that in
> general can cause issues since you don't know what the state of the
> link may be..

Hmm... I was sure there are general recommendations from chipset 
vendors regarding recovery procedures.

What behavior is expected from a SATA-connected device if it 
detects a parity error in received data? I'm not familiar with the 
PATA/SATA protocols, but I suppose that it just doesn't send the data 
to the physical disk for recording, asserts the error line and waits 
for the next command from the controller. If the data block was too 
big to keep in the drive's cache memory, it may also report the number 
of bytes successfully (physically) written, so that the software does 
not send them again.

If the above is correct, then the kernel should only log the error, do 
some housekeeping work for the controller and attempt to send the data 
again. There is no need for a hard reset right after the first error.

Another question is how the drive reacts to a hard reset... My error 
log shows that both drives do not like it for some reason - they stop 
responding sometimes, so maybe some additional programming of the 
drives is necessary after a hard reset... Something which is done in 
the BIOS after power on... I don't know...

Well, this is becoming interesting... I've got the datasheet for ICH9 
but don't have the kernel driver source to check what the messages in 
the log file really mean. Could you point me to a link to the 
uncompressed kernel tree where I can see the source files?



* Re: Intel ICH9M/M-E SATA error-handling/reset problems
  2009-02-15 19:41       ` Serguei Miridonov
@ 2009-02-15 20:15         ` Robert Hancock
  2009-02-15 21:55           ` Serguei Miridonov
  0 siblings, 1 reply; 11+ messages in thread
From: Robert Hancock @ 2009-02-15 20:15 UTC (permalink / raw)
  To: Serguei Miridonov; +Cc: linux-kernel, Jeff Garzik, Tejun Heo

Serguei Miridonov wrote:
> On Sunday 15 February 2009, Robert Hancock wrote:
>> Serguei Miridonov wrote:
>>> On Saturday 14 February 2009, Robert Hancock wrote:
>>>> Serguei Miridonov wrote:
>>> ... something like 10
>>> errors per 2GB transfer can not be the reason to give up. Vista,
>>> at least, recovers and continues the data transfer. Linux simply
>>> can not return the interface or connected device into operating
>>> mode. Do you think it is normal?
>> Could be that Linux is being a bit more aggressive on error
>> handling. In your case, it looks like an error occurred, triggering
>> a hard reset of the device, and the controller seemed unable to
>> talk to the device afterwards. If the command had just been
>> retried, maybe it would have worked better. However, doing that in
>> general can cause issues since you don't know what the state of the
>> link may be..
> 
> Hmm... I was sure there are general recommendations from chipset 
> vendors regarding recovery procedures.
> 
> What is the behavior expected from a SATA connected device if it 
> detects parity error in received data? I'm not familiar with PATA/SATA 
> protocols but I suppose that it just doesn't send data to the physical 
> disk for recording, asserts the error line and waits next command from 
> the controller. If the data block was too big to keep it in the drive 
> cache memory, it may also set number of successfully (physically) 
> written bytes to prevent the software to send it again.

In the case of a CRC error the error flag gets set and the transfer is 
aborted by whichever side detects it. In this case the entire transfer 
gets retried.

> 
> If the above is correct then the kernel should only log the error, do 
> some housekeeping work for the controller and attempt to send data 
> again. There is no need for hard reset right after first error.

Right now an interface CRC error is considered an ATA bus error, which 
always triggers a reset. It's possible this could be relaxed in some 
cases, but the issue is that if CRC errors are occurring, the link may be 
in an invalid state which simply retrying the command will not clear.
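
The policy described above can be sketched roughly as follows. This is
only an illustration of the decision being discussed, not libata's actual
code: the names ErrKind, Action and recovery_action are made up for this
sketch, and the "plain retry" branch for device-reported errors is an
assumption added for contrast.

```python
from enum import Enum, auto


class ErrKind(Enum):
    DEVICE_ERROR = auto()  # device completed the command with an error status
    BUS_CRC = auto()       # interface CRC error detected during the transfer
    TIMEOUT = auto()       # command never completed


class Action(Enum):
    RETRY = auto()
    HARD_RESET_THEN_RETRY = auto()


def recovery_action(kind):
    """Pick a recovery step for a failed command (hypothetical helper)."""
    # A CRC error is treated as a bus error, and after a timeout the
    # controller and device states are unknown: both get a reset before
    # the whole transfer is retried.
    if kind in (ErrKind.BUS_CRC, ErrKind.TIMEOUT):
        return Action.HARD_RESET_THEN_RETRY
    # Assumption for this sketch: the link itself is still trusted.
    return Action.RETRY
```

Relaxing the policy, as suggested, would amount to moving BUS_CRC into
the plain-retry branch for some cases, at the risk of retrying over a
link that is in an invalid state.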

Tejun, any thoughts?

> 
> Another question is how the drive reacts to hard reset... My error log 
> shows that both drives do not like it for some reason - they stop 
> responding sometimes, so may be some additional programming of drives 
> is necessary after hard reset... Something which is done in BIOS after 
> power on... I don't know...

The same hard reset is done (and generally has to be done) on driver 
initialization and when a drive is hot plugged, so it should work. 
However, if the link is having problems (and it obviously is, from the 
CRC errors) the drive may not receive the reset either.

> 
> Well, it becomes interesting... I've got datasheet for ICH9 but don't 
> have a kernel driver source to check what messages in log file really 
> mean. Could you point me a link to the uncompressed kernel tree where 
> I can see source files?
> 

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git is 
likely the easiest place to view it.


* Re: Intel ICH9M/M-E SATA error-handling/reset problems
  2009-02-15 20:15         ` Robert Hancock
@ 2009-02-15 21:55           ` Serguei Miridonov
  0 siblings, 0 replies; 11+ messages in thread
From: Serguei Miridonov @ 2009-02-15 21:55 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel, Jeff Garzik, Tejun Heo

On Sunday 15 February 2009, Robert Hancock wrote:
> Right now interface CRC error is considered an ATA bus error which
> always triggers a reset.

Well, my very strong opinion, based just on general physics, is that 
the error rate on SATA can be (and will be) much higher than on PATA. 
PATA operates at lower frequencies and its cables are much shorter. 
eSATA cables are longer and work at up to 3Gb/s. Moreover, consider 
all these consumer-grade connectors, cables, etc. So, CRC errors could 
be quite common, and software needs to handle them properly to keep 
transfers fast and maintain communication with the device.

> It's possible this could be relaxed in
> some cases, but the issue is that if CRC errors are occurring the
> link may be in an invalid state which simply retrying the command
> will not clear.

Let's think positively ;-). If a CRC error occurs (in a data or command 
sequence), the device just doesn't accept what it received in the 
last transfer. So, it should wait for what the host says next. I think 
that before doing a hard reset, or whatever is necessary to completely 
restart the interface together with the connected device, the kernel 
should try to check whether the link is up and the device is listening. 
Why not try a short request to let the device send something short 
in response?
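
To make the proposal concrete, here is a minimal sketch of the escalation
I have in mind. It is not how the kernel does it; link_alive,
retry_command and hard_reset are hypothetical callables standing in for
"send a short request", "reissue the failed command" and "hard reset".

```python
def recover(link_alive, retry_command, hard_reset):
    """Return the list of recovery steps taken, in order.

    All three arguments are hypothetical callables supplied by the caller:
    link_alive() and retry_command() return True on success,
    hard_reset() returns nothing.
    """
    steps = ["probe"]
    if link_alive():
        # Device answered the short request: try the cheap path first.
        steps.append("retry")
        if retry_command():
            return steps
    # Probe or light retry failed: fall back to the heavy path.
    steps.append("hard_reset")
    hard_reset()
    steps.append("retry")
    retry_command()
    return steps
```

With a responsive device this ends after ["probe", "retry"]; the hard
reset is reached only when the probe or the first retry fails.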

> Tejun, any thoughts?
>
> > Another question is how the drive reacts to hard reset... My
> > error log shows that both drives do not like it for some reason -
> > they stop responding sometimes, so may be some additional
> > programming of drives is necessary after hard reset... Something
> > which is done in BIOS after power on... I don't know...
>
> The same hard reset is done (and generally has to be done) on
> driver initialization and when a drive is hot plugged, so it should
> work.

It depends... If a hard reset is like a reboot for the drive's 
firmware, it may take more than 30 seconds for the Seagate external 
drive, though I'm not sure... Trying to push the interface before the 
device is ready to receive commands may be considered by the drive as 
a link problem, and it may refuse to communicate. Well, again, I'm not 
familiar with this, just speculating...

> > ... Could you point me a link to the uncompressed
> > kernel tree where I can see source files?
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git is
> likely the easiest place to view..

Thank you, I'll take a look.




* Re: Intel ICH9M/M-E SATA error-handling/reset problems
  2009-02-15 18:04     ` Robert Hancock
  2009-02-15 19:41       ` Serguei Miridonov
@ 2009-02-16  2:11       ` Tejun Heo
  2009-02-16 16:17         ` Serguei Miridonov
  1 sibling, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2009-02-16  2:11 UTC (permalink / raw)
  To: Robert Hancock; +Cc: Serguei Miridonov, linux-kernel, Jeff Garzik

Hello,

Robert Hancock wrote:
> Serguei Miridonov wrote:
>> Hello Robert and Jeff,
>>
>> Thank you for your replies.
>> On Saturday 14 February 2009, Jeff Garzik wrote:
>>> Serguei Miridonov wrote:
>>>> I have some problems with SATA in a new notebook PC (HP Pavilion
>>>> dv5t, Intel chipset). Seagate FreeAgent Pro 1TB external drive
>>>> practically can not be used with eSATA in Linux (fresh install
>>>> from DVD Fedora 10, now fully updated), and yesterday I also had
>>>> problem with DVD recording using internal HL-DT-ST BDDVDRW drive.
>>> Some eSata fixes went into the more-recent kernels...  Can you try
>>> 2.6.29-rc5?
>>
>> Unfortunately, right now I can not provide a good testing bed for a
>> new kernel. I was also thinking about bad cable and returned it to the
>> store. Recording DVDs, as you understand, can not be considered for
>> testing: I don't do it on regular basis... I will be looking for a new
>> eSATA cable in a week or two, so when I have it I'll try to download
>> and build the kernel for these experiments.

Please try a shorter (or different) cable.  Most eSATA problems are
cabling problems.  Slowing the link down to 1.5Gbps often improves the
situation a lot (windows might do this by default).  There was a
stupid bug in the speed-down logic, and slowing down to 1.5Gbps didn't
happen as designed until recently.  The fix went into -stable and should
show up in most distros soon (or just roll your own kernel).
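
As a rough illustration of what "speeding down" means here, the sketch
below steps the link from 3.0Gbps to 1.5Gbps once retries keep failing.
It is not the libata implementation; the function name and the threshold
of two failed retries are assumptions picked for the example.

```python
LINK_SPEEDS_GBPS = (3.0, 1.5)  # fastest first


def next_link_speed(current, failed_retries, threshold=2):
    """Return the link speed to use for the next attempt.

    threshold is an illustrative value, not the kernel's.
    """
    idx = LINK_SPEEDS_GBPS.index(current)
    can_slow_down = idx + 1 < len(LINK_SPEEDS_GBPS)
    if failed_retries >= threshold and can_slow_down:
        return LINK_SPEEDS_GBPS[idx + 1]
    # Either too few failures so far or already at the slowest speed.
    return current
```

On kernels that support the libata.force boot parameter, a limit such as
libata.force=1.5Gbps can also be tried by hand to see whether the slower
link speed helps with a given cable.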

>> I agree with you completely. Nevertheless, something like 10 errors
>> per 2GB transfer can not be the reason to give up. Vista, at least,
>> recovers and continues the data transfer. Linux simply can not return
>> the interface or connected device into operating mode. Do you think it
>> is normal?

Well, there isn't much point in continuing to retry if the same command
fails consecutively.  The problem was the broken speed-down logic, so
all the retries failed and the FS eventually received an IO failure.
This should have been fixed by recent changes.

....

>>>> It appears that Linux kernel has problems with
>>>> error-handling/reset of SATA hardware. I have found a lot of
>>>> reports regarding SATA problems: data transfer failures, CD/DVD
>>>> recording, waking up from suspend to RAM, etc. Aren't they all
>>>> related? Can Linux SATA chipsets drivers
>>> Not related at all, mostly.. though a lot of people seem to think
>>> they are. Often times people think problems are related because the
>>> error messages seem similar, and even the same error can be
>>> triggered by numerous different problems, often not the fault of
>>> the kernel.

Heh... yeah, this sometimes gets tiring.  Maybe we should reformat ATA
error messages every six months or so?  :-)

Joking aside, yes, there have been and are repeated patterns of
failures.  Some have passed (e.g. the ATAPI transfer length ones) and
some stay (cabling, power).  Nonetheless, in most cases, what people
think they are experiencing isn't quite correct.

>> I'm not talking now about errors triggered by the kernel due to some
>> bugs. What I see in the logs, this is the kernel fault to recover from
>> errors, not causing it. I hope that this is fixed already in newer
>> kernels, though I could not find such information in changelogs.
>>
>> I could be wrong, of course, but it seems to me that if kernel can
>> really reset the interface and return it and connected devices to
>> operating mode, then most of issues mentioned above may become not so
>> critical and people could live with them until root cause is fixed
>> properly.
>>
>> May be resetting the interface will not help is all cases if a device
>> is left in some screwed up state due to earlier poor error handling...
>> Well, this is another issue which can be device-vendor-dependent...
>> However, regarding external Seagate drive, Vista does not have any
>> special driver to handle its errors, it just works...

libata EH actually does pretty well in most cases.  You'll see a lot
of current and archived bug reports, but considering the number of
ATA devices (many of them crappy) out in the wild and that the
influx of bug reports has gone down considerably, I think it's doing
pretty well.

In the log, ata2.00 went down after a timeout.  The reset per se isn't
the problem; it is the right thing to do after a timeout, as the
controller and device states are unknown.  Situations like the one in
your log often happen because an ATAPI device shuts down completely
after certain transmission problems.  When this happens, there's not
much the driver can do, and a soft reboot wouldn't recover the device
either.

But seeing you're on dv5, I think you might be experiencing something
else.  Please take a look at the following bz.

  http://bugzilla.kernel.org/show_bug.cgi?id=12276

It seems recent HP laptops do something differently and make the ahci
controller behave strangely.  On dv5 and HDX16t, suspend/resume
doesn't work.  The link simply doesn't come up after resuming and this
is the _ONLY_ report of this kind of problem for all intel AHCIs ever,
so yeah HP is doing something.  I'm trying to contact HP about this
but haven't gotten anywhere yet.

So, you're more likely to be seeing a similar problem, I think.  Can you
please test whether you see the same suspend/resume problem?

Thanks.

-- 
tejun


* Re: Intel ICH9M/M-E SATA error-handling/reset problems
  2009-02-16  2:11       ` Tejun Heo
@ 2009-02-16 16:17         ` Serguei Miridonov
  2009-02-19  6:29           ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Serguei Miridonov @ 2009-02-16 16:17 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Robert Hancock, linux-kernel, Jeff Garzik

Hello,

On Sunday 15 February 2009, Tejun Heo wrote:
> Please try shorter (or different) cable. 

I will, in a few days, maybe.

> >> I agree with you completely. Nevertheless, something like 10
> >> errors per 2GB transfer can not be the reason to give up. Vista,
> >> at least, recovers and continues the data transfer. Linux simply
> >> can not return the interface or connected device into operating
> >> mode. Do you think it is normal?
>
> Well, there isn't much point in keeping retrying if the same
> command fails consecutively. 

I'm not talking about the _same_ transfer command. I mean intermittent 
errors, on average 10 parity errors per 2GB file. Let me repeat myself 
from another post:

... my very strong opinion, based just on general physics, is that 
the error rate on SATA can be (and will be) much higher than on PATA. 
PATA operates at lower frequencies and its cables are much shorter. 
eSATA cables are longer and work at up to 3Gb/s. Moreover, consider 
all these consumer-grade connectors, cables, etc. So, CRC errors could 
be quite common, and software needs to handle them properly to keep 
transfers fast and maintain communication with the device.
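
To put "10 errors per 2GB" in numbers, here is a quick back-of-the-envelope
calculation. It assumes decimal gigabytes, counts payload bits only (no
line-encoding overhead) and uses 512-byte sectors; error_spacing is just
a helper name for this sketch.

```python
def error_spacing(errors, payload_bytes, sector_bytes=512):
    """Return (payload bits per error, sectors per error)."""
    bits_per_error = payload_bytes * 8 // errors
    sectors_per_error = payload_bytes // sector_bytes // errors
    return bits_per_error, sectors_per_error


# 10 errors over a 2 GB (2 * 10**9 bytes) transfer
bits_per_error, sectors_per_error = error_spacing(10, 2 * 10**9)
print(bits_per_error)     # 1600000000
print(sectors_per_error)  # 390625
```

That is roughly one error per 1.6 billion payload bits, or one per about
390 thousand sectors: rare for any single command, but frequent enough
to show up in every large copy.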

And remember USB bulk transfers? Who takes care of the CRC check and 
retries there?

> The problem was the broken speed down
> logic, so all the retries failed and FS eventually received IO
> failure.  Should have been fixed with recent changes.

Slowing down may help to reduce the number of errors, but it may be 
that they can not be avoided completely.

> In the log, ata2.00 went down after a timeout.  The reset per-se
> isn't the problem and is the RTTD after a timeout as the controller
> and device states are unknown.  The situations like yours in the
> log often happens because an ATAPI device shuts down completely
> after certain transmission problems.  When this happens, there's
> nothing much the driver can do and soft reboot wouldn't recover the
> device either.

So, it is the kernel's job to keep things working, not break them :-)

> But seeing you're on dv5, I think you might be experiencing
> something else.  Please take a look at the following bz.
>
>   http://bugzilla.kernel.org/show_bug.cgi?id=12276

Yes, I tried to suspend to RAM, and when the laptop woke up it failed 
to communicate with the hard drive. So, I use hibernate instead.

> ... I'm trying to
> contact HP about this but hasn't gotten anywhere yet.

Please, let us know if they reply. 

Thank you.





* Re: Intel ICH9M/M-E SATA error-handling/reset problems
  2009-02-16 16:17         ` Serguei Miridonov
@ 2009-02-19  6:29           ` Tejun Heo
  0 siblings, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2009-02-19  6:29 UTC (permalink / raw)
  To: Serguei Miridonov; +Cc: Robert Hancock, linux-kernel, Jeff Garzik

Hello, Serguei.

Serguei Miridonov wrote:
>>>> I agree with you completely. Nevertheless, something like 10
>>>> errors per 2GB transfer can not be the reason to give up. Vista,
>>>> at least, recovers and continues the data transfer. Linux simply
>>>> can not return the interface or connected device into operating
>>>> mode. Do you think it is normal?
>> Well, there isn't much point in keeping retrying if the same
>> command fails consecutively. 
> 
> I'm not talking about the _same_ transfer command. I mean intermittent 
> errors, average 10 parity errors per 2GB file. Let me repeat myself 
> from another post:
> 
> ... my very strong opinion based just on general physics is that 
> error rate on SATA can be (and will be) much higher than that one on 
> PATA. PATA operates at lower frequencies and cables are much shorter. 
> eSATA cables are longer and work at up to 3Gb/s. Moreover, consider 
> all these consumer-grade connectors, cables, etc. So, CRC errors could 
> be quite common and software needs to handle them properly to keep 
> transfers fast and maintain the communication with a device.

The kernel doesn't give up after intermittent errors.

> And, remember USB bulk transfer? Who is taking care on CRC check and 
> retries there?

What you're describing is already handled.  No need to worry about it.

>> The problem was the broken speed down
>> logic, so all the retries failed and FS eventually received IO
>> failure.  Should have been fixed with recent changes.
> 
> Slow down may help to reduce amount of errors but it may happen that 
> they can not be avoided completely.
> 
>> In the log, ata2.00 went down after a timeout.  The reset per-se
>> isn't the problem and is the RTTD after a timeout as the controller
>> and device states are unknown.  The situations like yours in the
>> log often happens because an ATAPI device shuts down completely
>> after certain transmission problems.  When this happens, there's
>> nothing much the driver can do and soft reboot wouldn't recover the
>> device either.
> 
> So, this is the kernel job to keep things working, not break them :-)

Yeah, and other than the hardware quirkiness on your machine, it
already works fine.

>> But seeing you're on dv5, I think you might be experiencing
>> something else.  Please take a look at the following bz.
>>
>>   http://bugzilla.kernel.org/show_bug.cgi?id=12276
> 
> Yes, I tried to suspend to RAM and when the laptop waked up it failed 
> to communicate with the hard drive. So, I use hibernate instead.

Can you please try to take a look at the kernel log after the kernel
resumes and see whether you're actually seeing the same problem?

Thanks.

-- 
tejun


Thread overview: 11+ messages
2009-02-14 20:06 Intel ICH9M/M-E SATA error-handling/reset problems Serguei Miridonov
2009-02-14 20:53 ` Jeff Garzik
2009-02-14 22:01 ` Robert Hancock
2009-02-15 18:00   ` Serguei Miridonov
2009-02-15 18:04     ` Robert Hancock
2009-02-15 19:41       ` Serguei Miridonov
2009-02-15 20:15         ` Robert Hancock
2009-02-15 21:55           ` Serguei Miridonov
2009-02-16  2:11       ` Tejun Heo
2009-02-16 16:17         ` Serguei Miridonov
2009-02-19  6:29           ` Tejun Heo
