linux-kernel.vger.kernel.org archive mirror
* Frequent SATA resets with sata_nv (fwd)
From: Matthew "Cheetah" Gabeler-Lee @ 2007-06-24  0:52 UTC
  To: linux-kernel

(Please cc me on replies)

I have three Samsung HDDs (/sys/block/sda/device/model says SAMSUNG 
SP2504C) in a RAID configuration.  My system frequently (2-3x/day) 
experiences temporary lockups, which produce messages like the ones below 
in my dmesg/syslog.  The system recovers, but the hang is annoying, to say 
the least.

All three drives are connected to sata_nv ports.  Oddly, it almost 
always happens on ata6 or ata7 (the second and third ports of the 
4-port setup on my motherboard).  There is an identical drive connected 
at ata5, but I've only seen it hit that drive once or twice.
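To check the per-port pattern against the logs rather than from memory, a small script can tally the libata exception reports by port. This is a minimal sketch. It assumes syslog lines in the same format as the snippet in this message (for example `ata6.00: exception ...`). The function name is hypothetical.

```python
import re
from collections import Counter

# Matches libata error-handler reports such as
#   "kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"
# and captures the port name ("ata6"); the ".00" device suffix is optional.
EXCEPTION_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: exception\b")


def count_exceptions_per_port(lines):
    """Tally libata exception reports per ataN port from an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

To use it, pass an open log file, e.g. `count_exceptions_per_port(open("/var/log/syslog"))`. That path is an assumption: depending on the distribution, kernel messages may go to /var/log/kern.log or /var/log/messages instead.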

Googling around lkml.org, I found a few threads investigating what look 
like very similar problems.  Some of them never seemed to find a 
solution, but one seemed to come up with a fairly quick answer, namely 
that the drive's NCQ implementation was horked: 
http://lkml.org/lkml/2007/4/18/32

While I don't have older logs to verify exactly when this started, it 
was fairly recent, perhaps around my 2.6.20.1 to 2.6.21.1 kernel 
upgrade.

Any other info or tests I can provide/run to help?

Syslog snippet:
Jun 21 10:35:23 cheetah kernel: ata6: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
Jun 21 10:35:24 cheetah kernel: ata6: CPB 0: ctl_flags 0x9, resp_flags 0x0
Jun 21 10:35:24 cheetah kernel: ata6: timeout waiting for ADMA IDLE, stat=0x400
Jun 21 10:35:24 cheetah kernel: ata6: timeout waiting for ADMA LEGACY, stat=0x400
Jun 21 10:35:24 cheetah kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jun 21 10:35:24 cheetah kernel: ata6.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
Jun 21 10:35:24 cheetah kernel:          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 21 10:35:24 cheetah kernel: ata6: soft resetting port
Jun 21 10:35:24 cheetah kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 21 10:35:24 cheetah kernel: ata6.00: configured for UDMA/133
Jun 21 10:35:24 cheetah kernel: ata6: EH complete
Jun 21 10:35:24 cheetah kernel: SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
Jun 21 10:35:24 cheetah kernel: sdb: Write Protect is off
Jun 21 10:35:24 cheetah kernel: sdb: Mode Sense: 00 3a 00 00
Jun 21 10:35:24 cheetah kernel: SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

# lspci
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:0c.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 80)
01:0d.0 RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15)
05:00.0 VGA compatible controller: nVidia Corporation NV43 [GeForce 6600 GT] (rev a2)

-- 
	-Cheetah
"Reality is that which, when you stop believing in it, doesn't go away".
                -- Philip K. Dick
GPG pubkey fingerprint: A57F B354 FD30 A502 795B 9637 3EF1 3F22 A85E 2AD1


* Re: Frequent SATA resets with sata_nv (fwd)
From: Alistair John Strachan @ 2007-06-24 17:09 UTC
  To: linux-kernel

(Sorry, accidentally dropped LKML)

On Sunday 24 June 2007 01:52:30 you wrote:
> Googling around lkml.org, I found a few threads investigating what look
> like very similar problems.  Some of them never seemed to find a
> solution, but one seemed to come up with a fairly quick answer, namely
> that the drive's NCQ implementation was horked:
> http://lkml.org/lkml/2007/4/18/32

Well, there have been generic problems with the ADMA code on the CK804, but I 
think Robert fixed them (added CC). I've certainly had NO problems since 
2.6.21.

However, assuming the drive's NCQ _is_ busted and needs to be blacklisted, you 
might be able to work around the problem temporarily by loading the sata_nv 
module with adma=0, or by booting with sata_nv.adma=0. This isn't to point the 
finger at ADMA support specifically, of course; it's simply that ADMA is what 
enables the NCQ features.
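After applying the workaround, it helps to confirm that the setting actually took effect. Below is a minimal sketch of that check. It assumes that sata_nv exports its `adma` parameter read-only at /sys/module/sata_nv/parameters/adma, and that boolean module parameters read back as "Y"/"N". Both of these are assumptions about the running kernel. The function names are hypothetical.

```python
from pathlib import Path


def sata_nv_adma_enabled(sys_root="/sys"):
    """Report the current value of the sata_nv 'adma' module parameter.

    Assumes the parameter is exported at <sys_root>/module/sata_nv/parameters/adma
    and that bool parameters read back as "Y"/"N" (or "1"/"0").
    Returns True/False, or None if the file is missing or unrecognised.
    """
    param = Path(sys_root) / "module" / "sata_nv" / "parameters" / "adma"
    try:
        value = param.read_text().strip()
    except OSError:
        return None  # sata_nv not loaded, or the parameter is not exported
    if value in ("Y", "1"):
        return True
    if value in ("N", "0"):
        return False
    return None


def cmdline_disables_adma(cmdline):
    """True if the kernel command line contains the exact token sata_nv.adma=0."""
    return "sata_nv.adma=0" in cmdline.split()
```

If the option was applied, either at module load (adma=0) or on the boot line, `sata_nv_adma_enabled()` should return False. `cmdline_disables_adma(open("/proc/cmdline").read())` only confirms that the boot option was passed; it does not confirm that the driver honoured it.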

It'd be good if you could report back whether this helps fix it.

> While I don't have older logs to verify exactly when this started, it
> was fairly recent, perhaps around my 2.6.20.1 to 2.6.21.1 kernel
> upgrade.

Yes, this is probably around the time ADMA became the default.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


* Re: Frequent SATA resets with sata_nv (fwd)
From: Heinz Ulrich Stille @ 2007-06-26 17:29 UTC
  To: linux-kernel; +Cc: ide

On Tuesday 26 June 2007, Matthew "Cheetah" Gabeler-Lee wrote:
> On Sun, 24 Jun 2007, Robert Hancock wrote:
> I selected that model in part because it at least claimed to support
> NCQ.  I'm not sure whether it reflects NCQ, but
> /sys/block/sd[abc]/device/queue_{depth,type} show 31 and simple,
> respectively.  Samsung's page for the drive also says it supports NCQ:

I seem to have the same problem, just not as frequently: four drives of this
type, connected to an SiI 3132 and an nVidia MCP55. Resets occur on both
controllers, and only with this drive type, never with any of my several
Maxtor 6B200M0 drives. I never thought to check before, but all drives on the
nv controller have NCQ disabled according to /sys/block. So maybe it's not
(directly) NCQ-related?

MfG, Ulrich

-- 
Heinz Ulrich Stille / Tel.: +49-541-9400473 / Fax: +49-541-9400450
design_d gmbh / Wilhelmstr. 16 / 49076 Osnabrück / www.design-d.de
Osnabrück HRB 19116 / Geschäftsführer: Günter Tammen, Rolf Tammen



* Re: Frequent SATA resets with sata_nv (fwd)
From: Matthew "Cheetah" Gabeler-Lee @ 2007-06-26 16:06 UTC
  To: Robert Hancock; +Cc: linux-kernel, ide

On Sun, 24 Jun 2007, Robert Hancock wrote:

> Matthew "Cheetah" Gabeler-Lee wrote:
> > (Please cc me on replies)
> > 
> > I have three Samsung HDDs (/sys/block/sda/device/model says SAMSUNG SP2504C)
> > in a RAID configuration.  My system frequently (2-3x/day) experiences
> > temporary lockups, which produce messages as below in my dmesg/syslog.  The
> > system recovers, but the hang is annoying to say the least.
> 
> Does this drive actually support NCQ? I can't tell from this part of the log.

I selected that model in part because it at least claimed to support 
NCQ.  I'm not sure whether it reflects NCQ, but 
/sys/block/sd[abc]/device/queue_{depth,type} show 31 and simple, 
respectively.  Samsung's page for the drive also says it supports NCQ: 
http://www.samsung.com/Products/HardDiskDrive/SpinPointPSeries/HardDiskDrive_SpinPointPSeries_SP2504C.htm
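The sysfs values mentioned above can be collected for all disks at once. This is a minimal sketch that reads queue_depth and queue_type for each sd* device. It rests on a heuristic that is an assumption, not something stated in this thread: libata reports a queue depth of 1 when NCQ is off, and a larger depth (such as the 31 seen here) when it is in use. The function names are hypothetical.

```python
from pathlib import Path


def read_queue_settings(sys_block="/sys/block", pattern="sd*"):
    """Return {disk: (queue_depth, queue_type)} for disks under sys_block."""
    result = {}
    for dev in sorted(Path(sys_block).glob(pattern)):
        dev_dir = dev / "device"
        try:
            depth = int((dev_dir / "queue_depth").read_text().strip())
            qtype = (dev_dir / "queue_type").read_text().strip()
        except (OSError, ValueError):
            continue  # attribute missing or unreadable: skip this disk
        result[dev.name] = (depth, qtype)
    return result


def ncq_looks_active(depth):
    # Heuristic assumption: depth 1 means no command queuing; anything larger
    # (typically 31 with libata) suggests NCQ is enabled for the device.
    return depth > 1
```

For example: `for disk, (depth, qtype) in read_queue_settings().items(): print(disk, depth, qtype, ncq_looks_active(depth))`.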

-- 
	-Cheetah


* Re: Frequent SATA resets with sata_nv (fwd)
From: Robert Hancock @ 2007-06-24 19:53 UTC
  To: Matthew "Cheetah" Gabeler-Lee; +Cc: linux-kernel, ide

(ccing linux-ide)

Matthew "Cheetah" Gabeler-Lee wrote:
> (Please cc me on replies)
> 
> I have three Samsung HDDs (/sys/block/sda/device/model says SAMSUNG 
> SP2504C) in a RAID configuration.  My system frequently (2-3x/day) 
> experiences temporary lockups, which produce messages like the ones below 
> in my dmesg/syslog.  The system recovers, but the hang is annoying, to say 
> the least.
> 
> All three drives are connected to sata_nv ports.  Oddly, it almost 
> always happens on ata6 or ata7 (the second and third ports of the 
> 4-port setup on my motherboard).  There is an identical drive connected 
> at ata5, but I've only seen it hit that drive once or twice.
> 
> Googling around lkml.org, I found a few threads investigating what look 
> like very similar problems.  Some of them never seemed to find a 
> solution, but one seemed to come up with a fairly quick answer, namely 
> that the drive's NCQ implementation was horked: 
> http://lkml.org/lkml/2007/4/18/32
> 
> While I don't have older logs to verify exactly when this started, it 
> was fairly recent, perhaps around my 2.6.20.1 to 2.6.21.1 kernel 
> upgrade.
> 
> Any other info or tests I can provide/run to help?
> 
> Syslog snippet:
> Jun 21 10:35:23 cheetah kernel: ata6: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
> Jun 21 10:35:24 cheetah kernel: ata6: CPB 0: ctl_flags 0x9, resp_flags 0x0
> Jun 21 10:35:24 cheetah kernel: ata6: timeout waiting for ADMA IDLE, stat=0x400
> Jun 21 10:35:24 cheetah kernel: ata6: timeout waiting for ADMA LEGACY, stat=0x400
> Jun 21 10:35:24 cheetah kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> Jun 21 10:35:24 cheetah kernel: ata6.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
> Jun 21 10:35:24 cheetah kernel:          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Jun 21 10:35:24 cheetah kernel: ata6: soft resetting port
> Jun 21 10:35:24 cheetah kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Jun 21 10:35:24 cheetah kernel: ata6.00: configured for UDMA/133
> Jun 21 10:35:24 cheetah kernel: ata6: EH complete
> Jun 21 10:35:24 cheetah kernel: SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
> Jun 21 10:35:24 cheetah kernel: sdb: Write Protect is off
> Jun 21 10:35:24 cheetah kernel: sdb: Mode Sense: 00 3a 00 00
> Jun 21 10:35:24 cheetah kernel: SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Unfortunately, this kind of problem is rather difficult to diagnose. 
Essentially, what's happened is that we've sent a command (in this case a 
cache flush) to the controller, but it has given no indication that it's 
done anything with it. This is somewhat different from the case in the 
link you mentioned above, where the controller indicates that it has sent 
the command and is waiting for completion. This could be a drive issue, a 
drive/controller incompatibility, a controller bug, or the driver doing 
something the controller doesn't expect.
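The command in question can be read off the `cmd` and `res` fields of the libata exception report. This Python sketch decodes only the first two bytes of each field. It assumes the field order that libata's error reporting uses in kernels of this era: command/feature for `cmd` and status/error for `res`. The opcode table is a small hand-picked subset of ATA opcodes, not a complete list. The function and table names are hypothetical.

```python
import re

# Small subset of ATA command opcodes; anything else is reported as unknown.
ATA_COMMANDS = {
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0x60: "READ FPDMA QUEUED (NCQ)",
    0x61: "WRITE FPDMA QUEUED (NCQ)",
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
}

# ATA status register bits, reported as the first byte of the "res" field.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]

# Assumed layout: "cmd CC/FF:..." (command/feature) or "res SS/EE:..." (status/error).
TF_RE = re.compile(r"\b(cmd|res) ([0-9a-f]{2})/([0-9a-f]{2}):")


def decode_taskfile_line(line):
    """Describe the first two bytes of a libata 'cmd' or 'res' report line, or return None."""
    m = TF_RE.search(line)
    if not m:
        return None
    kind, first, second = m.group(1), int(m.group(2), 16), int(m.group(3), 16)
    if kind == "cmd":
        name = ATA_COMMANDS.get(first, f"unknown opcode 0x{first:02x}")
        return f"command 0x{first:02x} = {name}, feature 0x{second:02x}"
    flags = [name for bit, name in STATUS_BITS if first & bit] or ["none"]
    return f"status 0x{first:02x} ({'|'.join(flags)}), error 0x{second:02x}"
```

Applied to the quoted log, the `cmd ea/00:...` line decodes to "command 0xea = FLUSH CACHE EXT, feature 0x00", which matches the cache flush described above. The `res 40/00:...` line decodes to "status 0x40 (DRDY), error 0x00".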

Does this drive actually support NCQ? I can't tell from this part of the log.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/

