* Re: SMART causes disks to go offline on an LSI SAS 1068 controller
@ 2009-09-14 22:34 Richard Scobie
2009-09-15 7:25 ` Gabor Gombas
0 siblings, 1 reply; 4+ messages in thread
From: Richard Scobie @ 2009-09-14 22:34 UTC (permalink / raw)
To: linux-scsi
Gabor Gombas wrote:
> I'm having problems when using smartmontools with SATA disks behind an
> LSI SAS controller. The machine is a Dell PowerEdge 1950-II, the
> controller in question:
<snip>
> Is this a kernel bug (2.6.22 at least did not drop the disks), or a
> bug in smartmontools?
It is possibly a smartmontools build issue - it does not like -pie or
-fpie compiler flags.
See: https://bugzilla.redhat.com/show_bug.cgi?id=452389
and I reported it on the smartmontools list, but there was no resolution
that I am aware of.
Regards,
Richard
* Re: SMART causes disks to go offline on an LSI SAS 1068 controller
2009-09-14 22:34 SMART causes disks to go offline on an LSI SAS 1068 controller Richard Scobie
@ 2009-09-15 7:25 ` Gabor Gombas
2009-09-23 21:39 ` Pim Zandbergen
0 siblings, 1 reply; 4+ messages in thread
From: Gabor Gombas @ 2009-09-15 7:25 UTC (permalink / raw)
To: linux-scsi; +Cc: Richard Scobie
Hi!
(Please Cc: me, I'm not subscribed)
> It is possibly a smartmontools build issue - it does not like -pie or
> -fpie compiler flags.
>
> See: https://bugzilla.redhat.com/show_bug.cgi?id=452389
>
> and I reported it on the smartmontools list, but there was no resolution
> that I am aware of.
I've seen that report, but according to the buildd logs at
https://buildd.debian.org/fetch.cgi?pkg=smartmontools;ver=5.38%2Bsvn2879-4;arch=amd64;stamp=1252352014
Debian does not use -pie or -fpie.
Also, the 2.6.22 kernel did not drop the disks using the same smart
binaries (version 5.38 at that time), so something must have changed on
the kernel side too.
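
As a cross-check on the -pie/-fpie point that does not depend on the build logs, the installed binary can be inspected directly. The sketch below reads the ELF header and reports whether e_type is ET_DYN, which is what a position-independent executable normally carries. The function names and the path in the usage note are illustrative assumptions, not part of smartmontools, and ET_DYN is also used by ordinary shared objects, so this is only a hint for a file known to be an executable.

```python
import struct

ET_EXEC = 2  # fixed-address executable
ET_DYN = 3   # shared object, or position-independent executable


def elf_type(header: bytes) -> int:
    """Return the e_type field from the first bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field right after the 16-byte e_ident array
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return e_type


def looks_like_pie(path: str) -> bool:
    """True if the file at `path` has e_type == ET_DYN."""
    with open(path, "rb") as f:
        return elf_type(f.read(18)) == ET_DYN
```

For example, looks_like_pie("/usr/sbin/smartd") could be run on the affected machine; the path is an assumption about where the distribution installs smartd.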
Gabor
--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------
* Re: SMART causes disks to go offline on an LSI SAS 1068 controller
2009-09-15 7:25 ` Gabor Gombas
@ 2009-09-23 21:39 ` Pim Zandbergen
0 siblings, 0 replies; 4+ messages in thread
From: Pim Zandbergen @ 2009-09-23 21:39 UTC (permalink / raw)
To: Gabor Gombas; +Cc: linux-scsi, Richard Scobie
I have had this problem on various machines running Fedora 11 with
2.6.29 kernels.
It went away recently with kernel-2.6.30.5-43.fc11.x86_64.
Pim
* SMART causes disks to go offline on an LSI SAS 1068 controller
@ 2009-09-14 14:29 Gabor Gombas
0 siblings, 0 replies; 4+ messages in thread
From: Gabor Gombas @ 2009-09-14 14:29 UTC (permalink / raw)
To: smartmontools-support, linux-scsi
Hi,
I'm having problems when using smartmontools with SATA disks behind an
LSI SAS controller. The machine is a Dell PowerEdge 1950-II, the
controller in question:
02:08.0 SCSI storage controller [0100]: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS [1000:0054] (rev 01)
Subsystem: Dell SAS 5/i Integrated Controller [1028:1f06]
Flags: bus master, 66MHz, medium devsel, latency 72, IRQ 1270
I/O ports at ec00 [disabled] [size=256]
Memory at fc8fc000 (64-bit, non-prefetchable) [size=16K]
Memory at fc8e0000 (64-bit, non-prefetchable) [size=64K]
Expansion ROM at fc900000 [disabled] [size=1M]
Capabilities: [50] Power Management version 2
Capabilities: [98] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
Capabilities: [68] PCI-X non-bridge device
Capabilities: [b0] MSI-X: Enable- Mask- TabSize=1
Kernel driver in use: mptsas
Kernel modules: mptsas
History:
- The machine was running with kernel 2.6.22 and smartmontools 5.37 &
5.38 (from Debian) for a long time. smartd occasionally complained
about "Device: /dev/sdX, not capable of SMART self-check", but other
than that the machine was stable. smartd configuration:
/dev/sda -d sat -a -s (L/../../4/03|S/../.././02|O/../../6/03) -m root -I 190 -I 194
/dev/sdb -d sat -a -s (L/../../4/03|S/../.././02|O/../../6/03) -m root -I 190 -I 194
sda is a Samsung HD160JJ, sdb is a Seagate ST3160812AS (oh well).
- After switching to 2.6.26 (from Debian Lenny), running smartd started
to cause the disks to go offline within a couple of hours after boot.
Log sample:
Sep 7 08:50:36 gw kernel: [4917120.304690] mptscsih: ioc0: attempting task abort! (sc=ffff81007ff26940)
Sep 7 08:50:36 gw kernel: [4917120.304690] sd 0:0:1:0: [sdb] CDB: ATA command pass through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00
Sep 7 08:50:40 gw kernel: [4917126.213130] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Sep 7 08:50:40 gw kernel: [4917126.215970] mptsas: ioc0: removing sata device, channel 0, id 1, phy 1
Sep 7 08:50:40 gw kernel: [4917126.215974] port-0:1: mptsas: ioc0: delete port (1)
Sep 7 08:50:40 gw kernel: [4917126.216570] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
Sep 7 08:50:40 gw kernel: [4917126.563597] mptscsih: ioc0: task abort: SUCCESS (sc=ffff81007ff26940)
Sep 7 08:50:40 gw kernel: [4917126.563606] mptscsih: ioc0: attempting task abort! (sc=ffff81007ff26bc0)
Sep 7 08:50:40 gw kernel: [4917126.563609] sd 0:0:1:0: [sdb] CDB: Write(10): 2a 00 01 49 f2 98 00 00 08 00
Sep 7 08:50:40 gw kernel: [4917126.563617] mptscsih: ioc0: task abort: SUCCESS (sc=ffff81007ff26bc0)
Sep 7 08:50:40 gw kernel: [4917126.563623] mptscsih: ioc0: attempting target reset! (sc=ffff81007ff26940)
Sep 7 08:50:40 gw kernel: [4917126.563625] sd 0:0:1:0: [sdb] CDB: ATA command pass through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00
Sep 7 08:50:40 gw kernel: [4917126.897143] mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007ff26940)
Sep 7 08:50:40 gw kernel: [4917126.897143] mptscsih: ioc0: attempting bus reset! (sc=ffff81007ff26940)
Sep 7 08:50:40 gw kernel: [4917126.897143] sd 0:0:1:0: [sdb] CDB: ATA command pass through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00
Sep 7 08:50:44 gw kernel: [4917131.074580] mptscsih: ioc0: bus reset: SUCCESS (sc=ffff81007ff26940)
Sep 7 08:50:54 gw kernel: [4917145.159523] mptscsih: ioc0: attempting host reset! (sc=ffff81007ff26940)
Sep 7 08:50:54 gw kernel: [4917145.163513] mptbase: ioc0: Initiating recovery
Sep 7 08:51:10 gw kernel: [4917167.457273] mptscsih: ioc0: host reset: SUCCESS (sc=ffff81007ff26940)
Sep 7 08:51:10 gw kernel: [4917167.457279] sd 0:0:1:0: Device offlined - not ready after error recovery
Sep 7 08:51:10 gw kernel: [4917167.457282] sd 0:0:1:0: Device offlined - not ready after error recovery
Sep 7 08:51:10 gw kernel: [4917167.457350] sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Sep 7 08:51:10 gw kernel: [4917167.457357] end_request: I/O error, dev sdb, sector 21623448
Sep 7 08:51:10 gw kernel: [4917167.457364] raid1: Disk failure on sdb6, disabling device.
Sep 7 08:51:10 gw kernel: [4917167.457365] raid1: Operation continuing on 1 devices.
Sep 7 08:51:10 gw kernel: [4917167.457388] end_request: I/O error, dev sdb, sector 1959743
Sep 7 08:51:10 gw kernel: [4917167.457393] md: super_written gets error=-5, uptodate=0
Sep 7 08:51:10 gw kernel: [4917167.457398] raid1: Disk failure on sdb1, disabling device.
Sep 7 08:51:22 gw kernel: [4917167.457399] raid1: Operation continuing on 1 devices.
Sep 7 08:51:22 gw kernel: [4917167.457411] end_request: I/O error, dev sdb, sector 21478687
Sep 7 08:51:22 gw kernel: [4917167.457415] md: super_written gets error=-5, uptodate=0
Sep 7 08:51:22 gw kernel: [4917167.457420] raid1: Disk failure on sdb5, disabling device.
Sep 7 08:51:22 gw kernel: [4917167.457421] raid1: Operation continuing on 1 devices.
Sep 7 08:51:22 gw kernel: [4917167.461613] sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Sep 7 08:51:22 gw kernel: [4917167.526799] raid1: Disk failure on sdb2, disabling device.
Sep 7 08:51:22 gw kernel: [4917167.526801] raid1: Operation continuing on 1 devices.
After such an error I have to manually remove and re-insert the drive
to make the controller detect it again.
- Upgrading to 2.6.30 (from Debian Sid) did not help.
- Upgrading the controller firmware to the latest version available from
Dell (the driver reports: FwRev=000a3300h) did not help.
- I've found this thread:
http://marc.info/?l=smartmontools-support&m=122518510306493&w=2
It claimed that a similar bug has been fixed in smartd in CVS HEAD as
of 2008-10-30, so I've upgraded to smartmontools 5.38+svn2879-4 from
Debian Sid (smartctl -V gives: smartctl 5.39 2009-08-29 r2879), but
that also did not help.
Is this a kernel bug (2.6.22 at least did not drop the disks), or a bug
in smartmontools?
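
One data point for that question: the command that times out in the log can be identified by decoding the "ATA command pass through(16)" CDB. The sketch below follows the SAT layout of that CDB (opcode 0x85, protocol in byte 1, transfer flags in byte 2, sector count in byte 6, ATA command in byte 14). The function name and the two small lookup tables are illustrative and deliberately incomplete.

```python
# Subset of SAT protocol values and ATA command opcodes, enough for this log.
PROTOCOLS = {3: "Non-data", 4: "PIO Data-In", 5: "PIO Data-Out", 6: "DMA"}
ATA_COMMANDS = {0xEC: "IDENTIFY DEVICE", 0xB0: "SMART"}


def decode_ata_passthrough16(cdb_hex):
    """Decode selected fields of an ATA PASS-THROUGH (16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x85:
        raise ValueError("not an ATA PASS-THROUGH (16) CDB")
    return {
        # Byte 1: bits 4..1 = PROTOCOL, bit 0 = EXTEND (48-bit command)
        "protocol": PROTOCOLS.get((cdb[1] >> 1) & 0x0F, "other"),
        "extend": bool(cdb[1] & 0x01),
        # Byte 2: bit 5 = CK_COND, bit 3 = T_DIR (1 = from device)
        "ck_cond": bool(cdb[2] & 0x20),
        "from_device": bool(cdb[2] & 0x08),
        # Byte 6: low byte of the sector count
        "sector_count": cdb[6],
        # Byte 14: ATA command register
        "command": ATA_COMMANDS.get(cdb[14], hex(cdb[14])),
    }


info = decode_ata_passthrough16("85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00")
print(info["command"], "/", info["protocol"])
```

For the CDB in the log this decodes to IDENTIFY DEVICE with a one-sector PIO data-in transfer, so the command being aborted is a plain identify request rather than a SMART (0xB0) command.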
Gabor
--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------