Re: SMART causes disks to go offline on an LSI SAS 1068 controller


* Re: SMART causes disks to go offline on an LSI SAS 1068 controller
From: Richard Scobie @ 2009-09-14 22:34 UTC
  To: linux-scsi

Gabor Gombas wrote:

 > I'm having problems when using smartmontools with SATA disks behind an
 > LSI SAS controller. The machine is a Dell PowerEdge 1950-II, the
 > controller in question:

<snip>

 > Is this a kernel bug (2.6.22 at least did not drop the disks), or a
 > bug in smartmontools?

It is possibly a smartmontools build issue - it does not like -pie or 
-fpie compiler flags.

See: https://bugzilla.redhat.com/show_bug.cgi?id=452389

and I reported it on the smartmontools list, but there was no resolution 
that I am aware of.

Regards,

Richard

* Re: SMART causes disks to go offline on an LSI SAS 1068 controller
From: Gabor Gombas @ 2009-09-15  7:25 UTC
  To: linux-scsi; +Cc: Richard Scobie

Hi!

(Please Cc: me, I'm not subscribed)

> It is possibly a smartmontools build issue - it does not like -pie or
> -fpie compiler flags.
> 
> See: https://bugzilla.redhat.com/show_bug.cgi?id=452389
>
> and I reported it on the smartmontools list, but there was no resolution
> that I am aware of.

I've seen that report, but according to the buildd logs at
https://buildd.debian.org/fetch.cgi?pkg=smartmontools;ver=5.38%2Bsvn2879-4;arch=amd64;stamp=1252352014
Debian does not use -pie or -fpie.
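
One way to check the PIE question on a given system is to inspect the installed binary itself: the e_type field of the ELF header is ET_DYN for position-independent executables (and for shared libraries) and ET_EXEC for traditional fixed-address executables. The Python sketch below reads that field. The helper names are hypothetical, the /usr/sbin/smartd path in the comment is only an assumed install location, and an ET_DYN result is a heuristic rather than proof of which compiler flags were used.

```python
import struct
import sys

ET_EXEC = 2  # fixed-address executable (traditional, non-PIE)
ET_DYN = 3   # shared object; PIE executables also use this type


def elf_type(path: str) -> int:
    """Return the e_type field of an ELF file's header."""
    with open(path, "rb") as f:
        header = f.read(18)  # 16-byte e_ident followed by the 2-byte e_type
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError(f"{path}: not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        fmt = "<H"
    elif header[5] == 2:
        fmt = ">H"
    else:
        raise ValueError(f"{path}: unknown ELF data encoding {header[5]}")
    # e_type sits at offset 16 for both 32-bit and 64-bit ELF files.
    (e_type,) = struct.unpack_from(fmt, header, 16)
    return e_type


def looks_like_pie(path: str) -> bool:
    """Heuristic: an executable of type ET_DYN was most likely linked with -pie."""
    return elf_type(path) == ET_DYN


if __name__ == "__main__":
    # Example: python3 elf_pie_check.py /usr/sbin/smartd  (path is an assumption)
    names = {ET_EXEC: "ET_EXEC (not PIE)", ET_DYN: "ET_DYN (PIE or shared object)"}
    for path in sys.argv[1:]:
        print(f"{path}: {names.get(elf_type(path), 'other')}")
```

Only the first 18 bytes are read, so the check works the same for 32-bit and 64-bit binaries.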

Also, the 2.6.22 kernel did not drop the disks when running the same
smartmontools binaries (version 5.38 at that time), so something must
have changed on the kernel side too.

Gabor

-- 
     ---------------------------------------------------------
     MTA SZTAKI Computer and Automation Research Institute
                Hungarian Academy of Sciences
     ---------------------------------------------------------

* Re: SMART causes disks to go offline on an LSI SAS 1068 controller
From: Pim Zandbergen @ 2009-09-23 21:39 UTC
  To: Gabor Gombas; +Cc: linux-scsi, Richard Scobie

I have had this problem on various machines running Fedora 11 with
2.6.29 kernels.

It went away recently with kernel-2.6.30.5-43.fc11.x86_64.

Pim

* SMART causes disks to go offline on an LSI SAS 1068 controller
From: Gabor Gombas @ 2009-09-14 14:29 UTC
  To: smartmontools-support, linux-scsi

Hi,

I'm having problems when using smartmontools with SATA disks behind an
LSI SAS controller. The machine is a Dell PowerEdge 1950-II, the
controller in question:

02:08.0 SCSI storage controller [0100]: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS [1000:0054] (rev 01)
        Subsystem: Dell SAS 5/i Integrated Controller [1028:1f06]
        Flags: bus master, 66MHz, medium devsel, latency 72, IRQ 1270
        I/O ports at ec00 [disabled] [size=256]
        Memory at fc8fc000 (64-bit, non-prefetchable) [size=16K]
        Memory at fc8e0000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at fc900000 [disabled] [size=1M]
        Capabilities: [50] Power Management version 2
        Capabilities: [98] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
        Capabilities: [68] PCI-X non-bridge device
        Capabilities: [b0] MSI-X: Enable- Mask- TabSize=1
        Kernel driver in use: mptsas
        Kernel modules: mptsas

History:

- The machine was running with kernel 2.6.22 and smartmontools 5.37 &
  5.38 (from Debian) for a long time. smartd occasionally complained
  about "Device: /dev/sdX, not capable of SMART self-check", but other
  than that the machine was stable. smartd configuration:

  /dev/sda -d sat -a -s (L/../../4/03|S/../.././02|O/../../6/03) -m root -I 190 -I 194
  /dev/sdb -d sat -a -s (L/../../4/03|S/../.././02|O/../../6/03) -m root -I 190 -I 194

  sda is a Samsung HD160JJ, sdb is a Seagate ST3160812AS (oh well).

- After switching to 2.6.26 (from Debian Lenny), running smartd started
  to cause the disks to go offline within a couple of hours after boot.
  Log sample:

Sep  7 08:50:36 gw kernel: [4917120.304690] mptscsih: ioc0: attempting task abort! (sc=ffff81007ff26940)
Sep  7 08:50:36 gw kernel: [4917120.304690] sd 0:0:1:0: [sdb] CDB: ATA command pass through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00
Sep  7 08:50:40 gw kernel: [4917126.213130] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Sep  7 08:50:40 gw kernel: [4917126.215970] mptsas: ioc0: removing sata device, channel 0, id 1, phy 1
Sep  7 08:50:40 gw kernel: [4917126.215974]  port-0:1: mptsas: ioc0: delete port (1)
Sep  7 08:50:40 gw kernel: [4917126.216570] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
Sep  7 08:50:40 gw kernel: [4917126.563597] mptscsih: ioc0: task abort: SUCCESS (sc=ffff81007ff26940)
Sep  7 08:50:40 gw kernel: [4917126.563606] mptscsih: ioc0: attempting task abort! (sc=ffff81007ff26bc0)
Sep  7 08:50:40 gw kernel: [4917126.563609] sd 0:0:1:0: [sdb] CDB: Write(10): 2a 00 01 49 f2 98 00 00 08 00
Sep  7 08:50:40 gw kernel: [4917126.563617] mptscsih: ioc0: task abort: SUCCESS (sc=ffff81007ff26bc0)
Sep  7 08:50:40 gw kernel: [4917126.563623] mptscsih: ioc0: attempting target reset! (sc=ffff81007ff26940)
Sep  7 08:50:40 gw kernel: [4917126.563625] sd 0:0:1:0: [sdb] CDB: ATA command pass through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00
Sep  7 08:50:40 gw kernel: [4917126.897143] mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007ff26940)
Sep  7 08:50:40 gw kernel: [4917126.897143] mptscsih: ioc0: attempting bus reset! (sc=ffff81007ff26940)
Sep  7 08:50:40 gw kernel: [4917126.897143] sd 0:0:1:0: [sdb] CDB: ATA command pass through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00
Sep  7 08:50:44 gw kernel: [4917131.074580] mptscsih: ioc0: bus reset: SUCCESS (sc=ffff81007ff26940)
Sep  7 08:50:54 gw kernel: [4917145.159523] mptscsih: ioc0: attempting host reset! (sc=ffff81007ff26940)
Sep  7 08:50:54 gw kernel: [4917145.163513] mptbase: ioc0: Initiating recovery
Sep  7 08:51:10 gw kernel: [4917167.457273] mptscsih: ioc0: host reset: SUCCESS (sc=ffff81007ff26940)
Sep  7 08:51:10 gw kernel: [4917167.457279] sd 0:0:1:0: Device offlined - not ready after error recovery
Sep  7 08:51:10 gw kernel: [4917167.457282] sd 0:0:1:0: Device offlined - not ready after error recovery
Sep  7 08:51:10 gw kernel: [4917167.457350] sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Sep  7 08:51:10 gw kernel: [4917167.457357] end_request: I/O error, dev sdb, sector 21623448
Sep  7 08:51:10 gw kernel: [4917167.457364] raid1: Disk failure on sdb6, disabling device.
Sep  7 08:51:10 gw kernel: [4917167.457365] raid1: Operation continuing on 1 devices.
Sep  7 08:51:10 gw kernel: [4917167.457388] end_request: I/O error, dev sdb, sector 1959743
Sep  7 08:51:10 gw kernel: [4917167.457393] md: super_written gets error=-5, uptodate=0
Sep  7 08:51:10 gw kernel: [4917167.457398] raid1: Disk failure on sdb1, disabling device.
Sep  7 08:51:22 gw kernel: [4917167.457399] raid1: Operation continuing on 1 devices.
Sep  7 08:51:22 gw kernel: [4917167.457411] end_request: I/O error, dev sdb, sector 21478687
Sep  7 08:51:22 gw kernel: [4917167.457415] md: super_written gets error=-5, uptodate=0
Sep  7 08:51:22 gw kernel: [4917167.457420] raid1: Disk failure on sdb5, disabling device.
Sep  7 08:51:22 gw kernel: [4917167.457421] raid1: Operation continuing on 1 devices.
Sep  7 08:51:22 gw kernel: [4917167.461613] sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Sep  7 08:51:22 gw kernel: [4917167.526799] raid1: Disk failure on sdb2, disabling device.
Sep  7 08:51:22 gw kernel: [4917167.526801] raid1: Operation continuing on 1 devices.

  After such an error I have to manually remove and re-insert the drive
  to make the controller detect it again.
 
- Upgrading to 2.6.30 (from Debian Sid) did not help.

- Upgrading the controller firmware to the latest version available from
  Dell (the driver reports: FwRev=000a3300h) did not help.

- I've found this thread:
  http://marc.info/?l=smartmontools-support&m=122518510306493&w=2

  It claimed that a similar bug had been fixed in smartd in CVS HEAD as
  of 2008-10-30, so I upgraded to smartmontools 5.38+svn2879-4 from
  Debian Sid (smartctl -V gives: smartctl 5.39 2009-08-29 r2879), but
  that did not help either.
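
The CDB that the SCSI error handler keeps trying to abort in the log above, 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00, can be decoded with the ATA PASS-THROUGH(16) layout (opcode 0x85) from the T10 SAT standard. The Python sketch below does that. decode_ata_pt16 and its lookup tables are hypothetical helpers, not code from smartmontools or the kernel, and only a few protocol and command values are named. Applied to the logged CDB it reports command IDENTIFY DEVICE (0xEC), protocol PIO Data-In, t_dir_from_device and byt_blok true, t_length 2 and sector_count 1: a one-sector read of the drive's identify data sent through the controller's SAT translation. That is consistent with smartd polling the disk with -d sat, although the log alone does not show which process issued it.

```python
# Hypothetical helper (not part of smartmontools or the kernel): decode a
# SAT ATA PASS-THROUGH(16) CDB as printed in kernel "CDB:" log lines.
# Field positions follow the T10 SAT layout for opcode 0x85.

# Subset of the SAT PROTOCOL field values; anything else is reported as "other".
PROTOCOLS = {3: "Non-data", 4: "PIO Data-In", 5: "PIO Data-Out", 6: "DMA"}

# A few ATA command opcodes that smartmontools commonly issues.
ATA_COMMANDS = {0xB0: "SMART", 0xE5: "CHECK POWER MODE", 0xEC: "IDENTIFY DEVICE"}


def decode_ata_pt16(hex_bytes: str) -> dict:
    """Decode a space-separated hex dump of an ATA PASS-THROUGH(16) CDB."""
    cdb = bytes.fromhex(hex_bytes)
    if len(cdb) != 16 or cdb[0] != 0x85:
        raise ValueError("not an ATA PASS-THROUGH(16) CDB")
    protocol = (cdb[1] >> 1) & 0x0F  # byte 1, bits 4..1
    return {
        "protocol": PROTOCOLS.get(protocol, f"other ({protocol})"),
        "extend": bool(cdb[1] & 0x01),             # byte 1, bit 0: 48-bit command
        "off_line": (cdb[2] >> 6) & 0x03,          # byte 2, bits 7..6
        "ck_cond": bool(cdb[2] & 0x20),            # byte 2, bit 5: return ATA registers
        "t_dir_from_device": bool(cdb[2] & 0x08),  # byte 2, bit 3
        "byt_blok": bool(cdb[2] & 0x04),           # byte 2, bit 2: length counted in blocks
        "t_length": cdb[2] & 0x03,                 # byte 2, bits 1..0: 0 = no data, 2 = sector_count
        "features": (cdb[3] << 8) | cdb[4],
        "sector_count": (cdb[5] << 8) | cdb[6],
        "lba_low": cdb[8],
        "lba_mid": cdb[10],
        "lba_high": cdb[12],
        "device": cdb[13],
        "command": ATA_COMMANDS.get(cdb[14], f"0x{cdb[14]:02X}"),
    }


if __name__ == "__main__":
    # CDB copied from the "ATA command pass through(16)" lines in the log.
    logged = "85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00"
    for field, value in decode_ata_pt16(logged).items():
        print(f"{field}: {value}")
```

The same function can be pointed at any other pass-through CDB that shows up in a kernel log; the Write(10) CDB in the log is rejected because it is not a 16-byte opcode 0x85 command.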

Is this a kernel bug (2.6.22 at least did not drop the disks), or a bug
in smartmontools?

Gabor

