linux-kernel.vger.kernel.org archive mirror
* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
       [not found]         ` <48E6DE07.70706@gmail.com>
@ 2008-10-07  0:37           ` Linda Walsh
  2008-10-07  1:08             ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Linda Walsh @ 2008-10-07  0:37 UTC
  To: Tejun Heo; +Cc: Smartmontools Mailing List, Bruce Allen, LKML

Ok, this is my "latest" theory about why my SATA disks have been acting
strange.

Normally I have the drives set to go into standby after 30 minutes of
inactivity. This "can" work -- unless (and this may be obvious to some
people, but it's not entirely intuitive) ...unless you query the drive's
temperature with smartctl periodically.

So..._using_ the "-n standby" on  smartctl  doesn't have an effect unless
the drive is already on standby -- but if it is *not* on standby, then
it counts as drive activity and resets the "goto sleep timer".  This
isn't  the worst problem -- more of an annoyance.  I didn't try to keep
track of all the drives' temperatures until I started having the 2nd
problem which is decidedly "nastier"...

Second problem -- if a drive is in standby, then if  smartctl  or
smartd  try to run the short or long self-tests, the kernel starts
issuing time-out errors, and the drive is eventually, _logically_
removed from the system.  It never comes back from standby.

If I *access* the drive (do an 'ls' of a directory on the drive that
isn't in the cache buffers), then after a ~20 second pause, the drive
has spun up and all is good.  But, for some reason, the "smart" test
functionality isn't causing the drive to wake up.  Instead the kernel
views the drive as OTL (OutToLunch) and removes it from the device
table.  This is, IMO, the more serious problem and is a regression
compared to PATA disk functionality.

The bit of periodically checking temps resetting the activity timer --
that isn't something I normally was trying to do -- I only started that
to try to debug why the drives were going offline (didn't know if temps
were related, among other reasons).  But in the process of checking the
temps, I was also (I am guessing about the functionality based on
observation) resetting the inactivity timer.

So the real problem is why issuing a smart command isn't re-starting
the drive -- or bringing it back from standby.  Whereas a "normal" disk
read seems to bring it back to normal functioning just fine (and can
then do the smart-test).

Does this give anyone ideas about where the problem might be?  Also
sorta explains why my hangs have been infrequent, because I've been
periodically polling the temps of all the drives -- and only when I stop
the polling would the drive timeout, then die the next morning when
smartd tried to run a short test between 1 and 2 am.




* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-07  0:37           ` [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd) Linda Walsh
@ 2008-10-07  1:08             ` Tejun Heo
  2008-10-07  1:36               ` Linda Walsh
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2008-10-07  1:08 UTC
  To: Linda Walsh; +Cc: Smartmontools Mailing List, Bruce Allen, LKML

Linda Walsh wrote:
> So the real problem is why issuing a smart command isn't re-starting
> the drive -- or bringing it back from standby.  Whereas a "normal" disk
> read seems to bring it back to normal functioning just fine (and can
> then do the smart-test).
> 
> Does this give anyone ideas about where the problem might be?  Also
> sorta explains why my hangs have been infrequent, because I've been
> periodically polling the temps of all the drives -- and only when I stop
> the polling would the drive timeout, then die the next morning when
> smartd tried to run a short test between 1 and 2 am.

Sounds like a firmware problem to me.  Issuing ATA_CMD_VERIFY on block
0 before issuing test commands should work around the problem.  Also,
which controller are you using?  Can you post the failing kernel log?
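
A minimal sketch of that workaround, assuming the legacy HDIO_DRIVE_TASK
ioctl is usable on the device node and the caller has raw-I/O privileges.
The helper names build_verify_lba0() and wake_drive() are made up for
illustration; 0x40 is the READ VERIFY SECTOR(S) opcode that the kernel
names ATA_CMD_VERIFY.  Whether this avoids the timeout on the affected
setup is not something this sketch can confirm.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/hdreg.h>          /* HDIO_DRIVE_TASK */

#define ATA_OP_READ_VERIFY 0x40   /* opcode the kernel calls ATA_CMD_VERIFY */
#define ATA_DEV_LBA        0x40   /* LBA addressing bit in the device register */

/* Fill an HDIO_DRIVE_TASK register block: verify one sector at LBA 0.
 * Layout: command, feature, nsect, lbal, lbam, lbah, device. */
static void build_verify_lba0(unsigned char args[7])
{
    assert(args != NULL);
    memset(args, 0, 7);
    args[0] = ATA_OP_READ_VERIFY; /* command */
    args[2] = 1;                  /* sector count */
    args[6] = ATA_DEV_LBA;        /* device: LBA mode, LBA bits 27:24 = 0 */
}

/* Issue the verify so a drive in standby has to touch the medium.
 * Returns 0 on success, -1 if the device cannot be opened or the ioctl fails. */
static int wake_drive(const char *dev_path)
{
    unsigned char args[7];
    int fd, rc;

    fd = open(dev_path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;
    build_verify_lba0(args);
    rc = ioctl(fd, HDIO_DRIVE_TASK, args);
    close(fd);
    return rc < 0 ? -1 : 0;
}
```

Calling wake_drive("/dev/sdc") before starting a self-test would be the
intended use; the device path is only an example taken from the log.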

-- 
tejun

* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-07  1:08             ` Tejun Heo
@ 2008-10-07  1:36               ` Linda Walsh
  2008-10-07  1:42                 ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Linda Walsh @ 2008-10-07  1:36 UTC
  To: Tejun Heo; +Cc: Smartmontools Mailing List, Bruce Allen, LKML

Controller is a Promise TX4/300
Is this what you were looking for?:

Oct  6 16:59:14 ish kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct  6 16:59:14 ish kernel: ata2.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0
Oct  6 16:59:14 ish kernel:          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct  6 16:59:14 ish kernel: ata2.00: status: { DRDY }
Oct  6 16:59:20 ish kernel: ata2: link is slow to respond, please be patient (ready=-19)
Oct  6 16:59:24 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 16:59:30 ish kernel: ata2: link is slow to respond, please be patient (ready=-19)
Oct  6 16:59:34 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 16:59:40 ish kernel: ata2: link is slow to respond, please be patient (ready=-19)
Oct  6 17:00:09 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:00:09 ish kernel: ata2: limiting SATA link speed to 1.5 Gbps
Oct  6 17:00:14 ish dhcpd: Forward map from ns1.sc.tlinx.org to 192.168.3.242 FAILED: Has an A record but no DHCID, not mine.
Oct  6 17:00:15 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:00:15 ish kernel: ata2: reset failed, giving up
Oct  6 17:00:15 ish kernel: ata2.00: disabled
Oct  6 17:00:15 ish kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen t4
Oct  6 17:00:15 ish kernel: ata2: hotplug_status 0x22
Oct  6 17:00:20 ish kernel: ata2: link is slow to respond, please be patient (ready=-19)
Oct  6 17:00:25 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:00:30 ish kernel: ata2: link is slow to respond, please be patient (ready=-19)
Oct  6 17:00:35 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:00:40 ish kernel: ata2: link is slow to respond, please be patient (ready=-19)
Oct  6 17:01:10 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:01:10 ish kernel: ata2: limiting SATA link speed to 1.5 Gbps
Oct  6 17:01:15 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:01:15 ish kernel: ata2: reset failed, giving up
Oct  6 17:01:15 ish kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen t3
Oct  6 17:01:15 ish kernel: ata2: hotplug_status 0x22
Oct  6 17:01:20 ish kernel: ata2: link is slow to respond, please be patient (ready=-19)
Oct  6 17:01:25 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:01:30 ish kernel: ata2: link is slow to respond, please be patient (ready=-19)
Oct  6 17:01:35 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:01:40 ish kernel: ata2: link is slow to respond, please be patient (ready=-19)
Oct  6 17:02:10 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:02:10 ish kernel: ata2: limiting SATA link speed to 1.5 Gbps
Oct  6 17:02:15 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:02:15 ish kernel: ata2: reset failed, giving up
Oct  6 17:02:15 ish kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen t2
Oct  6 17:02:15 ish kernel: ata2: hotplug_status 0x22
Oct  6 17:02:21 ish kernel: ata2: link is slow to respond, please be patient (ready=-19)
Oct  6 17:02:25 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:02:31 ish kernel: ata2: link is slow to respond, please be patient (ready=-19)
Oct  6 17:02:35 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:02:41 ish kernel: ata2: link is slow to respond, please be patient (ready=-19)
Oct  6 17:03:01 ish sshd[4020]: error: channel 0: chan_read_failed for istate 3
Oct  6 17:03:10 ish syslog-ng[13177]: last message repeated 2 times
Oct  6 17:03:10 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:03:10 ish kernel: ata2: limiting SATA link speed to 1.5 Gbps
Oct  6 17:03:15 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:03:15 ish kernel: ata2: reset failed, giving up
Oct  6 17:03:15 ish kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen t1
Oct  6 17:03:15 ish kernel: ata2: hotplug_status 0x22
Oct  6 17:03:21 ish kernel: ata2: link is slow to respond, please be patient (ready=-19)
Oct  6 17:03:25 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:03:31 ish kernel: ata2: link is slow to respond, please be patient (ready=-19)
Oct  6 17:03:35 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:03:41 ish kernel: ata2: link is slow to respond, please be patient (ready=-19)
Oct  6 17:04:10 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:04:10 ish kernel: ata2: limiting SATA link speed to 1.5 Gbps
Oct  6 17:04:15 ish kernel: ata2: COMRESET failed (errno=-16)
Oct  6 17:04:15 ish kernel: ata2: reset failed, giving up
Oct  6 17:04:15 ish kernel: ata2: EH pending after 5 tries, giving up
Oct  6 17:04:15 ish kernel: sd 2:0:0:0: rejecting I/O to offline device
Oct  6 17:04:15 ish kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Oct  6 17:04:15 ish kernel: sd 2:0:0:0: [sdc] START_STOP FAILED
Oct  6 17:04:33 ish kernel: Filesystem "sdc1": xfs_log_force: error 5 returned.
Oct  6 17:04:33 ish kernel: Filesystem "sdc1": xfs_log_force: error 5 returned.
Oct  6 17:05:45 ish kernel: Filesystem "sdc1": xfs_log_force: error 5 returned.
Oct  6 17:05:45 ish kernel: Filesystem "sdc1": xfs_log_force: error 5 returned.
Oct  6 17:06:31 ish kernel: Filesystem "sdc1": xfs_log_force: error 5 returned.
Oct  6 17:07:30 ish syslog-ng[13177]: last message repeated 2 times
Oct  6 17:07:33 ish kernel: Filesystem "sdc1": xfs_log_force: error 5 returned.
Oct  6 17:07:33 ish kernel: Filesystem "sdc1": xfs_log_force: error 5 returned.
Oct  6 17:08:32 ish kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct  6 17:08:32 ish kernel: ata1.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0
Oct  6 17:08:32 ish kernel:          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct  6 17:08:32 ish kernel: ata1.00: status: { DRDY }
Oct  6 17:08:38 ish kernel: ata1: link is slow to respond, please be patient (ready=-19)
Oct  6 17:08:42 ish kernel: ata1: COMRESET failed (errno=-16)
Oct  6 17:08:45 ish kernel: Filesystem "sdc1": xfs_log_force: error 5 returned.
Oct  6 17:08:48 ish kernel: ata1: link is slow to respond, please be patient (ready=-19)
Oct  6 17:08:52 ish kernel: ata1: COMRESET failed (errno=-16)
Oct  6 17:08:58 ish kernel: ata1: link is slow to respond, please be patient (ready=-19)
Oct  6 17:09:21 ish kernel: Filesystem "sdc1": xfs_log_force: error 5 returned.
Oct  6 17:09:27 ish kernel: ata1: COMRESET failed (errno=-16)
Oct  6 17:09:27 ish kernel: ata1: limiting SATA link speed to 1.5 Gbps
Oct  6 17:09:32 ish kernel: ata1: COMRESET failed (errno=-16)
Oct  6 17:09:32 ish kernel: ata1: reset failed, giving up
Oct  6 17:09:32 ish kernel: ata1.00: disabled
Oct  6 17:09:32 ish kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen t4
Oct  6 17:09:32 ish kernel: ata1: hotplug_status 0x88
Oct  6 17:09:38 ish kernel: ata1: link is slow to respond, please be patient (ready=-19)
Oct  6 17:09:42 ish kernel: ata1: COMRESET failed (errno=-16)
Oct  6 17:09:48 ish kernel: ata1: link is slow to respond, please be patient (ready=-19)
Oct  6 17:09:52 ish kernel: ata1: COMRESET failed (errno=-16)
Oct  6 17:09:57 ish kernel: Filesystem "sdc1": xfs_log_force: error 5 returned.
Oct  6 17:09:58 ish kernel: ata1: link is slow to respond, please be patient (ready=-19)
Oct  6 17:10:27 ish kernel: ata1: COMRESET failed (errno=-16)
Oct  6 17:10:27 ish kernel: ata1: limiting SATA link speed to 1.5 Gbps
Oct  6 17:10:32 ish kernel: ata1: COMRESET failed (errno=-16)
Oct  6 17:10:32 ish kernel: ata1: reset failed, giving up
Oct  6 17:10:32 ish kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen t3
Oct  6 17:10:32 ish kernel: ata1: hotplug_status 0x88
Oct  6 17:10:33 ish kernel: Filesystem "sdc1": xfs_log_force: error 5 returned.
Oct  6 17:10:38 ish kernel: ata1: link is slow to respond, please be patient (ready=-19)
Oct  6 17:10:42 ish kernel: ata1: COMRESET failed (errno=-16)
Oct  6 17:10:48 ish kernel: ata1: link is slow to respond, please be patient (ready=-19)
Oct  6 17:10:52 ish kernel: ata1: COMRESET failed (errno=-16)
Oct  6 17:10:58 ish kernel: ata1: link is slow to respond, please be patient (ready=-19)
Oct  6 17:11:09 ish kernel: Filesystem "sdc1": xfs_log_force: error 5 returned.
Oct  6 17:11:27 ish kernel: ata1: COMRESET failed (errno=-16)
Oct  6 17:11:27 ish kernel: ata1: limiting SATA link speed to 1.5 Gbps
Oct  6 17:11:33 ish kernel: ata1: COMRESET failed (errno=-16)
Oct  6 17:11:33 ish kernel: ata1: reset failed, giving up
Oct  6 17:11:33 ish kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen t2
Oct  6 17:11:33 ish kernel: ata1: hotplug_status 0x88
Oct  6 17:11:38 ish kernel: ata1: link is slow to respond, please be patient (ready=-19)
Oct  6 17:11:43 ish kernel: ata1: COMRESET failed (errno=-16)
Oct  6 17:11:45 ish kernel: Filesystem "sdc1": xfs_log_force: error 5 returned.
Oct  6 17:11:48 ish kernel: ata1: link is slow to respond, please be patient (ready=-19)
Oct  6 17:11:53 ish kernel: ata1: COMRESET failed (errno=-16)
Oct  6 17:11:58 ish kernel: ata1: link is slow to respond, please be patient (ready=-19)
Oct  6 17:12:17 ish xinetd[2021]: Exiting...
Oct  6 17:12:17 ish kernel: nfsd: last server has exited
Oct  6 17:12:17 ish kernel: nfsd: unexporting all filesystems
Oct  6 17:12:17 ish apcupsd[1989]: apcupsd exiting, signal 15
Oct  6 17:12:17 ish apcupsd[1989]: apcupsd shutdown succeeded
Oct  6 17:12:17 ish rpc.statd[2074]: Caught signal 15, un-registering and exiting.
Oct  6 17:12:17 ish mountd[2075]: Caught signal 15, un-registering and exiting.
Oct  6 17:12:21 ish kernel: Filesystem "sdc1": xfs_log_force: error 5 returned.
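
For reading the "cmd" lines in the log above: libata prints the taskfile
as command/feature:nsect:lbal:lbam:lbah/...  The sketch below decodes
that prefix for SMART commands.  The struct and function names are made
up for illustration, and only a handful of SMART subcommands are listed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* First six fields of the "cmd" line libata prints for a failed command. */
struct ata_cmd_fields {
    unsigned int command, feature, nsect, lbal, lbam, lbah;
};

/* Parse the text that follows "cmd ", e.g. "b0/da:00:00:4f:c2/...".
 * Returns 0 on success, -1 if the six leading fields are not present. */
static int parse_cmd_fields(const char *s, struct ata_cmd_fields *f)
{
    int n;

    assert(s != NULL && f != NULL);
    n = sscanf(s, "%2x/%2x:%2x:%2x:%2x:%2x",
               &f->command, &f->feature, &f->nsect,
               &f->lbal, &f->lbam, &f->lbah);
    return n == 6 ? 0 : -1;
}

/* Name the SMART subcommand selected by the feature register. */
static const char *smart_subcommand_name(const struct ata_cmd_fields *f)
{
    assert(f != NULL);
    if (f->command != 0xB0)
        return "not a SMART command";
    /* SMART commands carry the signature 0x4F/0xC2 in LBA mid/high. */
    if (f->lbam != 0x4F || f->lbah != 0xC2)
        return "SMART command without signature";
    switch (f->feature) {
    case 0xD0: return "SMART READ DATA";
    case 0xD4: return "SMART EXECUTE OFF-LINE IMMEDIATE";
    case 0xD5: return "SMART READ LOG";
    case 0xD8: return "SMART ENABLE OPERATIONS";
    case 0xD9: return "SMART DISABLE OPERATIONS";
    case 0xDA: return "SMART RETURN STATUS";
    default:   return "other SMART subcommand";
    }
}
```

Applied to "b0/da:00:00:4f:c2/00:00:00:00:00/00", this gives command 0xB0
with feature 0xDA, i.e. SMART RETURN STATUS, for both ata2.00 and ata1.00.
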
Tejun Heo wrote:
> Linda Walsh wrote:
>   
>> So the real problem is why issuing a smart command isn't re-starting
>> the drive -- or bringing it back from standby.  Whereas a "normal" disk
>> read seems to bring it back to normal functioning just fine (and can
>> then do the smart-test).
>>
>> Does this give anyone ideas about where the problem might be?  Also
>> sorta explains why my hangs have been infrequent, because I've been
>> periodically polling the temps of all the drives -- and only when I stop
>> the polling would the drive timeout, then die the next morning when
>> smartd tried to run a short test between 1 and 2 am.
>>     
>
> Sounds like a firmware problem to me.  Issuing ATA_CMD_VERIFY on block
> 0 before issuing test commands should work around the problem.  Also,
> which controller are you using?  Can you post the failing kernel log?
>
>   


* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-07  1:36               ` Linda Walsh
@ 2008-10-07  1:42                 ` Tejun Heo
  2008-10-07 10:13                   ` Linda Walsh
  2008-10-07 22:27                   ` Linda Walsh
  0 siblings, 2 replies; 22+ messages in thread
From: Tejun Heo @ 2008-10-07  1:42 UTC
  To: Linda Walsh; +Cc: Smartmontools Mailing List, Bruce Allen, LKML

Linda Walsh wrote:
> Controller is a Promise TX4/300

Yeap.  After the drive goes offline, does unplugging and replugging
the power cable to the harddrive make it come back?

-- 
tejun

* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-07  1:42                 ` Tejun Heo
@ 2008-10-07 10:13                   ` Linda Walsh
  2008-10-07 22:27                   ` Linda Walsh
  1 sibling, 0 replies; 22+ messages in thread
From: Linda Walsh @ 2008-10-07 10:13 UTC
  To: Tejun Heo; +Cc: Smartmontools Mailing List, Bruce Allen, LKML

Tejun Heo wrote:
> Linda Walsh wrote:
>   
>> Controller is a Promise TX4/300
>>     
>
> Yeap.  After the drive goes offline, does unplugging and replugging
> the power cable to the harddrive makes it come back?
>
>   
That's not easy to do.  It's an internal drive ...  will have to find
some time to take the system down and apart for that type of testing..

If I powercycle the whole machine it comes back up ...but that's
probably not what you mean...:-/


* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-07  1:42                 ` Tejun Heo
  2008-10-07 10:13                   ` Linda Walsh
@ 2008-10-07 22:27                   ` Linda Walsh
  2008-10-07 23:59                     ` Tejun Heo
  1 sibling, 1 reply; 22+ messages in thread
From: Linda Walsh @ 2008-10-07 22:27 UTC
  To: Tejun Heo; +Cc: Smartmontools Mailing List, Bruce Allen, LKML

Tejun Heo wrote:
> Linda Walsh wrote:
>> Controller is a Promise TX4/300
>
> Yeap.  After the drive goes offline, does unplugging and replugging
> the power cable to the harddrive makes it come back?
>
----
    No.  It hangs the computer about 2-3 seconds after plugging the
drives back in.  Did it twice to verify it wasn't a fluke.  Verified
drives removed from /dev, then plugged them back in -- was able to do
about 1-2 ls commands on /dev, then keyboard goes dead.

    First time I tried unplugging the power cables and replugging --
that hung...  2nd time tried unplugging a sata cable and replugging --
that hung too.

    Hopefully you won't need any more tests of this exact nature...? :-)




* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-07 22:27                   ` Linda Walsh
@ 2008-10-07 23:59                     ` Tejun Heo
  2008-10-22  3:40                       ` Promise SATA-standby +selftest=hungdrive; Sil works Linda Walsh
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2008-10-07 23:59 UTC
  To: Linda Walsh; +Cc: Smartmontools Mailing List, Bruce Allen, LKML

Linda Walsh wrote:
> Tejun Heo wrote:
>> Linda Walsh wrote:
>>> Controller is a Promise TX4/300
>>
>> Yeap.  After the drive goes offline, does unplugging and replugging
>> the power cable to the harddrive makes it come back?
>>
> ----
>    No.  It hangs the computer. about 2-3 seconds after plugging the
> drives back in.  Did it twice to verify it wasn't a fluke.  Verified
> drives removed from /dev, then
> plugged them back in -- was able to do about 1-2 ls commands on /dev, then
> keyboard goes dead.
> 
>    First time I tried unplugging the power cables and replugging --
> that hung...
> 2nd time tried unplugging a sata cable and replugging -- that hung too.

Ah.. okay, so the controller went bonkers then.  Any chance you can
shell out ~15 bucks and try a sil SATA controller?

>    Hopefully you won't need any more tests of this exact nature...? :-)

Wasn't it fun and empowering?  :-P

-- 
tejun

* re: Promise SATA-standby +selftest=hungdrive; Sil works...
  2008-10-07 23:59                     ` Tejun Heo
@ 2008-10-22  3:40                       ` Linda Walsh
  2008-10-22  4:13                         ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Linda Walsh @ 2008-10-22  3:40 UTC
  To: Tejun Heo; +Cc: Smartmontools Mailing List, Bruce Allen, LKML

This is with 2.6.26.5  (there are multiple other problems with 2.6.27[.0]).

The problem with the drive going "offline" doesn't happen with a
SiI 3124 (sata_sil24) controller -- so no need to unplug and replug...


I.e. when the drives are in standby, if smartd or a smartctl command
attempts to run a drive self-test (short), I get timeout errors from the
Promise controller (which hangs the sys if I try unplugging/replugging
the cable to the hung drive). 

The drives correctly spin up to speed and perform the short-test with
the sil controller.

It would seem there is a problem with the Promise controller or driver?


Tejun Heo wrote:
> Linda Walsh wrote:
>   
>> Tejun Heo wrote:
>>     
>>> Linda Walsh wrote:
>>>> Controller is a Promise TX4/300
>>>
>>> Yeap.  After the drive goes offline, does unplugging and replugging
>>> the power cable to the harddrive makes it come back?
>>>
>> ----
>>    No.  It hangs the computer. about 2-3 seconds after plugging the
>> drives back in.  Did it twice to verify it wasn't a fluke.  Verified
>> drives removed from /dev, then
>> plugged them back in -- was able to do about 1-2 ls commands on /dev, then
>> keyboard goes dead.
>>
>>    First time I tried unplugging the power cables and replugging --
>> that hung...
>> 2nd time tried unplugging a sata cable and replugging -- that hung too.
>>     
>
> Ah.. okay, so the controller went bonkers then.  Any chance you can
> shell out ~15 bucks and try a sil SATA controller?
>
>   
>>    Hopefully you won't need any more tests of this exact nature...? :-)
>>     
>
> Wasn't it fun and empowering?  :-P
>
>   


* Re: Promise SATA-standby +selftest=hungdrive; Sil works...
  2008-10-22  3:40                       ` Promise SATA-standby +selftest=hungdrive; Sil works Linda Walsh
@ 2008-10-22  4:13                         ` Tejun Heo
  0 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2008-10-22  4:13 UTC
  To: Linda Walsh; +Cc: Smartmontools Mailing List, Bruce Allen, LKML

Linda Walsh wrote:
> This is with 2.6.26.5  (there are multiple other problems with 2.6.27[.0]).
> 
> The problem with the drive going "offline" doesn't happen with a
> sil_sata(3124) controller -- so no need to unplug and replug...
> 
> 
> I.e. when the drives are in standby, if smartd or a smartctl command
> attempts to run a drive self-test (short), I get timeout errors from the
> Promise controller (which hangs the sys if I try unplugging/replugging
> the cable to the hung drive).
> The drives correctly spin up to speed and perform the short-test with
> the sil controller.
> 
> It would seem there is a problem with the Promise controller or driver?

Yeah, Mikael found out that hardreset requires a controller reset
before it.  Hopefully, it will be fixed soon.

Thanks.

-- 
tejun

* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-21  9:02               ` Tejun Heo
@ 2008-10-21  9:30                 ` Mikael Pettersson
  0 siblings, 0 replies; 22+ messages in thread
From: Mikael Pettersson @ 2008-10-21  9:30 UTC
  To: Tejun Heo
  Cc: Mikael Pettersson, Christian Mueller, Bruce Allen,
	Smartmontools Mailing List, LKML, IDE/ATA development list

Tejun Heo writes:
 > Hello,
 > 
 > Mikael Pettersson wrote:
 > > Tejun Heo writes:
 > >  > Hello, Mikael.
 > >  > I would put this into ->hardreset itself as the controller can also
 > >  > get out of sync with reality during reset.
 > > 
 > > The only thing I see going on between prereset and (hard/soft)reset
 > > is an optional freeze, so I don't see why moving the pdc_reset_port()
 > > into the beginning of hardreset() would make any difference.
 > > 
 > > sata_promise currently uses the ->hardreset and ->softreset inherited
 > > from ata_sff_port_ops, so it would need to override both to ensure that
 > > we always do pdc_reset_port() before libata does its thing. That's why
 > > I felt doing that in ->prereset would be the right solution.
 > 
 > Hmm.. reset sequence goes on like the following.
 > 
 >  1. prereset
 >  2. hardreset, if fail, retry
 >  3. follow-up softreset if requested, if fail, goto #2
 >  4. postreset, if successful
 > 
 > So, if some PHY event happens while the reset is waiting for device
 > readiness and makes the controller state go out of sync with the
 > drive.  ->prereset() will NOT be called for the following retry.

I see. Ok, then I'll forget about ->prereset and bind ->hardreset to
code which does pdc_reset_port() before invoking sata_sff_hardreset().
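
A minimal userspace sketch of that plan, not the actual driver change.
The structs and the bodies of pdc_reset_port() and sata_sff_hardreset()
are stand-in stubs so the example builds outside the kernel, the wrapper
name pdc_sata_hardreset() is hypothetical, and the rule that the link
reset only succeeds after an engine reset is an assumption made purely to
have something to demonstrate.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel types; only the fields this sketch needs. */
struct ata_port { int engine_reset_count; };
struct ata_link { struct ata_port *ap; int hardreset_count; };

/* Stub for the driver's pdc_reset_port(): resets the ATA engine. */
static void pdc_reset_port(struct ata_port *ap)
{
    ap->engine_reset_count++;
}

/* Stub for libata's sata_sff_hardreset().  Assumed model: the link
 * reset only works if the engine was reset before this attempt. */
static int sata_sff_hardreset(struct ata_link *link, unsigned int *class,
                              unsigned long deadline)
{
    (void)class;
    (void)deadline;
    link->hardreset_count++;
    return link->ap->engine_reset_count >= link->hardreset_count ? 0 : -1;
}

/* Hypothetical wrapper: always reset the port engine, then hardreset. */
static int pdc_sata_hardreset(struct ata_link *link, unsigned int *class,
                              unsigned long deadline)
{
    assert(link != NULL && link->ap != NULL);
    pdc_reset_port(link->ap);
    return sata_sff_hardreset(link, class, deadline);
}
```

In the driver this would presumably be hooked up as the ->hardreset entry
of the SATA port operations, so every retry goes through the engine reset.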

Thanks for your input.

/Mikael

* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-21  7:56             ` Mikael Pettersson
@ 2008-10-21  9:02               ` Tejun Heo
  2008-10-21  9:30                 ` Mikael Pettersson
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2008-10-21  9:02 UTC
  To: Mikael Pettersson
  Cc: Christian Mueller, Bruce Allen, Smartmontools Mailing List, LKML,
	IDE/ATA development list

Hello,

Mikael Pettersson wrote:
> Tejun Heo writes:
>  > Hello, Mikael.
>  > I would put this into ->hardreset itself as the controller can also
>  > get out of sync with reality during reset.
> 
> The only thing I see going on between prereset and (hard/soft)reset
> is an optional freeze, so I don't see why moving the pdc_reset_port()
> into the beginning of hardreset() would make any difference.
> 
> sata_promise currently uses the ->hardreset and ->softreset inherited
> from ata_sff_port_ops, so it would need to override both to ensure that
> we always do pdc_reset_port() before libata does its thing. That's why
> I felt doing that in ->prereset would be the right solution.

Hmm.. the reset sequence goes like the following.

 1. prereset
 2. hardreset, if fail, retry
 3. follow-up softreset if requested, if fail, goto #2
 4. postreset, if successful

So, if some PHY event happens while the reset is waiting for device
readiness and makes the controller state go out of sync with the
drive, ->prereset() will NOT be called for the following retry.
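
The four steps above can be sketched as a toy model that counts how often
each method runs.  This is not libata code: struct reset_model,
model_reset() and the failure counters are invented for illustration, and
the retry limit is a free parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the reset sequence; counts calls to each method. */
struct reset_model {
    int hardreset_failures_left;  /* hardreset attempts that fail first */
    bool want_softreset;          /* hardreset requests a follow-up softreset */
    int softreset_failures_left;  /* softreset attempts that fail first */
    int prereset_calls, hardreset_calls, softreset_calls, postreset_calls;
};

/* Run the sequence; returns true if a reset attempt succeeded. */
static bool model_reset(struct reset_model *m, int max_tries)
{
    assert(m != NULL);
    m->prereset_calls++;                  /* step 1: once, before any attempt */
    for (int attempt = 0; attempt < max_tries; attempt++) {
        m->hardreset_calls++;             /* step 2 */
        if (m->hardreset_failures_left > 0) {
            m->hardreset_failures_left--;
            continue;                     /* retry hardreset, prereset skipped */
        }
        if (m->want_softreset) {          /* step 3 */
            m->softreset_calls++;
            if (m->softreset_failures_left > 0) {
                m->softreset_failures_left--;
                continue;                 /* back to step 2 */
            }
        }
        m->postreset_calls++;             /* step 4: only on success */
        return true;
    }
    return false;
}
```

However many retries are needed, prereset_calls stays at 1, which is why
a controller reset placed only in ->prereset is skipped on every retry.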

As a rule, ->hardreset should be able to reset the controller from all
possible situations.  ->prereset can be used to smooth out initial
reset tries (ie. during initial probing, waiting for device readiness
before SRST for SFF controllers w/o hardreset) but at best its
function is advisory.  When things go wrong, ->hardreset should be
able to provide a solution whatever state the controller is in.

If both hard and soft resets work better with the controller reset
added, I think it would be best to override both.

Thanks.

-- 
tejun

* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-21  7:59           ` Mikael Pettersson
@ 2008-10-21  8:55             ` Tejun Heo
  0 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2008-10-21  8:55 UTC
  To: Mikael Pettersson
  Cc: Christian Mueller, Bruce Allen, Smartmontools Mailing List, LKML,
	IDE/ATA development list

Hello, Mikael.
> I did a round of regression tests which identified another
> related but different problem.
> 
> In kernels 2.6.24 and 2.6.25 sata_promise would actually recover
> from the timeouts, while in kernels 2.6.26 and 2.6.27 it would not.
> Before 2.6.26 sata_promise explicitly used sata_std_hardreset, but
> in the "make reset related methods proper port operations" commit
> (a1efdaba2dbd6fb89e23a87b66d3f4dd92c9f5af), Tejun changed sata_promise
> to use the hardreset it now inherits from ata_sff_port_ops, namely
> sata_sff_hardreset. This change looks accidental. The main difference
> between these two procedures is that the sff version will poll the
> legacy status register until the port becomes ready.

Hmm... it's quite likely that I've introduced the regression but that
commit ain't it (I actually wrote a script to verify the inheritance
change doesn't actually change the function table).  What used to be
sata_std_hardreset() is now sata_sff_hardreset().  The change was made
while separating out SFF, as [S]ATA as per the standard doesn't have
any way to wait for device readiness.  The TF-polling is SFF specific
and was thus moved out to sata_sff_hardreset().

So, in both 2.6.24 and 2.6.25, sata_promise did wait for device
readiness after hardreset as does 2.6.26 or any later version.

> Changing sata_promise to use sata_std_hardreset in 2.6.26/.27
> makes the EH after the timeouts much more reliable. Not as
> tidy as with the previous ->prereset fix, but still working.

The only behavior change between 2.6.25 and 2.6.26 as far as
sata_promise is concerned is that hardreset is preferred over
softreset.  Here's what I think is going on.

Previously, after a timeout, libata-eh would invoke softreset, and if
that worked all was fine whether hardreset actually worked or not.
Now, after something goes wrong, libata EH calls hardreset and, as
hardreset doesn't work properly without the controller reset, it
fails.  So, the libata core layer change exposed a bug in hardreset,
which was one of the reasons why the change was made - hardreset being
the last line of defense, using it only occasionally doesn't make
sense test-coverage-wise.

I agree the core layer changes can be quite confusing but they were
necessary to keep the core layer scalable.  libata now has oodles of
different types of low level drivers and things were and still are
getting quite treacherous for drivers living on the edge.

Anyways, so, please fix hardreset.  If it can't wait for device
readiness reliably, the right thing to do is to use
sata_std_hardreset(), which will trigger follow-up softreset to wait
for device readiness and classify devices.  But if adding the
controller reset makes the hardreset more reliable, please do so.

Thanks.

-- 
tejun

* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-19 23:40         ` Mikael Pettersson
  2008-10-21  4:18           ` Tejun Heo
@ 2008-10-21  7:59           ` Mikael Pettersson
  2008-10-21  8:55             ` Tejun Heo
  1 sibling, 1 reply; 22+ messages in thread
From: Mikael Pettersson @ 2008-10-21  7:59 UTC
  To: Mikael Pettersson
  Cc: Tejun Heo, Christian Mueller, Bruce Allen,
	Smartmontools Mailing List, LKML, IDE/ATA development list

On Mon, 20 Oct 2008 01:40:21 +0200, Mikael Pettersson wrote:
>On Mon, 13 Oct 2008 14:16:24 +0900, Tejun Heo wrote:
>>Mikael Pettersson wrote:
>>> - hardreset in sata_promise seems broken. I'll take a closer look
>>>   at that in about a week's time (I'll be busy with other work the
>>>   next couple of days).
>>
>>This looks like a rather serious problem, so please take a look at
>>this.
>
>I've done more tests now, and the problem is that errors detected
>outside of sata_promise itself, typically timeouts, don't trigger
>the pdc_reset_port() call needed to bring the ATA engine behind the
>port back to sanity.

I did a round of regression tests which identified another
related but different problem.

In kernels 2.6.24 and 2.6.25 sata_promise would actually recover
from the timeouts, while in kernels 2.6.26 and 2.6.27 it would not.
Before 2.6.26 sata_promise explicitly used sata_std_hardreset, but
in the "make reset related methods proper port operations" commit
(a1efdaba2dbd6fb89e23a87b66d3f4dd92c9f5af), Tejun changed sata_promise
to use the hardreset it now inherits from ata_sff_port_ops, namely
sata_sff_hardreset. This change looks accidental. The main difference
between these two procedures is that the sff version will poll the
legacy status register until the port becomes ready.

Changing sata_promise to use sata_std_hardreset in 2.6.26/.27
makes the EH after the timeouts much more reliable. Not as
tidy as with the previous ->prereset fix, but still working.
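
To make the inheritance point concrete, here is a minimal, self-contained
sketch.  It is a toy model and not libata code: the struct, the resolve
helper and every toy_* name are assumptions made up for illustration, and
the "reset" functions only return a tag instead of touching hardware.  It
shows how a driver that leaves ->hardreset unset ends up with whatever its
parent ops table provides, while a driver that sets it explicitly does not.

```c
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT the libata types. */
typedef int (*toy_reset_fn)(void);

struct toy_port_ops {
	toy_reset_fn hardreset;
	toy_reset_fn softreset;
	const struct toy_port_ops *inherits;	/* parent ops table, if any */
};

enum { TOY_STD_HARDRESET = 1, TOY_SFF_HARDRESET = 2, TOY_SFF_SOFTRESET = 3 };

/* Each "reset" just reports which implementation was picked. */
static int toy_std_hardreset(void) { return TOY_STD_HARDRESET; }
static int toy_sff_hardreset(void) { return TOY_SFF_HARDRESET; }
static int toy_sff_softreset(void) { return TOY_SFF_SOFTRESET; }

/* Hypothetical parent table, playing the role of the SFF base ops. */
static const struct toy_port_ops toy_sff_port_ops = {
	.hardreset	= toy_sff_hardreset,
	.softreset	= toy_sff_softreset,
	.inherits	= NULL,
};

/* Driver table that sets no hardreset: it gets the parent's version. */
static const struct toy_port_ops toy_promise_ops_inherited = {
	.inherits	= &toy_sff_port_ops,
};

/* Driver table that names its hardreset explicitly. */
static const struct toy_port_ops toy_promise_ops_explicit = {
	.hardreset	= toy_std_hardreset,
	.inherits	= &toy_sff_port_ops,
};

/* Walk up the inheritance chain until some table provides hardreset. */
static toy_reset_fn toy_resolve_hardreset(const struct toy_port_ops *ops)
{
	for (; ops; ops = ops->inherits)
		if (ops->hardreset)
			return ops->hardreset;
	return NULL;
}
```

Resolving against toy_promise_ops_inherited yields the SFF-flavoured
function and resolving against toy_promise_ops_explicit yields the std one,
which is the kind of silent change in behaviour described above.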

/Mikael

* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-21  4:18           ` Tejun Heo
@ 2008-10-21  7:56             ` Mikael Pettersson
  2008-10-21  9:02               ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Mikael Pettersson @ 2008-10-21  7:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Mikael Pettersson, Christian Mueller, Bruce Allen,
	Smartmontools Mailing List, LKML, IDE/ATA development list

Tejun Heo writes:
 > Hello, Mikael.
 > 
 > > --- linux-2.6.27/drivers/ata/sata_promise.c.~1~	2008-07-14 10:22:36.000000000 +0200
 > > +++ linux-2.6.27/drivers/ata/sata_promise.c	2008-10-20 00:20:58.000000000 +0200
 > > @@ -153,6 +153,7 @@ static void pdc_freeze(struct ata_port *
 > >  static void pdc_sata_freeze(struct ata_port *ap);
 > >  static void pdc_thaw(struct ata_port *ap);
 > >  static void pdc_sata_thaw(struct ata_port *ap);
 > > +static int pdc_prereset(struct ata_link *link, unsigned long deadline);
 > >  static void pdc_error_handler(struct ata_port *ap);
 > >  static void pdc_post_internal_cmd(struct ata_queued_cmd *qc);
 > >  static int pdc_pata_cable_detect(struct ata_port *ap);
 > > @@ -175,6 +176,7 @@ static const struct ata_port_operations 
 > >  	.sff_irq_clear		= pdc_irq_clear,
 > >  
 > >  	.post_internal_cmd	= pdc_post_internal_cmd,
 > > +	.prereset		= pdc_prereset,
 > >  	.error_handler		= pdc_error_handler,
 > >  };
 > >  
 > > @@ -691,6 +693,12 @@ static void pdc_sata_thaw(struct ata_por
 > >  	readl(host_mmio + hotplug_offset); /* flush */
 > >  }
 > >  
 > > +static int pdc_prereset(struct ata_link *link, unsigned long deadline)
 > > +{
 > > +	pdc_reset_port(link->ap);
 > 
 > I would put this into ->hardreset itself as the controller can also
 > get out of sync with reality during reset.

The only thing I see going on between prereset and (hard/soft)reset
is an optional freeze, so I don't see why moving the pdc_reset_port()
into the beginning of hardreset() would make any difference.

sata_promise currently uses the ->hardreset and ->softreset inherited
from ata_sff_port_ops, so it would need to override both to ensure that
we always do pdc_reset_port() before libata does its thing. That's why
I felt doing that in ->prereset would be the right solution.
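
The ordering argument can be sketched as a tiny self-contained model.
Everything here (the seq_* names, the two enums, the three-step sequence)
is an assumption for illustration and not the real libata EH code; it only
encodes "prereset runs first for either reset kind, then the chosen reset
method runs".

```c
#include <stdbool.h>

/* Which reset method EH ends up using (toy model, not libata). */
enum seq_reset_kind { SEQ_SOFTRESET, SEQ_HARDRESET };

/* Where the controller-specific port reset has been placed. */
enum seq_placement { SEQ_IN_PRERESET, SEQ_IN_HARDRESET_ONLY };

/*
 * Returns true if the controller-specific port reset ran at some point
 * during the modelled reset sequence.
 */
static bool seq_port_reset_ran(enum seq_reset_kind kind,
			       enum seq_placement where)
{
	bool port_reset_done = false;

	/* Step 1: prereset, common to both reset kinds. */
	if (where == SEQ_IN_PRERESET)
		port_reset_done = true;

	/* Step 2: optional freeze; it has no effect in this model. */

	/* Step 3: the chosen reset method. */
	if (kind == SEQ_HARDRESET && where == SEQ_IN_HARDRESET_ONLY)
		port_reset_done = true;

	return port_reset_done;
}
```

With the reset placed in prereset, both the softreset and hardreset paths
are covered by a single override; with it placed only in hardreset, the
softreset path misses it unless softreset is overridden as well.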

/Mikael

* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-19 23:40         ` Mikael Pettersson
@ 2008-10-21  4:18           ` Tejun Heo
  2008-10-21  7:56             ` Mikael Pettersson
  2008-10-21  7:59           ` Mikael Pettersson
  1 sibling, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2008-10-21  4:18 UTC (permalink / raw)
  To: Mikael Pettersson
  Cc: Christian Mueller, Bruce Allen, Smartmontools Mailing List, LKML,
	IDE/ATA development list

Hello, Mikael.

> --- linux-2.6.27/drivers/ata/sata_promise.c.~1~	2008-07-14 10:22:36.000000000 +0200
> +++ linux-2.6.27/drivers/ata/sata_promise.c	2008-10-20 00:20:58.000000000 +0200
> @@ -153,6 +153,7 @@ static void pdc_freeze(struct ata_port *
>  static void pdc_sata_freeze(struct ata_port *ap);
>  static void pdc_thaw(struct ata_port *ap);
>  static void pdc_sata_thaw(struct ata_port *ap);
> +static int pdc_prereset(struct ata_link *link, unsigned long deadline);
>  static void pdc_error_handler(struct ata_port *ap);
>  static void pdc_post_internal_cmd(struct ata_queued_cmd *qc);
>  static int pdc_pata_cable_detect(struct ata_port *ap);
> @@ -175,6 +176,7 @@ static const struct ata_port_operations 
>  	.sff_irq_clear		= pdc_irq_clear,
>  
>  	.post_internal_cmd	= pdc_post_internal_cmd,
> +	.prereset		= pdc_prereset,
>  	.error_handler		= pdc_error_handler,
>  };
>  
> @@ -691,6 +693,12 @@ static void pdc_sata_thaw(struct ata_por
>  	readl(host_mmio + hotplug_offset); /* flush */
>  }
>  
> +static int pdc_prereset(struct ata_link *link, unsigned long deadline)
> +{
> +	pdc_reset_port(link->ap);

I would put this into ->hardreset itself as the controller can also
get out of sync with reality during reset.  Other than that, looks
like the correct approach.

Thanks.

-- 
tejun

* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-13  5:16       ` Tejun Heo
  2008-10-13  7:03         ` Mikael Pettersson
@ 2008-10-19 23:40         ` Mikael Pettersson
  2008-10-21  4:18           ` Tejun Heo
  2008-10-21  7:59           ` Mikael Pettersson
  1 sibling, 2 replies; 22+ messages in thread
From: Mikael Pettersson @ 2008-10-19 23:40 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Mikael Pettersson, Christian Mueller, Bruce Allen,
	Smartmontools Mailing List, LKML, IDE/ATA development list

On Mon, 13 Oct 2008 14:16:24 +0900, Tejun Heo wrote:
>Mikael Pettersson wrote:
>> - hardreset in sata_promise seems broken. I'll take a closer look
>>   at that in about a week's time (I'll be busy with other work the
>>   next couple of days).
>
>This looks like a rather serious problem, so please take a look at
>this.

I've done more tests now, and the problem is that errors detected
outside of sata_promise itself, typically timeouts, don't trigger
the pdc_reset_port() call needed to bring the ATA engine behind the
port back to sanity.

And the reason no Promise-specific reset is done on timeouts is
that libata-eh freezes the port before calling ->error_handler.
sata_promise's error_handler only does a reset if the port is
non-frozen, and I think that's because we don't want to destroy
error status bits needed by EH autopsy.

The solution I've been testing is the straightforward one of
overriding ->prereset with code which calls pdc_reset_port()
before calling the default prereset. (See patch below.)
(Promise's own driver issues resets whenever there's a sign
of a problem.)

One of my test disks will often trigger a timeout if smartctl
accesses it when it's spun down. Previously the port would not
recover from that, but now it's just a brief reset/detect and
then it's up again.
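
For readers who want the control flow spelled out, here is a small
self-contained simulation.  It is only a sketch under stated assumptions:
the sim_* names and the struct are invented, "engine_wedged" stands in for
the ATA engine being stuck, and I assume the prereset hook is invoked from
the generic recovery that the error handler starts.  It is not the driver
code from the patch below.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy port state; NOT struct ata_port. */
struct sim_port {
	bool frozen;		/* set by EH before the error handler runs */
	bool engine_wedged;	/* true while the ATA engine is stuck */
	void (*prereset)(struct sim_port *ap);	/* optional hook */
};

/* Stand-in for the controller-specific port reset. */
static void sim_reset_port(struct sim_port *ap)
{
	ap->engine_wedged = false;
}

/* Error handler: resets only a non-frozen port, then runs recovery. */
static void sim_error_handler(struct sim_port *ap)
{
	if (!ap->frozen)
		sim_reset_port(ap);

	/* Generic recovery, assumed to call the prereset hook if set. */
	if (ap->prereset)
		ap->prereset(ap);
}

/* Model a command timeout and report whether the port recovers. */
static bool sim_timeout_recovers(bool with_prereset_fix)
{
	struct sim_port ap = {
		.frozen		= false,
		.engine_wedged	= true,
		.prereset	= with_prereset_fix ? sim_reset_port : NULL,
	};

	ap.frozen = true;	/* EH freezes the port first */
	sim_error_handler(&ap);
	return !ap.engine_wedged;
}
```

Without the hook, the frozen check skips the only reset and the engine
stays wedged; with the hook installed, the reset happens regardless of the
frozen flag.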

/Mikael

--- linux-2.6.27/drivers/ata/sata_promise.c.~1~	2008-07-14 10:22:36.000000000 +0200
+++ linux-2.6.27/drivers/ata/sata_promise.c	2008-10-20 00:20:58.000000000 +0200
@@ -153,6 +153,7 @@ static void pdc_freeze(struct ata_port *
 static void pdc_sata_freeze(struct ata_port *ap);
 static void pdc_thaw(struct ata_port *ap);
 static void pdc_sata_thaw(struct ata_port *ap);
+static int pdc_prereset(struct ata_link *link, unsigned long deadline);
 static void pdc_error_handler(struct ata_port *ap);
 static void pdc_post_internal_cmd(struct ata_queued_cmd *qc);
 static int pdc_pata_cable_detect(struct ata_port *ap);
@@ -175,6 +176,7 @@ static const struct ata_port_operations 
 	.sff_irq_clear		= pdc_irq_clear,
 
 	.post_internal_cmd	= pdc_post_internal_cmd,
+	.prereset		= pdc_prereset,
 	.error_handler		= pdc_error_handler,
 };
 
@@ -691,6 +693,12 @@ static void pdc_sata_thaw(struct ata_por
 	readl(host_mmio + hotplug_offset); /* flush */
 }
 
+static int pdc_prereset(struct ata_link *link, unsigned long deadline)
+{
+	pdc_reset_port(link->ap);
+	return ata_sff_prereset(link, deadline);
+}
+
 static void pdc_error_handler(struct ata_port *ap)
 {
 	if (!(ap->pflags & ATA_PFLAG_FROZEN))

* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-13  7:03         ` Mikael Pettersson
@ 2008-10-13  7:08           ` Tejun Heo
  0 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2008-10-13  7:08 UTC (permalink / raw)
  To: Mikael Pettersson
  Cc: Christian Mueller, Bruce Allen, Smartmontools Mailing List, LKML,
	IDE/ATA development list

Mikael Pettersson wrote:
> Tejun Heo writes:
>  > > - hardreset in sata_promise seems broken. I'll take a closer look
>  > >   at that in about a week's time (I'll be busy with other work the
>  > >   next couple of days).
>  > 
>  > This looks like a rather serious problem, so please take a look at
>  > this.  Also, after the drive went down, does unplugging and replugging
>  > the signal cable fix the problem, if not does doing the same thing
>  > with the power cable make any difference?
> 
> I couldn't bring the port back up with any combination of signal/power
> cable hotplugging.

Heh.. that rules out drive firmware stuck in wonderland.

> Even rmmod sata_promise; modprobe sata_promise failed to bring the
> port back up. Hence my suspicion that hardreset is borked or doesn't
> kick in.

It looks like the controller requires harder or more proper kick in
the ass to return to sane state.  :-)

-- 
tejun

* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-13  5:16       ` Tejun Heo
@ 2008-10-13  7:03         ` Mikael Pettersson
  2008-10-13  7:08           ` Tejun Heo
  2008-10-19 23:40         ` Mikael Pettersson
  1 sibling, 1 reply; 22+ messages in thread
From: Mikael Pettersson @ 2008-10-13  7:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Mikael Pettersson, Christian Mueller, Bruce Allen,
	Smartmontools Mailing List, LKML, IDE/ATA development list

Tejun Heo writes:
 > Mikael Pettersson wrote:
 > > - smartctl -n standby waking Linda's disks from standby is probably
 > >   a disk firmware issue or a smartctl issue. I see no evidence that
 > >   sata_promise is to blame for that.
 > 
 > /me agrees.  I think it's most likely a problem in the firmware.
 > 
 > > - hardreset in sata_promise seems broken. I'll take a closer look
 > >   at that in about a week's time (I'll be busy with other work the
 > >   next couple of days).
 > 
 > This looks like a rather serious problem, so please take a look at
 > this.  Also, after the drive went down, does unplugging and replugging
 > the signal cable fix the problem, if not does doing the same thing
 > with the power cable make any difference?

I couldn't bring the port back up with any combination of signal/power
cable hotplugging. Even rmmod sata_promise; modprobe sata_promise failed
to bring the port back up. Hence my suspicion that hardreset is borked
or doesn't kick in.

* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-12 14:55     ` Mikael Pettersson
@ 2008-10-13  5:16       ` Tejun Heo
  2008-10-13  7:03         ` Mikael Pettersson
  2008-10-19 23:40         ` Mikael Pettersson
  0 siblings, 2 replies; 22+ messages in thread
From: Tejun Heo @ 2008-10-13  5:16 UTC (permalink / raw)
  To: Mikael Pettersson
  Cc: Christian Mueller, Bruce Allen, Smartmontools Mailing List, LKML,
	IDE/ATA development list

Mikael Pettersson wrote:
> - smartctl -n standby waking Linda's disks from standby is probably
>   a disk firmware issue or a smartctl issue. I see no evidence that
>   sata_promise is to blame for that.

/me agrees.  I think it's most likely a problem in the firmware.

> - hardreset in sata_promise seems broken. I'll take a closer look
>   at that in about a week's time (I'll be busy with other work the
>   next couple of days).

This looks like a rather serious problem, so please take a look at
this.  Also, after the drive went down, does unplugging and replugging
the signal cable fix the problem, if not does doing the same thing
with the power cable make any difference?

Thanks.

-- 
tejun

* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-08  0:28   ` Tejun Heo
@ 2008-10-12 14:55     ` Mikael Pettersson
  2008-10-13  5:16       ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Mikael Pettersson @ 2008-10-12 14:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Christian Mueller, Bruce Allen, Smartmontools Mailing List, LKML,
	IDE/ATA development list, Mikael Pettersson

Tejun Heo writes:
 > (cc'ing linux-ide and Mikael Pettersson)
 > 
 > Christian Mueller wrote:
 > > The messages I saw on the screen.
 > 
 > If the drive doesn't wake up and times out the smart command while
 > suspended (which probably is a bug in the firmware but it also can be
 > the controller's fault), then after the timeout, the driver should kick
 > in and reset the drive which will wake up the device and retry the
 > command.  It's not the prettiest picture but it's still gonna work.  In
 > Linda's case, it looks like the controller (sata_promise) went bonkers
 > on hardreset and requires power cycle to get back to working state.
 > That's why I asked Linda to try another controller.
 > 
 > Anyways, if you're seeing a similar problem, the driver or controller
 > probably can't do proper reset after timeout and I can't really help
 > with SAS driver on FreeBSD.  :-P
 > 
 > Mikael, the original thread is the following.
 > 
 >   http://article.gmane.org/gmane.linux.utilities.smartmontools/5842
 > 
 > Any ideas why hardreset doesn't work after SMART command timed out?

I've looked at the above posting and tried to reproduce the problem.

Linda wrote that "smartctl -n standby -A <device>" sometimes wakes
the device up, even though it shouldn't. On my Promise test box
(FC6 user-space with kernel 2.6.27 and smartmontools-5.37-1.2.fc6),
I put my test disks in standby with "hdparm -y /dev/sdb", and then
ran numerous "smartctl -n standby -A /dev/sdb -d ata" commands.
smartctl always noticed the standby state and never woke any disk.

I then dropped the "-n standby" to observe the wakeup behaviour.
On one disk (Seagate Barracuda 7200.9) the wakeups were completely
reliable with no signs of timeouts or libata EH activity.
On another disk (Hitachi Deskstar HDS722525VLSA80) the first one or
two times I ran "smartctl -A /dev/sdb -d ata" when the disk was in
standby it would wake up ok, but the next time it would suffer from
timeouts, failed COMRESETs, and eventually libata would disable the port.
When reloading sata_promise that port would fail detection but other
ports would be ok.

So I suspect two issues:
- smartctl -n standby waking Linda's disks from standby is probably
  a disk firmware issue or a smartctl issue. I see no evidence that
  sata_promise is to blame for that.
- hardreset in sata_promise seems broken. I'll take a closer look
  at that in about a week's time (I'll be busy with other work the
  next couple of days).

/Mikael

* Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
  2008-10-08  0:20 ` [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd) Christian Mueller
@ 2008-10-08  0:28   ` Tejun Heo
  2008-10-12 14:55     ` Mikael Pettersson
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2008-10-08  0:28 UTC (permalink / raw)
  To: Christian Mueller
  Cc: Bruce Allen, Smartmontools Mailing List, LKML,
	IDE/ATA development list, Mikael Pettersson

(cc'ing linux-ide and Mikael Pettersson)

Christian Mueller wrote:
> The messages I saw on the screen.

If the drive doesn't wake up and times out the smart command while
suspended (which probably is a bug in the firmware but it also can be
the controller's fault), then after the timeout, the driver should kick
in and reset the drive which will wake up the device and retry the
command.  It's not the prettiest picture but it's still gonna work.  In
Linda's case, it looks like the controller (sata_promise) went bonkers
on hardreset and requires power cycle to get back to working state.
That's why I asked Linda to try another controller.

Anyways, if you're seeing a similar problem, the driver or controller
probably can't do proper reset after timeout and I can't really help
with SAS driver on FreeBSD.  :-P

Mikael, the original thread is the following.

  http://article.gmane.org/gmane.linux.utilities.smartmontools/5842

Any ideas why hardreset doesn't work after SMART command timed out?

Thanks.

-- 
tejun

* RE: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)
       [not found] <48EBFB63.20506@gmail.com>
@ 2008-10-08  0:20 ` Christian Mueller
  2008-10-08  0:28   ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Christian Mueller @ 2008-10-08  0:20 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Bruce Allen, Smartmontools Mailing List, LKML

Hi Tejun,

The messages I saw on the screen.


Thank you,
Christian 

-----Original Message-----
From: Tejun Heo [mailto:htejun@gmail.com] 
Sent: Tuesday, October 07, 2008 5:14 PM
To: Christian Mueller
Subject: Re: [smartmontools-support] inactive SATA drives won't stay in
standby or sleep, PATA models did. (fwd)

Christian Mueller wrote:
> Hi Tejun,
> 
> I am using an LSI 3080-X.  I have tried dmesg and it does not show
> smartmon output.

Then, what made you believe you're having the same problem?  Also, can
you please revive the cc list?

Thanks.

-- 
tejun

end of thread, other threads:[~2008-10-22  4:13 UTC | newest]

Thread overview: 22+ messages
     [not found] <Pine.LNX.4.64.0809142159550.18213@gc.phys.uwm.edu>
     [not found] ` <48E1B8F8.3090205@gmail.com>
     [not found]   ` <48E26BDA.8080804@tlinx.org>
     [not found]     ` <48E26E61.2010705@gmail.com>
     [not found]       ` <48E34BC8.3050009@tlinx.org>
     [not found]         ` <48E6DE07.70706@gmail.com>
2008-10-07  0:37           ` [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd) Linda Walsh
2008-10-07  1:08             ` Tejun Heo
2008-10-07  1:36               ` Linda Walsh
2008-10-07  1:42                 ` Tejun Heo
2008-10-07 10:13                   ` Linda Walsh
2008-10-07 22:27                   ` Linda Walsh
2008-10-07 23:59                     ` Tejun Heo
2008-10-22  3:40                       ` Promise SATA-standby +selftest=hungdrive; Sil works Linda Walsh
2008-10-22  4:13                         ` Tejun Heo
     [not found] <48EBFB63.20506@gmail.com>
2008-10-08  0:20 ` [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd) Christian Mueller
2008-10-08  0:28   ` Tejun Heo
2008-10-12 14:55     ` Mikael Pettersson
2008-10-13  5:16       ` Tejun Heo
2008-10-13  7:03         ` Mikael Pettersson
2008-10-13  7:08           ` Tejun Heo
2008-10-19 23:40         ` Mikael Pettersson
2008-10-21  4:18           ` Tejun Heo
2008-10-21  7:56             ` Mikael Pettersson
2008-10-21  9:02               ` Tejun Heo
2008-10-21  9:30                 ` Mikael Pettersson
2008-10-21  7:59           ` Mikael Pettersson
2008-10-21  8:55             ` Tejun Heo
