* MD/RAID: what's wrong with sector 1953519935?
@ 2009-08-26  0:32 Andrei Tanas
  2009-08-26  0:50 ` NeilBrown
  0 siblings, 1 reply; 84+ messages in thread
From: Andrei Tanas @ 2009-08-26  0:32 UTC (permalink / raw)
  To: linux-kernel

Hello,

I'm using two ST31000528AS drives in a RAID1 array using MD. I've had several
failures occur over a period of a few months (see logs below). I've RMA'd the
drive, but then got curious why an otherwise normal drive locks up while
trying to write the same sector once a month or so, yet does not report
having bad sectors, doesn't fail any tests, and does just fine if I do
dd if=/dev/urandom of=/dev/sdb bs=512 seek=1953519935 count=1
however many times I try.
I then tried Googling for this number (1953519935) and found that it comes
up quite a few times, and most of the time (or always) in the context of
md/raid.
So my question is: is it just a coincidence (which doesn't seem likely for a
number this big), or is it possible that when it is sent to the hard drive,
it gets interpreted as some command that sends the drive into some
unpredictable state?

I will gladly provide any additional info that might be necessary.


#smartctl -i /dev/sdb
=== START OF INFORMATION SECTION ===
Device Model:     ST31000528AS
Serial Number:    6VP01LNL
Firmware Version: CC34
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Aug 20 10:52:31 2009 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

----------------------------------------------------
Jul 27 19:02:31 srv kernel: [901292.247428] ata2.00: exception Emask 0x0
SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 27 19:02:31 srv kernel: [901292.247492] ata2.00: cmd
ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jul 27 19:02:31 srv kernel: [901292.247494]          res
40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 27 19:02:31 srv kernel: [901292.247500] ata2.00: status: { DRDY }
Jul 27 19:02:31 srv kernel: [901292.247512] ata2: hard resetting link
Jul 27 19:02:33 srv kernel: [901294.090746] ata2: SRST failed (errno=-19)
Jul 27 19:02:33 srv kernel: [901294.101922] ata2: SATA link up 3.0 Gbps
(SStatus 123 SControl 300)
Jul 27 19:02:33 srv kernel: [901294.101938] ata2.00: failed to IDENTIFY (I/O
error, err_mask=0x40)
Jul 27 19:02:33 srv kernel: [901294.101943] ata2.00: revalidation failed
(errno=-5)
Jul 27 19:02:38 srv kernel: [901299.100347] ata2: hard resetting link
Jul 27 19:02:38 srv kernel: [901299.974103] ata2: SATA link up 3.0 Gbps
(SStatus 123 SControl 300)
Jul 27 19:02:39 srv kernel: [901300.105734] ata2.00: configured for UDMA/133
Jul 27 19:02:39 srv kernel: [901300.105776] ata2: EH complete
Jul 27 19:02:39 srv kernel: [901300.137059] end_request: I/O error, dev sdb,
sector 1953519935
Jul 27 19:02:39 srv kernel: [901300.137069] md: super_written gets error=-5,
uptodate=0
Jul 27 19:02:39 srv kernel: [901300.137077] raid1: Disk failure on sdb1,
disabling device.
Jul 27 19:02:39 srv kernel: [901300.137079] raid1: Operation continuing on 1
devices.
Jul 27 19:02:39 srv kernel: [901300.208812] RAID1 conf printout:
Jul 27 19:02:39 srv kernel: [901300.208820]  --- wd:1 rd:2
Jul 27 19:02:39 srv kernel: [901300.208826]  disk 0, wo:0, o:1, dev:sda1
Jul 27 19:02:39 srv kernel: [901300.208830]  disk 1, wo:1, o:0, dev:sdb1
Jul 27 19:02:39 srv kernel: [901300.217392] RAID1 conf printout:
Jul 27 19:02:39 srv kernel: [901300.217399]  --- wd:1 rd:2
Jul 27 19:02:39 srv kernel: [901300.217404]  disk 0, wo:0, o:1, dev:sda1

Aug 20 00:15:36 srv kernel: [90307.328266] ata2.00: exception Emask 0x0 SAct
0x0 SErr 0x0 action 0x6 frozen
Aug 20 00:15:36 srv kernel: [90307.328275] ata2.00: cmd
ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Aug 20 00:15:36 srv kernel: [90307.328277]          res
40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 20 00:15:36 srv kernel: [90307.328280] ata2.00: status: { DRDY }
Aug 20 00:15:36 srv kernel: [90307.328288] ata2: hard resetting link
Aug 20 00:15:47 srv kernel: [90313.218511] ata2: link is slow to respond,
please be patient (ready=0)
Aug 20 00:15:47 srv kernel: [90317.377711] ata2: SRST failed (errno=-16)
Aug 20 00:15:47 srv kernel: [90317.377720] ata2: hard resetting link
Aug 20 00:15:47 srv kernel: [90318.251720] ata2: SATA link up 3.0 Gbps
(SStatus 123 SControl 300)
Aug 20 00:15:47 srv kernel: [90318.338026] ata2.00: configured for UDMA/133
Aug 20 00:15:47 srv kernel: [90318.338062] ata2: EH complete
Aug 20 00:15:47 srv kernel: [90318.370625] end_request: I/O error, dev sdb,
sector 1953519935
Aug 20 00:15:47 srv kernel: [90318.370632] md: super_written gets error=-5,
uptodate=0
Aug 20 00:15:47 srv kernel: [90318.370636] raid1: Disk failure on sdb1,
disabling device.
Aug 20 00:15:47 srv kernel: [90318.370637] raid1: Operation continuing on 1
devices.
Aug 20 00:15:47 srv kernel: [90318.396403] RAID1 conf printout:
Aug 20 00:15:47 srv kernel: [90318.396408]  --- wd:1 rd:2
Aug 20 00:15:47 srv kernel: [90318.396410]  disk 0, wo:0, o:1, dev:sda1
Aug 20 00:15:47 srv kernel: [90318.396413]  disk 1, wo:1, o:0, dev:sdb1
Aug 20 00:15:47 srv kernel: [90318.429178] RAID1 conf printout:
Aug 20 00:15:47 srv kernel: [90318.429185]  --- wd:1 rd:2
Aug 20 00:15:47 srv kernel: [90318.429189]  disk 0, wo:0, o:1, dev:sda1


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID: what's wrong with sector 1953519935?
  2009-08-26  0:32 MD/RAID: what's wrong with sector 1953519935? Andrei Tanas
@ 2009-08-26  0:50 ` NeilBrown
  2009-08-26  1:06   ` Ric Wheeler
  0 siblings, 1 reply; 84+ messages in thread
From: NeilBrown @ 2009-08-26  0:50 UTC (permalink / raw)
  To: Andrei Tanas; +Cc: linux-kernel

On Wed, August 26, 2009 10:32 am, Andrei Tanas wrote:
> Hello,
>
> I'm using two ST31000528AS drives in RAID1 array using MD. I've had
> several
> failures occur over a period of few months (see logs below). I've RMA'd
> the
> drive, but then got curious why an otherwise normal drive locks up while
> trying to write the same sector once a month or so, but does not report
> having bad sectors, doesn't fail any tests, and does just fine if I do
> dd if=/dev/urandom of=/dev/sdb bs=512 seek=1953519935 count=1
> however many times I try.
> I then tried Googling for this number (1953519935) and found that it comes
> up quite a few times and most of the time (or always) in context of
> md/raid.
> So my question is: is it just a coincidence (doesn't seem to be likely for
> a
> number this big), or is it possible that when sent to hard drive, it gets
> interpreted like some command and sends the drive into some unpredictable
> state?

All 1TB drives are exactly the same size.
If you create a single partition (e.g. sdb1) on such a device, that
partition starts at sector 63 (which is common), and you create an md
array using that partition, then the superblock will always be at the
address you quote.
The superblock is probably updated more often than any other block in
the array, so there is probably an increased likelihood of an error
being reported against that sector.
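
For what it's worth, the arithmetic lands exactly on that LBA under those
assumptions (a sketch, assuming the usual 255-head/63-sector fdisk geometry,
a single partition running from sector 63 to the last full "cylinder", and
v0.90 metadata, which puts the superblock in the last 64KiB-aligned 64KiB of
the partition):

    DISK=1953525168               # 1,000,204,886,016 bytes / 512
    CYL=$((255 * 63))             # 16065 sectors per CHS "cylinder"
    END=$(( DISK / CYL * CYL ))   # old fdisk ends sdb1 on a cylinder boundary
    SIZE=$(( END - 63 ))          # sdb1 starts at sector 63
    SB=$(( (SIZE & ~127) - 128 )) # v0.90 superblock: last 64KiB-aligned 64KiB
    echo $(( 63 + SB ))           # prints 1953519935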

So it is not just a coincidence.
Whether there is some deeper underlying problem though, I cannot say.
Google only claims 68 matches for that number, which doesn't seem
big enough to be significant.

NeilBrown



>
> I will gladly provide any additional info that might be necessary.
>
>
> #smartctl -i /dev/sdb
> === START OF INFORMATION SECTION ===
> Device Model:     ST31000528AS
> Serial Number:    6VP01LNL
> Firmware Version: CC34
> User Capacity:    1,000,204,886,016 bytes
> Device is:        Not in smartctl database [for details use: -P showall]
> ATA Version is:   8
> ATA Standard is:  ATA-8-ACS revision 4
> Local Time is:    Thu Aug 20 10:52:31 2009 EDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> ----------------------------------------------------
> Jul 27 19:02:31 srv kernel: [901292.247428] ata2.00: exception Emask 0x0
> SAct 0x0 SErr 0x0 action 0x6 frozen
> Jul 27 19:02:31 srv kernel: [901292.247492] ata2.00: cmd
> ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> Jul 27 19:02:31 srv kernel: [901292.247494]          res
> 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Jul 27 19:02:31 srv kernel: [901292.247500] ata2.00: status: { DRDY }
> Jul 27 19:02:31 srv kernel: [901292.247512] ata2: hard resetting link
> Jul 27 19:02:33 srv kernel: [901294.090746] ata2: SRST failed (errno=-19)
> Jul 27 19:02:33 srv kernel: [901294.101922] ata2: SATA link up 3.0 Gbps
> (SStatus 123 SControl 300)
> Jul 27 19:02:33 srv kernel: [901294.101938] ata2.00: failed to IDENTIFY
> (I/O
> error, err_mask=0x40)
> Jul 27 19:02:33 srv kernel: [901294.101943] ata2.00: revalidation failed
> (errno=-5)
> Jul 27 19:02:38 srv kernel: [901299.100347] ata2: hard resetting link
> Jul 27 19:02:38 srv kernel: [901299.974103] ata2: SATA link up 3.0 Gbps
> (SStatus 123 SControl 300)
> Jul 27 19:02:39 srv kernel: [901300.105734] ata2.00: configured for
> UDMA/133
> Jul 27 19:02:39 srv kernel: [901300.105776] ata2: EH complete
> Jul 27 19:02:39 srv kernel: [901300.137059] end_request: I/O error, dev
> sdb,
> sector 1953519935
> Jul 27 19:02:39 srv kernel: [901300.137069] md: super_written gets
> error=-5,
> uptodate=0
> Jul 27 19:02:39 srv kernel: [901300.137077] raid1: Disk failure on sdb1,
> disabling device.
> Jul 27 19:02:39 srv kernel: [901300.137079] raid1: Operation continuing on
> 1
> devices.
> Jul 27 19:02:39 srv kernel: [901300.208812] RAID1 conf printout:
> Jul 27 19:02:39 srv kernel: [901300.208820]  --- wd:1 rd:2
> Jul 27 19:02:39 srv kernel: [901300.208826]  disk 0, wo:0, o:1, dev:sda1
> Jul 27 19:02:39 srv kernel: [901300.208830]  disk 1, wo:1, o:0, dev:sdb1
> Jul 27 19:02:39 srv kernel: [901300.217392] RAID1 conf printout:
> Jul 27 19:02:39 srv kernel: [901300.217399]  --- wd:1 rd:2
> Jul 27 19:02:39 srv kernel: [901300.217404]  disk 0, wo:0, o:1, dev:sda1
>
> Aug 20 00:15:36 srv kernel: [90307.328266] ata2.00: exception Emask 0x0
> SAct
> 0x0 SErr 0x0 action 0x6 frozen
> Aug 20 00:15:36 srv kernel: [90307.328275] ata2.00: cmd
> ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> Aug 20 00:15:36 srv kernel: [90307.328277]          res
> 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Aug 20 00:15:36 srv kernel: [90307.328280] ata2.00: status: { DRDY }
> Aug 20 00:15:36 srv kernel: [90307.328288] ata2: hard resetting link
> Aug 20 00:15:47 srv kernel: [90313.218511] ata2: link is slow to respond,
> please be patient (ready=0)
> Aug 20 00:15:47 srv kernel: [90317.377711] ata2: SRST failed (errno=-16)
> Aug 20 00:15:47 srv kernel: [90317.377720] ata2: hard resetting link
> Aug 20 00:15:47 srv kernel: [90318.251720] ata2: SATA link up 3.0 Gbps
> (SStatus 123 SControl 300)
> Aug 20 00:15:47 srv kernel: [90318.338026] ata2.00: configured for
> UDMA/133
> Aug 20 00:15:47 srv kernel: [90318.338062] ata2: EH complete
> Aug 20 00:15:47 srv kernel: [90318.370625] end_request: I/O error, dev
> sdb,
> sector 1953519935
> Aug 20 00:15:47 srv kernel: [90318.370632] md: super_written gets
> error=-5,
> uptodate=0
> Aug 20 00:15:47 srv kernel: [90318.370636] raid1: Disk failure on sdb1,
> disabling device.
> Aug 20 00:15:47 srv kernel: [90318.370637] raid1: Operation continuing on
> 1
> devices.
> Aug 20 00:15:47 srv kernel: [90318.396403] RAID1 conf printout:
> Aug 20 00:15:47 srv kernel: [90318.396408]  --- wd:1 rd:2
> Aug 20 00:15:47 srv kernel: [90318.396410]  disk 0, wo:0, o:1, dev:sda1
> Aug 20 00:15:47 srv kernel: [90318.396413]  disk 1, wo:1, o:0, dev:sdb1
> Aug 20 00:15:47 srv kernel: [90318.429178] RAID1 conf printout:
> Aug 20 00:15:47 srv kernel: [90318.429185]  --- wd:1 rd:2
> Aug 20 00:15:47 srv kernel: [90318.429189]  disk 0, wo:0, o:1, dev:sda1
>
>


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID: what's wrong with sector 1953519935?
  2009-08-26  0:50 ` NeilBrown
@ 2009-08-26  1:06   ` Ric Wheeler
  2009-08-26  1:24     ` NeilBrown
  0 siblings, 1 reply; 84+ messages in thread
From: Ric Wheeler @ 2009-08-26  1:06 UTC (permalink / raw)
  To: NeilBrown; +Cc: Andrei Tanas, linux-kernel

On 08/25/2009 08:50 PM, NeilBrown wrote:
> On Wed, August 26, 2009 10:32 am, Andrei Tanas wrote:
>    
>> Hello,
>>
>> I'm using two ST31000528AS drives in RAID1 array using MD. I've had
>> several
>> failures occur over a period of few months (see logs below). I've RMA'd
>> the
>> drive, but then got curious why an otherwise normal drive locks up while
>> trying to write the same sector once a month or so, but does not report
>> having bad sectors, doesn't fail any tests, and does just fine if I do
>> dd if=/dev/urandom of=/dev/sdb bs=512 seek=1953519935 count=1
>> however many times I try.
>> I then tried Googling for this number (1953519935) and found that it comes
>> up quite a few times and most of the time (or always) in context of
>> md/raid.
>> So my question is: is it just a coincidence (doesn't seem to be likely for
>> a
>> number this big), or is it possible that when sent to hard drive, it gets
>> interpreted like some command and sends the drive into some unpredictable
>> state?
>>      
> All 1TB drives are exactly the same size.
> If you create a single partition (e.g. sdb1) on such a device, and that
> partition starts at sector 63 (which is common), and create an md
> array using that partition, then the superblock will always be at the
> address you quote.
> The superblock is probably updated more often than any other block in
> the array, so there is probably an increased likelyhood of an error
> being reported against that sector.
>
> So it is not just a coincidence.
> Whether there is some deeper underlying problem though, I cannot say.
> Google only claims 68 matches for that number which doesn't seem
> big enough to be significant.
>
> NeilBrown
>
>    

Neil,

One thing that can happen when we have a hot spot (like the superblock)
on high-capacity drives is that the frequent writes degrade the data in
adjacent tracks.  Some drives have firmware that watches for this and
rewrites adjacent tracks, but it is also a good idea to avoid too-frequent
updates.

Didn't you have a tunable to decrease this update frequency?

Ric

>
>    
>> I will gladly provide any additional info that might be necessary.
>>
>>
>> #smartctl -i /dev/sdb
>> === START OF INFORMATION SECTION ===
>> Device Model:     ST31000528AS
>> Serial Number:    6VP01LNL
>> Firmware Version: CC34
>> User Capacity:    1,000,204,886,016 bytes
>> Device is:        Not in smartctl database [for details use: -P showall]
>> ATA Version is:   8
>> ATA Standard is:  ATA-8-ACS revision 4
>> Local Time is:    Thu Aug 20 10:52:31 2009 EDT
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> ----------------------------------------------------
>> Jul 27 19:02:31 srv kernel: [901292.247428] ata2.00: exception Emask 0x0
>> SAct 0x0 SErr 0x0 action 0x6 frozen
>> Jul 27 19:02:31 srv kernel: [901292.247492] ata2.00: cmd
>> ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>> Jul 27 19:02:31 srv kernel: [901292.247494]          res
>> 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
>> Jul 27 19:02:31 srv kernel: [901292.247500] ata2.00: status: { DRDY }
>> Jul 27 19:02:31 srv kernel: [901292.247512] ata2: hard resetting link
>> Jul 27 19:02:33 srv kernel: [901294.090746] ata2: SRST failed (errno=-19)
>> Jul 27 19:02:33 srv kernel: [901294.101922] ata2: SATA link up 3.0 Gbps
>> (SStatus 123 SControl 300)
>> Jul 27 19:02:33 srv kernel: [901294.101938] ata2.00: failed to IDENTIFY
>> (I/O
>> error, err_mask=0x40)
>> Jul 27 19:02:33 srv kernel: [901294.101943] ata2.00: revalidation failed
>> (errno=-5)
>> Jul 27 19:02:38 srv kernel: [901299.100347] ata2: hard resetting link
>> Jul 27 19:02:38 srv kernel: [901299.974103] ata2: SATA link up 3.0 Gbps
>> (SStatus 123 SControl 300)
>> Jul 27 19:02:39 srv kernel: [901300.105734] ata2.00: configured for
>> UDMA/133
>> Jul 27 19:02:39 srv kernel: [901300.105776] ata2: EH complete
>> Jul 27 19:02:39 srv kernel: [901300.137059] end_request: I/O error, dev
>> sdb,
>> sector 1953519935
>> Jul 27 19:02:39 srv kernel: [901300.137069] md: super_written gets
>> error=-5,
>> uptodate=0
>> Jul 27 19:02:39 srv kernel: [901300.137077] raid1: Disk failure on sdb1,
>> disabling device.
>> Jul 27 19:02:39 srv kernel: [901300.137079] raid1: Operation continuing on
>> 1
>> devices.
>> Jul 27 19:02:39 srv kernel: [901300.208812] RAID1 conf printout:
>> Jul 27 19:02:39 srv kernel: [901300.208820]  --- wd:1 rd:2
>> Jul 27 19:02:39 srv kernel: [901300.208826]  disk 0, wo:0, o:1, dev:sda1
>> Jul 27 19:02:39 srv kernel: [901300.208830]  disk 1, wo:1, o:0, dev:sdb1
>> Jul 27 19:02:39 srv kernel: [901300.217392] RAID1 conf printout:
>> Jul 27 19:02:39 srv kernel: [901300.217399]  --- wd:1 rd:2
>> Jul 27 19:02:39 srv kernel: [901300.217404]  disk 0, wo:0, o:1, dev:sda1
>>
>> Aug 20 00:15:36 srv kernel: [90307.328266] ata2.00: exception Emask 0x0
>> SAct
>> 0x0 SErr 0x0 action 0x6 frozen
>> Aug 20 00:15:36 srv kernel: [90307.328275] ata2.00: cmd
>> ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>> Aug 20 00:15:36 srv kernel: [90307.328277]          res
>> 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
>> Aug 20 00:15:36 srv kernel: [90307.328280] ata2.00: status: { DRDY }
>> Aug 20 00:15:36 srv kernel: [90307.328288] ata2: hard resetting link
>> Aug 20 00:15:47 srv kernel: [90313.218511] ata2: link is slow to respond,
>> please be patient (ready=0)
>> Aug 20 00:15:47 srv kernel: [90317.377711] ata2: SRST failed (errno=-16)
>> Aug 20 00:15:47 srv kernel: [90317.377720] ata2: hard resetting link
>> Aug 20 00:15:47 srv kernel: [90318.251720] ata2: SATA link up 3.0 Gbps
>> (SStatus 123 SControl 300)
>> Aug 20 00:15:47 srv kernel: [90318.338026] ata2.00: configured for
>> UDMA/133
>> Aug 20 00:15:47 srv kernel: [90318.338062] ata2: EH complete
>> Aug 20 00:15:47 srv kernel: [90318.370625] end_request: I/O error, dev
>> sdb,
>> sector 1953519935
>> Aug 20 00:15:47 srv kernel: [90318.370632] md: super_written gets
>> error=-5,
>> uptodate=0
>> Aug 20 00:15:47 srv kernel: [90318.370636] raid1: Disk failure on sdb1,
>> disabling device.
>> Aug 20 00:15:47 srv kernel: [90318.370637] raid1: Operation continuing on
>> 1
>> devices.
>> Aug 20 00:15:47 srv kernel: [90318.396403] RAID1 conf printout:
>> Aug 20 00:15:47 srv kernel: [90318.396408]  --- wd:1 rd:2
>> Aug 20 00:15:47 srv kernel: [90318.396410]  disk 0, wo:0, o:1, dev:sda1
>> Aug 20 00:15:47 srv kernel: [90318.396413]  disk 1, wo:1, o:0, dev:sdb1
>> Aug 20 00:15:47 srv kernel: [90318.429178] RAID1 conf printout:
>> Aug 20 00:15:47 srv kernel: [90318.429185]  --- wd:1 rd:2
>> Aug 20 00:15:47 srv kernel: [90318.429189]  disk 0, wo:0, o:1, dev:sda1
>>
>>
>>      
>    


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID: what's wrong with sector 1953519935?
  2009-08-26  1:06   ` Ric Wheeler
@ 2009-08-26  1:24     ` NeilBrown
  2009-08-26  1:31       ` Ric Wheeler
  0 siblings, 1 reply; 84+ messages in thread
From: NeilBrown @ 2009-08-26  1:24 UTC (permalink / raw)
  To: Ric Wheeler; +Cc: Andrei Tanas, linux-kernel

On Wed, August 26, 2009 11:06 am, Ric Wheeler wrote:
> On 08/25/2009 08:50 PM, NeilBrown wrote:

>> All 1TB drives are exactly the same size.
>> If you create a single partition (e.g. sdb1) on such a device, and that
>> partition starts at sector 63 (which is common), and create an md
>> array using that partition, then the superblock will always be at the
>> address you quote.
>> The superblock is probably updated more often than any other block in
>> the array, so there is probably an increased likelyhood of an error
>> being reported against that sector.
>>
>> So it is not just a coincidence.
>> Whether there is some deeper underlying problem though, I cannot say.
>> Google only claims 68 matches for that number which doesn't seem
>> big enough to be significant.
>>
>> NeilBrown
>>
>>
>
> Neil,
>
> One thing that can happen is when we have a hot spot (like the super
> block) on high capacity drives is that the frequent write degrade the
> data in adjacent tracks.  Some drives have firmware that watches for
> this and rewrites adjacent tracks, but it is also a good idea to avoid
> too frequent updates.

Yet another detail to worry about.... :-(

>
> Didn't you have a tunable to decrease this update frequency?

/sys/block/mdX/md/safe_mode_delay
is a time in seconds (default 0.200) between when the last write to
the array completes and when the superblock is marked as clean.
Depending on the actual rate of writes to the array, the superblock
can be updated as much as twice in this time (once to mark it dirty,
once to mark it clean).

Increasing the number can decrease the update frequency of the superblock,
but the exact effect on update frequency is very load-dependent.
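
For example (just a sketch, assuming the array is md0):

    cat /sys/block/md0/md/safe_mode_delay     # current value, in seconds
    echo 5 > /sys/block/md0/md/safe_mode_delay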

Obviously a write-intent bitmap, which is rarely more than a few
sectors, can also see lots of updates, and it is harder to tune
that (you have to set things up when you create the bitmap).

NeilBrown


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID: what's wrong with sector 1953519935?
  2009-08-26  1:24     ` NeilBrown
@ 2009-08-26  1:31       ` Ric Wheeler
  2009-08-26  2:22         ` Andrei Tanas
  0 siblings, 1 reply; 84+ messages in thread
From: Ric Wheeler @ 2009-08-26  1:31 UTC (permalink / raw)
  To: NeilBrown; +Cc: Andrei Tanas, linux-kernel

On 08/25/2009 09:24 PM, NeilBrown wrote:
> On Wed, August 26, 2009 11:06 am, Ric Wheeler wrote:
>> On 08/25/2009 08:50 PM, NeilBrown wrote:
>
>>> All 1TB drives are exactly the same size.
>>> If you create a single partition (e.g. sdb1) on such a device, and that
>>> partition starts at sector 63 (which is common), and create an md
>>> array using that partition, then the superblock will always be at the
>>> address you quote.
>>> The superblock is probably updated more often than any other block in
>>> the array, so there is probably an increased likelyhood of an error
>>> being reported against that sector.
>>>
>>> So it is not just a coincidence.
>>> Whether there is some deeper underlying problem though, I cannot say.
>>> Google only claims 68 matches for that number which doesn't seem
>>> big enough to be significant.
>>>
>>> NeilBrown
>>>
>>>
>>
>> Neil,
>>
>> One thing that can happen is when we have a hot spot (like the super
>> block) on high capacity drives is that the frequent write degrade the
>> data in adjacent tracks.  Some drives have firmware that watches for
>> this and rewrites adjacent tracks, but it is also a good idea to avoid
>> too frequent updates.
>
> Yet another detail to worry about.... :-(

it never ends :-)

>
>>
>> Didn't you have a tunable to decrease this update frequency?
>
> /sys/block/mdX/md/safe_mode_delay
> is a time in seconds (Default 0.200) between when the last write to
> the array completes and when the superblock is marked as clean.
> Depending on the actual rate of writes to the array, the superblock
> can be updated as much as twice in this time (once to mark dirty,
> once to mark clean).
>
> Increasing the number can decrease the update frequency of the superblock,
> but the exact effect on update frequency is very load-dependant.
>
> Obviously a write-intent-bitmap, which is rarely more that a few
> sectors, can also see lots of updates, and it is harder to tune
> that (you have to set things up when you create the bitmap).
>
> NeilBrown
>

We did see issues in practice with adjacent sectors on some drives, so this
one is worth tuning down.

I would suggest that Andrei try to write to and clear the IO error at that
offset. You can use Mark Lord's hdparm to clear a specific sector, or just do
the math (carefully!) and dd over it. If the write succeeds (without bumping
your remapped sector count), this is a likely match to this problem.
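
A concrete version of that check might look like this (a sketch; it assumes
/dev/sdb, the sector from the logs, and that the drive reports the usual
Reallocated_Sector_Ct SMART attribute):

    # rewrite the suspect LBA (with bs=512, seek= counts sectors)
    dd if=/dev/urandom of=/dev/sdb bs=512 seek=1953519935 count=1
    # then confirm the remapped-sector count did not move
    smartctl -A /dev/sdb | grep -i reallocated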

ric





^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: MD/RAID: what's wrong with sector 1953519935?
  2009-08-26  1:31       ` Ric Wheeler
@ 2009-08-26  2:22         ` Andrei Tanas
  2009-08-26  2:41           ` Ric Wheeler
  0 siblings, 1 reply; 84+ messages in thread
From: Andrei Tanas @ 2009-08-26  2:22 UTC (permalink / raw)
  To: 'Ric Wheeler', 'NeilBrown'; +Cc: linux-kernel

> >> One thing that can happen is when we have a hot spot (like the super
> >> block) on high capacity drives is that the frequent write degrade
> the
> >> data in adjacent tracks.  Some drives have firmware that watches for
> >> this and rewrites adjacent tracks, but it is also a good idea to
> avoid
> >> too frequent updates.
> >
> > Yet another detail to worry about.... :-(
> 
> it never ends :-)
> 
> >
> >>
> >> Didn't you have a tunable to decrease this update frequency?
> >
> > /sys/block/mdX/md/safe_mode_delay
> > is a time in seconds (Default 0.200) between when the last write to
> > the array completes and when the superblock is marked as clean.
> > Depending on the actual rate of writes to the array, the superblock
> > can be updated as much as twice in this time (once to mark dirty,
> > once to mark clean).
> >
> > Increasing the number can decrease the update frequency of the
> superblock,
> > but the exact effect on update frequency is very load-dependant.
> >
> > Obviously a write-intent-bitmap, which is rarely more that a few
> > sectors, can also see lots of updates, and it is harder to tune
> > that (you have to set things up when you create the bitmap).
> >
> > NeilBrown
> >
> 
> We did see issues in practice with adjacent sectors with some drives,
> so this
> one is worth tuning down.
> 
> I would suggest that Andrei might try to write and clear the IO error
> at that
> offset. You can use Mark Lord's hdparm to clear a specific sector or
> just do the
> math (carefully!) and dd over it. It the write succeeds (without
> bumping your
> remapped sectors count) this is a likely match to this problem,

I've tried dd multiple times; it always succeeds, and the relocated sector
count is currently 1 on this drive, even though this particular fault has
happened at least 3 times so far.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID: what's wrong with sector 1953519935?
  2009-08-26  2:22         ` Andrei Tanas
@ 2009-08-26  2:41           ` Ric Wheeler
  2009-08-26  3:45             ` Andrei Tanas
  0 siblings, 1 reply; 84+ messages in thread
From: Ric Wheeler @ 2009-08-26  2:41 UTC (permalink / raw)
  To: Andrei Tanas; +Cc: 'NeilBrown', linux-kernel

On 08/25/2009 10:22 PM, Andrei Tanas wrote:
>>>> One thing that can happen is when we have a hot spot (like the super
>>>> block) on high capacity drives is that the frequent write degrade
>> the
>>>> data in adjacent tracks.  Some drives have firmware that watches for
>>>> this and rewrites adjacent tracks, but it is also a good idea to
>> avoid
>>>> too frequent updates.
>>>
>>> Yet another detail to worry about.... :-(
>>
>> it never ends :-)
>>
>>>
>>>>
>>>> Didn't you have a tunable to decrease this update frequency?
>>>
>>> /sys/block/mdX/md/safe_mode_delay
>>> is a time in seconds (Default 0.200) between when the last write to
>>> the array completes and when the superblock is marked as clean.
>>> Depending on the actual rate of writes to the array, the superblock
>>> can be updated as much as twice in this time (once to mark dirty,
>>> once to mark clean).
>>>
>>> Increasing the number can decrease the update frequency of the
>> superblock,
>>> but the exact effect on update frequency is very load-dependant.
>>>
>>> Obviously a write-intent-bitmap, which is rarely more that a few
>>> sectors, can also see lots of updates, and it is harder to tune
>>> that (you have to set things up when you create the bitmap).
>>>
>>> NeilBrown
>>>
>>
>> We did see issues in practice with adjacent sectors with some drives,
>> so this
>> one is worth tuning down.
>>
>> I would suggest that Andrei might try to write and clear the IO error
>> at that
>> offset. You can use Mark Lord's hdparm to clear a specific sector or
>> just do the
>> math (carefully!) and dd over it. It the write succeeds (without
>> bumping your
>> remapped sectors count) this is a likely match to this problem,
>
> I've tried dd multiple times, it always succeeds, and the relocated sector
> count is currently 1 on this drive, even though this particular fault
> happened at least 3 times so far.
>

I would bump that count way up (say to 2) and see if you have an issue...

ric


^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: MD/RAID: what's wrong with sector 1953519935?
  2009-08-26  2:41           ` Ric Wheeler
@ 2009-08-26  3:45             ` Andrei Tanas
  2009-08-26 10:34               ` Ric Wheeler
  0 siblings, 1 reply; 84+ messages in thread
From: Andrei Tanas @ 2009-08-26  3:45 UTC (permalink / raw)
  To: 'Ric Wheeler'; +Cc: 'NeilBrown', linux-kernel

> >> I would suggest that Andrei might try to write and clear the IO
> error
> >> at that
> >> offset. You can use Mark Lord's hdparm to clear a specific sector or
> >> just do the
> >> math (carefully!) and dd over it. It the write succeeds (without
> >> bumping your
> >> remapped sectors count) this is a likely match to this problem,
> >
> > I've tried dd multiple times, it always succeeds, and the relocated
> sector
> > count is currently 1 on this drive, even though this particular fault
> > happened at least 3 times so far.
> >
> 
> I would bump that count way up (say to 2) and see if you have an
> issue...

Not sure what you mean by this: how can I artificially bump the relocated
sector count?


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID: what's wrong with sector 1953519935?
  2009-08-26  3:45             ` Andrei Tanas
@ 2009-08-26 10:34               ` Ric Wheeler
  2009-08-26 14:46                 ` Andrei Tanas
  0 siblings, 1 reply; 84+ messages in thread
From: Ric Wheeler @ 2009-08-26 10:34 UTC (permalink / raw)
  To: Andrei Tanas; +Cc: 'NeilBrown', linux-kernel

On 08/25/2009 11:45 PM, Andrei Tanas wrote:
>>>> I would suggest that Andrei might try to write and clear the IO
>>>>          
>> error
>>      
>>>> at that
>>>> offset. You can use Mark Lord's hdparm to clear a specific sector or
>>>> just do the
>>>> math (carefully!) and dd over it. It the write succeeds (without
>>>> bumping your
>>>> remapped sectors count) this is a likely match to this problem,
>>>>          
>>> I've tried dd multiple times, it always succeeds, and the relocated
>>>        
>> sector
>>      
>>> count is currently 1 on this drive, even though this particular fault
>>> happened at least 3 times so far.
>>>
>>>        
>> I would bump that count way up (say to 2) and see if you have an
>> issue...
>>      
> Not sure what you mean by this: how can I artificially bump the relocated
> sector count?
>
>    
Sorry - you need to set the tunable:

/sys/block/mdX/md/safe_mode_delay

to something like "2" to prevent that sector from being a hotspot...

ric





^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID: what's wrong with sector 1953519935?
  2009-08-26 10:34               ` Ric Wheeler
@ 2009-08-26 14:46                 ` Andrei Tanas
  2009-08-26 14:49                   ` Andrei Tanas
  2009-08-26 15:39                   ` Ric Wheeler
  0 siblings, 2 replies; 84+ messages in thread
From: Andrei Tanas @ 2009-08-26 14:46 UTC (permalink / raw)
  To: Ric Wheeler; +Cc: NeilBrown, linux-kernel

On Wed, 26 Aug 2009 06:34:14 -0400, Ric Wheeler <rwheeler@redhat.com>
wrote:
> On 08/25/2009 11:45 PM, Andrei Tanas wrote:
>>>>> I would suggest that Andrei might try to write and clear the IO
>>>>>          
>>> error
>>>      
>>>>> at that
>>>>> offset. You can use Mark Lord's hdparm to clear a specific sector or
>>>>> just do the
>>>>> math (carefully!) and dd over it. It the write succeeds (without
>>>>> bumping your
>>>>> remapped sectors count) this is a likely match to this problem,
>>>>>          
>>>> I've tried dd multiple times, it always succeeds, and the relocated
>>>>        
>>> sector
>>>      
>>>> count is currently 1 on this drive, even though this particular fault
>>>> happened at least 3 times so far.
>>>>
>>>>        
>>> I would bump that count way up (say to 2) and see if you have an
>>> issue...
>>>      
>> Not sure what you mean by this: how can I artificially bump the
relocated
>> sector count?
>>
>>    
> Sorry - you need to set the tunable:
> 
> /sys/block/mdX/md/safe_mode_delay
> 
> to something like "2" to prevent that sector from being a hotspot...

I did that as soon as you suggested that it's possible to tune it. The
array is still being rebuilt (it's a fairly busy machine, so rebuilding is
slow). I'll monitor it, but I don't expect to see results soon, as even
with the default value of 0.2 it used to happen only once every several weeks.

On the other note: is it possible that the drive was actually working
properly but was not given enough time to complete the write request? These
newer drives have 32MB cache but the same rotational speed and seek times
as the older ones so they must need more time to flush their cache?

Andrei.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID: what's wrong with sector 1953519935?
  2009-08-26 14:46                 ` Andrei Tanas
@ 2009-08-26 14:49                   ` Andrei Tanas
  2009-08-26 15:39                   ` Ric Wheeler
  1 sibling, 0 replies; 84+ messages in thread
From: Andrei Tanas @ 2009-08-26 14:49 UTC (permalink / raw)
  To: Ric Wheeler; +Cc: NeilBrown, linux-kernel

On Wed, 26 Aug 2009 10:46:06 -0400, Andrei Tanas <andrei@tanas.ca> wrote:
> On Wed, 26 Aug 2009 06:34:14 -0400, Ric Wheeler <rwheeler@redhat.com>
> wrote:
>> On 08/25/2009 11:45 PM, Andrei Tanas wrote:
>>>>>> I would suggest that Andrei might try to write and clear the IO
>>>>>>          
>>>> error
>>>>      
>>>>>> at that
>>>>>> offset. You can use Mark Lord's hdparm to clear a specific sector or
>>>>>> just do the
>>>>>> math (carefully!) and dd over it. It the write succeeds (without
>>>>>> bumping your
>>>>>> remapped sectors count) this is a likely match to this problem,
>>>>>>          
>>>>> I've tried dd multiple times, it always succeeds, and the relocated
>>>>>        
>>>> sector
>>>>      
>>>>> count is currently 1 on this drive, even though this particular fault
>>>>> happened at least 3 times so far.
>>>>>
>>>>>        
>>>> I would bump that count way up (say to 2) and see if you have an
>>>> issue...
>>>>      
>>> Not sure what you mean by this: how can I artificially bump the
> relocated
>>> sector count?
>>>
>>>    
>> Sorry - you need to set the tunable:
>> 
>> /sys/block/mdX/md/safe_mode_delay
>> 
>> to something like "2" to prevent that sector from being a hotspot...
> 
> I did that as soon as you suggested that it's possible to tune it. The
> array is still being rebuilt (it's a fairly busy machine, so rebuilding
is
> slow). I'll monitor it, but I don't expect to see the results soon as
even
> with the default value of 0.2 it used to happen once in several weeks.
> 
> On the other note: is it possible that the drive was actually working
> properly but was not given enough time to complete the write request? These
> newer drives have 32MB cache but the same rotational speed and seek times
> as the older ones so they must need more time to flush their cache?
> 
> Andrei.

Just in case:
[90307.328266] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
frozen
[90307.328275] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[90307.328277]          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4
(timeout)
[90307.328280] ata2.00: status: { DRDY }
[90307.328288] ata2: hard resetting link
[90313.218511] ata2: link is slow to respond, please be patient (ready=0)
[90317.377711] ata2: SRST failed (errno=-16)
[90317.377720] ata2: hard resetting link
[90318.251720] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[90318.338026] ata2.00: configured for UDMA/133
[90318.338062] ata2: EH complete
[90318.370625] end_request: I/O error, dev sdb, sector 1953519935
[90318.370632] md: super_written gets error=-5, uptodate=0


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID: what's wrong with sector 1953519935?
  2009-08-26 14:46                 ` Andrei Tanas
  2009-08-26 14:49                   ` Andrei Tanas
@ 2009-08-26 15:39                   ` Ric Wheeler
  2009-08-26 18:12                       ` Andrei Tanas
  1 sibling, 1 reply; 84+ messages in thread
From: Ric Wheeler @ 2009-08-26 15:39 UTC (permalink / raw)
  To: Andrei Tanas; +Cc: NeilBrown, linux-kernel, linux-ide-owner

On 08/26/2009 10:46 AM, Andrei Tanas wrote:
> On Wed, 26 Aug 2009 06:34:14 -0400, Ric Wheeler<rwheeler@redhat.com>
> wrote:
>> On 08/25/2009 11:45 PM, Andrei Tanas wrote:
>>>>>> I would suggest that Andrei might try to write and clear the IO
>>>>>>
>>>> error
>>>>
>>>>>> at that
>>>>>> offset. You can use Mark Lord's hdparm to clear a specific sector or
>>>>>> just do the
>>>>>> math (carefully!) and dd over it. It the write succeeds (without
>>>>>> bumping your
>>>>>> remapped sectors count) this is a likely match to this problem,
>>>>>>
>>>>> I've tried dd multiple times, it always succeeds, and the relocated
>>>>>
>>>> sector
>>>>
>>>>> count is currently 1 on this drive, even though this particular fault
>>>>> happened at least 3 times so far.
>>>>>
>>>>>
>>>> I would bump that count way up (say to 2) and see if you have an
>>>> issue...
>>>>
>>> Not sure what you mean by this: how can I artificially bump the
> relocated
>>> sector count?
>>>
>>>
>> Sorry - you need to set the tunable:
>>
>> /sys/block/mdX/md/safe_mode_delay
>>
>> to something like "2" to prevent that sector from being a hotspot...
>
> I did that as soon as you suggested that it's possible to tune it. The
> array is still being rebuilt (it's a fairly busy machine, so rebuilding is
> slow). I'll monitor it, but I don't expect to see the results soon as even
> with the default value of 0.2 it used to happen once in several weeks.
>
> On the other note: is it possible that the drive was actually working
> properly but was not given enough time to complete the write request? These
> newer drives have 32MB cache but the same rotational speed and seek times
> as the older ones so they must need more time to flush their cache?
>
> Andrei.
>

Timeouts on IO requests are pretty large; usually drives won't fail an IO unless
there is a real problem, but I will add the linux-ide list to this response so
they can weigh in.

I suspect that the error was real, but it might be the "repairable" type of
adjacent-track issue I mentioned before. It is interesting to note that, just
following the error, you see that it was indeed the superblock that did not get
updated...

The error you referenced was:

[90307.328266] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
frozen
[90307.328275] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[90307.328277]          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4
(timeout)
[90307.328280] ata2.00: status: { DRDY }
[90307.328288] ata2: hard resetting link
[90313.218511] ata2: link is slow to respond, please be patient (ready=0)
[90317.377711] ata2: SRST failed (errno=-16)
[90317.377720] ata2: hard resetting link
[90318.251720] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[90318.338026] ata2.00: configured for UDMA/133
[90318.338062] ata2: EH complete
[90318.370625] end_request: I/O error, dev sdb, sector 1953519935
[90318.370632] md: super_written gets error=-5, uptodate=0


Ric


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID: what's wrong with sector 1953519935?
  2009-08-26 15:39                   ` Ric Wheeler
@ 2009-08-26 18:12                       ` Andrei Tanas
  0 siblings, 0 replies; 84+ messages in thread
From: Andrei Tanas @ 2009-08-26 18:12 UTC (permalink / raw)
  To: Ric Wheeler; +Cc: NeilBrown, linux-kernel, linux-ide

On Wed, 26 Aug 2009 11:39:38 -0400, Ric Wheeler <rwheeler@redhat.com>
wrote:
> On 08/26/2009 10:46 AM, Andrei Tanas wrote:
>> On Wed, 26 Aug 2009 06:34:14 -0400, Ric Wheeler<rwheeler@redhat.com>
>> wrote:
>>> On 08/25/2009 11:45 PM, Andrei Tanas wrote:
>>>>>>> I would suggest that Andrei might try to write and clear the IO
>>>>> error
>>>>>>> at that
>>>>>>> offset. You can use Mark Lord's hdparm to clear a specific sector
or
>>>>>>> just do the
>>>>>>> math (carefully!) and dd over it. It the write succeeds (without
>>>>>>> bumping your
>>>>>>> remapped sectors count) this is a likely match to this problem,
>>>>>>>
>>>>>> I've tried dd multiple times, it always succeeds, and the relocated
>>>>> sector
>>>>>> count is currently 1 on this drive, even though this particular
fault
>>>>>> happened at least 3 times so far.
>>>>>>
>>>>>>

>>>  you need to set the tunable:
>>>
>>> /sys/block/mdX/md/safe_mode_delay
>>>
>>> to something like "2" to prevent that sector from being a hotspot...
>>
>> I did that as soon as you suggested that it's possible to tune it. The
>> array is still being rebuilt (it's a fairly busy machine, so rebuilding
>> is
>> slow). I'll monitor it, but I don't expect to see the results soon as
>> even
>> with the default value of 0.2 it used to happen once in several weeks.
>>
>> On the other note: is it possible that the drive was actually working
>> properly but was not given enough time to complete the write request?
>> These
>> newer drives have 32MB cache but the same rotational speed and seek
times
>> as the older ones so they must need more time to flush their cache?
>>
> 
> Timeouts on IO requests are pretty large, usually drives won't fail an IO
> unless 
> there is a real problem but I will add the linux-ide list to this
response
> so 
> they can weigh in.
> 
> I suspect that the error was real, but might be this "repairable" type of

> adjacent track issue I mentioned before. Interesting to note that just
> following 
> the error, you see that it was indeed the super block that did not get
> updated...

The relevant portions of the log file are below (two independent events;
there is nothing related to ata before the "exception" message):

[901292.247428] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
frozen
[901292.247492] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[901292.247494]          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4
(timeout)
[901292.247500] ata2.00: status: { DRDY }
[901292.247512] ata2: hard resetting link
[901294.090746] ata2: SRST failed (errno=-19)
[901294.101922] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[901294.101938] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[901294.101943] ata2.00: revalidation failed (errno=-5)
[901299.100347] ata2: hard resetting link
[901299.974103] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[901300.105734] ata2.00: configured for UDMA/133
[901300.105776] ata2: EH complete
[901300.137059] end_request: I/O error, dev sdb, sector 1953519935
[901300.137069] md: super_written gets error=-5, uptodate=0
[901300.137077] raid1: Disk failure on sdb1, disabling device.
[901300.137079] raid1: Operation continuing on 1 devices.

[90307.328266] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
frozen
[90307.328275] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[90307.328277]          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4
(timeout)
[90307.328280] ata2.00: status: { DRDY }
[90307.328288] ata2: hard resetting link
[90313.218511] ata2: link is slow to respond, please be patient (ready=0)
[90317.377711] ata2: SRST failed (errno=-16)
[90317.377720] ata2: hard resetting link
[90318.251720] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[90318.338026] ata2.00: configured for UDMA/133
[90318.338062] ata2: EH complete
[90318.370625] end_request: I/O error, dev sdb, sector 1953519935
[90318.370632] md: super_written gets error=-5, uptodate=0
[90318.370636] raid1: Disk failure on sdb1, disabling device.
[90318.370637] raid1: Operation continuing on 1 devices.

And here's the story for linux-ide from the earlier messages:
> I'm using two ST31000528AS drives in RAID1 array using MD. I've had
several
> failures occur over a period of few months (see logs below). I've RMA'd
the
> drive, but then got curious why an otherwise normal drive locks up while
> trying to write the same sector once a month or so, but does not report
> having bad sectors, doesn't fail any tests, and does just fine if I do
> dd if=/dev/urandom of=/dev/sdb bs=512 seek=1953519935 count=1
> however many times I try.
> I then tried Googling for this number (1953519935) and found that it
comes
> up quite a few times and most of the time (or always) in context of
> md/raid.

Regards,
Andrei.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID: what's wrong with sector 1953519935?
  2009-08-26 18:12                       ` Andrei Tanas
  (?)
@ 2009-08-27  0:07                       ` Mark Lord
  2009-08-27  1:37                           ` Andrei Tanas
  -1 siblings, 1 reply; 84+ messages in thread
From: Mark Lord @ 2009-08-27  0:07 UTC (permalink / raw)
  To: Andrei Tanas; +Cc: Ric Wheeler, NeilBrown, linux-kernel, linux-ide

Andrei Tanas wrote:
> On Wed, 26 Aug 2009 11:39:38 -0400, Ric Wheeler <rwheeler@redhat.com>
> wrote:
>> On 08/26/2009 10:46 AM, Andrei Tanas wrote:
>>> On Wed, 26 Aug 2009 06:34:14 -0400, Ric Wheeler<rwheeler@redhat.com>
>>> wrote:
>>>> On 08/25/2009 11:45 PM, Andrei Tanas wrote:
>>>>>>>> I would suggest that Andrei might try to write and clear the IO
>>>>>> error
>>>>>>>> at that
>>>>>>>> offset. You can use Mark Lord's hdparm to clear a specific sector
..
> [90318.370625] end_request: I/O error, dev sdb, sector 1953519935
..

I suggest you try this:

    hdparm --read-sector 1953519935 /dev/sdb

If that succeeds, then the sector is good at the drive.
But *If* it fails, then you could check the syslog to ensure
that it didn't fail for some weird reason, and then *fix* it:

    hdparm --write-sector 1953519935 /dev/sdb

This is very different from how 'dd' does writes.

Cheers

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: MD/RAID: what's wrong with sector 1953519935?
  2009-08-27  0:07                       ` Mark Lord
@ 2009-08-27  1:37                           ` Andrei Tanas
  0 siblings, 0 replies; 84+ messages in thread
From: Andrei Tanas @ 2009-08-27  1:37 UTC (permalink / raw)
  To: 'Mark Lord'
  Cc: 'Ric Wheeler', 'NeilBrown', linux-kernel, linux-ide

> I suggest you try this:
> 
>     hdparm --read-sector 1953519935 /dev/sdb
> 
> If that succeeds, then the sector is good at the drive.

It succeeds.

> But *If* it fails, then you could check the syslog to ensure
> that it didn't fail for some weird reason, and then *fix* it:
> 
>     hdparm --write-sector 1953519935 /dev/sdb
> 
> This is very different from how 'dd' does writes.

The drive never did report any media errors. It just mysteriously times out once every few hundred hours while writing the RAID superblock to that sector.



^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID: what's wrong with sector 1953519935?
  2009-08-26 18:12                       ` Andrei Tanas
  (?)
  (?)
@ 2009-08-27  2:33                       ` Robert Hancock
  -1 siblings, 0 replies; 84+ messages in thread
From: Robert Hancock @ 2009-08-27  2:33 UTC (permalink / raw)
  To: Andrei Tanas; +Cc: Ric Wheeler, NeilBrown, linux-kernel, linux-ide

On 08/26/2009 12:12 PM, Andrei Tanas wrote:
> The relevant portions of the log file are below (two independent events,
> there is nothing related to ata before the "exception" message):
>
> [901292.247428] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [901292.247492] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> [901292.247494]          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4
> (timeout)
> [901292.247500] ata2.00: status: { DRDY }
> [901292.247512] ata2: hard resetting link
> [901294.090746] ata2: SRST failed (errno=-19)
> [901294.101922] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [901294.101938] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x40)
> [901294.101943] ata2.00: revalidation failed (errno=-5)
> [901299.100347] ata2: hard resetting link
> [901299.974103] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [901300.105734] ata2.00: configured for UDMA/133
> [901300.105776] ata2: EH complete
> [901300.137059] end_request: I/O error, dev sdb, sector 1953519935
> [901300.137069] md: super_written gets error=-5, uptodate=0
> [901300.137077] raid1: Disk failure on sdb1, disabling device.
> [901300.137079] raid1: Operation continuing on 1 devices.
>
> [90307.328266] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [90307.328275] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> [90307.328277]          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4
> (timeout)
> [90307.328280] ata2.00: status: { DRDY }
> [90307.328288] ata2: hard resetting link
> [90313.218511] ata2: link is slow to respond, please be patient (ready=0)
> [90317.377711] ata2: SRST failed (errno=-16)
> [90317.377720] ata2: hard resetting link
> [90318.251720] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [90318.338026] ata2.00: configured for UDMA/133
> [90318.338062] ata2: EH complete
> [90318.370625] end_request: I/O error, dev sdb, sector 1953519935
> [90318.370632] md: super_written gets error=-5, uptodate=0
> [90318.370636] raid1: Disk failure on sdb1, disabling device.
> [90318.370637] raid1: Operation continuing on 1 devices.
>
> And here's the story for linux-ide from the earlier messages:
>> I'm using two ST31000528AS drives in RAID1 array using MD. I've had
> several
>> failures occur over a period of few months (see logs below). I've RMA'd
> the
>> drive, but then got curious why an otherwise normal drive locks up while
>> trying to write the same sector once a month or so, but does not report
>> having bad sectors, doesn't fail any tests, and does just fine if I do
>> dd if=/dev/urandom of=/dev/sdb bs=512 seek=1953519935 count=1
>> however many times I try.
>> I then tried Googling for this number (1953519935) and found that it
> comes
>> up quite a few times and most of the time (or always) in context of
>> md/raid.

This looks more like some kind of drive communication problem than a 
media problem. Not only did the request time out, but the drive also 
failed to respond to the first hard reset. I'd lean towards something 
like a bad cable, or a power supply that's marginal for the number of 
drives in the machine.
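
For what it's worth, a quick way to sanity-check the cable theory (assuming
smartmontools is installed and that sdb is the suspect drive) is to look at
the interface CRC counter, which normally only grows on link/cabling
problems:

  smartctl -A /dev/sdb | grep -i CRC    # attribute name varies by vendor

It is usually reported as UDMA_CRC_Error_Count; a non-zero and growing raw
value would point at the cable or backplane rather than the media.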

^ permalink raw reply	[flat|nested] 84+ messages in thread

* MD/RAID time out writing superblock
  2009-08-26 18:12                       ` Andrei Tanas
                                         ` (2 preceding siblings ...)
  (?)
@ 2009-08-27 21:22                       ` Andrei Tanas
  2009-08-27 21:57                         ` Ric Wheeler
  -1 siblings, 1 reply; 84+ messages in thread
From: Andrei Tanas @ 2009-08-27 21:22 UTC (permalink / raw)
  To: Andrei Tanas; +Cc: Ric Wheeler, NeilBrown, linux-kernel

Hello,

This is about the same problem that I wrote about two days ago (md gets an
error while writing the superblock and fails a hard drive).

I've tried to figure out what's really going on, and as far as I can tell,
the disk doesn't really fail (as confirmed by multiple tests); it times out
trying to execute the ATA_CMD_FLUSH_EXT command ("ata2.00: cmd ea..." in the
log). The reason for this, I believe, is that md_super_write queues the
write command with the BIO_RW_SYNCIO flag.
As I wrote before, with a 32MB cache it is conceivable that it will take the
drive longer than 30 seconds (defined by SD_TIMEOUT in scsi/sd.h) to flush
its buffers.

Changing safe_mode_delay to a more conservative 2 seconds should definitely
help, but is it really necessary to write the superblock synchronously when
the array changes status from active to active-idle?
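
For reference, the relevant knobs can be poked at runtime; something like the
following makes it easy to experiment (sdb and md0 are only examples, adjust
for the actual drive and array):

  cat /sys/block/sdb/device/timeout        # per-command SCSI timeout, 30s by default
  echo 60 > /sys/block/sdb/device/timeout  # temporarily give the flush more headroom
  echo 2.0 > /sys/block/md0/md/safe_mode_delay

If the drive genuinely needs more than 30 seconds to flush, raising the SCSI
timeout should make the resets go away; if it still times out, the flush is
probably hung rather than just slow.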

[90307.328266] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
frozen
[90307.328275] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[90307.328277]          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4
(timeout)
[90307.328280] ata2.00: status: { DRDY }
[90307.328288] ata2: hard resetting link
[90313.218511] ata2: link is slow to respond, please be patient (ready=0)
[90317.377711] ata2: SRST failed (errno=-16)
[90317.377720] ata2: hard resetting link
[90318.251720] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[90318.338026] ata2.00: configured for UDMA/133
[90318.338062] ata2: EH complete
[90318.370625] end_request: I/O error, dev sdb, sector 1953519935
[90318.370632] md: super_written gets error=-5, uptodate=0


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-08-27 21:22                       ` MD/RAID time out writing superblock Andrei Tanas
@ 2009-08-27 21:57                         ` Ric Wheeler
  2009-08-31  8:10                           ` Tejun Heo
  0 siblings, 1 reply; 84+ messages in thread
From: Ric Wheeler @ 2009-08-27 21:57 UTC (permalink / raw)
  To: Andrei Tanas
  Cc: NeilBrown, linux-kernel, IDE/ATA development list, linux-scsi,
	Tejun Heo, Jeff Garzik, Mark Lord

On 08/27/2009 05:22 PM, Andrei Tanas wrote:
> Hello,
>
> This is about the same problem that I wrote two days ago (md gets an error
> while writing superblock and fails a hard drive).
>
> I've tried to figure out what's really going on, and as far as I can tell,
> the disk doesn't really fail (as confirmed by multiple tests), it times out
> trying to execute ATA_CMD_FLUSH_EXT ("at2.00 cmd ea..." in the log)
> command. The reason for this I believe is that md_super_write queues the
> write comand with BIO_RW_SYNCIO flag.
> As I wrote before, with 32MB cache it is conceivable that it will take the
> drive longer than 30 seconds (defined by SD_TIMEOUT in scsi/sd.h) to flush
> its buffers.
>
> Changing safe_mode_delay to more conservative 2 seconds should definitely
> help, but is it really necessary to write the superblock synchronously when
> array changes status from active to active-idle?
>
> [90307.328266] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [90307.328275] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> [90307.328277]          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4
> (timeout)
> [90307.328280] ata2.00: status: { DRDY }
> [90307.328288] ata2: hard resetting link
> [90313.218511] ata2: link is slow to respond, please be patient (ready=0)
> [90317.377711] ata2: SRST failed (errno=-16)
> [90317.377720] ata2: hard resetting link
> [90318.251720] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [90318.338026] ata2.00: configured for UDMA/133
> [90318.338062] ata2: EH complete
> [90318.370625] end_request: I/O error, dev sdb, sector 1953519935
> [90318.370632] md: super_written gets error=-5, uptodate=0
>
>    

30 seconds is a very long time for a drive to respond, but I think that 
your explanation fits the facts pretty well...

The drive might take a longer time like this when doing error handling 
(sector remapping, etc), but then I would expect to see your remapped 
sector count grow.
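
A minimal way to check that (assuming smartmontools, with sdb standing in for
the drive in question):

  smartctl -A /dev/sdb | egrep -i 'Realloc|Pending|Uncorrect'

Attribute names vary a little between vendors, but Reallocated_Sector_Ct and
Current_Pending_Sector staying at zero would argue against the remapping
theory.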

ric


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-08-27 21:57                         ` Ric Wheeler
@ 2009-08-31  8:10                           ` Tejun Heo
  2009-08-31 12:04                             ` Ric Wheeler
  2009-08-31 12:21                             ` Mark Lord
  0 siblings, 2 replies; 84+ messages in thread
From: Tejun Heo @ 2009-08-31  8:10 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: Andrei Tanas, NeilBrown, linux-kernel, IDE/ATA development list,
	linux-scsi, Jeff Garzik, Mark Lord

Ric Wheeler wrote:
> On 08/27/2009 05:22 PM, Andrei Tanas wrote:
>> Hello,
>>
>> This is about the same problem that I wrote two days ago (md gets an
>> error
>> while writing superblock and fails a hard drive).
>>
>> I've tried to figure out what's really going on, and as far as I can
>> tell,
>> the disk doesn't really fail (as confirmed by multiple tests), it
>> times out
>> trying to execute ATA_CMD_FLUSH_EXT ("at2.00 cmd ea..." in the log)
>> command. The reason for this I believe is that md_super_write queues the
>> write comand with BIO_RW_SYNCIO flag.
>> As I wrote before, with 32MB cache it is conceivable that it will take
>> the
>> drive longer than 30 seconds (defined by SD_TIMEOUT in scsi/sd.h) to
>> flush
>> its buffers.
>>
>> Changing safe_mode_delay to more conservative 2 seconds should definitely
>> help, but is it really necessary to write the superblock synchronously
>> when
>> array changes status from active to active-idle?
>>
>> [90307.328266] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
>> frozen
>> [90307.328275] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>> [90307.328277]          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4
>> (timeout)
>> [90307.328280] ata2.00: status: { DRDY }
>> [90307.328288] ata2: hard resetting link
>> [90313.218511] ata2: link is slow to respond, please be patient (ready=0)
>> [90317.377711] ata2: SRST failed (errno=-16)
>> [90317.377720] ata2: hard resetting link
>> [90318.251720] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [90318.338026] ata2.00: configured for UDMA/133
>> [90318.338062] ata2: EH complete
>> [90318.370625] end_request: I/O error, dev sdb, sector 1953519935
>> [90318.370632] md: super_written gets error=-5, uptodate=0
>>
>>    
> 
> 30 seconds is a very long time for a drive to respond, but I think that
> your explanation fits the facts pretty well...

Even with a 32MB cache, 30secs should be more than enough.  It's not
like the drive is gonna do random writes on those.  It's likely to make
only a small number of strokes over the platter and it really
shouldn't take very long.  I have yet to see an actual case where a
properly functioning drive timed out a flush because the flush itself
took that long.
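
If someone wants to put a number on it, timing an explicit cache flush on the
affected drive is easy enough, assuming a reasonably recent hdparm that has
the -F (flush drive write cache) option and with sdb as the drive:

  time hdparm -F /dev/sdb

Even on a busy array I'd expect that to come back within a few seconds,
nowhere near 30.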

> The drive might take a longer time like this when doing error handling
> (sector remapping, etc), but then I would expect to see your remapped
> sector count grow.

Yes, this is a possibility and according to the spec, libata EH should
be retrying flushes a few times before giving up, but I'm not sure
whether retrying for several minutes is a good idea either.
Is it?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-08-31  8:10                           ` Tejun Heo
@ 2009-08-31 12:04                             ` Ric Wheeler
  2009-08-31 12:20                               ` Tejun Heo
  2009-08-31 12:21                             ` Mark Lord
  1 sibling, 1 reply; 84+ messages in thread
From: Ric Wheeler @ 2009-08-31 12:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrei Tanas, NeilBrown, linux-kernel, IDE/ATA development list,
	linux-scsi, Jeff Garzik, Mark Lord

On 08/31/2009 04:10 AM, Tejun Heo wrote:
> Ric Wheeler wrote:
>    
>> On 08/27/2009 05:22 PM, Andrei Tanas wrote:
>>      
>>> Hello,
>>>
>>> This is about the same problem that I wrote two days ago (md gets an
>>> error
>>> while writing superblock and fails a hard drive).
>>>
>>> I've tried to figure out what's really going on, and as far as I can
>>> tell,
>>> the disk doesn't really fail (as confirmed by multiple tests), it
>>> times out
>>> trying to execute ATA_CMD_FLUSH_EXT ("at2.00 cmd ea..." in the log)
>>> command. The reason for this I believe is that md_super_write queues the
>>> write comand with BIO_RW_SYNCIO flag.
>>> As I wrote before, with 32MB cache it is conceivable that it will take
>>> the
>>> drive longer than 30 seconds (defined by SD_TIMEOUT in scsi/sd.h) to
>>> flush
>>> its buffers.
>>>
>>> Changing safe_mode_delay to more conservative 2 seconds should definitely
>>> help, but is it really necessary to write the superblock synchronously
>>> when
>>> array changes status from active to active-idle?
>>>
>>> [90307.328266] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
>>> frozen
>>> [90307.328275] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>>> [90307.328277]          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4
>>> (timeout)
>>> [90307.328280] ata2.00: status: { DRDY }
>>> [90307.328288] ata2: hard resetting link
>>> [90313.218511] ata2: link is slow to respond, please be patient (ready=0)
>>> [90317.377711] ata2: SRST failed (errno=-16)
>>> [90317.377720] ata2: hard resetting link
>>> [90318.251720] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>> [90318.338026] ata2.00: configured for UDMA/133
>>> [90318.338062] ata2: EH complete
>>> [90318.370625] end_request: I/O error, dev sdb, sector 1953519935
>>> [90318.370632] md: super_written gets error=-5, uptodate=0
>>>
>>>
>>>        
>> 30 seconds is a very long time for a drive to respond, but I think that
>> your explanation fits the facts pretty well...
>>      
> Even with 32MB cache, 30secs should be more than enough.  It's not
> like the drive is gonna do random write on those.  It's likely to make
> only very few number of strokes over the platter and it really
> shouldn't take very long.  I'm yet to see an actual case where a
> properly functioning drive timed out flush because the flush itself
> took long enough.
>
>    

I agree - vendors put a lot of pressure on drive manufacturers to finish 
up (even during error recovery) in much less than 30 seconds. The push 
was always for something closer to 15 seconds iirc.

>> The drive might take a longer time like this when doing error handling
>> (sector remapping, etc), but then I would expect to see your remapped
>> sector count grow.
>>      
> Yes, this is a possibility and according to the spec, libata EH should
> be retrying flushes a few times before giving up but I'm not sure
> whether keeping retrying for several minutes is a good idea either.
> Is it?
>
> Thanks.
>
>    

I don't think that retrying for minutes is a good idea. I wonder if this 
could be caused by power issues or cable issues to the drive?

Ric


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-08-31 12:04                             ` Ric Wheeler
@ 2009-08-31 12:20                               ` Tejun Heo
  2009-09-07 11:44                                 ` Chris Webb
  2009-09-16 22:28                                 ` Chris Webb
  0 siblings, 2 replies; 84+ messages in thread
From: Tejun Heo @ 2009-08-31 12:20 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: Andrei Tanas, NeilBrown, linux-kernel, IDE/ATA development list,
	linux-scsi, Jeff Garzik, Mark Lord

Ric Wheeler wrote:
>>> The drive might take a longer time like this when doing error handling
>>> (sector remapping, etc), but then I would expect to see your remapped
>>> sector count grow.
>>>      
>> Yes, this is a possibility and according to the spec, libata EH should
>> be retrying flushes a few times before giving up but I'm not sure
>> whether keeping retrying for several minutes is a good idea either.
>> Is it?
> 
> I don't think that retrying for minutes is a good idea. I wonder if this
> could be caused by power issues or cable issues to the drive?

IIRC, there were two identified weird reasons for flush timeouts.  The
first was quirky firmware where using NCQ led to timeouts on
FLUSH.  The second was flaky power.  So, yeah, it can be caused by a
power issue.  Not so sure about cable tho.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-08-31  8:10                           ` Tejun Heo
  2009-08-31 12:04                             ` Ric Wheeler
@ 2009-08-31 12:21                             ` Mark Lord
  2009-08-31 23:45                               ` Mark Lord
  1 sibling, 1 reply; 84+ messages in thread
From: Mark Lord @ 2009-08-31 12:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ric Wheeler, Andrei Tanas, NeilBrown, linux-kernel,
	IDE/ATA development list, linux-scsi, Jeff Garzik

Tejun Heo wrote:
> Ric Wheeler wrote:
..
>> The drive might take a longer time like this when doing error handling
>> (sector remapping, etc), but then I would expect to see your remapped
>> sector count grow.
> 
> Yes, this is a possibility and according to the spec, libata EH should
> be retrying flushes a few times before giving up but I'm not sure
> whether keeping retrying for several minutes is a good idea either.
> Is it?
..

Libata will retry only when the FLUSH returns an error,
and the next FLUSH will continue after the point where
the first attempt failed.

But if the drive can still auto-relocate sectors, then the
first FLUSH won't actually fail.. it will simply take longer
than normal.

A couple of those, and we're into the tens of seconds range
for time.

Still, it would be good to actually produce an error like that
to examine under controlled circumstances.

Hmm.. I had a drive here that gave symptoms like that.
Eventually, I discovered that drive had run out of relocatable
sectors, too.  Mmm.. I'll see if I can get it back (loaned it out)
and perhaps we can recreate this specific scenario on it..

Cheers
-- 
Mark Lord
Real-Time Remedies Inc.
mlord@pobox.com

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-08-31 12:21                             ` Mark Lord
@ 2009-08-31 23:45                               ` Mark Lord
  2009-09-01 13:07                                   ` Andrei Tanas
  0 siblings, 1 reply; 84+ messages in thread
From: Mark Lord @ 2009-08-31 23:45 UTC (permalink / raw)
  To: Mark Lord
  Cc: Tejun Heo, Ric Wheeler, Andrei Tanas, NeilBrown, linux-kernel,
	IDE/ATA development list, linux-scsi, Jeff Garzik

Mark Lord wrote:
> Tejun Heo wrote:
>> Ric Wheeler wrote:
> ..
>>> The drive might take a longer time like this when doing error handling
>>> (sector remapping, etc), but then I would expect to see your remapped
>>> sector count grow.
>>
>> Yes, this is a possibility and according to the spec, libata EH should
>> be retrying flushes a few times before giving up but I'm not sure
>> whether keeping retrying for several minutes is a good idea either.
>> Is it?
> ..
> 
> Libata will retry only when the FLUSH returns an error,
> and the next FLUSH will continue after the point where
> the first attempt failed.
> 
> But if the drive can still auto-relocate sectors, then the
> first FLUSH won't actually fail.. it will simply take longer
> than normal.
> 
> A couple of those, and we're into the tens of seconds range
> for time.
> 
> Still, it would be good to actually produce an error like that
> to examine under controlled circumstances.
> 
> Hmm.. I had a drive here that gave symptoms like that.
> Eventually, I discovered that drive had run out of relocatable
> sectors, too.  Mmm.. I'll see if I can get it back (loaned it out)
> and perhaps we can recreate this specific scenario on it..
..

I checked today, and that drive is no longer available.

-ml

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-08-31 23:45                               ` Mark Lord
@ 2009-09-01 13:07                                   ` Andrei Tanas
  0 siblings, 0 replies; 84+ messages in thread
From: Andrei Tanas @ 2009-09-01 13:07 UTC (permalink / raw)
  To: Mark Lord
  Cc: Mark Lord, Tejun Heo, Ric Wheeler, NeilBrown, linux-kernel,
	IDE/ATA development list, linux-scsi, Jeff Garzik

>>>> The drive might take a longer time like this when doing error handling
>>>> (sector remapping, etc), but then I would expect to see your remapped
>>>> sector count grow.
>>>
>>> Yes, this is a possibility and according to the spec, libata EH should
>>> be retrying flushes a few times before giving up but I'm not sure
>>> whether keeping retrying for several minutes is a good idea either.
>>> Is it?
>> ..
>> 
>> Libata will retry only when the FLUSH returns an error,
>> and the next FLUSH will continue after the point where
>> the first attempt failed.
>> 
>> But if the drive can still auto-relocate sectors, then the
>> first FLUSH won't actually fail.. it will simply take longer
>> than normal.
>> 
>> A couple of those, and we're into the tens of seconds range
>> for time.
>> 
>> Still, it would be good to actually produce an error like that
>> to examine under controlled circumstances.
>> 
>> Hmm.. I had a drive here that gave symptoms like that.
>> Eventually, I discovered that drive had run out of relocatable
>> sectors, too.  Mmm.. I'll see if I can get it back (loaned it out)
>> and perhaps we can recreate this specific scenario on it..
> ..
> 
> I checked today, and that drive is no longer available.

Mine errored out again with exactly the same symptoms, this time after only
a few days and with the "tunable" set to 2 sec. I got a warranty replacement
but haven't shipped this one yet. Let me know if you want it.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-01 13:07                                   ` Andrei Tanas
  (?)
@ 2009-09-01 13:15                                   ` Mark Lord
  2009-09-01 13:30                                     ` Tejun Heo
  -1 siblings, 1 reply; 84+ messages in thread
From: Mark Lord @ 2009-09-01 13:15 UTC (permalink / raw)
  To: Andrei Tanas
  Cc: Mark Lord, Tejun Heo, Ric Wheeler, NeilBrown, linux-kernel,
	IDE/ATA development list, linux-scsi, Jeff Garzik

Andrei Tanas wrote:
>>>>> The drive might take a longer time like this when doing error handling
>>>>> (sector remapping, etc), but then I would expect to see your remapped
>>>>> sector count grow.
>>>> Yes, this is a possibility and according to the spec, libata EH should
>>>> be retrying flushes a few times before giving up but I'm not sure
>>>> whether keeping retrying for several minutes is a good idea either.
>>>> Is it?
>>> ..
>>>
>>> Libata will retry only when the FLUSH returns an error,
>>> and the next FLUSH will continue after the point where
>>> the first attempt failed.
>>>
>>> But if the drive can still auto-relocate sectors, then the
>>> first FLUSH won't actually fail.. it will simply take longer
>>> than normal.
>>>
>>> A couple of those, and we're into the tens of seconds range
>>> for time.
>>>
>>> Still, it would be good to actually produce an error like that
>>> to examine under controlled circumstances.
>>>
>>> Hmm.. I had a drive here that gave symptoms like that.
>>> Eventually, I discovered that drive had run out of relocatable
>>> sectors, too.  Mmm.. I'll see if I can get it back (loaned it out)
>>> and perhaps we can recreate this specific scenario on it..
>> ..
>>
>> I checked today, and that drive is no longer available.
> 
> Mine errored out again with exactly the same symptoms, this time after only
> few days and with the "tunable" set to 2 sec. I got a warranty replacement
> but haven't shipped this one yet. Let me know if you want it.
..

Not me.  But perhaps Tejun ?

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-01 13:15                                   ` Mark Lord
@ 2009-09-01 13:30                                     ` Tejun Heo
  2009-09-01 13:47                                       ` Ric Wheeler
  0 siblings, 1 reply; 84+ messages in thread
From: Tejun Heo @ 2009-09-01 13:30 UTC (permalink / raw)
  To: Mark Lord
  Cc: Andrei Tanas, Mark Lord, Ric Wheeler, NeilBrown, linux-kernel,
	IDE/ATA development list, linux-scsi, Jeff Garzik

Hello,

Mark Lord wrote:
>> Mine errored out again with exactly the same symptoms, this time after
>> only
>> few days and with the "tunable" set to 2 sec. I got a warranty
>> replacement
>> but haven't shipped this one yet. Let me know if you want it.
> ..
> 
> Not me.  But perhaps Tejun ?

I think you're much more qualified than me on the subject. :-)

Anyone else?  Ric, are you interested in playing with the drive?

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-01 13:30                                     ` Tejun Heo
@ 2009-09-01 13:47                                       ` Ric Wheeler
  2009-09-01 14:18                                           ` Andrei Tanas
  0 siblings, 1 reply; 84+ messages in thread
From: Ric Wheeler @ 2009-09-01 13:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Mark Lord, Andrei Tanas, Mark Lord, NeilBrown, linux-kernel,
	IDE/ATA development list, linux-scsi, Jeff Garzik

On 09/01/2009 09:30 AM, Tejun Heo wrote:
> Hello,
>
> Mark Lord wrote:
>>> Mine errored out again with exactly the same symptoms, this time after
>>> only
>>> few days and with the "tunable" set to 2 sec. I got a warranty
>>> replacement
>>> but haven't shipped this one yet. Let me know if you want it.
>> ..
>>
>> Not me.  But perhaps Tejun ?
>
> I think you're much more qualified than me on the subject. :-)
>
> Anyone else?  Ric, are you interested with playing the drive?
>

No thanks....

I would suggest that Andrei install the new drive and watch it for a few days to 
make sure that it does not fail in the same way. If it does, you might want to 
look at the power supply/cables/etc?

ric


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-01 13:47                                       ` Ric Wheeler
@ 2009-09-01 14:18                                           ` Andrei Tanas
  0 siblings, 0 replies; 84+ messages in thread
From: Andrei Tanas @ 2009-09-01 14:18 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: Tejun Heo, Mark Lord, Mark Lord, NeilBrown, linux-kernel,
	IDE/ATA development list, linux-scsi, Jeff Garzik

On Tue, 01 Sep 2009 09:47:31 -0400, Ric Wheeler <rwheeler@redhat.com>
wrote:
>>>> Mine errored out again with exactly the same symptoms, this time after
>>>> only
>>>> few days and with the "tunable" set to 2 sec. I got a warranty
>>>> replacement
>>>> but haven't shipped this one yet. Let me know if you want it.
>>> ..
>>>
>>> Not me.  But perhaps Tejun ?
>>
>> I think you're much more qualified than me on the subject. :-)
>>
>> Anyone else?  Ric, are you interested with playing the drive?
> 
> No thanks....
> 
> I would suggest that Andrei install the new drive and watch it for a few
> days to 
> make sure that it does not fail in the same way. If it does, you might
want
> to look at the power supply/cables/etc?

The drive is the second member of a RAID1 array. As far as I understand,
both drives should be experiencing very similar access patterns, and they
are the same model with the same firmware, manufactured on the same day,
yet only one of them showed these symptoms, so there must be something
"special" about it.
By now I think that MD made the right "decision" in failing the drive and
removing it from the array, so I guess let's leave it at that.

Andrei.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-01 13:07                                   ` Andrei Tanas
  (?)
  (?)
@ 2009-09-02 21:58                                   ` Allan Wind
  2009-09-04 19:39                                     ` Andrei Tanas
  -1 siblings, 1 reply; 84+ messages in thread
From: Allan Wind @ 2009-09-02 21:58 UTC (permalink / raw)
  To: IDE/ATA development list, linux-scsi

On 2009-09-01T09:07:11, Andrei Tanas wrote:
> Mine errored out again with exactly the same symptoms, this time after only
> few days and with the "tunable" set to 2 sec. I got a warranty replacement
> but haven't shipped this one yet. Let me know if you want it.

How do you set it to 2 sec?  I have a RAID1 array of two new WDC 
2 TB drives where the 2nd drive fails all the time.  Disabling 
NCQ might have helped, but it is hard to tell.  It was stable for 
a week, but probably failed 10 times over the weekend.  I either 
need to add a 3rd drive to get any protection from hardware 
failure, or use LVM to mirror the two drives instead of md :-(
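
In case anyone wants to compare notes, one way to toggle NCQ at runtime is
via sysfs (sdb here is just a placeholder for the affected drive):

  cat /sys/block/sdb/device/queue_depth     # >1 means NCQ is in use
  echo 1 > /sys/block/sdb/device/queue_depth

A queue_depth of 1 effectively disables NCQ; the setting does not survive a
reboot.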


/Allan
-- 
Allan Wind
Life Integrity, LLC
<http://lifeintegrity.com>


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-02 21:58                                   ` Allan Wind
@ 2009-09-04 19:39                                     ` Andrei Tanas
  0 siblings, 0 replies; 84+ messages in thread
From: Andrei Tanas @ 2009-09-04 19:39 UTC (permalink / raw)
  To: Allan Wind; +Cc: IDE/ATA development list, linux-scsi

On Wed, 2 Sep 2009 17:58:41 -0400, Allan Wind
<allan_wind@lifeintegrity.com>
wrote:
> On 2009-09-01T09:07:11, Andrei Tanas wrote:
>> Mine errored out again with exactly the same symptoms, this time after
>> only
>> few days and with the "tunable" set to 2 sec. I got a warranty
>> replacement
>> but haven't shipped this one yet. Let me know if you want it.
> 
> How do you set it to 2 sec?  I have a raid 1 array of two new WDC 
> 2 TB drives that fail on the 2nd drive all the time.  Disabling 
> NCQ might have helped but it is hard to tell.  It was stable for 
> a week, but probably failed 10 times over the weekend.  I either 
> need to add a 3rd drive to get any protection from hardware 
> failure, or use lvm to mirror the two drives instead of md :-(

echo 2.0 > /sys/block/md0/md/safe_mode_delay

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-08-31 12:20                               ` Tejun Heo
@ 2009-09-07 11:44                                 ` Chris Webb
  2009-09-07 11:59                                   ` Chris Webb
                                                     ` (2 more replies)
  2009-09-16 22:28                                 ` Chris Webb
  1 sibling, 3 replies; 84+ messages in thread
From: Chris Webb @ 2009-09-07 11:44 UTC (permalink / raw)
  To: linux-scsi
  Cc: Tejun Heo, Ric Wheeler, Andrei Tanas, NeilBrown, linux-kernel,
	IDE/ATA development list, Jeff Garzik, Mark Lord

Sorry for the late follow up to this thread, but I'm also seeing symptoms that
look identical to these and would be grateful for any advice. I think I can
reasonably rule out a single faulty drive, controller or cabling set as I'm
seeing it across a cluster of Supermicro machines with six Seagate ST3750523AS
SATA drives in each and the drive that times out is apparently randomly
distributed across the cluster. (Of course, since the hardware is identical, it
could still be a hardware design or firmware problem.)

We're running x86-64 2.6.30.5 (and previously 2.6.30.4 where we also saw the
problem) with these drives on top of an on-motherboard ahci controller:

  ahci 0000:00:1f.2: version 3.0
    alloc irq_desc for 19 on cpu 0 node 0
    alloc kstat_irqs on cpu 0 node 0
  ahci 0000:00:1f.2: PCI INT B -> GSI 19 (level, low) -> IRQ 19
  ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 6 ports 3 Gbps 0x3f impl RAID mode
  ahci 0000:00:1f.2: flags: 64bit ncq pm led pmp slum part 
  ahci 0000:00:1f.2: setting latency timer to 64
  scsi0 : ahci
  scsi1 : ahci
  scsi2 : ahci
  scsi3 : ahci
  scsi4 : ahci
  scsi5 : ahci
  ata1: SATA max UDMA/133 abar m1024@0xd8500400 port 0xd8500500 irq 19
  ata2: SATA max UDMA/133 irq_stat 0x00400040, connection status changed irq 19
  ata3: SATA max UDMA/133 irq 19
  ata4: SATA max UDMA/133 irq_stat 0x00400040, connection status changed
  ata5: SATA max UDMA/133 irq_stat 0x00400040, connection status changed
  ata6: SATA max UDMA/133 irq_stat 0x00400040, connection status changed
  [...]
  ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  ata1.00: ATA-8: ST3750523AS, CC34, max UDMA/133
  ata1.00: 1465149168 sectors, multi 0: LBA48 NCQ (depth 31/32)
  ata1.00: configured for UDMA/133
  scsi 0:0:0:0: Direct-Access     ATA      ST3750523AS      CC34 PQ: 0 ANSI: 5
  sd 0:0:0:0: [sda] 1465149168 512-byte hardware sectors: (750 GB/698 GiB)
  sd 0:0:0:0: [sda] Write Protect is off
  sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
  sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
   sda:<5>sd 0:0:0:0: Attached scsi generic sg0 type 0
   sda1 sda2 sda3
  sd 0:0:0:0: [sda] Attached SCSI disk
  [etc]

These machines host large numbers of kvm virtual machines and have three RAID
arrays, a RAID1 of /dev/sd[abcdef]1 for /, a RAID10 of /dev/sd[abcdef]2 for
swap and a large RAID10 of /dev/sd[abcdef]3 which is used as an LVM2 PV, out of
which the virtual drives are carved.

Everything will be running fine when suddenly:

  ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
  ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
          res 40/00:00:80:17:91/00:00:37:00:00/40 Emask 0x4 (timeout)
  ata1.00: status: { DRDY }
  ata1: hard resetting link
  ata1: softreset failed (device not ready)
  ata1: hard resetting link
  ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  ata1.00: configured for UDMA/133
  ata1: EH complete
  end_request: I/O error, dev sda, sector 1465147272
  md: super_written gets error=-5, uptodate=0
  raid10: Disk failure on sda3, disabling device.
  raid10: Operation continuing on 5 devices.

The drive shows no errors in the SMART log, it doesn't really have any read or
write problems at 1465147272 (verified with O_DIRECT dd), and the reallocated
sector count is still zero. mdadm --remove and mdadm --add on the component
bring everything back to life fine.

I'm using the deadline IO scheduler with default 5s timeout value for these
disks, in case that could be part of the problem? LVM2 does O_DIRECT reads and
writes to the md array to manipulate its metadata, and my virtual machines are
doing quite a lot of IO to the logical volumes too, which they open O_SYNC, so
the arrays are reasonably heavily loaded.

I think there are two things that concern me here. One is obviously that the
timeouts and resets are happening at all and I'd like to get to the bottom
of this. However, the other is that the response to a SCSI disk having to be
reset is to chuck it out of the array even though it'll be fine following
the reset! I'd very much like to stop this happening somehow.

A couple of other problems I'm seeing, which may be connected, and make this
problem more painful...

Perhaps it's inevitable because of the IO load the machine is under, but I
find I need to set /proc/sys/dev/raid/speed_limit_min right down to 0 before
any resync, or else all disk accesses seem to deadlock completely, including
the RAID resync itself. In this state, all attempts to dd even a single block
from /dev/md2 will hang forever. With the speed limit minimum reduced to 0,
resync proceeds fine most of the time, occasionally dropping down to zero
speed when other IO is going on (which is fine), but apparently not
deadlocking.
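
Concretely, that just means the following (the setting is global, so it
affects every md array on the box):

  echo 0 > /proc/sys/dev/raid/speed_limit_min
  cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max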

I have a bitmap on the array, but sometimes when I remove and re-add a
failed component, it doesn't seem to use the bitmap and does a lengthy full
recovery instead. One example that's ongoing at the moment:-

      [=>...................]  recovery =  5.7% (40219648/703205312) finish=7546.3min speed=1463K/sec
      bitmap: 34/126 pages [136KB], 8192KB chunk

which is rather painful and has to be throttled back with speed_limit_max to
keep the virtual machines running on top of it from having extremely poor IO
latency.
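
For completeness, what I have been doing is a plain mdadm --remove followed
by --add; a --re-add along the following lines (md2 and sda3 taken from the
layout above) is supposed to ask md to use the write-intent bitmap for a
partial recovery instead of a full one, though whether that helps here I
don't know:

  mdadm /dev/md2 --remove /dev/sda3
  mdadm /dev/md2 --re-add /dev/sda3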

Best wishes,

Chris.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-07 11:44                                 ` Chris Webb
@ 2009-09-07 11:59                                   ` Chris Webb
  2009-09-09 12:02                                     ` Chris Webb
  2009-09-07 16:55                                   ` Allan Wind
  2009-09-07 16:55                                   ` Allan Wind
  2 siblings, 1 reply; 84+ messages in thread
From: Chris Webb @ 2009-09-07 11:59 UTC (permalink / raw)
  To: linux-scsi
  Cc: Tejun Heo, Ric Wheeler, Andrei Tanas, NeilBrown, linux-kernel,
	IDE/ATA development list, Jeff Garzik, Mark Lord

Chris Webb <chris@arachsys.com> writes:

> I have a bitmap on the array, but sometimes when I remove and re-add a
> failed component, it doesn't seem to use the bitmap and does a lengthy full
> recovery instead. One example that's ongoing at the moment:-
> 
>       [=>...................]  recovery =  5.7% (40219648/703205312) finish=7546.3min speed=1463K/sec
>       bitmap: 34/126 pages [136KB], 8192KB chunk
> 
> which is rather painful and has to be throttled back with speed_limit_max to
> avoid the virtual machines running on top of it from having extremely poor IO
> latency.

I've also noticed that during this recovery, I'm seeing lots of timeouts but
they don't seem to interrupt the resync:

  05:47:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
  05:47:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
  05:47:39         res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
  05:47:39 ata5.00: status: { DRDY }
  05:47:39 ata5: hard resetting link
  05:47:49 ata5: softreset failed (device not ready)
  05:47:49 ata5: hard resetting link
  05:47:49 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  05:47:49 ata5.00: configured for UDMA/133
  05:47:49 ata5: EH complete
  
  08:17:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
  08:17:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
  08:17:39         res 40/00:00:35:83:f8/00:00:4d:00:00/40 Emask 0x4 (timeout)
  08:17:39 ata5.00: status: { DRDY }
  08:17:39 ata5: hard resetting link
  08:17:49 ata5: softreset failed (device not ready)
  08:17:49 ata5: hard resetting link
  08:17:49 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  08:17:49 ata5.00: configured for UDMA/133
  08:17:49 ata5: EH complete
  
  10:22:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
  10:22:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
  10:22:39         res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
  10:22:39 ata5.00: status: { DRDY }
  10:22:39 ata5: hard resetting link
  10:22:49 ata5: softreset failed (device not ready)
  10:22:49 ata5: hard resetting link
  10:22:50 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  10:22:51 ata5.00: configured for UDMA/133
  10:22:51 ata5: EH complete

Cheers,

Chris.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-07 11:44                                 ` Chris Webb
  2009-09-07 11:59                                   ` Chris Webb
  2009-09-07 16:55                                   ` Allan Wind
@ 2009-09-07 16:55                                   ` Allan Wind
  2009-09-07 23:26                                       ` Thomas Fjellstrom
  2 siblings, 1 reply; 84+ messages in thread
From: Allan Wind @ 2009-09-07 16:55 UTC (permalink / raw)
  To: Chris Webb
  Cc: linux-scsi, Tejun Heo, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

On 2009-09-07T12:44:42, Chris Webb wrote:
> Sorry for the late follow up to this thread, but I'm also seeing symptoms that
> look identical to these and would be grateful for any advice. I think I can
> reasonably rule out a single faulty drive, controller or cabling set as I'm
> seeing it across a cluster of Supermicro machines with six Seagate ST3750523AS
> SATA drives in each and the drive that times out is apparently randomly
> distributed across the cluster. (Of course, since the hardware is identical, it
> could still be a hardware design or firmware problem.)

Seeing the same thing with a Supermicro motherboard and a pair of WDC 2 TB 
drives.  Disabling NCQ does not resolve the issue, nor does increasing 
the safe_mode_delay.  This is with 2.6.30.4.  This machine is 
sitting on its hands (i.e. no significant load).


/Allan
-- 
Allan Wind
Life Integrity, LLC
<http://lifeintegrity.com>


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-07 16:55                                   ` Allan Wind
@ 2009-09-07 23:26                                       ` Thomas Fjellstrom
  0 siblings, 0 replies; 84+ messages in thread
From: Thomas Fjellstrom @ 2009-09-07 23:26 UTC (permalink / raw)
  To: Chris Webb, linux-scsi, Tejun Heo, Ric Wheeler, Andrei Tanas, NeilBrown

On Mon September 7 2009, Allan Wind wrote:
> On 2009-09-07T12:44:42, Chris Webb wrote:
> > Sorry for the late follow up to this thread, but I'm also seeing symptoms
> > that look identical to these and would be grateful for any advice. I
> > think I can reasonably rule out a single faulty drive, controller or
> > cabling set as I'm seeing it across a cluster of Supermicro machines with
> > six Seagate ST3750523AS SATA drives in each and the drive that times out
> > is apparently randomly distributed across the cluster. (Of course, since
> > the hardware is identical, it could still be a hardware design or
> > firmware problem.)
> 
> Seeing the same thing with a Supermicro motherboard and a pair WDC 2 TB
> drives.  Disabling NCQ does not resolve the issue, nor increasing
> the safe_mode_delay.  This is with 2.6.30.4.  This machine is
> sitting on its hand (i.e. no significant load).

I have the same issue with a single WD 2TB Green drive. Technically two, but 
the errors always come from the same drive, so I was assuming it was the 
drive. I only have to set up the raid0 array and put some light load on it 
for the kernel to start complaining, and eventually it just kicks the drive 
completely with the following messages:

sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
end_request: I/O error, dev sdb, sector 202026972

The drive does work fine prior to the frozen timeout errors. And I was using 
it in Windows (same raid0 config) just fine with no errors whatsoever.

> 
> /Allan
> 


-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-07 11:59                                   ` Chris Webb
@ 2009-09-09 12:02                                     ` Chris Webb
  2009-09-14  7:41                                       ` Tejun Heo
  0 siblings, 1 reply; 84+ messages in thread
From: Chris Webb @ 2009-09-09 12:02 UTC (permalink / raw)
  To: linux-scsi
  Cc: Tejun Heo, Ric Wheeler, Andrei Tanas, NeilBrown, linux-kernel,
	IDE/ATA development list, Jeff Garzik, Mark Lord

Chris Webb <chris@arachsys.com> writes:

> I've also noticed that during this recovery, I'm seeing lots of timeouts but
> they don't seem to interrupt the resync:
> 
>   05:47:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>   05:47:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
>   05:47:39         res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
>   05:47:39 ata5.00: status: { DRDY }
>   05:47:39 ata5: hard resetting link
>   05:47:49 ata5: softreset failed (device not ready)
>   05:47:49 ata5: hard resetting link
>   05:47:49 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>   05:47:49 ata5.00: configured for UDMA/133
>   05:47:49 ata5: EH complete
>   
>   08:17:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>   08:17:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
>   08:17:39         res 40/00:00:35:83:f8/00:00:4d:00:00/40 Emask 0x4 (timeout)
>   08:17:39 ata5.00: status: { DRDY }
>   08:17:39 ata5: hard resetting link
>   08:17:49 ata5: softreset failed (device not ready)
>   08:17:49 ata5: hard resetting link
>   08:17:49 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>   08:17:49 ata5.00: configured for UDMA/133
>   08:17:49 ata5: EH complete
>   
>   10:22:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>   10:22:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
>   10:22:39         res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
>   10:22:39 ata5.00: status: { DRDY }
>   10:22:39 ata5: hard resetting link
>   10:22:49 ata5: softreset failed (device not ready)
>   10:22:49 ata5: hard resetting link
>   10:22:50 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>   10:22:51 ata5.00: configured for UDMA/133
>   10:22:51 ata5: EH complete

... the difference being that a timeout which causes a super_written failure
seems to return an I/O error whereas the others don't:

  ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
  ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
          res 40/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
  ata5.00: status: { DRDY }
  ata5: hard resetting link
  ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  ata5.00: configured for UDMA/133
  ata5: EH complete
  end_request: I/O error, dev sde, sector 1465147272
  md: super_written gets error=-5, uptodate=0
  raid10: Disk failure on sde3, disabling device.

I wonder what's different about these two timeouts such that one causes an I/O
error and the other just causes a retry after reset? Presumably if the former
were also just a retry, everything would be (closer to being) fine.

Cheers,

Chris.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-01 14:18                                           ` Andrei Tanas
@ 2009-09-14  5:30                                             ` Marc Giger
  -1 siblings, 0 replies; 84+ messages in thread
From: Marc Giger @ 2009-09-14  5:30 UTC (permalink / raw)
  To: Andrei Tanas
  Cc: Ric Wheeler, Tejun Heo, Mark Lord, Mark Lord, NeilBrown,
	linux-kernel, IDE/ATA development list, linux-scsi, Jeff Garzik

Hi,

I have similar problem with my two Sun T2000 machines.
During last week I got two times a degraded array. Everytime another
disk is kicked of the array. On the other T2000 machine the same happend
multiple times in the past too. The interesting part is, it is always the
same sector involved on every disk as in the original report. After a manual resync of the disks it
seems to work for some time until it is failing again. smart doesn't show
any errors on the disks.
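
For reference, the data-check that shows up in the log below can be kicked
off by hand through sysfs (md0 as in the log), which is an easy way to
re-verify the mirror after re-adding a disk:

  echo check > /sys/block/md0/md/sync_action
  cat /proc/mdstat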

[871180.857895] sd 0:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00                                                                                          
[871180.857929] end_request: I/O error, dev sda, sector 143363852                                                                                                
[871180.857950] md: super_written gets error=-5, uptodate=0                                                                                                      
[871180.857968] raid1: Disk failure on sda2, disabling device.                                                                                                   
[871180.857976]         Operation continuing on 1 devices                                                                                                        
[871180.863652] RAID1 conf printout:                                                                                                                             
[871180.863678]  --- wd:1 rd:2                                                                                                                                   
[871180.863694]  disk 0, wo:1, o:0, dev:sda2                                                                                                                     
[871180.863710]  disk 1, wo:0, o:1, dev:sdb2                                                                                                                     
[871180.873021] RAID1 conf printout:                                                                                                                             
[871180.873041]  --- wd:1 rd:2                                                                                                                                   
[871180.873053]  disk 1, wo:0, o:1, dev:sdb2                                                                                                                     
[925797.120488] md: data-check of RAID array md0                                                                                                                 
[925797.120516] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.                                                                                               
[925797.120531] md: using maximum available idle IO bandwidth (but not more than 30000 KB/sec) for data-check.                                                   
[925797.120573] md: using 256k window, over a total of 71585536 blocks.                                                                                          
[925797.121308] md: md0: data-check done.                                                                                                                        
[925797.137397] RAID1 conf printout:                                                                                                                             
[925797.137419]  --- wd:1 rd:2                                                                                                                                   
[925797.137433]  disk 1, wo:0, o:1, dev:sdb2                                                                                                                     
[1036034.437130] md: unbind<sda2>                                                                                                                                
[1036034.437168] md: export_rdev(sda2)
[1036044.572402] md: bind<sda2>
[1036044.574923] RAID1 conf printout:
[1036044.574945]  --- wd:1 rd:2
[1036044.574960]  disk 0, wo:1, o:1, dev:sda2
[1036044.574976]  disk 1, wo:0, o:1, dev:sdb2
[1036044.575157] md: recovery of RAID array md0
[1036044.575171] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[1036044.575186] md: using maximum available idle IO bandwidth (but not more than 30000 KB/sec) for recovery.
[1036044.575227] md: using 256k window, over a total of 71585536 blocks.
[1038465.450853] md: md0: recovery done.
[1038465.549707] RAID1 conf printout:
[1038465.549728]  --- wd:2 rd:2
[1038465.549743]  disk 0, wo:0, o:1, dev:sda2
[1038465.549759]  disk 1, wo:0, o:1, dev:sdb2
[1192672.830876] sd 0:0:1:0: [sdb] Result: hostbyte=0x01 driverbyte=0x00
[1192672.830910] end_request: I/O error, dev sdb, sector 143363852
[1192672.830932] md: super_written gets error=-5, uptodate=0
[1192672.830951] raid1: Disk failure on sdb2, disabling device.
[1192672.830958]        Operation continuing on 1 devices
[1192672.836943] RAID1 conf printout:
[1192672.836964]  --- wd:1 rd:2
[1192672.836976]  disk 0, wo:0, o:1, dev:sda2
[1192672.836990]  disk 1, wo:1, o:0, dev:sdb2
[1192672.846157] RAID1 conf printout:
[1192672.846177]  --- wd:1 rd:2
[1192672.846189]  disk 0, wo:0, o:1, dev:sda2


The used disks are:

Device: FUJITSU  MAY2073RCSUN72G  Version: 0401
Device type: disk                              
Transport protocol: SAS                        
Local Time is: Mon Sep 14 07:24:28 2009 CEST   
Device supports SMART and is Enabled           
Temperature Warning Disabled or Not Supported  
SMART Health Status: OK                        

Current Drive Temperature:     34 C
Drive Trip Temperature:        65 C
Manufactured in week 38 of year 2006
Recommended maximum start stop count:  10000 times
Current start stop count:      56 times           
Elements in grown defect list: 0

Device: FUJITSU  MAY2073RCSUN72G  Version: 0401
Device type: disk
Transport protocol: SAS
Local Time is: Mon Sep 14 07:25:49 2009 CEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature:     33 C
Drive Trip Temperature:        65 C
Manufactured in week 38 of year 2006
Recommended maximum start stop count:  10000 times
Current start stop count:      56 times
Elements in grown defect list: 0


Controller:
0000:07:00.0 SCSI storage controller: LSI Logic / Symbios Logic
SAS1064ET PCI-Express Fusion-MPT SAS (rev 02)

Thanks

Marc



On Tue, 01 Sep 2009 10:18:06 -0400
Andrei Tanas <andrei@tanas.ca> wrote:

> On Tue, 01 Sep 2009 09:47:31 -0400, Ric Wheeler <rwheeler@redhat.com>
> wrote:
> >>>> Mine errored out again with exactly the same symptoms, this time after
> >>>> only
> >>>> few days and with the "tunable" set to 2 sec. I got a warranty
> >>>> replacement
> >>>> but haven't shipped this one yet. Let me know if you want it.
> >>> ..
> >>>
> >>> Not me.  But perhaps Tejun ?
> >>
> >> I think you're much more qualified than me on the subject. :-)
> >>
> >> Anyone else?  Ric, are you interested with playing the drive?
> > 
> > No thanks....
> > 
> > I would suggest that Andrei install the new drive and watch it for a few
> > days to 
> > make sure that it does not fail in the same way. If it does, you might
> want
> > to look at the power supply/cables/etc?
> 
> The drive is the second member of RAID1 array, as far as I understand, both
> drives should be experiencing very similar access patterns, and they are
> the same model with the same firmware, and manufactured on the same day,
> but only one of them showed these symptoms, so there must be something
> "special" about it.
> By now I think that MD made the right "decision" failing the drive and
> removing it from the array, so I guess let's leave it at that.
> 
> Andrei.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-09 12:02                                     ` Chris Webb
@ 2009-09-14  7:41                                       ` Tejun Heo
  2009-09-14  7:44                                         ` Tejun Heo
  2009-09-14 13:14                                         ` Gabor Gombas
  0 siblings, 2 replies; 84+ messages in thread
From: Tejun Heo @ 2009-09-14  7:41 UTC (permalink / raw)
  To: Chris Webb
  Cc: linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown, linux-kernel,
	IDE/ATA development list, Jeff Garzik, Mark Lord

Hello, Chris.

Chris Webb wrote:
> Chris Webb <chris@arachsys.com> writes:
> 
>> I've also noticed that during this recovery, I'm seeing lots of timeouts but
>> they don't seem to interrupt the resync:
>>
>>   05:47:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>>   05:47:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
>>   05:47:39         res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
>>   05:47:39 ata5.00: status: { DRDY }
>>   05:47:39 ata5: hard resetting link
>>   05:47:49 ata5: softreset failed (device not ready)
>>   05:47:49 ata5: hard resetting link
>>   05:47:49 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>   05:47:49 ata5.00: configured for UDMA/133
>>   05:47:49 ata5: EH complete
>>   
>>   08:17:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>>   08:17:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
>>   08:17:39         res 40/00:00:35:83:f8/00:00:4d:00:00/40 Emask 0x4 (timeout)
>>   08:17:39 ata5.00: status: { DRDY }
>>   08:17:39 ata5: hard resetting link
>>   08:17:49 ata5: softreset failed (device not ready)
>>   08:17:49 ata5: hard resetting link
>>   08:17:49 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>   08:17:49 ata5.00: configured for UDMA/133
>>   08:17:49 ata5: EH complete
>>   
>>   10:22:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>>   10:22:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
>>   10:22:39         res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
>>   10:22:39 ata5.00: status: { DRDY }
>>   10:22:39 ata5: hard resetting link
>>   10:22:49 ata5: softreset failed (device not ready)
>>   10:22:49 ata5: hard resetting link
>>   10:22:50 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>   10:22:51 ata5.00: configured for UDMA/133
>>   10:22:51 ata5: EH complete
> 
> ... the difference being that a timeout which causes a super_written failure
> seems to return an I/O error whereas the others don't:

The above are IDENTIFY commands.  Who's issuing IDENTIFY regularly?  It isn't
from the regular IO paths or md.  It's probably being issued via SG_IO
from userland.  These failures don't affect normal operation.
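
For the curious, "issued via SG_IO from userland" boils down to something
like the sketch below: an ATA-16 pass-through CDB carrying IDENTIFY DEVICE
(0xec), sent through the SG_IO ioctl.  Treat it as an illustration only --
the device path and timeout are placeholders, and real tools each have
their own plumbing:

/* Hedged sketch: issue IDENTIFY DEVICE (0xec) through the SG_IO ATA-16
 * pass-through, roughly the path a userland tool takes on libata.
 * The device path and timeout are placeholders; needs root. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

int main(int argc, char **argv)
{
        unsigned char cdb[16], sense[32], data[512];
        struct sg_io_hdr io;
        int fd = open(argc > 1 ? argv[1] : "/dev/sdb", O_RDONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }

        memset(cdb, 0, sizeof(cdb));
        cdb[0]  = 0x85;         /* ATA PASS-THROUGH (16) */
        cdb[1]  = 4 << 1;       /* protocol: PIO data-in */
        cdb[2]  = 0x0e;         /* t_dir=in, byt_blok=1, t_length=sector count */
        cdb[6]  = 1;            /* one 512-byte sector */
        cdb[14] = 0xec;         /* IDENTIFY DEVICE */

        memset(&io, 0, sizeof(io));
        io.interface_id    = 'S';
        io.cmdp            = cdb;
        io.cmd_len         = sizeof(cdb);
        io.dxferp          = data;
        io.dxfer_len       = sizeof(data);
        io.dxfer_direction = SG_DXFER_FROM_DEV;
        io.sbp             = sense;
        io.mx_sb_len       = sizeof(sense);
        io.timeout         = 10000;     /* ms, placeholder */

        if (ioctl(fd, SG_IO, &io) < 0) {
                perror("SG_IO");
                return 1;
        }
        printf("IDENTIFY done: scsi status 0x%x, host 0x%x, driver 0x%x\n",
               io.status, io.host_status, io.driver_status);
        return 0;
}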

>   ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>   ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>           res 40/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
>   ata5.00: status: { DRDY }
>   ata5: hard resetting link
>   ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>   ata5.00: configured for UDMA/133
>   ata5: EH complete
>   end_request: I/O error, dev sde, sector 1465147272
>   md: super_written gets error=-5, uptodate=0
>   raid10: Disk failure on sde3, disabling device.
> 
> I wonder what's different about these two timeouts such that one causes an I/O
> error and the other just causes a retry after reset? Presumably if the latter
> was also just a retry, everything would be (closer to being) fine.

Because this error is actually seen by the md layer and FLUSH in
general can't be retried cleanly.  On retry, the drive goes on and
retries the sectors after the point of failure.  I'm not sure whether
FLUSH is actually failing here or whether it's a communication glitch.  At any
rate, if FLUSH is failing or timing out, the only right thing to do is
to kick the drive out of the array, as keeping it and retrying may lead to
silent data corruption.  Seriously, it's most likely a hardware
malfunction although I can't tell where the problem is with the given
data.  Get the hardware fixed.
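
If anyone wants to poke at the flush path in isolation: the command that
keeps timing out in the logs (cmd ea/...) is FLUSH CACHE EXT, and a rough
sketch like the one below can issue it directly via the SG_IO ATA-16
pass-through.  The device path and timeout are placeholders and this is
only an illustration, not a tested tool -- and only run it against a drive
you can afford to upset:

/* Hedged sketch: send FLUSH CACHE EXT (0xea) -- the command seen timing
 * out in the logs -- through the SG_IO ATA-16 pass-through.  Non-data
 * protocol, so there is no transfer buffer.  Device path is a placeholder;
 * needs root. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

int main(int argc, char **argv)
{
        unsigned char cdb[16], sense[32];
        struct sg_io_hdr io;
        int fd = open(argc > 1 ? argv[1] : "/dev/sdb", O_RDWR);

        if (fd < 0) {
                perror("open");
                return 1;
        }

        memset(cdb, 0, sizeof(cdb));
        cdb[0]  = 0x85;                 /* ATA PASS-THROUGH (16) */
        cdb[1]  = (3 << 1) | 1;         /* protocol: non-data, 48-bit command */
        cdb[14] = 0xea;                 /* FLUSH CACHE EXT */

        memset(&io, 0, sizeof(io));
        io.interface_id    = 'S';
        io.cmdp            = cdb;
        io.cmd_len         = sizeof(cdb);
        io.dxfer_direction = SG_DXFER_NONE;
        io.sbp             = sense;
        io.mx_sb_len       = sizeof(sense);
        io.timeout         = 60000;     /* ms; a flush can legitimately be slow */

        if (ioctl(fd, SG_IO, &io) < 0) {
                perror("SG_IO");
                return 1;
        }
        printf("FLUSH CACHE EXT done: scsi status 0x%x, host 0x%x, driver 0x%x\n",
               io.status, io.host_status, io.driver_status);
        return 0;
}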

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-14  7:41                                       ` Tejun Heo
@ 2009-09-14  7:44                                         ` Tejun Heo
  2009-09-14 12:48                                           ` Mark Lord
  2009-09-14 13:11                                           ` Henrique de Moraes Holschuh
  2009-09-14 13:14                                         ` Gabor Gombas
  1 sibling, 2 replies; 84+ messages in thread
From: Tejun Heo @ 2009-09-14  7:44 UTC (permalink / raw)
  To: Chris Webb
  Cc: linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown, linux-kernel,
	IDE/ATA development list, Jeff Garzik, Mark Lord

Tejun Heo wrote:
>> I wonder what's different about these two timeouts such that one causes an I/O
>> error and the other just causes a retry after reset? Presumably if the latter
>> was also just a retry, everything would be (closer to being) fine.
> 
> Because this error is actually seen by the md layer and FLUSH in
> general can't be retried cleanly.  On retry, the drive goes on and
> retries the sectors after the point of failure.  I'm not sure whether
> FLUSH is actually failing here or whether it's a communication glitch.  At any
> rate, if FLUSH is failing or timing out, the only right thing to do is
> to kick the drive out of the array, as keeping it and retrying may lead to
> silent data corruption.  Seriously, it's most likely a hardware
> malfunction although I can't tell where the problem is with the given
> data.  Get the hardware fixed.

Oooh, another possibility is the above continuous IDENTIFY tries.
Doing things like that generally isn't a good idea because vendors
don't expect IDENTIFY to be mixed regularly with normal IOs and
firmwares aren't tested against that.  Even smart commands sometimes
cause problems.  So, finding out the thing which is obsessed with the
identity of the drive and stopping it might help.
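
As a rough starting point for finding it: whoever issues those IDENTIFYs
has to have the block node (or the matching /dev/sg node) open at the
moment it does so, so something like the sketch below -- essentially a
poor man's lsof, with "/dev/sdb" as a placeholder -- run in a loop can at
least show which processes currently hold the device open:

/* Hedged sketch: scan /proc/<pid>/fd and report processes that have a
 * given device node open.  Only a snapshot -- a tool that opens, pokes
 * and closes the device quickly may need repeated runs to be caught. */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        const char *dev = argc > 1 ? argv[1] : "/dev/sdb";      /* placeholder */
        DIR *proc = opendir("/proc");
        struct dirent *de;

        if (!proc) {
                perror("/proc");
                return 1;
        }
        while ((de = readdir(proc))) {
                char fddir[64], path[512], target[512];
                struct dirent *fe;
                DIR *fds;

                if (atoi(de->d_name) <= 0)
                        continue;               /* not a pid directory */
                snprintf(fddir, sizeof(fddir), "/proc/%s/fd", de->d_name);
                fds = opendir(fddir);
                if (!fds)
                        continue;               /* no permission or task exited */
                while ((fe = readdir(fds))) {
                        ssize_t n;

                        snprintf(path, sizeof(path), "%s/%s", fddir, fe->d_name);
                        n = readlink(path, target, sizeof(target) - 1);
                        if (n <= 0)
                                continue;
                        target[n] = '\0';
                        if (!strcmp(target, dev))
                                printf("pid %s holds %s open (fd %s)\n",
                                       de->d_name, dev, fe->d_name);
                }
                closedir(fds);
        }
        closedir(proc);
        return 0;
}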

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-07 23:26                                       ` Thomas Fjellstrom
  (?)
@ 2009-09-14  7:46                                       ` Tejun Heo
  2009-09-14 21:13                                         ` Thomas Fjellstrom
  -1 siblings, 1 reply; 84+ messages in thread
From: Tejun Heo @ 2009-09-14  7:46 UTC (permalink / raw)
  To: tfjellstrom
  Cc: Chris Webb, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

Hello,

Thomas Fjellstrom wrote:
> I have the same issue with a single WD 2TB Green drive. Technically two, but 
> it always only gets errors from the same drive, so I was assuming it was the 
> drive. I only have to setup the raid0 array, and put some light load on it for 
> the kernel to start complaining, and eventually it just kicks the drive 
> completely with the following messages:
> 
> sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> end_request: I/O error, dev sdb, sector 202026972
> 
> The drive does work fine prior to the frozen timeout errors. And I was using 
> it in windows (same raid0 config) just fine with no errors what so ever.

Can you post full dmesg output?  The above doesn't tell much about ATA
side of things.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-14  7:44                                         ` Tejun Heo
@ 2009-09-14 12:48                                           ` Mark Lord
  2009-09-14 13:05                                             ` Tejun Heo
  2009-09-14 13:11                                           ` Henrique de Moraes Holschuh
  1 sibling, 1 reply; 84+ messages in thread
From: Mark Lord @ 2009-09-14 12:48 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Chris Webb, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

Tejun Heo wrote:
..
> Oooh, another possibility is the above continuous IDENTIFY tries.
> Doing things like that generally isn't a good idea because vendors
> don't expect IDENTIFY to be mixed regularly with normal IOs and
> firmwares aren't tested against that.  Even smart commands sometimes
> cause problems.  So, finding out the thing which is obsessed with the
> identity of the drive and stopping it might help.
..

Bullpucky.  That sort of thing, specifically with IDENTIFY,
has never been an issue.

I wonder if the IDENTIFY is actually coming from libata EH
after something else failed ?

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-14 12:48                                           ` Mark Lord
@ 2009-09-14 13:05                                             ` Tejun Heo
  2009-09-14 14:25                                               ` Mark Lord
  0 siblings, 1 reply; 84+ messages in thread
From: Tejun Heo @ 2009-09-14 13:05 UTC (permalink / raw)
  To: Mark Lord
  Cc: Chris Webb, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

Mark Lord wrote:
> Tejun Heo wrote:
> ..
>> Oooh, another possibility is the above continuous IDENTIFY tries.
>> Doing things like that generally isn't a good idea because vendors
>> don't expect IDENTIFY to be mixed regularly with normal IOs and
>> firmwares aren't tested against that.  Even smart commands sometimes
>> cause problems.  So, finding out the thing which is obsessed with the
>> identity of the drive and stopping it might help.
> ..
> 
> Bullpucky.  That sort of thing, specifically with IDENTIFY,
> has never been an issue.

With SMART it has.  I wouldn't be too surprised if some new firmware
chokes on repeated IDENTIFY mixed with a stream of NCQ commands.  It's
just not something people (including vendors) do regularly.

> I wonder if the IDENTIFY is actually coming from libata EH
> after something else failed ?

In that case, libata-eh would print "ataP.DD: failed to IDENTIFY
(blah, err_mask=0x%x)\n" instead of the full TF dump.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-14  7:44                                         ` Tejun Heo
  2009-09-14 12:48                                           ` Mark Lord
@ 2009-09-14 13:11                                           ` Henrique de Moraes Holschuh
  2009-09-14 13:24                                             ` Tejun Heo
  1 sibling, 1 reply; 84+ messages in thread
From: Henrique de Moraes Holschuh @ 2009-09-14 13:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Chris Webb, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

On Mon, 14 Sep 2009, Tejun Heo wrote:
> Oooh, another possibility is the above continuous IDENTIFY tries.
> Doing things like that generally isn't a good idea because vendors
> don't expect IDENTIFY to be mixed regularly with normal IOs and

IMHO that means the kernel should be special-casing such commands, then (i.e.
quiesce the drive, issue the command, then resume normal IO), probably
rate-limiting them for good effect.

This is the kind of stuff that userspace should NOT have to worry about
(because it will get it wrong and cause data corruption eventually).

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-14  7:41                                       ` Tejun Heo
  2009-09-14  7:44                                         ` Tejun Heo
@ 2009-09-14 13:14                                         ` Gabor Gombas
  1 sibling, 0 replies; 84+ messages in thread
From: Gabor Gombas @ 2009-09-14 13:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Chris Webb, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

On Mon, Sep 14, 2009 at 04:41:56PM +0900, Tejun Heo wrote:

> Because this error is actually seen by the md layer and FLUSH in
> general can't be retried cleanly.  On retry, the drive goes on and
> retries the sectors after the point of failure.  I'm not sure whether
> FLUSH is actually failing here or whether it's a communication glitch.  At any
> rate, if FLUSH is failing or timing out, the only right thing to do is
> to kick the drive out of the array, as keeping it and retrying may lead to
> silent data corruption.

Hmm, how's that supposed to work with TLER on WD enterprise drives?
Isn't the idea behind TLER to prevent drives being kicked out of the
array because the RAID system can have a much more intelligent
retry/recovery logic than a single drive?

AFAIK md RAID can already take advantage of TLER if the operation that's
failing due to TLER is a READ, but I don't know what happens if TLER
kicks in during a WRITE or a FLUSH.

Gabor

-- 
     ---------------------------------------------------------
     MTA SZTAKI Computer and Automation Research Institute
                Hungarian Academy of Sciences
     ---------------------------------------------------------

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-14 13:11                                           ` Henrique de Moraes Holschuh
@ 2009-09-14 13:24                                             ` Tejun Heo
  2009-09-14 14:02                                               ` Henrique de Moraes Holschuh
  0 siblings, 1 reply; 84+ messages in thread
From: Tejun Heo @ 2009-09-14 13:24 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh
  Cc: Chris Webb, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

Henrique de Moraes Holschuh wrote:
> On Mon, 14 Sep 2009, Tejun Heo wrote:
>> Oooh, another possibility is the above continuous IDENTIFY tries.
>> Doing things like that generally isn't a good idea because vendors
>> don't expect IDENTIFY to be mixed regularly with normal IOs and
> 
> IMHO that means the kernel should be special-casing such commands, then (i.e.
> quiesce the drive, issue the command, then resume normal IO), probably
> rate-limiting them for good effect.
> 
> This is the kind of stuff that userspace should NOT have to worry about
> (because it will get it wrong and cause data corruption eventually).

If this indeed is the case (as Mark pointed out, there hasn't been any
precedent involving IDENTIFY, but it's also the first time I've seen
timeouts on IDENTIFY commands issued from userland), this is the kind of
thing userspace shouldn't be doing to begin with.

There was another similar problem.  Some acpi package in ubuntu issues
APM adjustment commands whenever power related stuff changes.  The
firmware on the drive which shipped on Samsung NC10 for some reason
locks up after being hit with enough of those commands.  It's just not
safe to assume this kind of stuff will reliably work.  If you're
ready to do some research and experiments, it's fine.  If you're doing
OEM customization with specific hardware and QA, sure, why not (this
is basically what windows OEMs do too).  But, doing things which
aren't _usually_ used that way repeatedly _by default_ is asking for
trouble.  There's a reason why these operations are root only.  :-)

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-14 13:24                                             ` Tejun Heo
@ 2009-09-14 14:02                                               ` Henrique de Moraes Holschuh
  2009-09-14 14:34                                                 ` Tejun Heo
  0 siblings, 1 reply; 84+ messages in thread
From: Henrique de Moraes Holschuh @ 2009-09-14 14:02 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Chris Webb, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

On Mon, 14 Sep 2009, Tejun Heo wrote:
> Henrique de Moraes Holschuh wrote:
> > On Mon, 14 Sep 2009, Tejun Heo wrote:
> >> Oooh, another possibility is the above continuous IDENTIFY tries.
> >> Doing things like that generally isn't a good idea because vendors
> >> don't expect IDENTIFY to be mixed regularly with normal IOs and
> > 
> > IMHO that means the kernel should be special-casing such commands, then (i.e.
> > quiesce the drive, issue the command, then resume normal IO), probably
> > rate-limiting them for good effect.
> > 
> > This is the kind of stuff that userspace should NOT have to worry about
> > (because it will get it wrong and cause data corruption eventually).
> 
> If this indeed is the case (as Mark pointed out, there hasn't been any
> precedent involving IDENTIFY, but it's also the first time I've seen
> timeouts on IDENTIFY commands issued from userland), this is the kind of
> thing userspace shouldn't be doing to begin with.

There are many reasons why userspace would issue IDENTIFY (note: I didn't
say they are good reasons), and offhand I recall hddtemp as a likely
culprit.  Also, sometimes the local admin does hdparm -I for whatever
reason.  So, I am not surprised someone found a way to cause many IDENTIFY
commands to be issued.

Other SMART-maintenance utilities might issue IDENTIFY as well.  And if this
is an issue with SMART in general, smartd issues SMART commands (I don't
know if it uses IDENTIFY) once per hour to check attributes, and can be
configured to fire off SMART short/long/offline tests automatically.  The
local admin sends SMART commands (through smartctl) with the disks hot to
check the error log after EH, etc.

IMHO, the kernel really should be protecting userland against data
corruption here, even if it means a massive hit on disk performance while
the SMART commands are being processed.

> There was another similar problem.  Some acpi package in ubuntu issues
> APM adjustment commands whenever power related stuff changes.  The

Yes.  If you fail to do this on ThinkPads (many models, but probably not
all), your disk will break in 1-2yr maximum, and THAT assumes you have
Hitachi notebook HDs that are supposed to take 600k head unloads before
croaking...  most other vendors say they can only do 300k head unloads in
their datasheets (if you can find a datasheet at all).  If you need a reason
to buy Hitachi HDs, this is it: they give you full, proper datasheets.

The *firmware* of these laptops will issue these annoying APM commands by
itself when the power state changes, and not even setting the BIOS to
"performance" mode stops this destructive behaviour.  So any
disk that cannot cope with receiving APM commands many times per day on such
laptops will cause problems.

Now, why Ubuntu would do this outside of the ThinkPads, or target anything
other than magnetic disk media, I don't know.  Maybe other laptop vendors
also had the same idea.  Maybe Ubuntu was simplistic in its approach when
it added this defensive feature.  Maybe it was considered a PM feature and
it is not even related to the ThinkPad APM annoyance.  You'd have to ask
them.

> firmware on the drive which shipped on Samsung NC10 for some reason
> locks up after being hit with enough of those commands.  It's just not
> safe to assume this kind of stuff will reliably work.  If you're

Maybe we can blacklist such commands on drives known to misimplement them?

> ready to do some research and experiments, it's fine.  If you're doing
> OEM customization with specific hardware and QA, sure, why not (this
> is basically what windows OEMs do too).  But, doing things which
> aren't _usually_ used that way repeatedly _by default_ is asking for
> trouble.  There's a reason why these operations are root only.  :-)

There are real user cases for APM commands, and for SMART commands...

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-14 13:05                                             ` Tejun Heo
@ 2009-09-14 14:25                                               ` Mark Lord
  2009-09-16 23:19                                                 ` Chris Webb
  0 siblings, 1 reply; 84+ messages in thread
From: Mark Lord @ 2009-09-14 14:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Chris Webb, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

Tejun Heo wrote:
> Mark Lord wrote:
>> Tejun Heo wrote:
>> ..
>>> Oooh, another possibility is the above continuous IDENTIFY tries.
>>> Doing things like that generally isn't a good idea because vendors
>>> don't expect IDENTIFY to be mixed regularly with normal IOs and
>>> firmwares aren't tested against that.  Even smart commands sometimes
>>> cause problems.  So, finding out the thing which is obsessed with the
>>> identity of the drive and stopping it might help.
>> ..
>>
>> Bullpucky.  That sort of thing, specifically with IDENTIFY,
>> has never been an issue.
> 
> With SMART it has.  I wouldn't be too surprised if some new firmware
> chokes on repeated IDENTIFY mixed with a stream of NCQ commands.  It's
> just not something people (including vendors) do regularly.
..

Yeah, some drives really don't like SMART commands (hddtemp & smartctl).
That's a strange one, too, because the whole idea of SMART
is that it gets used to periodically monitor drive health.

IDENTIFY is much safer -- usually no media access after initial spin-up,
and lots of things exercise it quite regularly.

Pretty much any hdparm command triggers an IDENTIFY beforehand now, and
hddtemp and smartctl both use it too.

I suspect we're missing some info from this specific failure.
Looking back at Chris's earlier posting, the whole thing started
with a FLUSH_CACHE_EXT failure.  Once that happens, all bets are
off on anything that follows.

> Everything will be running fine when suddenly:
> 
>   ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>   ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>           res 40/00:00:80:17:91/00:00:37:00:00/40 Emask 0x4 (timeout)
>   ata1.00: status: { DRDY }
>   ata1: hard resetting link
>   ata1: softreset failed (device not ready)
>   ata1: hard resetting link
>   ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>   ata1.00: configured for UDMA/133
>   ata1: EH complete
>   end_request: I/O error, dev sda, sector 1465147272
>   md: super_written gets error=-5, uptodate=0
>   raid10: Disk failure on sda3, disabling device.
>   raid10: Operation continuing on 5 devices.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-14 14:02                                               ` Henrique de Moraes Holschuh
@ 2009-09-14 14:34                                                 ` Tejun Heo
  0 siblings, 0 replies; 84+ messages in thread
From: Tejun Heo @ 2009-09-14 14:34 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh
  Cc: Chris Webb, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

Hello,

Henrique de Moraes Holschuh wrote:
>>> This is the kind of stuff that userspace should NOT have to worry about
>>> (because it will get it wrong and cause data corruption eventually).
>> If this indeed is the case (as Mark pointed out, there hasn't been any
>> precedent involving IDENTIFY, but it's also the first time I've seen
>> timeouts on IDENTIFY commands issued from userland), this is the kind of
>> thing userspace shouldn't be doing to begin with.
> 
> There are many reasons why userspace would issue IDENTIFY (note: I didn't
> say they are good reasons), and offhand I recall hddtemp as a likely
> culprit.  Also, sometimes the local admin does hdparm -I for whatever
> reason.  So, I am not surprised someone found a way to cause many IDENTIFY
> commands to be issued.

Heh... and there have been plenty of IO errors and timeouts coming
from hddtemp.  :-)

> Other SMART-maintenance utilities might issue IDENTIFY as well.  And if this
> is an issue with SMART in general, smartd issues SMART commands (I don't
> know if it uses IDENTIFY) once per hour to check attributes, and can be
> configured to fire off SMART short/long/offline tests automatically.  The
> local admin sends SMART commands (through smartctl) with the disks hot to
> check the error log after EH, etc.
> 
> IMHO, the kernel really should be protecting userland against data
> corruption here, even if it means a massive hit on disk performance while
> the SMART commands are being processed.

I don't know.  The problem is test coverage.  As those commands aren't
used very often, they don't get tested much, so the coverage of the
blacklist wouldn't be very good either -- and there's a very good reason
why those commands aren't used often: they're not all that useful for most
people.

>> There was another similar problem.  Some acpi package in ubuntu issues
>> APM adjustment commands whenever power related stuff changes.  The
> 
> Yes.  If you fail to do this on ThinkPads (many models, but probably not
> all), your disk will break in 1-2yr maximum, and THAT assumes you have
> Hitachi notebook HDs that are supposed to take 600k head unloads before
> croaking...  most other vendors say they can only do 300k head unloads in
> their datasheets (if you can find a datasheet at all).  If you need a reason
> to buy Hitachi HDs, this is it: they give you full, proper datasheets.

There are plenty of drives and configurations like that, and different
drives need different APM values to function properly.  storage-fixup
deals with exactly this problem.

 http://git.kernel.org/?p=linux/kernel/git/tj/storage-fixup.git;a=summary

But please note that it's only done once during boot and resume on
machines which are known to specifically need it and with values
reported to work.
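
For reference, an "APM value" here is nothing more exotic than ATA SET
FEATURES subcommand 0x05 with the requested level in the count register
(0x85 disables APM) -- roughly what hdparm -B ends up asking the drive to
do.  A hedged sketch via the SG_IO ATA-16 pass-through, with the device
path and the level used below being placeholders:

/* Hedged sketch: set the drive's APM level via SET FEATURES (0xef),
 * subcommand 0x05, level in the count field.  Device path and level (128)
 * are placeholders; needs root. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

int main(int argc, char **argv)
{
        unsigned char cdb[16], sense[32];
        unsigned char level = 128;      /* 1 = most aggressive ... 254 = max performance */
        struct sg_io_hdr io;
        int fd = open(argc > 1 ? argv[1] : "/dev/sda", O_RDWR);

        if (fd < 0) {
                perror("open");
                return 1;
        }

        memset(cdb, 0, sizeof(cdb));
        cdb[0]  = 0x85;         /* ATA PASS-THROUGH (16) */
        cdb[1]  = 3 << 1;       /* protocol: non-data */
        cdb[4]  = 0x05;         /* features: enable APM (0x85 would disable) */
        cdb[6]  = level;        /* count: requested APM level */
        cdb[14] = 0xef;         /* SET FEATURES */

        memset(&io, 0, sizeof(io));
        io.interface_id    = 'S';
        io.cmdp            = cdb;
        io.cmd_len         = sizeof(cdb);
        io.dxfer_direction = SG_DXFER_NONE;
        io.sbp             = sense;
        io.mx_sb_len       = sizeof(sense);
        io.timeout         = 10000;     /* ms, placeholder */

        if (ioctl(fd, SG_IO, &io) < 0) {
                perror("SG_IO");
                return 1;
        }
        printf("SET FEATURES (APM %u) done: scsi status 0x%x\n", level, io.status);
        return 0;
}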

> The *firmware* of these laptops will issue these annoying APM commands by
> itself when the power state changes, and not even setting the BIOS to
> "performance" mode stops this destructive behaviour.  So any
> disk that cannot cope with receiving APM commands many times per day on such
> laptops will cause problems.

Yeap, well, that's what vendors do.  They put together a specific subset
of components and try to figure out configurations which work.  If you
replace components on your own, they won't guarantee it will work.
Sucky but that's the way it is.

> Now, why Ubuntu would do this outside of the ThinkPads, or target anything
> other than magnetic disk media, I don't know.  Maybe other laptop vendors
> also had the same idea.  Maybe Ubuntu was simplistic in its approach when
> it added this defensive feature.  Maybe it was considered a PM feature and
> it is not even related to the ThinkPad APM annoyance.  You'd have to ask
> them.

The feature probably doesn't have much to do with the frequent head
unload problem.  Unplugging or plugging in the AC cord also triggered
APM commands to be issued, so it's more likely they were trying to
optimize the performance / power balance.  The only problem is that APM
setting values aren't clearly defined and just aren't tested very
well.

>> firmware on the drive which shipped on Samsung NC10 for some reason
>> locks up after being hit with enough of those commands.  It's just not
>> safe to assume this kind of stuff will reliably work.  If you're
> 
> Maybe we can blacklist such commands on drives known to misimplement them?

Yes, that's a possibility, but we're unlikely to build meaningful coverage and
likely to prevent valid usages too.  I.e. a firmware might lock up when
APM settings are adjusted continuously while setting them once after
booting is fine.  I really want to avoid implementing such logic for
different drives in the kernel.

>> ready to do some research and experiments, it's fine.  If you're doing
>> OEM customization with specific hardware and QA, sure, why not (this
>> is basically what windows OEMs do too).  But, doing things which
>> aren't _usually_ used that way repeatedly _by default_ is asking for
>> trouble.  There's a reason why these operations are root only.  :-)
> 
> There are real user cases for APM commands, and for SMART commands...

Yeap, sure, but it just doesn't work very well, not yet at least.
SMART is usually better tested than APM but given the number of
reports I've seen from hddtemp users, certain aspects of it are broken
on many drives.  There isn't a clear answer.  For the usual parts of
SMART, it's probably pretty safe, but then again don't go too far with
it.  Do it every several hours or every day, not every ten seconds.  APM
is way more dangerous; if your machine needs it, use it minimally.  If
a certain combination of values is known to work for the particular
configuration, go ahead and use it.  In other cases, just stay away
from it.

What people use often gets tested and verified by vendors and promptly
fixed.  What people don't use often won't be, and will be unreliable.
If you want to do things people usually don't do, it's your
responsibility to ensure it actually works.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-14  7:46                                       ` Tejun Heo
@ 2009-09-14 21:13                                         ` Thomas Fjellstrom
  2009-09-14 22:23                                           ` Tejun Heo
  0 siblings, 1 reply; 84+ messages in thread
From: Thomas Fjellstrom @ 2009-09-14 21:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, Chris Webb, linux-scsi, Ric Wheeler, Andrei Tanas,
	NeilBrown, IDE/ATA development list, Jeff Garzik, Mark Lord

[-- Attachment #1: Type: Text/Plain, Size: 1443 bytes --]

On Mon September 14 2009, Tejun Heo wrote:
> Hello,
> 
> Thomas Fjellstrom wrote:
> > I have the same issue with a single WD 2TB Green drive. Technically two,
> > but it always only gets errors from the same drive, so I was assuming it
> > was the drive. I only have to setup the raid0 array, and put some light
> > load on it for the kernel to start complaining, and eventually it just
> > kicks the drive completely with the following messages:
> >
> > sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> > end_request: I/O error, dev sdb, sector 202026972
> >
> > The drive does work fine prior to the frozen timeout errors. And I was
> > using it in windows (same raid0 config) just fine with no errors what so
> > ever.
> 
> Can you post full dmesg output?  The above doesn't tell much about ATA
> side of things.
> 
> Thanks.
> 

Sure, I've attached the full dmesg from a full test I ran today (I couldn't 
find the old log where that bit came from). I'm running 2.6.31-rc9 right now, 
and will probably update to the final 31 release soonish. The test I ran 
actually finished (dd if=/dev/sdc of=/dev/null bs=8M), whereas with earlier 
kernels it was completely failing. Of course, I was actually trying to bring 
up the md raid0 array (2x2TB), mount the filesystem, and copy the files off 
before. mdraid is probably more sensitive to the end_request errors than dd 
is.

-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca

[-- Attachment #2: disk.dmesg --]
[-- Type: text/plain, Size: 89010 bytes --]

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.31-rc9 (root@natasha) (gcc version 4.3.4 (Debian 4.3.4-2) ) #2 SMP Wed Sep 9 08:08:59 MDT 2009
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-2.6.31-rc9 root=UUID=272a1709-267d-4134-a6b4-e8d3884dca37 ro
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
[    0.000000]  BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000cfde0000 (usable)
[    0.000000]  BIOS-e820: 00000000cfde0000 - 00000000cfde3000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000cfde3000 - 00000000cfdf0000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000cfdf0000 - 00000000cfe00000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 0000000130000000 (usable)
[    0.000000] DMI 2.4 present.
[    0.000000] last_pfn = 0x130000 max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: uncachable
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-C7FFF write-protect
[    0.000000]   C8000-FFFFF uncachable
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 000000000000 mask FFFF80000000 write-back
[    0.000000]   1 base 000080000000 mask FFFFC0000000 write-back
[    0.000000]   2 base 0000C0000000 mask FFFFF0000000 write-back
[    0.000000]   3 base 0000CFE00000 mask FFFFFFE00000 uncachable
[    0.000000]   4 base 000100000000 mask FFFFE0000000 write-back
[    0.000000]   5 base 000120000000 mask FFFFF0000000 write-back
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] TOM2: 0000000130000000 aka 4864M
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[    0.000000] e820 update range: 00000000cfe00000 - 0000000100000000 (usable) ==> (reserved)
[    0.000000] last_pfn = 0xcfde0 max_arch_pfn = 0x400000000
[    0.000000] initial memory mapped : 0 - 20000000
[    0.000000] Using GB pages for direct mapping
[    0.000000] init_memory_mapping: 0000000000000000-00000000cfde0000
[    0.000000]  0000000000 - 00c0000000 page 1G
[    0.000000]  00c0000000 - 00cfc00000 page 2M
[    0.000000]  00cfc00000 - 00cfde0000 page 4k
[    0.000000] kernel direct mapping tables up to cfde0000 @ 8000-b000
[    0.000000] init_memory_mapping: 0000000100000000-0000000130000000
[    0.000000]  0100000000 - 0130000000 page 2M
[    0.000000] kernel direct mapping tables up to 130000000 @ a000-c000
[    0.000000] RAMDISK: 376f1000 - 37fefd50
[    0.000000] ACPI: RSDP 00000000000f7550 00014 (v00 GBT   )
[    0.000000] ACPI: RSDT 00000000cfde3000 0003C (v01 GBT    GBTUACPI 42302E31 GBTU 01010101)
[    0.000000] ACPI: FACP 00000000cfde3040 00074 (v01 GBT    GBTUACPI 42302E31 GBTU 01010101)
[    0.000000] ACPI: DSDT 00000000cfde30c0 0731E (v01 GBT    GBTUACPI 00001000 MSFT 03000000)
[    0.000000] ACPI: FACS 00000000cfde0000 00040
[    0.000000] ACPI: SSDT 00000000cfdea4c0 0088C (v01 PTLTD  POWERNOW 00000001  LTP 00000001)
[    0.000000] ACPI: HPET 00000000cfdead80 00038 (v01 GBT    GBTUACPI 42302E31 GBTU 00000098)
[    0.000000] ACPI: MCFG 00000000cfdeadc0 0003C (v01 GBT    GBTUACPI 42302E31 GBTU 01010101)
[    0.000000] ACPI: TAMG 00000000cfdeae00 0030A (v01 GBT    GBT   B0 5455312E BG\x01\x01 53450101)
[    0.000000] ACPI: APIC 00000000cfdea400 00084 (v01 GBT    GBTUACPI 42302E31 GBTU 01010101)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] Scanning NUMA topology in Northbridge 24
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at 0000000000000000-0000000130000000
[    0.000000] Bootmem setup node 0 0000000000000000-0000000130000000
[    0.000000]   NODE_DATA [000000000000b000 - 0000000000012fff]
[    0.000000]   bootmap [0000000000013000 -  0000000000038fff] pages 26
[    0.000000] (8 early reservations) ==> bootmem [0000000000 - 0130000000]
[    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
[    0.000000]   #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
[    0.000000]   #2 [0001000000 - 0001623dac]    TEXT DATA BSS ==> [0001000000 - 0001623dac]
[    0.000000]   #3 [00376f1000 - 0037fefd50]          RAMDISK ==> [00376f1000 - 0037fefd50]
[    0.000000]   #4 [000009f800 - 0000100000]    BIOS reserved ==> [000009f800 - 0000100000]
[    0.000000]   #5 [0001624000 - 0001624106]              BRK ==> [0001624000 - 0001624106]
[    0.000000]   #6 [0000008000 - 000000a000]          PGTABLE ==> [0000008000 - 000000a000]
[    0.000000]   #7 [000000a000 - 000000b000]          PGTABLE ==> [000000a000 - 000000b000]
[    0.000000] found SMP MP-table at [ffff8800000f5b40] f5b40
[    0.000000]  [ffffea0000000000-ffffea00043fffff] PMD -> [ffff880028600000-ffff88002bffffff] on node 0
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000000 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   0x00100000 -> 0x00130000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[3] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x0000009f
[    0.000000]     0: 0x00000100 -> 0x000cfde0
[    0.000000]     0: 0x00100000 -> 0x00130000
[    0.000000] On node 0 totalpages: 1047935
[    0.000000]   DMA zone: 56 pages used for memmap
[    0.000000]   DMA zone: 102 pages reserved
[    0.000000]   DMA zone: 3841 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 14280 pages used for memmap
[    0.000000]   DMA32 zone: 833048 pages, LIFO batch:31
[    0.000000]   Normal zone: 2688 pages used for memmap
[    0.000000]   Normal zone: 193920 pages, LIFO batch:31
[    0.000000] ACPI: PM-Timer IO Port: 0x4008
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 2, version 33, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ2 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x10b9a201 base: 0xfed00000
[    0.000000] SMP: Allowing 4 CPUs, 0 hotplug CPUs
[    0.000000] nr_irqs_gsi: 24
[    0.000000] PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
[    0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
[    0.000000] PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
[    0.000000] PM: Registered nosave memory: 00000000cfde0000 - 00000000cfde3000
[    0.000000] PM: Registered nosave memory: 00000000cfde3000 - 00000000cfdf0000
[    0.000000] PM: Registered nosave memory: 00000000cfdf0000 - 00000000cfe00000
[    0.000000] PM: Registered nosave memory: 00000000cfe00000 - 00000000e0000000
[    0.000000] PM: Registered nosave memory: 00000000e0000000 - 00000000f0000000
[    0.000000] PM: Registered nosave memory: 00000000f0000000 - 00000000fec00000
[    0.000000] PM: Registered nosave memory: 00000000fec00000 - 0000000100000000
[    0.000000] Allocating PCI resources starting at cfe00000 (gap: cfe00000:10200000)
[    0.000000] NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:4 nr_node_ids:1
[    0.000000] PERCPU: Embedded 29 pages at ffff880028022000, static data 86368 bytes
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 1030809
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-2.6.31-rc9 root=UUID=272a1709-267d-4134-a6b4-e8d3884dca37 ro
[    0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[    0.000000] Initializing CPU#0
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Node 0: aperture @ 8d2c000000 size 32 MB
[    0.000000] Aperture beyond 4GB. Ignoring.
[    0.000000] Your BIOS doesn't leave a aperture memory hole
[    0.000000] Please enable the IOMMU option in the BIOS setup
[    0.000000] This costs you 64 MB of RAM
[    0.000000] Mapping aperture over 65536 KB of RAM @ 20000000
[    0.000000] PM: Registered nosave memory: 0000000020000000 - 0000000024000000
[    0.000000] Memory: 4050584k/4980736k available (2944k kernel code, 788996k absent, 141156k reserved, 1640k data, 584k init)
[    0.000000] Hierarchical RCU implementation.
[    0.000000] NR_IRQS:4352 nr_irqs:440
[    0.000000] Fast TSC calibration using PIT
[    0.000000] Detected 2611.766 MHz processor.
[    0.000010] spurious 8259A interrupt: IRQ7.
[    0.004000] Console: colour VGA+ 80x25
[    0.004000] console [tty0] enabled
[    0.004000] hpet clockevent registered
[    0.004000]   alloc irq_desc for 24 on node 0
[    0.004000]   alloc kstat_irqs on node 0
[    0.004000] HPET: 4 timers in total, 1 timers will be used for per-cpu timer
[    0.004006] Calibrating delay loop (skipped), value calculated using timer frequency.. 5223.52 BogoMIPS (lpj=10447048)
[    0.004169] Security Framework initialized
[    0.004225] SELinux:  Disabled at boot.
[    0.004502] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
[    0.005671] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
[    0.006221] Mount-cache hash table entries: 256
[    0.006374] Initializing cgroup subsys ns
[    0.006429] Initializing cgroup subsys cpuacct
[    0.006485] Initializing cgroup subsys devices
[    0.006539] Initializing cgroup subsys freezer
[    0.006595] Initializing cgroup subsys net_cls
[    0.006665] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[    0.006722] CPU: L2 Cache: 512K (64 bytes/line)
[    0.006778] CPU 0/0x0 -> Node 0
[    0.006832] tseg: 00cfe00000
[    0.006833] CPU: Physical Processor ID: 0
[    0.006887] CPU: Processor Core ID: 0
[    0.006941] mce: CPU supports 6 MCE banks
[    0.007000] using C1E aware idle routine
[    0.007054] Performance Counters: AMD PMU driver.
[    0.007151] ... version:                 0
[    0.007205] ... bit width:               48
[    0.007258] ... generic counters:        4
[    0.007312] ... value mask:              0000ffffffffffff
[    0.007367] ... max period:              00007fffffffffff
[    0.007423] ... fixed-purpose counters:  0
[    0.007476] ... counter mask:            000000000000000f
[    0.008232] ACPI: Core revision 20090521
[    0.020059] Setting APIC routing to flat
[    0.020593] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.062906] CPU0: AMD Phenom(tm) II X4 810 Processor stepping 02
[    0.064001] Booting processor 1 APIC 0x1 ip 0x6000
[    0.004000] Initializing CPU#1
[    0.004000] Calibrating delay using timer specific routine.. 5223.83 BogoMIPS (lpj=10447671)
[    0.004000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[    0.004000] CPU: L2 Cache: 512K (64 bytes/line)
[    0.004000] CPU 1/0x1 -> Node 0
[    0.004000] CPU: Physical Processor ID: 0
[    0.004000] CPU: Processor Core ID: 1
[    0.004000] mce: CPU supports 6 MCE banks
[    0.004000] x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
[    0.148742] CPU1: AMD Phenom(tm) II X4 810 Processor stepping 02
[    0.149373] checking TSC synchronization [CPU#0 -> CPU#1]: passed.
[    0.152017] System has AMD C1E enabled
[    0.152067] Booting processor 2 APIC 0x2 ip 0x6000
[    0.152140] Switch to broadcast mode on CPU1
[    0.004000] Initializing CPU#2
[    0.004000] Calibrating delay using timer specific routine.. 5223.82 BogoMIPS (lpj=10447655)
[    0.004000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[    0.004000] CPU: L2 Cache: 512K (64 bytes/line)
[    0.004000] CPU 2/0x2 -> Node 0
[    0.004000] CPU: Physical Processor ID: 0
[    0.004000] CPU: Processor Core ID: 2
[    0.004000] mce: CPU supports 6 MCE banks
[    0.004000] x86 PAT enabled: cpu 2, old 0x7040600070406, new 0x7010600070106
[    0.244673] CPU2: AMD Phenom(tm) II X4 810 Processor stepping 02
[    0.245304] checking TSC synchronization [CPU#0 -> CPU#2]: passed.
[    0.248018] Switch to broadcast mode on CPU2
[    0.248073] Booting processor 3 APIC 0x3 ip 0x6000
[    0.004000] Initializing CPU#3
[    0.004000] Calibrating delay using timer specific routine.. 5223.83 BogoMIPS (lpj=10447677)
[    0.004000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[    0.004000] CPU: L2 Cache: 512K (64 bytes/line)
[    0.004000] CPU 3/0x3 -> Node 0
[    0.004000] CPU: Physical Processor ID: 0
[    0.004000] CPU: Processor Core ID: 3
[    0.004000] mce: CPU supports 6 MCE banks
[    0.004000] x86 PAT enabled: cpu 3, old 0x7040600070406, new 0x7010600070106
[    0.340711] CPU3: AMD Phenom(tm) II X4 810 Processor stepping 02
[    0.341341] checking TSC synchronization [CPU#0 -> CPU#3]: passed.
[    0.344022] Brought up 4 CPUs
[    0.344022] Switch to broadcast mode on CPU3
[    0.344129] Total of 4 processors activated (20895.02 BogoMIPS).
[    0.344293] CPU0 attaching sched-domain:
[    0.344295]  domain 0: span 0-3 level MC
[    0.344297]   groups: 0 1 2 3
[    0.344302] CPU1 attaching sched-domain:
[    0.344304]  domain 0: span 0-3 level MC
[    0.344305]   groups: 1 2 3 0
[    0.344309] CPU2 attaching sched-domain:
[    0.344310]  domain 0: span 0-3 level MC
[    0.344312]   groups: 2 3 0 1
[    0.344316] CPU3 attaching sched-domain:
[    0.344317]  domain 0: span 0-3 level MC
[    0.344318]   groups: 3 0 1 2
[    0.344381] Switch to broadcast mode on CPU0
[    0.344381] Booting paravirtualized kernel on bare hardware
[    0.344381] regulator: core version 0.5
[    0.344381] NET: Registered protocol family 16
[    0.344381] node 0 link 0: io port [a000, ffff]
[    0.344381] TOM: 00000000d0000000 aka 3328M
[    0.344381] Fam 10h mmconf [e0000000, e00fffff]
[    0.344381] node 0 link 0: mmio [a0000, bffff]
[    0.344381] node 0 link 0: mmio [d0000000, dfffffff]
[    0.344381] node 0 link 0: mmio [f0000000, fe02ffff]
[    0.344381] node 0 link 0: mmio [e0000000, e05fffff] ==> [e0100000, e05fffff]
[    0.344381] TOM2: 0000000130000000 aka 4864M
[    0.344381] bus: [00,05] on node 0 link 0
[    0.344381] bus: 00 index 0 io port: [0, ffff]
[    0.344381] bus: 00 index 1 mmio: [a0000, bffff]
[    0.344381] bus: 00 index 2 mmio: [d0000000, dfffffff]
[    0.344381] bus: 00 index 3 mmio: [e0600000, ffffffff]
[    0.344381] bus: 00 index 4 mmio: [e0100000, e05fffff]
[    0.344381] bus: 00 index 5 mmio: [130000000, fcffffffff]
[    0.344381] ACPI: bus type pci registered
[    0.344381] PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255
[    0.344381] PCI: MCFG area at e0000000 reserved in E820
[    0.351426] PCI: Using MMCONFIG at e0000000 - efffffff
[    0.351482] PCI: Using configuration type 1 for base access
[    0.352108] bio: create slab <bio-0> at 0
[    0.352322] ACPI: EC: Look up EC in DSDT
[    0.360448] ACPI: Interpreter enabled
[    0.360505] ACPI: (supports S0 S3 S4 S5)
[    0.360739] ACPI: Using IOAPIC for interrupt routing
[    0.364369] ACPI: No dock devices found.
[    0.364369] ACPI: PCI Root Bridge [PCI0] (0000:00)
[    0.364394] pci 0000:00:00.0: reg 1c 64bit mmio: [0xe0000000-0xffffffff]
[    0.364394] pci 0000:00:02.0: PME# supported from D0 D3hot D3cold
[    0.364394] pci 0000:00:02.0: PME# disabled
[    0.364394] pci 0000:00:07.0: PME# supported from D0 D3hot D3cold
[    0.364394] pci 0000:00:07.0: PME# disabled
[    0.364394] pci 0000:00:09.0: PME# supported from D0 D3hot D3cold
[    0.364394] pci 0000:00:09.0: PME# disabled
[    0.364468] pci 0000:00:0b.0: PME# supported from D0 D3hot D3cold
[    0.364525] pci 0000:00:0b.0: PME# disabled
[    0.364620] pci 0000:00:11.0: reg 10 io port: [0xff00-0xff07]
[    0.364627] pci 0000:00:11.0: reg 14 io port: [0xfe00-0xfe03]
[    0.364633] pci 0000:00:11.0: reg 18 io port: [0xfd00-0xfd07]
[    0.364639] pci 0000:00:11.0: reg 1c io port: [0xfc00-0xfc03]
[    0.364644] pci 0000:00:11.0: reg 20 io port: [0xfb00-0xfb0f]
[    0.364651] pci 0000:00:11.0: reg 24 32bit mmio: [0xfe02f000-0xfe02f3ff]
[    0.364698] pci 0000:00:12.0: reg 10 32bit mmio: [0xfe02e000-0xfe02efff]
[    0.364747] pci 0000:00:12.1: reg 10 32bit mmio: [0xfe02d000-0xfe02dfff]
[    0.364813] pci 0000:00:12.2: reg 10 32bit mmio: [0xfe02c000-0xfe02c0ff]
[    0.364860] pci 0000:00:12.2: supports D1 D2
[    0.364861] pci 0000:00:12.2: PME# supported from D0 D1 D2 D3hot
[    0.364919] pci 0000:00:12.2: PME# disabled
[    0.365001] pci 0000:00:13.0: reg 10 32bit mmio: [0xfe02b000-0xfe02bfff]
[    0.365049] pci 0000:00:13.1: reg 10 32bit mmio: [0xfe02a000-0xfe02afff]
[    0.365115] pci 0000:00:13.2: reg 10 32bit mmio: [0xfe029000-0xfe0290ff]
[    0.365162] pci 0000:00:13.2: supports D1 D2
[    0.365163] pci 0000:00:13.2: PME# supported from D0 D1 D2 D3hot
[    0.365221] pci 0000:00:13.2: PME# disabled
[    0.365378] pci 0000:00:14.1: reg 10 io port: [0x00-0x07]
[    0.365384] pci 0000:00:14.1: reg 14 io port: [0x00-0x03]
[    0.365390] pci 0000:00:14.1: reg 18 io port: [0x00-0x07]
[    0.365396] pci 0000:00:14.1: reg 1c io port: [0x00-0x03]
[    0.365402] pci 0000:00:14.1: reg 20 io port: [0xfa00-0xfa0f]
[    0.365461] pci 0000:00:14.2: reg 10 64bit mmio: [0xfe024000-0xfe027fff]
[    0.365500] pci 0000:00:14.2: PME# supported from D0 D3hot D3cold
[    0.365558] pci 0000:00:14.2: PME# disabled
[    0.365706] pci 0000:00:14.5: reg 10 32bit mmio: [0xfe028000-0xfe028fff]
[    0.365823] pci 0000:01:00.0: reg 10 32bit mmio: [0xfa000000-0xfaffffff]
[    0.365831] pci 0000:01:00.0: reg 14 64bit mmio: [0xd0000000-0xdfffffff]
[    0.365838] pci 0000:01:00.0: reg 1c 64bit mmio: [0xf8000000-0xf9ffffff]
[    0.365843] pci 0000:01:00.0: reg 24 io port: [0xcf00-0xcf7f]
[    0.365848] pci 0000:01:00.0: reg 30 32bit mmio: [0x000000-0x01ffff]
[    0.365910] pci 0000:00:02.0: bridge io port: [0xc000-0xcfff]
[    0.365912] pci 0000:00:02.0: bridge 32bit mmio: [0xf8000000-0xfbffffff]
[    0.365915] pci 0000:00:02.0: bridge 64bit mmio pref: [0xd0000000-0xdfffffff]
[    0.365946] pci 0000:02:00.0: reg 10 io port: [0xbe00-0xbeff]
[    0.365959] pci 0000:02:00.0: reg 18 64bit mmio: [0xfdbff000-0xfdbfffff]
[    0.365969] pci 0000:02:00.0: reg 20 64bit mmio: [0xfdbf8000-0xfdbfbfff]
[    0.365975] pci 0000:02:00.0: reg 30 32bit mmio: [0x000000-0x01ffff]
[    0.366001] pci 0000:02:00.0: supports D1 D2
[    0.366003] pci 0000:02:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.366061] pci 0000:02:00.0: PME# disabled
[    0.368023] pci 0000:00:07.0: bridge io port: [0xb000-0xbfff]
[    0.368025] pci 0000:00:07.0: bridge 32bit mmio: [0xfdc00000-0xfdcfffff]
[    0.368028] pci 0000:00:07.0: bridge 64bit mmio pref: [0xfdb00000-0xfdbfffff]
[    0.368059] pci 0000:03:00.0: reg 10 io port: [0xee00-0xeeff]
[    0.368072] pci 0000:03:00.0: reg 18 64bit mmio: [0xfdfff000-0xfdffffff]
[    0.368082] pci 0000:03:00.0: reg 20 64bit mmio: [0xfdff8000-0xfdffbfff]
[    0.368088] pci 0000:03:00.0: reg 30 32bit mmio: [0x000000-0x01ffff]
[    0.368114] pci 0000:03:00.0: supports D1 D2
[    0.368115] pci 0000:03:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.368174] pci 0000:03:00.0: PME# disabled
[    0.368279] pci 0000:00:09.0: bridge io port: [0xe000-0xefff]
[    0.368282] pci 0000:00:09.0: bridge 32bit mmio: [0xfd800000-0xfd8fffff]
[    0.368285] pci 0000:00:09.0: bridge 64bit mmio pref: [0xfdf00000-0xfdffffff]
[    0.368319] pci 0000:04:00.0: reg 18 io port: [0xdf00-0xdf7f]
[    0.368331] pci 0000:04:00.0: reg 20 64bit mmio: [0xfdef0000-0xfdefffff]
[    0.368336] pci 0000:04:00.0: reg 30 32bit mmio: [0x000000-0x03ffff]
[    0.368357] pci 0000:04:00.0: supports D1
[    0.368358] pci 0000:04:00.0: PME# supported from D0 D1 D3hot
[    0.368415] pci 0000:04:00.0: PME# disabled
[    0.368513] pci 0000:00:0b.0: bridge io port: [0xd000-0xdfff]
[    0.368515] pci 0000:00:0b.0: bridge 32bit mmio: [0xfde00000-0xfdefffff]
[    0.368519] pci 0000:00:0b.0: bridge 64bit mmio pref: [0xfdd00000-0xfddfffff]
[    0.368580] pci 0000:05:0e.0: reg 10 32bit mmio: [0xfdaff000-0xfdaff7ff]
[    0.368588] pci 0000:05:0e.0: reg 14 32bit mmio: [0xfdaf8000-0xfdafbfff]
[    0.368642] pci 0000:05:0e.0: supports D1 D2
[    0.368643] pci 0000:05:0e.0: PME# supported from D0 D1 D2 D3hot
[    0.368702] pci 0000:05:0e.0: PME# disabled
[    0.368787] pci 0000:00:14.4: transparent bridge
[    0.368843] pci 0000:00:14.4: bridge io port: [0xa000-0xafff]
[    0.368847] pci 0000:00:14.4: bridge 32bit mmio: [0xfda00000-0xfdafffff]
[    0.368851] pci 0000:00:14.4: bridge 32bit mmio pref: [0xfd900000-0xfd9fffff]
[    0.368866] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[    0.369104] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P2P_._PRT]
[    0.369176] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCE2._PRT]
[    0.369228] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCE7._PRT]
[    0.369274] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCE9._PRT]
[    0.369320] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCEB._PRT]
[    0.388828] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 11) *0, disabled.
[    0.388828] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 10 11) *0, disabled.
[    0.389253] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 10 11) *0, disabled.
[    0.389846] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 10 11) *0, disabled.
[    0.392302] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11) *0, disabled.
[    0.392895] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11) *0, disabled.
[    0.393488] ACPI: PCI Interrupt Link [LNK0] (IRQs 3 4 5 6 7 10 11) *0, disabled.
[    0.394081] ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 5 6 7 10 11) *0, disabled.
[    0.394635] usbcore: registered new interface driver usbfs
[    0.394635] usbcore: registered new interface driver hub
[    0.394635] usbcore: registered new device driver usb
[    0.394635] PCI: Using ACPI for IRQ routing
[    0.394635] pci 0000:00:00.0: BAR 3: address space collision on of device [0xe0000000-0xffffffff]
[    0.394635] pci 0000:00:00.0: BAR 3: can't allocate resource
[    0.408117] PCI-DMA: Disabling AGP.
[    0.408215] PCI-DMA: aperture base @ 20000000 size 65536 KB
[    0.408215] PCI-DMA: using GART IOMMU.
[    0.408215] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
[    0.409515] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 24, 0
[    0.409780] hpet0: 4 comparators, 32-bit 14.318180 MHz counter
[    0.416029] hpet: hpet2 irq 24 for MSI
[    0.452132] pnp: PnP ACPI init
[    0.452198] ACPI: bus type pnp registered
[    0.454894] pnp 00:0d: mem resource (0xd2c00-0xd3fff) overlaps 0000:00:00.0 BAR 3 (0x0-0x1fffffff), disabling
[    0.454963] pnp 00:0d: mem resource (0xf0000-0xf7fff) overlaps 0000:00:00.0 BAR 3 (0x0-0x1fffffff), disabling
[    0.455032] pnp 00:0d: mem resource (0xf8000-0xfbfff) overlaps 0000:00:00.0 BAR 3 (0x0-0x1fffffff), disabling
[    0.455100] pnp 00:0d: mem resource (0xfc000-0xfffff) overlaps 0000:00:00.0 BAR 3 (0x0-0x1fffffff), disabling
[    0.455169] pnp 00:0d: mem resource (0x0-0x9ffff) overlaps 0000:00:00.0 BAR 3 (0x0-0x1fffffff), disabling
[    0.455237] pnp 00:0d: mem resource (0x100000-0xcfddffff) overlaps 0000:00:00.0 BAR 3 (0x0-0x1fffffff), disabling
[    0.455434] pnp: PnP ACPI: found 14 devices
[    0.455488] ACPI: ACPI bus type pnp unregistered
[    0.455549] system 00:01: ioport range 0x4d0-0x4d1 has been reserved
[    0.455606] system 00:01: ioport range 0x220-0x225 has been reserved
[    0.455663] system 00:01: ioport range 0x290-0x294 has been reserved
[    0.455723] system 00:02: ioport range 0x4100-0x411f has been reserved
[    0.455780] system 00:02: ioport range 0x228-0x22f has been reserved
[    0.455837] system 00:02: ioport range 0x40b-0x40b has been reserved
[    0.455893] system 00:02: ioport range 0x4d6-0x4d6 has been reserved
[    0.455950] system 00:02: ioport range 0xc00-0xc01 has been reserved
[    0.456010] system 00:02: ioport range 0xc14-0xc14 has been reserved
[    0.456067] system 00:02: ioport range 0xc50-0xc52 has been reserved
[    0.456123] system 00:02: ioport range 0xc6c-0xc6d has been reserved
[    0.456180] system 00:02: ioport range 0xc6f-0xc6f has been reserved
[    0.456237] system 00:02: ioport range 0xcd0-0xcd1 has been reserved
[    0.456293] system 00:02: ioport range 0xcd2-0xcd3 has been reserved
[    0.456350] system 00:02: ioport range 0xcd4-0xcdf has been reserved
[    0.456407] system 00:02: ioport range 0x4000-0x40fe has been reserved
[    0.456463] system 00:02: ioport range 0x4210-0x4217 has been reserved
[    0.456520] system 00:02: ioport range 0xb00-0xb0f has been reserved
[    0.456577] system 00:02: ioport range 0xb10-0xb1f has been reserved
[    0.456634] system 00:02: ioport range 0xb20-0xb3f has been reserved
[    0.456694] system 00:0c: iomem range 0xe0000000-0xefffffff has been reserved
[    0.456754] system 00:0d: iomem range 0xcfde0000-0xcfdfffff could not be reserved
[    0.456820] system 00:0d: iomem range 0xffff0000-0xffffffff has been reserved
[    0.456878] system 00:0d: iomem range 0xfec00000-0xfec00fff could not be reserved
[    0.456944] system 00:0d: iomem range 0xfee00000-0xfee00fff has been reserved
[    0.457001] system 00:0d: iomem range 0xfff80000-0xfffeffff has been reserved
[    0.462011] pci 0000:00:02.0: PCI bridge, secondary bus 0000:01
[    0.462068] pci 0000:00:02.0:   IO window: 0xc000-0xcfff
[    0.462124] pci 0000:00:02.0:   MEM window: 0xf8000000-0xfbffffff
[    0.462181] pci 0000:00:02.0:   PREFETCH window: 0x000000d0000000-0x000000dfffffff
[    0.462249] pci 0000:00:07.0: PCI bridge, secondary bus 0000:02
[    0.462306] pci 0000:00:07.0:   IO window: 0xb000-0xbfff
[    0.462362] pci 0000:00:07.0:   MEM window: 0xfdc00000-0xfdcfffff
[    0.462418] pci 0000:00:07.0:   PREFETCH window: 0x000000fdb00000-0x000000fdbfffff
[    0.462486] pci 0000:00:09.0: PCI bridge, secondary bus 0000:03
[    0.462542] pci 0000:00:09.0:   IO window: 0xe000-0xefff
[    0.462598] pci 0000:00:09.0:   MEM window: 0xfd800000-0xfd8fffff
[    0.462655] pci 0000:00:09.0:   PREFETCH window: 0x000000fdf00000-0x000000fdffffff
[    0.462722] pci 0000:00:0b.0: PCI bridge, secondary bus 0000:04
[    0.462778] pci 0000:00:0b.0:   IO window: 0xd000-0xdfff
[    0.462834] pci 0000:00:0b.0:   MEM window: 0xfde00000-0xfdefffff
[    0.462891] pci 0000:00:0b.0:   PREFETCH window: 0x000000fdd00000-0x000000fddfffff
[    0.462958] pci 0000:00:14.4: PCI bridge, secondary bus 0000:05
[    0.463015] pci 0000:00:14.4:   IO window: 0xa000-0xafff
[    0.463073] pci 0000:00:14.4:   MEM window: 0xfda00000-0xfdafffff
[    0.463131] pci 0000:00:14.4:   PREFETCH window: 0xfd900000-0xfd9fffff
[    0.463193]   alloc irq_desc for 18 on node 0
[    0.463194]   alloc kstat_irqs on node 0
[    0.463202] pci 0000:00:02.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[    0.463260] pci 0000:00:02.0: setting latency timer to 64
[    0.463263]   alloc irq_desc for 19 on node 0
[    0.463265]   alloc kstat_irqs on node 0
[    0.463271] pci 0000:00:07.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[    0.463328] pci 0000:00:07.0: setting latency timer to 64
[    0.463332]   alloc irq_desc for 17 on node 0
[    0.463333]   alloc kstat_irqs on node 0
[    0.463339] pci 0000:00:09.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[    0.463396] pci 0000:00:09.0: setting latency timer to 64
[    0.463400] pci 0000:00:0b.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[    0.463457] pci 0000:00:0b.0: setting latency timer to 64
[    0.463463] pci_bus 0000:00: resource 0 io:  [0x00-0xffff]
[    0.463465] pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffffffffffff]
[    0.463468] pci_bus 0000:01: resource 0 io:  [0xc000-0xcfff]
[    0.463469] pci_bus 0000:01: resource 1 mem: [0xf8000000-0xfbffffff]
[    0.463471] pci_bus 0000:01: resource 2 pref mem [0xd0000000-0xdfffffff]
[    0.463473] pci_bus 0000:02: resource 0 io:  [0xb000-0xbfff]
[    0.463475] pci_bus 0000:02: resource 1 mem: [0xfdc00000-0xfdcfffff]
[    0.463477] pci_bus 0000:02: resource 2 pref mem [0xfdb00000-0xfdbfffff]
[    0.463479] pci_bus 0000:03: resource 0 io:  [0xe000-0xefff]
[    0.463481] pci_bus 0000:03: resource 1 mem: [0xfd800000-0xfd8fffff]
[    0.463483] pci_bus 0000:03: resource 2 pref mem [0xfdf00000-0xfdffffff]
[    0.463484] pci_bus 0000:04: resource 0 io:  [0xd000-0xdfff]
[    0.463486] pci_bus 0000:04: resource 1 mem: [0xfde00000-0xfdefffff]
[    0.463488] pci_bus 0000:04: resource 2 pref mem [0xfdd00000-0xfddfffff]
[    0.463490] pci_bus 0000:05: resource 0 io:  [0xa000-0xafff]
[    0.463492] pci_bus 0000:05: resource 1 mem: [0xfda00000-0xfdafffff]
[    0.463494] pci_bus 0000:05: resource 2 pref mem [0xfd900000-0xfd9fffff]
[    0.463496] pci_bus 0000:05: resource 3 io:  [0x00-0xffff]
[    0.463497] pci_bus 0000:05: resource 4 mem: [0x000000-0xffffffffffffffff]
[    0.463574] NET: Registered protocol family 2
[    0.463762] IP route cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.464653] TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
[    0.466866] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    0.467186] TCP: Hash tables configured (established 524288 bind 65536)
[    0.467244] TCP reno registered
[    0.467414] NET: Registered protocol family 1
[    0.467510] Trying to unpack rootfs image as initramfs...
[    0.503895] Switched to high resolution mode on CPU 1
[    0.503898] Switched to high resolution mode on CPU 2
[    0.503902] Switched to high resolution mode on CPU 3
[    0.504022] Switched to high resolution mode on CPU 0
[    0.622475] Freeing initrd memory: 9211k freed
[    0.626162] audit: initializing netlink socket (disabled)
[    0.626228] type=2000 audit(1252917058.625:1): initialized
[    0.626578] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    0.626774] VFS: Disk quotas dquot_6.5.2
[    0.626854] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.626949] msgmni has been set to 7929
[    0.627155] alg: No test for stdrng (krng)
[    0.627258] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[    0.627325] io scheduler noop registered
[    0.627378] io scheduler anticipatory registered
[    0.627433] io scheduler deadline registered
[    0.631821] io scheduler cfq registered (default)
[    0.772136] pci 0000:01:00.0: Boot video device
[    0.772281]   alloc irq_desc for 25 on node 0
[    0.772282]   alloc kstat_irqs on node 0
[    0.772289] pcieport-driver 0000:00:02.0: irq 25 for MSI/MSI-X
[    0.772293] pcieport-driver 0000:00:02.0: setting latency timer to 64
[    0.772410]   alloc irq_desc for 26 on node 0
[    0.772411]   alloc kstat_irqs on node 0
[    0.772415] pcieport-driver 0000:00:07.0: irq 26 for MSI/MSI-X
[    0.772419] pcieport-driver 0000:00:07.0: setting latency timer to 64
[    0.772530]   alloc irq_desc for 27 on node 0
[    0.772531]   alloc kstat_irqs on node 0
[    0.772535] pcieport-driver 0000:00:09.0: irq 27 for MSI/MSI-X
[    0.772538] pcieport-driver 0000:00:09.0: setting latency timer to 64
[    0.772647]   alloc irq_desc for 28 on node 0
[    0.772649]   alloc kstat_irqs on node 0
[    0.772652] pcieport-driver 0000:00:0b.0: irq 28 for MSI/MSI-X
[    0.772656] pcieport-driver 0000:00:0b.0: setting latency timer to 64
[    0.775299] Linux agpgart interface v0.103
[    0.775354] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.775527] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[    0.775947] 00:09: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[    0.777464] brd: module loaded
[    0.777578] input: Macintosh mouse button emulation as /devices/virtual/input/input0
[    0.777793] PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
[    0.777849] PNP: PS/2 appears to have AUX port disabled, if this is incorrect please boot with i8042.nopnp
[    0.778023] serio: i8042 KBD port at 0x60,0x64 irq 1
[    0.778188] mice: PS/2 mouse device common for all mice
[    0.778293] rtc_cmos 00:05: RTC can wake from S4
[    0.778385] rtc_cmos 00:05: rtc core: registered rtc_cmos as rtc0
[    0.778471] rtc0: alarms up to one month, 242 bytes nvram, hpet irqs
[    0.778551] cpuidle: using governor ladder
[    0.778605] cpuidle: using governor menu
[    0.778662] No iBFT detected.
[    0.778974] TCP cubic registered
[    0.779082] NET: Registered protocol family 10
[    0.779485] lo: Disabled Privacy Extensions
[    0.779768] Mobile IPv6
[    0.779821] NET: Registered protocol family 17
[    0.780180] registered taskstats version 1
[    0.780381] rtc_cmos 00:05: setting system clock to 2009-09-14 08:30:59 UTC (1252917059)
[    0.780486] Freeing unused kernel memory: 584k freed
[    0.780679] Write protecting the kernel read-only data: 4000k
[    0.802581] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[    0.882270] Floppy drive(s): fd0 is 1.44M
[    0.889776] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    0.889848] r8169 0000:02:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[    0.889950] r8169 0000:02:00.0: setting latency timer to 64
[    0.889980]   alloc irq_desc for 29 on node 0
[    0.889982]   alloc kstat_irqs on node 0
[    0.889992] r8169 0000:02:00.0: irq 29 for MSI/MSI-X
[    0.890429] eth0: RTL8168d/8111d at 0xffffc90000674000, 00:24:1d:18:f8:b2, XID 281000c0 IRQ 29
[    0.894212] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    0.894277] r8169 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[    0.894352] r8169 0000:03:00.0: setting latency timer to 64
[    0.894380]   alloc irq_desc for 30 on node 0
[    0.894381]   alloc kstat_irqs on node 0
[    0.894389] r8169 0000:03:00.0: irq 30 for MSI/MSI-X
[    0.894818] eth1: RTL8168d/8111d at 0xffffc90000634000, 00:24:1d:18:f8:f3, XID 281000c0 IRQ 30
[    0.900291] FDC 0 is a post-1991 82077
[    0.902177] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    0.902409] ehci_hcd 0000:00:12.2: PCI INT B -> GSI 17 (level, low) -> IRQ 17
[    0.902491] ehci_hcd 0000:00:12.2: EHCI Host Controller
[    0.902600] ehci_hcd 0000:00:12.2: new USB bus registered, assigned bus number 1
[    0.902692] ehci_hcd 0000:00:12.2: applying AMD SB600/SB700 USB freeze workaround
[    0.902772] ehci_hcd 0000:00:12.2: debug port 1
[    0.902847] ehci_hcd 0000:00:12.2: irq 17, io mem 0xfe02c000
[    0.903960] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    0.912020] ehci_hcd 0000:00:12.2: USB 2.0 started, EHCI 1.00
[    0.912096] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[    0.912153] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    0.912219] usb usb1: Product: EHCI Host Controller
[    0.912274] usb usb1: Manufacturer: Linux 2.6.31-rc9 ehci_hcd
[    0.912329] usb usb1: SerialNumber: 0000:00:12.2
[    0.912431] usb usb1: configuration #1 chosen from 1 choice
[    0.912516] hub 1-0:1.0: USB hub found
[    0.912577] hub 1-0:1.0: 6 ports detected
[    0.912732] SCSI subsystem initialized
[    0.917597]   alloc irq_desc for 16 on node 0
[    0.917600]   alloc kstat_irqs on node 0
[    0.917610] ohci_hcd 0000:00:12.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    0.917695] ohci_hcd 0000:00:12.0: OHCI Host Controller
[    0.917792] ohci_hcd 0000:00:12.0: new USB bus registered, assigned bus number 2
[    0.917892] ohci_hcd 0000:00:12.0: irq 16, io mem 0xfe02e000
[    0.918433] Uniform Multi-Platform E-IDE driver
[    0.928423] libata version 3.00 loaded.
[    0.945623]   alloc irq_desc for 22 on node 0
[    0.945626]   alloc kstat_irqs on node 0
[    0.945637] firewire_ohci 0000:05:0e.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[    0.945779] mvsas 0000:04:00.0: mvsas: driver version 0.8.2
[    0.945839] mvsas 0000:04:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[    0.945898] mvsas 0000:04:00.0: setting latency timer to 64
[    0.947342] mvsas 0000:04:00.0: mvsas: PCI-E x4, Bandwidth Usage: 2.5 Gbps
[    0.988854] usb usb2: New USB device found, idVendor=1d6b, idProduct=0001
[    0.988914] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    0.988980] usb usb2: Product: OHCI Host Controller
[    0.989038] usb usb2: Manufacturer: Linux 2.6.31-rc9 ohci_hcd
[    0.989094] usb usb2: SerialNumber: 0000:00:12.0
[    0.989229] usb usb2: configuration #1 chosen from 1 choice
[    0.989329] hub 2-0:1.0: USB hub found
[    0.989390] hub 2-0:1.0: 3 ports detected
[    0.989627] ehci_hcd 0000:00:13.2: PCI INT B -> GSI 19 (level, low) -> IRQ 19
[    0.989704] ehci_hcd 0000:00:13.2: EHCI Host Controller
[    0.989789] ehci_hcd 0000:00:13.2: new USB bus registered, assigned bus number 3
[    0.989874] ehci_hcd 0000:00:13.2: applying AMD SB600/SB700 USB freeze workaround
[    0.989953] ehci_hcd 0000:00:13.2: debug port 1
[    0.990032] ehci_hcd 0000:00:13.2: irq 19, io mem 0xfe029000
[    1.000114] ehci_hcd 0000:00:13.2: USB 2.0 started, EHCI 1.00
[    1.000185] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002
[    1.000241] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.000307] usb usb3: Product: EHCI Host Controller
[    1.000361] usb usb3: Manufacturer: Linux 2.6.31-rc9 ehci_hcd
[    1.000417] usb usb3: SerialNumber: 0000:00:13.2
[    1.000508] usb usb3: configuration #1 chosen from 1 choice
[    1.000585] hub 3-0:1.0: USB hub found
[    1.000644] hub 3-0:1.0: 6 ports detected
[    1.000977] atiixp 0000:00:14.1: IDE controller (0x1002:0x439c rev 0x00)
[    1.001043] ATIIXP_IDE 0000:00:14.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    1.001122] atiixp 0000:00:14.1: not 100% native mode: will probe irqs later
[    1.001183]     ide0: BM-DMA at 0xfa00-0xfa07
[    1.001250] atiixp 0000:00:14.1: simplex device: DMA disabled
[    1.001305] ide1: DMA disabled
[    1.001367] Probing IDE interface ide0...
[    1.016139] firewire_ohci: Added fw-ohci device 0000:05:0e.0, OHCI version 1.10
[    1.460142] usb 2-3: new low speed USB device using ohci_hcd and address 2
[    1.517668] firewire_core: created device fw0: GUID 001a9c2c0000241d, S400
[    1.568143] Probing IDE interface ide1...
[    1.571122] ide1: no devices on the port
[    1.571212] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
[    1.571349] ide1 at 0x170-0x177,0x376 on irq 15
[    1.571658] ahci 0000:00:11.0: version 3.0
[    1.571671] ahci 0000:00:11.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[    1.571847] ahci 0000:00:11.0: AHCI 0001.0100 32 slots 6 ports 3 Gbps 0x3f impl SATA mode
[    1.571914] ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part 
[    1.572436] scsi1 : ahci
[    1.572648] scsi2 : ahci
[    1.572745] scsi3 : ahci
[    1.572838] scsi4 : ahci
[    1.572928] scsi5 : ahci
[    1.573022] scsi6 : ahci
[    1.573165] ata1: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f100 irq 22
[    1.573232] ata2: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f180 irq 22
[    1.573299] ata3: SATA max UDMA/133 irq_stat 0x00000040, connection status changed
[    1.573366] ata4: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f280 irq 22
[    1.573432] ata5: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f300 irq 22
[    1.573499] ata6: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f380 irq 22
[    1.573652] ohci_hcd 0000:00:12.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    1.573739] ohci_hcd 0000:00:12.1: OHCI Host Controller
[    1.573838] ohci_hcd 0000:00:12.1: new USB bus registered, assigned bus number 4
[    1.573923] ohci_hcd 0000:00:12.1: irq 16, io mem 0xfe02d000
[    1.629951] usb 2-3: New USB device found, idVendor=046d, idProduct=c01d
[    1.630010] usb 2-3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    1.630067] usb 2-3: Product: USB-PS/2 Optical Mouse
[    1.630122] usb 2-3: Manufacturer: Logitech
[    1.630226] usb 2-3: configuration #1 chosen from 1 choice
[    1.632036] usb usb4: New USB device found, idVendor=1d6b, idProduct=0001
[    1.632095] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.632160] usb usb4: Product: OHCI Host Controller
[    1.632215] usb usb4: Manufacturer: Linux 2.6.31-rc9 ohci_hcd
[    1.632270] usb usb4: SerialNumber: 0000:00:12.1
[    1.632368] usb usb4: configuration #1 chosen from 1 choice
[    1.632448] hub 4-0:1.0: USB hub found
[    1.632514] hub 4-0:1.0: 3 ports detected
[    1.632679] ohci_hcd 0000:00:13.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[    1.632748] ohci_hcd 0000:00:13.0: OHCI Host Controller
[    1.632830] ohci_hcd 0000:00:13.0: new USB bus registered, assigned bus number 5
[    1.632925] ohci_hcd 0000:00:13.0: irq 18, io mem 0xfe02b000
[    1.692106] usb usb5: New USB device found, idVendor=1d6b, idProduct=0001
[    1.692165] usb usb5: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.692230] usb usb5: Product: OHCI Host Controller
[    1.692284] usb usb5: Manufacturer: Linux 2.6.31-rc9 ohci_hcd
[    1.692340] usb usb5: SerialNumber: 0000:00:13.0
[    1.692434] usb usb5: configuration #1 chosen from 1 choice
[    1.692512] hub 5-0:1.0: USB hub found
[    1.692577] hub 5-0:1.0: 3 ports detected
[    1.692716] ohci_hcd 0000:00:13.1: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[    1.692784] ohci_hcd 0000:00:13.1: OHCI Host Controller
[    1.692865] ohci_hcd 0000:00:13.1: new USB bus registered, assigned bus number 6
[    1.692947] ohci_hcd 0000:00:13.1: irq 18, io mem 0xfe02a000
[    1.752089] usb usb6: New USB device found, idVendor=1d6b, idProduct=0001
[    1.752145] usb usb6: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.756490] usb usb6: Product: OHCI Host Controller
[    1.756544] usb usb6: Manufacturer: Linux 2.6.31-rc9 ohci_hcd
[    1.756600] usb usb6: SerialNumber: 0000:00:13.1
[    1.756694] usb usb6: configuration #1 chosen from 1 choice
[    1.756766] hub 6-0:1.0: USB hub found
[    1.756826] hub 6-0:1.0: 3 ports detected
[    1.756971] ohci_hcd 0000:00:14.5: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[    1.757040] ohci_hcd 0000:00:14.5: OHCI Host Controller
[    1.757119] ohci_hcd 0000:00:14.5: new USB bus registered, assigned bus number 7
[    1.757198] ohci_hcd 0000:00:14.5: irq 18, io mem 0xfe028000
[    1.816089] usb usb7: New USB device found, idVendor=1d6b, idProduct=0001
[    1.816145] usb usb7: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.816211] usb usb7: Product: OHCI Host Controller
[    1.816265] usb usb7: Manufacturer: Linux 2.6.31-rc9 ohci_hcd
[    1.816321] usb usb7: SerialNumber: 0000:00:14.5
[    1.816409] usb usb7: configuration #1 chosen from 1 choice
[    1.816480] hub 7-0:1.0: USB hub found
[    1.816543] hub 7-0:1.0: 2 ports detected
[    1.893131] ata4: SATA link down (SStatus 0 SControl 300)
[    2.056108] ata6: softreset failed (device not ready)
[    2.056113] ata1: softreset failed (device not ready)
[    2.056116] ata1: applying SB600 PMP SRST workaround and retrying
[    2.056287] ata6: applying SB600 PMP SRST workaround and retrying
[    2.056357] ata5: softreset failed (device not ready)
[    2.056412] ata5: applying SB600 PMP SRST workaround and retrying
[    2.056481] ata2: softreset failed (device not ready)
[    2.056542] ata2: applying SB600 PMP SRST workaround and retrying
[    2.220149] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    2.220160] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    2.220186] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    2.220215] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    2.220802] ata1.00: ATA-7: OCZ-VERTEX v1.10, 1370, max UDMA/133
[    2.220862] ata1.00: 62533296 sectors, multi 16: LBA48 NCQ (depth 31/32)
[    2.221498] ata1.00: configured for UDMA/133
[    2.236190] scsi 1:0:0:0: Direct-Access     ATA      OCZ-VERTEX v1.10 1370 PQ: 0 ANSI: 5
[    2.247396] ata2.00: ATA-8: ST31000528AS, CC34, max UDMA/133
[    2.247452] ata2.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[    2.262851] ata6.00: ATA-8: WDC WD20EADS-00R6B0, 01.00A01, max UDMA/133
[    2.262908] ata6.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[    2.267599] ata6.00: configured for UDMA/133
[    2.269157] ata5.00: ATA-8: WDC WD20EADS-00R6B0, 01.00A01, max UDMA/133
[    2.269214] ata5.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[    2.275112] ata5.00: configured for UDMA/133
[    2.287189] ata2.00: configured for UDMA/133
[    2.300128] scsi 2:0:0:0: Direct-Access     ATA      ST31000528AS     CC34 PQ: 0 ANSI: 5
[    2.460125] ata3: softreset failed (device not ready)
[    2.460181] ata3: applying SB600 PMP SRST workaround and retrying
[    2.624127] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    2.625490] ata3.00: ATAPI: PIONEER DVD-RW  DVR-212D, 1.21, max UDMA/66
[    2.627143] ata3.00: configured for UDMA/66
[    2.654687] scsi 3:0:0:0: CD-ROM            PIONEER  DVD-RW  DVR-212D 1.21 PQ: 0 ANSI: 5
[    2.654856] scsi 5:0:0:0: Direct-Access     ATA      WDC WD20EADS-00R 01.0 PQ: 0 ANSI: 5
[    2.654999] scsi 6:0:0:0: Direct-Access     ATA      WDC WD20EADS-00R 01.0 PQ: 0 ANSI: 5
[    2.658652] usbcore: registered new interface driver hiddev
[    2.664193] input: Logitech USB-PS/2 Optical Mouse as /devices/pci0000:00/0000:00:12.0/usb2/2-3/2-3:1.0/input/input2
[    2.664301] generic-usb 0003:046D:C01D.0001: input,hidraw0: USB HID v1.10 Mouse [Logitech USB-PS/2 Optical Mouse] on usb-0000:00:12.0-3/input0
[    2.664387] usbcore: registered new interface driver usbhid
[    2.664443] usbhid: v2.6:USB HID core driver
[    2.666178] sd 1:0:0:0: [sda] 62533296 512-byte logical blocks: (32.0 GB/29.8 GiB)
[    2.666261] sd 2:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[    2.666271] sd 1:0:0:0: [sda] Write Protect is off
[    2.666273] sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    2.666285] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.666356]  sda:
[    2.666391] sd 5:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[    2.666412] sd 5:0:0:0: [sdc] Write Protect is off
[    2.666414] sd 5:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[    2.666425] sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.666486]  sdc: sda2
[    2.666765] sd 2:0:0:0: [sdb] Write Protect is off
[    2.666767] sd 2:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    2.666778] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.666958] 
[    2.667084]  sdb:
[    2.667145] sd 1:0:0:0: [sda] Attached SCSI disk
[    2.667159] sd 6:0:0:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[    2.667180] sd 6:0:0:0: [sdd] Write Protect is off
[    2.667181] sd 6:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[    2.667192] sd 6:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.667251]  sdd: sdb1 sdb2 sdb3 sdb4
[    2.685935] sr0: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray
[    2.685994] Uniform CD-ROM driver Revision: 3.20
[    2.686124] sr 3:0:0:0: Attached scsi CD-ROM sr0
[    2.700049] ldm_parse_tocblock(): Cannot find TOCBLOCK, database may be corrupt.
[    2.700118] ldm_parse_tocblock(): Cannot find TOCBLOCK, database may be corrupt.
[    2.706984] sd 2:0:0:0: [sdb] Attached SCSI disk
[    2.716510] ldm_parse_tocblock(): Cannot find TOCBLOCK, database may be corrupt.
[    2.716579] ldm_parse_tocblock(): Cannot find TOCBLOCK, database may be corrupt.
[    2.735845]  [LDM] sdc1
[    2.736224] sd 5:0:0:0: [sdc] Attached SCSI disk
[    2.749709]  [LDM] sdd1
[    2.750067] sd 6:0:0:0: [sdd] Attached SCSI disk
[    2.754121] sd 1:0:0:0: Attached scsi generic sg0 type 0
[    2.754198] sd 2:0:0:0: Attached scsi generic sg1 type 0
[    2.754271] sr 3:0:0:0: Attached scsi generic sg2 type 5
[    2.754343] sd 5:0:0:0: Attached scsi generic sg3 type 0
[    2.754416] sd 6:0:0:0: Attached scsi generic sg4 type 0
[    5.232112] drivers/scsi/mvsas/mv_sas.c 1214:port 0 attach dev info is 0
[    5.232115] drivers/scsi/mvsas/mv_sas.c 1216:port 0 attach sas addr is 0
[    5.336121] drivers/scsi/mvsas/mv_sas.c 1214:port 1 attach dev info is 0
[    5.336123] drivers/scsi/mvsas/mv_sas.c 1216:port 1 attach sas addr is 0
[    5.440126] drivers/scsi/mvsas/mv_sas.c 1214:port 2 attach dev info is 0
[    5.440127] drivers/scsi/mvsas/mv_sas.c 1216:port 2 attach sas addr is 0
[    5.544125] drivers/scsi/mvsas/mv_sas.c 1214:port 3 attach dev info is 0
[    5.544127] drivers/scsi/mvsas/mv_sas.c 1216:port 3 attach sas addr is 0
[    5.648111] drivers/scsi/mvsas/mv_sas.c 1214:port 4 attach dev info is 0
[    5.648112] drivers/scsi/mvsas/mv_sas.c 1216:port 4 attach sas addr is 0
[    5.752107] drivers/scsi/mvsas/mv_sas.c 1214:port 5 attach dev info is 0
[    5.752109] drivers/scsi/mvsas/mv_sas.c 1216:port 5 attach sas addr is 0
[    5.856105] drivers/scsi/mvsas/mv_sas.c 1214:port 6 attach dev info is 0
[    5.856107] drivers/scsi/mvsas/mv_sas.c 1216:port 6 attach sas addr is 0
[    5.960125] drivers/scsi/mvsas/mv_sas.c 1214:port 7 attach dev info is 0
[    5.960127] drivers/scsi/mvsas/mv_sas.c 1216:port 7 attach sas addr is 0
[    5.960132] scsi0 : mvsas
[    6.017934] device-mapper: uevent: version 1.0.3
[    6.018168] device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-devel@redhat.com
[    6.033567] kjournald starting.  Commit interval 5 seconds
[    6.033572] EXT3-fs: mounted filesystem with ordered data mode.
[    6.183525] udev: starting version 141
[    6.214940] processor LNXCPU:00: registered as cooling_device0
[    6.215029] processor LNXCPU:01: registered as cooling_device1
[    6.215113] processor LNXCPU:02: registered as cooling_device2
[    6.215196] processor LNXCPU:03: registered as cooling_device3
[    6.217051] ACPI: WMI: Mapper loaded
[    6.218023] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
[    6.218093] ACPI: Power Button [PWRF]
[    6.218193] input: Power Button as /devices/LNXSYSTM:00/device:00/PNP0C0C:00/input/input4
[    6.218260] ACPI: Power Button [PWRB]
[    6.224685] parport_pc 00:0a: reported by Plug and Play ACPI
[    6.224792] parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
[    6.266109] ACPI: I/O resource piix4_smbus [0xb00-0xb07] conflicts with ACPI region SOR1 [0xb00-0xb0f]
[    6.266180] ACPI: Device needs an ACPI driver
[    6.266250] piix4_smbus: probe of 0000:00:14.0 failed with error -16
[    6.268207] EDAC MC: Ver: 2.1.0 Sep  9 2009
[    6.274408] EDAC amd64_edac:  Ver: 3.2.0 Sep  9 2009
[    6.276436] EDAC amd64: This node reports that Memory ECC is currently disabled.
[    6.276504] EDAC amd64: bit 0x400000 in register F3x44 of the MISC_CONTROL device (0000:00:18.3) should be enabled
[    6.276573] EDAC amd64: WARNING: ECC is NOT currently enabled by the BIOS. Module will NOT be loaded.
[    6.276574]     Either Enable ECC in the BIOS, or use the 'ecc_enable_override' parameter.
[    6.276575]     Might be a BIOS bug, if BIOS says ECC is enabled
[    6.276576]     Use of the override can cause unknown side effects.
[    6.276835] amd64_edac: probe of 0000:00:18.2 failed with error -22
[    6.361783] HDA Intel 0000:00:14.2: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    6.441160] hda_codec: Unknown model for ALC889A, trying auto-probe from BIOS...
[    6.441403] input: HDA Digital PCBeep as /devices/pci0000:00/0000:00:14.2/input/input5
[    6.642244] Adding 8385920k swap on /dev/sdb4.  Priority:-1 extents:1 across:8385920k 
[    6.662562] EXT3 FS on sda2, internal journal
[    6.680895] loop: module loaded
[    6.699068] it87: Found IT8720F chip at 0x228, revision 5
[    6.699132] it87: in3 is VCC (+5V)
[    7.024976] kjournald starting.  Commit interval 5 seconds
[    7.025418] EXT3 FS on sdb3, internal journal
[    7.025523] EXT3-fs: mounted filesystem with ordered data mode.
[    7.135226] Bridge firewalling registered
[    7.137827] device eth0 entered promiscuous mode
[    7.138843] r8169: eth0: link up
[    7.138901] r8169: eth0: link up
[    7.140606] br0: port 1(eth0) entering learning state
[   17.716140] br0: no IPv6 routers present
[   17.944120] eth0: no IPv6 routers present
[   22.140134] br0: port 1(eth0) entering forwarding state
[   35.456274] RPC: Registered udp transport module.
[   35.456340] RPC: Registered tcp transport module.
[   35.466800] Slow work thread pool: Starting up
[   35.467033] Slow work thread pool: Ready
[   35.467121] FS-Cache: Loaded
[   35.485760] FS-Cache: Netfs 'nfs' registered for caching
[   35.497079] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[   35.686894] svc: failed to register lockdv1 RPC service (errno 97).
[   36.015021] kvm: Nested Paging enabled
[   36.048120] powernow-k8: Found 1 AMD Phenom(tm) II X4 810 Processor processors (4 cpu cores) (version 2.20.00)
[   36.048156] powernow-k8:    0 : pstate 0 (2600 MHz)
[   36.048157] powernow-k8:    1 : pstate 1 (1900 MHz)
[   36.048159] powernow-k8:    2 : pstate 2 (1400 MHz)
[   36.048160] powernow-k8:    3 : pstate 3 (800 MHz)
[   46.962610] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[   46.966004] NFSD: starting 90-second grace period
[   47.073495] lo: Disabled Privacy Extensions
[   47.074913] process `upsagentd' is using obsolete setsockopt SO_BSDCOMPAT
[  214.276612] CE: hpet increasing min_delta_ns to 15000 nsec
[ 7089.781711] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 7089.781731] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[ 7089.781735]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 7089.781742] ata5.00: status: { DRDY }
[ 7089.781754] ata5: hard resetting link
[ 7090.264636] ata5: softreset failed (device not ready)
[ 7090.264646] ata5: applying SB600 PMP SRST workaround and retrying
[ 7090.429567] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 7090.441356] ata5.00: configured for UDMA/133
[ 7090.441384] ata5: EH complete
[ 7809.252040] CE: hpet increasing min_delta_ns to 22500 nsec
[ 8001.781161] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 8001.781181] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[ 8001.781184]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 8001.781192] ata5.00: status: { DRDY }
[ 8001.781204] ata5: hard resetting link
[ 8002.264696] ata5: softreset failed (device not ready)
[ 8002.264706] ata5: applying SB600 PMP SRST workaround and retrying
[ 8002.429687] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 8002.442233] ata5.00: configured for UDMA/133
[ 8002.442261] ata5: EH complete
[ 8009.781719] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 8009.781739] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[ 8009.781743]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 8009.781750] ata5.00: status: { DRDY }
[ 8009.781762] ata5: hard resetting link
[ 8010.265634] ata5: softreset failed (device not ready)
[ 8010.265644] ata5: applying SB600 PMP SRST workaround and retrying
[ 8010.429655] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 8010.441260] ata5.00: configured for UDMA/133
[ 8010.441285] ata5: EH complete
[ 8071.816076] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 8071.816096] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[ 8071.816100]          res 40/00:00:af:88:e0/00:00:e8:00:00/e0 Emask 0x4 (timeout)
[ 8071.816108] ata5.00: status: { DRDY }
[ 8071.816120] ata5: hard resetting link
[ 8072.301610] ata5: softreset failed (device not ready)
[ 8072.301620] ata5: applying SB600 PMP SRST workaround and retrying
[ 8072.465653] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 8072.477309] ata5.00: configured for UDMA/133
[ 8072.477340] ata5: EH complete
[ 8079.816153] ata5.00: NCQ disabled due to excessive errors
[ 8079.816166] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 8079.816184] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[ 8079.816187]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 8079.816194] ata5.00: status: { DRDY }
[ 8079.816206] ata5: hard resetting link
[ 8080.301526] ata5: softreset failed (device not ready)
[ 8080.301536] ata5: applying SB600 PMP SRST workaround and retrying
[ 8080.465648] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 8080.477386] ata5.00: configured for UDMA/133
[ 8080.477409] ata5: EH complete
[ 9121.781750] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 9121.781770] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[ 9121.781773]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 9121.781781] ata5.00: status: { DRDY }
[ 9121.781793] ata5: hard resetting link
[ 9122.265633] ata5: softreset failed (device not ready)
[ 9122.265643] ata5: applying SB600 PMP SRST workaround and retrying
[ 9122.429644] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 9122.442072] ata5.00: configured for UDMA/133
[ 9122.442097] ata5: EH complete
[ 9129.782278] ata5: limiting SATA link speed to 1.5 Gbps
[ 9129.782290] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 9129.782308] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[ 9129.782311]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 9129.782318] ata5.00: status: { DRDY }
[ 9129.782330] ata5: hard resetting link
[ 9130.321162] ata5: softreset failed (device not ready)
[ 9130.321172] ata5: applying SB600 PMP SRST workaround and retrying
[ 9130.485644] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 9130.497280] ata5.00: configured for UDMA/133
[ 9130.497304] ata5: EH complete
[10465.781851] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[10465.781871] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[10465.781874]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[10465.781882] ata5.00: status: { DRDY }
[10465.781894] ata5: hard resetting link
[10466.265656] ata5: softreset failed (device not ready)
[10466.265666] ata5: applying SB600 PMP SRST workaround and retrying
[10466.429585] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[10466.441756] ata5.00: configured for UDMA/133
[10466.441789] ata5: EH complete
[10941.793179] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[10941.793200] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[10941.793203]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[10941.793210] ata5.00: status: { DRDY }
[10941.793222] ata5: hard resetting link
[10942.276163] ata5: softreset failed (device not ready)
[10942.276174] ata5: applying SB600 PMP SRST workaround and retrying
[10942.441223] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[10942.452976] ata5.00: configured for UDMA/133
[10942.453008] ata5: EH complete
[10949.792075] ata5.00: limiting speed to UDMA/100:PIO4
[10949.792088] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[10949.792106] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[10949.792109]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[10949.792116] ata5.00: status: { DRDY }
[10949.792128] ata5: hard resetting link
[10950.277640] ata5: softreset failed (device not ready)
[10950.277650] ata5: applying SB600 PMP SRST workaround and retrying
[10950.441151] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[10950.453294] ata5.00: configured for UDMA/100
[10950.453326] ata5: EH complete
[12271.781785] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[12271.781806] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[12271.781809]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[12271.781816] ata5.00: status: { DRDY }
[12271.781828] ata5: hard resetting link
[12272.265692] ata5: softreset failed (device not ready)
[12272.265703] ata5: applying SB600 PMP SRST workaround and retrying
[12272.428666] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[12272.441979] ata5.00: configured for UDMA/100
[12272.442005] ata5: EH complete
[12279.781569] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[12279.781590] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[12279.781593]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[12279.781601] ata5.00: status: { DRDY }
[12279.781612] ata5: hard resetting link
[12280.265637] ata5: softreset failed (device not ready)
[12280.265648] ata5: applying SB600 PMP SRST workaround and retrying
[12280.428152] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[12280.439793] ata5.00: configured for UDMA/100
[12280.439818] ata5: EH complete
[13335.780697] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[13335.780718] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[13335.780721]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[13335.780729] ata5.00: status: { DRDY }
[13335.780741] ata5: hard resetting link
[13336.264164] ata5: softreset failed (device not ready)
[13336.264175] ata5: applying SB600 PMP SRST workaround and retrying
[13336.428082] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[13336.440215] ata5.00: configured for UDMA/100
[13336.440246] ata5: EH complete
[15688.781367] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[15688.781387] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[15688.781390]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[15688.781397] ata5.00: status: { DRDY }
[15688.781409] ata5: hard resetting link
[15689.264667] ata5: softreset failed (device not ready)
[15689.264677] ata5: applying SB600 PMP SRST workaround and retrying
[15689.429174] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[15689.440882] ata5.00: configured for UDMA/100
[15689.440910] ata5: EH complete
[15696.780404] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[15696.780424] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[15696.780428]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[15696.780435] ata5.00: status: { DRDY }
[15696.780447] ata5: hard resetting link
[15697.376681] ata5: softreset failed (device not ready)
[15697.376692] ata5: applying SB600 PMP SRST workaround and retrying
[15697.541562] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[15697.554171] ata5.00: configured for UDMA/100
[15697.554197] ata5: EH complete
[16694.781727] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[16694.781747] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[16694.781751]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[16694.781758] ata5.00: status: { DRDY }
[16694.781770] ata5: hard resetting link
[16695.265197] ata5: softreset failed (device not ready)
[16695.265208] ata5: applying SB600 PMP SRST workaround and retrying
[16695.428306] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[16695.440848] ata5.00: configured for UDMA/100
[16695.440874] ata5: EH complete
[16702.781522] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[16702.781529] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[16702.781530]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[16702.781533] ata5.00: status: { DRDY }
[16702.781537] ata5: hard resetting link
[16703.264389] ata5: softreset failed (device not ready)
[16703.264399] ata5: applying SB600 PMP SRST workaround and retrying
[16703.430387] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[16703.442092] ata5.00: configured for UDMA/100
[16703.442119] ata5: EH complete
[17317.818447] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[17317.818468] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[17317.818471]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[17317.818479] ata5.00: status: { DRDY }
[17317.818491] ata5: hard resetting link
[17318.301567] ata5: softreset failed (device not ready)
[17318.301578] ata5: applying SB600 PMP SRST workaround and retrying
[17318.465688] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[17318.477730] ata5.00: configured for UDMA/100
[17318.477768] ata5: EH complete
[17325.816252] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[17325.816273] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[17325.816276]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[17325.816283] ata5.00: status: { DRDY }
[17325.816295] ata5: hard resetting link
[17326.413099] ata5: softreset failed (device not ready)
[17326.413109] ata5: applying SB600 PMP SRST workaround and retrying
[17326.577169] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[17326.589045] ata5.00: configured for UDMA/100
[17326.589072] ata5: EH complete
[17669.793632] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[17669.793653] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[17669.793656]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[17669.793663] ata5.00: status: { DRDY }
[17669.793675] ata5: hard resetting link
[17670.557535] ata5: softreset failed (device not ready)
[17670.557545] ata5: applying SB600 PMP SRST workaround and retrying
[17670.721646] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[17670.734091] ata5.00: configured for UDMA/100
[17670.734122] ata5: EH complete
[17678.793718] ata5.00: limiting speed to UDMA/33:PIO4
[17678.793731] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[17678.793748] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[17678.793751]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[17678.793758] ata5.00: status: { DRDY }
[17678.793770] ata5: hard resetting link
[17679.277635] ata5: softreset failed (device not ready)
[17679.277645] ata5: applying SB600 PMP SRST workaround and retrying
[17679.441362] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[17679.453900] ata5.00: configured for UDMA/33
[17679.453927] ata5: EH complete
[17949.781871] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[17949.781891] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[17949.781895]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[17949.781902] ata5.00: status: { DRDY }
[17949.781914] ata5: hard resetting link
[17950.266694] ata5: softreset failed (device not ready)
[17950.266705] ata5: applying SB600 PMP SRST workaround and retrying
[17950.428175] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[17950.440744] ata5.00: configured for UDMA/33
[17950.440780] ata5: EH complete
[17957.781559] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[17957.781566] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[17957.781567]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[17957.781570] ata5.00: status: { DRDY }
[17957.781574] ata5: hard resetting link
[17958.321043] ata5: softreset failed (device not ready)
[17958.321053] ata5: applying SB600 PMP SRST workaround and retrying
[17958.485159] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[17958.496978] ata5.00: configured for UDMA/33
[17958.497006] ata5: EH complete
[24850.780672] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[24850.780693] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[24850.780696]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[24850.780703] ata5.00: status: { DRDY }
[24850.780715] ata5: hard resetting link
[24851.264685] ata5: softreset failed (device not ready)
[24851.264695] ata5: applying SB600 PMP SRST workaround and retrying
[24851.428553] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[24851.440605] ata5.00: configured for UDMA/33
[24851.440634] ata5: EH complete
[25814.804655] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[25814.804675] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[25814.804679]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[25814.804686] ata5.00: status: { DRDY }
[25814.804698] ata5: hard resetting link
[25815.289583] ata5: softreset failed (device not ready)
[25815.289594] ata5: applying SB600 PMP SRST workaround and retrying
[25815.452425] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[25815.464619] ata5.00: configured for UDMA/33
[25815.464647] ata5: EH complete
[25830.793066] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[25830.793087] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[25830.793091]          res 40/00:00:af:88:e0/00:00:e8:00:00/e0 Emask 0x4 (timeout)
[25830.793099] ata5.00: status: { DRDY }
[25830.793111] ata5: hard resetting link
[25831.277634] ata5: softreset failed (device not ready)
[25831.277645] ata5: applying SB600 PMP SRST workaround and retrying
[25831.440651] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[25831.452987] ata5.00: configured for UDMA/33
[25831.453014] ata5: EH complete
[25970.816235] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[25970.816255] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[25970.816258]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[25970.816266] ata5.00: status: { DRDY }
[25970.816277] ata5: hard resetting link
[25971.300058] ata5: softreset failed (device not ready)
[25971.300069] ata5: applying SB600 PMP SRST workaround and retrying
[25971.464192] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[25971.475957] ata5.00: configured for UDMA/33
[25971.475995] ata5: EH complete
[28350.781651] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[28350.781671] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[28350.781675]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[28350.781682] ata5.00: status: { DRDY }
[28350.781694] ata5: hard resetting link
[28351.265634] ata5: softreset failed (device not ready)
[28351.265645] ata5: applying SB600 PMP SRST workaround and retrying
[28351.429718] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[28351.441421] ata5.00: configured for UDMA/33
[28351.441445] ata5: EH complete
[28964.793120] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[28964.793140] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[28964.793144]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[28964.793151] ata5.00: status: { DRDY }
[28964.793163] ata5: hard resetting link
[28965.276648] ata5: softreset failed (device not ready)
[28965.276659] ata5: applying SB600 PMP SRST workaround and retrying
[28965.440495] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[28965.452518] ata5.00: configured for UDMA/33
[28965.452543] ata5: EH complete
[28980.816343] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[28980.816364] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[28980.816368]          res 40/00:00:af:88:e0/00:00:e8:00:00/e0 Emask 0x4 (timeout)
[28980.816375] ata5.00: status: { DRDY }
[28980.816387] ata5: hard resetting link
[28981.300540] ata5: softreset failed (device not ready)
[28981.300551] ata5: applying SB600 PMP SRST workaround and retrying
[28981.465697] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[28981.477590] ata5.00: configured for UDMA/33
[28981.477622] ata5: EH complete
[29524.781659] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[29524.781680] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[29524.781683]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[29524.781691] ata5.00: status: { DRDY }
[29524.781703] ata5: hard resetting link
[29525.265541] ata5: softreset failed (device not ready)
[29525.265551] ata5: applying SB600 PMP SRST workaround and retrying
[29525.429661] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[29525.441658] ata5.00: configured for UDMA/33
[29525.441681] ata5: EH complete
[29540.781986] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[29540.782006] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[29540.782009]          res 40/00:00:af:88:e0/00:00:e8:00:00/e0 Emask 0x4 (timeout)
[29540.782016] ata5.00: status: { DRDY }
[29540.782028] ata5: hard resetting link
[29541.265537] ata5: softreset failed (device not ready)
[29541.265547] ata5: applying SB600 PMP SRST workaround and retrying
[29541.429586] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[29541.441953] ata5.00: configured for UDMA/33
[29541.441981] ata5: EH complete
[29594.781673] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[29594.781701] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[29594.781704]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[29594.781712] ata5.00: status: { DRDY }
[29594.781724] ata5: hard resetting link
[29595.264844] ata5: softreset failed (device not ready)
[29595.264854] ata5: applying SB600 PMP SRST workaround and retrying
[29595.429673] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[29595.441870] ata5.00: configured for UDMA/33
[29595.441896] ata5: EH complete
[29610.780196] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[29610.780216] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[29610.780219]          res 40/00:00:af:88:e0/00:00:e8:00:00/e0 Emask 0x4 (timeout)
[29610.780227] ata5.00: status: { DRDY }
[29610.780322] ata5: hard resetting link
[29611.265639] ata5: softreset failed (device not ready)
[29611.265649] ata5: applying SB600 PMP SRST workaround and retrying
[29611.428973] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[29611.441089] ata5.00: configured for UDMA/33
[29611.441116] ata5: EH complete
[30224.781660] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[30224.781680] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[30224.781684]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[30224.781699] ata5.00: status: { DRDY }
[30224.781711] ata5: hard resetting link
[30225.264347] ata5: softreset failed (device not ready)
[30225.264358] ata5: applying SB600 PMP SRST workaround and retrying
[30225.429672] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[30225.442540] ata5.00: configured for UDMA/33
[30225.442565] ata5: EH complete
[30240.781657] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[30240.781677] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[30240.781681]          res 40/00:00:af:88:e0/00:00:e8:00:00/e0 Emask 0x4 (timeout)
[30240.781688] ata5.00: status: { DRDY }
[30240.781724] ata5: hard resetting link
[30241.265197] ata5: softreset failed (device not ready)
[30241.265207] ata5: applying SB600 PMP SRST workaround and retrying
[30241.429542] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[30241.441297] ata5.00: configured for UDMA/33
[30241.441325] ata5: EH complete
[30310.781167] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[30310.781188] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[30310.781191]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[30310.781198] ata5.00: status: { DRDY }
[30310.781210] ata5: hard resetting link
[30311.264860] ata5: softreset failed (device not ready)
[30311.264871] ata5: applying SB600 PMP SRST workaround and retrying
[30311.429161] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[30311.441444] ata5.00: configured for UDMA/33
[30311.441471] ata5: EH complete
[32410.780220] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[32410.780240] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[32410.780243]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[32410.780251] ata5.00: status: { DRDY }
[32410.780262] ata5: hard resetting link
[32411.264544] ata5: softreset failed (device not ready)
[32411.264554] ata5: applying SB600 PMP SRST workaround and retrying
[32411.428072] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[32411.440112] ata5.00: configured for UDMA/33
[32411.440148] ata5: EH complete
[32452.781180] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[32452.781199] ata5.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0
[32452.781202]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[32452.781209] ata5.00: status: { DRDY }
[32452.781221] ata5: hard resetting link
[32453.264154] ata5: softreset failed (device not ready)
[32453.264159] ata5: applying SB600 PMP SRST workaround and retrying
[32453.429666] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[32453.441734] ata5.00: configured for UDMA/33
[32453.441762] ata5: EH complete
[32464.106741] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[32464.106751] ata5.00: irq_stat 0x40000001
[32464.106769] ata5.00: cmd 25/00:08:00:88:e0/00:00:e8:00:00/e0 tag 0 dma 4096 in
[32464.106772]          res 41/04:00:00:88:e0/00:00:e8:00:00/e0 Emask 0x1 (device error)
[32464.106780] ata5.00: status: { DRDY ERR }
[32464.106785] ata5.00: error: { ABRT }
[32464.118623] ata5.00: configured for UDMA/33
[32464.118646] ata5: EH complete
[32473.430229] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[32473.430238] ata5.00: irq_stat 0x40000001
[32473.430257] ata5.00: cmd 25/00:08:00:88:e0/00:00:e8:00:00/e0 tag 0 dma 4096 in
[32473.430261]          res 41/04:00:00:88:e0/00:00:e8:00:00/e0 Emask 0x1 (device error)
[32473.430268] ata5.00: status: { DRDY ERR }
[32473.430273] ata5.00: error: { ABRT }
[32473.442098] ata5.00: configured for UDMA/33
[32473.442122] ata5: EH complete
[32482.753530] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[32482.753533] ata5.00: irq_stat 0x40000001
[32482.753539] ata5.00: cmd 25/00:08:00:88:e0/00:00:e8:00:00/e0 tag 0 dma 4096 in
[32482.753540]          res 41/04:00:00:88:e0/00:00:e8:00:00/e0 Emask 0x1 (device error)
[32482.753543] ata5.00: status: { DRDY ERR }
[32482.753544] ata5.00: error: { ABRT }
[32482.765387] ata5.00: configured for UDMA/33
[32482.765399] ata5: EH complete
[32492.076167] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[32492.076176] ata5.00: irq_stat 0x40000001
[32492.076195] ata5.00: cmd 25/00:08:00:88:e0/00:00:e8:00:00/e0 tag 0 dma 4096 in
[32492.076198]          res 41/04:00:00:88:e0/00:00:e8:00:00/e0 Emask 0x1 (device error)
[32492.076205] ata5.00: status: { DRDY ERR }
[32492.076210] ata5.00: error: { ABRT }
[32492.088032] ata5.00: configured for UDMA/33
[32492.088050] ata5: EH complete
[32501.394658] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[32501.394667] ata5.00: irq_stat 0x40000001
[32501.394685] ata5.00: cmd 25/00:08:00:88:e0/00:00:e8:00:00/e0 tag 0 dma 4096 in
[32501.394689]          res 41/04:00:00:88:e0/00:00:e8:00:00/e0 Emask 0x1 (device error)
[32501.394696] ata5.00: status: { DRDY ERR }
[32501.394701] ata5.00: error: { ABRT }
[32501.406494] ata5.00: configured for UDMA/33
[32501.406518] ata5: EH complete
[32510.718100] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[32510.718110] ata5.00: irq_stat 0x40000001
[32510.718128] ata5.00: cmd 25/00:08:00:88:e0/00:00:e8:00:00/e0 tag 0 dma 4096 in
[32510.718131]          res 41/04:00:00:88:e0/00:00:e8:00:00/e0 Emask 0x1 (device error)
[32510.718138] ata5.00: status: { DRDY ERR }
[32510.718143] ata5.00: error: { ABRT }
[32510.730017] ata5.00: configured for UDMA/33
[32510.730042] sd 5:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[32510.730050] sd 5:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor]
[32510.730059] Descriptor sense data with sense descriptors (in hex):
[32510.730064]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
[32510.730082]         e8 e0 88 00 
[32510.730090] sd 5:0:0:0: [sdc] Add. Sense: No additional sense information
[32510.730098] end_request: I/O error, dev sdc, sector 3907028992
[32510.730106] Buffer I/O error on device sdc, logical block 488378624
[32510.730142] ata5: EH complete
[32526.780076] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[32526.780097] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[32526.780100]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[32526.780107] ata5.00: status: { DRDY }
[32526.780119] ata5: hard resetting link
[32536.785177] ata5: softreset failed (device not ready)
[32536.785189] ata5: hard resetting link
[32546.789238] ata5: softreset failed (device not ready)
[32546.789249] ata5: hard resetting link
[32557.360064] ata5: link is slow to respond, please be patient (ready=0)
[32573.836192] ata5: softreset failed (device not ready)
[32573.836202] ata5: applying SB600 PMP SRST workaround and retrying
[32581.792026] ata5: softreset failed (device not ready)
[32581.792039] ata5: hard resetting link
[32587.000775] ata5: softreset failed (device not ready)
[32587.000784] ata5: reset failed, giving up
[32587.000790] ata5.00: disabled
[32587.000822] ata5: EH complete
[32587.000847] sd 5:0:0:0: [sdc] Unhandled error code
[32587.000852] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[32587.000860] end_request: I/O error, dev sdc, sector 3907028904
[32587.000868] Buffer I/O error on device sdc, logical block 488378613
[32587.000958] sd 5:0:0:0: [sdc] Unhandled error code
[32587.000967] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[32587.000987] end_request: I/O error, dev sdc, sector 3907028904
[32587.000995] Buffer I/O error on device sdc, logical block 488378613
[32587.001089] sd 5:0:0:0: [sdc] Unhandled error code
[32587.001093] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[32587.001100] end_request: I/O error, dev sdc, sector 3907029104
[32587.001106] Buffer I/O error on device sdc, logical block 488378638
[32587.001131] sd 5:0:0:0: [sdc] Unhandled error code
[32587.001136] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[32587.001142] end_request: I/O error, dev sdc, sector 3907029104
[32587.001147] Buffer I/O error on device sdc, logical block 488378638
[32587.001455] sd 5:0:0:0: [sdc] Unhandled error code
[32587.001460] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[32587.001467] end_request: I/O error, dev sdc, sector 512
[32587.001472] Buffer I/O error on device sdc, logical block 64
[34244.232569] sd 5:0:0:0: [sdc] Unhandled error code
[34244.232579] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[34244.232589] end_request: I/O error, dev sdc, sector 3907028904
[34244.232597] Buffer I/O error on device sdc, logical block 488378613
[34244.232645] sd 5:0:0:0: [sdc] Unhandled error code
[34244.232650] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[34244.232657] end_request: I/O error, dev sdc, sector 3907029104
[34244.232662] Buffer I/O error on device sdc, logical block 488378638
[34244.232954] sd 5:0:0:0: [sdc] Unhandled error code
[34244.232959] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[34244.232966] end_request: I/O error, dev sdc, sector 512
[34244.232972] Buffer I/O error on device sdc, logical block 64
[36044.818014] sd 5:0:0:0: [sdc] Unhandled error code
[36044.818023] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[36044.818033] end_request: I/O error, dev sdc, sector 3907028904
[36044.818040] Buffer I/O error on device sdc, logical block 488378613
[36044.818090] sd 5:0:0:0: [sdc] Unhandled error code
[36044.818095] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[36044.818102] end_request: I/O error, dev sdc, sector 3907029104
[36044.818108] Buffer I/O error on device sdc, logical block 488378638
[36044.818423] sd 5:0:0:0: [sdc] Unhandled error code
[36044.818428] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[36044.818435] end_request: I/O error, dev sdc, sector 512
[36044.818441] Buffer I/O error on device sdc, logical block 64
[37844.396933] sd 5:0:0:0: [sdc] Unhandled error code
[37844.396943] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[37844.396952] end_request: I/O error, dev sdc, sector 3907028904
[37844.396960] Buffer I/O error on device sdc, logical block 488378613
[37844.397052] sd 5:0:0:0: [sdc] Unhandled error code
[37844.397058] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[37844.397065] end_request: I/O error, dev sdc, sector 3907029104
[37844.397071] Buffer I/O error on device sdc, logical block 488378638
[37844.397398] sd 5:0:0:0: [sdc] Unhandled error code
[37844.397406] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[37844.397417] end_request: I/O error, dev sdc, sector 512
[37844.397425] Buffer I/O error on device sdc, logical block 64
[39644.622416] sd 5:0:0:0: [sdc] Unhandled error code
[39644.622420] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[39644.622423] end_request: I/O error, dev sdc, sector 3907028904
[39644.622426] Buffer I/O error on device sdc, logical block 488378613
[39644.622442] sd 5:0:0:0: [sdc] Unhandled error code
[39644.622443] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[39644.622445] end_request: I/O error, dev sdc, sector 3907029104
[39644.622447] Buffer I/O error on device sdc, logical block 488378638
[39644.622546] sd 5:0:0:0: [sdc] Unhandled error code
[39644.622547] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[39644.622549] end_request: I/O error, dev sdc, sector 512
[39644.622551] Buffer I/O error on device sdc, logical block 64
[41444.250375] sd 5:0:0:0: [sdc] Unhandled error code
[41444.250385] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[41444.250394] end_request: I/O error, dev sdc, sector 3907028904
[41444.250402] Buffer I/O error on device sdc, logical block 488378613
[41444.250451] sd 5:0:0:0: [sdc] Unhandled error code
[41444.250456] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[41444.250463] end_request: I/O error, dev sdc, sector 3907029104
[41444.250469] Buffer I/O error on device sdc, logical block 488378638
[41444.250792] sd 5:0:0:0: [sdc] Unhandled error code
[41444.250797] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[41444.250804] end_request: I/O error, dev sdc, sector 512
[41444.250809] Buffer I/O error on device sdc, logical block 64
[43244.870091] sd 5:0:0:0: [sdc] Unhandled error code
[43244.870100] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[43244.870110] end_request: I/O error, dev sdc, sector 3907028904
[43244.870118] Buffer I/O error on device sdc, logical block 488378613
[43244.870166] sd 5:0:0:0: [sdc] Unhandled error code
[43244.870171] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[43244.870178] end_request: I/O error, dev sdc, sector 3907029104
[43244.870184] Buffer I/O error on device sdc, logical block 488378638
[43244.870503] sd 5:0:0:0: [sdc] Unhandled error code
[43244.870508] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[43244.870515] end_request: I/O error, dev sdc, sector 512
[43244.870520] Buffer I/O error on device sdc, logical block 64
[45044.487645] sd 5:0:0:0: [sdc] Unhandled error code
[45044.487654] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[45044.487664] end_request: I/O error, dev sdc, sector 3907028904
[45044.487672] Buffer I/O error on device sdc, logical block 488378613
[45044.487720] sd 5:0:0:0: [sdc] Unhandled error code
[45044.487725] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[45044.487732] end_request: I/O error, dev sdc, sector 3907029104
[45044.487737] Buffer I/O error on device sdc, logical block 488378638
[45044.488020] sd 5:0:0:0: [sdc] Unhandled error code
[45044.488025] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[45044.488032] end_request: I/O error, dev sdc, sector 512
[45044.488037] Buffer I/O error on device sdc, logical block 64

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-14 21:13                                         ` Thomas Fjellstrom
@ 2009-09-14 22:23                                           ` Tejun Heo
  0 siblings, 0 replies; 84+ messages in thread
From: Tejun Heo @ 2009-09-14 22:23 UTC (permalink / raw)
  To: tfjellstrom
  Cc: linux-kernel, Chris Webb, linux-scsi, Ric Wheeler, Andrei Tanas,
	NeilBrown, IDE/ATA development list, Jeff Garzik, Mark Lord

Thomas Fjellstrom wrote:
> Sure, I've attached the full dmesg from a full test I ran today (I couldn't 
> find the old log where that bit came from). I'm running 2.6.31-rc9 right now, 
> and will probably update to the final 31 release soonish. The test I ran 
> actually finished (dd if=/dev/sdc of=/dev/null bs=8M), whereas with earlier 
> kernels it was completely failing. Of course, I was actually trying to bring 
> up the md raid0 array (2x2TB), mount the filesystem, and copy the files off 
> before. mdraid is probably more sensitive to the end_request errors than dd 
> is.

[    2.056357] ata5: softreset failed (device not ready)
[    2.056412] ata5: applying SB600 PMP SRST workaround and retrying

The above two are expected.  It's a bug in the SB600 controller that is
being worked around.

[    2.220160] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    2.269157] ata5.00: ATA-8: WDC WD20EADS-00R6B0, 01.00A01, max UDMA/133
[    2.269214] ata5.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[    2.275112] ata5.00: configured for UDMA/133

All seems well.

[ 7089.781711] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 7089.781731] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[ 7089.781735]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

This is SMART ENABLE OPERATIONS (command 0xb0, feature 0xd8), and it gets
retried many times with the same result.

[32410.780251] ata5.00: status: { DRDY }
[32410.780262] ata5: hard resetting link
[32411.264544] ata5: softreset failed (device not ready)
[32411.264554] ata5: applying SB600 PMP SRST workaround and retrying
[32411.428072] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[32411.440112] ata5.00: configured for UDMA/33
[32411.440148] ata5: EH complete
[32452.781180] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[32452.781199] ata5.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0
[32452.781202]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

Then, one SMART RETURN STATUS gets timed out.

[32464.106741] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[32464.106751] ata5.00: irq_stat 0x40000001
[32464.106769] ata5.00: cmd 25/00:08:00:88:e0/00:00:e8:00:00/e0 tag 0 dma 4096 in
[32464.106772]          res 41/04:00:00:88:e0/00:00:e8:00:00/e0 Emask 0x1 (device error)

Then, the device fails READ DMA EXT.

[32510.730059] Descriptor sense data with sense descriptors (in hex):
[32510.730064]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[32510.730082]         e8 e0 88 00
[32510.730090] sd 5:0:0:0: [sdc] Add. Sense: No additional sense information
[32510.730098] end_request: I/O error, dev sdc, sector 3907028992
[32510.730106] Buffer I/O error on device sdc, logical block 488378624

After several retries, libata gives up and sd does too.

[32510.730142] ata5: EH complete
[32526.780076] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[32526.780097] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[32526.780100]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[32526.780107] ata5.00: status: { DRDY }
[32526.780119] ata5: hard resetting link
[32536.785177] ata5: softreset failed (device not ready)
[32536.785189] ata5: hard resetting link
[32546.789238] ata5: softreset failed (device not ready)
[32546.789249] ata5: hard resetting link
[32557.360064] ata5: link is slow to respond, please be patient (ready=0)
[32573.836192] ata5: softreset failed (device not ready)
[32573.836202] ata5: applying SB600 PMP SRST workaround and retrying
[32581.792026] ata5: softreset failed (device not ready)
[32581.792039] ata5: hard resetting link
[32587.000775] ata5: softreset failed (device not ready)
[32587.000784] ata5: reset failed, giving up
[32587.000790] ata5.00: disabled
[32587.000822] ata5: EH complete

Then, SMART ENABLE is issued again, which now pushes the drive over the
edge, and it never comes back.

Does disabling whatever is issuing those SMART commands make any
difference?
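
For reference, here is a minimal sketch (not smartctl's actual code, only an
illustration of the SG_IO ATA pass-through route such tools typically use) of
how a userland monitor can issue SMART ENABLE OPERATIONS straight at the
drive; the 0xb0 command, 0xd8 feature and 0x4f/0xc2 signature match the
taskfile in the log above.  It needs root and, obviously, the right device
node.

/* sketch: issue SMART ENABLE OPERATIONS via SG_IO ATA PASS-THROUGH (12).
 * Build with gcc, run as root, e.g. ./smart_enable /dev/sdc */
#include <fcntl.h>
#include <scsi/sg.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	unsigned char cdb[12] = {
		0xa1,			/* ATA PASS-THROUGH (12) */
		3 << 1,			/* protocol 3: non-data */
		0x20,			/* ck_cond: return ATA registers in sense */
		0xd8,			/* features: SMART ENABLE OPERATIONS */
		0x00,			/* sector count */
		0x00, 0x4f, 0xc2,	/* LBA low/mid/high: SMART signature */
		0x00,			/* device */
		0xb0,			/* command: SMART */
		0x00, 0x00
	};
	unsigned char sense[32];
	struct sg_io_hdr io;
	int fd;

	if (argc < 2 || (fd = open(argv[1], O_RDWR | O_NONBLOCK)) < 0) {
		perror("open");
		return 1;
	}

	memset(&io, 0, sizeof(io));
	io.interface_id = 'S';
	io.dxfer_direction = SG_DXFER_NONE;	/* no data phase */
	io.cmdp = cdb;
	io.cmd_len = sizeof(cdb);
	io.sbp = sense;
	io.mx_sb_len = sizeof(sense);
	io.timeout = 10000;			/* ms */

	if (ioctl(fd, SG_IO, &io) < 0)
		perror("SG_IO");
	else
		printf("scsi status 0x%x, host 0x%x, driver 0x%x\n",
		       io.status, io.host_status, io.driver_status);
	close(fd);
	return 0;
}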

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-08-31 12:20                               ` Tejun Heo
  2009-09-07 11:44                                 ` Chris Webb
@ 2009-09-16 22:28                                 ` Chris Webb
  2009-09-16 23:47                                   ` Tejun Heo
  1 sibling, 1 reply; 84+ messages in thread
From: Chris Webb @ 2009-09-16 22:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ric Wheeler, Andrei Tanas, NeilBrown, linux-kernel,
	IDE/ATA development list, linux-scsi, Jeff Garzik, Mark Lord

Hi Tejun. Thanks for following up on this. We've done some more experimentation
over the last couple of days based on your suggestions and thoughts.

Tejun Heo <tj@kernel.org> writes:
> Seriously, it's most likely a hardware malfunction although I can't tell
> where the problem is with the given data.  Get the hardware fixed.

We know this isn't caused by a single faulty piece of hardware, because we have
a cluster of identical machines and all have shown this behaviour. This doesn't
mean that there isn't a hardware problem, but if there is one, it's a design
problem or firmware bug affecting all of our hosts.

There have also been a few reports in this thread of problems which look very
similar, from people with somewhat different hardware and drives to ours.

> The aboves are IDENTIFY.  Who's issuing IDENTIFY regularly?  It isn't
> from the regular IO paths or md.  It's probably being issued via SG_IO
> from userland.  These failures don't affect normal operation.
[...]
> Oooh, another possibility is the above continuous IDENTIFY tries.
> Doing things like that generally isn't a good idea because vendors
> don't expect IDENTIFY to be mixed regularly with normal IOs and
> firmwares aren't tested against that.  Even smart commands sometimes
> cause problems.  So, finding out the thing which is obsessed with the
> identity of the drive and stopping it might help.

We tracked this down to some (excessively frequent!) monitoring we were doing
using smartctl. Things were improved considerably by stopping smartd and
disabling all callers of smartctl, although it doesn't appear to have been a
cure. The frequency of these timeouts during resync seems to have gone from
about once every two hours to about once a day, which means we've been able to
complete some resyncs whereas we were unable to before.

What we still see are (fewer) 'frozen' exceptions leading to a drive reset and
an 'end_request: I/O error', such as [1]. The drive is then promptly kicked out
of the raid array.

Some of these timeouts also leave us with a completely dead drive, and we need
to reboot the machine before it can be accessed again. (Hot plugging it out and
back in again isn't sufficient to bring it back to life, so maybe a controller
problem, although other drives on the same controller stay alive?) An example
is [2].

There are two more symptoms we are seeing on the same machines which may be
connected, or may be separate bugs in their own right:

  - 'cat /proc/mdstat' sometimes hangs before returning during normal
    operation, although most of the time it is fine. We have seen hangs of
    up to 15-20 seconds during resync. Might this be a less severe example
    of the lock-up which causes a timeout and reset after 30 seconds?

  - We've also had a few occasions of O_SYNC writes to raid arrays (from
    qemu-kvm via LVM2) completely deadlocking against resync writes when the
    maximum md resync speed is set sufficiently high, even where the minimum
    md resync speed is set to zero (although this certainly helps). However,
    I suspect this is an unrelated issue as I've seen this on other hardware
    running other kernel configs.

For reference, we're using the ahci driver and deadline IO scheduler with the
default tuning parameters, our motherboards are SuperMicro X7DBN (Intel ESB2
SATA 3.0Gbps Controller) and we have six 750GB Seagate ST3750523AS drives
attached to each motherboard. Also, since first reporting this, I've managed
to reproduce the problem whilst running Linux 2.6.29.6, 2.6.30.5 and the
newly released 2.6.31.

What do you think our next steps in tracking this one down should be? My
only ideas are:

  - We could experiment with NCQ settings. I've already briefly changed
    /sys/block/sd*/device/queue_depth down from 31 to 1. It didn't seem to stop
    the delays in getting info back from /proc/mdstat, so I put it back up again,
    fearing that the performance hit would make the problem worse, but perhaps I
    should leave it at 1 for a more extended period to verify that we still get
    timeouts long enough to leave empty slots in the arrays without it?

  - We could try replacing the drives that are currently kicked out of one of the
    arrays with drives from another manufacturer to see if the drive model
    is implicated. Is the drive or the controller a more likely problem?

Any advice would be very gratefully received.

Cheers,

Chris.

[1] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
            res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
    ata5.00: status: { DRDY }
    ata5: hard resetting link
    ata5: softreset failed (device not ready)
    ata5: hard resetting link
    ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    ata5.00: configured for UDMA/133
    ata5: EH complete
    end_request: I/O error, dev sde, sector 1465147264
    md: super_written gets error=-5, uptodate=0
    raid10: Disk failure on sde3, disabling device.
    raid10: Operation continuing on 4 devices.

[2] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
             res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
    ata1.00: status: { DRDY }
    ata1: hard resetting link
    ata1: softreset failed (device not ready)
    ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    ata1.00: qc timeout (cmd 0xec)
    ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
    ata1.00: revalidation failed (errno=-5)
    ata1: hard resetting link
    ata1: softreset failed (device not ready)
    ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    ata1.00: qc timeout (cmd 0xec)
    ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
    ata1.00: revalidation failed (errno=-5)
    ata1: limiting SATA link speed to 1.5 Gbps
    ata1: hard resetting link
    ata1: softreset failed (device not ready)
    ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    ata1.00: qc timeout (cmd 0xec)
    ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
    ata1.00: revalidation failed (errno=-5)
    ata1.00: disabled
    ata1: hard resetting link
    ata1: softreset failed (device not ready)
    ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
    ata1: EH complete
    sd 0:0:0:0: [sda] Unhandled error code
    sd 0:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
    end_request: I/O error, dev sda, sector 1465147272
    end_request: I/O error, dev sda, sector 1465147272
    md: super_written gets error=-5, uptodate=0
    raid10: Disk failure on sda3, disabling device.
    raid10: Operation continuing on 4 devices.
    sd 0:0:0:0: [sda] Unhandled error code
    sd 0:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
    end_request: I/O error, dev sda, sector 8396584
    end_request: I/O error, dev sda, sector 8396584
    md: super_written gets error=-5, uptodate=0
    raid1: Disk failure on sda1, disabling device.
    raid1: Operation continuing on 5 devices.
    sd 0:0:0:0: [sda] Unhandled error code
    sd 0:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
    end_request: I/O error, dev sda, sector 32
    raid1: sda1: rescheduling sector 0
    sd 0:0:0:0: [sda] Unhandled error code
    sd 0:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
    end_request: I/O error, dev sda, sector 8396800
    raid10: sda2: rescheduling sector 0
    sd 0:0:0:0: [sda] Unhandled error code
    sd 0:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
    end_request: I/O error, dev sda, sector 8396800
    sd 0:0:0:0: [sda] Unhandled error code
    sd 0:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
    end_request: I/O error, dev sda, sector 8396800
    raid10: Disk failure on sda2, disabling device.
    raid10: Operation continuing on 5 devices.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-14 14:25                                               ` Mark Lord
@ 2009-09-16 23:19                                                 ` Chris Webb
  2009-09-17 13:29                                                   ` Mark Lord
  0 siblings, 1 reply; 84+ messages in thread
From: Chris Webb @ 2009-09-16 23:19 UTC (permalink / raw)
  To: Mark Lord
  Cc: Tejun Heo, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

Mark Lord <liml@rtr.ca> writes:

> I suspect we're missing some info from this specific failure.
> Looking back at Chris's earlier posting, the whole thing started
> with a FLUSH_CACHE_EXT failure.  Once that happens, all bets are
> off on anything that follows.
> 
> >Everything will be running fine when suddenly:
> >
> >  ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> >  ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> >          res 40/00:00:80:17:91/00:00:37:00:00/40 Emask 0x4 (timeout)
> >  ata1.00: status: { DRDY }
> >  ata1: hard resetting link
> >  ata1: softreset failed (device not ready)
> >  ata1: hard resetting link
> >  ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> >  ata1.00: configured for UDMA/133
> >  ata1: EH complete
> >  end_request: I/O error, dev sda, sector 1465147272
> >  md: super_written gets error=-5, uptodate=0
> >  raid10: Disk failure on sda3, disabling device.
> >  raid10: Operation continuing on 5 devices.

Hi Mark. Yes, when the first timeout after a clean boot happens, it's with
an 0xea flush command every time:

  [...]
  ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  ata5.00: ATA-8: ST3750523AS, CC34, max UDMA/133
  ata5.00: 1465149168 sectors, multi 0: LBA48 NCQ (depth 31/32)
  ata5.00: configured for UDMA/133
  scsi 4:0:0:0: Direct-Access     ATA      ST3750523AS      CC34 PQ: 0 ANSI: 5
  sd 4:0:0:0: [sde] 1465149168 512-byte hardware sectors: (750 GB/698 GiB)
  sd 4:0:0:0: [sde] Write Protect is off
  sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00
  sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  sd 4:0:0:0: [sde] 1465149168 512-byte hardware sectors: (750 GB/698 GiB)
  sd 4:0:0:0: [sde] Write Protect is off
  sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00
  sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
   sde: sde1 sde2 sde3
  sd 4:0:0:0: [sde] Attached SCSI disk
  sd 4:0:0:0: Attached scsi generic sg4 type 0

  [later]
  ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
  ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
           res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
  ata5.00: status: { DRDY }
  ata5: hard resetting link
  ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  ata5.00: configured for UDMA/133
  ata5: EH complete
  sd 4:0:0:0: [sde] 1465149168 512-byte hardware sectors: (750 GB/698 GiB)
  sd 4:0:0:0: [sde] Write Protect is off
  sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00
  sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  end_request: I/O error, dev sde, sector 1465147264
  md: super_written gets error=-5, uptodate=0
  raid10: Disk failure on sde3, disabling device.
  raid10: Operation continuing on 4 devices.

Best wishes,

Chris.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-16 22:28                                 ` Chris Webb
@ 2009-09-16 23:47                                   ` Tejun Heo
  2009-09-17  0:34                                     ` Neil Brown
                                                       ` (2 more replies)
  0 siblings, 3 replies; 84+ messages in thread
From: Tejun Heo @ 2009-09-16 23:47 UTC (permalink / raw)
  To: Chris Webb
  Cc: Ric Wheeler, Andrei Tanas, NeilBrown, linux-kernel,
	IDE/ATA development list, linux-scsi, Jeff Garzik, Mark Lord

Hello,

Chris Webb wrote:
> Hi Tejun. Thanks for following up to this. We've done some more
> experimentation over the last couple of days based on your
> suggestions and thoughts.
> 
> Tejun Heo <tj@kernel.org> writes:
>> Seriously, it's most likely a hardware malfunction although I can't tell
>> where the problem is with the given data.  Get the hardware fixed.
> 
> We know this isn't caused by a single faulty piece of hardware,
> because we have a cluster of identical machines and all have shown
> this behaviour. This doesn't mean that there isn't a hardware
> problem, but if there is one, it's a design problem or firmware bug
> affecting all of our hosts.

If it's multiple machines, it's much less likely to be faulty drives,
but if the machines are configured mostly identically, hardware
problems can't be ruled out either.

> There have also been a few reports of problems which look very
> similar in this thread from people with somewhat different hardware
> and drives to ours.

I wouldn't connect the reported cases too eagerly at this point.  Too
many different causes end up showing similar symptoms especially with
timeouts.

>> The aboves are IDENTIFY.  Who's issuing IDENTIFY regularly?  It isn't
>> from the regular IO paths or md.  It's probably being issued via SG_IO
>> from userland.  These failures don't affect normal operation.
> [...]
>> Oooh, another possibility is the above continuous IDENTIFY tries.
>> Doing things like that generally isn't a good idea because vendors
>> don't expect IDENTIFY to be mixed regularly with normal IOs and
>> firmwares aren't tested against that.  Even smart commands sometimes
>> cause problems.  So, finding out the thing which is obsessed with the
>> identity of the drive and stopping it might help.
> 
> We tracked this down to some (excessively frequent!) monitoring we
> were doing using smartctl. Things were improved considerably by
> stopping smartd and disabling all callers of smartctl, although it
> doesn't appear to have been a cure. The frequency of these timeouts
> during resync seems to have gone from about once every two hours to
> about once a day, which means we've been able to complete some
> resyncs whereas we were unable to before.

That's interesting.  One important side effect of issuing IDENTIFYs is
that they serialize the command stream, as they are not NCQ commands,
and thus could change command patterns significantly.

> What we still see are (fewer) 'frozen' exceptions leading to a drive
> reset and an 'end_request: I/O error', such as [1]. The drive is
> then promptly kicked out of the raid array.

That's a flush timeout and md is right to kick the drive out.

> Some of these timeouts also leave us with a completely dead drive,
> and we need to reboot the machine before it can be accessed
> again. (Hot plugging it out and back in again isn't sufficient to
> bring it back to life, so maybe a controller problem, although other
> drives on the same controller stay alive?) An example is [2].

Ports behave mostly independently and it sure is possible that one
port locks up while others operate fine.  I've never seen such
incidents reported for intel ahci's tho.  If you hot unplug and then
replug the drive, what does the kernel say?

> There are two more symptoms we are seeing on the same which may be
> connected, or may be separate bugs in their own right:
> 
>   - 'cat /proc/mdstat' sometimes hangs before returning during normal
>     operation, although most of the time it is fine. We have seen hangs of
>     up to 15-20 seconds during resync. Might this be a less severe example
>     of the lock-up which causes a timeout and reset after 30 seconds?
> 
>   - We've also had a few occasions of O_SYNC writes to raid arrays (from
>     qemu-kvm via LVM2) completely deadlocking against resync writes when the
>     maximum md resync speed is set sufficiently high, even where the minimum
>     md resync speed is set to zero (although this certainly helps). However,
>     I suspect this is an unrelated issue as I've seen this on other hardware
>     running other kernel configs.

I think these two will be best answered by Neil Brown.  Neil?

> For reference, we're using the ahci driver and deadline IO scheduler with the
> default tuning parameters, our motherboards are SuperMicro X7DBN (Intel ESB2
> SATA 3.0Gbps Controller) and we have six 750GB Seagate ST3750523AS drives
> attached to each motherboard. Also, since first reporting this, I've managed
> to reproduce the problem whilst running Linux 2.6.29.6, 2.6.30.5 and the
> newly released 2.6.31.
> 
> What do you think are our next steps in tracking this one down should be? My
> only ideas are:
> 
>   - We could experiment with NCQ settings. I've already briefly
>     changed /sys/block/sd*/device/queue_depth down from 31 to 1. It
>     didn't seem to stop delays in getting back info from
>     /proc/mdstat, so put it back up again fearing that the
>     performance hit would make the problem worse, but perhaps I
>     should leave it off for a more extended period to verify that we
>     still get timeouts long enough to leave slots without it?
> 
>   - We could try replacing the drives that are currently kicked out
>     of one of the arrays with drives from another manufacturer to
>     see if the drive model is implicated. Is the drive or the
>     controller a more likely problem?

The most common cause of FLUSH timeouts has been power related issues.
This problem becomes more pronounced in RAID configurations because
FLUSHes end up being issued to all drives in the array simultaneously,
causing concurrent power spikes from the drives.  When proper barrier
support was introduced to md earlier this year, I got two separate reports
where brief voltage drops caused by simultaneous FLUSHes led to drives
powering off briefly and losing the data in their buffers, leading to data
corruption.  People always think their PSUs are good because they are
rated at a high wattage and bear a hefty price tag, but many of the people
reporting problems which end up being diagnosed as power problems have
these fancy PSUs.

So, if your machines share the same configuration, the first thing
I'll do would be to prepare a separate PSU, power it up and connect
half of the drives including what used to be the offending one to it
and see whether the failure pattern changes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-16 23:47                                   ` Tejun Heo
@ 2009-09-17  0:34                                     ` Neil Brown
  2009-09-17 12:00                                       ` Chris Webb
  2009-09-17 11:57                                     ` Chris Webb
  2009-09-17 13:35                                     ` Mark Lord
  2 siblings, 1 reply; 84+ messages in thread
From: Neil Brown @ 2009-09-17  0:34 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Chris Webb, Ric Wheeler, Andrei Tanas, linux-kernel,
	IDE/ATA development list, linux-scsi, Jeff Garzik, Mark Lord

On Thursday September 17, tj@kernel.org wrote:
> 
> > There are two more symptoms we are seeing on the same which may be
> > connected, or may be separate bugs in their own right:
> > 
> >   - 'cat /proc/mdstat' sometimes hangs before returning during normal
> >     operation, although most of the time it is fine. We have seen hangs of
> >     up to 15-20 seconds during resync. Might this be a less severe example
> >     of the lock-up which causes a timeout and reset after 30 seconds?
> > 
> >   - We've also had a few occasions of O_SYNC writes to raid arrays (from
> >     qemu-kvm via LVM2) completely deadlocking against resync writes when the
> >     maximum md resync speed is set sufficiently high, even where the minimum
> >     md resync speed is set to zero (although this certainly helps). However,
> >     I suspect this is an unrelated issue as I've seen this on other hardware
> >     running other kernel configs.
> 
> I think these two will be best answered by Neil Brown.  Neil?
> 

"cat /proc/mdstat" should only hang if the mddev reconfig_mutex is
held for an extended period of time.
The reconfig_mutex is held while superblocks are being written.

So yes, an extended device timeout while updating the md superblock
can cause "cat /proc/mdstat" to hang for the duration of the timeout.

For the O_SYNC:
  I think this is a RAID1 - is that correct?
  With RAID1, as soon as any IO request arrives, resync is suspended and
  as soon as all resync requests complete, the IO is permitted to
  proceed.
  So normal IO takes absolute precedence over resync IO.

  So I am very surprised to hear that O_SYNC writes deadlock
  completely.
  As O_SYNC writes are serialised, there will be a moment between
  every pair when there is no IO pending.  This will allow resync to
  get one "window" of resync IO started between each pair of writes.
  So I can well believe that a sequence of O_SYNC writes are a couple
  of orders of magnitude slower when resync is happening than without.
  But it shouldn't deadlock completely.
  Once you get about 64 sectors of O_SYNC IO through, the resync
  should notice and back off, and resync IO will be limited to the
  'minimum' speed.

NeilBrown

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-16 23:47                                   ` Tejun Heo
  2009-09-17  0:34                                     ` Neil Brown
@ 2009-09-17 11:57                                     ` Chris Webb
  2009-09-17 15:44                                       ` Tejun Heo
  2009-09-17 13:35                                     ` Mark Lord
  2 siblings, 1 reply; 84+ messages in thread
From: Chris Webb @ 2009-09-17 11:57 UTC (permalink / raw)
  To: Tejun Heo, Neil Brown
  Cc: Ric Wheeler, Andrei Tanas, linux-kernel,
	IDE/ATA development list, linux-scsi, Jeff Garzik, Mark Lord

Tejun Heo <tj@kernel.org> writes:

> The most common cause for FLUSH timeout has been power related issues.
> This problem becomes more pronounced in RAID configurations because
> FLUSHes end up being issued to all drives in the array simultaneously
> causing concurrent power spikes from the drives.  When proper barrier
> was introduced to md earlier this year, I got two separate reports
> where brief voltage drops caused by simultaneous FLUSHes led to drives
> powering off briefly and losing data in its buffer leading to data
> corruption.  People always think their PSUs are good because they are
> rated high wattage and bear hefty price tag but many people reporting
> problems which end up being diagnosed as power problem have these
> fancy PSUs.

Hi Tejun. This sounds very plausible as a diagnosis. Six drives hanging off the
single power supply is the maximum that can be fitted in this Supermicro
chassis, and we have 32GB of RAM and two 4-core Xeon processors in there too,
so we could well be right at the limit for the rating of the power supply.

> So, if your machines share the same configuration, the first thing I'll do
> would be to prepare a separate PSU, power it up and connect half of the
> drives including what used to be the offending one to it and see whether
> the failure pattern changes.

It's quite hard for us to do this with these machines as we have them managed
by a third party in a datacentre to which we don't have physical access.
However, I could very easily get an extra 'test' machine built in there,
generate a workload that consistently reproduces the problems on the six
drives, and then retry with an array built from 5, 4, 3 and 2 drives
successively, taking the unused drives out of the chassis, to see if reducing
the load on the power supply with a smaller array helps.

When I try to write a test case, would it be worth me trying to reproduce
without md in the loop, e.g. do 6-way simultaneous random-seek+write+sync
continuously, or is it better to rely on md's barrier support and just do
random-seek+write via md? Is there a standard work pattern/write size that
would be particularly likely to provoke power overload problems on drives?
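
For concreteness, the md-less variant I'm picturing is roughly the sketch
below (only a sketch: it assumes fsync() on the raw device is enough to push
a cache flush out to the drive on these kernels, and it destroys data on
whatever devices are listed):

/* sketch: one writer per listed drive doing random-seek O_DIRECT writes
 * followed by fsync(), e.g. ./flushload /dev/sd[a-f].  DESTRUCTIVE.
 * Runs until interrupted. */
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define BLK 4096

static void hammer(const char *dev)
{
	void *buf;
	off_t size, pos;
	int fd = open(dev, O_WRONLY | O_DIRECT);

	if (fd < 0 || posix_memalign(&buf, BLK, BLK)) {
		perror(dev);
		exit(1);
	}
	memset(buf, 0xa5, BLK);
	size = lseek(fd, 0, SEEK_END);	/* device size */
	srand(getpid());

	for (;;) {
		pos = (off_t)(rand() % (size / BLK)) * BLK;
		if (pwrite(fd, buf, BLK, pos) != BLK)
			perror("pwrite");
		if (fsync(fd) < 0)	/* stand-in for a barrier/FLUSH */
			perror("fsync");
	}
}

int main(int argc, char **argv)
{
	int i;

	for (i = 1; i < argc; i++)
		if (fork() == 0)
			hammer(argv[i]);
	while (wait(NULL) > 0)
		;
	return 0;
}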

Neil Brown <neilb@suse.de> writes:

> [Chris Webb <chris@arachsys.com> wrote:]
>
> > 'cat /proc/mdstat' sometimes hangs before returning during normal
> > operation, although most of the time it is fine. We have seen hangs of
> > up to 15-20 seconds during resync. Might this be a less severe example
> > of the lock-up which causes a timeout and reset after 30 seconds?
>
> "cat /proc/mdstat" should only hang if the mddev reconfig_mutex is
> held for an extended period of time.
> The reconfig_mutex is held while superblocks are being written.
> 
> So yes, an extended device timeout while updating the md superblock
> can cause "cat /proc/mdstat" to hang for the duration of the timeout.

Thanks Neil. This implies that when we see these fifteen second hangs reading
/proc/mdstat without write errors, there are genuinely successful superblock
writes which are taking fifteen seconds to complete, presumably corresponding
to flushes which complete but take a full 15s to do so.

Would such very slow (but ultimately successful) flushes be consistent with the
theory of power supply issues affecting the drives? It feels like the 30s
timeouts on flush could be just a more severe version of the 15s very slow
flushes.

Tejun Heo <tj@kernel.org> writes:

> > Some of these timeouts also leave us with a completely dead drive,
> > and we need to reboot the machine before it can be accessed
> > again. (Hot plugging it out and back in again isn't sufficient to
> > bring it back to life, so maybe a controller problem, although other
> > drives on the same controller stay alive?) An example is [2].
> 
> Ports behave mostly independently and it sure is possible that one
> port locks up while others operate fine.  I've never seen such
> incidents reported for intel ahci's tho.  If you hot unplug and then
> replug the drive, what does the kernel say?

We've only tried this once, and on that occasion there was nothing in the
kernel log at all. (I actually telephoned the data centre engineer to ask when
he was going to do it for us because I didn't see any messages, and it turned
out he already had!)

Cheers,

Chris.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-17  0:34                                     ` Neil Brown
@ 2009-09-17 12:00                                       ` Chris Webb
  0 siblings, 0 replies; 84+ messages in thread
From: Chris Webb @ 2009-09-17 12:00 UTC (permalink / raw)
  To: Neil Brown
  Cc: Tejun Heo, Ric Wheeler, Andrei Tanas, linux-kernel,
	IDE/ATA development list, linux-scsi, Jeff Garzik, Mark Lord

Neil Brown <neilb@suse.de> writes:

> For the O_SYNC:
>   I think this is a RAID1 - is that correct?

Hi Neil. It's a RAID10n2 of six disks, but I've also seen the behaviour on a
RAID1 of two disks around the time of 2.6.27.

>   With RAID1, as soon as any IO request arrives, resync is suspended and
>   as soon as all resync requests complete, the IO is permitted to
>   proceed.
>   So normal IO takes absolute precedence over resync IO.
> 
>   So I am very surprised to here that O_SYNC writes deadlock
>   completed.
>   As O_SYNC writes are serialised, there will be a moment between
>   every pair when there is no IO pending.  This will allow resync to
>   get one "window" of resync IO started between each pair of writes.
>   So I can well believe that a sequence of O_SYNC writes are a couple
>   of orders of magnitude slower when resync is happening than without.
>   But it shouldn't deadlock completely.
>   Once you get about 64 sectors of O_SYNC IO through, the resync
>   should notice and back-off and resync IO will be limited to the
>   'minimum' speed.

The symptoms seem to be that I can't read or write to /dev/mdX but I can
read from the underlying /dev/sd* devices fine, at pretty much full speed. I
didn't try writing to them as there's lots of live customer data on the RAID
arrays!

The configuration is lvm2 (i.e. device-mapper linear targets) on top of md
on top of sd, and we've seen the symptoms with the virtual machines
accessing the logical volumes configured to open in O_SYNC mode, and with
them configured to open in O_DIRECT mode. During the deadlock, cat
/proc/mdstat does return promptly (i.e. not blocked), and shows a slow and
gradually falling sync rate---I think that there's no sync writing going on
either and the drives are genuinely idle. We have to reset the machine to
bring it back to life and a graceful reboot fails.

Anyway, I see this relatively infrequently, so what I'll try to do is to
create a reproducible test case and then follow up to you and the RAID list
with that. At the moment, I understand that my report is a bit anecdotal,
and without a proper idea of what conditions are needed to make it happen,
it's pretty much impossible to diagnose or work on!
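
For the record, the kind of test case I have in mind is roughly the sketch
below: serialised O_SYNC/O_DIRECT writes with per-write timing, so a stall
during resync shows up immediately. The device path is a placeholder and it
overwrites the start of whatever it is pointed at, so scratch arrays only.

/* sketch: serialised O_SYNC+O_DIRECT 4k writes with per-write timing,
 * run against a scratch md array or LV while a resync is in progress.
 * DESTRUCTIVE: overwrites the start of the device. */
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BLK 4096

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "/dev/md0";	/* placeholder */
	void *buf;
	long i;
	int fd = open(dev, O_WRONLY | O_SYNC | O_DIRECT);

	if (fd < 0 || posix_memalign(&buf, BLK, BLK)) {
		perror(dev);
		return 1;
	}
	memset(buf, 0x5a, BLK);

	for (i = 0; ; i++) {
		struct timespec t0, t1;
		double ms;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (pwrite(fd, buf, BLK, (off_t)i * BLK) != BLK) {
			perror("pwrite");
			break;
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);
		ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
		     (t1.tv_nsec - t0.tv_nsec) / 1e6;
		if (ms > 1000)
			fprintf(stderr, "write %ld took %.0f ms\n", i, ms);
	}
	return 0;
}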

Cheers,

Chris.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-16 23:19                                                 ` Chris Webb
@ 2009-09-17 13:29                                                   ` Mark Lord
  2009-09-17 13:32                                                     ` Mark Lord
                                                                       ` (2 more replies)
  0 siblings, 3 replies; 84+ messages in thread
From: Mark Lord @ 2009-09-17 13:29 UTC (permalink / raw)
  To: Chris Webb
  Cc: Tejun Heo, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

Chris Webb wrote:
> Mark Lord <liml@rtr.ca> writes:
> 
>> I suspect we're missing some info from this specific failure.
>> Looking back at Chris's earlier posting, the whole thing started
>> with a FLUSH_CACHE_EXT failure.  Once that happens, all bets are
>> off on anything that follows.
>>
>>> Everything will be running fine when suddenly:
>>>
>>>  ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>>>  ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>>>          res 40/00:00:80:17:91/00:00:37:00:00/40 Emask 0x4 (timeout)
>>>  ata1.00: status: { DRDY }
>>>  ata1: hard resetting link
>>>  ata1: softreset failed (device not ready)
>>>  ata1: hard resetting link
>>>  ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>>  ata1.00: configured for UDMA/133
>>>  ata1: EH complete
>>>  end_request: I/O error, dev sda, sector 1465147272
>>>  md: super_written gets error=-5, uptodate=0
>>>  raid10: Disk failure on sda3, disabling device.
>>>  raid10: Operation continuing on 5 devices.
> 
> Hi Mark. Yes, when the first timeout after a clean boot happens, it's with
> an 0xea flush command every time:
..

Yes.  Is this still happening from time to time now?
If so, disable the smartmontools daemon (smartd) and see if the problem goes away.
And especially disable hddtemp (which issues SMART commands) if that is also around.

It would be good to discover if those are the triggers for what's happening here.

Tejun.. do we do a FLUSH CACHE before issuing a non-NCQ command ?
If not, then I think we may need to add code to do it.


Cheers

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-17 13:29                                                   ` Mark Lord
@ 2009-09-17 13:32                                                     ` Mark Lord
  2009-09-17 13:37                                                     ` Chris Webb
  2009-09-17 15:35                                                     ` Tejun Heo
  2 siblings, 0 replies; 84+ messages in thread
From: Mark Lord @ 2009-09-17 13:32 UTC (permalink / raw)
  To: Chris Webb
  Cc: Tejun Heo, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

Mark Lord wrote:
>
> Is this still happening from time to time now?
> If so, disable the smartmontools daemon (smartd) and see if the problem 
> goes away.
> And especially disable hddtemp (which issues SMART commands) if that is 
> also around.
> 
> It would be good to discover if those are the triggers for what's 
> happening here.
..

Ah.. I've just now read your other recent posting, so no need to answer again here.



> Tejun.. do we do a FLUSH CACHE before issuing a non-NCQ command ?
> If not, then I think we may need to add code to do it.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-16 23:47                                   ` Tejun Heo
  2009-09-17  0:34                                     ` Neil Brown
  2009-09-17 11:57                                     ` Chris Webb
@ 2009-09-17 13:35                                     ` Mark Lord
  2009-09-17 15:47                                       ` Tejun Heo
  2 siblings, 1 reply; 84+ messages in thread
From: Mark Lord @ 2009-09-17 13:35 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Chris Webb, Ric Wheeler, Andrei Tanas, NeilBrown, linux-kernel,
	IDE/ATA development list, linux-scsi, Jeff Garzik, Mark Lord

Tejun Heo wrote:
> Hello,
> 
> Chris Webb wrote:
>> Hi Tejun. Thanks for following up to this. We've done some more
>> experimentation over the last couple of days based on your
>> suggestions and thoughts.
>>
>> Tejun Heo <tj@kernel.org> writes:
>>> Seriously, it's most likely a hardware malfunction although I can't tell
>>> where the problem is with the given data.  Get the hardware fixed.
>> We know this isn't caused by a single faulty piece of hardware,
>> because we have a cluster of identical machines and all have shown
>> this behaviour. This doesn't mean that there isn't a hardware
>> problem, but if there is one, it's a design problem or firmware bug
>> affecting all of our hosts.
> 
> If it's multiple machines, it's much less likely to be faulty drives,
> but if the machines are configured mostly identically, hardware
> problems can't be ruled out either.
> 
>> There have also been a few reports of problems which look very
>> similar in this thread from people with somewhat different hardware
>> and drives to ours.
> 
> I wouldn't connect the reported cases too eagerly at this point.  Too
> many different causes end up showing similar symptoms especially with
> timeouts.
> 
>>> The aboves are IDENTIFY.  Who's issuing IDENTIFY regularly?  It isn't
>>> from the regular IO paths or md.  It's probably being issued via SG_IO
>>> from userland.  These failures don't affect normal operation.
>> [...]
>>> Oooh, another possibility is the above continuous IDENTIFY tries.
>>> Doing things like that generally isn't a good idea because vendors
>>> don't expect IDENTIFY to be mixed regularly with normal IOs and
>>> firmwares aren't tested against that.  Even smart commands sometimes
>>> cause problems.  So, finding out the thing which is obsessed with the
>>> identity of the drive and stopping it might help.
>> We tracked this down to some (excessively frequent!) monitoring we
>> were doing using smartctl. Things were improved considerably by
>> stopping smartd and disabling all callers of smartctl, although it
>> doesn't appear to have been a cure. The frequency of these timeouts
>> during resync seems to have gone from about once every two hours to
>> about once a day, which means we've been able to complete some
>> resyncs whereas we were unable to before.
> 
> That's interesting.  One important side effect of issuing IDENTIFY
> commands is that they will serialize the command stream, as they are
> not NCQ commands and thus could change command patterns significantly.
..

SMART is the opcode that is most frequently implicated here, not IDENTIFY.
Note that even a barrier FLUSH CACHE is non-NCQ and will serialize the stream.
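
For anyone mapping the hex opcodes in these logs to command names, here
is a minimal user-space sketch (not from this thread; the values match
the ATA spec and include/linux/ata.h, but the little helper is made up
purely for illustration):

#include <stdio.h>

/* Commands discussed in this thread.  Only the FPDMA (NCQ) pair can be
 * queued; everything else forces the drive's queue to drain first. */
enum {
	ATA_CMD_FPDMA_READ  = 0x60,	/* NCQ read */
	ATA_CMD_FPDMA_WRITE = 0x61,	/* NCQ write */
	ATA_CMD_SMART       = 0xB0,	/* SMART (smartd/hddtemp polls) */
	ATA_CMD_FLUSH_EXT   = 0xEA,	/* FLUSH CACHE EXT (the 0xea above) */
	ATA_CMD_ID_ATA      = 0xEC,	/* IDENTIFY DEVICE */
};

static int is_ncq(unsigned char cmd)
{
	return cmd == ATA_CMD_FPDMA_READ || cmd == ATA_CMD_FPDMA_WRITE;
}

int main(void)
{
	printf("0x%02x is %san NCQ command\n", ATA_CMD_FLUSH_EXT,
	       is_ncq(ATA_CMD_FLUSH_EXT) ? "" : "not ");
	return 0;
}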

Cheers


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-17 13:29                                                   ` Mark Lord
  2009-09-17 13:32                                                     ` Mark Lord
@ 2009-09-17 13:37                                                     ` Chris Webb
  2009-09-17 15:35                                                     ` Tejun Heo
  2 siblings, 0 replies; 84+ messages in thread
From: Chris Webb @ 2009-09-17 13:37 UTC (permalink / raw)
  To: Mark Lord
  Cc: Tejun Heo, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

Mark Lord <liml@rtr.ca> writes:

> Yes.  Is this still happening from time to time now?

Hi Mark. We're still seeing the flush timeouts (0xea) with accompanying
errors, but not the IDENTIFYs anymore. We no longer have smartd or smartctl
on the systems, and there's nothing else running in userspace that accesses
the drives other than block reads and writes through the md arrays.

Cheers,

Chris.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-17 13:29                                                   ` Mark Lord
  2009-09-17 13:32                                                     ` Mark Lord
  2009-09-17 13:37                                                     ` Chris Webb
@ 2009-09-17 15:35                                                     ` Tejun Heo
  2009-09-17 16:16                                                       ` Mark Lord
  2 siblings, 1 reply; 84+ messages in thread
From: Tejun Heo @ 2009-09-17 15:35 UTC (permalink / raw)
  To: Mark Lord
  Cc: Chris Webb, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

Hello,

Mark Lord wrote:
> Tejun.. do we do a FLUSH CACHE before issuing a non-NCQ command ?

Nope.

> If not, then I think we may need to add code to do it.

Hmm... can you explain a bit more?  That seems rather extreme to me.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-17 11:57                                     ` Chris Webb
@ 2009-09-17 15:44                                       ` Tejun Heo
  2009-09-17 16:36                                         ` Allan Wind
                                                           ` (2 more replies)
  0 siblings, 3 replies; 84+ messages in thread
From: Tejun Heo @ 2009-09-17 15:44 UTC (permalink / raw)
  To: Chris Webb
  Cc: Neil Brown, Ric Wheeler, Andrei Tanas, linux-kernel,
	IDE/ATA development list, linux-scsi, Jeff Garzik, Mark Lord

Hello,

Chris Webb wrote:
> It's quite hard for us to do this with these machines as we have
> them managed by a third party in a datacentre to which we don't have
> physical access.  However, I could very easily get an extra 'test'
> machine built in there, generate a workload that consistently
> reproduces the problems on the six drives, and then retry with an
> array built from 5, 4, 3 and 2 drives successively, taking out the
> unused drives from the chassis, to see if reducing the load on the
> power supply with a smaller array helps.

Yeap, that also should shed some light on it.

> When I try to write a test case, would it be worth me trying to
> reproduce without md in the loop, e.g. do 6-way simultaneous
> random-seek+write+sync continuously, or is it better to rely on md's
> barrier support and just do random-seek+write via md? Is there a
> standard work pattern/write size that would be particularly likely
> to provoke power overload problems on drives?

Excluding it from the chain would be helpful, but if md can reproduce
the problem reliably, trying with md first would be easier.  :-)
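
(For concreteness, the kind of workload described above might look like
the sketch below -- untested and destructive to whatever it writes to;
the device path is only an example, and whether each fsync() actually
reaches the drive as a FLUSH CACHE depends on the kernel and the stack
in between.)

#define _XOPEN_SOURCE 500	/* for pwrite() */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	/* DESTRUCTIVE: point this at a scratch md device or test file only. */
	const char *path = argc > 1 ? argv[1] : "/dev/md_scratch";	/* example */
	int fd = open(path, O_WRONLY);
	char buf[4096];
	off_t span = 1024LL * 1024 * 1024;	/* confine writes to the first 1 GiB */

	if (fd < 0) { perror("open"); return 1; }
	memset(buf, 0xab, sizeof(buf));

	for (;;) {
		off_t blk = rand() % (span / (off_t)sizeof(buf));

		if (pwrite(fd, buf, sizeof(buf), blk * (off_t)sizeof(buf)) < 0) {
			perror("pwrite");
			break;
		}
		if (fsync(fd) < 0) {	/* intended to force writeback / a flush */
			perror("fsync");
			break;
		}
	}
	close(fd);
	return 0;
}

Running six of these in parallel, one per raw drive, would take md out
of the picture; a single instance pointed at the md device leans on
md's own barrier handling instead.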

>> So yes, an extended device timeout while updating the md superblock
>> can cause "cat /proc/mdstat" to hang for the duration of the timeout.
> 
> Thanks Neil. This implies that when we see these fifteen second
> hangs reading /proc/mdstat without write errors, there are genuinely
> successful superblock writes which are taking fifteen seconds to
> complete, presumably corresponding to flushes which complete but
> take a full 15s to do so.
>
> Would such very slow (but ultimately successful) flushes be
> consistent with the theory of power supply issues affecting the
> drives? It feels like the 30s timeouts on flush could be just a more
> severe version of the 15s very slow flushes.

Probably not.  Power problems usually don't resolve themselves with a
longer timeout.  If the drive genuinely takes longer than 30s to
flush, it would be very interesting tho.  That's something people have
been worrying about, but it hasn't materialized yet.  The timeout is
controlled by SD_TIMEOUT in drivers/scsi/sd.h.  You might want to bump
it up to, say, 60s and see whether anything changes.
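
(For reference, a sketch from memory of what that looks like in
drivers/scsi/sd.h of kernels around this time -- worth double-checking
against your actual tree:)

/* drivers/scsi/sd.h -- stock definition, in jiffies: */
#define SD_TIMEOUT		(30 * HZ)

/* the experiment suggested above would simply be: */
/* #define SD_TIMEOUT	(60 * HZ) */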

>>> Some of these timeouts also leave us with a completely dead drive,
>>> and we need to reboot the machine before it can be accessed
>>> again. (Hot plugging it out and back in again isn't sufficient to
>>> bring it back to life, so maybe a controller problem, although other
>>> drives on the same controller stay alive?) An example is [2].
>> Ports behave mostly independently and it sure is possible that one
>> port locks up while others operate fine.  I've never seen such
>> incidents reported for intel ahci's tho.  If you hot unplug and then
>> replug the drive, what does the kernel say?
> 
> We've only tried this once, and on that occasion there was nothing
> in the kernel log at all. (I actually telephoned the data centre
> engineer to ask when he was going to do it for us because I didn't
> see any messages, and it turned out he already had!)

Hmmm... that means the host port was dead.  Strange, I've never seen
intel ahci doing that.  If possible, it would be great if you can
verify it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-17 13:35                                     ` Mark Lord
@ 2009-09-17 15:47                                       ` Tejun Heo
  0 siblings, 0 replies; 84+ messages in thread
From: Tejun Heo @ 2009-09-17 15:47 UTC (permalink / raw)
  To: Mark Lord
  Cc: Chris Webb, Ric Wheeler, Andrei Tanas, NeilBrown, linux-kernel,
	IDE/ATA development list, linux-scsi, Jeff Garzik, Mark Lord

Hello,

Mark Lord wrote:
>> That's interesting.  One important side effect of issuing IDENTIFY
>> commands is that they will serialize the command stream, as they are
>> not NCQ commands and thus could change command patterns significantly.
> ..
> 
> SMART is the opcode that is most frequently implicated here, not IDENTIFY.

Yeap, any non-NCQ commands would do it.

> Note that even a barrier FLUSH CACHE is non-NCQ and will serialize
> the stream.

Yeah, I was just thinking that issuing non-NCQ commands mixed with NCQ
commands would make the command stream fluctuate more.  Modern drives
are pretty good at lowering power consumption while idle so being more
fluctuative (is it a word?) might have something to do with the
problem.  Just a wild speculation tho.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-17 15:35                                                     ` Tejun Heo
@ 2009-09-17 16:16                                                       ` Mark Lord
  2009-09-17 16:17                                                         ` Mark Lord
  2009-09-20 18:36                                                         ` Robert Hancock
  0 siblings, 2 replies; 84+ messages in thread
From: Mark Lord @ 2009-09-17 16:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Chris Webb, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

Tejun Heo wrote:
> Hello,
> 
> Mark Lord wrote:
>> Tejun.. do we do a FLUSH CACHE before issuing a non-NCQ command ?
> 
> Nope.
> 
>> If not, then I think we may need to add code to do it.
> 
> Hmm... can you explain a bit more?  That seems rather extreme to me.
..

You may recall that I first raised this issue about a year ago,
when my own RAID0 array (MythTV box) started showing errors very
similar to what Chris is reporting.

These were easily triggered by running hddtemp once every few seconds
to log drive temperatures during Myth recording sessions.

hddtemp uses SMART commands.

The actual errors in the logs were command timeouts,
but at this point I no longer remember which opcode was
actually timing out.  Disabling the onboard write cache
immediately "cured" the problem, at the expense of MUCH
slower I/O times.

My theory at the time was that some non-NCQ commands might be triggering
an internal FLUSH CACHE within the (Hitachi) drive firmware, which then
caused the original command to time out in libata (due to the large amounts
of data present in the onboard write-caches).

Now that more people are playing the game, we're seeing more and more
reports of strange interactions with smartd running in the background.

I increasingly suspect that this is an (avoidable) interaction
between the write-cache and the SMART opcode, and that it could perhaps be
avoided by doing a FLUSH CACHE before any SMART (or other non-data) opcode.
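
To make that concrete, what smartd/hddtemp hand the drive on every poll
is roughly the following non-NCQ SMART command (a user-space sketch,
not taken from either tool; the device path is an example and the ioctl
plumbing here is from memory, so treat it as illustrative only):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/hdreg.h>	/* HDIO_DRIVE_CMD */

int main(int argc, char **argv)
{
	/* args[0]=command, args[1]=lba low, args[2]=feature, args[3]=nsectors;
	 * 512 bytes of attribute data come back in args[4..].  The kernel
	 * fills in the 0x4f/0xc2 SMART signature seen in the logs earlier
	 * in this thread. */
	unsigned char args[4 + 512];
	const char *dev = argc > 1 ? argv[1] : "/dev/sda";	/* example path */
	int fd = open(dev, O_RDONLY | O_NONBLOCK);

	if (fd < 0) { perror("open"); return 1; }
	memset(args, 0, sizeof(args));
	args[0] = 0xb0;		/* SMART */
	args[2] = 0xd0;		/* SMART READ DATA (attribute values) */
	args[3] = 1;		/* one 512-byte sector of attribute data */

	if (ioctl(fd, HDIO_DRIVE_CMD, args) < 0)
		perror("HDIO_DRIVE_CMD");
	else
		printf("SMART READ DATA ok (512 bytes of attributes returned)\n");
	close(fd);
	return 0;
}

Each such poll reaches the drive as a non-queueable command in the
middle of whatever NCQ write stream is in flight, which is exactly the
kind of mixing discussed earlier in the thread.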

Cheers

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-17 16:16                                                       ` Mark Lord
@ 2009-09-17 16:17                                                         ` Mark Lord
  2009-09-18 17:05                                                           ` Chris Webb
  2009-09-20 18:36                                                         ` Robert Hancock
  1 sibling, 1 reply; 84+ messages in thread
From: Mark Lord @ 2009-09-17 16:17 UTC (permalink / raw)
  To: Chris Webb
  Cc: Tejun Heo, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

Mark Lord wrote:
> Tejun Heo wrote:
>> Hello,
>>
>> Mark Lord wrote:
>>> Tejun.. do we do a FLUSH CACHE before issuing a non-NCQ command ?
>>
>> Nope.
>>
>>> If not, then I think we may need to add code to do it.
>>
>> Hmm... can you explain a bit more?  That seems rather extreme to me.
> ..
> 
> You may recall that I first raised this issue about a year ago,
> when my own RAID0 array (MythTV box) started showing errors very
> similar to what Chris is reporting.
> 
> These were easily triggered by running hddtemp once every few seconds
> to log drive temperatures during Myth recording sessions.
> 
> hddtemp uses SMART commands.
> 
> The actual errors in the logs were command timeouts,
> but at this point I no longer remember which opcode was
> actually timing out.  Disabling the onboard write cache
> immediately "cured" the problem, at the expense of MUCH
> slower I/O times.
..

Speaking of which.. 

Chris:  I wonder if the errors will also vanish in your situation
by disabling the onboard write-caches in the drives ?

Eg.  hdparm -W0 /dev/sd?



^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-17 15:44                                       ` Tejun Heo
@ 2009-09-17 16:36                                         ` Allan Wind
  2009-09-18  0:16                                           ` Tejun Heo
  2009-09-18 17:07                                         ` Chris Webb
  2009-09-20 18:46                                         ` Robert Hancock
  2 siblings, 1 reply; 84+ messages in thread
From: Allan Wind @ 2009-09-17 16:36 UTC (permalink / raw)
  To: IDE/ATA development list, linux-scsi

On 2009-09-18T00:44:45, Tejun Heo wrote:
> Hello,
> 
> Chris Webb wrote:
> > It's quite hard for us to do this with these machines as we have
> > them managed by a third party in a datacentre to which we don't have
> > physical access.  However, I could very easily get an extra 'test'
> > machine built in there, generate a workload that consistently
> > reproduces the problems on the six drives, and then retry with an
> > array built from 5, 4, 3 and 2 drives successively, taking out the
> > unused drives from the chassis, to see if reducing the load on the
> > power supply with a smaller array helps.
> 
> Yeap, that also should shed some light on it.

I have a SuperMicro X8DT3-F motherboard with 2 (2 TB) WDC drives in
2 of the 8 bays available in the machine.  They are on a different
controller, an LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT
SAS, which was flashed into "Integrated Target Mode" to get it running
under Linux.

Disabling smartmontools seems to have helped in terms of failure
frequency.  It is almost always the 2nd drive that is kicked out of
the mirror, although the last time (after disabling SMART polling) it
was the primary.  hddtemp was never running on this host.

[2256003.055451] end_request: I/O error, dev sdb, sector 3907028974
[2256003.055674] md: super_written gets error=-5, uptodate=0
[2256003.055677] raid1: Disk failure on sdb2, disabling device.
[2256003.055678] raid1: Operation continuing on 1 devices.
[2256003.437315] RAID1 conf printout:
[2256003.437318]  --- wd:1 rd:2
[2256003.437321]  disk 0, wo:0, o:1, dev:sda2
[2256003.437323]  disk 1, wo:1, o:0, dev:sdb2
[2256003.440542] RAID1 conf printout:
[2256003.440545]  --- wd:1 rd:2
[2256003.440548]  disk 0, wo:0, o:1, dev:sda2

[3880879.007618] end_request: I/O error, dev sda, sector 3907028974
[3880879.007839] md: super_written gets error=-5, uptodate=0
[3880879.007842] raid1: Disk failure on sda2, disabling device.
[3880879.007843] raid1: Operation continuing on 1 devices.
[3880879.028518] RAID1 conf printout:
[3880879.028521]  --- wd:1 rd:2
[3880879.028524]  disk 0, wo:1, o:0, dev:sda2
[3880879.028527]  disk 1, wo:0, o:1, dev:sdb2
[3880879.031607] RAID1 conf printout:
[3880879.031610]  --- wd:1 rd:2
[3880879.031613]  disk 1, wo:0, o:1, dev:sdb2

There is barely any load on this box.  Disabling NCQ did not help 
for me. 


/Allan
-- 
Allan Wind
Life Integrity, LLC
<http://lifeintegrity.com>


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-17 16:36                                         ` Allan Wind
@ 2009-09-18  0:16                                           ` Tejun Heo
  2009-09-18  2:47                                             ` Allan Wind
  0 siblings, 1 reply; 84+ messages in thread
From: Tejun Heo @ 2009-09-18  0:16 UTC (permalink / raw)
  To: IDE/ATA development list, linux-scsi

Allan Wind wrote:
> On 2009-09-18T00:44:45, Tejun Heo wrote:
>> Hello,
>>
>> Chris Webb wrote:
>>> It's quite hard for us to do this with these machines as we have
>>> them managed by a third party in a datacentre to which we don't have
>>> physical access.  However, I could very easily get an extra 'test'
>>> machine built in there, generate a workload that consistently
>>> reproduces the problems on the six drives, and then retry with an
>>> array built from 5, 4, 3 and 2 drives successively, taking out the
>>> unused drives from the chassis, to see if reducing the load on the
>>> power supply with a smaller array helps.
>> Yeap, that also should shed some light on it.
> 
> I have a SuperMicro X8DT3-F motherboard with 2 (2 TB) WDC drives in
> 2 of the 8 bays available in the machine.  They are on a different
> controller, an LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT
> SAS, which was flashed into "Integrated Target Mode" to get it running
> under Linux.
> 
> Disabling smartmontools seems to have helped in terms of failure
> frequency.  It is almost always the 2nd drive that is kicked out of
> the mirror, although the last time (after disabling SMART polling) it
> was the primary.  hddtemp was never running on this host.
> 
> [2256003.055451] end_request: I/O error, dev sdb, sector 3907028974
> [2256003.055674] md: super_written gets error=-5, uptodate=0
> [2256003.055677] raid1: Disk failure on sdb2, disabling device.
> [2256003.055678] raid1: Operation continuing on 1 devices.
> [2256003.437315] RAID1 conf printout:
> [2256003.437318]  --- wd:1 rd:2
> [2256003.437321]  disk 0, wo:0, o:1, dev:sda2
> [2256003.437323]  disk 1, wo:1, o:0, dev:sdb2
> [2256003.440542] RAID1 conf printout:
> [2256003.440545]  --- wd:1 rd:2
> [2256003.440548]  disk 0, wo:0, o:1, dev:sda2
> 
> [3880879.007618] end_request: I/O error, dev sda, sector 3907028974
> [3880879.007839] md: super_written gets error=-5, uptodate=0
> [3880879.007842] raid1: Disk failure on sda2, disabling device.
> [3880879.007843] raid1: Operation continuing on 1 devices.
> [3880879.028518] RAID1 conf printout:
> [3880879.028521]  --- wd:1 rd:2
> [3880879.028524]  disk 0, wo:1, o:0, dev:sda2
> [3880879.028527]  disk 1, wo:0, o:1, dev:sdb2
> [3880879.031607] RAID1 conf printout:
> [3880879.031610]  --- wd:1 rd:2
> [3880879.031613]  disk 1, wo:0, o:1, dev:sdb2
> 
> There is barely any load on this box.  Disabling NCQ did not help 
> for me. 

Can you please post full log?

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-18  0:16                                           ` Tejun Heo
@ 2009-09-18  2:47                                             ` Allan Wind
  0 siblings, 0 replies; 84+ messages in thread
From: Allan Wind @ 2009-09-18  2:47 UTC (permalink / raw)
  To: IDE/ATA development list, linux-scsi

On 2009-09-18T09:16:30, Tejun Heo wrote:
> Can you please post full log?

The sense key errors starting at 3964586 happened when I disabled 
the write cache (`hdparm -W0 /dev/sd{a,b}`).

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Linux version 2.6.30.4 (root@pawan) (gcc version 4.3.3 (Debian 4.3.3-13) ) #1 SMP Sat Aug 1 13:46:07 EDT 2009
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-2.6.30.4 root=/dev/md0 ro quiet
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
[    0.000000]  BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000bf790000 (usable)
[    0.000000]  BIOS-e820: 00000000bf790000 - 00000000bf79e000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000bf7d0000 - 00000000bf7e0000 (reserved)
[    0.000000]  BIOS-e820: 00000000bf7ec000 - 00000000c0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  BIOS-e820: 00000000ffc00000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 0000000640000000 (usable)
[    0.000000] DMI present.
[    0.000000] AMI BIOS detected: BIOS may corrupt low RAM, working around it.
[    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[    0.000000] last_pfn = 0x640000 max_arch_pfn = 0x100000000
[    0.000000] MTRR default type: uncachable
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-CFFFF write-protect
[    0.000000]   D0000-DFFFF uncachable
[    0.000000]   E0000-E7FFF write-through
[    0.000000]   E8000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 0000000000 mask FC00000000 write-back
[    0.000000]   1 base 0400000000 mask FE00000000 write-back
[    0.000000]   2 base 0600000000 mask FFC0000000 write-back
[    0.000000]   3 base 00C0000000 mask FFC0000000 uncachable
[    0.000000]   4 base 00BF800000 mask FFFF800000 uncachable
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[    0.000000] e820 update range: 00000000bf800000 - 0000000100000000 (usable) ==> (reserved)
[    0.000000] last_pfn = 0xbf790 max_arch_pfn = 0x100000000
[    0.000000] Scanning 0 areas for low memory corruption
[    0.000000] modified physical RAM map:
[    0.000000]  modified: 0000000000000000 - 0000000000010000 (reserved)
[    0.000000]  modified: 0000000000010000 - 000000000009bc00 (usable)
[    0.000000]  modified: 000000000009bc00 - 00000000000a0000 (reserved)
[    0.000000]  modified: 00000000000e4000 - 0000000000100000 (reserved)
[    0.000000]  modified: 0000000000100000 - 00000000bf790000 (usable)
[    0.000000]  modified: 00000000bf790000 - 00000000bf79e000 (ACPI data)
[    0.000000]  modified: 00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
[    0.000000]  modified: 00000000bf7d0000 - 00000000bf7e0000 (reserved)
[    0.000000]  modified: 00000000bf7ec000 - 00000000c0000000 (reserved)
[    0.000000]  modified: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  modified: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  modified: 00000000ffc00000 - 0000000100000000 (reserved)
[    0.000000]  modified: 0000000100000000 - 0000000640000000 (usable)
[    0.000000] init_memory_mapping: 0000000000000000-00000000bf790000
[    0.000000]  0000000000 - 00bf600000 page 2M
[    0.000000]  00bf600000 - 00bf790000 page 4k
[    0.000000] kernel direct mapping tables up to bf790000 @ 10000-15000
[    0.000000] init_memory_mapping: 0000000100000000-0000000640000000
[    0.000000]  0100000000 - 0640000000 page 2M
[    0.000000] kernel direct mapping tables up to 640000000 @ 13000-2d000
[    0.000000] ACPI: RSDP 00000000000f9da0 00024 (v02 ACPIAM)
[    0.000000] ACPI: XSDT 00000000bf790100 00054 (v01 033009 XSDT1608 20090330 MSFT 00000097)
[    0.000000] ACPI: FACP 00000000bf790290 000F4 (v03 033009 FACP1608 20090330 MSFT 00000097)
[    0.000000] ACPI: DSDT 00000000bf7904b0 05A14 (v01  1XDT3 1XDT3003 00000003 INTL 20051117)
[    0.000000] ACPI: FACS 00000000bf79e000 00040
[    0.000000] ACPI: APIC 00000000bf790390 000D2 (v01 033009 APIC1608 20090330 MSFT 00000097)
[    0.000000] ACPI: MCFG 00000000bf790470 0003C (v01 033009 OEMMCFG  20090330 MSFT 00000097)
[    0.000000] ACPI: OEMB 00000000bf79e040 0007A (v01 033009 OEMB1608 20090330 MSFT 00000097)
[    0.000000] ACPI: DMAR 00000000bf79e0c0 00130 (v01    AMI  OEMDMAR 00000001 MSFT 00000097)
[    0.000000] ACPI: SSDT 00000000bf79fbf0 0249F (v01 DpgPmm    CpuPm 00000012 INTL 20051117)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at 0000000000000000-0000000640000000
[    0.000000] Bootmem setup node 0 0000000000000000-0000000640000000
[    0.000000]   NODE_DATA [0000000000028000 - 000000000002cfff]
[    0.000000]   bootmap [0000000000100000 -  00000000001c7fff] pages c8
[    0.000000] (7 early reservations) ==> bootmem [0000000000 - 0640000000]
[    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
[    0.000000]   #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
[    0.000000]   #2 [0001000000 - 0001737608]    TEXT DATA BSS ==> [0001000000 - 0001737608]
[    0.000000]   #3 [000009bc00 - 0000100000]    BIOS reserved ==> [000009bc00 - 0000100000]
[    0.000000]   #4 [0001738000 - 0001738178]              BRK ==> [0001738000 - 0001738178]
[    0.000000]   #5 [0000010000 - 0000013000]          PGTABLE ==> [0000010000 - 0000013000]
[    0.000000]   #6 [0000013000 - 0000028000]          PGTABLE ==> [0000013000 - 0000028000]
[    0.000000] found SMP MP-table at [ffff8800000ff780] ff780
[    0.000000]  [ffffe20000000000-ffffe20015dfffff] PMD -> [ffff880028200000-ffff88003d1fffff] on node 0
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000010 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   0x00100000 -> 0x00640000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[3] active PFN ranges
[    0.000000]     0: 0x00000010 -> 0x0000009b
[    0.000000]     0: 0x00000100 -> 0x000bf790
[    0.000000]     0: 0x00100000 -> 0x00640000
[    0.000000] On node 0 totalpages: 6289179
[    0.000000]   DMA zone: 56 pages used for memmap
[    0.000000]   DMA zone: 127 pages reserved
[    0.000000]   DMA zone: 3796 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 14280 pages used for memmap
[    0.000000]   DMA32 zone: 765896 pages, LIFO batch:31
[    0.000000]   Normal zone: 75264 pages used for memmap
[    0.000000]   Normal zone: 5429760 pages, LIFO batch:31
[    0.000000] ACPI: PM-Timer IO Port: 0x808
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x10] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x12] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x14] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x16] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x11] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x13] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x15] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x17] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x88] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x89] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x8a] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x8b] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x8c] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x8d] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x8e] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x10] lapic_id[0x8f] disabled)
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 8, version 0, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ2 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] SMP: Allowing 16 CPUs, 8 hotplug CPUs
[    0.000000] nr_irqs_gsi: 24
[    0.000000] PM: Registered nosave memory: 000000000009b000 - 000000000009c000
[    0.000000] PM: Registered nosave memory: 000000000009c000 - 00000000000a0000
[    0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000e4000
[    0.000000] PM: Registered nosave memory: 00000000000e4000 - 0000000000100000
[    0.000000] PM: Registered nosave memory: 00000000bf790000 - 00000000bf79e000
[    0.000000] PM: Registered nosave memory: 00000000bf79e000 - 00000000bf7d0000
[    0.000000] PM: Registered nosave memory: 00000000bf7d0000 - 00000000bf7e0000
[    0.000000] PM: Registered nosave memory: 00000000bf7e0000 - 00000000bf7ec000
[    0.000000] PM: Registered nosave memory: 00000000bf7ec000 - 00000000c0000000
[    0.000000] PM: Registered nosave memory: 00000000c0000000 - 00000000e0000000
[    0.000000] PM: Registered nosave memory: 00000000e0000000 - 00000000f0000000
[    0.000000] PM: Registered nosave memory: 00000000f0000000 - 00000000fee00000
[    0.000000] PM: Registered nosave memory: 00000000fee00000 - 00000000fee01000
[    0.000000] PM: Registered nosave memory: 00000000fee01000 - 00000000ffc00000
[    0.000000] PM: Registered nosave memory: 00000000ffc00000 - 0000000100000000
[    0.000000] Allocating PCI resources starting at c2000000 (gap: c0000000:20000000)
[    0.000000] NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:16 nr_node_ids:1
[    0.000000] PERCPU: Embedded 25 pages at ffff88003d200000, static data 72800 bytes
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 6199452
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-2.6.30.4 root=/dev/md0 ro quiet
[    0.000000] Initializing CPU#0
[    0.000000] Experimental hierarchical RCU implementation.
[    0.000000] Experimental hierarchical RCU init done.
[    0.000000] NR_IRQS:4352 nr_irqs:536
[    0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[    0.000000] Extended CMOS year: 2000
[    0.000000] Fast TSC calibration using PIT
[    0.000000] Detected 2261.190 MHz processor.
[   83.997162] Console: colour VGA+ 80x25
[   83.997164] console [tty0] enabled
[   83.997449] Checking aperture...
[   84.003246] No AGP bridge found
[   84.003284] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[   84.009681] Placing 64MB software IO TLB between ffff880020000000 - ffff880024000000
[   84.009684] software IO TLB at phys 0x20000000 - 0x24000000
[   84.191210] Memory: 24737232k/26214400k available (3823k kernel code, 1057684k absent, 419484k reserved, 2177k data, 524k init)
[   84.191236] SLUB: Genslabs=14, HWalign=64, Order=0-3, MinObjects=0, CPUs=16, Nodes=1
[   84.191278] Calibrating delay loop (skipped), value calculated using timer frequency.. 4522.38 BogoMIPS (lpj=2261190)
[   84.191295] Security Framework initialized
[   84.193407] Dentry cache hash table entries: 4194304 (order: 13, 33554432 bytes)
[   84.200318] Inode-cache hash table entries: 2097152 (order: 12, 16777216 bytes)
[   84.203228] Mount-cache hash table entries: 256
[   84.203353] Initializing cgroup subsys ns
[   84.203356] Initializing cgroup subsys cpuacct
[   84.203361] Initializing cgroup subsys freezer
[   84.203374] CPU: Physical Processor ID: 1
[   84.203375] CPU: Processor Core ID: 0
[   84.203378] CPU: L1 I cache: 32K, L1 D cache: 32K
[   84.203380] CPU: L2 cache: 256K
[   84.203381] CPU: L3 cache: 8192K
[   84.203383] CPU 0/0x10 -> Node 0
[   84.203393] CPU0: Thermal monitoring enabled (TM1)
[   84.203397] CPU 0 MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
[   84.203403] using mwait in idle threads.
[   84.203418] ACPI: Core revision 20090320
[   84.234438] Setting APIC routing to physical flat
[   84.234756] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[   84.244739] CPU0: Intel(R) Xeon(R) CPU           L5520  @ 2.27GHz stepping 05
[   84.346017] Booting processor 1 APIC 0x12 ip 0x6000
[   84.356480] Initializing CPU#1
[   84.416780] Calibrating delay using timer specific routine.. 4520.67 BogoMIPS (lpj=2260337)
[   84.416786] CPU: Physical Processor ID: 1
[   84.416787] CPU: Processor Core ID: 1
[   84.416789] CPU: L1 I cache: 32K, L1 D cache: 32K
[   84.416790] CPU: L2 cache: 256K
[   84.416791] CPU: L3 cache: 8192K
[   84.416793] CPU 1/0x12 -> Node 0
[   84.416802] CPU1: Thermal monitoring enabled (TM1)
[   84.416805] CPU 1 MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
[   84.417429] x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
[   84.418109] CPU1: Intel(R) Xeon(R) CPU           L5520  @ 2.27GHz stepping 05
[   84.418130] Skipping synchronization checks as TSC is reliable.
[   84.418186] Booting processor 2 APIC 0x14 ip 0x6000
[   84.428533] Initializing CPU#2
[   84.488616] Calibrating delay using timer specific routine.. 4520.67 BogoMIPS (lpj=2260338)
[   84.488622] CPU: Physical Processor ID: 1
[   84.488623] CPU: Processor Core ID: 2
[   84.488625] CPU: L1 I cache: 32K, L1 D cache: 32K
[   84.488626] CPU: L2 cache: 256K
[   84.488627] CPU: L3 cache: 8192K
[   84.488629] CPU 2/0x14 -> Node 0
[   84.488638] CPU2: Thermal monitoring enabled (TM1)
[   84.488641] CPU 2 MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
[   84.489267] x86 PAT enabled: cpu 2, old 0x7040600070406, new 0x7010600070106
[   84.489966] CPU2: Intel(R) Xeon(R) CPU           L5520  @ 2.27GHz stepping 05
[   84.489986] Skipping synchronization checks as TSC is reliable.
[   84.490043] Booting processor 3 APIC 0x16 ip 0x6000
[   84.500390] Initializing CPU#3
[   84.560451] Calibrating delay using timer specific routine.. 4520.67 BogoMIPS (lpj=2260335)
[   84.560458] CPU: Physical Processor ID: 1
[   84.560459] CPU: Processor Core ID: 3
[   84.560461] CPU: L1 I cache: 32K, L1 D cache: 32K
[   84.560462] CPU: L2 cache: 256K
[   84.560463] CPU: L3 cache: 8192K
[   84.560465] CPU 3/0x16 -> Node 0
[   84.560474] CPU3: Thermal monitoring enabled (TM1)
[   84.560477] CPU 3 MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
[   84.561102] x86 PAT enabled: cpu 3, old 0x7040600070406, new 0x7010600070106
[   84.561735] CPU3: Intel(R) Xeon(R) CPU           L5520  @ 2.27GHz stepping 05
[   84.561755] Skipping synchronization checks as TSC is reliable.
[   84.561811] Booting processor 4 APIC 0x11 ip 0x6000
[   84.572159] Initializing CPU#4
[   84.632287] Calibrating delay using timer specific routine.. 4520.66 BogoMIPS (lpj=2260333)
[   84.632295] CPU: Physical Processor ID: 1
[   84.632296] CPU: Processor Core ID: 0
[   84.632298] CPU: L1 I cache: 32K, L1 D cache: 32K
[   84.632300] CPU: L2 cache: 256K
[   84.632302] CPU: L3 cache: 8192K
[   84.632304] CPU 4/0x11 -> Node 0
[   84.632314] CPU4: Thermal monitoring enabled (TM1)
[   84.632317] CPU 4 MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
[   84.633069] x86 PAT enabled: cpu 4, old 0x7040600070406, new 0x7010600070106
[   84.633910] CPU4: Intel(R) Xeon(R) CPU           L5520  @ 2.27GHz stepping 05
[   84.633935] Skipping synchronization checks as TSC is reliable.
[   84.633995] Booting processor 5 APIC 0x13 ip 0x6000
[   84.644242] Initializing CPU#5
[   84.704123] Calibrating delay using timer specific routine.. 4520.67 BogoMIPS (lpj=2260335)
[   84.704129] CPU: Physical Processor ID: 1
[   84.704130] CPU: Processor Core ID: 1
[   84.704132] CPU: L1 I cache: 32K, L1 D cache: 32K
[   84.704134] CPU: L2 cache: 256K
[   84.704135] CPU: L3 cache: 8192K
[   84.704136] CPU 5/0x13 -> Node 0
[   84.704145] CPU5: Thermal monitoring enabled (TM1)
[   84.704148] CPU 5 MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
[   84.704965] x86 PAT enabled: cpu 5, old 0x7040600070406, new 0x7010600070106
[   84.705929] CPU5: Intel(R) Xeon(R) CPU           L5520  @ 2.27GHz stepping 05
[   84.705950] Skipping synchronization checks as TSC is reliable.
[   84.706005] Booting processor 6 APIC 0x15 ip 0x6000
[   84.716352] Initializing CPU#6
[   84.776956] Calibrating delay using timer specific routine.. 4520.67 BogoMIPS (lpj=2260336)
[   84.776963] CPU: Physical Processor ID: 1
[   84.776964] CPU: Processor Core ID: 2
[   84.776966] CPU: L1 I cache: 32K, L1 D cache: 32K
[   84.776967] CPU: L2 cache: 256K
[   84.776968] CPU: L3 cache: 8192K
[   84.776970] CPU 6/0x15 -> Node 0
[   84.776978] CPU6: Thermal monitoring enabled (TM1)
[   84.776981] CPU 6 MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
[   84.777797] x86 PAT enabled: cpu 6, old 0x7040600070406, new 0x7010600070106
[   84.778697] CPU6: Intel(R) Xeon(R) CPU           L5520  @ 2.27GHz stepping 05
[   84.778718] Skipping synchronization checks as TSC is reliable.
[   84.778771] Booting processor 7 APIC 0x17 ip 0x6000
[   84.789118] Initializing CPU#7
[   84.848792] Calibrating delay using timer specific routine.. 4520.67 BogoMIPS (lpj=2260336)
[   84.848798] CPU: Physical Processor ID: 1
[   84.848799] CPU: Processor Core ID: 3
[   84.848801] CPU: L1 I cache: 32K, L1 D cache: 32K
[   84.848803] CPU: L2 cache: 256K
[   84.848804] CPU: L3 cache: 8192K
[   84.848805] CPU 7/0x17 -> Node 0
[   84.848815] CPU7: Thermal monitoring enabled (TM1)
[   84.848817] CPU 7 MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
[   84.849633] x86 PAT enabled: cpu 7, old 0x7040600070406, new 0x7010600070106
[   84.850565] CPU7: Intel(R) Xeon(R) CPU           L5520  @ 2.27GHz stepping 05
[   84.850585] Skipping synchronization checks as TSC is reliable.
[   84.850593] Brought up 8 CPUs
[   84.850595] Total of 8 processors activated (36167.08 BogoMIPS).
[   84.850860] khelper used greatest stack depth: 5952 bytes left
[   84.850921] net_namespace: 1832 bytes
[   84.851122] NET: Registered protocol family 16
[   84.851378] ACPI: bus type pci registered
[   84.851458] PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255
[   84.851460] PCI: MCFG area at e0000000 reserved in E820
[   84.859888] PCI: Using MMCONFIG at e0000000 - efffffff
[   84.859890] PCI: Using configuration type 1 for base access
[   84.862988] bio: create slab <bio-0> at 0
[   84.863725] ACPI: EC: Look up EC in DSDT
[   84.875150] ACPI: Interpreter enabled
[   84.875152] ACPI: (supports S0 S1 S3 S4 S5)
[   84.875176] ACPI: Using IOAPIC for interrupt routing
[   84.886282] ACPI: No dock devices found.
[   84.886359] ACPI: PCI Root Bridge [PCI0] (0000:00)
[   84.886446] pci 0000:00:00.0: PME# supported from D0 D3hot D3cold
[   84.886449] pci 0000:00:00.0: PME# disabled
[   84.886499] pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
[   84.886502] pci 0000:00:01.0: PME# disabled
[   84.886553] pci 0000:00:03.0: PME# supported from D0 D3hot D3cold
[   84.886556] pci 0000:00:03.0: PME# disabled
[   84.886606] pci 0000:00:05.0: PME# supported from D0 D3hot D3cold
[   84.886609] pci 0000:00:05.0: PME# disabled
[   84.886661] pci 0000:00:07.0: PME# supported from D0 D3hot D3cold
[   84.886664] pci 0000:00:07.0: PME# disabled
[   84.886714] pci 0000:00:08.0: PME# supported from D0 D3hot D3cold
[   84.886717] pci 0000:00:08.0: PME# disabled
[   84.886768] pci 0000:00:09.0: PME# supported from D0 D3hot D3cold
[   84.886771] pci 0000:00:09.0: PME# disabled
[   84.887421] pci 0000:00:16.0: reg 10 64bit mmio: [0xfaff0000-0xfaff3fff]
[   84.887476] pci 0000:00:16.1: reg 10 64bit mmio: [0xfafec000-0xfafeffff]
[   84.887531] pci 0000:00:16.2: reg 10 64bit mmio: [0xfafe8000-0xfafebfff]
[   84.887586] pci 0000:00:16.3: reg 10 64bit mmio: [0xfafe4000-0xfafe7fff]
[   84.887640] pci 0000:00:16.4: reg 10 64bit mmio: [0xfafe0000-0xfafe3fff]
[   84.887695] pci 0000:00:16.5: reg 10 64bit mmio: [0xfafdc000-0xfafdffff]
[   84.887751] pci 0000:00:16.6: reg 10 64bit mmio: [0xfafd8000-0xfafdbfff]
[   84.887806] pci 0000:00:16.7: reg 10 64bit mmio: [0xfafd4000-0xfafd7fff]
[   84.887882] pci 0000:00:1a.0: reg 20 io port: [0xa800-0xa81f]
[   84.887945] pci 0000:00:1a.1: reg 20 io port: [0xa480-0xa49f]
[   84.888008] pci 0000:00:1a.2: reg 20 io port: [0xa400-0xa41f]
[   84.888075] pci 0000:00:1a.7: reg 10 32bit mmio: [0xfaff6000-0xfaff63ff]
[   84.888126] pci 0000:00:1a.7: PME# supported from D0 D3hot D3cold
[   84.888130] pci 0000:00:1a.7: PME# disabled
[   84.888166] pci 0000:00:1b.0: reg 10 64bit mmio: [0xfaff8000-0xfaffbfff]
[   84.888204] pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold
[   84.888208] pci 0000:00:1b.0: PME# disabled
[   84.888254] pci 0000:00:1d.0: reg 20 io port: [0xb000-0xb01f]
[   84.888318] pci 0000:00:1d.1: reg 20 io port: [0xac00-0xac1f]
[   84.888382] pci 0000:00:1d.2: reg 20 io port: [0xa880-0xa89f]
[   84.888450] pci 0000:00:1d.7: reg 10 32bit mmio: [0xfaffc000-0xfaffc3ff]
[   84.888500] pci 0000:00:1d.7: PME# supported from D0 D3hot D3cold
[   84.888504] pci 0000:00:1d.7: PME# disabled
[   84.888618] pci 0000:00:1f.0: Force enabled HPET at 0xfed00000
[   84.888622] pci 0000:00:1f.0: quirk: region 0800-087f claimed by ICH6 ACPI/GPIO/TCO
[   84.888626] pci 0000:00:1f.0: quirk: region 0500-053f claimed by ICH6 GPIO
[   84.888629] pci 0000:00:1f.0: ICH7 LPC Generic IO decode 1 PIO at 164c (mask 0007)
[   84.888632] pci 0000:00:1f.0: ICH7 LPC Generic IO decode 2 PIO at 03e8 (mask 0007)
[   84.888636] pci 0000:00:1f.0: ICH7 LPC Generic IO decode 3 PIO at 0290 (mask 001f)
[   84.888639] pci 0000:00:1f.0: ICH7 LPC Generic IO decode 4 PIO at 0ca0 (mask 000f)
[   84.888686] pci 0000:00:1f.2: reg 10 io port: [0xcc00-0xcc07]
[   84.888691] pci 0000:00:1f.2: reg 14 io port: [0xc880-0xc883]
[   84.888696] pci 0000:00:1f.2: reg 18 io port: [0xc800-0xc807]
[   84.888701] pci 0000:00:1f.2: reg 1c io port: [0xc480-0xc483]
[   84.888706] pci 0000:00:1f.2: reg 20 io port: [0xc400-0xc40f]
[   84.888711] pci 0000:00:1f.2: reg 24 io port: [0xc080-0xc08f]
[   84.888755] pci 0000:00:1f.3: reg 10 64bit mmio: [0xfaffe000-0xfaffe0ff]
[   84.888767] pci 0000:00:1f.3: reg 20 io port: [0x400-0x41f]
[   84.888807] pci 0000:00:1f.5: reg 10 io port: [0xbc00-0xbc07]
[   84.888812] pci 0000:00:1f.5: reg 14 io port: [0xb880-0xb883]
[   84.888817] pci 0000:00:1f.5: reg 18 io port: [0xb800-0xb807]
[   84.888822] pci 0000:00:1f.5: reg 1c io port: [0xb480-0xb483]
[   84.888827] pci 0000:00:1f.5: reg 20 io port: [0xb400-0xb40f]
[   84.888832] pci 0000:00:1f.5: reg 24 io port: [0xb080-0xb08f]
[   84.888895] pci 0000:07:00.0: reg 10 32bit mmio: [0xfbee0000-0xfbefffff]
[   84.888901] pci 0000:07:00.0: reg 14 32bit mmio: [0xfbec0000-0xfbedffff]
[   84.888906] pci 0000:07:00.0: reg 18 io port: [0xec00-0xec1f]
[   84.888912] pci 0000:07:00.0: reg 1c 32bit mmio: [0xfbebc000-0xfbebffff]
[   84.888926] pci 0000:07:00.0: reg 30 32bit mmio: [0xfbe80000-0xfbe9ffff]
[   84.888953] pci 0000:07:00.0: PME# supported from D0 D3hot D3cold
[   84.888957] pci 0000:07:00.0: PME# disabled
[   84.889007] pci 0000:07:00.1: reg 10 32bit mmio: [0xfbe60000-0xfbe7ffff]
[   84.889013] pci 0000:07:00.1: reg 14 32bit mmio: [0xfbe40000-0xfbe5ffff]
[   84.889019] pci 0000:07:00.1: reg 18 io port: [0xe880-0xe89f]
[   84.889024] pci 0000:07:00.1: reg 1c 32bit mmio: [0xfbeb8000-0xfbebbfff]
[   84.889038] pci 0000:07:00.1: reg 30 32bit mmio: [0xfbe20000-0xfbe3ffff]
[   84.889065] pci 0000:07:00.1: PME# supported from D0 D3hot D3cold
[   84.889069] pci 0000:07:00.1: PME# disabled
[   84.889118] pci 0000:00:01.0: bridge io port: [0xe000-0xefff]
[   84.889121] pci 0000:00:01.0: bridge 32bit mmio: [0xfbe00000-0xfbefffff]
[   84.889224] pci 0000:03:00.0: reg 10 io port: [0xd000-0xd0ff]
[   84.889233] pci 0000:03:00.0: reg 14 64bit mmio: [0xfbdfc000-0xfbdfffff]
[   84.889242] pci 0000:03:00.0: reg 1c 64bit mmio: [0xfbde0000-0xfbdeffff]
[   84.889251] pci 0000:03:00.0: reg 30 32bit mmio: [0xfba00000-0xfbbfffff]
[   84.889271] pci 0000:03:00.0: supports D1 D2
[   84.889299] pci 0000:00:08.0: bridge io port: [0xd000-0xdfff]
[   84.889302] pci 0000:00:08.0: bridge 32bit mmio: [0xfba00000-0xfbdfffff]
[   84.889364] pci 0000:01:03.0: reg 10 32bit mmio: [0xf9800000-0xf9ffffff]
[   84.889370] pci 0000:01:03.0: reg 14 32bit mmio: [0xfb9fc000-0xfb9fffff]
[   84.889376] pci 0000:01:03.0: reg 18 32bit mmio: [0xfb000000-0xfb7fffff]
[   84.889448] pci 0000:00:1e.0: transparent bridge
[   84.889453] pci 0000:00:1e.0: bridge 32bit mmio: [0xfb000000-0xfb9fffff]
[   84.889459] pci 0000:00:1e.0: bridge 64bit mmio pref: [0xf9800000-0xf9ffffff]
[   84.889486] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[   84.889798] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P1._PRT]
[   84.889925] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.NPE1._PRT]
[   84.889991] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.NPE3._PRT]
[   84.890056] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.NPE5._PRT]
[   84.890121] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.NPE7._PRT]
[   84.890186] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.NPE9._PRT]
[   84.901004] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 6 7 *10 11 12 14 15)
[   84.901147] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 6 7 *10 11 12 14 15)
[   84.901285] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 6 7 10 11 12 14 *15)
[   84.901425] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 6 7 10 *11 12 14 15)
[   84.901564] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 6 7 10 11 12 14 15) *0, disabled.
[   84.901702] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 *6 7 10 11 12 14 15)
[   84.901843] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 6 *7 10 11 12 14 15)
[   84.901980] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 6 7 10 11 12 *14 15)
[   84.902198] SCSI subsystem initialized
[   84.902293] libata version 3.00 loaded.
[   84.902392] usbcore: registered new interface driver usbfs
[   84.902418] usbcore: registered new interface driver hub
[   84.902454] usbcore: registered new device driver usb
[   84.902601] PCI: Using ACPI for IRQ routing
[   84.906718] NetLabel: Initializing
[   84.906720] NetLabel:  domain hash size = 128
[   84.906722] NetLabel:  protocols = UNLABELED CIPSOv4
[   84.906736] NetLabel:  unlabeled traffic allowed by default
[   84.906875] hpet clockevent registered
[   84.906879] HPET: 4 timers in total, 0 timers will be used for per-cpu timer
[   84.906883] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0
[   84.906887] hpet0: 4 comparators, 64-bit 14.318180 MHz counter
[   84.914709] Switched to high resolution mode on CPU 0
[   84.914986] Switched to high resolution mode on CPU 1
[   84.914989] Switched to high resolution mode on CPU 3
[   84.914992] Switched to high resolution mode on CPU 2
[   84.915349] Switched to high resolution mode on CPU 4
[   84.915434] Switched to high resolution mode on CPU 7
[   84.915437] Switched to high resolution mode on CPU 5
[   84.915439] Switched to high resolution mode on CPU 6
[   84.918707] pnp: PnP ACPI init
[   84.918716] ACPI: bus type pnp registered
[   84.923466] pnp: PnP ACPI: found 15 devices
[   84.923468] ACPI: ACPI bus type pnp unregistered
[   84.923476] system 00:01: iomem range 0xfbf00000-0xfbffffff has been reserved
[   84.923478] system 00:01: iomem range 0xfc000000-0xfcffffff has been reserved
[   84.923481] system 00:01: iomem range 0xfd000000-0xfdffffff has been reserved
[   84.923483] system 00:01: iomem range 0xfe000000-0xfebfffff has been reserved
[   84.923489] system 00:09: ioport range 0x164e-0x164f has been reserved
[   84.923493] system 00:0a: ioport range 0x680-0x6ff has been reserved
[   84.923495] system 00:0a: ioport range 0x295-0x296 has been reserved
[   84.923500] system 00:0b: ioport range 0x4d0-0x4d1 has been reserved
[   84.923502] system 00:0b: ioport range 0x800-0x87f has been reserved
[   84.923505] system 00:0b: ioport range 0x500-0x57f could not be reserved
[   84.923507] system 00:0b: iomem range 0xfed1c000-0xfed1ffff has been reserved
[   84.923510] system 00:0b: iomem range 0xfed20000-0xfed3ffff has been reserved
[   84.923512] system 00:0b: iomem range 0xfed40000-0xfed8ffff has been reserved
[   84.923517] system 00:0c: iomem range 0xfec00000-0xfec00fff could not be reserved
[   84.923519] system 00:0c: iomem range 0xfee00000-0xfee00fff has been reserved
[   84.923523] system 00:0d: iomem range 0xe0000000-0xefffffff has been reserved
[   84.923528] system 00:0e: iomem range 0x0-0x9ffff could not be reserved
[   84.923530] system 00:0e: iomem range 0xc0000-0xcffff has been reserved
[   84.923532] system 00:0e: iomem range 0xe0000-0xfffff could not be reserved
[   84.923535] system 00:0e: iomem range 0x100000-0xbf8fffff could not be reserved
[   84.923537] system 00:0e: iomem range 0xfed90000-0xffffffff could not be reserved
[   84.928413] pci 0000:00:01.0: PCI bridge, secondary bus 0000:07
[   84.928416] pci 0000:00:01.0:   IO window: 0xe000-0xefff
[   84.928419] pci 0000:00:01.0:   MEM window: 0xfbe00000-0xfbefffff
[   84.928422] pci 0000:00:01.0:   PREFETCH window: disabled
[   84.928427] pci 0000:00:03.0: PCI bridge, secondary bus 0000:06
[   84.928428] pci 0000:00:03.0:   IO window: disabled
[   84.928432] pci 0000:00:03.0:   MEM window: disabled
[   84.928434] pci 0000:00:03.0:   PREFETCH window: disabled
[   84.928439] pci 0000:00:05.0: PCI bridge, secondary bus 0000:05
[   84.928440] pci 0000:00:05.0:   IO window: disabled
[   84.928443] pci 0000:00:05.0:   MEM window: disabled
[   84.928446] pci 0000:00:05.0:   PREFETCH window: disabled
[   84.928450] pci 0000:00:07.0: PCI bridge, secondary bus 0000:04
[   84.928452] pci 0000:00:07.0:   IO window: disabled
[   84.928455] pci 0000:00:07.0:   MEM window: disabled
[   84.928458] pci 0000:00:07.0:   PREFETCH window: disabled
[   84.928462] pci 0000:00:08.0: PCI bridge, secondary bus 0000:03
[   84.928465] pci 0000:00:08.0:   IO window: 0xd000-0xdfff
[   84.928468] pci 0000:00:08.0:   MEM window: 0xfba00000-0xfbdfffff
[   84.928471] pci 0000:00:08.0:   PREFETCH window: disabled
[   84.928475] pci 0000:00:09.0: PCI bridge, secondary bus 0000:02
[   84.928477] pci 0000:00:09.0:   IO window: disabled
[   84.928480] pci 0000:00:09.0:   MEM window: disabled
[   84.928483] pci 0000:00:09.0:   PREFETCH window: disabled
[   84.928488] pci 0000:00:1e.0: PCI bridge, secondary bus 0000:01
[   84.928489] pci 0000:00:1e.0:   IO window: disabled
[   84.928493] pci 0000:00:1e.0:   MEM window: 0xfb000000-0xfb9fffff
[   84.928497] pci 0000:00:1e.0:   PREFETCH window: 0x000000f9800000-0x000000f9ffffff
[   84.928507] pci 0000:00:01.0: setting latency timer to 64
[   84.928513] pci 0000:00:03.0: setting latency timer to 64
[   84.928519] pci 0000:00:05.0: setting latency timer to 64
[   84.928525] pci 0000:00:07.0: setting latency timer to 64
[   84.928530] pci 0000:00:08.0: setting latency timer to 64
[   84.928536] pci 0000:00:09.0: setting latency timer to 64
[   84.928541] pci 0000:00:1e.0: setting latency timer to 64
[   84.928545] pci_bus 0000:00: resource 0 io:  [0x00-0xffff]
[   84.928547] pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffffffffffff]
[   84.928549] pci_bus 0000:07: resource 0 io:  [0xe000-0xefff]
[   84.928551] pci_bus 0000:07: resource 1 mem: [0xfbe00000-0xfbefffff]
[   84.928553] pci_bus 0000:03: resource 0 io:  [0xd000-0xdfff]
[   84.928555] pci_bus 0000:03: resource 1 mem: [0xfba00000-0xfbdfffff]
[   84.928557] pci_bus 0000:01: resource 1 mem: [0xfb000000-0xfb9fffff]
[   84.928559] pci_bus 0000:01: resource 2 pref mem [0xf9800000-0xf9ffffff]
[   84.928561] pci_bus 0000:01: resource 3 io:  [0x00-0xffff]
[   84.928563] pci_bus 0000:01: resource 4 mem: [0x000000-0xffffffffffffffff]
[   84.928585] NET: Registered protocol family 2
[   84.934998] IP route cache hash table entries: 524288 (order: 10, 4194304 bytes)
[   84.936092] TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
[   84.937542] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[   84.937697] TCP: Hash tables configured (established 524288 bind 65536)
[   84.937699] TCP reno registered
[   84.940771] NET: Registered protocol family 1
[   84.942882] Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[   84.942884] Scanning for low memory corruption every 60 seconds
[   84.950334] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[   84.952092] VFS: Disk quotas dquot_6.5.2
[   84.952152] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[   84.952558] msgmni has been set to 32768
[   84.952860] alg: No test for stdrng (krng)
[   84.952867] io scheduler noop registered
[   84.952869] io scheduler anticipatory registered
[   84.952870] io scheduler deadline registered
[   84.952917] io scheduler cfq registered (default)
[   84.952957] pci 0000:00:1a.0: uhci_check_and_reset_hc: legsup = 0x0f30
[   84.952959] pci 0000:00:1a.0: Performing full reset
[   84.952973] pci 0000:00:1a.1: uhci_check_and_reset_hc: legsup = 0x0030
[   84.952975] pci 0000:00:1a.1: Performing full reset
[   84.952989] pci 0000:00:1a.2: uhci_check_and_reset_hc: legsup = 0x0030
[   84.952990] pci 0000:00:1a.2: Performing full reset
[   84.953023] pci 0000:00:1d.0: uhci_check_and_reset_hc: legsup = 0x0f30
[   84.953025] pci 0000:00:1d.0: Performing full reset
[   84.953039] pci 0000:00:1d.1: uhci_check_and_reset_hc: legsup = 0x0030
[   84.953040] pci 0000:00:1d.1: Performing full reset
[   84.953054] pci 0000:00:1d.2: uhci_check_and_reset_hc: legsup = 0x0030
[   84.953055] pci 0000:00:1d.2: Performing full reset
[   84.953094] pci 0000:01:03.0: Boot video device
[   84.953246]   alloc irq_desc for 24 on cpu 0 node 0
[   84.953248]   alloc kstat_irqs on cpu 0 node 0
[   84.953255] pcieport-driver 0000:00:01.0: irq 24 for MSI/MSI-X
[   84.953264] pcieport-driver 0000:00:01.0: setting latency timer to 64
[   84.953407]   alloc irq_desc for 25 on cpu 0 node 0
[   84.953408]   alloc kstat_irqs on cpu 0 node 0
[   84.953414] pcieport-driver 0000:00:03.0: irq 25 for MSI/MSI-X
[   84.953423] pcieport-driver 0000:00:03.0: setting latency timer to 64
[   84.953560]   alloc irq_desc for 26 on cpu 0 node 0
[   84.953561]   alloc kstat_irqs on cpu 0 node 0
[   84.953567] pcieport-driver 0000:00:05.0: irq 26 for MSI/MSI-X
[   84.953576] pcieport-driver 0000:00:05.0: setting latency timer to 64
[   84.953716]   alloc irq_desc for 27 on cpu 0 node 0
[   84.953718]   alloc kstat_irqs on cpu 0 node 0
[   84.953723] pcieport-driver 0000:00:07.0: irq 27 for MSI/MSI-X
[   84.953732] pcieport-driver 0000:00:07.0: setting latency timer to 64
[   84.953875]   alloc irq_desc for 28 on cpu 0 node 0
[   84.953877]   alloc kstat_irqs on cpu 0 node 0
[   84.953882] pcieport-driver 0000:00:08.0: irq 28 for MSI/MSI-X
[   84.953891] pcieport-driver 0000:00:08.0: setting latency timer to 64
[   84.954029]   alloc irq_desc for 29 on cpu 0 node 0
[   84.954030]   alloc kstat_irqs on cpu 0 node 0
[   84.954035] pcieport-driver 0000:00:09.0: irq 29 for MSI/MSI-X
[   84.954045] pcieport-driver 0000:00:09.0: setting latency timer to 64
[   84.954158] aer 0000:00:01.0:pcie02: AER service couldn't init device: no _OSC support
[   84.954165] aer 0000:00:03.0:pcie02: AER service couldn't init device: no _OSC support
[   84.954169] aer 0000:00:05.0:pcie02: AER service couldn't init device: no _OSC support
[   84.954173] aer 0000:00:07.0:pcie02: AER service couldn't init device: no _OSC support
[   84.954178] aer 0000:00:08.0:pcie02: AER service couldn't init device: no _OSC support
[   84.954182] aer 0000:00:09.0:pcie02: AER service couldn't init device: no _OSC support
[   84.954202] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[   84.954375] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[   84.954378] ACPI: Power Button [PWRF]
[   84.954439] input: Power Button as /devices/LNXSYSTM:00/device:00/PNP0C0C:00/input/input1
[   84.954444] ACPI: Power Button [PWRB]
[   84.955031] ACPI: SSDT 00000000bf79e1f0 0033D (v01 DpgPmm  P001Ist 00000011 INTL 20051117)
[   84.956383] processor ACPI_CPU:00: registered as cooling_device0
[   84.956842] ACPI: SSDT 00000000bf79e530 0033D (v01 DpgPmm  P002Ist 00000012 INTL 20051117)
[   84.958186] processor ACPI_CPU:01: registered as cooling_device1
[   84.958653] ACPI: SSDT 00000000bf79e870 0033D (v01 DpgPmm  P003Ist 00000012 INTL 20051117)
[   84.959993] processor ACPI_CPU:02: registered as cooling_device2
[   84.960456] ACPI: SSDT 00000000bf79ebb0 0033D (v01 DpgPmm  P004Ist 00000012 INTL 20051117)
[   84.961798] processor ACPI_CPU:03: registered as cooling_device3
[   84.962260] ACPI: SSDT 00000000bf79eef0 0033D (v01 DpgPmm  P005Ist 00000012 INTL 20051117)
[   84.963601] processor ACPI_CPU:04: registered as cooling_device4
[   84.964063] ACPI: SSDT 00000000bf79f230 0033D (v01 DpgPmm  P006Ist 00000012 INTL 20051117)
[   84.965406] processor ACPI_CPU:05: registered as cooling_device5
[   84.965879] ACPI: SSDT 00000000bf79f570 0033D (v01 DpgPmm  P007Ist 00000012 INTL 20051117)
[   84.967221] processor ACPI_CPU:06: registered as cooling_device6
[   84.967689] ACPI: SSDT 00000000bf79f8b0 0033D (v01 DpgPmm  P008Ist 00000012 INTL 20051117)
[   84.969031] processor ACPI_CPU:07: registered as cooling_device7
[   84.973807] Non-volatile memory driver v1.3
[   84.973809] Linux agpgart interface v0.103
[   84.973870] [drm] Initialized drm 1.1.0 20060810
[   84.973903] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[   85.217227] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[   85.460763] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
[   85.704300] serial8250: ttyS2 at I/O 0x3e8 (irq = 5) is a 16550A
[   85.704545] 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[   85.704680] 00:07: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
[   85.704815] 00:08: ttyS2 at I/O 0x3e8 (irq = 5) is a 16550A
[   85.705139] Driver 'sd' needs updating - please use bus_type methods
[   85.705161] Driver 'sr' needs updating - please use bus_type methods
[   85.705280] ata_piix 0000:00:1f.2: version 2.13
[   85.705289]   alloc irq_desc for 19 on cpu 0 node 0
[   85.705290]   alloc kstat_irqs on cpu 0 node 0
[   85.705295] ata_piix 0000:00:1f.2: PCI INT B -> GSI 19 (level, low) -> IRQ 19
[   85.705299] ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
[   85.705327] ata_piix 0000:00:1f.2: setting latency timer to 64
[   85.705367] scsi0 : ata_piix
[   85.705448] scsi1 : ata_piix
[   85.706892] ata1: SATA max UDMA/133 cmd 0xcc00 ctl 0xc880 bmdma 0xc400 irq 19
[   85.706896] ata2: SATA max UDMA/133 cmd 0xc800 ctl 0xc480 bmdma 0xc408 irq 19
[   85.706903] work_for_cpu used greatest stack depth: 5120 bytes left
[   85.706939] ata_piix 0000:00:1f.5: PCI INT B -> GSI 19 (level, low) -> IRQ 19
[   85.706943] ata_piix 0000:00:1f.5: MAP [ P0 -- P1 -- ]
[   85.706967] ata_piix 0000:00:1f.5: setting latency timer to 64
[   85.706991] scsi2 : ata_piix
[   85.707059] scsi3 : ata_piix
[   85.708352] ata3: SATA max UDMA/133 cmd 0xbc00 ctl 0xb880 bmdma 0xb400 irq 19
[   85.708355] ata4: SATA max UDMA/133 cmd 0xb800 ctl 0xb480 bmdma 0xb408 irq 19
[   85.708491] Intel(R) Gigabit Ethernet Network Driver - version 1.3.16-k2
[   85.708493] Copyright (c) 2007-2009 Intel Corporation.
[   85.708520]   alloc irq_desc for 16 on cpu 0 node 0
[   85.708522]   alloc kstat_irqs on cpu 0 node 0
[   85.708526] igb 0000:07:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[   85.708543] igb 0000:07:00.0: setting latency timer to 64
[   85.708777]   alloc irq_desc for 30 on cpu 0 node 0
[   85.708779]   alloc kstat_irqs on cpu 0 node 0
[   85.708782] igb 0000:07:00.0: irq 30 for MSI/MSI-X
[   85.708783]   alloc irq_desc for 31 on cpu 0 node 0
[   85.708785]   alloc kstat_irqs on cpu 0 node 0
[   85.708787] igb 0000:07:00.0: irq 31 for MSI/MSI-X
[   85.708789]   alloc irq_desc for 32 on cpu 0 node 0
[   85.708790]   alloc kstat_irqs on cpu 0 node 0
[   85.708792] igb 0000:07:00.0: irq 32 for MSI/MSI-X
[   85.708794]   alloc irq_desc for 33 on cpu 0 node 0
[   85.708795]   alloc kstat_irqs on cpu 0 node 0
[   85.708798] igb 0000:07:00.0: irq 33 for MSI/MSI-X
[   85.708799]   alloc irq_desc for 34 on cpu 0 node 0
[   85.708801]   alloc kstat_irqs on cpu 0 node 0
[   85.708803] igb 0000:07:00.0: irq 34 for MSI/MSI-X
[   85.708805]   alloc irq_desc for 35 on cpu 0 node 0
[   85.708806]   alloc kstat_irqs on cpu 0 node 0
[   85.708808] igb 0000:07:00.0: irq 35 for MSI/MSI-X
[   85.708810]   alloc irq_desc for 36 on cpu 0 node 0
[   85.708812]   alloc kstat_irqs on cpu 0 node 0
[   85.708814] igb 0000:07:00.0: irq 36 for MSI/MSI-X
[   85.708815]   alloc irq_desc for 37 on cpu 0 node 0
[   85.708817]   alloc kstat_irqs on cpu 0 node 0
[   85.708819] igb 0000:07:00.0: irq 37 for MSI/MSI-X
[   85.708821]   alloc irq_desc for 38 on cpu 0 node 0
[   85.708822]   alloc kstat_irqs on cpu 0 node 0
[   85.708824] igb 0000:07:00.0: irq 38 for MSI/MSI-X
[   85.880267] igb 0000:07:00.0: Intel(R) Gigabit Ethernet Network Connection
[   85.880270] igb 0000:07:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:30:48:c8:61:b4
[   85.880346] igb 0000:07:00.0: eth0: PBA No: 0100ff-0ff
[   85.880348] igb 0000:07:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
[   85.880377]   alloc irq_desc for 17 on cpu 0 node 0
[   85.880379]   alloc kstat_irqs on cpu 0 node 0
[   85.880383] igb 0000:07:00.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
[   85.880400] igb 0000:07:00.1: setting latency timer to 64
[   85.880622]   alloc irq_desc for 39 on cpu 0 node 0
[   85.880624]   alloc kstat_irqs on cpu 0 node 0
[   85.880626] igb 0000:07:00.1: irq 39 for MSI/MSI-X
[   85.880628]   alloc irq_desc for 40 on cpu 0 node 0
[   85.880629]   alloc kstat_irqs on cpu 0 node 0
[   85.880632] igb 0000:07:00.1: irq 40 for MSI/MSI-X
[   85.880633]   alloc irq_desc for 41 on cpu 0 node 0
[   85.880635]   alloc kstat_irqs on cpu 0 node 0
[   85.880637] igb 0000:07:00.1: irq 41 for MSI/MSI-X
[   85.880639]   alloc irq_desc for 42 on cpu 0 node 0
[   85.880640]   alloc kstat_irqs on cpu 0 node 0
[   85.880642] igb 0000:07:00.1: irq 42 for MSI/MSI-X
[   85.880644]   alloc irq_desc for 43 on cpu 0 node 0
[   85.880646]   alloc kstat_irqs on cpu 0 node 0
[   85.880648] igb 0000:07:00.1: irq 43 for MSI/MSI-X
[   85.880650]   alloc irq_desc for 44 on cpu 0 node 0
[   85.880651]   alloc kstat_irqs on cpu 0 node 0
[   85.880653] igb 0000:07:00.1: irq 44 for MSI/MSI-X
[   85.880655]   alloc irq_desc for 45 on cpu 0 node 0
[   85.880656]   alloc kstat_irqs on cpu 0 node 0
[   85.880659] igb 0000:07:00.1: irq 45 for MSI/MSI-X
[   85.880660]   alloc irq_desc for 46 on cpu 0 node 0
[   85.880662]   alloc kstat_irqs on cpu 0 node 0
[   85.880664] igb 0000:07:00.1: irq 46 for MSI/MSI-X
[   85.880666]   alloc irq_desc for 47 on cpu 0 node 0
[   85.880667]   alloc kstat_irqs on cpu 0 node 0
[   85.880669] igb 0000:07:00.1: irq 47 for MSI/MSI-X
[   86.023331] ata3: SATA link down (SStatus 0 SControl 300)
[   86.038966] igb 0000:07:00.1: Intel(R) Gigabit Ethernet Network Connection
[   86.038969] igb 0000:07:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:30:48:c8:61:b5
[   86.039045] igb 0000:07:00.1: eth1: PBA No: 0100ff-0ff
[   86.039047] igb 0000:07:00.1: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
[   86.039080] Fusion MPT base driver 3.04.07
[   86.039082] Copyright (c) 1999-2008 LSI Corporation
[   86.039087] Fusion MPT SAS Host driver 3.04.07
[   86.039117] mptsas 0000:03:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[   86.039142] mptbase: ioc0: Initiating bringup
[   86.316043] ioc0: LSISAS1068E B3: Capabilities={Initiator}
[   86.316060] mptsas 0000:03:00.0: setting latency timer to 64
[   86.475797] ata1.00: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   86.475808] ata1.01: SATA link down (SStatus 0 SControl 300)
[   86.494922] ata1.00: ATAPI: DV-28S-V, 1.0A, max UDMA/100
[   86.532861] ata1.00: configured for UDMA/100
[   86.582976] scsi 0:0:0:0: CD-ROM            TEAC     DV-28S-V         1.0A PQ: 0 ANSI: 5
[   86.591721] sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray
[   86.591724] Uniform CD-ROM driver Revision: 3.20
[   86.591837] sr 0:0:0:0: Attached scsi CD-ROM sr0
[   86.591899] sr 0:0:0:0: Attached scsi generic sg0 type 5
[   87.211097] ata2.00: SATA link down (SStatus 0 SControl 300)
[   87.211105] ata2.01: SATA link down (SStatus 0 SControl 300)
[   87.525474] ata4: SATA link down (SStatus 0 SControl 300)
[   87.589626] async/0 used greatest stack depth: 4992 bytes left
[   98.422283] scsi4 : ioc0: LSISAS1068E B3, FwRev=011a0000h, Ports=1, MaxQ=478, IRQ=16
[   98.452638] scsi 4:0:0:0: Direct-Access     ATA      WDC WD2002FYPS-0 5G04 PQ: 0 ANSI: 5
[   98.453126] sd 4:0:0:0: Attached scsi generic sg1 type 0
[   98.454592] sd 4:0:0:0: [sda] 3907029168 512-byte hardware sectors: (2.00 TB/1.81 TiB)
[   98.460108] scsi 4:0:1:0: Direct-Access     ATA      WDC WD2002FYPS-0 5G04 PQ: 0 ANSI: 5
[   98.460464] sd 4:0:0:0: [sda] Write Protect is off
[   98.460467] sd 4:0:0:0: [sda] Mode Sense: 73 00 00 08
[   98.460934] sd 4:0:1:0: Attached scsi generic sg2 type 0
[   98.463744] work_for_cpu used greatest stack depth: 4352 bytes left
[   98.463789] Fusion MPT misc device (ioctl) driver 3.04.07
[   98.463817] mptctl: Registered with Fusion MPT base driver
[   98.463818] mptctl: /dev/mptctl @ (major,minor=10,220)
[   98.463905] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[   98.463908] ehci_hcd: block sizes: qh 160 qtd 96 itd 192 sitd 96
[   98.463935]   alloc irq_desc for 18 on cpu 0 node 0
[   98.463937]   alloc kstat_irqs on cpu 0 node 0
[   98.463941] ehci_hcd 0000:00:1a.7: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[   98.463952] ehci_hcd 0000:00:1a.7: setting latency timer to 64
[   98.463955] ehci_hcd 0000:00:1a.7: EHCI Host Controller
[   98.463994] drivers/usb/core/inode.c: creating file 'devices'
[   98.464004] drivers/usb/core/inode.c: creating file '001'
[   98.464028] sd 4:0:1:0: [sdb] 3907029168 512-byte hardware sectors: (2.00 TB/1.81 TiB)
[   98.464039] ehci_hcd 0000:00:1a.7: new USB bus registered, assigned bus number 1
[   98.464046] ehci_hcd 0000:00:1a.7: reset hcs_params 0x103206 dbg=1 cc=3 pcc=2 ordered !ppc ports=6
[   98.464049] ehci_hcd 0000:00:1a.7: reset hcc_params 16871 thresh 7 uframes 1024 64 bit addr
[   98.464062] ehci_hcd 0000:00:1a.7: reset command 080002 (park)=0 ithresh=8 period=1024 Reset HALT
[   98.467956] ehci_hcd 0000:00:1a.7: debug port 1
[   98.467961] ehci_hcd 0000:00:1a.7: cache line size of 32 is not supported
[   98.467963] ehci_hcd 0000:00:1a.7: supports USB remote wakeup
[   98.467973] ehci_hcd 0000:00:1a.7: irq 18, io mem 0xfaff6000
[   98.467977] ehci_hcd 0000:00:1a.7: reset command 080002 (park)=0 ithresh=8 period=1024 Reset HALT
[   98.470980] sd 4:0:1:0: [sdb] Write Protect is off
[   98.470982] sd 4:0:1:0: [sdb] Mode Sense: 73 00 00 08
[   98.471847] ehci_hcd 0000:00:1a.7: init command 010001 (park)=0 ithresh=1 period=1024 RUN
[   98.472001] sd 4:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   98.477215] sd 4:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   98.477940] ehci_hcd 0000:00:1a.7: USB 2.0 started, EHCI 1.00
[   98.477971] usb usb1: default language 0x0409
[   98.477978] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[   98.477981] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   98.477986] usb usb1: Product: EHCI Host Controller
[   98.477987] usb usb1: Manufacturer: Linux 2.6.30.4 ehci_hcd
[   98.477989] usb usb1: SerialNumber: 0000:00:1a.7
[   98.478019] usb usb1: uevent
[   98.478045] usb usb1: usb_probe_device
[   98.478048] usb usb1: configuration #1 chosen from 1 choice
[   98.478053] usb usb1: adding 1-0:1.0 (config #1, interface 0)
[   98.478064] usb 1-0:1.0: uevent
[   98.478087] hub 1-0:1.0: usb_probe_interface
[   98.478089] hub 1-0:1.0: usb_probe_interface - got id
[   98.478090] hub 1-0:1.0: USB hub found
[   98.478095] hub 1-0:1.0: 6 ports detected
[   98.478097] hub 1-0:1.0: standalone hub
[   98.478098] hub 1-0:1.0: no power switching (usb 1.0)
[   98.478099] hub 1-0:1.0: individual port over-current protection
[   98.478101] hub 1-0:1.0: power on to power good time: 20ms
[   98.478104] hub 1-0:1.0: local power source is good
[   98.478105] hub 1-0:1.0: trying to enable port power on non-switchable hub
[   98.478188] drivers/usb/core/inode.c: creating file '001'
[   98.478254]   alloc irq_desc for 23 on cpu 0 node 0
[   98.478256]   alloc kstat_irqs on cpu 0 node 0
[   98.478260] ehci_hcd 0000:00:1d.7: PCI INT A -> GSI 23 (level, low) -> IRQ 23
[   98.478270] ehci_hcd 0000:00:1d.7: setting latency timer to 64
[   98.478273] ehci_hcd 0000:00:1d.7: EHCI Host Controller
[   98.478302] drivers/usb/core/inode.c: creating file '002'
[   98.478341] ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 2
[   98.478347] ehci_hcd 0000:00:1d.7: reset hcs_params 0x103206 dbg=1 cc=3 pcc=2 ordered !ppc ports=6
[   98.478350] ehci_hcd 0000:00:1d.7: reset hcc_params 16871 thresh 7 uframes 1024 64 bit addr
[   98.478363] ehci_hcd 0000:00:1d.7: reset command 080002 (park)=0 ithresh=8 period=1024 Reset HALT
[   98.482239] ehci_hcd 0000:00:1d.7: debug port 1
[   98.482244] ehci_hcd 0000:00:1d.7: cache line size of 32 is not supported
[   98.482246] ehci_hcd 0000:00:1d.7: supports USB remote wakeup
[   98.482256] ehci_hcd 0000:00:1d.7: irq 23, io mem 0xfaffc000
[   98.482260] ehci_hcd 0000:00:1d.7: reset command 080002 (park)=0 ithresh=8 period=1024 Reset HALT
[   98.486132] ehci_hcd 0000:00:1d.7: init command 010001 (park)=0 ithresh=1 period=1024 RUN
[   98.487659]  sda:<6> sdb:<6>ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00
[   98.492924] usb usb2: default language 0x0409
[   98.492931] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
[   98.492934] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   98.492936] usb usb2: Product: EHCI Host Controller
[   98.492938] usb usb2: Manufacturer: Linux 2.6.30.4 ehci_hcd
[   98.492941] usb usb2: SerialNumber: 0000:00:1d.7
[   98.492976] usb usb2: uevent
[   98.493000] usb usb2: usb_probe_device
[   98.493002] usb usb2: configuration #1 chosen from 1 choice
[   98.493006] usb usb2: adding 2-0:1.0 (config #1, interface 0)
[   98.493018] usb 2-0:1.0: uevent
[   98.493040] hub 2-0:1.0: usb_probe_interface
[   98.493042] hub 2-0:1.0: usb_probe_interface - got id
[   98.493043] hub 2-0:1.0: USB hub found
[   98.493047] hub 2-0:1.0: 6 ports detected
[   98.493049] hub 2-0:1.0: standalone hub
[   98.493050] hub 2-0:1.0: no power switching (usb 1.0)
[   98.493051] hub 2-0:1.0: individual port over-current protection
[   98.493053] hub 2-0:1.0: power on to power good time: 20ms
[   98.493055] hub 2-0:1.0: local power source is good
[   98.493057] hub 2-0:1.0: trying to enable port power on non-switchable hub
[   98.493111] drivers/usb/core/inode.c: creating file '001'
[   98.493179] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[   98.493181] ohci_hcd: block sizes: ed 80 td 96
[   98.493209] uhci_hcd: USB Universal Host Controller Interface driver
[   98.493277] uhci_hcd 0000:00:1a.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[   98.493282] uhci_hcd 0000:00:1a.0: setting latency timer to 64
[   98.493284] uhci_hcd 0000:00:1a.0: UHCI Host Controller
[   98.493314] drivers/usb/core/inode.c: creating file '003'
[   98.493349] uhci_hcd 0000:00:1a.0: new USB bus registered, assigned bus number 3
[   98.493354] uhci_hcd 0000:00:1a.0: detected 2 ports
[   98.493358] uhci_hcd 0000:00:1a.0: uhci_check_and_reset_hc: cmd = 0x0000
[   98.493360] uhci_hcd 0000:00:1a.0: Performing full reset
[   98.493372] uhci_hcd 0000:00:1a.0: supports USB remote wakeup
[   98.493377] uhci_hcd 0000:00:1a.0: irq 16, io base 0x0000a800
[   98.493399] usb usb3: default language 0x0409
[   98.493404] usb usb3: New USB device found, idVendor=1d6b, idProduct=0001
[   98.493406] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   98.493408] usb usb3: Product: UHCI Host Controller
[   98.493409] usb usb3: Manufacturer: Linux 2.6.30.4 uhci_hcd
[   98.493411] usb usb3: SerialNumber: 0000:00:1a.0
[   98.493442] usb usb3: uevent
[   98.493463] usb usb3: usb_probe_device
[   98.493465] usb usb3: configuration #1 chosen from 1 choice
[   98.493472] usb usb3: adding 3-0:1.0 (config #1, interface 0)
[   98.493483] usb 3-0:1.0: uevent
[   98.493506] hub 3-0:1.0: usb_probe_interface
[   98.493508] hub 3-0:1.0: usb_probe_interface - got id
[   98.493509] hub 3-0:1.0: USB hub found
[   98.493514] hub 3-0:1.0: 2 ports detected
[   98.493515] hub 3-0:1.0: standalone hub
[   98.493516] hub 3-0:1.0: no power switching (usb 1.0)
[   98.493517] hub 3-0:1.0: individual port over-current protection
[   98.493519] hub 3-0:1.0: power on to power good time: 2ms
[   98.493522] hub 3-0:1.0: local power source is good
[   98.493523] hub 3-0:1.0: trying to enable port power on non-switchable hub
[   98.493561] drivers/usb/core/inode.c: creating file '001'
[   98.493628]   alloc irq_desc for 21 on cpu 0 node 0
[   98.493630]   alloc kstat_irqs on cpu 0 node 0
[   98.493634] uhci_hcd 0000:00:1a.1: PCI INT B -> GSI 21 (level, low) -> IRQ 21
[   98.493639] uhci_hcd 0000:00:1a.1: setting latency timer to 64
[   98.493642] uhci_hcd 0000:00:1a.1: UHCI Host Controller
[   98.493673] drivers/usb/core/inode.c: creating file '004'
[   98.493713] uhci_hcd 0000:00:1a.1: new USB bus registered, assigned bus number 4
[   98.493719] uhci_hcd 0000:00:1a.1: detected 2 ports
[   98.493722] uhci_hcd 0000:00:1a.1: uhci_check_and_reset_hc: cmd = 0x0000
[   98.493724] uhci_hcd 0000:00:1a.1: Performing full reset
[   98.493736] uhci_hcd 0000:00:1a.1: supports USB remote wakeup
[   98.493746] uhci_hcd 0000:00:1a.1: irq 21, io base 0x0000a480
[   98.493771] usb usb4: default language 0x0409
[   98.493775] usb usb4: New USB device found, idVendor=1d6b, idProduct=0001
[   98.493777] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   98.493779] usb usb4: Product: UHCI Host Controller
[   98.493781] usb usb4: Manufacturer: Linux 2.6.30.4 uhci_hcd
[   98.493782] usb usb4: SerialNumber: 0000:00:1a.1
[   98.493808] usb usb4: uevent
[   98.493829] usb usb4: usb_probe_device
[   98.493831] usb usb4: configuration #1 chosen from 1 choice
[   98.493836] usb usb4: adding 4-0:1.0 (config #1, interface 0)
[   98.493848] usb 4-0:1.0: uevent
[   98.493870] hub 4-0:1.0: usb_probe_interface
[   98.493872] hub 4-0:1.0: usb_probe_interface - got id
[   98.493873] hub 4-0:1.0: USB hub found
[   98.493877] hub 4-0:1.0: 2 ports detected
[   98.493879] hub 4-0:1.0: standalone hub
[   98.493880] hub 4-0:1.0: no power switching (usb 1.0)
[   98.493881] hub 4-0:1.0: individual port over-current protection
[   98.493883] hub 4-0:1.0: power on to power good time: 2ms
[   98.493886] hub 4-0:1.0: local power source is good
[   98.493887] hub 4-0:1.0: trying to enable port power on non-switchable hub
[   98.493929] drivers/usb/core/inode.c: creating file '001'
[   98.494000] uhci_hcd 0000:00:1a.2: PCI INT D -> GSI 19 (level, low) -> IRQ 19
[   98.494005] uhci_hcd 0000:00:1a.2: setting latency timer to 64
[   98.494007] uhci_hcd 0000:00:1a.2: UHCI Host Controller
[   98.494036] drivers/usb/core/inode.c: creating file '005'
[   98.494072] uhci_hcd 0000:00:1a.2: new USB bus registered, assigned bus number 5
[   98.494078] uhci_hcd 0000:00:1a.2: detected 2 ports
[   98.494082] uhci_hcd 0000:00:1a.2: uhci_check_and_reset_hc: cmd = 0x0000
[   98.494083] uhci_hcd 0000:00:1a.2: Performing full reset
[   98.494096] uhci_hcd 0000:00:1a.2: supports USB remote wakeup
[   98.494099] uhci_hcd 0000:00:1a.2: irq 19, io base 0x0000a400
[   98.494121] usb usb5: default language 0x0409
[   98.494126] usb usb5: New USB device found, idVendor=1d6b, idProduct=0001
[   98.494128] usb usb5: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   98.494130] usb usb5: Product: UHCI Host Controller
[   98.494131] usb usb5: Manufacturer: Linux 2.6.30.4 uhci_hcd
[   98.494133] usb usb5: SerialNumber: 0000:00:1a.2
[   98.494160] usb usb5: uevent
[   98.494182] usb usb5: usb_probe_device
[   98.494184] usb usb5: configuration #1 chosen from 1 choice
[   98.494189] usb usb5: adding 5-0:1.0 (config #1, interface 0)
[   98.494199] usb 5-0:1.0: uevent
[   98.494221] hub 5-0:1.0: usb_probe_interface
[   98.494223] hub 5-0:1.0: usb_probe_interface - got id
[   98.494224] hub 5-0:1.0: USB hub found
[   98.494228] hub 5-0:1.0: 2 ports detected
[   98.494229] hub 5-0:1.0: standalone hub
[   98.494231] hub 5-0:1.0: no power switching (usb 1.0)
[   98.494232] hub 5-0:1.0: individual port over-current protection
[   98.494233] hub 5-0:1.0: power on to power good time: 2ms
[   98.494236] hub 5-0:1.0: local power source is good
[   98.494238] hub 5-0:1.0: trying to enable port power on non-switchable hub
[   98.494275] drivers/usb/core/inode.c: creating file '001'
[   98.494343] uhci_hcd 0000:00:1d.0: PCI INT A -> GSI 23 (level, low) -> IRQ 23
[   98.494348] uhci_hcd 0000:00:1d.0: setting latency timer to 64
[   98.494351] uhci_hcd 0000:00:1d.0: UHCI Host Controller
[   98.494381] drivers/usb/core/inode.c: creating file '006'
[   98.494420] uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 6
[   98.494425] uhci_hcd 0000:00:1d.0: detected 2 ports
[   98.494429] uhci_hcd 0000:00:1d.0: uhci_check_and_reset_hc: cmd = 0x0000
[   98.494430] uhci_hcd 0000:00:1d.0: Performing full reset
[   98.494443] uhci_hcd 0000:00:1d.0: supports USB remote wakeup
[   98.494447] uhci_hcd 0000:00:1d.0: irq 23, io base 0x0000b000
[   98.494469] usb usb6: default language 0x0409
[   98.494473] usb usb6: New USB device found, idVendor=1d6b, idProduct=0001
[   98.494475] usb usb6: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   98.494477] usb usb6: Product: UHCI Host Controller
[   98.494479] usb usb6: Manufacturer: Linux 2.6.30.4 uhci_hcd
[   98.494480] usb usb6: SerialNumber: 0000:00:1d.0
[   98.494510] usb usb6: uevent
[   98.494531] usb usb6: usb_probe_device
[   98.494533] usb usb6: configuration #1 chosen from 1 choice
[   98.494538] usb usb6: adding 6-0:1.0 (config #1, interface 0)
[   98.494549] usb 6-0:1.0: uevent
[   98.494571] hub 6-0:1.0: usb_probe_interface
[   98.494573] hub 6-0:1.0: usb_probe_interface - got id
[   98.494574] hub 6-0:1.0: USB hub found
[   98.494578] hub 6-0:1.0: 2 ports detected
[   98.494579] hub 6-0:1.0: standalone hub
[   98.494580] hub 6-0:1.0: no power switching (usb 1.0)
[   98.494582] hub 6-0:1.0: individual port over-current protection
[   98.494583] hub 6-0:1.0: power on to power good time: 2ms
[   98.494586] hub 6-0:1.0: local power source is good
[   98.494588] hub 6-0:1.0: trying to enable port power on non-switchable hub
[   98.494627] drivers/usb/core/inode.c: creating file '001'
[   98.494695] uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 19 (level, low) -> IRQ 19
[   98.494700] uhci_hcd 0000:00:1d.1: setting latency timer to 64
[   98.494702] uhci_hcd 0000:00:1d.1: UHCI Host Controller
[   98.494731] drivers/usb/core/inode.c: creating file '007'
[   98.494766] uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 7
[   98.494772] uhci_hcd 0000:00:1d.1: detected 2 ports
[   98.494775] uhci_hcd 0000:00:1d.1: uhci_check_and_reset_hc: cmd = 0x0000
[   98.494777] uhci_hcd 0000:00:1d.1: Performing full reset
[   98.494789] uhci_hcd 0000:00:1d.1: supports USB remote wakeup
[   98.494796] uhci_hcd 0000:00:1d.1: irq 19, io base 0x0000ac00
[   98.494817] usb usb7: default language 0x0409
[   98.494822] usb usb7: New USB device found, idVendor=1d6b, idProduct=0001
[   98.494824] usb usb7: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   98.494826] usb usb7: Product: UHCI Host Controller
[   98.494827] usb usb7: Manufacturer: Linux 2.6.30.4 uhci_hcd
[   98.494829] usb usb7: SerialNumber: 0000:00:1d.1
[   98.494856] usb usb7: uevent
[   98.494878] usb usb7: usb_probe_device
[   98.494880] usb usb7: configuration #1 chosen from 1 choice
[   98.494884] usb usb7: adding 7-0:1.0 (config #1, interface 0)
[   98.494897] usb 7-0:1.0: uevent
[   98.494923] hub 7-0:1.0: usb_probe_interface
[   98.494924] hub 7-0:1.0: usb_probe_interface - got id
[   98.494926] hub 7-0:1.0: USB hub found
[   98.494932] hub 7-0:1.0: 2 ports detected
[   98.494933] hub 7-0:1.0: standalone hub
[   98.494934] hub 7-0:1.0: no power switching (usb 1.0)
[   98.494935] hub 7-0:1.0: individual port over-current protection
[   98.494937] hub 7-0:1.0: power on to power good time: 2ms
[   98.494940] hub 7-0:1.0: local power source is good
[   98.494941] hub 7-0:1.0: trying to enable port power on non-switchable hub
[   98.494982] drivers/usb/core/inode.c: creating file '001'
[   98.495050] uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[   98.495055] uhci_hcd 0000:00:1d.2: setting latency timer to 64
[   98.495057] uhci_hcd 0000:00:1d.2: UHCI Host Controller
[   98.495093] drivers/usb/core/inode.c: creating file '008'
[   98.495129] uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 8
[   98.495135] uhci_hcd 0000:00:1d.2: detected 2 ports
[   98.495138] uhci_hcd 0000:00:1d.2: uhci_check_and_reset_hc: cmd = 0x0000
[   98.495140] uhci_hcd 0000:00:1d.2: Performing full reset
[   98.495152] uhci_hcd 0000:00:1d.2: supports USB remote wakeup
[   98.495156] uhci_hcd 0000:00:1d.2: irq 18, io base 0x0000a880
[   98.495180] usb usb8: default language 0x0409
[   98.495185] usb usb8: New USB device found, idVendor=1d6b, idProduct=0001
[   98.495187] usb usb8: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   98.495189] usb usb8: Product: UHCI Host Controller
[   98.495190] usb usb8: Manufacturer: Linux 2.6.30.4 uhci_hcd
[   98.495192] usb usb8: SerialNumber: 0000:00:1d.2
[   98.495222] usb usb8: uevent
[   98.495243] usb usb8: usb_probe_device
[   98.495245] usb usb8: configuration #1 chosen from 1 choice
[   98.495250] usb usb8: adding 8-0:1.0 (config #1, interface 0)
[   98.495262] usb 8-0:1.0: uevent
[   98.495284] hub 8-0:1.0: usb_probe_interface
[   98.495286] hub 8-0:1.0: usb_probe_interface - got id
[   98.495287] hub 8-0:1.0: USB hub found
[   98.495291] hub 8-0:1.0: 2 ports detected
[   98.495292] hub 8-0:1.0: standalone hub
[   98.495293] hub 8-0:1.0: no power switching (usb 1.0)
[   98.495295] hub 8-0:1.0: individual port over-current protection
[   98.495296] hub 8-0:1.0: power on to power good time: 2ms
[   98.495299] hub 8-0:1.0: local power source is good
[   98.495300] hub 8-0:1.0: trying to enable port power on non-switchable hub
[   98.495340] drivers/usb/core/inode.c: creating file '001'
[   98.495446] usbcore: registered new interface driver usblp
[   98.495449] Initializing USB Mass Storage driver...
[   98.495495] usbcore: registered new interface driver usb-storage
[   98.495497] USB Mass Storage support registered.
[   98.495545] usbcore: registered new interface driver libusual
[   98.495636] PNP: No PS/2 controller found. Probing ports directly.
[   98.497673] serio: i8042 KBD port at 0x60,0x64 irq 1
[   98.497677] serio: i8042 AUX port at 0x60,0x64 irq 12
[   98.497720] mice: PS/2 mouse device common for all mice
[   98.497865] rtc_cmos 00:03: RTC can wake from S4
[   98.497906] rtc_cmos 00:03: rtc core: registered rtc_cmos as rtc0
[   98.497928] rtc0: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
[   98.497977] i801_smbus 0000:00:1f.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[   98.498042] md: raid1 personality registered for level 1
[   98.498147] device-mapper: ioctl: 4.14.0-ioctl (2008-04-23) initialised: dm-devel@redhat.com
[   98.498229] cpuidle: using governor ladder
[   98.498230] cpuidle: using governor menu
[   98.498800] usbcore: registered new interface driver hiddev
[   98.498829] usbcore: registered new interface driver usbhid
[   98.498831] usbhid: v2.6:USB HID core driver
[   98.498857] Netfilter messages via NETLINK v0.30.
[   98.498867] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[   98.499043] ctnetlink v0.93: registering with nfnetlink.
[   98.499488] ip_tables: (C) 2000-2006 Netfilter Core Team
[   98.499515] TCP cubic registered
[   98.499516] Initializing XFRM netlink socket
[   98.499670] NET: Registered protocol family 10
[   98.500695] ip6_tables: (C) 2000-2006 Netfilter Core Team
[   98.500742] IPv6 over IPv4 tunneling driver
[   98.501235] NET: Registered protocol family 17
[   98.505175] PM: Resume from disk failed.
[   98.505183] registered taskstats version 1
[   98.577762] ehci_hcd 0000:00:1a.7: GetStatus port 6 status 001803 POWER sig=j CSC CONNECT
[   98.577766] hub 1-0:1.0: port 6: status 0501 change 0001
[   98.592732] hub 2-0:1.0: state 7 ports 6 chg 0000 evt 0000
[   98.592738] hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0000
[   98.592747] hub 4-0:1.0: state 7 ports 2 chg 0000 evt 0000
[   98.593727] uhci_hcd 0000:00:1a.2: port 2 portsc 008a,00
[   98.593746] hub 6-0:1.0: state 7 ports 2 chg 0000 evt 0000
[   98.594726] hub 7-0:1.0: state 7 ports 2 chg 0000 evt 0000
[   98.594734] hub 8-0:1.0: state 7 ports 2 chg 0000 evt 0000
[   98.677560] hub 1-0:1.0: state 7 ports 6 chg 0040 evt 0000
[   98.677568] hub 1-0:1.0: port 6, status 0501, change 0000, 480 Mb/s
[   98.728621] ehci_hcd 0000:00:1a.7: port 6 full speed --> companion
[   98.728625] ehci_hcd 0000:00:1a.7: GetStatus port 6 status 003801 POWER OWNER sig=j CONNECT
[   98.728629] hub 1-0:1.0: port 6 not reset yet, waiting 50ms
[   98.779367] ehci_hcd 0000:00:1a.7: GetStatus port 6 status 003002 POWER OWNER sig=se0 CSC
[   98.779384] hub 5-0:1.0: state 7 ports 2 chg 0000 evt 0004
[   98.779391] uhci_hcd 0000:00:1a.2: port 2 portsc 0093,00
[   98.779397] hub 5-0:1.0: port 2, status 0101, change 0001, 12 Mb/s
[   98.883171] hub 5-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x101
[   98.883910]  sda1 sda2
[   98.884070] sd 4:0:0:0: [sda] Attached SCSI disk
[   98.893164]  sdb1 sdb2
[   98.893315] sd 4:0:1:0: [sdb] Attached SCSI disk
[   98.909397] md: Waiting for all devices to be available before autodetect
[   98.909400] md: If you don't use raid, use raid=noautodetect
[   98.909505] md: Autodetecting RAID arrays.
[   98.909736] md: Scanned 2 and added 2 devices.
[   98.909738] md: autorun ...
[   98.909739] md: considering sdb2 ...
[   98.909744] md:  adding sdb2 ...
[   98.909748] md:  adding sda2 ...
[   98.909751] md: created md0
[   98.909753] md: bind<sda2>
[   98.909763] md: bind<sdb2>
[   98.909771] md: running: <sdb2><sda2>
[   98.909781] md: kicking non-fresh sdb2 from array!
[   98.909785] md: unbind<sdb2>
[   98.916103] md: export_rdev(sdb2)
[   98.916207] raid1: raid set md0 active with 1 out of 2 mirrors
[   98.916241] md: ... autorun DONE.
[   98.916279]  md0: unknown partition table
[   98.968095] EXT4-fs: barriers enabled
[   98.980525] kjournald2 starting: pid 927, dev md0:8, commit interval 5 seconds
[   98.980530] EXT4-fs: delayed allocation enabled
[   98.980532] EXT4-fs: file extents enabled
[   98.984976] usb 5-2: new full speed USB device using uhci_hcd and address 2
[   99.121464] EXT4-fs: mballoc enabled
[   99.121475] EXT4-fs: mounted filesystem md0 with ordered data mode
[   99.121482] VFS: Mounted root (ext4 filesystem) readonly on device 9:0.
[   99.121499] Freeing unused kernel memory: 524k freed
[   99.121574] Write protecting the kernel read-only data: 5552k
[   99.127891] usb 5-2: skipped 1 descriptor after interface
[   99.127894] usb 5-2: skipped 1 descriptor after interface
[   99.132879] usb 5-2: default language 0x0409
[   99.147855] usb 5-2: New USB device found, idVendor=046b, idProduct=ff10
[   99.147859] usb 5-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   99.147861] usb 5-2: Product: Virtual Keyboard and Mouse
[   99.147864] usb 5-2: Manufacturer: American Megatrends Inc.
[   99.147866] usb 5-2: SerialNumber: serial
[   99.147918] usb 5-2: uevent
[   99.149710] usb 5-2: usb_probe_device
[   99.149713] usb 5-2: configuration #1 chosen from 1 choice
[   99.182790] usb 5-2: adding 5-2:1.0 (config #1, interface 0)
[   99.185783] usb 5-2:1.0: uevent
[   99.185834] usbhid 5-2:1.0: usb_probe_interface
[   99.185836] usbhid 5-2:1.0: usb_probe_interface - got id
[   99.192936] input: American Megatrends Inc. Virtual Keyboard and Mouse as /devices/pci0000:00/0000:00:1a.2/usb5/5-2/5-2:1.0/input/input2
[   99.192944] uhci_hcd 0000:00:1a.2: reserve dev 2 ep81-INT, period 1, phase 0, 17 us
[   99.193017] generic-usb 0003:046B:FF10.0001: input,hidraw0: USB HID v1.10 Keyboard [American Megatrends Inc. Virtual Keyboard and Mouse] on usb-0000:00:1a.2-2/input0
[   99.193062] usb 5-2: adding 5-2:1.1 (config #1, interface 1)
[   99.205745] usb 5-2:1.1: uevent
[   99.205793] usbhid 5-2:1.1: usb_probe_interface
[   99.205795] usbhid 5-2:1.1: usb_probe_interface - got id
[   99.209865] input: American Megatrends Inc. Virtual Keyboard and Mouse as /devices/pci0000:00/0000:00:1a.2/usb5/5-2/5-2:1.1/input/input3
[   99.209968] generic-usb 0003:046B:FF10.0002: input,hidraw1: USB HID v1.10 Mouse [American Megatrends Inc. Virtual Keyboard and Mouse] on usb-0000:00:1a.2-2/input1
[   99.210012] drivers/usb/core/inode.c: creating file '002'
[   99.210056] hub 1-0:1.0: state 7 ports 6 chg 0000 evt 0040
[   99.210062] hub 5-0:1.0: state 7 ports 2 chg 0000 evt 0004
[   99.707173] usb usb3: uevent
[   99.707194] usb 3-0:1.0: uevent
[   99.707295] usb usb4: uevent
[   99.707314] usb 4-0:1.0: uevent
[   99.707410] usb usb5: uevent
[   99.707424] usb 5-0:1.0: uevent
[   99.707449] usb 5-2: uevent
[   99.707462] usb 5-2:1.0: uevent
[   99.707548] usb 5-2:1.1: uevent
[   99.707683] usb usb1: uevent
[   99.707701] usb 1-0:1.0: uevent
[   99.707778] usb usb6: uevent
[   99.707792] usb 6-0:1.0: uevent
[   99.707885] usb usb7: uevent
[   99.707904] usb 7-0:1.0: uevent
[   99.708001] usb usb8: uevent
[   99.708019] usb 8-0:1.0: uevent
[   99.708113] usb usb2: uevent
[   99.708132] usb 2-0:1.0: uevent
[   99.739546] usb usb3: suspend_rh (auto-stop)
[   99.739567] usb usb4: suspend_rh (auto-stop)
[   99.739586] usb usb6: suspend_rh (auto-stop)
[   99.739604] usb usb7: suspend_rh (auto-stop)
[   99.739623] usb usb8: suspend_rh (auto-stop)
[   99.746101] usb 5-2:1.1: uevent
[   99.746134] usb 5-2:1.0: uevent
[   99.746172] usb 5-2: uevent
[   99.746201] usb 5-2: uevent
[   99.746705] usb 5-2:1.1: uevent
[   99.746805] usb 5-2: uevent
[  100.527651] EXT4 FS on md0, internal journal on md0:8
[  100.972295] hub 2-0:1.0: hub_suspend
[  100.972303] usb usb2: bus auto-suspend
[  100.972306] ehci_hcd 0000:00:1d.7: suspend root hub
[  100.972328] hub 3-0:1.0: hub_suspend
[  100.972331] usb usb3: bus auto-suspend
[  100.972334] usb usb3: suspend_rh
[  100.972355] hub 4-0:1.0: hub_suspend
[  100.972357] usb usb4: bus auto-suspend
[  100.972359] usb usb4: suspend_rh
[  100.972372] hub 6-0:1.0: hub_suspend
[  100.972374] usb usb6: bus auto-suspend
[  100.972376] usb usb6: suspend_rh
[  100.972388] hub 7-0:1.0: hub_suspend
[  100.972391] usb usb7: bus auto-suspend
[  100.972392] usb usb7: suspend_rh
[  100.972405] hub 8-0:1.0: hub_suspend
[  100.972407] usb usb8: bus auto-suspend
[  100.972408] usb usb8: suspend_rh
[  101.293072] loop: module loaded
[  101.985289] hub 1-0:1.0: hub_suspend
[  101.985297] usb usb1: bus auto-suspend
[  101.985300] ehci_hcd 0000:00:1a.7: suspend root hub
[  102.186946] ifup used greatest stack depth: 4208 bytes left
[  102.925932] igb: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
[  102.926629] ADDRCONF(NETDEV_UP): eth0: link is not ready
[  102.928012] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[  113.633136] eth0: no IPv6 routers present
[ 1663.180526] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input4
[ 3722.685859] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 3724.024295] igb: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
[ 3724.025611] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 3734.660305] eth0: no IPv6 routers present
[ 4207.188281] md: bind<sdb2>
[ 4207.232812] RAID1 conf printout:
[ 4207.232815]  --- wd:1 rd:2
[ 4207.232818]  disk 0, wo:0, o:1, dev:sda2
[ 4207.232820]  disk 1, wo:1, o:1, dev:sdb2
[ 4207.232857] md: recovery of RAID array md0
[ 4207.232859] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 4207.232861] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 4207.232865] md: using 128k window, over a total of 1953504704 blocks.
[41031.008718] md: md0: recovery done.
[41031.147571] RAID1 conf printout:
[41031.147573]  --- wd:2 rd:2
[41031.147575]  disk 0, wo:0, o:1, dev:sda2
[41031.147577]  disk 1, wo:0, o:1, dev:sdb2
[403024.734049] sum used greatest stack depth: 4072 bytes left
[1328327.346684] tar used greatest stack depth: 4056 bytes left
[1367518.760440] pdflush used greatest stack depth: 3776 bytes left
[2256003.055451] end_request: I/O error, dev sdb, sector 3907028974
[2256003.055674] md: super_written gets error=-5, uptodate=0
[2256003.055677] raid1: Disk failure on sdb2, disabling device.
[2256003.055678] raid1: Operation continuing on 1 devices.
[2256003.437315] RAID1 conf printout:
[2256003.437318]  --- wd:1 rd:2
[2256003.437321]  disk 0, wo:0, o:1, dev:sda2
[2256003.437323]  disk 1, wo:1, o:0, dev:sdb2
[2256003.440542] RAID1 conf printout:
[2256003.440545]  --- wd:1 rd:2
[2256003.440548]  disk 0, wo:0, o:1, dev:sda2
[2257068.333765] md: unbind<sdb2>
[2257068.340466] md: export_rdev(sdb2)
[2257099.392140] md: bind<sdb2>
[2257099.438375] RAID1 conf printout:
[2257099.438378]  --- wd:1 rd:2
[2257099.438381]  disk 0, wo:0, o:1, dev:sda2
[2257099.438383]  disk 1, wo:1, o:1, dev:sdb2
[2257099.438462] md: recovery of RAID array md0
[2257099.438465] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[2257099.438466] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[2257099.438470] md: using 128k window, over a total of 1953504704 blocks.
[2289888.698717] md: md0: recovery done.
[2289888.827881] RAID1 conf printout:
[2289888.827884]  --- wd:2 rd:2
[2289888.827887]  disk 0, wo:0, o:1, dev:sda2
[2289888.827889]  disk 1, wo:0, o:1, dev:sdb2
[2310401.950744] end_request: I/O error, dev sdb, sector 3907028974
[2310401.950966] md: super_written gets error=-5, uptodate=0
[2310401.950969] raid1: Disk failure on sdb2, disabling device.
[2310401.950970] raid1: Operation continuing on 1 devices.
[2310402.622529] RAID1 conf printout:
[2310402.622532]  --- wd:1 rd:2
[2310402.622534]  disk 0, wo:0, o:1, dev:sda2
[2310402.622537]  disk 1, wo:1, o:0, dev:sdb2
[2310402.626394] RAID1 conf printout:
[2310402.626396]  --- wd:1 rd:2
[2310402.626398]  disk 0, wo:0, o:1, dev:sda2
[2317942.110788] md: unbind<sdb2>
[2317942.118014] md: export_rdev(sdb2)
[2317948.637723] md: bind<sdb2>
[2317948.685118] RAID1 conf printout:
[2317948.685121]  --- wd:1 rd:2
[2317948.685124]  disk 0, wo:0, o:1, dev:sda2
[2317948.685126]  disk 1, wo:1, o:1, dev:sdb2
[2317948.685166] md: recovery of RAID array md0
[2317948.685168] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[2317948.685170] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[2317948.685174] md: using 128k window, over a total of 1953504704 blocks.
[2351418.039840] md: md0: recovery done.
[2351418.222173] RAID1 conf printout:
[2351418.222176]  --- wd:2 rd:2
[2351418.222179]  disk 0, wo:0, o:1, dev:sda2
[2351418.222182]  disk 1, wo:0, o:1, dev:sdb2
[2378343.856005] end_request: I/O error, dev sdb, sector 3907028974
[2378343.856226] md: super_written gets error=-5, uptodate=0
[2378343.856228] raid1: Disk failure on sdb2, disabling device.
[2378343.856229] raid1: Operation continuing on 1 devices.
[2378343.877579] RAID1 conf printout:
[2378343.877583]  --- wd:1 rd:2
[2378343.877587]  disk 0, wo:0, o:1, dev:sda2
[2378343.877591]  disk 1, wo:1, o:0, dev:sdb2
[2378343.881234] RAID1 conf printout:
[2378343.881237]  --- wd:1 rd:2
[2378343.881240]  disk 0, wo:0, o:1, dev:sda2
[2409799.504484] md: unbind<sdb2>
[2409799.511526] md: export_rdev(sdb2)
[2409803.613344] md: bind<sdb2>
[2409803.622694] RAID1 conf printout:
[2409803.622697]  --- wd:1 rd:2
[2409803.622700]  disk 0, wo:0, o:1, dev:sda2
[2409803.622702]  disk 1, wo:1, o:1, dev:sdb2
[2409803.622741] md: recovery of RAID array md0
[2409803.622743] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[2409803.622745] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[2409803.622749] md: using 128k window, over a total of 1953504704 blocks.
[2446822.078048] md: md0: recovery done.
[2446822.162361] RAID1 conf printout:
[2446822.162364]  --- wd:2 rd:2
[2446822.162368]  disk 0, wo:0, o:1, dev:sda2
[2446822.162370]  disk 1, wo:0, o:1, dev:sdb2
[2463699.282479] end_request: I/O error, dev sdb, sector 3907028974
[2463699.282699] md: super_written gets error=-5, uptodate=0
[2463699.282702] raid1: Disk failure on sdb2, disabling device.
[2463699.282703] raid1: Operation continuing on 1 devices.
[2463699.303186] RAID1 conf printout:
[2463699.303189]  --- wd:1 rd:2
[2463699.303192]  disk 0, wo:0, o:1, dev:sda2
[2463699.303194]  disk 1, wo:1, o:0, dev:sdb2
[2463699.308035] RAID1 conf printout:
[2463699.308037]  --- wd:1 rd:2
[2463699.308040]  disk 0, wo:0, o:1, dev:sda2
[2499818.688121] md: unbind<sdb2>
[2499818.695024] md: export_rdev(sdb2)
[2499822.032436] md: bind<sdb2>
[2499822.041787] RAID1 conf printout:
[2499822.041790]  --- wd:1 rd:2
[2499822.041793]  disk 0, wo:0, o:1, dev:sda2
[2499822.041796]  disk 1, wo:1, o:1, dev:sdb2
[2499822.041839] md: recovery of RAID array md0
[2499822.041841] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[2499822.041843] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[2499822.041847] md: using 128k window, over a total of 1953504704 blocks.
[2537148.835839] md: md0: recovery done.
[2537148.964996] RAID1 conf printout:
[2537148.964999]  --- wd:2 rd:2
[2537148.965003]  disk 0, wo:0, o:1, dev:sda2
[2537148.965005]  disk 1, wo:0, o:1, dev:sdb2
[2545135.278733] end_request: I/O error, dev sdb, sector 3907028974
[2545135.278955] md: super_written gets error=-5, uptodate=0
[2545135.278958] raid1: Disk failure on sdb2, disabling device.
[2545135.278959] raid1: Operation continuing on 1 devices.
[2545135.761917] RAID1 conf printout:
[2545135.761920]  --- wd:1 rd:2
[2545135.761923]  disk 0, wo:0, o:1, dev:sda2
[2545135.761926]  disk 1, wo:1, o:0, dev:sdb2
[2545135.766893] RAID1 conf printout:
[2545135.766896]  --- wd:1 rd:2
[2545135.766899]  disk 0, wo:0, o:1, dev:sda2
[2550667.169302] md: unbind<sdb2>
[2550667.177346] md: export_rdev(sdb2)
[2550667.568787] md: bind<sdb2>
[2550667.614961] RAID1 conf printout:
[2550667.614964]  --- wd:1 rd:2
[2550667.614967]  disk 0, wo:0, o:1, dev:sda2
[2550667.614969]  disk 1, wo:1, o:1, dev:sdb2
[2550667.615008] md: recovery of RAID array md0
[2550667.615010] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[2550667.615012] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[2550667.615016] md: using 128k window, over a total of 1953504704 blocks.
[2586150.939539] md: md0: recovery done.
[2586151.142132] RAID1 conf printout:
[2586151.142135]  --- wd:2 rd:2
[2586151.142139]  disk 0, wo:0, o:1, dev:sda2
[2586151.142142]  disk 1, wo:0, o:1, dev:sdb2
[2607474.299114] end_request: I/O error, dev sdb, sector 3907028974
[2607474.299337] md: super_written gets error=-5, uptodate=0
[2607474.299340] raid1: Disk failure on sdb2, disabling device.
[2607474.299341] raid1: Operation continuing on 1 devices.
[2607474.319610] RAID1 conf printout:
[2607474.319613]  --- wd:1 rd:2
[2607474.319616]  disk 0, wo:0, o:1, dev:sda2
[2607474.319619]  disk 1, wo:1, o:0, dev:sdb2
[2607474.323530] RAID1 conf printout:
[2607474.323532]  --- wd:1 rd:2
[2607474.323535]  disk 0, wo:0, o:1, dev:sda2
[2622763.210681] md: unbind<sdb2>
[2622763.217011] md: export_rdev(sdb2)
[2622763.327343] md: bind<sdb2>
[2622763.375661] RAID1 conf printout:
[2622763.375664]  --- wd:1 rd:2
[2622763.375667]  disk 0, wo:0, o:1, dev:sda2
[2622763.375669]  disk 1, wo:1, o:1, dev:sdb2
[2622763.375710] md: recovery of RAID array md0
[2622763.375712] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[2622763.375714] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[2622763.375718] md: using 128k window, over a total of 1953504704 blocks.
[2659664.938226] md: md0: recovery done.
[2659665.070862] RAID1 conf printout:
[2659665.070865]  --- wd:2 rd:2
[2659665.070868]  disk 0, wo:0, o:1, dev:sda2
[2659665.070870]  disk 1, wo:0, o:1, dev:sdb2
[2973873.112757] md: data-check of RAID array md0
[2973873.112760] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[2973873.112762] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
[2973873.112766] md: using 128k window, over a total of 1953504704 blocks.
[3008265.577682] md: md0: data-check done.
[3058424.467530] end_request: I/O error, dev sdb, sector 3907028974
[3058424.467751] md: super_written gets error=-5, uptodate=0
[3058424.467755] raid1: Disk failure on sdb2, disabling device.
[3058424.467755] raid1: Operation continuing on 1 devices.
[3058424.488444] RAID1 conf printout:
[3058424.488447]  --- wd:1 rd:2
[3058424.488450]  disk 0, wo:0, o:1, dev:sda2
[3058424.488453]  disk 1, wo:1, o:0, dev:sdb2
[3058424.491469] RAID1 conf printout:
[3058424.491471]  --- wd:1 rd:2
[3058424.491473]  disk 0, wo:0, o:1, dev:sda2
[3058529.271920] md: unbind<sdb2>
[3058529.282549] md: export_rdev(sdb2)
[3058529.308832] md: bind<sdb2>
[3058529.683992] RAID1 conf printout:
[3058529.683995]  --- wd:1 rd:2
[3058529.683998]  disk 0, wo:0, o:1, dev:sda2
[3058529.684000]  disk 1, wo:1, o:1, dev:sdb2
[3058529.684039] md: recovery of RAID array md0
[3058529.684041] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[3058529.684043] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[3058529.684047] md: using 128k window, over a total of 1953504704 blocks.
[3095772.403073] md: md0: recovery done.
[3095772.513820] RAID1 conf printout:
[3095772.513823]  --- wd:2 rd:2
[3095772.513826]  disk 0, wo:0, o:1, dev:sda2
[3095772.513829]  disk 1, wo:0, o:1, dev:sdb2
[3111078.923862] end_request: I/O error, dev sdb, sector 3907028974
[3111078.924084] md: super_written gets error=-5, uptodate=0
[3111078.924086] raid1: Disk failure on sdb2, disabling device.
[3111078.924087] raid1: Operation continuing on 1 devices.
[3111078.945144] RAID1 conf printout:
[3111078.945147]  --- wd:1 rd:2
[3111078.945150]  disk 0, wo:0, o:1, dev:sda2
[3111078.945152]  disk 1, wo:1, o:0, dev:sdb2
[3111078.948893] RAID1 conf printout:
[3111078.948896]  --- wd:1 rd:2
[3111078.948899]  disk 0, wo:0, o:1, dev:sda2
[3114305.709565] md: unbind<sdb2>
[3114305.719957] md: export_rdev(sdb2)
[3114305.756401] md: bind<sdb2>
[3114305.807463] RAID1 conf printout:
[3114305.807466]  --- wd:1 rd:2
[3114305.807469]  disk 0, wo:0, o:1, dev:sda2
[3114305.807471]  disk 1, wo:1, o:1, dev:sdb2
[3114305.807512] md: recovery of RAID array md0
[3114305.807515] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[3114305.807516] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[3114305.807520] md: using 128k window, over a total of 1953504704 blocks.
[3150207.473717] md: md0: recovery done.
[3150207.600595] RAID1 conf printout:
[3150207.600598]  --- wd:2 rd:2
[3150207.600601]  disk 0, wo:0, o:1, dev:sda2
[3150207.600604]  disk 1, wo:0, o:1, dev:sdb2
[3161491.894551] end_request: I/O error, dev sdb, sector 3907028974
[3161491.894776] md: super_written gets error=-5, uptodate=0
[3161491.894779] raid1: Disk failure on sdb2, disabling device.
[3161491.894780] raid1: Operation continuing on 1 devices.
[3161492.276178] RAID1 conf printout:
[3161492.276181]  --- wd:1 rd:2
[3161492.276184]  disk 0, wo:0, o:1, dev:sda2
[3161492.276186]  disk 1, wo:1, o:0, dev:sdb2
[3161492.279854] RAID1 conf printout:
[3161492.279856]  --- wd:1 rd:2
[3161492.279859]  disk 0, wo:0, o:1, dev:sda2
[3183730.823542] md: unbind<sdb2>
[3183730.834313] md: export_rdev(sdb2)
[3183731.279881] md: bind<sdb2>
[3183731.304704] RAID1 conf printout:
[3183731.304707]  --- wd:1 rd:2
[3183731.304710]  disk 0, wo:0, o:1, dev:sda2
[3183731.304712]  disk 1, wo:1, o:1, dev:sdb2
[3183731.304752] md: recovery of RAID array md0
[3183731.304754] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[3183731.304756] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[3183731.304759] md: using 128k window, over a total of 1953504704 blocks.
[3218635.462393] md: md0: recovery done.
[3218635.591553] RAID1 conf printout:
[3218635.591557]  --- wd:2 rd:2
[3218635.591560]  disk 0, wo:0, o:1, dev:sda2
[3218635.591562]  disk 1, wo:0, o:1, dev:sdb2
[3391005.325670] end_request: I/O error, dev sdb, sector 3907028974
[3391005.325890] md: super_written gets error=-5, uptodate=0
[3391005.325893] raid1: Disk failure on sdb2, disabling device.
[3391005.325894] raid1: Operation continuing on 1 devices.
[3391006.012141] RAID1 conf printout:
[3391006.012144]  --- wd:1 rd:2
[3391006.012147]  disk 0, wo:0, o:1, dev:sda2
[3391006.012149]  disk 1, wo:1, o:0, dev:sdb2
[3391006.015266] RAID1 conf printout:
[3391006.015269]  --- wd:1 rd:2
[3391006.015272]  disk 0, wo:0, o:1, dev:sda2
[3410193.895569] md: unbind<sdb2>
[3410193.902505] md: export_rdev(sdb2)
[3410193.923767] md: bind<sdb2>
[3410193.970724] RAID1 conf printout:
[3410193.970727]  --- wd:1 rd:2
[3410193.970730]  disk 0, wo:0, o:1, dev:sda2
[3410193.970733]  disk 1, wo:1, o:1, dev:sdb2
[3410193.970773] md: recovery of RAID array md0
[3410193.970775] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[3410193.970777] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[3410193.970781] md: using 128k window, over a total of 1953504704 blocks.
[3445254.592366] md: md0: recovery done.
[3445254.712358] RAID1 conf printout:
[3445254.712361]  --- wd:2 rd:2
[3445254.712364]  disk 0, wo:0, o:1, dev:sda2
[3445254.712367]  disk 1, wo:0, o:1, dev:sdb2
[3880879.007618] end_request: I/O error, dev sda, sector 3907028974
[3880879.007839] md: super_written gets error=-5, uptodate=0
[3880879.007842] raid1: Disk failure on sda2, disabling device.
[3880879.007843] raid1: Operation continuing on 1 devices.
[3880879.028518] RAID1 conf printout:
[3880879.028521]  --- wd:1 rd:2
[3880879.028524]  disk 0, wo:1, o:0, dev:sda2
[3880879.028527]  disk 1, wo:0, o:1, dev:sdb2
[3880879.031607] RAID1 conf printout:
[3880879.031610]  --- wd:1 rd:2
[3880879.031613]  disk 1, wo:0, o:1, dev:sdb2
[3885338.980679] md: cannot remove active disk sdb2 from md0 ...
[3885363.182774] md: unbind<sda2>
[3885363.189328] md: export_rdev(sda2)
[3885363.605459] md: bind<sda2>
[3885363.654632] RAID1 conf printout:
[3885363.654635]  --- wd:1 rd:2
[3885363.654638]  disk 0, wo:1, o:1, dev:sda2
[3885363.654641]  disk 1, wo:0, o:1, dev:sdb2
[3885363.654681] md: recovery of RAID array md0
[3885363.654683] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[3885363.654685] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[3885363.654689] md: using 128k window, over a total of 1953504704 blocks.
[3921168.923151] md: md0: recovery done.
[3921169.040914] RAID1 conf printout:
[3921169.040917]  --- wd:2 rd:2
[3921169.040920]  disk 0, wo:0, o:1, dev:sda2
[3921169.040923]  disk 1, wo:0, o:1, dev:sdb2
[3964586.721298] sd 4:0:0:0: [sda] Sense Key : Recovered Error [current] [descriptor]
[3964586.721304] Descriptor sense data with sense descriptors (in hex):
[3964586.721307]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 
[3964586.721315]         00 00 00 00 00 00 
[3964586.721320] sd 4:0:0:0: [sda] Add. Sense: ATA pass through information available
[3964588.909481] sd 4:0:1:0: [sdb] Sense Key : Recovered Error [current] [descriptor]
[3964588.909487] Descriptor sense data with sense descriptors (in hex):
[3964588.909489]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 
[3964588.909498]         00 00 00 00 00 00 
[3964588.909503] sd 4:0:1:0: [sdb] Add. Sense: ATA pass through information available
[3964616.782366] sd 4:0:1:0: [sdb] Sense Key : Recovered Error [current] [descriptor]
[3964616.782372] Descriptor sense data with sense descriptors (in hex):
[3964616.782375]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 
[3964616.782384]         00 00 00 00 00 00 
[3964616.782388] sd 4:0:1:0: [sdb] Add. Sense: ATA pass through information available
[3964616.783355] sd 4:0:1:0: [sdb] Sense Key : Recovered Error [current] [descriptor]
[3964616.783359] Descriptor sense data with sense descriptors (in hex):
[3964616.783361]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 
[3964616.783366]         00 00 00 00 40 50 
[3964616.783370] sd 4:0:1:0: [sdb] Add. Sense: ATA pass through information available
[3964616.786249] sd 4:0:1:0: [sdb] Sense Key : Recovered Error [current] [descriptor]
[3964616.786254] Descriptor sense data with sense descriptors (in hex):
[3964616.786256]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 
[3964616.786265]         00 00 00 00 40 50 
[3964616.786269] sd 4:0:1:0: [sdb] Add. Sense: ATA pass through information available
[3964616.787231] sd 4:0:1:0: [sdb] Sense Key : Recovered Error [current] [descriptor]
[3964616.787235] Descriptor sense data with sense descriptors (in hex):
[3964616.787236]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 
[3964616.787242]         00 00 00 00 40 50 
[3964616.787245] sd 4:0:1:0: [sdb] Add. Sense: ATA pass through information available
[3964616.792653] sd 4:0:1:0: [sdb] Sense Key : Recovered Error [current] [descriptor]
[3964616.792659] Descriptor sense data with sense descriptors (in hex):
[3964616.792661]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 
[3964616.792669]         00 00 00 00 00 00 
[3964616.792674] sd 4:0:1:0: [sdb] Add. Sense: ATA pass through information available
[3964620.527486] sd 4:0:0:0: [sda] Sense Key : Recovered Error [current] [descriptor]
[3964620.527492] Descriptor sense data with sense descriptors (in hex):
[3964620.527495]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 
[3964620.527503]         00 00 00 00 00 00 
[3964620.527508] sd 4:0:0:0: [sda] Add. Sense: ATA pass through information available
[3964620.528493] sd 4:0:0:0: [sda] Sense Key : Recovered Error [current] [descriptor]
[3964620.528497] Descriptor sense data with sense descriptors (in hex):
[3964620.528498]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 
[3964620.528504]         00 00 00 00 40 50 
[3964620.528508] sd 4:0:0:0: [sda] Add. Sense: ATA pass through information available
[3964620.531094] sd 4:0:0:0: [sda] Sense Key : Recovered Error [current] [descriptor]
[3964620.531099] Descriptor sense data with sense descriptors (in hex):
[3964620.531102]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 
[3964620.531110]         00 00 00 00 40 50 
[3964620.531115] sd 4:0:0:0: [sda] Add. Sense: ATA pass through information available
[3964620.532075] sd 4:0:0:0: [sda] Sense Key : Recovered Error [current] [descriptor]
[3964620.532079] Descriptor sense data with sense descriptors (in hex):
[3964620.532080]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 
[3964620.532086]         00 00 00 00 40 50 
[3964620.532090] sd 4:0:0:0: [sda] Add. Sense: ATA pass through information available
[3964620.537480] sd 4:0:0:0: [sda] Sense Key : Recovered Error [current] [descriptor]
[3964620.537485] Descriptor sense data with sense descriptors (in hex):
[3964620.537487]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 
[3964620.537495]         00 00 00 00 00 00 
[3964620.537500] sd 4:0:0:0: [sda] Add. Sense: ATA pass through information available


/Allan
-- 
Allan Wind
Life Integrity, LLC
<http://lifeintegrity.com>


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-17 16:17                                                         ` Mark Lord
@ 2009-09-18 17:05                                                           ` Chris Webb
  2009-09-20 17:35                                                             ` Allan Wind
  2009-09-21 10:26                                                             ` Chris Webb
  0 siblings, 2 replies; 84+ messages in thread
From: Chris Webb @ 2009-09-18 17:05 UTC (permalink / raw)
  To: Mark Lord
  Cc: Tejun Heo, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

Mark Lord <liml@rtr.ca> writes:

> Speaking of which..
> 
> Chris:  I wonder if the errors will also vanish in your situation
> by disabling the onboard write-caches in the drives ?
> 
> Eg.  hdparm -W0 /dev/sd?

Hi Mark. I've got a test machine on its way at the moment, so I'll make sure
I check this one out on it too.

Cheers,

Chris.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-17 15:44                                       ` Tejun Heo
  2009-09-17 16:36                                         ` Allan Wind
@ 2009-09-18 17:07                                         ` Chris Webb
  2009-09-20 18:46                                         ` Robert Hancock
  2 siblings, 0 replies; 84+ messages in thread
From: Chris Webb @ 2009-09-18 17:07 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Neil Brown, Ric Wheeler, Andrei Tanas, linux-kernel,
	IDE/ATA development list, linux-scsi, Jeff Garzik, Mark Lord

Tejun Heo <tj@kernel.org> writes:

> Chris Webb wrote:
>
> > Would such very slow (but ultimately successful) flushes be
> > consistent with the theory of power supply issues affecting the
> > drives? It feels like the 30s timeouts on flush could be just a more
> > severe version of the 15s very slow flushes.
> 
> Probably not.  Power problems usually don't resolve themselves with
> longer timeout.  If the drive genuinely takes longer than 30s to
> flush, it would be very interesting tho.  That's something people have
> been worrying about but hasn't materialized yet.  The timeout is
> controlled by SD_TIMEOUT in drivers/scsi/sd.h.  You might want to bump
> it up to, say, 60s and see whether anything changes.

I'll add that to the list of things to check out on the test machine with a
more disposable installation on it! The 15s flushes we're seeing on
superblock barrier writes do already feel dangerously close to the 30s
hardcoded timeout to me: it's only a factor of two.
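
(For anyone who wants to try the same experiment without rebuilding sd_mod, a
minimal sketch; the 60s value just mirrors the suggestion above, /dev/sdb is
only an example, and it assumes the standard per-device SCSI timeout knob
covers these superblock/flush writes too:)

  # current command timeout used by the sd driver, in seconds
  # (defaults to 30, i.e. SD_TIMEOUT in drivers/scsi/sd.h)
  cat /sys/block/sdb/device/timeout
  # bump it to 60s at runtime instead of recompiling
  echo 60 > /sys/block/sdb/device/timeout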

Cheers,

Chris.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-18 17:05                                                           ` Chris Webb
@ 2009-09-20 17:35                                                             ` Allan Wind
  2009-09-28  5:32                                                               ` Allan Wind
  2009-09-21 10:26                                                             ` Chris Webb
  1 sibling, 1 reply; 84+ messages in thread
From: Allan Wind @ 2009-09-20 17:35 UTC (permalink / raw)
  To: linux-scsi

On 2009-09-18T18:05:17, Chris Webb wrote:
> Mark Lord <liml@rtr.ca> writes:
> 
> > Speaking of which..
> > 
> > Chris:  I wonder if the errors will also vanish in your situation
> > by disabling the onboard write-caches in the drives ?
> > 
> > Eg.  hdparm -W0 /dev/sd?
> 
> Hi Mark. I've got a test machine on its way at the moment, so I'll make sure
> I check this one out on it too.

It has been stable for me for the last 3 days with the write cache
disabled.
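
(For reference, a minimal sketch of the knobs involved; /dev/sdb is only an
example, and on many drives the setting does not survive a power cycle, so it
may need to be reapplied from an init script or hdparm.conf:)

  hdparm -W  /dev/sdb   # report the current write-cache setting
  hdparm -W0 /dev/sdb   # disable the on-drive write cache
  hdparm -W1 /dev/sdb   # re-enable it later if wanted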


/Allan
-- 
Allan Wind
Life Integrity, LLC
<http://lifeintegrity.com>


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-17 16:16                                                       ` Mark Lord
  2009-09-17 16:17                                                         ` Mark Lord
@ 2009-09-20 18:36                                                         ` Robert Hancock
  1 sibling, 0 replies; 84+ messages in thread
From: Robert Hancock @ 2009-09-20 18:36 UTC (permalink / raw)
  To: Mark Lord
  Cc: Tejun Heo, Chris Webb, linux-scsi, Ric Wheeler, Andrei Tanas,
	NeilBrown, linux-kernel, IDE/ATA development list, Jeff Garzik,
	Mark Lord

On 09/17/2009 10:16 AM, Mark Lord wrote:
> Tejun Heo wrote:
>> Hello,
>>
>> Mark Lord wrote:
>>> Tejun.. do we do a FLUSH CACHE before issuing a non-NCQ command ?
>>
>> Nope.
>>
>>> If not, then I think we may need to add code to do it.
>>
>> Hmm... can you explain a bit more? That seems rather extreme to me.
> ..
>
> You may recall that I first raised this issue about a year ago,
> when my own RAID0 array (MythTV box) started showing errors very
> similar to what Chris is reporting.
>
> These were easily triggered by running hddtemp once every few seconds
> to log drive temperatures during Myth recording sessions.
>
> hddtemp uses SMART commands.
>
> The actual errors in the logs were command timeouts,
> but at this point I no longer remember which opcode was
> actually timing out. Disabling the onboard write cache
> immediately "cured" the problem, at the expense of MUCH
> slower I/O times.
>
> My theory at the time was that some non-NCQ commands might be triggering
> an internal FLUSH CACHE within the (Hitachi) drive firmware, which then
> caused the original command to time out in libata (due to the large amounts
> of data present in the onboard write-caches).
>
> Now that more people are playing the game, we're seeing more and more
> reports of strange interactions with smartd running in the background.

Well, unless the SMART commands are using a non-standard timeout, it'll 
be the same as the timeout for the flush cache, so the flush cache would 
have timed out too..
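
(If anyone wants to poke at that interaction directly, a rough sketch; the
device name and scratch path are only examples, and smartctl -A stands in for
the SMART READ DATA command that hddtemp/smartd issue:)

  # poll SMART every few seconds while keeping the drive's write cache busy
  while true; do smartctl -A /dev/sdb >/dev/null; sleep 5; done &
  poller=$!
  dd if=/dev/zero of=/mnt/scratch/bigfile bs=1M count=8192 conv=fdatasync
  kill $poller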

>
> I suspect more and more now that this is an (avoidable) interaction
> between the write-cache and the SMART opcode, and it could perhaps be
> avoided by doing a FLUSH CACHE before any SMART (or non-data command)
> opcode.
>
> Cheers

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-17 15:44                                       ` Tejun Heo
  2009-09-17 16:36                                         ` Allan Wind
  2009-09-18 17:07                                         ` Chris Webb
@ 2009-09-20 18:46                                         ` Robert Hancock
  2009-09-21  0:02                                           ` Kyle Moffett
  2 siblings, 1 reply; 84+ messages in thread
From: Robert Hancock @ 2009-09-20 18:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Chris Webb, Neil Brown, Ric Wheeler, Andrei Tanas, linux-kernel,
	IDE/ATA development list, linux-scsi, Jeff Garzik, Mark Lord

On 09/17/2009 09:44 AM, Tejun Heo wrote:
>> Thanks Neil. This implies that when we see these fifteen second
>> hangs reading /proc/mdstat without write errors, there are genuinely
>> successful superblock writes which are taking fifteen seconds to
>> complete, presumably corresponding to flushes which complete but
>> take a full 15s to do so.
>>
>> Would such very slow (but ultimately successful) flushes be
>> consistent with the theory of power supply issues affecting the
>> drives? It feels like the 30s timeouts on flush could be just a more
>> severe version of the 15s very slow flushes.
>
> Probably not.  Power problems usually don't resolve themselves with
> longer timeout.  If the drive genuinely takes longer than 30s to
> flush, it would be very interesting tho.  That's something people have
> been worrying about but hasn't materialized yet.  The timeout is
> controlled by SD_TIMEOUT in drivers/scsi/sd.h.  You might want to bump
> it up to, say, 60s and see whether anything changes.

It's possible that if the power dip only slightly disrupted the drive, it 
might just take longer to complete the write. I've also seen reports of 
vibration issues causing problems in RAID arrays (there's a video on 
YouTube of a guy yelling at a Sun disk array during heavy I/O, and the 
resulting vibrations cause an immediate spike in I/O service times). 
Something like that could be causing issues with simultaneous media 
access to all drives in the array, too..
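
(One cheap way to spot that kind of spike is to watch per-device service
times while the array is under load; iostat is from the sysstat package:)

  # await/svctm are in milliseconds; a simultaneous spike on every member
  # points at something shared (power, vibration) rather than one bad drive
  iostat -dx 1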

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-20 18:46                                         ` Robert Hancock
@ 2009-09-21  0:02                                           ` Kyle Moffett
  0 siblings, 0 replies; 84+ messages in thread
From: Kyle Moffett @ 2009-09-21  0:02 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Tejun Heo, Chris Webb, Neil Brown, Ric Wheeler, Andrei Tanas,
	linux-kernel, IDE/ATA development list, linux-scsi, Jeff Garzik,
	Mark Lord

On Sun, Sep 20, 2009 at 14:46, Robert Hancock <hancockrwd@gmail.com> wrote:
> On 09/17/2009 09:44 AM, Tejun Heo wrote:
>>>
>>> Thanks Neil. This implies that when we see these fifteen second
>>> hangs reading /proc/mdstat without write errors, there are genuinely
>>> successful superblock writes which are taking fifteen seconds to
>>> complete, presumably corresponding to flushes which complete but
>>> take a full 15s to do so.
>>>
>>> Would such very slow (but ultimately successful) flushes be
>>> consistent with the theory of power supply issues affecting the
>>> drives? It feels like the 30s timeouts on flush could be just a more
>>> severe version of the 15s very slow flushes.
>>
>> Probably not.  Power problems usually don't resolve themselves with
>> longer timeout.  If the drive genuinely takes longer than 30s to
>> flush, it would be very interesting tho.  That's something people have
>> been worrying about but hasn't materialized yet.  The timeout is
>> controlled by SD_TIMEOUT in drivers/scsi/sd.h.  You might want to bump
>> it up to, say, 60s and see whether anything changes.
>
> It's possible that if the power dip only slightly disrupted the drive, it
> might just take longer to complete the write. I've also seen reports of
> vibration issues causing problems in RAID arrays (there's a video on YouTube
> of a guy yelling at a Sun disk array during heavy I/O, and the resulting
> vibrations cause an immediate spike in I/O service times). Something like
> that could be causing issues with simultaneous media access to all drives in
> the array, too..

There have been a rather large number of reported firmware problems
lately with various models of Seagate SATA drives; typically they
cause command timeouts and occasionally they completely brick the
drive (restart does not fix it).  I possessed 3 of these for a while
and they pretty consistently fell over (even with just 3 in a
low-power-CPU box with a good PSU rated for 8 drives).

You might check with the various Seagate tech support lines to see if
your drive firmwares are affected by the bugs (Some were related to
NCQ command processing, others were just single-command failures).
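
(A quick sketch of gathering what their support will ask for, and of taking
NCQ out of the picture while testing; /dev/sdb and the sysfs path just assume
the usual libata/SCSI layout:)

  smartctl -i /dev/sdb | grep -i firmware      # firmware revision to quote
  cat /sys/block/sdb/device/queue_depth        # >1 means NCQ is in use
  echo 1 > /sys/block/sdb/device/queue_depth   # depth 1 disables NCQ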

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-18 17:05                                                           ` Chris Webb
  2009-09-20 17:35                                                             ` Allan Wind
@ 2009-09-21 10:26                                                             ` Chris Webb
  2009-09-21 19:47                                                               ` Mark Lord
  2009-09-22  6:16                                                               ` Robert Hancock
  1 sibling, 2 replies; 84+ messages in thread
From: Chris Webb @ 2009-09-21 10:26 UTC (permalink / raw)
  To: Mark Lord
  Cc: Tejun Heo, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

Chris Webb <chris@arachsys.com> writes:

> Mark Lord <liml@rtr.ca> writes:
> 
> > Speaking of which..
> > 
> > Chris:  I wonder if the errors will also vanish in your situation
> > by disabling the onboard write-caches in the drives ?
> > 
> > Eg.  hdparm -W0 /dev/sd?
> 
> Hi Mark. I've got a test machine on its way at the moment, so I'll make sure
> I check this one out on it too.

Our test machine is still being built, but we had an opportunity to try this on
a couple of the live machines when their RAID arrays failed over the weekend.
We still got timeouts, but (predictably!) they're not on flushes any more:

  ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
  ata2.00: cmd 35/00:08:98:c6:00/00:00:4e:00:00/e0 tag 0 dm
          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  ata2.00: configured for UDMA/33
  ata2: EH complete
  [...] 
  ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
  ata2.00: cmd 35/00:08:18:94:68/00:00:3d:00:00/e0 tag 0 dm
          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  ata2.00: configured for UDMA/33
  ata2: EH complete
  [...]

all the way through the night.

I also have these in the log, but they are from immediately after turning off
the write caching on all drives, so they may be a red herring with data still
being written out.

  ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
  ata2.00: cmd c8/00:08:00:20:80/00:00:00:00:00/e0 tag 0 dm
          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  ata2.00: configured for UDMA/133
  ata2: EH complete
  ata2.00: limiting speed to UDMA/100:PIO4
  ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
  ata2.00: cmd 25/00:08:80:3e:2d/00:00:4e:00:00/e0 tag 0 dm
          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: link is slow to respond, please be patient (ready=0
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  ata2.00: configured for UDMA/100
  ata2: EH complete

On another machine, I saw this with write caching turned off:

  ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
  ata2.00: cmd 61/08:00:28:1f:80/00:00:00:00:00/40 tag 0 ncq 4096 out
           res 40/00:00:40:1f:80/00:00:00:00:00/40 Emask 0x4 (timeout)
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  ata2.00: configured for UDMA/133
  ata2: EH complete
  ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
  ata2.00: cmd 61/08:00:28:1f:80/00:00:00:00:00/40 tag 0 ncq 4096 out
           res 40/00:00:20:1f:80/00:00:00:00:00/40 Emask 0x4 (timeout)
  ata2.00: status: { DRDY }
  ata2: hard resetting link
  ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  ata2.00: qc timeout (cmd 0xef)
  ata2.00: failed to set xfermode (err_mask=0x4)
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: link is slow to respond, please be patient (ready=0)
  ata2: softreset failed (device not ready)
  ata2: limiting SATA link speed to 1.5 Gbps
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: reset failed, giving up
  ata2.00: disabled
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: link is slow to respond, please be patient (ready=0)
  ata2: softreset failed (device not ready)
  ata2: hard resetting link
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  ata2: EH complete
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 8396584
  end_request: I/O error, dev sdb, sector 8396584
  md: super_written gets error=-5, uptodate=0
  raid1: Disk failure on sdb1, disabling device.
  raid1: Operation continuing on 5 devices.
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 8396632
  end_request: I/O error, dev sdb, sector 8396632
  md: super_written gets error=-5, uptodate=0
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 654934840
  raid10: sdb3: rescheduling sector 1788594488
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 1311583568
  raid10: Disk failure on sdb3, disabling device.
  raid10: Operation continuing on 3 devices.
  Buffer I/O error on device dm-51, logical block 31930
  lost page write due to I/O error on dm-51
  Buffer I/O error on device dm-51, logical block 31931
  lost page write due to I/O error on dm-51
  Buffer I/O error on device dm-51, logical block 31932
  lost page write due to I/O error on dm-51
  Buffer I/O error on device dm-51, logical block 31933
  lost page write due to I/O error on dm-51
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 1465147272
  end_request: I/O error, dev sdb, sector 1465147272
  md: super_written gets error=-5, uptodate=0
  sd 1:0:0:0: [sdb] Unhandled error code
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  end_request: I/O error, dev sdb, sector 8396584
  end_request: I/O error, dev sdb, sector 8396584
  md: super_written gets error=-5, uptodate=0
  sd 1:0:0:0: [sdb] READ CAPACITY(16) failed
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  sd 1:0:0:0: [sdb] Sense not available.
  sd 1:0:0:0: [sdb] READ CAPACITY failed
  sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
  sd 1:0:0:0: [sdb] Sense not available.
  sd 1:0:0:0: [sdb] Asking for cache data failed
  sd 1:0:0:0: [sdb] Assuming drive cache: write through
  sdb: detected capacity change from 750156374016 to 0
  raid10: sdb: unrecoverable I/O read error for block 1788594488
  Buffer I/O error on device dm-59, logical block 204023
  Buffer I/O error on device dm-59, logical block 204023
  Buffer I/O error on device dm-43, logical block 24845
  lost page write due to I/O error on dm-43
  Buffer I/O error on device dm-62, logical block 558722
  lost page write due to I/O error on dm-62
  Buffer I/O error on device dm-43, logical block 24846
  lost page write due to I/O error on dm-43
  RAID1 conf printout:
   --- wd:5 rd:6
   disk 0, wo:0, o:1, dev:sda1
   disk 1, wo:1, o:0, dev:sdb1
   disk 2, wo:0, o:1, dev:sdc1
  RAID1 conf printout:
   --- wd:5 rd:6
   disk 0, wo:0, o:1, dev:sda1
   disk 2, wo:0, o:1, dev:sdc1
   disk 3, wo:0, o:1, dev:sdd1
   disk 4, wo:0, o:1, dev:sde1
   disk 5, wo:0, o:1, dev:sdf1
  raid10: Disk failure on sdb2, disabling device.
  raid10: Operation continuing on 3 devices.
  raid10: sdb: unrecoverable I/O read error for block 0
  RAID10 conf printout:
   --- wd:3 rd:6
   disk 1, wo:1, o:0, dev:sdb2
   disk 2, wo:0, o:1, dev:sdc2
   disk 3, wo:0, o:1, dev:sdd2
   disk 5, wo:0, o:1, dev:sdf2
  RAID10 conf printout:
   --- wd:3 rd:6
   disk 2, wo:0, o:1, dev:sdc2
   disk 3, wo:0, o:1, dev:sdd2
   disk 5, wo:0, o:1, dev:sdf2
  md: md2: resync done.
  RAID10 conf printout:
   --- wd:3 rd:6
   disk 1, wo:1, o:0, dev:sdb3
   disk 2, wo:0, o:1, dev:sdc3
   disk 3, wo:0, o:1, dev:sdd3
   disk 5, wo:0, o:1, dev:sdf3
  RAID10 conf printout:
   --- wd:3 rd:6
   disk 2, wo:0, o:1, dev:sdc3
   disk 3, wo:0, o:1, dev:sdd3
   disk 5, wo:0, o:1, dev:sdf3

Cheers,

Chris.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-21 10:26                                                             ` Chris Webb
@ 2009-09-21 19:47                                                               ` Mark Lord
  2009-09-22  6:16                                                               ` Robert Hancock
  1 sibling, 0 replies; 84+ messages in thread
From: Mark Lord @ 2009-09-21 19:47 UTC (permalink / raw)
  To: Chris Webb
  Cc: Tejun Heo, linux-scsi, Ric Wheeler, Andrei Tanas, NeilBrown,
	linux-kernel, IDE/ATA development list, Jeff Garzik, Mark Lord

Chris Webb wrote:
> Chris Webb <chris@arachsys.com> writes:
> 
>> Mark Lord <liml@rtr.ca> writes:
>>
>>> Speaking of which..
>>>
>>> Chris:  I wonder if the errors will also vanish in your situation
>>> by disabling the onboard write-caches in the drives ?
>>>
>>> Eg.  hdparm -W0 /dev/sd?
>> Hi Mark. I've got a test machine on its way at the moment, so I'll make sure
>> I check this one out on it too.
> 
> Our test machine is still being built, but we had an opportunity to try this on
> a couple of the live machines when their RAID arrays failed over the weekend.
> We still got timeouts, but (predictably!) they're not on flushes any more:
> 
>   ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
>   ata2.00: cmd 35/00:08:98:c6:00/00:00:4e:00:00/e0 tag 0 dm
...
> all the way through the night.
> 
> I also have these in the log, but they are from immediately after turning off
> the write caching on all drives, so they may be a red herring with data still
> being written out.
> 
>   ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
>   ata2.00: cmd c8/00:08:00:20:80/00:00:00:00:00/e0 tag 0 dm
...
> On another machine, I saw this with write caching turned off:
> 
>   ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
> ata2.00: cmd 61/08:00:28:1f:80/00:00:00:00:00/40 tag 0 ncq 4096 out
...

0x35 is a 48-bit DMA WRITE, 0xc8 is a 28-bit DMA READ,
and 0x61 is an NCQ WRITE.
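
(The kernel's own command table doubles as a cheat sheet for these opcodes,
assuming a source tree is at hand:)

  # ATA_CMD_WRITE_EXT = 0x35, ATA_CMD_READ = 0xC8, ATA_CMD_FPDMA_WRITE = 0x61
  grep -iE '= 0x(35|c8|61),' include/linux/ata.h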

Looks like some kind of hardware trouble to me.
And as Tejun suggested, it's difficult to guess at
a cause other than the PSU.

Cheers, and good luck.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-21 10:26                                                             ` Chris Webb
  2009-09-21 19:47                                                               ` Mark Lord
@ 2009-09-22  6:16                                                               ` Robert Hancock
  1 sibling, 0 replies; 84+ messages in thread
From: Robert Hancock @ 2009-09-22  6:16 UTC (permalink / raw)
  To: Chris Webb
  Cc: Mark Lord, Tejun Heo, linux-scsi, Ric Wheeler, Andrei Tanas,
	NeilBrown, linux-kernel, IDE/ATA development list, Jeff Garzik,
	Mark Lord

On 09/21/2009 04:26 AM, Chris Webb wrote:
> On another machine, I saw this with write caching turned off:
>
>    ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
>    ata2.00: cmd 61/08:00:28:1f:80/00:00:00:00:00/40 tag 0 ncq 4096 out
>             res 40/00:00:40:1f:80/00:00:00:00:00/40 Emask 0x4 (timeout)
>    ata2.00: status: { DRDY }
>    ata2: hard resetting link
>    ata2: softreset failed (device not ready)
>    ata2: hard resetting link
>    ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>    ata2.00: configured for UDMA/133
>    ata2: EH complete
>    ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
>    ata2.00: cmd 61/08:00:28:1f:80/00:00:00:00:00/40 tag 0 ncq 4096 out
>             res 40/00:00:20:1f:80/00:00:00:00:00/40 Emask 0x4 (timeout)
>    ata2.00: status: { DRDY }
>    ata2: hard resetting link
>    ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>    ata2.00: qc timeout (cmd 0xef)
>    ata2.00: failed to set xfermode (err_mask=0x4)
>    ata2: hard resetting link
>    ata2: softreset failed (device not ready)
>    ata2: hard resetting link
>    ata2: softreset failed (device not ready)
>    ata2: hard resetting link
>    ata2: link is slow to respond, please be patient (ready=0)
>    ata2: softreset failed (device not ready)
>    ata2: limiting SATA link speed to 1.5 Gbps
>    ata2: hard resetting link
>    ata2: softreset failed (device not ready)
>    ata2: reset failed, giving up
>    ata2.00: disabled

Basically an NCQ command timed out, and then the drive stopped talking to 
the controller entirely, even after banging on it with multiple resets. 
That failure especially looks like some kind of hardware issue..

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: MD/RAID time out writing superblock
  2009-09-20 17:35                                                             ` Allan Wind
@ 2009-09-28  5:32                                                               ` Allan Wind
  0 siblings, 0 replies; 84+ messages in thread
From: Allan Wind @ 2009-09-28  5:32 UTC (permalink / raw)
  To: linux-scsi

On 2009-09-20T13:35:43, Allan Wind wrote:
> On 2009-09-18T18:05:17, Chris Webb wrote:
> > Mark Lord <liml@rtr.ca> writes:
> > 
> > > Speaking of which..
> > > 
> > > Chris:  I wonder if the errors will also vanish in your situation
> > > by disabling the onboard write-caches in the drives ?
> > > 
> > > Eg.  hdparm -W0 /dev/sd?
> > 
> > Hi Mark. I've got a test machine on its way at the moment, so I'll make sure
> > I check this one out on it too.
> 
> It has been stable for me for the last 3 days with the write cache
> disabled.

My mirrored raid array has been stable for over a week now.  Too 
bad disabling the write cache did not resolve it for Chris.


/Allan
-- 
Allan Wind
Life Integrity, LLC
<http://lifeintegrity.com>


^ permalink raw reply	[flat|nested] 84+ messages in thread

end of thread, other threads:[~2009-09-28  5:32 UTC | newest]

Thread overview: 84+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-26  0:32 MD/RAID: what's wrong with sector 1953519935? Andrei Tanas
2009-08-26  0:50 ` NeilBrown
2009-08-26  1:06   ` Ric Wheeler
2009-08-26  1:24     ` NeilBrown
2009-08-26  1:31       ` Ric Wheeler
2009-08-26  2:22         ` Andrei Tanas
2009-08-26  2:41           ` Ric Wheeler
2009-08-26  3:45             ` Andrei Tanas
2009-08-26 10:34               ` Ric Wheeler
2009-08-26 14:46                 ` Andrei Tanas
2009-08-26 14:49                   ` Andrei Tanas
2009-08-26 15:39                   ` Ric Wheeler
2009-08-26 18:12                     ` Andrei Tanas
2009-08-26 18:12                       ` Andrei Tanas
2009-08-27  0:07                       ` Mark Lord
2009-08-27  1:37                         ` Andrei Tanas
2009-08-27  1:37                           ` Andrei Tanas
2009-08-27  2:33                       ` Robert Hancock
2009-08-27 21:22                       ` MD/RAID time out writing superblock Andrei Tanas
2009-08-27 21:57                         ` Ric Wheeler
2009-08-31  8:10                           ` Tejun Heo
2009-08-31 12:04                             ` Ric Wheeler
2009-08-31 12:20                               ` Tejun Heo
2009-09-07 11:44                                 ` Chris Webb
2009-09-07 11:59                                   ` Chris Webb
2009-09-09 12:02                                     ` Chris Webb
2009-09-14  7:41                                       ` Tejun Heo
2009-09-14  7:44                                         ` Tejun Heo
2009-09-14 12:48                                           ` Mark Lord
2009-09-14 13:05                                             ` Tejun Heo
2009-09-14 14:25                                               ` Mark Lord
2009-09-16 23:19                                                 ` Chris Webb
2009-09-17 13:29                                                   ` Mark Lord
2009-09-17 13:32                                                     ` Mark Lord
2009-09-17 13:37                                                     ` Chris Webb
2009-09-17 15:35                                                     ` Tejun Heo
2009-09-17 16:16                                                       ` Mark Lord
2009-09-17 16:17                                                         ` Mark Lord
2009-09-18 17:05                                                           ` Chris Webb
2009-09-20 17:35                                                             ` Allan Wind
2009-09-28  5:32                                                               ` Allan Wind
2009-09-21 10:26                                                             ` Chris Webb
2009-09-21 19:47                                                               ` Mark Lord
2009-09-22  6:16                                                               ` Robert Hancock
2009-09-20 18:36                                                         ` Robert Hancock
2009-09-14 13:11                                           ` Henrique de Moraes Holschuh
2009-09-14 13:24                                             ` Tejun Heo
2009-09-14 14:02                                               ` Henrique de Moraes Holschuh
2009-09-14 14:34                                                 ` Tejun Heo
2009-09-14 13:14                                         ` Gabor Gombas
2009-09-07 16:55                                   ` Allan Wind
2009-09-07 16:55                                   ` Allan Wind
2009-09-07 23:26                                     ` Thomas Fjellstrom
2009-09-07 23:26                                       ` Thomas Fjellstrom
2009-09-14  7:46                                       ` Tejun Heo
2009-09-14 21:13                                         ` Thomas Fjellstrom
2009-09-14 22:23                                           ` Tejun Heo
2009-09-16 22:28                                 ` Chris Webb
2009-09-16 23:47                                   ` Tejun Heo
2009-09-17  0:34                                     ` Neil Brown
2009-09-17 12:00                                       ` Chris Webb
2009-09-17 11:57                                     ` Chris Webb
2009-09-17 15:44                                       ` Tejun Heo
2009-09-17 16:36                                         ` Allan Wind
2009-09-18  0:16                                           ` Tejun Heo
2009-09-18  2:47                                             ` Allan Wind
2009-09-18 17:07                                         ` Chris Webb
2009-09-20 18:46                                         ` Robert Hancock
2009-09-21  0:02                                           ` Kyle Moffett
2009-09-17 13:35                                     ` Mark Lord
2009-09-17 15:47                                       ` Tejun Heo
2009-08-31 12:21                             ` Mark Lord
2009-08-31 23:45                               ` Mark Lord
2009-09-01 13:07                                 ` Andrei Tanas
2009-09-01 13:07                                   ` Andrei Tanas
2009-09-01 13:15                                   ` Mark Lord
2009-09-01 13:30                                     ` Tejun Heo
2009-09-01 13:47                                       ` Ric Wheeler
2009-09-01 14:18                                         ` Andrei Tanas
2009-09-01 14:18                                           ` Andrei Tanas
2009-09-14  5:30                                           ` Marc Giger
2009-09-14  5:30                                             ` Marc Giger
2009-09-02 21:58                                   ` Allan Wind
2009-09-04 19:39                                     ` Andrei Tanas
