* Recovering RAID Volumes from 6 Disks
@ 2016-07-19 16:29 Amit Biswas
  2016-07-19 19:57 ` Wols Lists
  0 siblings, 1 reply; 9+ messages in thread
From: Amit Biswas @ 2016-07-19 16:29 UTC (permalink / raw)
  To: linux-raid

Greetings!

Backup server was acting up and the issue was the drives (all of them)
:( Could use some guidance or verdict.

It has a total of 6 drives: sda,b,c,d,e,f. From the superblock info
(attached), there is a raid 1, and a raid 10 volume. Problem is all
the disks are part of both raid volumes (according to superblock).

I am currently booted into an ubuntu live disk shell.

Raid 1 : sde2(0) sdb2(1) sdc2(s) sdd2(s) sdf2(s)
    - > 2 devices
    - > contains the boot partition. I was able to mount this.
Raid 10: sdc3(2) sdd3(3) sde3(4) sdf3(5)
    - > 6 devices
    - > currently inactive

Again, attached are the superblocks and dmesg output.

[Thu Jul  7 15:54:07 2016] Buffer I/O error on dev sdb3, logical block
1, async page read




/dev/sdb2:

         Magic : a92b4efc

       Version : 1.2

   Feature Map : 0x0

    Array UUID : 356f81a0:69057dd3:8f19639c:26b37454

          Name : Vlab-backup:0

 Creation Time : Wed May 28 20:44:31 2014

    Raid Level : raid1

  Raid Devices : 2


Avail Dev Size : 976384 (476.83 MiB 499.91 MB)

    Array Size : 488128 (476.77 MiB 499.84 MB)

 Used Dev Size : 976256 (476.77 MiB 499.84 MB)

   Data Offset : 512 sectors

  Super Offset : 8 sectors

         State : clean

   Device UUID : ab7e7ad4:0bcf824d:3976a240:063016cf


   Update Time : Wed Jul  6 23:02:21 2016

      Checksum : ead06f71 - correct

        Events : 142



  Device Role : Active device 1

  Array State : AA ('A' == active, '.' == missing)

/dev/sdc2:

         Magic : a92b4efc

       Version : 1.2

   Feature Map : 0x0

    Array UUID : 356f81a0:69057dd3:8f19639c:26b37454

          Name : Vlab-backup:0

 Creation Time : Wed May 28 20:44:31 2014

    Raid Level : raid1

  Raid Devices : 2


Avail Dev Size : 976384 (476.83 MiB 499.91 MB)

    Array Size : 488128 (476.77 MiB 499.84 MB)

 Used Dev Size : 976256 (476.77 MiB 499.84 MB)

   Data Offset : 512 sectors

  Super Offset : 8 sectors

         State : clean

   Device UUID : 86665b3f:664bd08b:65195b2a:bd8345b1


   Update Time : Wed Jul  6 23:02:21 2016

      Checksum : 5fe6ca8b - correct

        Events : 142



  Device Role : spare

  Array State : AA ('A' == active, '.' == missing)

/dev/sdd2:

         Magic : a92b4efc

       Version : 1.2

   Feature Map : 0x0

    Array UUID : 356f81a0:69057dd3:8f19639c:26b37454

          Name : Vlab-backup:0

 Creation Time : Wed May 28 20:44:31 2014

    Raid Level : raid1

  Raid Devices : 2


Avail Dev Size : 976384 (476.83 MiB 499.91 MB)

    Array Size : 488128 (476.77 MiB 499.84 MB)

 Used Dev Size : 976256 (476.77 MiB 499.84 MB)

   Data Offset : 512 sectors

  Super Offset : 8 sectors

         State : clean

   Device UUID : 5ad0350e:5f2b9fad:c8cac4fa:aa5524ef


   Update Time : Wed Jul  6 23:02:21 2016

      Checksum : 5ed897aa - correct

        Events : 142



  Device Role : spare

  Array State : AA ('A' == active, '.' == missing)

/dev/sde2:

         Magic : a92b4efc

       Version : 1.2

   Feature Map : 0x0

    Array UUID : 356f81a0:69057dd3:8f19639c:26b37454

          Name : Vlab-backup:0

 Creation Time : Wed May 28 20:44:31 2014

    Raid Level : raid1

  Raid Devices : 2


Avail Dev Size : 976384 (476.83 MiB 499.91 MB)

    Array Size : 488128 (476.77 MiB 499.84 MB)

 Used Dev Size : 976256 (476.77 MiB 499.84 MB)

   Data Offset : 512 sectors

  Super Offset : 8 sectors

         State : clean

   Device UUID : 3e6b7c97:3feb9132:235a5a6e:836e4f35


   Update Time : Wed Jul  6 23:02:21 2016

      Checksum : 26d29aa2 - correct

        Events : 142



  Device Role : Active device 0

  Array State : AA ('A' == active, '.' == missing)

/dev/sdf2:

         Magic : a92b4efc

       Version : 1.2

   Feature Map : 0x0

    Array UUID : 356f81a0:69057dd3:8f19639c:26b37454

          Name : Vlab-backup:0

 Creation Time : Wed May 28 20:44:31 2014

    Raid Level : raid1

  Raid Devices : 2


Avail Dev Size : 976384 (476.83 MiB 499.91 MB)

    Array Size : 488128 (476.77 MiB 499.84 MB)

 Used Dev Size : 976256 (476.77 MiB 499.84 MB)

   Data Offset : 512 sectors

  Super Offset : 8 sectors

         State : clean

   Device UUID : b0b62b5e:56fbc4c5:735250c4:029d3ee2


   Update Time : Wed Jul  6 23:02:21 2016

      Checksum : 839a1cfc - correct

        Events : 142



  Device Role : spare

  Array State : AA ('A' == active, '.' == missing)

/dev/sdc3:

         Magic : a92b4efc

       Version : 1.2

   Feature Map : 0x0

    Array UUID : 3d825ac3:a2e5a336:554fa8d8:542297a3

          Name : Vlab-backup:1

 Creation Time : Wed May 28 20:44:56 2014

    Raid Level : raid10

  Raid Devices : 6


Avail Dev Size : 1952280576 (930.92 GiB 999.57 GB)

    Array Size : 2928419328 (2792.76 GiB 2998.70 GB)

 Used Dev Size : 1952279552 (930.92 GiB 999.57 GB)

   Data Offset : 262144 sectors

  Super Offset : 8 sectors

         State : clean

   Device UUID : 0b23c18b:3202f588:9b7b5636:49f80c36


   Update Time : Wed Jul  6 22:29:24 2016

      Checksum : 34d1544e - correct

        Events : 9149455


        Layout : near=2

    Chunk Size : 512K


  Device Role : Active device 2

  Array State : .AAAAA ('A' == active, '.' == missing)

/dev/sdd3:

         Magic : a92b4efc

       Version : 1.2

   Feature Map : 0x0

    Array UUID : 3d825ac3:a2e5a336:554fa8d8:542297a3

          Name : Vlab-backup:1

 Creation Time : Wed May 28 20:44:56 2014

    Raid Level : raid10

  Raid Devices : 6


Avail Dev Size : 1952280576 (930.92 GiB 999.57 GB)

    Array Size : 2928419328 (2792.76 GiB 2998.70 GB)

 Used Dev Size : 1952279552 (930.92 GiB 999.57 GB)

   Data Offset : 262144 sectors

  Super Offset : 8 sectors

         State : clean

   Device UUID : 2fd7193f:502b657f:708dc952:7a0a7309


   Update Time : Wed Jul  6 22:29:24 2016

      Checksum : ce735592 - correct

        Events : 9149455


        Layout : near=2

    Chunk Size : 512K


  Device Role : Active device 3

  Array State : .AAAAA ('A' == active, '.' == missing)

/dev/sde3:

         Magic : a92b4efc

       Version : 1.2

   Feature Map : 0x0

    Array UUID : 3d825ac3:a2e5a336:554fa8d8:542297a3

          Name : Vlab-backup:1

 Creation Time : Wed May 28 20:44:56 2014

    Raid Level : raid10

  Raid Devices : 6


Avail Dev Size : 1952280576 (930.92 GiB 999.57 GB)

    Array Size : 2928419328 (2792.76 GiB 2998.70 GB)

 Used Dev Size : 1952279552 (930.92 GiB 999.57 GB)

   Data Offset : 262144 sectors

  Super Offset : 8 sectors

         State : clean

   Device UUID : 43d64bab:059237da:63729452:2ce3beda


   Update Time : Wed Jul  6 22:29:24 2016

      Checksum : 668e7903 - correct

        Events : 9149455


        Layout : near=2

    Chunk Size : 512K


  Device Role : Active device 4

  Array State : .AAAAA ('A' == active, '.' == missing)

/dev/sdf3:

         Magic : a92b4efc

       Version : 1.2

   Feature Map : 0x0

    Array UUID : 3d825ac3:a2e5a336:554fa8d8:542297a3

          Name : Vlab-backup:1

 Creation Time : Wed May 28 20:44:56 2014

    Raid Level : raid10

  Raid Devices : 6


Avail Dev Size : 1952280576 (930.92 GiB 999.57 GB)

    Array Size : 2928419328 (2792.76 GiB 2998.70 GB)

 Used Dev Size : 1952279552 (930.92 GiB 999.57 GB)

   Data Offset : 262144 sectors

  Super Offset : 8 sectors

         State : clean

   Device UUID : c243087b:605b0938:f6b1e13b:11cf83ad


   Update Time : Wed Jul  6 22:29:24 2016

      Checksum : 50334dad - correct

        Events : 9149455


        Layout : near=2

    Chunk Size : 512K


  Device Role : Active device 5

  Array State : .AAAAA ('A' == active, '.' == missing)
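
A minimal sketch of how the per-device fields in the superblock dump above could be tabulated for comparison. It assumes the `Key : value` layout shown in this message; the function name, the chosen field list, and the shortened sample are illustrative and not part of the original post.

```python
import re
from typing import Dict, Optional

# Fields worth comparing across members (an illustrative selection).
FIELDS = ("Raid Level", "Raid Devices", "Events", "Device Role", "Array State")

def summarize_examine(text: str) -> Dict[str, Dict[str, str]]:
    """Map each device path to the selected superblock fields."""
    summary: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # the pasted output is double-spaced
        header = re.fullmatch(r"(/dev/\S+):", line)
        if header:
            current = header.group(1)
            summary[current] = {}
            continue
        if current is None or " : " not in line:
            continue
        key, value = (part.strip() for part in line.split(" : ", 1))
        if key in FIELDS:
            summary[current][key] = value
    return summary

# Shortened excerpt of two of the device sections above.
SAMPLE = """\
/dev/sdb2:
     Raid Level : raid1
   Raid Devices : 2
         Events : 142
   Device Role : Active device 1
   Array State : AA ('A' == active, '.' == missing)
/dev/sdc3:
     Raid Level : raid10
   Raid Devices : 6
         Events : 9149455
   Device Role : Active device 2
   Array State : .AAAAA ('A' == active, '.' == missing)
"""

for device, fields in summarize_examine(SAMPLE).items():
    print(device, fields["Raid Level"], fields["Events"], fields["Device Role"])
```

Putting the event counts and roles side by side makes it easier to see which members agree with each other before any assembly is attempted.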



[Thu Jul  7 15:53:41 2016] ata3: SATA link up 3.0 Gbps (SStatus 123
SControl 300)

[Thu Jul  7 15:53:41 2016] ata3.00: ATA-8: ST1000DM003-1CH162, CC46,
max UDMA/133

[Thu Jul  7 15:53:41 2016] ata3.00: 1953525168 sectors, multi 0: LBA48
NCQ (depth 31/32), AA

[Thu Jul  7 15:53:41 2016] ata3.00: configured for UDMA/133

[Thu Jul  7 15:53:41 2016] scsi 2:0:0:0: Direct-Access     ATA
ST1000DM003-1CH1 CC46 PQ: 0 ANSI: 5

[Thu Jul  7 15:53:41 2016] sd 2:0:0:0: [sda] 1953525168 512-byte
logical blocks: (1.00 TB/931 GiB)

[Thu Jul  7 15:53:41 2016] sd 2:0:0:0: [sda] 4096-byte physical blocks

[Thu Jul  7 15:53:41 2016] sd 2:0:0:0: Attached scsi generic sg2 type 0

[Thu Jul  7 15:53:41 2016] sd 2:0:0:0: [sda] Write Protect is off

[Thu Jul  7 15:53:41 2016] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00

[Thu Jul  7 15:53:41 2016] sd 2:0:0:0: [sda] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA

[Thu Jul  7 15:53:41 2016] scsi 3:0:0:0: Direct-Access     ATA
ST1000DM003-9YN1 CC9C PQ: 0 ANSI: 5

[Thu Jul  7 15:53:41 2016] sd 3:0:0:0: [sdb] 1953525168 512-byte
logical blocks: (1.00 TB/931 GiB)

[Thu Jul  7 15:53:41 2016] sd 3:0:0:0: [sdb] 4096-byte physical blocks

[Thu Jul  7 15:53:41 2016] sd 3:0:0:0: Attached scsi generic sg3 type 0

[Thu Jul  7 15:53:41 2016] sd 3:0:0:0: [sdb] Write Protect is off

[Thu Jul  7 15:53:41 2016] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00

[Thu Jul  7 15:53:41 2016] sd 3:0:0:0: [sdb] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA

[Thu Jul  7 15:53:41 2016] scsi 4:0:0:0: Direct-Access     ATA
ST1000NM0011     SN03 PQ: 0 ANSI: 5

[Thu Jul  7 15:53:41 2016] sd 4:0:0:0: [sdc] 1953525168 512-byte
logical blocks: (1.00 TB/931 GiB)

[Thu Jul  7 15:53:41 2016] sd 4:0:0:0: Attached scsi generic sg4 type 0

[Thu Jul  7 15:53:41 2016] sd 4:0:0:0: [sdc] Write Protect is off

[Thu Jul  7 15:53:41 2016] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00

[Thu Jul  7 15:53:41 2016] sd 4:0:0:0: [sdc] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA

[Thu Jul  7 15:53:41 2016] scsi 5:0:0:0: Direct-Access     ATA
ST1000NM0011     SN03 PQ: 0 ANSI: 5

[Thu Jul  7 15:53:41 2016] sd 5:0:0:0: [sdd] 1953525168 512-byte
logical blocks: (1.00 TB/931 GiB)

[Thu Jul  7 15:53:41 2016] sd 5:0:0:0: Attached scsi generic sg5 type 0

[Thu Jul  7 15:53:41 2016] sd 5:0:0:0: [sdd] Write Protect is off

[Thu Jul  7 15:53:41 2016] sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00

[Thu Jul  7 15:53:41 2016] sd 5:0:0:0: [sdd] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA

[Thu Jul  7 15:53:41 2016] scsi 6:0:0:0: Direct-Access     ATA
ST1000DM003-1CH1 CC46 PQ: 0 ANSI: 5

[Thu Jul  7 15:53:41 2016] sd 6:0:0:0: [sde] 1953525168 512-byte
logical blocks: (1.00 TB/931 GiB)

[Thu Jul  7 15:53:41 2016] sd 6:0:0:0: [sde] 4096-byte physical blocks

[Thu Jul  7 15:53:41 2016] sd 6:0:0:0: Attached scsi generic sg6 type 0

[Thu Jul  7 15:53:41 2016] sd 6:0:0:0: [sde] Write Protect is off

[Thu Jul  7 15:53:41 2016] sd 6:0:0:0: [sde] Mode Sense: 00 3a 00 00

[Thu Jul  7 15:53:41 2016] sd 6:0:0:0: [sde] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA

[Thu Jul  7 15:53:41 2016] scsi 7:0:0:0: Direct-Access     ATA
ST1000DM003-9YN1 CC9C PQ: 0 ANSI: 5

[Thu Jul  7 15:53:41 2016] sd 7:0:0:0: [sdf] 1953525168 512-byte
logical blocks: (1.00 TB/931 GiB)

[Thu Jul  7 15:53:41 2016] sd 7:0:0:0: [sdf] 4096-byte physical blocks

[Thu Jul  7 15:53:41 2016] sd 7:0:0:0: Attached scsi generic sg7 type 0

[Thu Jul  7 15:53:41 2016] sd 7:0:0:0: [sdf] Write Protect is off

[Thu Jul  7 15:53:41 2016] sd 7:0:0:0: [sdf] Mode Sense: 00 3a 00 00

[Thu Jul  7 15:53:41 2016] sd 7:0:0:0: [sdf] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA

[Thu Jul  7 15:53:41 2016] random: nonblocking pool is initialized

[Thu Jul  7 15:53:41 2016]  sdd: sdd1 sdd2 sdd3

[Thu Jul  7 15:53:41 2016] sd 5:0:0:0: [sdd] Attached SCSI disk

[Thu Jul  7 15:53:41 2016]  sdc: sdc1 sdc2 sdc3

[Thu Jul  7 15:53:41 2016] sd 4:0:0:0: [sdc] Attached SCSI disk

[Thu Jul  7 15:53:41 2016]  sdb: sdb1 sdb2 sdb3

[Thu Jul  7 15:53:41 2016] sd 3:0:0:0: [sdb] Attached SCSI disk

[Thu Jul  7 15:53:41 2016]  sdf: sdf1 sdf2 sdf3

[Thu Jul  7 15:53:41 2016] sd 7:0:0:0: [sdf] Attached SCSI disk

[Thu Jul  7 15:53:41 2016]  sde: sde1 sde2 sde3

[Thu Jul  7 15:53:41 2016] sd 6:0:0:0: [sde] Attached SCSI disk

[Thu Jul  7 15:54:02 2016] ata3.00: qc timeout (cmd 0x47)

[Thu Jul  7 15:54:02 2016] ata3.00: READ LOG DMA EXT failed, trying unqueued

[Thu Jul  7 15:54:02 2016] ata3: failed to read log page 10h (errno=-5)

[Thu Jul  7 15:54:02 2016] ata3.00: exception Emask 0x1 SAct
0x10000000 SErr 0x0 action 0x6 frozen

[Thu Jul  7 15:54:02 2016] ata3.00: irq_stat 0x40000008

[Thu Jul  7 15:54:02 2016] ata3.00: failed command: READ FPDMA QUEUED

[Thu Jul  7 15:54:02 2016] ata3.00: cmd
60/08:e0:a8:6d:70/00:00:74:00:00/40 tag 28 ncq 4096 in

[Thu Jul  7 15:54:02 2016]          res
40/00:e0:a8:6d:70/00:00:74:00:00/40 Emask 0x1 (device error)

[Thu Jul  7 15:54:02 2016] ata3.00: status: { DRDY }

[Thu Jul  7 15:54:02 2016] ata3: hard resetting link

[Thu Jul  7 15:54:05 2016] ata3: SATA link up 3.0 Gbps (SStatus 123
SControl 300)

[Thu Jul  7 15:54:05 2016] ata3.00: configured for UDMA/133

[Thu Jul  7 15:54:05 2016] ata3: EH complete

[Thu Jul  7 15:54:06 2016] ata4.00: exception Emask 0x0 SAct 0x8 SErr
0x0 action 0x0

[Thu Jul  7 15:54:06 2016] ata4.00: irq_stat 0x40000008

[Thu Jul  7 15:54:06 2016] ata4.00: failed command: READ FPDMA QUEUED

[Thu Jul  7 15:54:06 2016] ata4.00: cmd
60/08:18:08:f8:0e/00:00:00:00:00/40 tag 3 ncq 4096 in

[Thu Jul  7 15:54:06 2016]          res
41/40:08:09:f8:0e/00:00:00:00:00/00 Emask 0x409 (media error) <F>

[Thu Jul  7 15:54:06 2016] ata4.00: status: { DRDY ERR }

[Thu Jul  7 15:54:06 2016] ata4.00: error: { UNC }

[Thu Jul  7 15:54:06 2016] ata4.00: configured for UDMA/133

[Thu Jul  7 15:54:06 2016] sd 3:0:0:0: [sdb] tag#3 FAILED Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE

[Thu Jul  7 15:54:06 2016] sd 3:0:0:0: [sdb] tag#3 Sense Key : Medium
Error [current] [descriptor]

[Thu Jul  7 15:54:06 2016] sd 3:0:0:0: [sdb] tag#3 Add. Sense:
Unrecovered read error - auto reallocate failed

[Thu Jul  7 15:54:06 2016] sd 3:0:0:0: [sdb] tag#3 CDB: Read(10) 28 00
00 0e f8 08 00 00 08 00

[Thu Jul  7 15:54:06 2016] blk_update_request: I/O error, dev sdb, sector 981001

[Thu Jul  7 15:54:06 2016] ata4: EH complete

[Thu Jul  7 15:54:07 2016] ata4.00: exception Emask 0x0 SAct 0x400
SErr 0x0 action 0x0

[Thu Jul  7 15:54:07 2016] ata4.00: irq_stat 0x40000008

[Thu Jul  7 15:54:07 2016] ata4.00: failed command: READ FPDMA QUEUED

[Thu Jul  7 15:54:07 2016] ata4.00: cmd
60/08:50:08:f8:0e/00:00:00:00:00/40 tag 10 ncq 4096 in

[Thu Jul  7 15:54:07 2016]          res
41/40:08:09:f8:0e/00:00:00:00:00/00 Emask 0x409 (media error) <F>

[Thu Jul  7 15:54:07 2016] ata4.00: status: { DRDY ERR }

[Thu Jul  7 15:54:07 2016] ata4.00: error: { UNC }

[Thu Jul  7 15:54:07 2016] ata4.00: configured for UDMA/133

[Thu Jul  7 15:54:07 2016] sd 3:0:0:0: [sdb] tag#10 FAILED Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE

[Thu Jul  7 15:54:07 2016] sd 3:0:0:0: [sdb] tag#10 Sense Key : Medium
Error [current] [descriptor]

[Thu Jul  7 15:54:07 2016] sd 3:0:0:0: [sdb] tag#10 Add. Sense:
Unrecovered read error - auto reallocate failed

[Thu Jul  7 15:54:07 2016] sd 3:0:0:0: [sdb] tag#10 CDB: Read(10) 28
00 00 0e f8 08 00 00 08 00

[Thu Jul  7 15:54:07 2016] blk_update_request: I/O error, dev sdb, sector 981001

[Thu Jul  7 15:54:07 2016] Buffer I/O error on dev sdb3, logical block
1, async page read

[Thu Jul  7 15:54:07 2016] ata4: EH complete

[Thu Jul  7 15:54:07 2016] ata4.00: exception Emask 0x0 SAct 0x2 SErr
0x0 action 0x0

[Thu Jul  7 15:54:07 2016] ata4.00: irq_stat 0x40000008

[Thu Jul  7 15:54:07 2016] ata4.00: failed command: READ FPDMA QUEUED

[Thu Jul  7 15:54:07 2016] ata4.00: cmd
60/08:08:08:f8:0e/00:00:00:00:00/40 tag 1 ncq 4096 in

[Thu Jul  7 15:54:07 2016]          res
41/40:08:09:f8:0e/00:00:00:00:00/00 Emask 0x409 (media error) <F>

[Thu Jul  7 15:54:07 2016] ata4.00: status: { DRDY ERR }

[Thu Jul  7 15:54:07 2016] ata4.00: error: { UNC }

[Thu Jul  7 15:54:07 2016] ata4.00: configured for UDMA/133

[Thu Jul  7 15:54:07 2016] sd 3:0:0:0: [sdb] tag#1 FAILED Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE

[Thu Jul  7 15:54:07 2016] sd 3:0:0:0: [sdb] tag#1 Sense Key : Medium
Error [current] [descriptor]

[Thu Jul  7 15:54:07 2016] sd 3:0:0:0: [sdb] tag#1 Add. Sense:
Unrecovered read error - auto reallocate failed

[Thu Jul  7 15:54:07 2016] sd 3:0:0:0: [sdb] tag#1 CDB: Read(10) 28 00
00 0e f8 08 00 00 08 00

[Thu Jul  7 15:54:07 2016] blk_update_request: I/O error, dev sdb, sector 981001

[Thu Jul  7 15:54:07 2016] Buffer I/O error on dev sdb3, logical block
1, async page read

[Thu Jul  7 15:54:07 2016] ata4: EH complete

[Thu Jul  7 15:54:08 2016] ata4.00: exception Emask 0x0 SAct 0x20 SErr
0x0 action 0x0

[Thu Jul  7 15:54:08 2016] ata4.00: irq_stat 0x40000008

[Thu Jul  7 15:54:08 2016] ata4.00: failed command: READ FPDMA QUEUED

[Thu Jul  7 15:54:08 2016] ata4.00: cmd
60/08:28:08:f8:0e/00:00:00:00:00/40 tag 5 ncq 4096 in

[Thu Jul  7 15:54:08 2016]          res
41/40:08:09:f8:0e/00:00:00:00:00/00 Emask 0x409 (media error) <F>

[Thu Jul  7 15:54:08 2016] ata4.00: status: { DRDY ERR }

[Thu Jul  7 15:54:08 2016] ata4.00: error: { UNC }

[Thu Jul  7 15:54:08 2016] ata4.00: configured for UDMA/133

[Thu Jul  7 15:54:08 2016] sd 3:0:0:0: [sdb] tag#5 FAILED Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE

[Thu Jul  7 15:54:08 2016] sd 3:0:0:0: [sdb] tag#5 Sense Key : Medium
Error [current] [descriptor]

[Thu Jul  7 15:54:08 2016] sd 3:0:0:0: [sdb] tag#5 Add. Sense:
Unrecovered read error - auto reallocate failed

[Thu Jul  7 15:54:08 2016] sd 3:0:0:0: [sdb] tag#5 CDB: Read(10) 28 00
00 0e f8 08 00 00 08 00

[Thu Jul  7 15:54:08 2016] blk_update_request: I/O error, dev sdb, sector 981001

[Thu Jul  7 15:54:08 2016] Buffer I/O error on dev sdb3, logical block
1, async page read

[Thu Jul  7 15:54:08 2016] ata4: EH complete

[Thu Jul  7 15:54:09 2016] ata4.00: exception Emask 0x0 SAct 0x4000000
SErr 0x0 action 0x0

[Thu Jul  7 15:54:09 2016] ata4.00: irq_stat 0x40000008

[Thu Jul  7 15:54:09 2016] ata4.00: failed command: READ FPDMA QUEUED

[Thu Jul  7 15:54:09 2016] ata4.00: cmd
60/08:d0:08:f8:0e/00:00:00:00:00/40 tag 26 ncq 4096 in

[Thu Jul  7 15:54:09 2016]          res
41/40:08:09:f8:0e/00:00:00:00:00/00 Emask 0x409 (media error) <F>

[Thu Jul  7 15:54:09 2016] ata4.00: status: { DRDY ERR }

[Thu Jul  7 15:54:09 2016] ata4.00: error: { UNC }

[Thu Jul  7 15:54:09 2016] ata4.00: configured for UDMA/133

[Thu Jul  7 15:54:09 2016] sd 3:0:0:0: [sdb] tag#26 FAILED Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE

[Thu Jul  7 15:54:09 2016] sd 3:0:0:0: [sdb] tag#26 Sense Key : Medium
Error [current] [descriptor]

[Thu Jul  7 15:54:09 2016] sd 3:0:0:0: [sdb] tag#26 Add. Sense:
Unrecovered read error - auto reallocate failed

[Thu Jul  7 15:54:09 2016] sd 3:0:0:0: [sdb] tag#26 CDB: Read(10) 28
00 00 0e f8 08 00 00 08 00

[Thu Jul  7 15:54:09 2016] blk_update_request: I/O error, dev sdb, sector 981001

[Thu Jul  7 15:54:09 2016] Buffer I/O error on dev sdb3, logical block
1, async page read

[Thu Jul  7 15:54:09 2016] ata4: EH complete

[Thu Jul  7 15:54:09 2016] ata4.00: exception Emask 0x0 SAct
0x40000000 SErr 0x0 action 0x0
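
The repeated `CDB: Read(10)` lines in the log above can be checked against the reported failing sector. This is a small sketch, assuming the standard READ(10) layout (opcode 0x28, a 4-byte big-endian LBA in bytes 2-5, a 2-byte transfer length in bytes 7-8); the function name is made up for illustration.

```python
from typing import Tuple

def decode_read10(cdb_hex: str) -> Tuple[int, int]:
    """Return (start_lba, block_count) from a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    start_lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    block_count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return start_lba, block_count

# CDB copied from the sdb entries in the log above.
start, count = decode_read10("28 00 00 0e f8 08 00 00 08 00")
failed_sector = 981001  # from the blk_update_request lines
print(start, count, start <= failed_sector < start + count)
# prints: 981000 8 True
```

So every retry is the same 8-sector read starting at sector 981000, and the unreadable sector 981001 falls inside it.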




Much appreciated,
Amit Biswas


* Re: Recovering RAID Volumes from 6 Disks
  2016-07-19 16:29 Recovering RAID Volumes from 6 Disks Amit Biswas
@ 2016-07-19 19:57 ` Wols Lists
  2016-07-19 22:34   ` Amit Biswas
  0 siblings, 1 reply; 9+ messages in thread
From: Wols Lists @ 2016-07-19 19:57 UTC (permalink / raw)
  To: Amit Biswas, linux-raid

Another bit of useful information - can you post the output of smartctl
on all your drives?

smartctl -x /dev/sd[a,b,c...]

Seeing as the drives are Seagate 1TB drives, I suspect they do support
ERC and timeout mismatch is not the problem, but this will tell us.
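
For context, a sketch of the check this question is driving at. It assumes (these details are not stated in the thread) that the drive's ERC limit is reported in tenths of a second, as `smartctl -l scterc` does, and that the kernel's per-command timeout is read from /sys/block/<dev>/device/timeout in seconds. The function name and the 180-second fallback are illustrative choices.

```python
from typing import Optional

# Assumption: a generous kernel timeout often suggested for drives
# whose error recovery cannot be bounded with ERC.
FALLBACK_KERNEL_TIMEOUT_S = 180

def has_timeout_mismatch(erc_deciseconds: Optional[int], kernel_timeout_s: int) -> bool:
    """Return True if the drive may still be retrying when the kernel gives up.

    erc_deciseconds: ERC read limit in tenths of a second, or None if the
        drive does not support ERC or has it disabled.
    kernel_timeout_s: the kernel's command timeout for the device, in seconds.
    """
    if erc_deciseconds is None:
        # Unbounded recovery: only a long kernel timeout avoids the mismatch.
        return kernel_timeout_s < FALLBACK_KERNEL_TIMEOUT_S
    return erc_deciseconds / 10 >= kernel_timeout_s

print(has_timeout_mismatch(70, 30))    # ERC of 7.0 s against a 30 s kernel timeout
print(has_timeout_mismatch(None, 30))  # no ERC against a 30 s kernel timeout
```

The point of the comparison is that a drive which gives up on a bad sector before the kernel's timeout expires reports a clean read error, which md can handle, instead of being reset and dropped from the array.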

I'll let others chime in with recovery info, but this information will
definitely help them.

Cheers,
Wol

On 19/07/16 17:29, Amit Biswas wrote:
> Greetings!
> 
> Backup server was acting up and the issue was the drives (all of them)
> :( Could use some guidance or verdict.
> 
> It has a total of 6 drives: sda,b,c,d,e,f. From the superblock info
> (attached), there is a raid 1, and a raid 10 volume. Problem is all
> the disks are part of both raid volumes (according to superblock).
> 
> I am currently booted into an ubuntu live disk shell.
> 



* Re: Recovering RAID Volumes from 6 Disks
  2016-07-19 19:57 ` Wols Lists
@ 2016-07-19 22:34   ` Amit Biswas
  2016-07-20 14:31     ` Wols Lists
  0 siblings, 1 reply; 9+ messages in thread
From: Amit Biswas @ 2016-07-19 22:34 UTC (permalink / raw)
  To: Wols Lists; +Cc: linux-raid

Here are the SMART reports for all six drives. Drive sda was not co-operating...

/dev/sda

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-27-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               /2:0:0:0
Product:
User Capacity:        600,332,565,813,390,450 bytes [600 PB]
Logical block size:   774843950 bytes
>> Terminate command early due to bad response to IEC mode page

=== START OF READ SMART DATA SECTION ===

Error Counter logging not supported

Device does not support Self Test logging

/dev/sdb

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-27-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST1000DM003-9YN162
Serial Number:    Z1D05TKG
LU WWN Device Id: 5 000c50 03ec23134
Firmware Version: CC9C
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Jul  7 18:58:22 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test
routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (  575) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 116) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x3081)    SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       205143226
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       152
  5 Reallocated_Sector_Ct   0x0033   095   095   036    Pre-fail  Always       -       7808
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       4573741014
  9 Power_On_Hours          0x0032   067   067   000    Old_age   Always       -       29429
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       151
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       4 5 5
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   035   045    Old_age   Always   In_the_past 31 (0 101 31 30 0)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       149
193 Load_Cycle_Count        0x0032   087   087   000    Old_age   Always       -       26214
194 Temperature_Celsius     0x0022   031   065   000    Old_age   Always       -       31 (0 14 0 0 0)
197 Current_Pending_Sector  0x0012   080   080   000    Old_age   Always       -       3359
198 Offline_Uncorrectable   0x0010   080   080   000    Old_age   Offline      -       3359
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       15576h+39m+12.722s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       42665052319107
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       234087115868324
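
A minimal sketch, assuming one attribute per line in the usual smartctl layout, of pulling out the raw counters that most directly describe failing sectors on sdb. The choice of watched attributes and the helper name are assumptions made for illustration; the three sample rows are copied from the sdb report.

```python
from typing import Dict

# Attributes whose non-zero raw value is commonly read as a sign of media
# trouble (an assumption of this sketch, not a rule stated in the thread).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def watched_raw_values(table: str) -> Dict[str, int]:
    """Extract the raw values of the watched attributes from smartctl rows."""
    found: Dict[str, int] = {}
    for line in table.splitlines():
        cols = line.split()
        # Columns: ID, name, flag, value, worst, thresh, type, updated,
        # when_failed, raw value.
        if len(cols) >= 10 and cols[0].isdigit() and cols[1] in WATCHED:
            found[cols[1]] = int(cols[9])
    return found

SDB_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   095   095   036    Pre-fail  Always       -       7808
197 Current_Pending_Sector  0x0012   080   080   000    Old_age   Always       -       3359
198 Offline_Uncorrectable   0x0010   080   080   000    Old_age   Offline      -       3359
"""

for name, raw in watched_raw_values(SDB_ROWS).items():
    if raw > 0:
        print(f"{name}: {raw}")
```

All three counters are non-zero for sdb even though the overall self-assessment above reads PASSED.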

SMART Error Log Version: 1
ATA Error Count: 342 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 342 occurred at disk power-on lifetime: 29429 hours (1226 days + 5 hours)
  When the command that caused the error occurred, the device was
active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 09 f8 0e 00  Error: UNC at LBA = 0x000ef809 = 981001

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 08 f8 0e 40 00      02:57:13.845  READ FPDMA QUEUED
  60 00 08 00 f8 0e 40 00      02:57:13.845  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:57:13.844  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:57:13.844  READ FPDMA QUEUED
  b0 da 00 00 4f c2 00 00      02:49:00.670  SMART RETURN STATUS

Error 341 occurred at disk power-on lifetime: 29429 hours (1226 days + 5 hours)
  When the command that caused the error occurred, the device was
active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 09 f8 0e 00  Error: UNC at LBA = 0x000ef809 = 981001

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 08 f8 0e 40 00      02:57:13.845  READ FPDMA QUEUED
  60 00 08 00 f8 0e 40 00      02:57:13.845  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:57:13.844  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:57:13.844  READ FPDMA QUEUED
  b0 da 00 00 4f c2 00 00      02:49:00.670  SMART RETURN STATUS

Error 340 occurred at disk power-on lifetime: 29428 hours (1226 days + 4 hours)
  When the command that caused the error occurred, the device was
active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 09 f8 0e 00  Error: UNC at LBA = 0x000ef809 = 981001

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 08 f8 0e 40 00      01:36:54.470  READ FPDMA QUEUED
  60 00 08 00 f8 0e 40 00      01:36:54.470  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      01:36:54.470  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      01:36:54.469  READ FPDMA QUEUED
  60 00 08 08 10 00 40 00      01:36:12.430  READ FPDMA QUEUED

Error 339 occurred at disk power-on lifetime: 29428 hours (1226 days + 4 hours)
  When the command that caused the error occurred, the device was
active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 09 f8 0e 00  Error: UNC at LBA = 0x000ef809 = 981001

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 08 f8 0e 40 00      01:36:54.470  READ FPDMA QUEUED
  60 00 08 00 f8 0e 40 00      01:36:54.470  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      01:36:54.470  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      01:36:54.469  READ FPDMA QUEUED
  60 00 08 08 10 00 40 00      01:36:12.430  READ FPDMA QUEUED

Error 338 occurred at disk power-on lifetime: 29427 hours (1226 days + 3 hours)
  When the command that caused the error occurred, the device was
active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 09 f8 0e 00  Error: UNC at LBA = 0x000ef809 = 981001

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 08 f8 0e 40 00      01:28:40.063  READ FPDMA QUEUED
  60 00 08 08 00 00 40 00      01:28:37.685  READ FPDMA QUEUED
  60 00 08 08 08 00 40 00      01:28:37.685  READ FPDMA QUEUED
  60 00 08 08 10 00 40 00      01:28:37.684  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      01:28:37.684  SET FEATURES [Enable SATA feature]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         1         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


/dev/sdc

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-27-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Constellation ES (SATA 6Gb/s)
Device Model:     ST1000NM0011
Serial Number:    Z1N4DQG8
LU WWN Device Id: 5 000c50 064169d91
Firmware Version: SN03
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Jul  7 18:58:31 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test
routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (  600) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 149) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x10bd)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   069   064   044    Pre-fail  Always       -       10352612
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       21
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always       -       129759408
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       17545
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       21
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       3
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   062   045    Old_age   Always       -       31 (Min/Max 30/31)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       16
193 Load_Cycle_Count        0x0032   096   096   000    Old_age   Always       -       9383
194 Temperature_Celsius     0x0022   031   040   000    Old_age   Always       -       31 (0 14 0 0 0)
195 Hardware_ECC_Recovered  0x001a   105   100   000    Old_age   Always       -       10352612
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   199   000    Old_age   Always       -       66

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


/dev/sdd

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-27-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Constellation ES (SATA 6Gb/s)
Device Model:     ST1000NM0011
Serial Number:    Z1N4DX3G
LU WWN Device Id: 5 000c50 06416d153
Firmware Version: SN03
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Jul  7 18:58:38 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test
routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (  609) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 153) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x10bd)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   078   063   044    Pre-fail  Always       -       59676424
  3 Spin_Up_Time            0x0003   096   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       64
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   083   060   030    Pre-fail  Always       -       202527202
  9 Power_On_Hours          0x0032   074   074   000    Old_age   Always       -       23267
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       60
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   032   045    Old_age   Always   In_the_past 33 (0 111 33 31 0)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       51
193 Load_Cycle_Count        0x0032   095   095   000    Old_age   Always       -       10161
194 Temperature_Celsius     0x0022   033   068   000    Old_age   Always       -       33 (0 16 0 0 0)
195 Hardware_ECC_Recovered  0x001a   114   099   000    Old_age   Always       -       59676424
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       478         -
# 2  Extended offline    Aborted by host               80%       462         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


/dev/sde

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-27-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST1000DM003-1CH162
Serial Number:    S1D8EGH8
LU WWN Device Id: 5 000c50 05c135595
Firmware Version: CC46
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Jul  7 18:58:46 2016 UTC

==> WARNING: A firmware update for this drive is available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test
routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (  575) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 115) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x3085)    SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       203719328
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       122
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always       -       156895542
  9 Power_On_Hours          0x0032   068   068   000    Old_age   Always       -       28487
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       121
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   033   045    Old_age   Always   In_the_past 29 (0 200 29 27 0)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       119
193 Load_Cycle_Count        0x0032   095   095   000    Old_age   Always       -       11202
194 Temperature_Celsius     0x0022   029   067   000    Old_age   Always       -       29 (0 11 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       27611h+29m+28.145s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       10984310419
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       42457231761

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


/dev/sdf

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-27-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST1000DM003-9YN162
Serial Number:    Z1D04N3L
LU WWN Device Id: 5 000c50 03633f4d6
Firmware Version: CC9C
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Jul  7 18:58:53 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test
routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (  584) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 115) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x3081)    SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   089   087   006    Pre-fail  Always       -       107847548
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       148
  5 Reallocated_Sector_Ct   0x0033   072   051   036    Pre-fail  Always       -       37616
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       235958313
  9 Power_On_Hours          0x0032   066   066   000    Old_age   Always       -       30474
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       147
183 Runtime_Bad_Block       0x0032   098   098   000    Old_age   Always       -       2
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       587
188 Command_Timeout         0x0032   100   098   000    Old_age   Always       -       13 13 13
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   034   045    Old_age   Always   In_the_past 31 (0 102 31 29 0)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       145
193 Load_Cycle_Count        0x0032   088   088   000    Old_age   Always       -       24193
194 Temperature_Celsius     0x0022   031   066   000    Old_age   Always       -       31 (0 13 0 0 0)
197 Current_Pending_Sector  0x0012   001   001   000    Old_age   Always       -       33664
198 Offline_Uncorrectable   0x0010   001   001   000    Old_age   Offline      -       33664
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       29478h+42m+45.934s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       35126025198006
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       233821549666301

SMART Error Log Version: 1
ATA Error Count: 561 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 561 occurred at disk power-on lifetime: 28893 hours (1203 days + 21 hours)
  When the command that caused the error occurred, the device was
active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00  12d+01:32:01.326  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  12d+01:32:01.325  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  12d+01:32:01.325  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  12d+01:32:01.325  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  12d+01:32:01.325  READ FPDMA QUEUED

Error 560 occurred at disk power-on lifetime: 28893 hours (1203 days + 21 hours)
  When the command that caused the error occurred, the device was
active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00  12d+01:31:16.334  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  12d+01:31:15.803  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  12d+01:31:13.683  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  12d+01:31:13.683  READ FPDMA QUEUED
  61 00 08 ff ff ff 4f 00  12d+01:31:13.683  WRITE FPDMA QUEUED

Error 559 occurred at disk power-on lifetime: 28893 hours (1203 days + 21 hours)
  When the command that caused the error occurred, the device was
active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00  12d+01:31:10.402  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  12d+01:31:07.982  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  12d+01:31:07.982  READ FPDMA QUEUED
  61 00 08 ff ff ff 4f 00  12d+01:31:07.982  WRITE FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  12d+01:31:07.922  SET FEATURES [Enable SATA feature]

Error 558 occurred at disk power-on lifetime: 28893 hours (1203 days + 21 hours)
  When the command that caused the error occurred, the device was
active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00  12d+01:31:04.755  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  12d+01:31:04.755  READ FPDMA QUEUED
  61 00 08 ff ff ff 4f 00  12d+01:31:04.755  WRITE FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  12d+01:31:04.694  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00  12d+01:31:04.694  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 557 occurred at disk power-on lifetime: 28893 hours (1203 days + 21 hours)
  When the command that caused the error occurred, the device was
active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00  12d+01:31:01.457  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  12d+01:31:01.457  READ FPDMA QUEUED
  61 00 08 ff ff ff 4f 00  12d+01:31:01.457  WRITE FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  12d+01:31:01.444  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00  12d+01:31:01.444  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         1         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
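
To pick the worrying counters out of tables like the ones above, a small filter may help. This is only a sketch: it assumes unwrapped `smartctl -A` output on stdin (one attribute per line, raw value in the tenth column), and the name flag_attrs and the choice of IDs 5, 187, 197 and 198 are illustrative, not anything smartctl defines.

```shell
#!/usr/bin/env bash
# flag_attrs: read `smartctl -A` output on stdin and print NAME=RAW for
# attributes 5, 187, 197 and 198 whose raw value is non-zero.
# Assumes one attribute per line, as smartctl prints it before mail wrapping.
flag_attrs() {
    awk '$1 ~ /^(5|187|197|198)$/ && ($10 + 0) > 0 { print $2 "=" $10 }'
}

# Example: four rows from the /dev/sdf table, each rejoined onto one line.
flag_attrs <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   072   051   036    Pre-fail  Always       -       37616
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       587
197 Current_Pending_Sector  0x0012   001   001   000    Old_age   Always       -       33664
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
EOF
```

On a live system the same function could be fed directly, e.g. `smartctl -A /dev/sdf | flag_attrs`.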

Amit Biswas
Lab Manager
CSE Department
NYU Tandon School Of Engineering
Office: 1-646-997-3023


On Tue, Jul 19, 2016 at 3:57 PM, Wols Lists <antlists@youngman.org.uk> wrote:
> Another bit of useful information - can you post the output of smartctl
> on all your drives?
>
> smartctl -x /dev/sd[a,b,c...]
>
> Seeing as the drives are Seagate 1TB drives, I suspect they do support
> ERC and timeout mismatch is not the problem, but this will tell us.
>
> I'll let others chime in with recovery info, but this information will
> definitely help them.
>
> Cheers,
> Wol
>
> On 19/07/16 17:29, Amit Biswas wrote:
>> Greetings!
>>
>> Backup server was acting up and the issue was the drives (all of them)
>> :( Could use some guidance or verdict.
>>
>> It has a total of 6 drives: sda,b,c,d,e,f. From the superblock info
>> (attached), there is a raid 1, and a raid 10 volume. Problem is all
>> the disks are part of both raid volumes (according to superblock).
>>
>> I am currently booted into an ubuntu live disk shell.
>>
>


* Re: Recovering RAID Volumes from 6 Disks
  2016-07-19 22:34   ` Amit Biswas
@ 2016-07-20 14:31     ` Wols Lists
  2016-07-20 14:53       ` Wols Lists
  2016-07-20 15:07       ` Roman Mamedov
  0 siblings, 2 replies; 9+ messages in thread
From: Wols Lists @ 2016-07-20 14:31 UTC (permalink / raw)
  To: Amit Biswas; +Cc: linux-raid

Ummmmm ...

b,e and f are Barracudas ... I know my 3TB Barracudas are vulnerable to
the timeout problem. It looks like the 1TB ones probably are as well ...

While you're waiting for someone else to chime in, read the following
... not the best reading ... about why your Barracudas are probably a
bad choice :-(

http://marc.info/?l=linux-raid&m=139050322510249&w=2
http://marc.info/?l=linux-raid&m=135863964624202&w=2
http://marc.info/?l=linux-raid&m=135811522817345&w=1
http://marc.info/?l=linux-raid&m=133761065622164&w=2
http://marc.info/?l=linux-raid&m=132477199207506
http://marc.info/?l=linux-raid&m=133665797115876&w=2
http://marc.info/?l=linux-raid&m=142487508806844&w=3
http://marc.info/?l=linux-raid&m=144535576302583&w=2

Have you got a spare Constellation lying around? If not, can you get a
proper raid drive - WD Red or Seagate NAS? Do a ddrescue to copy sda to
the replacement drive if you can. You don't want to use that to recover
the array if you can help it, but you might not have much choice, and at
least you'll have it to hand.

And do NOT do this until the experts chime in and help, but hopefully
it's just a case of making sure all your arrays are stopped, running the
following script

# raise the kernel's per-command timeout to 180 s on every sd* drive
for x in /sys/block/sd[a-z] ; do
        echo 180 > "$x/device/timeout"
done

echo 4096 > /sys/block/md0/md/stripe_cache_size

on the barracudas and re-assembling the array(s). At which point,
backing up and replacing the barracudas should be extremely high on the
agenda! It's probably a good idea to go Raid-6 and get 2 or 3TB drives.
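
A more selective variant of the timeout loop above, as a rough sketch only: drives that accept SCT ERC get a 7-second recovery limit, and only the ones that refuse get the long kernel timeout. The function name and the SYSFS and SMARTCTL override variables are made up for illustration, and relying on smartctl's exit status to detect ERC support is an assumption; `smartctl -l scterc,70,70` itself is the usual smartmontools call.

```shell
#!/bin/sh
# Sketch: prefer SCT ERC where the drive supports it, otherwise raise the
# kernel command timeout to 180 s (the value used in the loop above).
# SYSFS and SMARTCTL are hypothetical overrides so this can be dry-tested.
SYSFS=${SYSFS:-/sys/block}
SMARTCTL=${SMARTCTL:-smartctl}

fix_timeouts() {
    for dev in "$SYSFS"/sd[a-z]; do
        # skip anything that is not a real disk entry
        [ -e "$dev/device/timeout" ] || continue
        name=${dev##*/}
        # scterc,70,70 asks for 7.0 s read and write recovery limits
        if "$SMARTCTL" -l scterc,70,70 "/dev/$name" >/dev/null 2>&1; then
            echo "$name: ERC set to 7.0s, kernel timeout left alone"
        else
            echo 180 > "$dev/device/timeout"
            echo "$name: no ERC, kernel timeout set to 180s"
        fi
    done
}
```

Run `fix_timeouts` as root after every boot; neither setting survives a power cycle.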

Cheers,
Wol


On 19/07/16 23:34, Amit Biswas wrote:
> Here are the smart reports for all six drives. drive sda was not co-operating...
> 
> /dev/sda
> 
> smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-27-generic] (local build)
> Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Vendor:               /2:0:0:0
> Product:
> User Capacity:        600,332,565,813,390,450 bytes [600 PB]
> Logical block size:   774843950 bytes
> >> Terminate command early due to bad response to IEC mode page
> 
> === START OF READ SMART DATA SECTION ===
> 
> Error Counter logging not supported
> 
> Device does not support Self Test logging
> 



* Re: Recovering RAID Volumes from 6 Disks
  2016-07-20 14:31     ` Wols Lists
@ 2016-07-20 14:53       ` Wols Lists
  2016-07-20 15:07       ` Roman Mamedov
  1 sibling, 0 replies; 9+ messages in thread
From: Wols Lists @ 2016-07-20 14:53 UTC (permalink / raw)
  To: Amit Biswas; +Cc: linux-raid

Looking back at your first post, what I think has happened is that sda
has failed, and sdb has fallen foul of the timeout problem.

IFF I'm right, getting your array back shouldn't be too hard.

Cheers,
Wol




* Re: Recovering RAID Volumes from 6 Disks
  2016-07-20 14:31     ` Wols Lists
  2016-07-20 14:53       ` Wols Lists
@ 2016-07-20 15:07       ` Roman Mamedov
  2016-07-20 15:36         ` Wols Lists
  1 sibling, 1 reply; 9+ messages in thread
From: Roman Mamedov @ 2016-07-20 15:07 UTC (permalink / raw)
  To: Wols Lists; +Cc: Amit Biswas, linux-raid

On Wed, 20 Jul 2016 15:31:09 +0100
Wols Lists <antlists@youngman.org.uk> wrote:

> backing up and replacing the barracudas 

Yeah especially the sdb and sdf ones, which are failing HARD right now.

  5 Reallocated_Sector_Ct   0x0033   095   095   036    Pre-fail Always      -       7808
197 Current_Pending_Sector  0x0012   080   080   000    Old_age Always       -       3359
198 Offline_Uncorrectable   0x0010   080   080   000    Old_age Offline      -       3359

  5 Reallocated_Sector_Ct   0x0033   072   051   036    Pre-fail Always      -       37616
187 Reported_Uncorrect      0x0032   001   001   000    Old_age Always       -       587
197 Current_Pending_Sector  0x0012   001   001   000    Old_age Always       -       33664
198 Offline_Uncorrectable   0x0010   001   001   000    Old_age Offline      -       33664
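
For anyone wanting to pull the same counters out of the other drives, a small filter along these lines should do it. This is only a sketch: the function name is invented here, and it assumes the standard ten-column `smartctl -A` layout shown above, with the raw value in the last column.

```shell
# Sketch: read `smartctl -A` output on stdin and print the sector-health
# attributes (IDs 5, 187, 197, 198) whose raw value is non-zero.
flag_bad_sectors() {
    awk '$1 ~ /^(5|187|197|198)$/ && $10 + 0 > 0 { print $2, $10 }'
}
```

Typical use would be `smartctl -A /dev/sdb | flag_bad_sectors`; a healthy drive prints nothing.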

-- 
With respect,
Roman



* Re: Recovering RAID Volumes from 6 Disks
  2016-07-20 15:07       ` Roman Mamedov
@ 2016-07-20 15:36         ` Wols Lists
  2016-07-20 16:10           ` Wols Lists
  0 siblings, 1 reply; 9+ messages in thread
From: Wols Lists @ 2016-07-20 15:36 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Amit Biswas, linux-raid

On 20/07/16 16:07, Roman Mamedov wrote:
> On Wed, 20 Jul 2016 15:31:09 +0100 Wols Lists
> <antlists@youngman.org.uk> wrote:
> 
>> backing up and replacing the barracudas
> 
> Yeah especially the sdb and sdf ones, which are failing HARD right
> now.
> 
>   5 Reallocated_Sector_Ct   0x0033   095   095   036    Pre-fail Always      -       7808
> 197 Current_Pending_Sector  0x0012   080   080   000    Old_age Always       -       3359
> 198 Offline_Uncorrectable   0x0010   080   080   000    Old_age Offline      -       3359
> 
>   5 Reallocated_Sector_Ct   0x0033   072   051   036    Pre-fail Always      -       37616
> 187 Reported_Uncorrect      0x0032   001   001   000    Old_age Always       -       587
> 197 Current_Pending_Sector  0x0012   001   001   000    Old_age Always       -       33664
> 198 Offline_Uncorrectable   0x0010   001   001   000    Old_age Offline      -       33664
> 
OUCH!

Okay, and I don't like recommending stuff because I'm not an expert,
but you have 6 x 1TB drives, raid-10. Does that give you 1.5TB of
usable space, or 3TB? Never mind. I'm going to recommend getting 4 x
3TB drives at about £100 each - not nice. But you only need one to
start with.
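
On the 1.5TB-or-3TB question, the arithmetic for md raid10 is devices times size divided by the number of copies. A quick sketch, assuming the mdadm default of two copies (near=2); the layout this array actually uses is not shown in this message, and the helper name is invented.

```shell
# Sketch: usable size of an md raid10 = devices * size / copies.
# copies defaults to 2 (mdadm's default near=2 layout); that the array
# in this thread uses the default is an assumption.
raid10_usable_gb() {
    devices=$1 size_gb=$2 copies=${3:-2}
    echo $(( devices * size_gb / copies ))
}
```

With those assumptions `raid10_usable_gb 6 1000` gives 3000, i.e. about 3TB.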

Get that first 3TB drive. NOW. Physically replace sda in the machine,
and configure it as a single-drive mirror ( --create --devices=2 sda
spare).
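
Spelled out in mdadm's own terms, the option is `--raid-devices` and the keyword for the absent second member is `missing`. A minimal sketch follows; the function names and the DRY_RUN switch are invented for illustration, and the device names in the note below are placeholders rather than anything taken from this thread.

```shell
# Sketch: create a two-way raid1 with one member absent.
# DRY_RUN=1 (the default here) only prints the command.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}
create_degraded_mirror() {
    md=$1 part=$2
    # "missing" reserves the second slot for a drive added later
    run mdadm --create "$md" --level=1 --raid-devices=2 "$part" missing
}
```

Called as `create_degraded_mirror /dev/md10 /dev/sda1` it just prints the command. Set DRY_RUN=0 only after confirming the partition really is on the new, empty drive.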

Boot your system, run that timeout script, and try to assemble your
array with --scan --assemble --force. That SHOULD be safe. Read up and
make certain - I accept no responsibility for your data ...

If that works, you can now mount your array(s). READ ONLY.
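
The stop, force-assemble and read-only mount sequence could be sketched like this. The function names, the DRY_RUN switch and the example array and mount point are assumptions for illustration; the mdadm and mount options are the standard ones.

```shell
# Sketch: stop all arrays, force-assemble, mount one array read-only.
# DRY_RUN=1 (the default here) only prints the commands.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}
assemble_readonly() {
    md=$1 mnt=$2
    run mdadm --stop --scan
    run mdadm --assemble --scan --force
    run mount -o ro "$md" "$mnt"
}
```

For example `assemble_readonly /dev/md1 /mnt/old`, with the names adjusted to whatever /proc/mdstat reports after assembly.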

Now copy your data across to the new drive - use something like rsync
or cp and keep a log - there's a high probability you'll get read
errors, and you don't want this to crash the copy and leave it only
partly complete, and you also want to know what failed.
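
One way to do the logged copy, as a sketch: rsync carries on past files it cannot read and reports them through exit code 23 (24 if files vanished), with details in the log file. The function name, the RSYNC override variable and the paths are assumptions for illustration.

```shell
# Sketch: copy with rsync, keep a log, and report read errors without
# treating them as fatal. RSYNC is a hypothetical override for testing.
copy_with_log() {
    src=$1 dst=$2 log=$3
    rc=0
    "${RSYNC:-rsync}" -a --log-file="$log" "$src"/ "$dst"/ || rc=$?
    case $rc in
        0)     echo "copy complete" ;;
        23|24) echo "partial copy, see $log for the files that failed" ;;
        *)     echo "rsync failed with exit code $rc" ;;
    esac
    return "$rc"
}
```

For example `copy_with_log /mnt/old /mnt/new /root/rsync.log`; running it again later only transfers what is still missing or different.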

You can now bring the system up on the new drive.

DUMP THE BARRACUDAS - ALL OF THEM. Two are failing, and the third one
is probably no better - it's not worth risking your data. The
constellations are probably okay as backup drives - it's a couple of
quid for an enclosure to turn them into usb drives :-)

As soon as you can, get the other three 3TB drives. The first of these
is urgent - your system will be running on a degraded mirror and you
need to fix that asap. The second drive will convert your mirror to
raid5, and the last one will convert it to raid6.

NB - I can't remember - is your boot/system partition on these drives?
You're better off running that as a mirror regardless, so if so, split
the 3TB drives into a small boot/system partition and a large data
partition, raid6 the data as you get the drives, and raid1 the
boot/system across all four drives (install grub on all four, too).

Cheers,
Wol
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Recovering RAID Volumes from 6 Disks
  2016-07-20 15:36         ` Wols Lists
@ 2016-07-20 16:10           ` Wols Lists
  2016-07-22 13:56             ` Phil Turmel
  0 siblings, 1 reply; 9+ messages in thread
From: Wols Lists @ 2016-07-20 16:10 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Amit Biswas, linux-raid

On 20/07/16 16:36, Wols Lists wrote:
> Get that first 3TB drive. NOW. Physically replace sda in the machine,
> and configure it as a single-drive mirror ( --create --devices=2 sda
> spare).

Just noticed your edu address. If Phil Turmel chimes in, he'll tell me
off for telling you to spend money :-)

If you are an impecunious student, and your data will fit on 2TB, then
beg borrow or steal :-) a 2TB drive.

Use that as your backup, mirrored, as I said, and then you can combine
the two constellations into a 2TB raid0, and add that in as the second
half of your mirror. That will at least give you a working, safe, raid
system.

Cheers,
Wol


* Re: Recovering RAID Volumes from 6 Disks
  2016-07-20 16:10           ` Wols Lists
@ 2016-07-22 13:56             ` Phil Turmel
  0 siblings, 0 replies; 9+ messages in thread
From: Phil Turmel @ 2016-07-22 13:56 UTC (permalink / raw)
  To: Wols Lists, Roman Mamedov; +Cc: Amit Biswas, linux-raid

On 07/20/2016 12:10 PM, Wols Lists wrote:
> On 20/07/16 16:36, Wols Lists wrote:
>> Get that first 3TB drive. NOW. Physically replace sda in the machine,
>> and configure it as a single-drive mirror ( --create --devices=2 sda
>> spare).
> 
> Just noticed your edu address. If Phil Turmel chimes in, he'll tell me
> off for telling you to spend money :-)

No, not this time.  :-)

I would also note that the raid1 composed of the six sd?2 partitions is
only operating as a two-copy mirror -- the other four devices are
spares.  Whoever created that array added the extra drives but never
used --grow to set the number of mirrors to six.

Amit, as soon as you get your arrays assembled with the --force option,
follow Wol's advice on rsync'ing to a new array.  I wouldn't advise
trying to fix the UREs on your existing arrays since there are so many
that you'd certainly trip the 10/hr read error limit in MD.

Just copying data may also trip the read error rate limit, but you have
no choice.  If it happens and kicks out any array devices, simply stop
the rsync, stop and re-assemble the array (with --force), and continue
with rsync.
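
That stop, re-assemble and resume cycle could be wrapped in a loop roughly like the one below. It is a sketch only: the function names, DRY_RUN, MAX_TRIES and the example paths are invented here. Note that rsync also exits non-zero when only individual files are unreadable, so the retry cap is there to stop it looping forever on those.

```shell
# Sketch of the retry cycle. DRY_RUN=1 (the default here) prints the
# commands instead of running them; MAX_TRIES caps the attempts.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}
copy_until_done() {
    md=$1 src=$2 dst=$3 tries=0
    while [ "$tries" -lt "${MAX_TRIES:-5}" ]; do
        tries=$((tries + 1))
        # rsync skips files already copied, so a rerun resumes the job
        if run rsync -a "$src"/ "$dst"/; then
            return 0
        fi
        # a member was probably kicked out: stop the array, force it back
        run umount "$src"
        run mdadm --stop "$md"
        run mdadm --assemble --scan --force
        run mount -o ro "$md" "$src"
    done
    return 1
}
```

For example `copy_until_done /dev/md1 /mnt/old /mnt/new`, where the first argument is the array and the second its read-only mount point.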

The Constellations can probably be recycled into spares or backup
devices after you copy your data.  I didn't look closely at their stats.

Phil


Thread overview: 9+ messages
2016-07-19 16:29 Recovering RAID Volumes from 6 Disks Amit Biswas
2016-07-19 19:57 ` Wols Lists
2016-07-19 22:34   ` Amit Biswas
2016-07-20 14:31     ` Wols Lists
2016-07-20 14:53       ` Wols Lists
2016-07-20 15:07       ` Roman Mamedov
2016-07-20 15:36         ` Wols Lists
2016-07-20 16:10           ` Wols Lists
2016-07-22 13:56             ` Phil Turmel
